Turn GHC into a frontend for miniSTG

March 15, 2016
Yuras Shumovich

TLDR: Here is a patch for GHC to dump STG in a form compatible with miniSTG.

MiniSTG

The Spineless Tagless G-machine (STG) is an abstract machine designed to run lazy functional languages. An abstract code for it, an STG language, is a minimal lazy functional language, GHC uses it as an intermediate representation for haskell code.

MiniSTG is an interpreter for STG language. It is a bit limited, only integral literals are supported, and very few primops are provided. But it is very useful if you want to study how haskell code is executed by GHC. The most interesting its feature is execution tracing.

To show few examples, lets define boxed integers:

zero = CON(Int 0);

one = CON(Int 1);

plus = FUN(x y -> case x of {
	Int i -> case y of {
		Int j -> case plus# i j of {
			k -> let {
				r = CON(Int k);
				} in r;
		}
	}
});

main = THUNK(plus one one);

Here Int is a constructor for boxed integers (you don’t have to define constructors upfront), zero and one define boxed literals. The next declaration is more interesting, plus is a function that takes two arguments. They are expected to be boxed integers, and case is used to unboxed them. Then built-in primop plus# performs the real work to add two unboxed integers. Finally the result is reboxed and returned. The last line defines an entry point for the program. The output:

$ ministg --noprelude -s EA main.stg
(Int 2)

Now lets define a list:

nil = CON(Nil);

con = FUN(x xs -> let {
	r = CON(List x xs)
	} in r
);

length = FUN(l -> case l of {
	Nil -> zero;
	List x xs -> let {
		n = THUNK(length xs);
		} in plus one n;
});

Here nil is an empty list, con prepends a value to a list, and length calculates length of a list using the plus function we already defined. Lets test it:

list1 = THUNK(con zero nil);
list2 = THUNK(con zero list1);
list3 = THUNK(con zero list2);
list4 = THUNK(con zero list3);

main = THUNK(length list4);

Output:

$ ministg --noprelude -s EA main.stg
(Int 4)

I hope you got the idea. There are few restrictions: the only way to allocate anything on heap is to use CON, FUN or THUNK inside let or on top level, so the next code is invalid because it tries to allocate a list cell outside of let (compare with the definition above):

con = FUN(x xs ->
	CON(List x xs)
);

Also all function arguments should be allocated on heap, you can’t pass a complex expression to a function. So the next code is invalid:

length = FUN(l -> case l of {
	Nil -> zero;
	List x xs -> plus one (length xs);
});

MiniSTG doesn’t support pattern matching on literals, so you have to use eq# and intToBool# primops to compare boxed integers:

eq = FUN(x y -> case x of {
	Int i -> case y of {
		Int j -> case eq# i j of {
			k -> intToBool# k
		};
	};
});

main = THUNK(eq one one);

Output:

$ ministg --noprelude -s EA main.stg
True

GHC

Writing in STG is fun, but how real haskell code looks when translated into STG? GHC has a -ddump-stg flag to dump it, but unfortunately it looks far from what miniSTG accepts:

f_rn0 :: GHC.Types.Int -> GHC.Types.Int
[GblId, Arity=1, Str=DmdType, Unf=OtherCon []] =
    sat-only \r [ds_s1Mn]
        case ds_s1Mn of wild_s1Mo {
          GHC.Types.I# ds1_s1Mp [Occ=Once!] ->
              case ds1_s1Mp of _ [Occ=Dead] {
                __DEFAULT ->
                    let {
                      sat_s1Ms [Occ=Once] :: GHC.Types.Int
                      [LclId, Str=DmdType] =
                          \u []
                              let {
                                sat_s1Mr [Occ=Once] :: GHC.Types.Int
                                [LclId, Str=DmdType] =
                                    \u [] GHC.Enum.pred GHC.Enum.$fEnumInt wild_s1Mo;
                              } in  f_rn0 sat_s1Mr;
                    } in  GHC.Num.* GHC.Num.$fNumInt wild_s1Mo sat_s1Ms;
                0# -> GHC.Types.I# [1#];
              };
        };

Even if I remove all irrelevant information, you will probably not recognize factorial function, but at least now it looks similar to miniSTG syntax:

f_rn0 = [ds_s1Mn]
        case ds_s1Mn of wild_s1Mo {
          GHC.Types.I# ds1_s1Mp ->
              case ds1_s1Mp of _ {
                __DEFAULT ->
                    let {
                      sat_s1Ms = []
                              let {
                                sat_s1Mr = []
                                    GHC.Enum.pred GHC.Enum.$fEnumInt wild_s1Mo;
                              } in  f_rn0 sat_s1Mr;
                    } in  GHC.Num.* GHC.Num.$fNumInt wild_s1Mo sat_s1Ms;
                0# -> GHC.Types.I# [1#];
              };
        };

The biggest difference, except pattern matching on unboxed literals, is case expression:

case a of b {
	__DEFAULT -> ...
	...
}

Here b is an alias for the result of a (note that a could be an expression), it corresponds to “as patter” in haskell. In miniSTG it can be represented as two nested cases:

case a of {
	b -> case b of {
		...
	}
}

Otherwise the conversion is trivial. The patch mentioned at the beginning tries to automatically convert GHC syntax to miniSTG one.

Using GHC and miniSTG together

First we need custom prelude that defines all basic declarations (Prelude.hs):

{-# LANGUAGE PackageImports #-}

module Prelude
( Int
, zero
, one
, plus
, sub
, mul
, eqInt
, Bool (..)
)
where

import "base" Prelude (Bool (..))

data Int = Int

{-# NOINLINE zero #-}
zero :: Int
zero = Int

{-# NOINLINE one #-}
one :: Int
one = Int

{-# NOINLINE plus #-}
plus :: Int -> Int -> Int
plus _ _ = Int

{-# NOINLINE sub #-}
sub :: Int -> Int -> Int
sub _ _ = Int

{-# NOINLINE mul #-}
mul :: Int -> Int -> Int
mul _ _ = Int

{-# NOINLINE eqInt #-}
eqInt :: Int -> Int -> Bool
eqInt _ _ = False

Note that implementation for the declarations is not important, we are going to implement them directly in miniSTG anyway. Now a test module (Test.hs):

module Test
(test
)
where

pred :: Int -> Int
pred n = n `sub` one

f :: Int -> Int
f n | n `eqInt` zero
    = one
    | True
    = n `mul` f (pred n)

two = plus one one
five = two `plus` two `plus` one

test :: Int
test = f five

If you compile this module with -ddump-ministg, you get the next:

f = FUN(n_sJm -> case eqIntz1Prelude n_sJm z0eroz1Prelude of {
	 wild_sJn -> case wild_sJn of {
	     Falsez1GHCz1Types -> let {
		  sat_sJp = THUNK(let {
			  sat_sJo = THUNK(subz1Prelude n_sJm onez1Prelude)
			  } in
			  f sat_sJo)
		  } in
		  mulz1Prelude n_sJm sat_sJp;
	     Truez1GHCz1Types -> let {
		 res_var_ = THUNK(onez1Prelude)
		 } in
		 res_var_;
	     };
	 });

two = THUNK(plusz1Prelude onez1Prelude onez1Prelude);

sat_sJr = THUNK(let {
                sat_sJq = THUNK(plusz1Prelude two two)
                } in
                plusz1Prelude sat_sJq onez1Prelude);

testz1Test = THUNK(f sat_sJr);

Names are encoded, e.g. Falsez1GHCz1Types is a False constructor from GHC.Types module. Lets add an entry point:

main = THUNK(testz1Test);

and run it:

$ ministg --noprelude -s EA test.stg
ministg: undefined variable: "eqIntz1Prelude"

Oops, we need to define prelude. Create Prelude.stg with the next content:

z0eroz1Prelude = CON(Int 0);
onez1Prelude = CON(Int 1);

plusz1Prelude = FUN(x y -> case x of {
	Int i -> case y of {
		Int j -> case plus# i j of {
			k -> let {
				r = CON(Int k)
				} in r;
		};
	};
});

subz1Prelude = FUN(x y -> case x of {
	Int i -> case y of {
		Int j -> case sub# i j of {
			k -> let {
				r = CON(Int k)
				} in r;
		};
	};
});

mulz1Prelude = FUN(x y -> case x of {
	Int i -> case y of {
		Int j -> case mult# i j of {
			k -> let {
				r = CON(Int k)
				} in r;
		};
	};
});

eqIntz1Prelude = FUN(x y -> case x of {
	Int i -> case y of {
		Int j -> case eq# i j of {
			k -> case intToBool# k of {
				True -> true;
				False -> false;
			};
		};
	};
});

true = CON(Truez1GHCz1Types);
false = CON(Falsez1GHCz1Types);

Note that here we converted built-in True and False to the corresponding constructors from GHC.Types. Now we can run the program (note that we removed --no-prelude flag):

$ ministg -s EA test.stg
(Int 120)

Yay! We have working frontend for miniSTG! Note that nothing stops us from using advanced haskell features like type classes because they are compiled out to simple constructs. For example, lets write the same factorial function using Eq type class:

class Eq a where
  eq :: a -> a -> Bool

instance Eq Int where
  eq = eqInt

f :: Int -> Int
f n | n `eq` zero
    = one
    | True
    = n `mul` f (pred n)

If you don’t yet know how type classes work in haskell, then you have a good chance to figure it out from STG dump!

More posts

Atom feed

Atom feed