Turn GHC into a frontend for miniSTG
Yuras Shumovich
TLDR: Here is a patch for GHC to dump STG in a form compatible with miniSTG.
MiniSTG
The Spineless Tagless G-machine (STG) is an abstract machine designed to run lazy functional languages. An abstract code for it, an STG language, is a minimal lazy functional language, GHC uses it as an intermediate representation for haskell code.
MiniSTG is an interpreter for STG language. It is a bit limited, only integral literals are supported, and very few primops are provided. But it is very useful if you want to study how haskell code is executed by GHC. The most interesting its feature is execution tracing.
To show few examples, lets define boxed integers:
zero = CON(Int 0);
one = CON(Int 1);
plus = FUN(x y -> case x of {
Int i -> case y of {
Int j -> case plus# i j of {
k -> let {
r = CON(Int k);
} in r;
}
}
});
main = THUNK(plus one one);
Here Int
is a constructor for boxed integers (you don’t have to define constructors upfront), zero
and one
define boxed literals. The next declaration is more interesting, plus
is a function that takes two arguments. They are expected to be boxed integers, and case
is used to unboxed them. Then built-in primop plus#
performs the real work to add two unboxed integers. Finally the result is reboxed and returned. The last line defines an entry point for the program. The output:
$ ministg --noprelude -s EA main.stg
(Int 2)
Now lets define a list:
nil = CON(Nil);
con = FUN(x xs -> let {
r = CON(List x xs)
} in r
);
length = FUN(l -> case l of {
Nil -> zero;
List x xs -> let {
n = THUNK(length xs);
} in plus one n;
});
Here nil
is an empty list, con
prepends a value to a list, and length
calculates length of a list using the plus
function we already defined. Lets test it:
list1 = THUNK(con zero nil);
list2 = THUNK(con zero list1);
list3 = THUNK(con zero list2);
list4 = THUNK(con zero list3);
main = THUNK(length list4);
Output:
$ ministg --noprelude -s EA main.stg
(Int 4)
I hope you got the idea. There are few restrictions: the only way to allocate anything on heap is to use CON
, FUN
or THUNK
inside let
or on top level, so the next code is invalid because it tries to allocate a list cell outside of let
(compare with the definition above):
con = FUN(x xs ->
CON(List x xs)
);
Also all function arguments should be allocated on heap, you can’t pass a complex expression to a function. So the next code is invalid:
length = FUN(l -> case l of {
Nil -> zero;
List x xs -> plus one (length xs);
});
MiniSTG doesn’t support pattern matching on literals, so you have to use eq#
and intToBool#
primops to compare boxed integers:
eq = FUN(x y -> case x of {
Int i -> case y of {
Int j -> case eq# i j of {
k -> intToBool# k
};
};
});
main = THUNK(eq one one);
Output:
$ ministg --noprelude -s EA main.stg
True
GHC
Writing in STG is fun, but how real haskell code looks when translated into STG? GHC has a -ddump-stg
flag to dump it, but unfortunately it looks far from what miniSTG accepts:
f_rn0 :: GHC.Types.Int -> GHC.Types.Int
[GblId, Arity=1, Str=DmdType, Unf=OtherCon []] =
sat-only \r [ds_s1Mn]
case ds_s1Mn of wild_s1Mo {
GHC.Types.I# ds1_s1Mp [Occ=Once!] ->
case ds1_s1Mp of _ [Occ=Dead] {
__DEFAULT ->
let {
sat_s1Ms [Occ=Once] :: GHC.Types.Int
[LclId, Str=DmdType] =
\u []
let {
sat_s1Mr [Occ=Once] :: GHC.Types.Int
[LclId, Str=DmdType] =
\u [] GHC.Enum.pred GHC.Enum.$fEnumInt wild_s1Mo;
} in f_rn0 sat_s1Mr;
} in GHC.Num.* GHC.Num.$fNumInt wild_s1Mo sat_s1Ms;
0# -> GHC.Types.I# [1#];
};
};
Even if I remove all irrelevant information, you will probably not recognize factorial function, but at least now it looks similar to miniSTG syntax:
f_rn0 = [ds_s1Mn]
case ds_s1Mn of wild_s1Mo {
GHC.Types.I# ds1_s1Mp ->
case ds1_s1Mp of _ {
__DEFAULT ->
let {
sat_s1Ms = []
let {
sat_s1Mr = []
GHC.Enum.pred GHC.Enum.$fEnumInt wild_s1Mo;
} in f_rn0 sat_s1Mr;
} in GHC.Num.* GHC.Num.$fNumInt wild_s1Mo sat_s1Ms;
0# -> GHC.Types.I# [1#];
};
};
The biggest difference, except pattern matching on unboxed literals, is case
expression:
case a of b {
__DEFAULT -> ...
...
}
Here b
is an alias for the result of a
(note that a
could be an expression), it corresponds to “as patter” in haskell. In miniSTG it can be represented as two nested cases:
case a of {
b -> case b of {
...
}
}
Otherwise the conversion is trivial. The patch mentioned at the beginning tries to automatically convert GHC syntax to miniSTG one.
Using GHC and miniSTG together
First we need custom prelude that defines all basic declarations (Prelude.hs
):
{-# LANGUAGE PackageImports #-}
module Prelude
( Int
, zero
, one
, plus
, sub
, mul
, eqInt
, Bool (..)
)
where
import "base" Prelude (Bool (..))
data Int = Int
{-# NOINLINE zero #-}
zero :: Int
zero = Int
{-# NOINLINE one #-}
one :: Int
one = Int
{-# NOINLINE plus #-}
plus :: Int -> Int -> Int
plus _ _ = Int
{-# NOINLINE sub #-}
sub :: Int -> Int -> Int
sub _ _ = Int
{-# NOINLINE mul #-}
mul :: Int -> Int -> Int
mul _ _ = Int
{-# NOINLINE eqInt #-}
eqInt :: Int -> Int -> Bool
eqInt _ _ = False
Note that implementation for the declarations is not important, we are going to implement them directly in miniSTG anyway. Now a test module (Test.hs
):
module Test
(test
)
where
pred :: Int -> Int
pred n = n `sub` one
f :: Int -> Int
f n | n `eqInt` zero
= one
| True
= n `mul` f (pred n)
two = plus one one
five = two `plus` two `plus` one
test :: Int
test = f five
If you compile this module with -ddump-ministg
, you get the next:
f = FUN(n_sJm -> case eqIntz1Prelude n_sJm z0eroz1Prelude of {
wild_sJn -> case wild_sJn of {
Falsez1GHCz1Types -> let {
sat_sJp = THUNK(let {
sat_sJo = THUNK(subz1Prelude n_sJm onez1Prelude)
} in
f sat_sJo)
} in
mulz1Prelude n_sJm sat_sJp;
Truez1GHCz1Types -> let {
res_var_ = THUNK(onez1Prelude)
} in
res_var_;
};
});
two = THUNK(plusz1Prelude onez1Prelude onez1Prelude);
sat_sJr = THUNK(let {
sat_sJq = THUNK(plusz1Prelude two two)
} in
plusz1Prelude sat_sJq onez1Prelude);
testz1Test = THUNK(f sat_sJr);
Names are encoded, e.g. Falsez1GHCz1Types
is a False
constructor from GHC.Types
module. Lets add an entry point:
main = THUNK(testz1Test);
and run it:
$ ministg --noprelude -s EA test.stg
ministg: undefined variable: "eqIntz1Prelude"
Oops, we need to define prelude. Create Prelude.stg
with the next content:
z0eroz1Prelude = CON(Int 0);
onez1Prelude = CON(Int 1);
plusz1Prelude = FUN(x y -> case x of {
Int i -> case y of {
Int j -> case plus# i j of {
k -> let {
r = CON(Int k)
} in r;
};
};
});
subz1Prelude = FUN(x y -> case x of {
Int i -> case y of {
Int j -> case sub# i j of {
k -> let {
r = CON(Int k)
} in r;
};
};
});
mulz1Prelude = FUN(x y -> case x of {
Int i -> case y of {
Int j -> case mult# i j of {
k -> let {
r = CON(Int k)
} in r;
};
};
});
eqIntz1Prelude = FUN(x y -> case x of {
Int i -> case y of {
Int j -> case eq# i j of {
k -> case intToBool# k of {
True -> true;
False -> false;
};
};
};
});
true = CON(Truez1GHCz1Types);
false = CON(Falsez1GHCz1Types);
Note that here we converted built-in True
and False
to the corresponding constructors from GHC.Types
. Now we can run the program (note that we removed --no-prelude
flag):
$ ministg -s EA test.stg
(Int 120)
Yay! We have working frontend for miniSTG! Note that nothing stops us from using advanced haskell features like type classes because they are compiled out to simple constructs. For example, lets write the same factorial function using Eq
type class:
class Eq a where
eq :: a -> a -> Bool
instance Eq Int where
eq = eqInt
f :: Int -> Int
f n | n `eq` zero
= one
| True
= n `mul` f (pred n)
If you don’t yet know how type classes work in haskell, then you have a good chance to figure it out from STG dump!