Embedding a DSL in Julia

— Gavin Gray

When cre­at­ing a domain-spe­cific lan­guage the nat­ural intu­ition is (or should be) to reach for Racket. How­ever, today I want to explore a lesser known func­tion­al­ity of Julia, macros! Julia has a lisp legacy and the longest stand­ing evi­dence of this is its metapro­gram­ming capa­bil­i­ties. Sim­i­lar to Lisp, Julia macros work on the AST itself rather than a tex­tual pre­pro­cess­ing step like that of C/C++. This aspect of Julia makes it a good choice for metapro­gram­ming, and by embed­ding a domain-spe­cific lan­guage into Julia, we can take advan­tage of the Julia run­time and its per­for­mance. Curly is a very sim­ple Lisp-like lan­guage and is the DSL that we will be embed­ding into Julia. I was first intro­duced to Curly at the Uni­ver­sity of Utah à la Matthew Flatt’s Pro­gram­ming Lan­guage course. The name of Curly comes from its syn­tax, so instead of writ­ing expres­sions such as (+ 1 2), as you would in Lisp, you write it with curly braces like so {+ 1 2}. In order to make this sim­ple, we will keep Curly func­tion­al­ity to a min­i­mum. The gram­mar for Curly is as fol­lows:

<symbol> ::= Symbol

<expr> ::= Int64
          | <symbol>
          | { + <expr> <expr> }
          | { - <expr> <expr> }
          | { if0 <expr> <expr> <expr> }
          | { let  { <symbol> <expr> } <expr> }
          | { lambda { <symbol> } <expr> }
          | { <expr> <expr> }

With this, we have just enough to make this inter­est­ing but not so much that our imple­men­ta­tion gets out of hand! The only oper­a­tors that Curly sup­ports are addi­tion and sub­trac­tion but the goal for our Curly imple­men­ta­tion is that we will even­tu­ally be able to define a recur­sive mul­ti­pli­ca­tion func­tion like the fol­low­ing:

curly"""
    {let {Y {lambda {f}
              {{lambda {x}
                {x x}}
              {lambda {x}
              {f {lambda {y}
                    {{x x} y}}}}}}}
      {let {mult {Y {lambda {f}
                  {lambda {x}
                    {lambda {y}
                      {if0 y
                        0
                        {+ x {{f x } {- y 1}}}}}}}}}
        {{mult 7} 6}}}
"""

If the above code is ter­ri­fy­ing to you, con­sider read­ing about Fixed-point Com­bi­na­tors. How­ever, under­stand­ing the Y com­bi­na­tor is unim­por­tant for the rest of this arti­cle, so please, read on. As demon­strated above, we want to embed our Curly code in non-stan­dard string lit­er­als that will be parsed at com­pile time and eval­u­ated at run­time. To keep this sim­ple, parse errors will not be han­dled so the omi­nous can­not parse excep­tion is all the feed­back Curly progam­mers will get.

Start with a new mod­ule curly.

module curly

export @curly_str, run_tests

# TODO data representation

# TODO parser

# TODO transformer

# TODO test suite

macro curly_str(source)
    # TODO
end

function run_tests()
  # TODO write tests
end

end # end curly module

The macro curly_str is how you intro­duce a non-stan­dard string lit­eral into Julia and the argu­ment source is the string con­tents that we will get to play with. I’ve also included a few TODO items above to out­line what will be accom­plished in the com­ing sec­tions. With such a sim­ple design, cre­at­ing the nec­es­sary struc­tures for Curly code will be very sim­ple. Each branch of the above gram­mar will get its own struct and each of these structs will be a sub­type of the over­ar­ch­ing abstract Node type. Let’s start with the sim­plest:

abstract type Node end

struct CInt <: Node
    val
end

struct CSymb <: Node
    sym
end

The remain­ing structs are unsur­pris­ingly equally as triv­ial. Sim­ply put, for each non-ter­mi­nal in the gram­mar, we add one more field.

struct Plus <: Node
    lhs
    rhs
end

struct Sub <: Node
    lhs
    rhs
end

struct IfZero <: Node
    cnd
    true_e
    false_e
end

struct Let <: Node
    arg
    val
    bdy
end

struct Lambda <: Node
    param
    bdy
end

struct App <: Node
    f
    arg
end

The syn­tax for Curly is, unfor­tu­nately, not Julia syn­tax. There­fore, we will need to write our own parser. For a more inten­sive project, like a com­piler, it would be best to use an exter­nal parser gen­er­a­tor like ANTLR. For the pur­poses of Curly, it’s best to write a quick and dirty parser using the Parser­Com­bi­na­tor pack­age. This pack­age is sim­i­lar to Haskell’s Par­sec library and will allow us to get a parser up and run­ning in no time.

First, users should be allowed to include white­space as they wish. I.e. { + 1     2   } should parse the same as {+ 1 2}. The parser should also ignore new­line char­ac­ters. Parser com­bi­na­tors work by cre­at­ing a series ofmatch­ers” then com­bin­ing them together to build parsers for more com­plex forms. To remove white­space, we start by con­sum­ing the lead­ing white­space for each matcher. (Note, we still have to explicitely con­sume trail­ing white­space).

using ParserCombinator

ws = Drop(Star(Space()))

@with_pre ws begin
    # ... TODO, matchers ...
end

This code sim­ply drops prepended white­space before each matcher.

To cre­ate our Curly AST from the parser com­bi­na­tor match­ers, we will first match the pat­tern and then use the > syn­tac­tic sugar to pass the matched val­ues to an anony­mous func­tion which will con­struct the AST node. Again start­ing with inte­gers and sym­bols, we will restrict inte­gers to be just pos­i­tive, i.e. a sim­ple non-empty string of dig­its.

curly_int = p"\d+" > (x -> CInt(parse(Int64, x)))

To elab­o­rate a lit­tle more on match­ers with the above as an exam­ple: the p" ... " syn­tax forms a matcher out of a reg­u­lar expres­sion and passes the matched string to the lambda, which, in this case, parses the inte­ger and then calls the CInt con­struc­tor. While in tra­di­tional Lisp you can use some non-tra­di­tional char­ac­ters within a sym­bol, in Curly we will restrict sym­bols to begin with an alpha­bet­i­cal char­ac­ter, fol­lowed by any num­ber of char­ac­ters or under­scores.

sym = p"[a-zA-Z]+[a-zA-Z_]*" > CSymb

That was inte­gers and sym­bols. Easy. Now let’s move on to expres­sions.

To avoid a lit­tle bit of ver­bosity (empha­sis on the lit­tle), we’ll cre­ate two types of match­ers, a cexpr and an expr. A cexpr will be the outer expres­sion that is of the form { ... } | <int> | <sym>, and the expr is some­thing that is inside of curly braces (e.g. if0, let, lambda, and the default case if func­tion appli­ca­tion).

expr = Delayed()

cexpr = Delayed()

group = E"{" + expr + ws + E"}"

plus = E"+" + cexpr + cexpr > Plus

sub = E"-" +  cexpr + cexpr > Sub

ifzero = E"if0" + cexpr + cexpr +  cexpr > IfZero

lete = E"let" + ws + E"{" + sym + cexpr + ws + E"}" + cexpr > Let

lambda = E"lambda" + ws + E"{" + sym + ws + E"}" + cexpr > Lambda

app = cexpr + cexpr > App

expr.matcher = plus | sub | ifzero | lete | lambda | app

cexpr.matcher = (group | curly_int | sym) + ws

prog = cexpr + ws + Eos()

A few things to note from above.

Other than a few pecu­liar­i­ties of the Parser Com­bi­na­tor pack­age, pars­ing a Curly pro­gram should be straight­for­ward. Test­ing that a Curly pro­gram parses to the right AST isn’t so eas­ily done with tests and we will reserve the test­ing suite for full Curly pro­gram tests. In the mean­time, we can dump the Curly AST in the Julia REPL and inspect a few exam­ples by hand. We need to add the pars­ing logic into the curly_str macro and then we will use the show macro to view the parsed AST.

macro curly_str(source)
    # Parse the source as one Curly program
    curly_expr = parse_one(source, prog)[1]
    # view the Curly AST
    @show curly_expr
end

Now, load the curly mod­ule into the REPL:

(@v1.6) pkg> dev curly
[ Info: Resolving package identifier `curly` as a directory at `~/codefiles/projects/curly/julia/curly`.
  Resolving package versions...
  No Changes to `~/.julia/environments/v1.6/Project.toml`
  No Changes to `~/.julia/environments/v1.6/Manifest.toml`

julia> import curly
[ Info: Precompiling curly [7486cc49-d4d6-458f-8e1b-b99820654d8b]

julia> using curly

Let’s start with some sim­ple exam­ples:

julia> curly"6"
curly_expr = curly.CInt(6)

julia> curly"{+ 1 {- 0 1}}"
curly_expr = curly.Plus(curly.CInt(1), curly.Sub(curly.CInt(0), curly.CInt(1)))

julia> curly"""
      {let {f {lambda {x} {+ 1 x}}}
          {- {f 6} 1}}
      """
curly_expr = curly.Let(curly.CSymb("f"), curly.Lambda(curly.CSymb("x"), curly.Plus(curly.CInt(1), curly.CSymb("x"))), curly.Sub(curly.App(curly.CSymb("f"), curly.CInt(6)), curly.CInt(1)))

julia> curly"""
          {
          let
          {
          x
          1
          }
          {
          let
          {
          f
          {
          lambda
          {
          x
          }
          {
          -
          x
          1
          }
          }
          }
          {
          +
          x
          {
          f
          x
          }
          }
          }
          }
          """
curly_expr = curly.Let(curly.CSymb("x"), curly.CInt(1), curly.Let(curly.CSymb("f"), curly.Lambda(curly.CSymb("x"), curly.Sub(curly.CSymb("x"), curly.CInt(1))), curly.Plus(curly.CSymb("x"), curly.App(curly.CSymb("f"), curly.CSymb("x")))))

The last test was a fun way to test arbi­trary white­space; we will also add more tests later. Let’s move on so we can even­tu­ally build the whole test­ing suite.

Now we have a Curly AST to play with. There are two direc­tions one could take the imple­men­ta­tion:

  1. eval­u­ate the AST to a value.
  2. trans­form the AST to a Julia expres­sion.

As indi­cated by the sec­tion header, we are going to trans­form the Curly AST into a Julia expres­sion. If instead we eval­u­ated the Curly AST, our Curly pro­grams would eval­u­ate at com­pile time, which as was stated ear­lier, we don’t want. So how will this work? Each trans­form func­tion will need to return an expres­sion rather than a value. We can then piece these expres­sions together to form a full Julia expres­sion that will have the same seman­tics as the Curly source. For the trans­for­ma­tion, we are going to assume that the pro­gram is type safe, and in the case it isn’t, we will rely on the Julia com­piler to raise the excep­tion. Start­ing with the easy cases: when we call trans­form on a CInt we want to return the inte­ger expres­sion. Lucky for us, the inte­ger value is the inte­ger expres­sion. If you’re unsure about this, we can do a sim­ple test in the Julia REPL: Meta.parse("3") == Int(3) which yields true.

transform(n::CInt) = n.val

Now for sym­bols. We want to return a Julia Sym­bol cre­ated from the value that the CSymb.sym holds, that is, from a string. This is eas­ily done by pass­ing the string to the sym­bol con­struc­tor.

transform(s::CSymb) = Symbol(s.sym)

The addi­tion and sub­trac­tion AST nodes will show us how to trans­form the remain­ing nodes. For exam­ple, the Plus node has two fields, lhs and rhs, which are cur­rently Curly AST nodes. We first need to trans­form the fields into Julia expres­sions by call­ing the trans­form func­tion on the lhs and rhs and then insert­ing them into an addi­tion expres­sion of the form :( ... + ...). This may look a lit­tle funky at first; the :( ... ) syn­tax is short­hand for cre­at­ing an expres­sion in Julia. This is sim­i­lar to quasi-quot­ing in Lisp and we can use the $ sym­bol to unquote (or inter­po­late as it’s called in Julia) expres­sions that we’ve eval­u­ated. You see this work­ing below.

function transform(p::Plus)
    l = transform(p.lhs)
    r = transform(p.rhs)
    return :($l + $r)
end

function transform(s::Sub)
    l = transform(s.lhs)
    r = transform(s.rhs)
    return :($l - $r)
end

To reit­er­ate, the dol­lar sign is inter­po­lat­ing the trans­formed left and right hand expres­sions into the new addi­tion (or sub­trac­tion) expres­sion. With this tem­plate, the IfZero node is also straight­for­ward because we are sim­ply inter­po­lat­ing three expres­sions into a Julia ternary expres­sion.

function transform(iz::IfZero)
    cnd = transform(iz.cnd)
    te = transform(iz.true_e)
    fe = transform(iz.false_e)
    return :(($cnd == 0) ? $te : $fe)
end

For now, let’s skip the Let node and move on to the Lambda and App cases which shouldn’t cause any trou­ble. For the Lambda case, we have two fields: the para­me­ter and the body. The para­me­ter was restricted by the parser to be a sym­bol and all we want to do is make sure that this sym­bol is bound in the lambda body expres­sion. That is, the con­text that the lambda body uses has access to this bound sym­bol. To do this, we can use Julia’s syn­tax for anony­mous func­tions to give us this desired func­tion­al­ity.

function transform(lam::Lambda)
    sy = transform(lam.param)
    bdy_ex = transform(lam.bdy)
    return :(($sy -> $bdy_ex))
end

In the App case, we first trans­form the func­tion f and then apply this to the trans­formed argu­ment expres­sion.

function transform(a::App)
    g = transform(a.f)
    p = transform(a.arg)
    return :($g($p))
end

Almost there! Onward to the Let case which we skipped ear­lier. At its core, a let expres­sion is bind­ing an expres­sion to a sym­bol and then eval­u­at­ing a body expres­sion that can use that sym­bol. This is very sim­i­lar to the lambda and appli­ca­tion rules that we dis­cussed above. Turns out, the let form is sim­ply syn­tac­tic sugar for apply­ing a func­tion to an argu­ment. This syn­tax rewrite looks like this:

$$(\text{let sym = arg in bdy}) \iff (\lambda sym. bdy)(arg)$$

With this knowl­edge, we can now fin­ish the Let trans­form func­tion.

function transform(l::Let)
    sym = transform(l.arg)
    arg = transform(l.val)
    bdy = transform(l.bdy)
    return :(($sym -> $bdy)($arg))
end

Sim­ple! Now each of the Curly AST nodes have been mapped to a Julia expres­sion. Before we jump to test­ing, we need to update the curly_str macro to return the trans­formed Julia expres­sion rather than show the Curly AST.

"Curly program string"
macro curly_str(source)
    # parse one program (i.e. cexpr)
    # and then take the first result
    # as there should only be one :)
    curly_expr = parse_one(source, prog)[1]
    return transform(curly_expr)
end

Reload­ing the mod­ule in your REPL and run­ning a small exam­ple should now work!

julia> curly"{let {f {lambda {x} x}} {+ 1 {f 2}}}"
3

To make writ­ing tests sim­ple, it’s best to write a lit­tle macro which will eval­u­ate the Curly pro­gram and com­pare it to some expected out­put. To make devel­op­ers’ lives eas­ier in the case of a failed test, let’s dis­play the Curly pro­gram, expected out­put, and actual out­put.

"Macro for pretty-printing test results"
macro test(prog, expected)
    quote
        if $prog == $expected
            # display green checkmark indicating a test pass
            printstyled("[Pass] \U2713 \n", color=:green)
        else
            printstyled("[Fail] \U0021 \n\t"
                        , $(Meta.quot(prog))
                        , "\n\tExpected value : "
                        , $(Meta.quot(expected)), "\n"
                        , "\tActual value : ", $prog
                        , color=:red)
        end
    end
end

With a quick exam­ple we can see how this pretty-printed out­put will look.

julia> @test curly"{+ 1 2}" 3
[Pass] ✓

julia> @test curly"{+ 1 2}" 4
[Fail] !
        curly"{+ 1 2}"
        Expected value : 4
        Actual : 3

Time to write some real tests! We will insert out new tests into the empty run_tests func­tion so we can run the whole suite at once.

Curly only natively offers addi­tion and sub­trac­tion oper­a­tors, there­fore test­ing is easy! With addi­tion, we can test the com­mu­ta­tive, asso­cia­tive, and iden­tity laws and call it good.

# inserted into the `run_tests` function

# commutative
@test curly"{+ 1 2}" 3
@test curly"{+ 2 1}" 3

# associative
@test curly"{+ 1 {+ 2 3}}" 6
@test curly"{+ {+ 1 2} 3}" 6

# identity
@test curly"{+ 1 0}" 1
@test curly"{+ 0 1}" 1

The com­mu­ta­tive and asso­cia­tive prop­er­ties do not hold for sub­trac­tion, but we can test with the sub­trac­tive prop­erty. This states that if we sub­tract zero from any num­ber $x$ the num­ber remains unchanged, and if we sub­tract $x$ from $0$, then we receive $-x$.

# subtractive properties
@test curly"{- 42 0}" 42
@test curly"{- 0 42}" -42

Mov­ing for­ward to more com­plex forms, we can test lamb­das and appli­ca­tion and then even­tu­ally, lets.

# simple lambdas and application
@test curly"{{lambda {x} x} 6}" 6
@test curly"{{lambda {x} {+ x 1}} 6}" 7
@test curly"""
        {{{lambda {x}
        {lambda {y}
            {+ x y}}}
            7} 4}
        """ 11
@test curly"""
        {let {f {lambda {y}
                    {lambda {x}
                    {+ x y}}}}
            {- {{f 10} 20} {{f 30} 40}}}
        """ -40

Now let’s use the goal pro­gram from the very begin­ning: the recur­sive mul­ti­pli­ca­tion func­tion.

@test curly"""
        {let {Y {lambda {f}
                {{lambda {x}
                    {x x}}
                    {lambda {x}
                    {f {lambda {y}
                        {{x x} y}}}}}}}
        {let {mult {Y {lambda {f}
                    {lambda {x}
                        {lambda {y}
                        {if0 y
                            0
                            {+ x {{f x } {- y 1}}}}}}}}}
            {{mult 7} 6}}}
        """ 42

Using our shiny new imple­men­ta­tion of mul­ti­pli­ca­tion we can even test dis­trib­u­tive prop­er­ties of mul­ti­pli­ca­tion over addi­tion!

@test curly"""
        {let {Y {lambda {f}
                {{lambda {x}
                    {x x}}
                    {lambda {x}
                    {f {lambda {y}
                        {{x x} y}}}}}}}
        {let {mult {Y {lambda {f}
                    {lambda {x}
                        {lambda {y}
                        {if0 y
                            0
                            {+ x {{f x } {- y 1}}}}}}}}}
            {-
              {{mult 7} {+ 4 3}}
              {+ {{mult 7} 4} {{mult 3} 7}}}}}
        """ 0

I’ll throw these next tests in for fun. They are the recur­sive fac­to­r­ial and fibon­nacci func­tions.

@test curly"""
        {let {Y {lambda {f}
                {{lambda {x}
                    {x x}}
                    {lambda {x}
                    {f {lambda {y}
                        {{x x} y}}}}}}}
        {let {mult {Y {lambda {f}
                    {lambda {x}
                        {lambda {y}
                        {if0 y
                            0
                            {+ x {{f x } {- y 1}}}}}}}}}
            {let {fac {Y {lambda {f}
                        {lambda {x}
                            {if0 x
                            1
                            {{mult x} {f {- x 1}}}}}}}}
            {fac 5}}}}
        """ 120

@test curly"""
        {let {Y {lambda {f}
                {{lambda {x}
                    {x x}}
                    {lambda {x}
                    {f {lambda {y}
                        {{x x} y}}}}}}}
        {let {fib {Y {lambda {f}
                        {lambda {x}
                        {if0 {- x 1}
                            1
                            {if0 x
                            1
                            {+ {f {- x 1}} {f {- x 2}}}}}}}}}
            {fib 42}}}
        """ 433494437

Now we can run our test­ing func­tion to make sure it works.

julia> @time run_tests()
[Pass] ✓
[Pass] ✓
[Pass] ✓
[Pass] ✓
[Pass] ✓
[Pass] ✓
[Pass] ✓
[Pass] ✓
[Pass] ✓
[Pass] ✓
[Pass] ✓
[Pass] ✓
[Pass] ✓
[Pass] ✓
[Pass] ✓
[Pass] ✓
[Pass] ✓
[Pass] ✓
[Pass] ✓
[Pass] ✓
[Pass] ✓
2.905905 seconds (589 allocations: 23.625 KiB)

All tests pass! Looks like our imple­men­ta­tion of Curly is work­ing out great and can now be used in our Julia pro­grams.

Julia, while typ­i­cally thought of as only used for sci­en­tific com­put­ing (at least in some social cir­cles), also makes a good option for cre­at­ing embed­ded domain-spe­cific lan­guages, and you can rely on the Julia run­time which is very per­for­mant. There are already many pack­ages which lever­age this metapro­gram­ming power to make some sci­en­tific prob­lems eas­ier. These are def­i­nitely worth tak­ing a look at if you haven’t already (e.g. JuMP, Octo, and Lisp­Syn­tax). While Curly was a triv­ial exam­ple, it hope­fully can spark ideas of how to expand Julia syn­tax to make domain-spe­cific prob­lems eas­ier solved! The full source code can be found on GitHub.