Back at the start of the year, I took the F# parser combinator library FParsec out for a spin, writing an extended Small Basic compiler and later a similar parser for a subset of C#. Previously I'd used hand-rolled parsers for projects like TickSpec, a .NET BDD library, and Cellz, an open source spreadsheet. With FParsec you can build a parser relatively quickly, using its powerful built-in functions along with F# Interactive for quick feedback.
FParsec has been used in a number of interesting projects including FunScript, for parsing TypeScript definition files, and FogBugz for search queries in Kiln.
Like any library, there is a bit of a learning curve: it takes time to get up to speed before you reap the benefits. With that in mind, I put together a short hands-on tutorial, which I ran at the F#unctional Londoners meetup held at Skills Matter last week.
The tutorial consisted of a short introduction to DSLs and parsing, followed by a set of tasks leading to a parser for a subset of the Logo programming language. It finished with examples of scaling out to larger parsers and building a compiler backend, using Small Basic and C#.
Download the tasks from: http://trelford.com/FParsecTutorial.zip
Logo programming language
One of my earliest experiences with programming was a Logo session in the 70s, when my primary school had a short-term loan of a turtle robot.
The turtle, either physical or on the screen, can be controlled with simple commands like forward, left, right and repeat, e.g.
> repeat 10 [right 36 repeat 5 [forward 54 right 72]]
Abstract Syntax Tree
The abstract syntax tree (AST) for these commands can easily be described using an F# discriminated union:
type arg = int
type command =
| Forward of arg
| Turn of arg
| Repeat of arg * command list
Note: right and left can simply be represented as Turn with a positive or negative argument.
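As a quick check on that note, the example program from above can be written by hand as an AST value. The left and right helpers here are hypothetical, introduced only to show how both commands map onto Turn:

```fsharp
// The AST from above, repeated so this sketch runs on its own
type arg = int
type command =
    | Forward of arg
    | Turn of arg
    | Repeat of arg * command list

// Hypothetical helpers: left turns by a negative angle, right by a positive one
let left n = Turn(-n)
let right n = Turn n

// repeat 10 [right 36 repeat 5 [forward 54 right 72]], built by hand
let program =
    [ Repeat(10, [ right 36; Repeat(5, [ Forward 54; right 72 ]) ]) ]
```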
The main task was to use FParsec to parse the commands into AST form.
Parsing
A parser for the forward command can be easily constructed using built-in FParsec parser functions and the >>. operator to combine them:
open FParsec
let forward = pstring "forward" >>. spaces1 >>. pfloat
The parsed float value can be used to construct the Forward case using the |>> operator:
let pforward = forward |>> fun n -> Forward(int n)
To parse either forward or its short form fd, the <|> operator can be employed:
let pforward =
    (pstring "fd" <|> pstring "forward") >>. spaces1 >>. pfloat
    |>> fun n -> Forward(int n)
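To try the parser outside an interactive session, a small script like the one below runs it against both spellings. It assumes FParsec is pulled in from NuGet via `#r "nuget: FParsec"` (F# 5 or later), and `tryParse` is a hypothetical helper that turns FParsec's `Success`/`Failure` result into an option:

```fsharp
#r "nuget: FParsec"
open FParsec

type arg = int
type command =
    | Forward of arg
    | Turn of arg
    | Repeat of arg * command list

// Long or short keyword, at least one space, then a number
let pforward : Parser<command, unit> =
    (pstring "fd" <|> pstring "forward") >>. spaces1 >>. pfloat
    |>> fun n -> Forward(int n)

// Hypothetical helper: Some result on success, None on a parse error
let tryParse p text =
    match run p text with
    | Success(result, _, _) -> Some result
    | Failure(_, _, _) -> None
```

For example, `tryParse pforward "fd 25"` should give `Some (Forward 25)`, while `"fd"` on its own fails because `spaces1` requires at least one space before the number.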
Parsing left and right is almost identical; both produce a Turn, with left negating its argument:
let pleft =
    (pstring "left" <|> pstring "lt") >>. spaces1 >>. pfloat
    |>> fun x -> Turn(-(int x))
let pright =
    (pstring "right" <|> pstring "rt") >>. spaces1 >>. pfloat
    |>> fun x -> Turn(int x)
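Since these parsers differ only in their keywords and the sign of the angle, the shared shape can be pulled out into a helper. A sketch, where `keywordArg` is a hypothetical name rather than an FParsec function:

```fsharp
#r "nuget: FParsec"
open FParsec

type arg = int
type command =
    | Forward of arg
    | Turn of arg
    | Repeat of arg * command list

// Hypothetical helper: long or short keyword, at least one space, then a number
let keywordArg (long: string) (short: string) : Parser<float, unit> =
    (pstring long <|> pstring short) >>. spaces1 >>. pfloat

let pforward = keywordArg "forward" "fd" |>> fun n -> Forward(int n)
let pleft = keywordArg "left" "lt" |>> fun x -> Turn(-(int x))
let pright = keywordArg "right" "rt" |>> fun x -> Turn(int x)
```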
To parse a choice of commands, we can use the <|> operator again:
let pcommand = pforward <|> pleft <|> pright
To handle a sequence of commands there is the many function:
let pcommands = many (pcommand .>> spaces)
To parse the repeat command we need to parse the repeat count and a block of commands held between square brackets:
let block = between (pstring "[") (pstring "]") pcommands
let prepeat =
    pstring "repeat" >>. spaces1 >>. pfloat .>> spaces .>>. block
    |>> fun (n, commands) -> Repeat(int n, commands)
Putting this all together, we can parse a simple circle-drawing program:
> repeat 36 [forward 10 right 10]
However, we cannot yet parse a repeat command within a repeat block, as the command parser does not reference the repeat parser.
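To see the limitation in action, here is a sketch that assembles the parsers so far (using the hypothetical `keywordArg` helper and `succeeds` check) and confirms that a flat repeat parses while a nested one does not:

```fsharp
#r "nuget: FParsec"
open FParsec

type arg = int
type command =
    | Forward of arg
    | Turn of arg
    | Repeat of arg * command list

// Hypothetical helper: long or short keyword, at least one space, then a number
let keywordArg (long: string) (short: string) : Parser<float, unit> =
    (pstring long <|> pstring short) >>. spaces1 >>. pfloat

let pforward = keywordArg "forward" "fd" |>> fun n -> Forward(int n)
let pleft = keywordArg "left" "lt" |>> fun x -> Turn(-(int x))
let pright = keywordArg "right" "rt" |>> fun x -> Turn(int x)

// No prepeat in this choice, so a block may only hold simple commands
let pcommand = pforward <|> pleft <|> pright
let pcommands = many (pcommand .>> spaces)
let block = between (pstring "[") (pstring "]") pcommands
let prepeat =
    pstring "repeat" >>. spaces1 >>. pfloat .>> spaces .>>. block
    |>> fun (n, commands) -> Repeat(int n, commands)

// Hypothetical helper: did the parser succeed on this input?
let succeeds p text =
    match run p text with
    | Success _ -> true
    | Failure _ -> false
```

The nested case fails because, inside the block, none of forward, left or right matches "repeat", so many returns an empty list and the closing "]" is then missing. That many can stop cleanly here relies on pstring failing without consuming input, so "right" and "repeat" sharing a leading "r" does no harm.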
Forward references
To separate the definition of repeat's parser function from its implementation we can use the createParserForwardedToRef function:
let prepeat, prepeatimpl = createParserForwardedToRef<command, unit> ()
Then we can define the choice of commands to include repeat:
let pcommand = pforward <|> pleft <|> pright <|> prepeat
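Then redefine pcommands and block so that they use the new pcommand, and finally define the implementation of the repeat parser, which refers to itself through block:
let pcommands = many (pcommand .>> spaces)
let block = between (pstring "[") (pstring "]") pcommands
prepeatimpl :=
    pstring "repeat" >>. spaces1 >>. pfloat .>> spaces .>>. block
    |>> fun (n, commands) -> Repeat(int n, commands)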
This allows us to parse nested repeats, for example:
> repeat 10 [right 36 repeat 5 [forward 54 right 72]]
Parses to:
> Repeat (10,[Turn 36; Repeat (5,[Forward 54; Turn 72])])
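Putting the pieces together in order, a complete script might look like the following. It is a sketch assuming FParsec from NuGet: `keywordArg` and `program` are hypothetical names, the explicit type arguments on createParserForwardedToRef avoid F#'s value restriction, and `prepeatImpl.Value <- ...` is equivalent to the `:=` assignment above. The `program` parser skips leading whitespace and uses eof so that trailing junk is reported as an error:

```fsharp
#r "nuget: FParsec"
open FParsec

type arg = int
type command =
    | Forward of arg
    | Turn of arg
    | Repeat of arg * command list

// Hypothetical helper: long or short keyword, at least one space, then a number
let keywordArg (long: string) (short: string) : Parser<float, unit> =
    (pstring long <|> pstring short) >>. spaces1 >>. pfloat

let pforward = keywordArg "forward" "fd" |>> fun n -> Forward(int n)
let pleft = keywordArg "left" "lt" |>> fun x -> Turn(-(int x))
let pright = keywordArg "right" "rt" |>> fun x -> Turn(int x)

// Placeholder for repeat; its implementation is filled in below
let prepeat, prepeatImpl = createParserForwardedToRef<command, unit> ()

let pcommand = pforward <|> pleft <|> pright <|> prepeat
let pcommands = many (pcommand .>> spaces)
let block = between (pstring "[") (pstring "]") pcommands

// Now repeat can refer to pcommand (via block), and hence to itself
prepeatImpl.Value <-
    pstring "repeat" >>. spaces1 >>. pfloat .>> spaces .>>. block
    |>> fun (n, commands) -> Repeat(int n, commands)

// Whole programs: optional leading whitespace, commands, then end of input
let program = spaces >>. pcommands .>> eof
```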
Interpreter
Evaluation of a program can now be easily achieved using pattern matching over the AST:
open System

// The turtle record (fields X, Y, A) and drawLine are assumed to be defined elsewhere; see the full script below
let rec perform turtle = function
    | Forward n ->
        let r = float turtle.A * Math.PI / 180.0
        let dx, dy = float n * cos r, float n * sin r
        let x, y = turtle.X, turtle.Y
        let x', y' = x + dx, y + dy
        drawLine (x, y) (x', y')
        { turtle with X = x'; Y = y' }
    | Turn n -> { turtle with A = turtle.A + n }
    | Repeat(n, commands) ->
        let rec repeat turtle = function
            | 0 -> turtle
            | n -> repeat (performAll turtle commands) (n - 1)
        repeat turtle n
and performAll turtle commands = List.fold perform turtle commands
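The interpreter above relies on a turtle record and a drawLine function from the full script. As a self-contained variant that can be tested without a window, the sketch below assumes a `Turtle` record with `X`, `Y` and `A` fields and threads the list of drawn line segments through the fold instead of drawing. It also swaps the inner recursive repeat for `Seq.fold`, and `execute` is a hypothetical entry point:

```fsharp
open System

type arg = int
type command =
    | Forward of arg
    | Turn of arg
    | Repeat of arg * command list

// Assumed turtle state: position, plus heading A in degrees
type Turtle = { X: float; Y: float; A: int }

// Instead of drawing, return the new turtle and the segments drawn so far
let rec perform (turtle: Turtle, lines) command =
    match command with
    | Forward n ->
        let r = float turtle.A * Math.PI / 180.0
        let x', y' = turtle.X + float n * cos r, turtle.Y + float n * sin r
        { turtle with X = x'; Y = y' }, ((turtle.X, turtle.Y), (x', y')) :: lines
    | Turn n -> { turtle with A = turtle.A + n }, lines
    | Repeat(n, commands) ->
        // Run the block n times, threading the state through each pass
        Seq.fold (fun state _ -> performAll state commands) (turtle, lines) (seq { 1 .. n })
and performAll state commands = List.fold perform state commands

// Hypothetical entry point: start at the origin facing angle 0
let execute commands =
    let turtle, lines = performAll ({ X = 0.0; Y = 0.0; A = 0 }, []) commands
    turtle, List.rev lines
```

Drawing a square with `execute [Repeat(4, [Forward 50; Turn 90])]` should record four segments and leave the turtle back at (approximately, given floating point) the origin.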
Check out this snippet for the full implementation as a script: http://fssnip.net/nM
User Commands
Logo lets you define your own commands, e.g.
> to square
repeat 4 [forward 50 right 90]
end
to flower
repeat 36 [right 10 square]
end
to garden
repeat 25 [set-random-position flower]
end
The parser can easily be extended to support this; try the snippet: http://fssnip.net/nN
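Besides parsing, the interpreter has to resolve user commands too. One simple approach, sketched here and not necessarily how the linked snippet does it, is to add a hypothetical `Call` case to the AST and inline each call from a map of definitions before evaluation:

```fsharp
// Hypothetical AST extended with calls to user-defined commands
type command =
    | Forward of int
    | Turn of int
    | Repeat of int * command list
    | Call of string

// Replace every Call with the body it names, recursively.
// Assumes definitions are not recursive; a recursive command would expand forever.
let rec expand (defs: Map<string, command list>) command =
    match command with
    | Call name ->
        let body =
            match Map.tryFind name defs with
            | Some body -> body
            | None -> failwithf "Unknown command: %s" name
        List.collect (expand defs) body
    | Repeat(n, body) -> [ Repeat(n, List.collect (expand defs) body) ]
    | simple -> [ simple ]
```

With square and flower defined as in the Logo example above, expanding `Call "flower"` yields plain repeat, forward and turn commands that the existing interpreter can run unchanged.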
Small Basic
Small Basic is a Microsoft programming language that is also aimed at teaching kids and includes turtle functionality. At the beginning of the year I wrote a short series of posts on writing an extended compiler for Small Basic.
The series starts with an AST, an internal DSL and an interpreter, then moves on to parsing the language with FParsec and compiling the AST to IL code using Reflection.Emit. Finally, it ends with extensions for functions with arguments and support for tuples and pattern matching.
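The series applies this to the full Small Basic AST. As a much smaller illustration of the same idea (not code from the series), here is a hypothetical arithmetic expression AST compiled to IL with Reflection.Emit's `DynamicMethod`. Each node pushes its operands onto the evaluation stack and then applies its operator:

```fsharp
open System
open System.Reflection.Emit

// A tiny, illustrative expression AST
type expr =
    | Num of int
    | Add of expr * expr
    | Mul of expr * expr

// Emit stack-machine IL: push both operands, then apply the operator
let rec emit (il: ILGenerator) expr =
    match expr with
    | Num n -> il.Emit(OpCodes.Ldc_I4, n)
    | Add(a, b) -> emit il a; emit il b; il.Emit(OpCodes.Add)
    | Mul(a, b) -> emit il a; emit il b; il.Emit(OpCodes.Mul)

// Compile an expression into a parameterless method returning int
let compile expr : Func<int> =
    let dm = DynamicMethod("eval", typeof<int>, Type.EmptyTypes)
    let il = dm.GetILGenerator()
    emit il expr
    il.Emit(OpCodes.Ret)
    dm.CreateDelegate(typeof<Func<int>>) :?> Func<int>
```

Calling `(compile (Add(Num 2, Mul(Num 3, Num 4)))).Invoke()` runs the generated IL and should return 14.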
It’s a fairly short hop from implementing Logo to implementing a larger language like Small Basic.
Parsing C#
A few weeks later as an experiment I knocked up an AST and parser for a fairly large subset of C#, which shares much of the imperative core of Small Basic: http://fssnip.net/lf
Check out Neil Danson’s blog on building a C# compiler in F# to see C# compiled to IL using a similar AST.
DDD North: Write your own compiler in 24 hours
If you’re interested in learning more, I’ll be speaking at DDD North in Leeds on Saturday 18th October about how to write your own compiler in 24 hours.