Microsoft Small Basic is a minimal implementation of the BASIC programming language aimed at beginners. In my last article I described the implementation of an interpreter for Small Basic using an internal DSL to specify the abstract syntax tree (AST) for programs.
With the AST for the language well defined, a text parser for Small Basic programs is now relatively easy. There are quite a few options for writing parsers in F# from FsxLex and FSYyacc to hand rolled recursive descent parsers and parser combinator libraries.
For the expression parser in the open source spreadsheet Cellz, I initially used a simple parser combinator implementation based on an F# Journal article by Jon Harrop. Later I changed it to a hand rolled recursive descent parser using F# Active Patterns. Tomas Petricek has a chapter in F# Deep Dives which uses active patterns for parsing Markdown, the syntax used for Stack Overflow posts.
FParsec
To try something new, I decided to use the open source FParsec parser combinator library. FParsec, written by Stephan Tolksdorf, has great documentation including an in depth tutorial and user guide along with a convenient Nuget package. FogCreek use FParsec for parsing search queries.
With FParsec, parsers for code fragments can be written as simple functions and then composed into larger parsers for values, expressions, statements and programs. A parser can be written incrementally using the F# interactive environment giving quick feedback as you go. The library also gives really helpful error messages with line and column numbers when a parser fails.
I spent around an hour going through the tutorial which provided enough detail to get started on a parser for Small Basic. A Small Basic program is composed of statements, with one statement per line.
Parsing literals
Small Basic supports a small range of value types:
/// Small Basic value
type value =
| Bool of bool
| Int of int
| Double of double
| String of string
A parser for the boolean literal values “true” or “false” can be defined using stringReturn:
let ptrue = stringReturn "true" true
let pfalse = stringReturn "false" false
The parsers can then be combined to be either true or false using the <|> operator:
let pbool = ptrue <|> pfalse
To take the boolean parser to a value type we use the |>> pipeline combinator:
let pbool = (ptrue <|> pfalse) |>> fun x -> Bool(x)
FParsec contains parsers for many primitive types including integers:
let pint = pint32 |>> fun n -> Int(n)
These parsers can then be combined to create a parser for values:
let pvalue = pbool <|> pint // ...
Parsing expressions
Next up are Small Basic’s expressions:
/// Small Basic expression
type expr =
| Literal of value
| Var of identifier
| GetAt of location
| Func of invoke
| Neg of expr
| Arithmetic of expr * arithmetic * expr
| Comparison of expr * comparison * expr
| Logical of expr * logical * expr
A parser for literals can be created using the value parser:
let pliteral = pvalue |>> fun x -> Literal(x)
Identifiers are expected to start with a letter or underscore and may contain numerals. The FParsec tutorial contains a handy example:
let pidentifier =
let isIdentifierFirstChar c = isLetter c || c = '_'
let isIdentifierChar c = isLetter c || isDigit c || c = '_'
many1Satisfy2L isIdentifierFirstChar isIdentifierChar "identifier"
This can be used to define a parser for variable names:
let pvar = pidentifier |>> fun name -> Var(name)
This can then be used to define a parser for simple expressions:
let pexpr = pliteral <|> pvar
Operators can be easily parsed using the built-in operator precedence parser described in the user guide.
Parsing statements
The longest of Small Basic’s statements is the for loop:
/// Small Basic assignment
type assign =
| Set of identifier * expr
/// Small Basic instruction
type instruction =
| Assign of assign
| For of assign * expr * expr
A for statement is composed of an assignment and expressions for the end value and step:
For A=1 To 100 Step 1
To parse the assignment the pipe3 combinator can be used for the constituent parts:
let pset = pipe3 pidentifier (pstring "=") pexpr (fun id _ e -> Set(id, e))
The parser for the from, to and step components can be combined as:
let pfor =
let pfrom = pstring "For" >>. spaces1 >>. pset
let pto = pstring "To" >>. spaces1 >>. pexpr
let pstep = pstring "Step" >>. spaces1 >>. pexpr
let toStep = function None -> Literal(Int(1)) | Some s -> s
pipe3 pfrom pto (opt pstep) (fun f t s -> For(f, t, toStep s))
It can be tested in F# interactive using the run function:
run pfor "For A=1 To 100"
Which produces a statement from the parser:
val it : ParserResult<instruction,unit> =
Success: For (Set ("A",Literal (Int 1)),Literal (Int 100),Literal (Int 1))
Parsers for statements can be combined using the <|> operator or choice function:
let pstatement =
choice [
attempt pfor
// ... other statements
]
Parsing programs
Small Basic supports comments at the ends of the line:
let pcomment =
pchar '\'' >>. skipManySatisfy (fun c -> c <> '\n') >>. pchar '\n'
Thusly the end of a line is characterized either by a comment or a new line character:
let peol = pcomment <|> (pchar '\n')
The lines of the program can be parsed using the many function:
let plines = many (spaces >>. pstatement .>> peol) .>> eof
Finally the program can be parsed by applying the run function:
let parse (program:string) =
match run plines program with
| Success(result, _, _) -> result
| Failure(errorMsg, e, s) -> failwith errorMsg
Running programs
The generated AST from the parser can be fed directly into the Small Basic interpreter built in the previous article.
The code from this example is available as a gist. The full parser and interpreter are available as an F# snippet.
Here’s FizzBuzz in Small Basic:
Sub Modulus
Result = Dividend
While Result >= Divisor
Result = Result - Divisor
EndWhile
EndSub
For A = 1 To 100
Dividend = A
Divisor = 3
Modulus()
Mod3 = Result
Divisor = 5
Modulus()
Mod5 = Result
If Mod3 = 0 And Mod5 = 0 Then
TextWindow.WriteLine("FizzBuzz")
ElseIf Mod3 = 0 Then
TextWindow.WriteLine("Fizz")
ElseIf Mod5 = 0 Then
TextWindow.WriteLine("Buzz")
Else
TextWindow.WriteLine(A)
EndIf
EndFor
And the generated AST from the parser:
val program : instruction [] =
[|
Sub "Modulus"; Assign (Set ("Result",Var "Dividend"));
While (Comparison (Var "Result",Ge,Var "Divisor"));
Assign (Set ("Result",Arithmetic (Var "Result",Subtract,Var "Divisor")));
EndWhile;
EndSub;
For (Set ("A",Literal (Int 1)),Literal (Int 100),Literal (Int 1));
Assign (Set ("Dividend",Var "A"));
Assign (Set ("Divisor",Literal (Int 3))); GoSub "Modulus";
Assign (Set ("Mod3",Var "Result"));
Assign (Set ("Divisor",Literal (Int 5))); GoSub "Modulus";
Assign (Set ("Mod5",Var "Result"));
If
(Logical
(Comparison (Var "Mod3",Eq,Literal (Int 0)),And,
Comparison (Var "Mod5",Eq,Literal (Int 0))));
Action (Method ("TextWindow","WriteLine",[|Literal (String "FizzBuzz")|]));
ElseIf (Comparison (Var "Mod3",Eq,Literal (Int 0)));
Action (Method ("TextWindow","WriteLine",[|Literal (String "Fizz")|]));
ElseIf (Comparison (Var "Mod5",Eq,Literal (Int 0)));
Action (Method ("TextWindow","WriteLine",[|Literal (String "Buzz")|]));
Else;
Action (Method ("TextWindow","WriteLine",[|Var "A"|]));
EndIf;
EndFor|]
Conclusion
FParsec lets you declaratively build a parser for a programming language incrementally in F# interactive with minimal effort. If you need to write a parser for an external DSL or programming language then FParsec is well worth a look.