Phillip Trelford's Array

POKE 36879,255

Small Basic Compiler

Small Basic, from Microsoft Dev Labs, is a minimal implementation of BASIC, employing only 14 keywords, aimed at beginners. In the last 2 articles I described building an internal DSL and parser to build an abstract syntax tree (AST) representation of the program with an interpreter for execution. In this article I’ll describe how to transform the AST to executable .Net IL code, thus building a compiler.

Small Basic is an imperative programming language where programs are built up of procedures containing statements. All variables are defined in global scope and are of variant type, as per classic VB

A Small Basic program can be modelled in .Net IL as a static type with a static method entry point for the main procedure and further static methods for subroutines. Variables can be defined as static fields holding variants of Primitive type (from the Small Basic library) . The control flow statements: If, For, While and Goto can be implemented as branches and GoSub as a method call.

Compiling a program involves passing over the statements in the AST and emitting IL code using .Net’s Reflection.Emit API to produce a .Net assembly.

Creating an assembly

The Small Basic program will be compiled to an executable .Net assembly, containing a module and underneath a type to hold the fields and methods.

Defining a dynamic assembly:

let assemblyBuilder =
    AppDomain.CurrentDomain.DefineDynamicAssembly(
        AssemblyName(name),
        AssemblyBuilderAccess.RunAndSave)  

Defining a module:

let moduleBuilder = assemblyBuilder.DefineDynamicModule(name+".exe")

Defining a type:

let typeBuilder = moduleBuilder.DefineType("Program", TypeAttributes.Public)

Saving the assembly to disk:

assemblyBuilder.Save(name+".exe")

Variables

The first pass of the transform finds all the variables that are explicitly set in the Small Basic program and then defines static fields for them.

There are 2 statements in the AST with explicit assignment:

type assign = Set of identifier * expr
/// Small Basic instruction
type instruction =
    | Assign of assign
    | For of assign * expr * expr

The variables can be collected by pattern matching over the instructions to yield the names:

[for instruction in instructions do
    match instruction with
    | Assign(Set(name,_)) -> yield name        
    | For(Set(name,_),_,_) -> yield name
    | _ -> ()
]

Static fields are generated of type Primitive:

let generateField name = 
    typeBuilder.DefineField(name, typeof<Primitive>, FieldAttributes.Static)

Procedures

The next pass finds all subroutines, Sub statements, and defines methods for them:

[for instruction in instructions do
    match instruction with
    | Sub(name) -> yield name
    | _ -> ()
]

Subroutines have no return value so they are defined as void:

let generateMethod name = 
    typeBuilder.DefineMethod(
        name, 
        MethodAttributes.Static ||| MethodAttributes.Public,
        typeof<Void>,
        [||])

Statements

The final pass iterates over the instructions and emits the corresponding IL codes:

let emitInstruction (il:ILGenerator) = function       
    | Assign(Set(name,expr)) ->
        let ty = emitExpression il expr
        il.Emit(OpCodes.Stsfld, fieldLookup name)
    // ... other statement types

In the code above the right hand side expression is evaluated and placed at the top of the stack. Then the Stsfld opcode is used to store the value in the named static field.

Expressions can be composed of literal values, variable lookups, function calls and arithmetic, comparison or logical operators:

let rec emitExpression (il:ILGenerator) = function
    | Literal(x) -> emitLiteral il x
    | Var(name) -> il.Emit(OpCodes.Ldsfld, fieldLookup name)
    | Arithmetic(lhs,Add,rhs) -> emitOp il lhs rhs "op_Addition" 
    | Arithmetic(lhs,Subtract,rhs) -> emitOp il lhs rhs "op_Subtraction"
    // ... other expression types

For literals the value is again put on top of the stack and passed to the Primitive constructor:

let emitPrimitive (il:ILGenerator) t =
    let ci = typeof<Primitive>.GetConstructor([|t|])
    il.Emit(OpCodes.Newobj, ci) 
let emitLiteral (il:ILGenerator) = function
    | Int(n) -> 
        il.Emit(OpCodes.Ldc_I4, n)
        emitPrimitive il typeof<int>
    | Double(n) -> 
        il.Emit(OpCodes.Ldc_R8, n)
        emitPrimitive il typeof<double>
    | String(s) -> 
        il.Emit(OpCodes.Ldstr, s)
        emitPrimitive il typeof<string>
    // ... other literal types

That completes transformation of the AST to IL code.

Source code

The full source code for the Small Basic compiler is available on BitBucket, the compiler is just a few hundred lines of code. The code parses and compiles a simple FizzBuzz program that can be run at the command prompt.

Conclusions

In the previous 2 articles we’ve seen that F#’s discriminated unions let you easily describe the abstract syntax tree (AST) of a programming language. Source code can then be easily transformed into AST form by incrementally building a parser in F# interactive using the FParsec parser combinator library. Finally pattern matching makes light work of passing over the AST to interpret the code or transform it to an executable assembly via .Net’s reflection emit API.

The parser, compiler or interpreter could be used for simple runtime scripting of an application and may be easily extended with new keywords and operators.

The technique described can be applied for a variety of DSLs or general purpose programming languages. Here’s the simple steps:

  1. Define an AST
  2. Write an internal DSL or a parser for an external DSL
  3. Build an interpreter or a compiler to execute programs

Further reading: Peter Sestoft’s Programming Language Concepts book, aimed at undergraduates, gives a great introduction to parser, compiler and interpreter technology, including a mini C like language implementation, with examples provided in F#.

Small Basic Parser

Microsoft Small Basic is a minimal implementation of the BASIC programming language aimed at beginners. In my last article I described the implementation of an interpreter for Small Basic using an internal DSL to specify the abstract syntax tree (AST) for programs.

With the AST for the language well defined, a text parser for Small Basic programs is now relatively easy. There are quite a few options for writing parsers in F# from FsxLex and FSYyacc to hand rolled recursive descent parsers and parser combinator libraries.

For the expression parser in the open source spreadsheet Cellz, I initially used a simple parser combinator implementation based on an F# Journal article by Jon Harrop. Later I changed it to a hand rolled recursive descent parser using F# Active Patterns. Tomas Petricek has a chapter in F# Deep Dives which uses active patterns for parsing Markdown, the syntax used for Stack Overflow posts.

FParsec

To try something new, I decided to use the open source FParsec parser combinator library. FParsec, written by Stephan Tolksdorf, has great documentation including an in depth tutorial and user guide along with a convenient Nuget package. FogCreek use FParsec for parsing search queries.

With FParsec, parsers for code fragments can be written as simple functions and then composed into larger parsers for values, expressions, statements and programs. A parser can be written incrementally using the F# interactive environment giving quick feedback as you go. The library also gives really helpful error messages with line and column numbers when a parser fails.

I spent around an hour going through the tutorial which provided enough detail to get started on a parser for Small Basic. A Small Basic program is composed of statements, with one statement per line.

Parsing literals

Small Basic supports a small range of value types:

/// Small Basic value
type value =
    | Bool of bool
    | Int of int
    | Double of double
    | String of string

A parser for the boolean literal values “true” or “false” can be defined using stringReturn:

let ptrue = stringReturn "true" true
let pfalse = stringReturn "false" false

The parsers can then be combined to be either true or false using the <|> operator:

let pbool = ptrue <|> pfalse

To take the boolean parser to a value type we use the |>> pipeline combinator:

let pbool = (ptrue <|> pfalse) |>> fun x -> Bool(x)

FParsec contains parsers for many primitive types including integers:

let pint = pint32 |>> fun n -> Int(n)

These parsers can then be combined to create a parser for values:

let pvalue = pbool <|> pint // ...

Parsing expressions

Next up are Small Basic’s expressions:

/// Small Basic expression
type expr =
    | Literal of value
    | Var of identifier
    | GetAt of location
    | Func of invoke
    | Neg of expr
    | Arithmetic of expr * arithmetic * expr
    | Comparison of expr * comparison * expr
    | Logical of expr * logical * expr

A parser for literals can be created using the value parser:

let pliteral = pvalue |>> fun x -> Literal(x)

Identifiers are expected to start with a letter or underscore and may contain numerals. The FParsec tutorial contains a handy example:

let pidentifier =
    let isIdentifierFirstChar c = isLetter c || c = '_'
    let isIdentifierChar c = isLetter c || isDigit c || c = '_'
    many1Satisfy2L isIdentifierFirstChar isIdentifierChar "identifier"

This can be used to define a parser for variable names:

let pvar = pidentifier |>> fun name -> Var(name)

This can then be used to define a parser for simple expressions:

let pexpr = pliteral <|> pvar

Operators can be easily parsed using the built-in operator precedence parser described in the user guide.

Parsing statements

The longest of Small Basic’s statements is the for loop:

/// Small Basic assignment
type assign =
    | Set of identifier * expr
/// Small Basic instruction
type instruction =
    | Assign of assign
    | For of assign * expr * expr

A for statement is composed of an assignment and expressions for the end value and step:

For A=1 To 100 Step 1

To parse the assignment the pipe3 combinator can be used for the constituent parts:

let pset = pipe3 pidentifier (pstring "=") pexpr (fun id _ e -> Set(id, e))

The parser for the from, to and step components can be combined as:

let pfor =
    let pfrom = pstring "For" >>. spaces1 >>. pset
    let pto = pstring "To" >>. spaces1 >>. pexpr
    let pstep = pstring "Step" >>. spaces1 >>. pexpr
    let toStep = function None -> Literal(Int(1)) | Some s -> s
    pipe3 pfrom pto (opt pstep) (fun f t s -> For(f, t, toStep s))

It can be tested in F# interactive using the run function:

run pfor "For A=1 To 100"

Which produces a statement from the parser:

val it : ParserResult<instruction,unit> =
  Success: For (Set ("A",Literal (Int 1)),Literal (Int 100),Literal (Int 1))

Parsers for statements can be combined using the <|> operator or choice function:

let pstatement = 
    choice [
        attempt pfor
        // ... other statements
    ]

Parsing programs

Small Basic supports comments at the ends of the line:

let pcomment = 
    pchar '\'' >>. skipManySatisfy (fun c -> c <> '\n') >>. pchar '\n'

Thusly the end of a line is characterized either by a comment or a new line character:

let peol = pcomment <|> (pchar '\n')

The lines of the program can be parsed using the many function:

let plines = many (spaces >>. pstatement .>> peol) .>> eof

Finally the program can be parsed by applying the run function:

let parse (program:string) =    
    match run plines program with
    | Success(result, _, _)   -> result
    | Failure(errorMsg, e, s) -> failwith errorMsg

Running programs

The generated AST from the parser can be fed directly into the Small Basic interpreter built in the previous article.

The code from this example is available as a gist. The full parser and interpreter are available as an F# snippet.

Here’s FizzBuzz in Small Basic:

Sub Modulus
  Result = Dividend
  While Result >= Divisor
    Result = Result - Divisor
  EndWhile
EndSub

For A = 1 To 100    
  Dividend = A
  Divisor = 3
  Modulus()
  Mod3 = Result
  Divisor = 5
  Modulus()
  Mod5 = Result
  If Mod3 = 0 And Mod5 = 0 Then
    TextWindow.WriteLine("FizzBuzz")  
  ElseIf Mod3 = 0 Then
    TextWindow.WriteLine("Fizz")
  ElseIf Mod5 = 0 Then
    TextWindow.WriteLine("Buzz")
  Else
    TextWindow.WriteLine(A)        
  EndIf
EndFor

And the generated AST from the parser:

val program : instruction [] =
  [|
  Sub "Modulus"; Assign (Set ("Result",Var "Dividend"));
  While (Comparison (Var "Result",Ge,Var "Divisor"));
  Assign (Set ("Result",Arithmetic (Var "Result",Subtract,Var "Divisor")));
  EndWhile; 
  EndSub;
  For (Set ("A",Literal (Int 1)),Literal (Int 100),Literal (Int 1));
  Assign (Set ("Dividend",Var "A"));
  Assign (Set ("Divisor",Literal (Int 3))); GoSub "Modulus";
  Assign (Set ("Mod3",Var "Result"));
  Assign (Set ("Divisor",Literal (Int 5))); GoSub "Modulus";
  Assign (Set ("Mod5",Var "Result"));
  If
    (Logical
       (Comparison (Var "Mod3",Eq,Literal (Int 0)),And,
        Comparison (Var "Mod5",Eq,Literal (Int 0))));
  Action (Method ("TextWindow","WriteLine",[|Literal (String "FizzBuzz")|]));
  ElseIf (Comparison (Var "Mod3",Eq,Literal (Int 0)));
  Action (Method ("TextWindow","WriteLine",[|Literal (String "Fizz")|]));
  ElseIf (Comparison (Var "Mod5",Eq,Literal (Int 0)));
  Action (Method ("TextWindow","WriteLine",[|Literal (String "Buzz")|]));
  Else; 
  Action (Method ("TextWindow","WriteLine",[|Var "A"|])); 
  EndIf;
  EndFor|]

Conclusion

FParsec lets you declaratively build a parser for a programming language incrementally in F# interactive with minimal effort. If you need to write a parser for an external DSL or programming language then FParsec is well worth a look.

Small Basic Interpreter

Microsoft’s Small Basic is a minimal implementation of the BASIC programming language using only 14 keywords. It’s aimed at beginners with a very simple development environment and library. My kids have enjoyed playing with it particularly the Turtle API which are reminiscent of Logo. Small Basic programs can be run locally, online via Silverlight or migrated to full fat Visual Basic .Net.

I’m quite interested in building Domain Specific Languages (DSLs), including embedded DSLs, parsers and compilers. For a short exercise/experiment I wanted to recreate a simple imperative language and Small Basic looked like a fun option.

Abstract Syntax Tree

I started by sketching out an abstract syntax tree (AST) for the language which describes the values, expressions and instructions.

F# discriminated union’s make light work of this:

/// Small Basic instruction
type instruction =
    | Assign of assign
    | SetAt of location * expr
    | PropertySet of string * string * expr
    | Action of invoke
    | For of assign * expr * expr
    | EndFor
    | If of expr
    | ElseIf of expr
    | Else
    | EndIf
    | While of expr
    | EndWhile
    | Sub of identifier
    | EndSub
    | GoSub of identifier
    | Label of label
    | Goto of label

A parser or embedded DSL can be used to generate an AST for a program. The AST can then be evaluated by an interpreter, or transformed by a compiler to processor instructions, byte code or even another language.

Embedded DSL

To test the AST I built a small embedded DSL using custom operators and functions:

let I x = Literal(Int(x))
let (!) (name:string) = Var(name)
let FOR(var:identifier, from:expr, ``to``:expr) = 
    For(Set(var, from), ``to``, I(1))
let PRINT x = 
    let writeLine = typeof<Console>.GetMethod("WriteLine",[|typeof<obj>|])
    Action(Call(writeLine, [|x|]))

This can be used to specify a Small Basic program.

let program =
    [|
        FOR("A", I(1), I(100))
        PRINT(!"A")
        ENDFOR        
    |]

The program AST can then be evaluated in F# interactive:

val program : instruction [] =
  [|For (Set ("A",Literal (Int 1)),Literal (Int 100),Literal (Int 1));
    Action (Call (Void WriteLine(System.Object),[|Var "A"|])); 
    EndFor|]

Defining the embedded DSL in F# only took minutes using the interactive REPL environment and looks quite close to the target language.

Interpreter

Programs can now be run by evaluating the AST using an interpreter. The interpreter merely steps through each instruction using pattern matching:

let run (program:instruction[]) =
    /// ... 
    /// Instruction step
    let step () =
        let instruction = program.[!pi]
        match instruction with
        | Action(call) -> invoke state call |> ignore
        | For((Set(identifier,expr) as from), target, step) ->
            assign from
            let index = findIndex (!pi+1) (isFor,isEndFor) EndFor
            forLoops.[index] <- (!pi, identifier, target, step)
            if toInt(variables.[identifier]) > toInt(eval target) 
            then pi := index
        | EndFor ->
            let start, identifier, target, step = forLoops.[!pi]
            let x = variables.[identifier]
            variables.[identifier] <- arithmetic x Add (eval step)
            if toInt(variables.[identifier]) <= toInt(eval target) 
            then pi := start
    while !pi < program.Length do step (); incr pi

Scriptable Small Basic

The AST, embedded DSL and interpreter are available as an F# snippet that you can run in F# interactive or build as an executable. The script includes a Small Basic FizzBuzz sample.

FizzBuzz