Small Basic, from Microsoft Dev Labs, is a minimal implementation of BASIC, employing only 14 keywords, aimed at beginners. In the last 2 articles I described building an internal DSL and parser to build an abstract syntax tree (AST) representation of the program with an interpreter for execution. In this article I’ll describe how to transform the AST to executable .Net IL code, thus building a compiler.
Small Basic is an imperative programming language where programs are built up of procedures containing statements. All variables are defined in global scope and are of variant type, as per classic VB.
A Small Basic program can be modelled in .Net IL as a static type with a static method entry point for the main procedure and further static methods for subroutines. Variables can be defined as static fields holding variants of Primitive type (from the Small Basic library) . The control flow statements: If, For, While and Goto can be implemented as branches and GoSub as a method call.
Compiling a program involves passing over the statements in the AST and emitting IL code using .Net’s Reflection.Emit API to produce a .Net assembly.
Creating an assembly
The Small Basic program will be compiled to an executable .Net assembly, containing a module and underneath a type to hold the fields and methods.
Defining a dynamic assembly:
let assemblyBuilder =
AppDomain.CurrentDomain.DefineDynamicAssembly(
AssemblyName(name),
AssemblyBuilderAccess.RunAndSave)
Defining a module:
let moduleBuilder = assemblyBuilder.DefineDynamicModule(name+".exe")
Defining a type:
let typeBuilder = moduleBuilder.DefineType("Program", TypeAttributes.Public)
Saving the assembly to disk:
assemblyBuilder.Save(name+".exe")
Variables
The first pass of the transform finds all the variables that are explicitly set in the Small Basic program and then defines static fields for them.
There are 2 statements in the AST with explicit assignment:
type assign = Set of identifier * expr
/// Small Basic instruction
type instruction =
| Assign of assign
| For of assign * expr * expr
The variables can be collected by pattern matching over the instructions to yield the names:
[for instruction in instructions do
match instruction with
| Assign(Set(name,_)) -> yield name
| For(Set(name,_),_,_) -> yield name
| _ -> ()
]
Static fields are generated of type Primitive:
let generateField name =
typeBuilder.DefineField(name, typeof<Primitive>, FieldAttributes.Static)
Procedures
The next pass finds all subroutines, Sub statements, and defines methods for them:
[for instruction in instructions do
match instruction with
| Sub(name) -> yield name
| _ -> ()
]
Subroutines have no return value so they are defined as void:
let generateMethod name =
typeBuilder.DefineMethod(
name,
MethodAttributes.Static ||| MethodAttributes.Public,
typeof<Void>,
[||])
Statements
The final pass iterates over the instructions and emits the corresponding IL codes:
let emitInstruction (il:ILGenerator) = function
| Assign(Set(name,expr)) ->
let ty = emitExpression il expr
il.Emit(OpCodes.Stsfld, fieldLookup name)
// ... other statement types
In the code above the right hand side expression is evaluated and placed at the top of the stack. Then the Stsfld opcode is used to store the value in the named static field.
Expressions can be composed of literal values, variable lookups, function calls and arithmetic, comparison or logical operators:
let rec emitExpression (il:ILGenerator) = function
| Literal(x) -> emitLiteral il x
| Var(name) -> il.Emit(OpCodes.Ldsfld, fieldLookup name)
| Arithmetic(lhs,Add,rhs) -> emitOp il lhs rhs "op_Addition"
| Arithmetic(lhs,Subtract,rhs) -> emitOp il lhs rhs "op_Subtraction"
// ... other expression types
For literals the value is again put on top of the stack and passed to the Primitive constructor:
let emitPrimitive (il:ILGenerator) t =
let ci = typeof<Primitive>.GetConstructor([|t|])
il.Emit(OpCodes.Newobj, ci)
let emitLiteral (il:ILGenerator) = function
| Int(n) ->
il.Emit(OpCodes.Ldc_I4, n)
emitPrimitive il typeof<int>
| Double(n) ->
il.Emit(OpCodes.Ldc_R8, n)
emitPrimitive il typeof<double>
| String(s) ->
il.Emit(OpCodes.Ldstr, s)
emitPrimitive il typeof<string>
// ... other literal types
That completes transformation of the AST to IL code.
Source code
The full source code for the Small Basic compiler is available on BitBucket, the compiler is just a few hundred lines of code. The code parses and compiles a simple FizzBuzz program that can be run at the command prompt.
Conclusions
In the previous 2 articles we’ve seen that F#’s discriminated unions let you easily describe the abstract syntax tree (AST) of a programming language. Source code can then be easily transformed into AST form by incrementally building a parser in F# interactive using the FParsec parser combinator library. Finally pattern matching makes light work of passing over the AST to interpret the code or transform it to an executable assembly via .Net’s reflection emit API.
The parser, compiler or interpreter could be used for simple runtime scripting of an application and may be easily extended with new keywords and operators.
The technique described can be applied for a variety of DSLs or general purpose programming languages. Here’s the simple steps:
- Define an AST
- Write an internal DSL or a parser for an external DSL
- Build an interpreter or a compiler to execute programs
Further reading: Peter Sestoft’s Programming Language Concepts book, aimed at undergraduates, gives a great introduction to parser, compiler and interpreter technology, including a mini C like language implementation, with examples provided in F#.