Phillip Trelford's Array

POKE 36879,255

Parsing C#

C# is a mainstream programming language based on Java, that runs on Windows via .Net and cross platform via Mono. On .Net , C# is used primarily in the enterprise for web and desktop applications. With Mono, C# can target a range of popular platforms including iOS and Andriod, and is primarily employed in mobile application development via Xamarin’s tools and game development via Unity.

Microsoft’s current shipping C# compiler is proprietary code written in C++. Mono’s C# compiler, is open source and written in C#. In 2008 Microsoft announced the development of a new proprietary C# compiler written in C#, codenamed Roslyn, with the last preview released back in 2012. One of Roslyn’s key aims to expose the output of the parser as an abstract syntax tree (AST) that can be inspected by third party developers for tooling.

To fill the void, it feels like every man and his dog has written a parser for C#, most commonly to provide developer productivity tools in Visual Studio.

Third party vendors with C# parsers include:

On the open source side, Irony is a particularly interesting project allowing grammars to be specified in C# and includes samples for parsing many languages including C# and Java.

Parsing C#

You can think of C# and Java as managed C with classes. That is C# at it’s core is an imperative programming language, composed of statement blocks with additional syntax for class declaration and a dot operator for invoking members on a class.

Under the hood instance members simply take the instance as the first argument:

C C#
struct Point { 
  float X; float Y; 
};
float length(Point p) {
  return sqrt(p.X*p.X + p.Y*p.Y);
}
class Point {
  float X; float Y;
  public float Length() {
    return Math.Sqrt(X * X + Y * Y);
  }
}

The dot operator is probably *the* killer feature of class based languages like Java and C# as it allows developers to explore existing APIs via code completion in an IDE.

Parsing C# statements is not hugely dissimilar to parsing Small Basic statements which I have covered in a recent article. The only real difference is that C style languages typically employ semicolons between statements and curly braces to delimit statement blocks.

Parsing C# files involves finding using directives and namespace blocks. Then inside the namespace blocks finding type blocks including class, interface, struct, enum and delegate constructs. Inside the class blocks the member fields, properties, methods and constructors. Finally inside the members the statement blocks.

A few weeks back I was stuck on a a slow running train for a couple of hours, and had a go at writing a simple C# parser in F# using discriminated unions for the abstract syntax tree (AST) and FParsec for parsing. The same approach I’d used to parse Small Basic. I managed to sketch out a rough AST and a parser for namespaces and class declarations with around 100 lines of code. After a few more train journeys I managed to cover members and statement blocks, getting close to parsing C# 1.1. The prototype C# AST and parser is available as an F# snippet that can be run as a script inside F# interactive.

As an aside Neil Danson has recently constructed a compiler for the AST in F# that builds .Net assemblies. Basically this requires a number of passes, including building classes and members and then emitting IL code for statements, which is mostly a one-to-one mapping.

AST

The smallest building block in the AST is an expression which includes values, variables, member invocations and operators:

type Expr = 
    | Value of Literal
    | Variable of VarName
    | MethodInvoke of MemberName * Arg list
    | PropertyGet of MemberName
    | Cast of TypeName * Expr
    | InfixOp of Expr * string * Expr
    | PrefixOp of string * Expr
    | PostfixOp of Expr * string
    | TernaryOp of Expr * Expr * Expr

Then comes statements including the standard imperative loops and control flow:

type Statement =
    | Assignment of Init
    | PropertySet of MemberName * Expr
    | Action of Expr
    | If of Expr * Block
    | IfElse of Expr * Block * Block
    | Switch of Expr * Case list
    | For of Init list * Condition * Iterator list * Block
    | ForEach of Define * Expr * Block
    | While of Expr * Block
    | DoWhile of Block * Expr
    | Throw of Expr
    | Try of Block
    | Catch of TypeName * Block
    | Finally of Block
    | Lock of Expr * Block    
    | Using of Expr * Block
    | Label of LabelName
    | Goto of LabelName
    | Break
    | Continue
    | Return of Expr

Statements are organized inside members with a myriad of optional modifiers:

// Modifiers
type Access = Public | Private | Protected | Internal
type Modifier = Static | Sealed | Override | Virtual | Abstract
type ParamType = ByValue | ByRef | Out | Params 
// Members
type Member =
    | Field of Access * Modifier option * IsReadOnly * 
               ReturnType * Name * Expr option
    | Property of MemberInfo * Block option * Block option
    | Method of MemberInfo * Param list * Block
    | Constructor of Access * Modifier option * Name * Param list * 
                     PreConstruct option * Block

Members are organized inside types:

type CSharpType = 
    | Class of Access * Modifier option * Name * Implements * Members
    | Struct of Access * Name * Member list
    | Interface of Access * Name * Implements * Member list
    | Enum of Access * TypeName * EnumValue list
    | Delegate of Access * Name * ReturnType * Param list    

Types are optionally organized inside namespaces:

type Import = 
    | Import of Name list
    | Alias of Name * Name list
type NamespaceScope =
    | Namespace of Import list * Name list * NamespaceScope list
    | Types of Import list * CSharpType list

You can see the full source for the AST and parser in the F# snippet.

Parser

The basic structure of the parser is very similar to the Small Basic parser implementation. The main difference I found is that C# has a lot more syntactic noise to support classes.

As an example the following code fragment parses property declarations:

let pget = str_ws "get" >>. pstatementblock
let pset = str_ws "set" >>. pstatementblock
let ppropertyblock =
    between (str_ws "{") (str_ws "}") ((opt pget) .>>. (opt pset))
let pproperty = 
    pipe2 pmemberinfo ppropertyblock
     (fun mi (gblock,sblock) -> Property(mi,gblock,sblock))

Building with FParsec iteratively in F# interactive made light work of sketching out a parser and refining the AST.

Conclusions

In the spirit of build one to throw away from the Mythical Man Month, it feels like the combination of F# and FParsec is ideal for quickly putting together an AST and proving it with a working prototype parser. Which I feel would be a good approach to take if you wanted to build a production C# compiler in a timely manner.

Sketching out an AST and parser has been an interesting, but not overly taxing exercise and has given me a few insights into C#’s syntax. My overriding feeling is that the original layout of the language was more influenced by ease of parsing than ease of use for the developer, with minimal type inference, and the use of constructs like ref and out parameters over support for returning multiple values via tuples.

C#’s var keyword gives simple type inference for local variables, however given the type of the right hand expression is already known inference here seems like a nop.

I don’t plan to take the C# parser any further, there is already a lot of offerings out there in this space. But if you’re interested in understanding how imperative languages can be constructed I think it’s probably a fun exercise to try.

Basic Tuples & Pattern Matching

Over the last couple of weeks I’ve been building my own parser, interpreter and compiler for Small Basic, a dialect of BASIC with only 14 keywords aimed at beginners. Despite, or perhaps because of, Small Basic’s simplicity some really fun programs have been developed, from games like Tetris and 3D Maze to a parser for the language itself.

Small Basic provides primitive types for numbers, strings and associative arrays. There is no syntax provided for structures, but these can be easily modelled with the associative arrays. For example a 3D point can be constructed with named items or ordinals:

Named items Ordinals
Point["X"] = 1.0
Point["Y"] = 2.0
Point["Z"] = 3.0
Point[0] = 1.0
Point[1] = 2.0
Point[2] = 3.0

In languages like Erlang and Python this could be more concisely expressed as a tuple:

Erlang Python
Point = {1.0, 2.0, 3.0}
point = (1.0, 2.0, 3.0)

In fact sophisticated Erlang programs are built entirely from tuples and lists, there is no explicit class or inheritance syntax in the language. Messages can be easily expressed with tuples and behaviour via pattern matching.

Alan Kay, inventor of the Smalltalk language has said:

The notion of object oriented programming is completely misunderstood. It's not about objects and classes, it's all about messages.

In Erlang a hierarchy of shapes can simply be modelled using tuples with atoms for names:

Circle = { circle, 5.0 }
Square = { square, 7.0 }
Rectangle = { rectangle, 10.0, 5.0 }

The area of a shape can be expressed using pattern matching:

area(Shape) ->
  case Shape of
    { circle, R } -> pi() * R * R;
    { square, W } -> W * W;
    { rect, W, H } -> W * H
  end.

Select Case

The Visual Basic family’s Select Case functionality is quite rich. More so than the switch/case statements of the mainstream C dialects: Java, C# and C++, which only match literals.

In Visual Basic it is already possible to match values with literals, conditions or ranges:

Select Case agerange
  Case Is < 16
    MsgBox("Minor")
  Case 16 To 21
    MsgBox("Still Young")
  Case 50 To 64
    MsgBox("Start Lying")
  Case Is > 65
    MsgBox("Be Yourself") 
  Case Else
    MsgBox("Inbetweeners")
End Select

Given that Select Case in VB is already quite expressive, it feels support for tuples and pattern matching over them would feel quite natural in the language.

Extended Small Basic

To this end I have extended my Small Basic parser and compiler implementation with tuple and pattern matching support.

Tuples

Inspiration for construction and deconstruction was taken from F# and Python:

F# Python
let person = ("Phil", 27)
let (name, age) = person
person = ("Phil", 27)
name, age = person

So that tuples use explicit parentheses in the extended Small Basic implementation:

Person = ("Phil", 27)
(Name, Age) = Person

Internally tuples are represented using Small Basic’s built-in associative arrays.

Pattern Matching

First I implemented VB’s Select Case statements, which is not hugely dissimilar to parsing and compiling Small Basic’s If/ElseIf/Else statements.

Then I extended Select Case to support matching tuples with similar functionality to F#:

F# Extended Small Basic
let xor x y =
  match (x,y) with
  | (1,1) -> 0
  | (1,0) -> 1
  | (0,1) -> 1
  | (0,0) -> 0
Function Xor(a,b)
  Select Case (a,b)
    Case (1,1)
      Xor = 0
    Case (1,0)
      Xor = 1
    Case (0,1)
      Xor = 1
    Case (0,0)
      Xor = 0
  EndSelect
EndFunction

Constructing, deconstructing and matching nested tuples is also supported.

Example

Putting it altogether, FizzBuzz can now be expressed in my extended Small Basic implementation with functions, tuples and pattern matching:

Function Mod(Dividend,Divisor)
  Mod = Dividend
  While Mod >= Divisor
    Mod = Mod - Divisor
  EndWhile
EndFunction

Sub Echo(s)
  TextWindow.WriteLine(s)
EndSub

For A = 1 To 100 ' Iterate from 1 to 100
  Select Case (Mod(A,3),Mod(A,5))
    Case (0,0)
      Echo("FizzBuzz")
    Case (0,_)
      Echo("Fizz")
    Case (_,0)
      Echo("Buzz")
    Case Else
      Echo(A)
  EndSelect
EndFor

Conclusions

Extending Small Basic with first class support for tuples was relatively easy, and I feel quite natural in the language. It provides object orientated programming without the need for a verbose class syntax. I think this is something that would probably work pretty well in other BASIC dialects including Visual Basic.

Source code is available on BitBucket: https://bitbucket.org/ptrelford/smallbasiccompiler

Extending Small Basic with Function Procedures

Microsoft Small Basic is a minimal implementation of the BASIC language aimed at beginners, with only 14 keywords.

A few years back I used Small Basic as an early introduction to programming for my 2 young children (at the time 9 and 5). Small Basic’s library has a nice simple API for drawing shapes and moving a turtle around the screen which the kids loved. However I found the lack of function arguments and return values was quite limiting from a teaching perspective. After creating shapes we wanted to refactor the code so that the drawing procedures could be parameterized, which can only be achieved with global state. Not wanting to get my kids in to bad habits early we swiftly moved on to F#.

Coincidentally Small Basic’s library is a .Net assembly that can be easily consumed from C#, F# and VB.Net, which I’ve used once while introducing C# programming to adult beginners.

Inspired by Small Basic’s simplicity, I’ve also built an open source library along similar lines called SmallSharp which I’ve used on occasion to introduce programming in F#. I feel the ability to start drawing shapes on the screen with just a few commands gives a real sense of excitement and achievement. The Processing language is another good option in this space. In contrast large frameworks like WinForms and WPF, although highly customizable, typically require a lot of work up front before you see any results.

In the previous three articles I’ve described steps to building a Small Basic compiler. First creating an abstract syntax tree (AST) with F# discriminated unions, then parsing with the FParsec parser combinator library and finally transforming to CIL with reflection emit.

In this article I’ll give some details on how I’ve extended the compiler with function procedures, bringing the functionality of Small Basic closer to that of VBScript.

The source code including the function procedure extension is available on BitBucket.

Extending the AST

First off the AST must be extended to support function declarations:

    | Sub of identifier * string list
    | EndSub
    // Language extensions
    | Function of identifier * string list
    | EndFunction

Next a way is needed to call functions:

and invoke =
    | Method of string * string * expr list
    | PropertyGet of string * string
    | Call of string * expr list // Language extension

Extending the Parser

Now the parser needs to recognise the new syntax:

let pparams = 
    between (str_ws "(") (str_ws ")") (sepBy pidentifier_ws (str_ws ","))
let pmethod = 
    pidentifier_ws .>>. opt pparams
    |>> (fun (name,ps) -> name, match ps with Some ps -> ps | None -> [])
let pfunction = 
    str_ws1 "Function" >>. pmethod |>> (fun (name,ps) -> Function(name,ps))
let pendfunction = 
    str_ws "EndFunction" |>> (fun _ -> EndFunction)

The Function keyword expects a method declaration which is composed of an identifier and optional parameters between parentheses.

Calls in expressions are recognized as a single identifier with a list of arguments:

let pcall = pidentifier_ws .>>. ptuple 
            |>> (fun (name,args) -> Call(name, args))


Code Generation

The code generator now needs to generate methods for both Sub and Function statements, with void and Primitive return values respectively. The generated methods may also have named arguments, passed by value. When an identifier is referenced in a statement the generator checks the method arguments in precedence to global fields. Finally return values are taken from the field with the same name as the method, as is the convention with the Visual Basic series of languages.

Sample

Here’s FizzBuzz in Small Basic utilizing the new extension:

' Returns Modulus
Function Modulus(Dividend,Divisor)
  Modulus = Dividend
  While Modulus >= Divisor
    Modulus = Modulus - Divisor
  EndWhile
EndFunction

For A = 1 To 100 ' Iterate from 1 to 100
  Mod3 = Modulus(A,3) ' A % 3
  Mod5 = Modulus(A,5) ' A % 5
  If Mod3 = 0 And Mod5 = 0 Then
    TextWindow.WriteLine("FizzBuzz")
  ElseIf Mod3 = 0 Then
    TextWindow.WriteLine("Fizz")
  ElseIf Mod5 = 0 Then
    TextWindow.WriteLine("Buzz")
  Else
    TextWindow.WriteLine(A)
  EndIf
EndFor

Conclusions

Extending the language touched small parts of the AST, parser and code generation steps. The whole process took only a couple of hours.

With the simple addition of function procedures with arguments and return values, Small Basic starts to approach the functionality of VBScript, and feel more like a grown up language for scripting.

It’s also starting to feel like the Small Basic AST and parser could be easily extended to support any of the Visual Basic family of languages, from VBScript to VBA to VB.Net.

Future

Just as Small Basic currently has an export to VB.Net option, another interesting future direction might be to transform Small Basic programs to JavaScript allowing truly cross platform deployment.