Phillip Trelford's Array

POKE 36879,255

NorDevCon 2014

NorDevCon is a one day agile and tech conference held annually in the historic English town of Norwich, also the setting for the recent British comedy Alan Partridge: Alpha Papa.

Last Friday I took the short hop across the Fens to Norwich to talk about Data Science and Machine Learning with F#. First a talk on F# Type Providers entitled All your base types are belong to us (thanks to Ross McKinlay for the meme), and then after lunch a hands on Machine Learning workshop exploring `Titanic passengers using data from Kaggle (a data science company based in San Francisco).

The conference attracted a great range of speakers with some really interesting sessions:

NorDevCont

In the morning I got to chat with Jon Skeet about programming with kids, and then watched him answer stack overflow questions at a frightening pace.

Jason Gorman gave a lively and warm opening keynote on Software Apprenticeships, which included a Skype session with his brave apprentice, Will Price. Immediately followed by Chris O’Dell on Continuous Delivery at 7Digital which ended with a lot of interested questions.

In the afternoon I caught Phil Nash’s thought provoking session on Agile and Mobile – do they work together as well as they should. Phil talked about a schism in testing approaches on mobile platforms, particularly iOS, with some in the community advocating unit testing and TDD and others none at all. Check out Phil’s links from the talk to learn more.

The closing keynote came from the highly respected Nat Pryce and Steve Freeman on Building SOLID Foundations. The talk focused on design principles for addressing complexity in mid-scale codebases. Nat gave examples of successfully taming complexity in an unnamed risk management project using immutability and DSLs, in effect a functional-style approach. An approach that Jessica Kerr also explored in some depth in her popular Functional Principles for Object-Oriented Developers talk at last year’s Øredev.

The day was capped off was a hearty dinner with a fun format where people moved around between the courses.

All your types are belong to us!

My first talk show cased typed access to a vast array of data sources through to software environments like Hadoop and R, via F# Type Providers:


Type Providers covered:

  • JSON, CSV, HTML Tables and the World Bank (via FSharp.Data)
  • SQLite (via Ross’s SQL Provider which also supports MySQL, Oracle, Postgre & SQL Server)
  • R (via BlueMountain Capital’s R-Type Provider)
  • Hadoop (in the browser via Try FSharp)

All of the Type Providers shown are open source projects developed by the F# community, can be easily integrated into projects via Nuget and run on Linux, Mac and Windows.

The HTML Table type provider being the most recent, developed by Colin Bull, it gives immediate typed access to data in tables on web pages.

After the talk Jon Skeet suggested a Protocol Buffers Type Provider:


For which it turns out Cameron Taggart already has a project called Froto.

If you are interested in creating your own Type Provider I’d recommend reading Michael Newtons’s Type Provider tutorial. Michael will be running a free workshop on building Type Providers at the F#unctional Londoners meetup on May 1st.

Hands On Machine Learning workshop

This session gave an introduction to machine learning, using F#’s REPL and the CSV Type Provider to easily explore data on Titanic and predict survival outcomes:


It was a great group and everyone managed to complete the task and produce good prediction results using decision tree learning in just 1.5 hours.

Wrapping Up

Thanks to Paul Grenyer for organizing an excellent conference and giving me the opportunity to speak. Paul put together a really interesting programme which was extremely well-executed.

A++ would recommend :)

Parsing C#

C# is a mainstream programming language based on Java, that runs on Windows via .Net and cross platform via Mono. On .Net , C# is used primarily in the enterprise for web and desktop applications. With Mono, C# can target a range of popular platforms including iOS and Andriod, and is primarily employed in mobile application development via Xamarin’s tools and game development via Unity.

Microsoft’s current shipping C# compiler is proprietary code written in C++. Mono’s C# compiler, is open source and written in C#. In 2008 Microsoft announced the development of a new proprietary C# compiler written in C#, codenamed Roslyn, with the last preview released back in 2012. One of Roslyn’s key aims to expose the output of the parser as an abstract syntax tree (AST) that can be inspected by third party developers for tooling.

To fill the void, it feels like every man and his dog has written a parser for C#, most commonly to provide developer productivity tools in Visual Studio.

Third party vendors with C# parsers include:

On the open source side, Irony is a particularly interesting project allowing grammars to be specified in C# and includes samples for parsing many languages including C# and Java.

Parsing C#

You can think of C# and Java as managed C with classes. That is C# at it’s core is an imperative programming language, composed of statement blocks with additional syntax for class declaration and a dot operator for invoking members on a class.

Under the hood instance members simply take the instance as the first argument:

C C#
struct Point { 
  float X; float Y; 
};
float length(Point p) {
  return sqrt(p.X*p.X + p.Y*p.Y);
}
class Point {
  float X; float Y;
  public float Length() {
    return Math.Sqrt(X * X + Y * Y);
  }
}

The dot operator is probably *the* killer feature of class based languages like Java and C# as it allows developers to explore existing APIs via code completion in an IDE.

Parsing C# statements is not hugely dissimilar to parsing Small Basic statements which I have covered in a recent article. The only real difference is that C style languages typically employ semicolons between statements and curly braces to delimit statement blocks.

Parsing C# files involves finding using directives and namespace blocks. Then inside the namespace blocks finding type blocks including class, interface, struct, enum and delegate constructs. Inside the class blocks the member fields, properties, methods and constructors. Finally inside the members the statement blocks.

A few weeks back I was stuck on a a slow running train for a couple of hours, and had a go at writing a simple C# parser in F# using discriminated unions for the abstract syntax tree (AST) and FParsec for parsing. The same approach I’d used to parse Small Basic. I managed to sketch out a rough AST and a parser for namespaces and class declarations with around 100 lines of code. After a few more train journeys I managed to cover members and statement blocks, getting close to parsing C# 1.1. The prototype C# AST and parser is available as an F# snippet that can be run as a script inside F# interactive.

As an aside Neil Danson has recently constructed a compiler for the AST in F# that builds .Net assemblies. Basically this requires a number of passes, including building classes and members and then emitting IL code for statements, which is mostly a one-to-one mapping.

AST

The smallest building block in the AST is an expression which includes values, variables, member invocations and operators:

type Expr = 
    | Value of Literal
    | Variable of VarName
    | MethodInvoke of MemberName * Arg list
    | PropertyGet of MemberName
    | Cast of TypeName * Expr
    | InfixOp of Expr * string * Expr
    | PrefixOp of string * Expr
    | PostfixOp of Expr * string
    | TernaryOp of Expr * Expr * Expr

Then comes statements including the standard imperative loops and control flow:

type Statement =
    | Assignment of Init
    | PropertySet of MemberName * Expr
    | Action of Expr
    | If of Expr * Block
    | IfElse of Expr * Block * Block
    | Switch of Expr * Case list
    | For of Init list * Condition * Iterator list * Block
    | ForEach of Define * Expr * Block
    | While of Expr * Block
    | DoWhile of Block * Expr
    | Throw of Expr
    | Try of Block
    | Catch of TypeName * Block
    | Finally of Block
    | Lock of Expr * Block    
    | Using of Expr * Block
    | Label of LabelName
    | Goto of LabelName
    | Break
    | Continue
    | Return of Expr

Statements are organized inside members with a myriad of optional modifiers:

// Modifiers
type Access = Public | Private | Protected | Internal
type Modifier = Static | Sealed | Override | Virtual | Abstract
type ParamType = ByValue | ByRef | Out | Params 
// Members
type Member =
    | Field of Access * Modifier option * IsReadOnly * 
               ReturnType * Name * Expr option
    | Property of MemberInfo * Block option * Block option
    | Method of MemberInfo * Param list * Block
    | Constructor of Access * Modifier option * Name * Param list * 
                     PreConstruct option * Block

Members are organized inside types:

type CSharpType = 
    | Class of Access * Modifier option * Name * Implements * Members
    | Struct of Access * Name * Member list
    | Interface of Access * Name * Implements * Member list
    | Enum of Access * TypeName * EnumValue list
    | Delegate of Access * Name * ReturnType * Param list    

Types are optionally organized inside namespaces:

type Import = 
    | Import of Name list
    | Alias of Name * Name list
type NamespaceScope =
    | Namespace of Import list * Name list * NamespaceScope list
    | Types of Import list * CSharpType list

You can see the full source for the AST and parser in the F# snippet.

Parser

The basic structure of the parser is very similar to the Small Basic parser implementation. The main difference I found is that C# has a lot more syntactic noise to support classes.

As an example the following code fragment parses property declarations:

let pget = str_ws "get" >>. pstatementblock
let pset = str_ws "set" >>. pstatementblock
let ppropertyblock =
    between (str_ws "{") (str_ws "}") ((opt pget) .>>. (opt pset))
let pproperty = 
    pipe2 pmemberinfo ppropertyblock
     (fun mi (gblock,sblock) -> Property(mi,gblock,sblock))

Building with FParsec iteratively in F# interactive made light work of sketching out a parser and refining the AST.

Conclusions

In the spirit of build one to throw away from the Mythical Man Month, it feels like the combination of F# and FParsec is ideal for quickly putting together an AST and proving it with a working prototype parser. Which I feel would be a good approach to take if you wanted to build a production C# compiler in a timely manner.

Sketching out an AST and parser has been an interesting, but not overly taxing exercise and has given me a few insights into C#’s syntax. My overriding feeling is that the original layout of the language was more influenced by ease of parsing than ease of use for the developer, with minimal type inference, and the use of constructs like ref and out parameters over support for returning multiple values via tuples.

C#’s var keyword gives simple type inference for local variables, however given the type of the right hand expression is already known inference here seems like a nop.

I don’t plan to take the C# parser any further, there is already a lot of offerings out there in this space. But if you’re interested in understanding how imperative languages can be constructed I think it’s probably a fun exercise to try.

Extending Small Basic with Function Procedures

Microsoft Small Basic is a minimal implementation of the BASIC language aimed at beginners, with only 14 keywords.

A few years back I used Small Basic as an early introduction to programming for my 2 young children (at the time 9 and 5). Small Basic’s library has a nice simple API for drawing shapes and moving a turtle around the screen which the kids loved. However I found the lack of function arguments and return values was quite limiting from a teaching perspective. After creating shapes we wanted to refactor the code so that the drawing procedures could be parameterized, which can only be achieved with global state. Not wanting to get my kids in to bad habits early we swiftly moved on to F#.

Coincidentally Small Basic’s library is a .Net assembly that can be easily consumed from C#, F# and VB.Net, which I’ve used once while introducing C# programming to adult beginners.

Inspired by Small Basic’s simplicity, I’ve also built an open source library along similar lines called SmallSharp which I’ve used on occasion to introduce programming in F#. I feel the ability to start drawing shapes on the screen with just a few commands gives a real sense of excitement and achievement. The Processing language is another good option in this space. In contrast large frameworks like WinForms and WPF, although highly customizable, typically require a lot of work up front before you see any results.

In the previous three articles I’ve described steps to building a Small Basic compiler. First creating an abstract syntax tree (AST) with F# discriminated unions, then parsing with the FParsec parser combinator library and finally transforming to CIL with reflection emit.

In this article I’ll give some details on how I’ve extended the compiler with function procedures, bringing the functionality of Small Basic closer to that of VBScript.

The source code including the function procedure extension is available on BitBucket.

Extending the AST

First off the AST must be extended to support function declarations:

    | Sub of identifier * string list
    | EndSub
    // Language extensions
    | Function of identifier * string list
    | EndFunction

Next a way is needed to call functions:

and invoke =
    | Method of string * string * expr list
    | PropertyGet of string * string
    | Call of string * expr list // Language extension

Extending the Parser

Now the parser needs to recognise the new syntax:

let pparams = 
    between (str_ws "(") (str_ws ")") (sepBy pidentifier_ws (str_ws ","))
let pmethod = 
    pidentifier_ws .>>. opt pparams
    |>> (fun (name,ps) -> name, match ps with Some ps -> ps | None -> [])
let pfunction = 
    str_ws1 "Function" >>. pmethod |>> (fun (name,ps) -> Function(name,ps))
let pendfunction = 
    str_ws "EndFunction" |>> (fun _ -> EndFunction)

The Function keyword expects a method declaration which is composed of an identifier and optional parameters between parentheses.

Calls in expressions are recognized as a single identifier with a list of arguments:

let pcall = pidentifier_ws .>>. ptuple 
            |>> (fun (name,args) -> Call(name, args))


Code Generation

The code generator now needs to generate methods for both Sub and Function statements, with void and Primitive return values respectively. The generated methods may also have named arguments, passed by value. When an identifier is referenced in a statement the generator checks the method arguments in precedence to global fields. Finally return values are taken from the field with the same name as the method, as is the convention with the Visual Basic series of languages.

Sample

Here’s FizzBuzz in Small Basic utilizing the new extension:

' Returns Modulus
Function Modulus(Dividend,Divisor)
  Modulus = Dividend
  While Modulus >= Divisor
    Modulus = Modulus - Divisor
  EndWhile
EndFunction

For A = 1 To 100 ' Iterate from 1 to 100
  Mod3 = Modulus(A,3) ' A % 3
  Mod5 = Modulus(A,5) ' A % 5
  If Mod3 = 0 And Mod5 = 0 Then
    TextWindow.WriteLine("FizzBuzz")
  ElseIf Mod3 = 0 Then
    TextWindow.WriteLine("Fizz")
  ElseIf Mod5 = 0 Then
    TextWindow.WriteLine("Buzz")
  Else
    TextWindow.WriteLine(A)
  EndIf
EndFor

Conclusions

Extending the language touched small parts of the AST, parser and code generation steps. The whole process took only a couple of hours.

With the simple addition of function procedures with arguments and return values, Small Basic starts to approach the functionality of VBScript, and feel more like a grown up language for scripting.

It’s also starting to feel like the Small Basic AST and parser could be easily extended to support any of the Visual Basic family of languages, from VBScript to VBA to VB.Net.

Future

Just as Small Basic currently has an export to VB.Net option, another interesting future direction might be to transform Small Basic programs to JavaScript allowing truly cross platform deployment.