Phillip Trelford's Array

POKE 36879,255

NorDevCon 2014

NorDevCon is a one day agile and tech conference held annually in the historic English town of Norwich, also the setting for the recent British comedy Alan Partridge: Alpha Papa.

Last Friday I took the short hop across the Fens to Norwich to talk about Data Science and Machine Learning with F#. First a talk on F# Type Providers entitled All your base types are belong to us (thanks to Ross McKinlay for the meme), and then after lunch a hands on Machine Learning workshop exploring `Titanic passengers using data from Kaggle (a data science company based in San Francisco).

The conference attracted a great range of speakers with some really interesting sessions:

NorDevCont

In the morning I got to chat with Jon Skeet about programming with kids, and then watched him answer stack overflow questions at a frightening pace.

Jason Gorman gave a lively and warm opening keynote on Software Apprenticeships, which included a Skype session with his brave apprentice, Will Price. Immediately followed by Chris O’Dell on Continuous Delivery at 7Digital which ended with a lot of interested questions.

In the afternoon I caught Phil Nash’s thought provoking session on Agile and Mobile – do they work together as well as they should. Phil talked about a schism in testing approaches on mobile platforms, particularly iOS, with some in the community advocating unit testing and TDD and others none at all. Check out Phil’s links from the talk to learn more.

The closing keynote came from the highly respected Nat Pryce and Steve Freeman on Building SOLID Foundations. The talk focused on design principles for addressing complexity in mid-scale codebases. Nat gave examples of successfully taming complexity in an unnamed risk management project using immutability and DSLs, in effect a functional-style approach. An approach that Jessica Kerr also explored in some depth in her popular Functional Principles for Object-Oriented Developers talk at last year’s Øredev.

The day was capped off was a hearty dinner with a fun format where people moved around between the courses.

All your types are belong to us!

My first talk show cased typed access to a vast array of data sources through to software environments like Hadoop and R, via F# Type Providers:


Type Providers covered:

  • JSON, CSV, HTML Tables and the World Bank (via FSharp.Data)
  • SQLite (via Ross’s SQL Provider which also supports MySQL, Oracle, Postgre & SQL Server)
  • R (via BlueMountain Capital’s R-Type Provider)
  • Hadoop (in the browser via Try FSharp)

All of the Type Providers shown are open source projects developed by the F# community, can be easily integrated into projects via Nuget and run on Linux, Mac and Windows.

The HTML Table type provider being the most recent, developed by Colin Bull, it gives immediate typed access to data in tables on web pages.

After the talk Jon Skeet suggested a Protocol Buffers Type Provider:


For which it turns out Cameron Taggart already has a project called Froto.

If you are interested in creating your own Type Provider I’d recommend reading Michael Newtons’s Type Provider tutorial. Michael will be running a free workshop on building Type Providers at the F#unctional Londoners meetup on May 1st.

Hands On Machine Learning workshop

This session gave an introduction to machine learning, using F#’s REPL and the CSV Type Provider to easily explore data on Titanic and predict survival outcomes:


It was a great group and everyone managed to complete the task and produce good prediction results using decision tree learning in just 1.5 hours.

Wrapping Up

Thanks to Paul Grenyer for organizing an excellent conference and giving me the opportunity to speak. Paul put together a really interesting programme which was extremely well-executed.

A++ would recommend :)

F# Eye for the VB Guy

Are you a VB developer curious about functional-first programming. F# is a statically typed language built into Visual Studio. It is a multi-paradigm language with both functional and object-oriented constructs.

F# has powerful type inference which reduces the amount of typing you need to do without reducing performance or correctness.

F# projects are easily referenced from VB and vice versa. Like VB, F# makes minimal use of curly braces, and for many operations the syntax will feel quite familiar.

Here’s my cut-out-and-keep guide to common operations in both languages:

Declaring values

VB.Net F#
' Fully qualified
Dim greeting As String = "Hello"
' Type inferred
Dim greeting = "Hello"
// Fully qualified 
let greeting : string = "Hello"
// Type inferred 
let greeting = "Hello"

 

Declaring functions

VB.Net F#
Sub Print(message As String)
  Console.WriteLine(message)
End Sub

Function Add _
  (a As Integer, b As Integer) _
  As Integer
  Return a + b
End Function
let print( message: string ) = 
  Console.WriteLine(message) 

// Return type is inferred
let add(a:int, b:int) = a + b

 

Loops

VB.Net F#
For i = 1 To 10
  Console.WriteLine("Hello")
Next

For Each c In "Hello"
  Console.WriteLine(c)
Next
for i = 1 to 10 do 
   Console.WriteLine("Hello") 


for c in "Hello" do 
  Console.WriteLine(c)

 

If/Then/Else

VB.Net F#
Dim ageGroup As String
If age < 18 Then
  ageGroup = "Junior"
Else
  ageGroup = "Senior"
End If
let ageGroup = 
  if age < 18 then  
    "Junior"
  else  
    "Senior"
 

 

Pattern Matching

VB.Net F#
' Score Scrabble letter
Select Case
c Case "A", "E", "I", "L", "N", _ "O", "R", "S", "T", "U" Return 1 Case "D", "G" Return 2 Case "B", "C", "M", "P" Return 3 Case "F", "H", "V", "W", "Y" Return 4 Case "K" Return 5 Case "J", "X" Return 8 Case "Q", "Z" Return 10 Case Else Throw New _
      InvalidOperationException()
  End Select
// Score scrabble letter
match letter with
| 'A' | 'E' | 'I' | 'L' | 'N' 
| 'O' | 'R' | 'S' | 'T' | 'U' -> 1
| 'D' | 'G' -> 2
| 'B' | 'C' | 'M' | 'P' -> 3
| 'F' | 'H' | 'V' | 'W' | 'Y' -> 4
| 'K' -> 5
| 'J' | 'X' -> 8
| 'Q' | 'Z' -> 10
| _ -> invalidOp ""

 

Try/Catch

VB.Net F#
Dim i As Integer = 5
Try
  Throw New ArgumentException()
Catch e As OverflowException _
      When i = 5
  Console.WriteLine("First handler")
Catch e As ArgumentException _
      When i = 4
  Console.WriteLine("Second handler")
Catch When i = 5
  Console.WriteLine("Third handler")
End Try
let i = 5
try
  raise (ArgumentException())
with
| :? OverflowException when i = 5 ->
  Console.WriteLine("First handler")
| :? ArgumentException when i = 4 ->
  Console.WriteLine("Second handler")
| _ when i = 5 ->
  Console.WriteLine("Third handler")

 

Modules

VB.Net F#
Module Math
  ' Raise to integer power
  Function Pown( _
    x As Double, y As Integer)
    Dim result = 1
    For i = 1 To y
      result = result * x
    Next
    Return result
  End Function
End Module
module Maths =
  // Raise to integer power
 let pown (x:float,y:int) =
   let mutable result = 1.0
   for i = 1 to y do
     result <- result * x
   result

 

Classes

VB.Net F#
' Immutable class
Public Class Person
  Private ReadOnly myName As String

  Public Sub New(name As String)
    myName = name
  End Sub

  ReadOnly Property Name() As String
    Get
      Return myName
    End Get
  End Property
End Class

' Inheritance
Public Class MyWindow _
  : Inherits Window
End Class
// Immutable class
type Person (name : string) = 
  member my.Name = name

// Inheritance
type MyWindow() =
   inherit Window()

 

Summary

Interested in learning more? Give F# a try in Visual Studio with the built-in F# Tutorial project or in your browser with http://tryfsharp.org

Parsing C#

C# is a mainstream programming language based on Java, that runs on Windows via .Net and cross platform via Mono. On .Net , C# is used primarily in the enterprise for web and desktop applications. With Mono, C# can target a range of popular platforms including iOS and Andriod, and is primarily employed in mobile application development via Xamarin’s tools and game development via Unity.

Microsoft’s current shipping C# compiler is proprietary code written in C++. Mono’s C# compiler, is open source and written in C#. In 2008 Microsoft announced the development of a new proprietary C# compiler written in C#, codenamed Roslyn, with the last preview released back in 2012. One of Roslyn’s key aims to expose the output of the parser as an abstract syntax tree (AST) that can be inspected by third party developers for tooling.

To fill the void, it feels like every man and his dog has written a parser for C#, most commonly to provide developer productivity tools in Visual Studio.

Third party vendors with C# parsers include:

On the open source side, Irony is a particularly interesting project allowing grammars to be specified in C# and includes samples for parsing many languages including C# and Java.

Parsing C#

You can think of C# and Java as managed C with classes. That is C# at it’s core is an imperative programming language, composed of statement blocks with additional syntax for class declaration and a dot operator for invoking members on a class.

Under the hood instance members simply take the instance as the first argument:

C C#
struct Point { 
  float X; float Y; 
};
float length(Point p) {
  return sqrt(p.X*p.X + p.Y*p.Y);
}
class Point {
  float X; float Y;
  public float Length() {
    return Math.Sqrt(X * X + Y * Y);
  }
}

The dot operator is probably *the* killer feature of class based languages like Java and C# as it allows developers to explore existing APIs via code completion in an IDE.

Parsing C# statements is not hugely dissimilar to parsing Small Basic statements which I have covered in a recent article. The only real difference is that C style languages typically employ semicolons between statements and curly braces to delimit statement blocks.

Parsing C# files involves finding using directives and namespace blocks. Then inside the namespace blocks finding type blocks including class, interface, struct, enum and delegate constructs. Inside the class blocks the member fields, properties, methods and constructors. Finally inside the members the statement blocks.

A few weeks back I was stuck on a a slow running train for a couple of hours, and had a go at writing a simple C# parser in F# using discriminated unions for the abstract syntax tree (AST) and FParsec for parsing. The same approach I’d used to parse Small Basic. I managed to sketch out a rough AST and a parser for namespaces and class declarations with around 100 lines of code. After a few more train journeys I managed to cover members and statement blocks, getting close to parsing C# 1.1. The prototype C# AST and parser is available as an F# snippet that can be run as a script inside F# interactive.

As an aside Neil Danson has recently constructed a compiler for the AST in F# that builds .Net assemblies. Basically this requires a number of passes, including building classes and members and then emitting IL code for statements, which is mostly a one-to-one mapping.

AST

The smallest building block in the AST is an expression which includes values, variables, member invocations and operators:

type Expr = 
    | Value of Literal
    | Variable of VarName
    | MethodInvoke of MemberName * Arg list
    | PropertyGet of MemberName
    | Cast of TypeName * Expr
    | InfixOp of Expr * string * Expr
    | PrefixOp of string * Expr
    | PostfixOp of Expr * string
    | TernaryOp of Expr * Expr * Expr

Then comes statements including the standard imperative loops and control flow:

type Statement =
    | Assignment of Init
    | PropertySet of MemberName * Expr
    | Action of Expr
    | If of Expr * Block
    | IfElse of Expr * Block * Block
    | Switch of Expr * Case list
    | For of Init list * Condition * Iterator list * Block
    | ForEach of Define * Expr * Block
    | While of Expr * Block
    | DoWhile of Block * Expr
    | Throw of Expr
    | Try of Block
    | Catch of TypeName * Block
    | Finally of Block
    | Lock of Expr * Block    
    | Using of Expr * Block
    | Label of LabelName
    | Goto of LabelName
    | Break
    | Continue
    | Return of Expr

Statements are organized inside members with a myriad of optional modifiers:

// Modifiers
type Access = Public | Private | Protected | Internal
type Modifier = Static | Sealed | Override | Virtual | Abstract
type ParamType = ByValue | ByRef | Out | Params 
// Members
type Member =
    | Field of Access * Modifier option * IsReadOnly * 
               ReturnType * Name * Expr option
    | Property of MemberInfo * Block option * Block option
    | Method of MemberInfo * Param list * Block
    | Constructor of Access * Modifier option * Name * Param list * 
                     PreConstruct option * Block

Members are organized inside types:

type CSharpType = 
    | Class of Access * Modifier option * Name * Implements * Members
    | Struct of Access * Name * Member list
    | Interface of Access * Name * Implements * Member list
    | Enum of Access * TypeName * EnumValue list
    | Delegate of Access * Name * ReturnType * Param list    

Types are optionally organized inside namespaces:

type Import = 
    | Import of Name list
    | Alias of Name * Name list
type NamespaceScope =
    | Namespace of Import list * Name list * NamespaceScope list
    | Types of Import list * CSharpType list

You can see the full source for the AST and parser in the F# snippet.

Parser

The basic structure of the parser is very similar to the Small Basic parser implementation. The main difference I found is that C# has a lot more syntactic noise to support classes.

As an example the following code fragment parses property declarations:

let pget = str_ws "get" >>. pstatementblock
let pset = str_ws "set" >>. pstatementblock
let ppropertyblock =
    between (str_ws "{") (str_ws "}") ((opt pget) .>>. (opt pset))
let pproperty = 
    pipe2 pmemberinfo ppropertyblock
     (fun mi (gblock,sblock) -> Property(mi,gblock,sblock))

Building with FParsec iteratively in F# interactive made light work of sketching out a parser and refining the AST.

Conclusions

In the spirit of build one to throw away from the Mythical Man Month, it feels like the combination of F# and FParsec is ideal for quickly putting together an AST and proving it with a working prototype parser. Which I feel would be a good approach to take if you wanted to build a production C# compiler in a timely manner.

Sketching out an AST and parser has been an interesting, but not overly taxing exercise and has given me a few insights into C#’s syntax. My overriding feeling is that the original layout of the language was more influenced by ease of parsing than ease of use for the developer, with minimal type inference, and the use of constructs like ref and out parameters over support for returning multiple values via tuples.

C#’s var keyword gives simple type inference for local variables, however given the type of the right hand expression is already known inference here seems like a nop.

I don’t plan to take the C# parser any further, there is already a lot of offerings out there in this space. But if you’re interested in understanding how imperative languages can be constructed I think it’s probably a fun exercise to try.