Phillip Trelford's Array

POKE 36879,255

Parsing C#

C# is a mainstream programming language based on Java, that runs on Windows via .Net and cross platform via Mono. On .Net , C# is used primarily in the enterprise for web and desktop applications. With Mono, C# can target a range of popular platforms including iOS and Andriod, and is primarily employed in mobile application development via Xamarin’s tools and game development via Unity.

Microsoft’s current shipping C# compiler is proprietary code written in C++. Mono’s C# compiler, is open source and written in C#. In 2008 Microsoft announced the development of a new proprietary C# compiler written in C#, codenamed Roslyn, with the last preview released back in 2012. One of Roslyn’s key aims to expose the output of the parser as an abstract syntax tree (AST) that can be inspected by third party developers for tooling.

To fill the void, it feels like every man and his dog has written a parser for C#, most commonly to provide developer productivity tools in Visual Studio.

Third party vendors with C# parsers include:

On the open source side, Irony is a particularly interesting project allowing grammars to be specified in C# and includes samples for parsing many languages including C# and Java.

Parsing C#

You can think of C# and Java as managed C with classes. That is C# at it’s core is an imperative programming language, composed of statement blocks with additional syntax for class declaration and a dot operator for invoking members on a class.

Under the hood instance members simply take the instance as the first argument:

C C#
struct Point { 
  float X; float Y; 
};
float length(Point p) {
  return sqrt(p.X*p.X + p.Y*p.Y);
}
class Point {
  float X; float Y;
  public float Length() {
    return Math.Sqrt(X * X + Y * Y);
  }
}

The dot operator is probably *the* killer feature of class based languages like Java and C# as it allows developers to explore existing APIs via code completion in an IDE.

Parsing C# statements is not hugely dissimilar to parsing Small Basic statements which I have covered in a recent article. The only real difference is that C style languages typically employ semicolons between statements and curly braces to delimit statement blocks.

Parsing C# files involves finding using directives and namespace blocks. Then inside the namespace blocks finding type blocks including class, interface, struct, enum and delegate constructs. Inside the class blocks the member fields, properties, methods and constructors. Finally inside the members the statement blocks.

A few weeks back I was stuck on a a slow running train for a couple of hours, and had a go at writing a simple C# parser in F# using discriminated unions for the abstract syntax tree (AST) and FParsec for parsing. The same approach I’d used to parse Small Basic. I managed to sketch out a rough AST and a parser for namespaces and class declarations with around 100 lines of code. After a few more train journeys I managed to cover members and statement blocks, getting close to parsing C# 1.1. The prototype C# AST and parser is available as an F# snippet that can be run as a script inside F# interactive.

As an aside Neil Danson has recently constructed a compiler for the AST in F# that builds .Net assemblies. Basically this requires a number of passes, including building classes and members and then emitting IL code for statements, which is mostly a one-to-one mapping.

AST

The smallest building block in the AST is an expression which includes values, variables, member invocations and operators:

type Expr = 
    | Value of Literal
    | Variable of VarName
    | MethodInvoke of MemberName * Arg list
    | PropertyGet of MemberName
    | Cast of TypeName * Expr
    | InfixOp of Expr * string * Expr
    | PrefixOp of string * Expr
    | PostfixOp of Expr * string
    | TernaryOp of Expr * Expr * Expr

Then comes statements including the standard imperative loops and control flow:

type Statement =
    | Assignment of Init
    | PropertySet of MemberName * Expr
    | Action of Expr
    | If of Expr * Block
    | IfElse of Expr * Block * Block
    | Switch of Expr * Case list
    | For of Init list * Condition * Iterator list * Block
    | ForEach of Define * Expr * Block
    | While of Expr * Block
    | DoWhile of Block * Expr
    | Throw of Expr
    | Try of Block
    | Catch of TypeName * Block
    | Finally of Block
    | Lock of Expr * Block    
    | Using of Expr * Block
    | Label of LabelName
    | Goto of LabelName
    | Break
    | Continue
    | Return of Expr

Statements are organized inside members with a myriad of optional modifiers:

// Modifiers
type Access = Public | Private | Protected | Internal
type Modifier = Static | Sealed | Override | Virtual | Abstract
type ParamType = ByValue | ByRef | Out | Params 
// Members
type Member =
    | Field of Access * Modifier option * IsReadOnly * 
               ReturnType * Name * Expr option
    | Property of MemberInfo * Block option * Block option
    | Method of MemberInfo * Param list * Block
    | Constructor of Access * Modifier option * Name * Param list * 
                     PreConstruct option * Block

Members are organized inside types:

type CSharpType = 
    | Class of Access * Modifier option * Name * Implements * Members
    | Struct of Access * Name * Member list
    | Interface of Access * Name * Implements * Member list
    | Enum of Access * TypeName * EnumValue list
    | Delegate of Access * Name * ReturnType * Param list    

Types are optionally organized inside namespaces:

type Import = 
    | Import of Name list
    | Alias of Name * Name list
type NamespaceScope =
    | Namespace of Import list * Name list * NamespaceScope list
    | Types of Import list * CSharpType list

You can see the full source for the AST and parser in the F# snippet.

Parser

The basic structure of the parser is very similar to the Small Basic parser implementation. The main difference I found is that C# has a lot more syntactic noise to support classes.

As an example the following code fragment parses property declarations:

let pget = str_ws "get" >>. pstatementblock
let pset = str_ws "set" >>. pstatementblock
let ppropertyblock =
    between (str_ws "{") (str_ws "}") ((opt pget) .>>. (opt pset))
let pproperty = 
    pipe2 pmemberinfo ppropertyblock
     (fun mi (gblock,sblock) -> Property(mi,gblock,sblock))

Building with FParsec iteratively in F# interactive made light work of sketching out a parser and refining the AST.

Conclusions

In the spirit of build one to throw away from the Mythical Man Month, it feels like the combination of F# and FParsec is ideal for quickly putting together an AST and proving it with a working prototype parser. Which I feel would be a good approach to take if you wanted to build a production C# compiler in a timely manner.

Sketching out an AST and parser has been an interesting, but not overly taxing exercise and has given me a few insights into C#’s syntax. My overriding feeling is that the original layout of the language was more influenced by ease of parsing than ease of use for the developer, with minimal type inference, and the use of constructs like ref and out parameters over support for returning multiple values via tuples.

C#’s var keyword gives simple type inference for local variables, however given the type of the right hand expression is already known inference here seems like a nop.

I don’t plan to take the C# parser any further, there is already a lot of offerings out there in this space. But if you’re interested in understanding how imperative languages can be constructed I think it’s probably a fun exercise to try.

Progressive F# Tutorials London 2013

Last week’s Progressive F# tutorials conference seems to have struck a note:

Progressive F# 2013 best conference ever. Fact. #progfsharp

A big thanks to Skills Matter for organizing this 2-day hands on community event:

progfsharp-2013-800x300px1

You can see pictures from the event on Skills Matter’s Facebook photo stream:

Progressive F# Tutorials 2013 - Jon, Zach, Martin and Gustavo Progressive F# Tutorials 2013 - Coding in The Crypt
Progressive F# Tutorials 2013 - Coding Progressive F# Tutorials 2013 - Practising the Koans

 

Following on from events in 2011 and 2012 this year we welcomed 79 F#ers:

#progFsharp seems to grow in size every year :)

If you’d like to join us next year there’s currently an early bird discount:

If you enjoyed #progfsharp this year, why not get in early & get your ticket for next year? Tickets just £95 http://ow.ly/qoV5Y

This year’s F# Tutorials ran over 2 days with 2 tracks of hands on sessions.

Day 1

Jon Harrop kicked off proceedings with a talk on using F# to change the way we work:

Progressive FSharp Tutorials - Using F# to change the way we work

Followed by an F# Koans session with Rachel Reese, Dan Mohl and Robert Pickering. The F# Koans are a great way to get started with F#.

At the same time we ran a Programming with the Stars session with Rich Minerich and volunteer Michael Newton. This year‘s task was the Bank OCR Kata:

 1: let zeroToNineText = """
 2:  _     _  _     _  _  _  _  _ 
 3: | |  | _| _||_||_ |_   ||_||_|
 4: |_|  ||_  _|  | _||_|  ||_| _|                          
 5: """
 6: 
 7: let getDigits (source:string) len =
 8:     let lines = source.Split([|'\n'|]).[1..3]
 9:     [for digit in 0..len-1 ->
10:         let index = digit*3
11:         [for line in lines -> line.[index..index+2]]
12:     ]
13: 
14: let zeroToNine = getDigits zeroToNineText 10
15: 
16: let toNumber text =
17:     getDigits text 9
18:     |> List.fold (fun acc digit ->
19:         let n = zeroToNine |> List.findIndex ((=) digit)
20:         acc * 10 + n
21:     ) 0
22:         
23: let accountText = """
24:     _  _  _  _  _  _     _ 
25: |_||_|| || ||_   |  |  ||_ 
26:   | _||_||_||_|  |  |  | _| 
27: """
28: 
29: let n = toNumber accountText
val zeroToNineText : 'a

Full name: Snippet.zeroToNineText
Multiple items
val string : 'T -> string

Full name: Microsoft.FSharp.Core.Operators.string

--------------------
type string = System.String

Full name: Microsoft.FSharp.Core.string

  type: string
  implements: System.IComparable
  implements: System.ICloneable
  implements: System.IConvertible
  implements: System.IComparable<string>
  implements: seq<char>
  implements: System.Collections.IEnumerable
  implements: System.IEquatable<string>
Multiple items
module List

from Microsoft.FSharp.Collections

--------------------
type List<'T> =
  | ( [] )
  | ( :: ) of 'T * 'T list
  with
    interface System.Collections.IEnumerable
    interface System.Collections.Generic.IEnumerable<'T>
    member Head : 'T
    member IsEmpty : bool
    member Item : index:int -> 'T with get
    member Length : int
    member Tail : 'T list
    static member Cons : head:'T * tail:'T list -> 'T list
    static member Empty : 'T list
  end

Full name: Microsoft.FSharp.Collections.List<_>

  type: List<'T>
  implements: System.Collections.IStructuralEquatable
  implements: System.IComparable<List<'T>>
  implements: System.IComparable
  implements: System.Collections.IStructuralComparable
  implements: System.Collections.Generic.IEnumerable<'T>
  implements: System.Collections.IEnumerable
val fold : ('State -> 'T -> 'State) -> 'State -> 'T list -> 'State

Full name: Microsoft.FSharp.Collections.List.fold
val findIndex : ('T -> bool) -> 'T list -> int

Full name: Microsoft.FSharp.Collections.List.findIndex

The code above was my attempt, check out more solutions on F# Snippets:

In the afternoon Rachel Reese ran a hands on session on Try F# from Zero to Data Science:

Looks like we broke http://tryfsharp.org during @rachelreese great workshop session ;) #progfsharp

Thought - @rachelreese should rename her session to DOSing around on TryFSharp ;) #progfsharp :D

Meanwhile Robert Pickering’s tutorial was on programmatically generating sounds and music with Undertone.

Playing with Undertone with @alanmgreen and @DaveHoganIW at #progfsharp... did I mention I know nothing about music! :> /cc: @robertpi

After some pizza and drinks, the coding spilt over to the local pub:

Progressive FSharp Tutorials - Don and Dan

And by all accounts the fun continued on through the night:

Progressive F# Tutorials 2013 - Halloween Fun

Day 2

In the morning there was a choice between Machine Learning for Fun and Profit with Matt Moloney and Time for Functions in the Enterprise with Simon Cousins (which was my 11yo son’s favourite session!).

In the Machine Learning session we used Deep Belief Networks to categorize digits and to predict fluctuations on the Bitcoin/USD exchange rate.

The code sample and tasks are here: http://trelford.com/ProgFSharpML.zip

Simon’s slides and samples are here: https://github.com/simontcousins/pft2013

In the afternoon Dan Mohl and Zach Bray led a tutorial on Web Programming while Rich Minerich ran a Pacman AI competition.

Dan and Zach’s slides and tasks: https://github.com/dmohl/FsOnTheWeb-Workshop

Here’s a screenshot from the Pacman session:

MattDrivenDev Pacman

Ben Lynch won the competition with a really strong AI:

Probably the best #fsharp conference in the world #progfsharp - and not just 'cos I won the graphman contest! (which I did, by the by ...)

If you fancy having a go at writing your own Pacman AI, pop down to the F#unctional Londoners meetup on November 14th.

I’m already looking forward to next year, if you can’t wait that long for some more conference action with F# check out:

The Promise of Polygot Programming on .Net

.Net was Microsoft’s answer to Java. Java promised a single language that could target multiple platforms. .Net offered multiple languages that targeted Windows. Back then that meant C#, VB.Net and C++/CLI,

Virtual Machines

Things have moved on for both VMs, the JVM now has a plethora of languages like Groovy, Clojure and Scala. Meanwhile .Net code has become cross platform via Mono, in fact C# and F# now have arguably offer a better story for iOS than Java. For iOS, Xamarin compiles C# and F# code to native ARM binaries.

Both VMs have been hugely successful and have seen a wealth of libraries and frameworks built on them, allowing them to interoperate with a vast array of software and devices. This makes languages built on these VMs compelling as you don’t need throw away existing enterprise investments and build everything again from scratch.

Enterprise Java

Java 8

Progress on the Java programming language has been slow, which has in part helped accelerate the adoption of new functional-first languages like Clojure and Scala. Java will be the last mainstream programming language to add lambdas when Java 8 is released potentially in March 2014.

Question marks have also been hanging over Java’s open source credentials since Oracle’s acquisition of Sun. Back in March 2012, Thoughtworks put the Future of Java in the Assess ring of their Technology Radar.

That said in London at least the Java community seems to be strong in numbers with over 3,500 members in the LJC meetup compared to just over 400 for the London .Net meetup.

C# Convergence

C#, VB.Net and C++/CLI all share similar features and so the vast majority of developers have converged over time on the C# programming language.

The dynamic CLR languages IronPython and IronRuby have lost much of their momentum and Nemerle seems to have been subsumed by JetBrains for N2.

That really just leaves F# under active development, built-in to both Visual Studio and Xamarin Studio, it uses the same compiler as F# is open source under Apache 2.0.

C# 5

On .Net, the C# programming language has seen a little more innovation than Java, adding rudimentary functional programming constructs in C# 3 for LINQ like lambdas and extension methods for expressing higher-order functions.

Progress in C# 4 and 5 has been lacklustre since Microsoft departed on a rewrite of the compiler in C#, a closed source project called Rosyln, which is yet to surface.

In C# 4 the focus was on dynamic support, where the main new feature in C# 5 was async, a feature introduced in F# in 2007.

Linux

In the enterprise one of the biggest threats to .Net code bases is migration to Linux, for which outside of Mono there is no .Net story. Xamarin, key contributors to Mono, are overwhelmingly focused on mobile. Recent improvements in Mono, particularly in garbage collection, I feel make it a viable server-side platform, but it currently lacks the support and investment seen on the JVM.

As large enterprises continue to look for cost savings, I suspect more .Net share could be lost to switches to Linux. Lower operating costs combined with the promise of higher productivity with powerful functional-first languages like Clojure and Scala, are hard for IT managers to ignore. Nowadays many finance companies use the JVM on the server-side and reserve .Net, specifically WPF, solely for their front-end applications.

I’d like to see Microsoft join forces with Xamarin to deliver a well supported server-side platform with Mono for C# and F# on Linux, to compete directly with Clojure and Scala. That also means .Net open source authors should be supporting Mono deployments too.

Polyglot Programming

The promise of polyglot programming on .Net, where you pick appropriate languages for specific tasks, now rests primarily with familiar C# and functional-first F#.

As C# and F# are statically typed languages on .Net, interop between them is seamless. Both languages also offer bridges to external libraries via C APIs with P/Invoke. F# also offers rich interop to data sources and programming languages on other platforms via Type Providers.

With F# Type Providers you can interact with languages like R, MATLAB and TypeScript and their rich libraries with intellisense support in your editor. Type Providers are a real enabler for true polyglot programming.

Edge.js also bridges the gap between the .Net and JavaScript worlds with Node.js interop.

The combination of C# with it’s familiar syntax and object-oriented constructs with F# and it’s functional-first constructs and type providers, in my opinion, offers arguably the most powerful and productive programming environment currently available.

In the same Thoughtworks Technlogy Radar that questions the Future of Java, they had this to say on F# and C#:

F# is excellent at concisely expressing business and domain logic. Developers trying to achieve explicit business logic within an application
may opt to express their domain in F# with the majority of plumbing code in C#.

I’m seeing more and more .Net shops adopt F# alongside C#, take for example Kaggle a data science consultancy based in San Francisco:

At Kaggle we initially chose F# for our core data analysis algorithms because of its expressiveness. We’ve been so happy with the choice that we’ve found ourselves moving more and more of our application out of C# and into F#. The F# code is consistently shorter, easier to read, easier to refactor, and, because of the strong typing, contains far fewer bugs.

As our data analysis tools have developed, we’ve seen domain-specific constructs emerge very naturally; as our codebase gets larger, we become more productive.

The fact that F# targets the CLR was also critical - even though we have a large existing code base in C#, getting started with F# was an easy decision because we knew we could use new modules right away.

The point here is not to throw the baby out with the bath water and rewrite your entire solution in a new language, instead leverage your existing investments and use the right tool for the job on new features.

Functional Programming

Functional programming has permeated the majority of mainstream programming languages, and is gaining increasing mind share. Here’s a nice slide from Grant Crafton’s recent session on functional programming at DDD North:

Who likes FP

It’s clear to me that many of the original protagonists of object-oriented programming are migrating towards functional. Recently I met Eric Evans, author of Domain Driven Design, in New York who spoke at length on how he’d gone all-in on functional.

At conferences like Strangeloop, I’d go as far as saying this is a done deal, very few people are talking about OOP anymore.

Functional Learning

There’s been quite a lot of FUD around functional programming being hard to learn. In general people encounter two initial speed bumps:

  • Unfamiliar syntax
  • Unlearning 

Neither of these speed bumps are insurmountable, otherwise there wouldn’t be such a large number of developers turning to functional languages.

Lack of familiarity with syntax can be overcome with just a few hours of immersion. For F# I’d recommend the F# Koans to guide you through the language syntax step-by-step and a Cheat Sheet to lookup common operations as you code. F#’s syntax is based on the ML programming language, once you’ve got the hang of it you’ll be able to read a host of other ML based languages like Erlang and Haskell.

Unlearning tends to take longer than learning, so unfortunately if you’ve been programming in an imperative or object-oriented style for a long time, it will be slightly harder to pick up functional programming. That it to say learning functional programming is not inherently hard, just that thinking differently can take some time. I’d recommend tackling short programming exercises like Code Katas or Project Euler problems to break the back of it.

Be patient, once you’re over the hump you could find the experience fun and rewarding:

Happy learning!