Phillip Trelford's Array

POKE 36879,255

DDD East Anglia 2014

This Saturday saw the Developer Developer Developer! (DDD) East Anglia conference in Cambridge. DDD events are organized by the community for the community with the agenda for the day set through voting.

T-Shirts

The event marked a bit of a personal milestone for me, finally completing a set of DDD regional speaker T-Shirts, with a nice distinctive green for my local region. Way back in 2010 I chanced a first appearance at a DDD event with a short grok talk on BDD in the lunch break at DDD Reading. Since then I’ve had the pleasure of visiting and speaking in Glasgow, Belfast, Sunderland, Dundee and Bristol.

Talks

There were five F# related talks on the day, enough to fill an entire track:

Tomas kicked off the day, knocking up a simple e-mail validation library with tests using FsUnit and FsCheck. With the help of Project Scaffold, by the end of the presentation he’d generated a Nuget package, continuous build with Travis and Fake and HTML documentation using FSharp.Formatting.

Anthony’s SkyNet slides are already available on SlideShare:


ASP.Net was also a popular topic with a variety of talks including:

All your types are belong to us!

The title for this talk was borrowed from a slide in a talk given by Ross McKinlay which references the internet meme All your base are belong to us.

You can see a video of an earlier incarnation of the talk, which I presented at NorDevCon over on InfoQ, where they managed to capture me teapotting:

teapot

The talk demonstrates accessing a wide variety of data sources using F#’s powerful Type Provider mechanism.

The World at your fingertips

The FSharp.Data library, run by Tomas Petricek and Gustavo Guerra, provides a wide range of type providers giving typed data access to standards like CSV, JSON, XML, through to large data sources Freebase and the World Bank.

With a little help from FSharp.Charting and a simple custom operator based DSL it’s possible to view interesting statistics from the World Bank data with just a few key strokes:


The JSON and XML providers give easy typed access to most internet data, and there’s even a branch of FSharp.Data with an HTML type provider providing access to embedded tables.

Enterprise

The SQLProvider project provides type access with LINQ support to a wide variety of databases including MS SQL Server, PostgreSQL, Oracle, MySQL, ODBC and MS Access.

FSharp.Management gives typed access to the file system, registry, WMI and Powershell.

Orchestration

The R Type Provider lets you access and orchestrate R packages inside F#.

With FCell you can easily access F# functions from Excel and Excel ranges from F#, either from Visual Studio or embedded in Excel itself.

The Hadoop provider allows typed access to data available on Hive instances.

There’s also type providers for MATLAB, Java and TypeScript.

Fun

Type Providers can also be fun, I’ve particularly enjoyed Ross’s Choose Your Own Adventure provider and more recently 2048:

2048 

Write your own Type Provider

With Project Scaffold it’s easier than ever to write and publish your own FSharp type provider. I’d recommend starting with Michael Newton’s Type Provider’s from the Ground Up article and video of his session at Skills Matter.

You can learn more from Michael and others at the Progressive F# Tutorials in London this November:

DDD North

The next DDD event is in Leeds on Saturday October 18th, where I’ll be talking about how to Write your own Compiler, hope to see you there :)

FParsec Tutorial

Back at the start of the year, I took the F# parser combinator library FParsec out for a spin, writing an extended Small Basic compiler and later a similar parser for a subset of C#. Previously I’d been using hand rolled parsers, for projects like TickSpec, a .Net BDD library, and Cellz, an open source spreadsheet. With FParsec you can construct a parser relatively rapidly and easily using the powerful built-in functions and F# interactive for quick feedback.

FParsec has been used in a number of interesting projects including FunScript, for parsing TypeScript definition files, and FogBugz for search queries in Kiln.

Like any library there is a bit of a learning curve, taking time to get up to speed before you reap the benefits. So with that in mind I put together a short hands on tutorial that I ran at the F#unctional Londoners meetup held at Skills Matter last week.

The tutorial consisted of a short introduction to DSLs and parsing. Then a set of tasks leading to a parser for a subset of the Logo programming language. Followed by examples of scaling out to larger parsers and building a compiler backend, using Small Basic and C# as examples.


Download the tasks from: http://trelford.com/FParsecTutorial.zip

Logo programming language

One of my earliest experiences with programming was a Logo session in the 70s, when my primary school had a short term loan of a turtle robot:

1968_LogoTurtle

The turtle, either physical or on the screen, can be controlled with simple commands like forward, left, right and repeat, e.g.

> repeat 10 [right 36 repeat 5 [forward 54 right 72]]

image

Abstract Syntax Tree

The abstract syntax tree (AST) for these commands can be easily described using F#’s discriminated unions type:

type arg = int
type command =
   | Forward of arg
   | Turn of arg
   | Repeat of arg * command list

Note: right and left can simply be represented as Turn with a positive or negative argument.

The main task was to use FParsec to parse the commands in to AST form.

Parsing

A parser for the forward command can be easily constructed using built-in FParsec parser functions and the >>. operator to combine them:

let forward = pstring "forward" >>. spaces1 >>. pfloat

The parsed float value can be used to construct the Forward case using the |>> operator:

let pforward = forward |>> fun n -> Forward(int n)

To parse the forward or the short form fd, the <|> operator can be employed:

let pforward = (pstring "fd" <|> pstring "forward") >>. spaces1 >>. pfloat
               |>> fun n -> Forward(int n)

Parsing left and right is almost identical:

let pleft = (pstring "left" <|> pstring "lt") >>. spaces1 >>. pfloat 
            |>> fun x -> Left(int -x)
let pright = (pstring "right" <|> pstring "right") >>. spaces1 >>. pfloat 
             |>> fun x -> Right(int x)

To parse a choice of commands, we can use the <|> operator again:

let pcommand = pforward <|> pleft <|> pright

To handle a sequence of commands there is the many function

let pcommands = many (pcommand .>> spaces)

To parse the repeat command we need to parse the repeat count and a block of commands held between square brackets:

let block = between (pstring "[") (pstring "]") pcommands
let prepeat = 
    pstring "repeat" >>. spaces1 >>. pfloat .>> spaces .>>. block
    |>> fun (n, commands) -> Repeat(int n, commands)

Putting this altogether we can parse a simple circle drawing function:

> repeat 36 [forward 10 right 10]

However we cannot yet parse a repeat command within a repeat block, as the command parser does not reference the repeat command.

Forward references

To separate the definition of repeat’s parser function from it’s implementation we can use the createParserForwardedToRef function:

let prepeat, prepeatimpl = createParserForwardedToRef ()

Then we can define the choice of commands to include repeat:

let pcommand = pforward <|> pleft <|> pright <|> prepeat

And finally define the implementation of the repeat parser that refers to itself:

prepeatimpl := 
    pstring "repeat" >>. spaces1 >>. pfloat .>> spaces .>>. block
    |>> fun (n, commands) -> Repeat(int n, commands)

Allowing us to parse nested repeats, i.e.

> repeat 10 [right 36 repeat 5 [forward 54 right 72]]

Parses to:

> Repeat (10,[Right 36; Repeat (5,[Forward 54; Right 72])])

Interpreter

Evaluation of a program can now be easily achieved using pattern matching over the AST:

let rec perform turtle = function
    | Forward n ->
        let r = float turtle.A * Math.PI / 180.0
        let dx, dy = float n * cos r, float n * sin r
        let x, y =  turtle.X, turtle.Y
        let x',y' = x + dx, y + dy
        drawLine (x,y) (x',y')
        { turtle with X = x'; Y = y' }
    | Turn n -> { turtle with A=turtle.A + n }
    | Repeat(n,commands) ->
        let rec repeat turtle = function
            | 0 -> turtle
            | n -> repeat (performAll turtle commands) (n-1)
        repeat turtle n
and performAll = List.fold perform

Check out this snippet for the full implementation as a script: http://fssnip.net/nM

User Commands

Logo lets you define your own commands, e.g.

>  to square
     repeat 4 [forward 50 right 90]
   end
   to flower
     repeat 36 [right 10 square]
   end
   to garden
     repeat 25 [set-random-position flower]
   end

garden

The parser can be easily extended to support this, try the snippet: http://fssnip.net/nN

Small Basic

Small Basic is a Microsoft programming language also aimed at teaching kids, and also featuring turtle functionality. At the beginning of the year I wrote a short series of posts on writing an extended compiler for Small Basic:

The series starts with an AST, internal DSL and interpreter. Then moves on to parsing the language with FParsec and compiling the AST to IL code using Reflection.Emit. Finally the series ends with extensions for functions with arguments and support for tuples and pattern matching.

It’s a fairly short hop from implementing Logo to implementing a larger language like Small Basic.

Parsing C#

A few weeks later as an experiment I knocked up an AST and parser for a fairly large subset of C#, which shares much of the imperative core of Small Basic: http://fssnip.net/lf

Check out Neil Danson’s blog on building a C# compiler in F# to see C# compiled to IL using a similar AST.

DDD North: Write your own compiler in 24 hours

If you’re interested in learning more, I’ll be speaking at DDD North in Leeds on Saturday 18th October about how to write your own compiler in 24 hours.

C# Records & Pattern Matching Proposal

Following on from VB.Net’s new basic pattern matching support, the C# team has recently put forward a proposal for record types and pattern matching in C# which was posted in the Roslyn discussion area on CodePlex, quote:

Pattern matching extensions for C# enable many of the benefits of algebraic data types and pattern matching from functional languages, but in a way that smoothly integrates with the feel of the underlying language. The basic features are: records, which are types whose semantic meaning is described by the shape of the data; and pattern matching, which is a new expression form that enables extremely concise multilevel decomposition of these data types. Elements of this approach are inspired by related features in the programming languages F# and Scala.

There has been a very active discussion on the forum ever since, particularly around syntax.

UPDATE: the discussion appears to have dried up in mid-September 2014 and the pattern matching in C# proposal doesn’t appear to have progressed,

Background

Algebraic types and pattern matching have been a core language feature in functional-first languages like ML (early 70s), Miranda (mid 80s), Haskell (early 90s) and F# (mid 00s).

I like to think of records as part of a succession of data types in a language:

Name Example (F#) Description
Scalar
let width = 1.0
let height = 2.0
Single values
Tuple
// Tuple of float * float
let rect = (1.0, 2.0)
Multiple values
Record
type Rect = {Width:float; Height:float}
let rect = {Width=1.0; Height=2.0}
Multiple named fields
Sum type(single case)
type Rect = Rect of float * float
let rect = Rect(1.0,2.0)
Tagged tuple
Sum type(named fields)
type Rect = Rect of width:float*height:float
let rect = Rect(width=1.0,height=2.0)
Tagged tuple with named fields
Sum type(multi case)
type Shape=
   | Circle of radius:float
   | Rect of width:float * height:float
Union of tagged tuples

Note: in F# sum types are also often referred to as discriminated unions or union types, and in functional programming circles algebraic data types tend to refer to tuples, records and sum types.

Thus in the ML family of languages records are like tuples with named fields. That is, where you use a tuple you could equally use a record instead to add clarity, but at the cost of defining a type. C#’s anonymous types fit a similar lightweight data type space, but as there is no type definition their scope is limited (pun intended).

For the most part I find myself pattern matching over tuples and sum types in F# (or in Erlang simply using tuples where the first element is the tag to give a similar effect).

Sum Types

The combination of sum types and pattern matching is for me one of the most compelling features of functional programming languages.

Sum types allow complex data structures to be succinctly modelled in just a few lines of code, for example here’s a concise definition for a generic tree:

type 'a Tree =
    | Tip
    | Node of 'a * 'a Tree * 'a Tree

Using pattern matching the values in a tree can be easily summed:

let rec sumTree tree =
    match tree with
    | Tip -> 0
    | Node(value, left, right) ->
        value + sumTree(left) + sumTree(right)

The technique scales up easily to domain models, for example here’s a concise definition for a retail store:

/// For every product, we store code, name and price
type Product = Product of Code * Name * Price

/// Different options of payment
type TenderType = Cash | Card | Voucher

/// Represents scanned entries at checkout
type LineItem = 
  | Sale of Product * Quantity
  | Cancel of int
  | Tender of Amount * TenderType

Class Hierarchies versus Pattern Matching

In class-based programming languages like C# and Java, classes are the primary data type  where (frequently mutable) data and related methods are intertwined. Hierarchies of related types are typically described via inheritance. Inheritance makes it relatively easy to add new types, but adding new methods or behaviour usually requires visiting the entire hierarchy. That said the compiler can help here by emitting an error if a required method is not implemented.

Sum types also describe related types, but data is typically separated from functions, where functions employ pattern matching to handle separate cases. This pattern matching based approach makes it easier to add new functions, but adding a new case may require visiting all existing functions. Again the compiler helps here by emitting a warning if a case is not covered.

Another subtle advantage of using sum types is being able to see the behaviour for all cases in a single place, which can be helpful for readability. This may also help when attempting to separate concerns, for example if we want to add a method to print to a device to a hierarchy of classes in C# we could end up adding printer related dependencies to all related classes. With a sum type the printer functionality and related dependencies are more naturally encapsulated in a single module

In F# you have the choice of class-based inheritance or sum types and can choose in-situ. In practice most people appear to use sum types most of the time.

C# Case Classes

The C# proposal starts with a simple “record” type definition:

public record class Cartesian(double x: X, double y: Y);

Which is not too dissimilar to an F# record definition, i.e.:

type Cartesian = { X: double, Y: double }

However from there it then starts to differ quite radically. The C# proposal allows a “record” to inherit from another class, in effect allowing sum types to be defined, i.e:

abstract class Expr; 
record class X() : Expr; 
record class Const(double Value) : Expr; 
record class Add(Expr Left, Expr Right) : Expr; 
record class Mult(Expr Left, Expr Right) : Expr; 
record class Neg(Expr Value) : Expr;

which allows pattern matching to be performed using an extended switch case statement:

switch (e) 
{ 
  case X(): return Const(1); 
  case Const(*): return Const(0); 
  case Add(var Left, var Right): 
    return Add(Deriv(Left), Deriv(Right)); 
  case Mult(var Left, var Right): 
    return Add(Mult(Deriv(Left), Right), Mult(Left, Deriv(Right))); 
  case Neg(var Value): 
    return Neg(Deriv(Value)); 
}

This is very similar to Scala case classes, in fact change “record” to case, drop semicolons and voilà:

abstract class Term
case class Var(name: String) extends Term
case class Fun(arg: String, body: Term) extends Term
case class App(f: Term, v: Term) extends Term

To sum up, the proposed C# “record” classes appear to be case classes which support both single and multi case sum types.

Language Design

As someone who has to spend some of their time working in C# and who feels more productive having concise types and pattern matching in their toolbox, overall I welcome our new overlords this proposal.

From my years of experience using F#, I feel it would be nice to see a simple safety feature included, to what is in effect a sum type representation, so that sum types can be exhaustive. This would allow compile time checks to ensure that all cases have been covered in a switch/case statement, and a warning given otherwise.

Then again, I feel this is quite a radical departure from the style of implementation I’ve seen in C# codebases in the wild, to the point where it’s starting to look like an entirely different language… and so this may be a feature that if it does see the light of day is likely to get more exposure in C# shops working on greenfield projects.