Phillip Trelford's Array

POKE 36879,255

Machine Learning from Disaster

Off the back of the popular Machine Learning hands-on session at Skills Matter last month, where we created a digit recognizer, last night we tackled a new data set. Again we took a task from Kaggle’s online predictive modelling competitions. This time the data set was passenger details from the Titanic, and the task was to analyse who was likely to survive.


Guided Task: http://trelford.com/titanic.zip (unblock the file, unzip to C:\titanic, load in VS2012 and run through the tasks in the titanic.fsx interactive F# script).

Kaggle provide a CSV file with the passenger details, which we loaded using FSharp.Data’s CSV type provider; it infers the fields and types of the data for you:

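// Requires a reference to the FSharp.Data library
open FSharp.Data

let [<Literal>] path = "C:/titanic/train.csv"
// InferRows=0 tells the provider to use every row when inferring column types
type Train = CsvProvider<path,InferRows=0>
type Passenger = Train.Row

// Load the first 600 passengers as an array of typed rows
let passengers : Passenger[] = 
    Train.Load(path).Take(600).Data 
    |> Seq.toArray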

We then did some preliminary data analysis, looking at how well specific features predicted survival:

let females = passengers |> where female
let femaleSurvivors = females |> tally survived
let femaleSurvivorsPc = females |> percentage survived
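The `where`, `female`, `tally`, `survived` and `percentage` helpers come from the guided script and aren't shown above. A minimal sketch of how such helpers might look follows; the `Person` record, the sample rows and every helper definition are hypothetical stand-ins for the provided `Passenger` type and script code, not the script itself.

```fsharp
// Hypothetical stand-in for the CSV provider's Passenger row type
type Person = { Sex : string; Pclass : int; Survived : int }

// Predicates over a single passenger (hypothetical helpers)
let female (p: Person) = p.Sex = "female"
let survived (p: Person) = p.Survived = 1

// Keep only the items matching a predicate
let where (f: 'a -> bool) (xs: 'a[]) = Array.filter f xs

// Count the items matching a predicate
let tally (f: 'a -> bool) (xs: 'a[]) = xs |> Array.filter f |> Array.length

// Percentage of items matching a predicate (0.0 for an empty array)
let percentage (f: 'a -> bool) (xs: 'a[]) =
    if xs.Length = 0 then 0.0
    else 100.0 * float (tally f xs) / float xs.Length

// Made-up sample rows, for illustration only
let passengers =
    [| { Sex = "female"; Pclass = 1; Survived = 1 }
       { Sex = "female"; Pclass = 3; Survived = 0 }
       { Sex = "male";   Pclass = 3; Survived = 0 }
       { Sex = "female"; Pclass = 2; Survived = 1 } |]

let females = passengers |> where female               // 3 passengers
let femaleSurvivors = females |> tally survived        // 2
let femaleSurvivorsPc = females |> percentage survived // about 66.7
```

Taking the predicate as the first argument is what lets these helpers read naturally in a pipeline, as in the three lines above.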

Finally we used a provided decision tree learning algorithm for prediction:

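let labels = [|"sex"; "class"|]

// Box each feature so values of different types fit in one obj[]
let features (p:Passenger) : obj[] = [|box p.Sex; box p.Pclass|]

// Each row holds the features followed by the survival label
let dataSet : obj[][] =
    [|for passenger in passengers ->
        [|yield! features passenger; 
          yield box (passenger.Survived = 1)|] |]

// createTree is the provided decision tree implementation
let tree = createTree(dataSet, labels)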

I used the decision tree code from the Machine Learning in Action book, porting the Python implementation to F#; here’s the gist of it. Python Tools for Visual Studio (PTVS) came in handy for checking that both implementations produced the same outputs. Mathias Brandewinder has a great article on Decision Tree classification, and also on Random Forest classification, in F# using the same Titanic data set.
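For readers who haven't seen the book, here is a minimal sketch of the ID3-style idea its decision tree chapter is built on: measure the Shannon entropy of the class labels, split on the feature with the largest information gain, and recurse. This is not the ported gist; the `Tree` type and the `entropy`, `split`, `bestFeature`, `majority`, `buildTree` and `classify` functions are hypothetical, and it assumes a non-empty `obj[][]` data set with the class label in the last column, the same shape as `dataSet` above.

```fsharp
open System

// Hypothetical tree: a leaf holds a class label, a node splits on a named feature
type Tree =
    | Leaf of obj
    | Node of string * (obj * Tree) list

/// Shannon entropy (in bits) of the class labels held in the last column
let entropy (rows: obj[][]) =
    let n = float rows.Length
    rows
    |> Array.countBy (fun row -> row.[row.Length - 1])
    |> Array.sumBy (fun (_, count) ->
        let p = float count / n
        p * Math.Log(1.0 / p, 2.0))   // -p log2 p, written as p log2 (1/p)

/// Rows whose column i equals value, with column i removed
let split (rows: obj[][]) (i: int) (value: obj) =
    rows
    |> Array.filter (fun row -> row.[i] = value)
    |> Array.map (fun row ->
        Array.init (row.Length - 1) (fun j -> if j < i then row.[j] else row.[j + 1]))

/// Index of the feature column with the largest information gain
let bestFeature (rows: obj[][]) =
    let baseEntropy = entropy rows
    let n = float rows.Length
    [0 .. rows.[0].Length - 2]
    |> List.maxBy (fun i ->
        let splitEntropy =
            rows
            |> Array.map (fun row -> row.[i])
            |> Array.distinct
            |> Array.sumBy (fun v ->
                let subset = split rows i v
                float subset.Length / n * entropy subset)
        baseEntropy - splitEntropy)

/// Most common class label among the rows
let majority (rows: obj[][]) =
    rows |> Array.countBy (fun row -> row.[row.Length - 1]) |> Array.maxBy snd |> fst

/// Recursively build the tree; labels name the remaining feature columns
let rec buildTree (rows: obj[][]) (labels: string[]) =
    let classes = rows |> Array.map (fun row -> row.[row.Length - 1]) |> Array.distinct
    if classes.Length = 1 then Leaf classes.[0]
    elif labels.Length = 0 then Leaf (majority rows)
    else
        let i = bestFeature rows
        let remaining =
            Array.init (labels.Length - 1) (fun j -> if j < i then labels.[j] else labels.[j + 1])
        let branches =
            rows
            |> Array.map (fun row -> row.[i])
            |> Array.distinct
            |> Array.map (fun v -> v, buildTree (split rows i v) remaining)
            |> List.ofArray
        Node (labels.[i], branches)

/// Walk the tree; labels give the column order of the features array
let rec classify (tree: Tree) (labels: string[]) (features: obj[]) : obj option =
    match tree with
    | Leaf label -> Some label
    | Node (name, branches) ->
        let i = Array.findIndex ((=) name) labels
        branches
        |> List.tryFind (fun (v, _) -> v = features.[i])
        |> Option.bind (fun (_, sub) -> classify sub labels features)

// Made-up rows: sex, class, survived
let sampleLabels = [| "sex"; "class" |]
let sampleData : obj[][] =
    [| [| box "female"; box 1; box true |]
       [| box "female"; box 3; box false |]
       [| box "male"; box 1; box false |]
       [| box "male"; box 3; box false |]
       [| box "female"; box 2; box true |] |]

let tree = buildTree sampleData sampleLabels
```

On this made-up sample, class has the higher information gain, so the root splits on class and only first class needs a further split on sex. `classify` returns `None` for a feature value never seen during training, which is one of several reasonable ways to handle that case.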

Again it was great to see a full house for the event with over 50 members in attendance:

[Photo: full house]

There are a few more pictures from the event over on the Skills Matter Facebook page :)

Check out the F#unctional Londoners meetup page for upcoming meetings; the next one, in 2 weeks, is on F# Mobile Apps. If you’re interested in more hands-on sessions with F#, I’d also highly recommend the Progressive F# Tutorials in New York this September and London in October, as there’s still a great early bird rate:

[Banner: Progressive F# Tutorials]

TickSpec dependency graph

Scott Wlaschin has just posted a really interesting series on dependency cycles over on the F# for fun and profit site. The last post looks at cycles and modularity in the wild, comparing open source C# and F# projects, including SpecFlow and TickSpec, which have almost identical functionality.

Here’s the dependency diagram for SpecFlow (C#):

[Diagram: SpecFlow dependency graph]

and for TickSpec (F#):

[Diagram: TickSpec dependency graph]

Both have very similar functionality; in fact, TickSpec implements its own parser too. Read Scott’s article to better understand why such large differences exist between C# and F# projects.

Machine Learning Hands On Session

Last night the F#unctional Londoners Meetup put on a hands-on Machine Learning session at Skills Matter in London. It was a really well-attended event, so much so that we had to cap the number of attendees once registrations reached 70. The material was recycled from a well-received session by Mathias Brandewinder at the San Francisco Bay Area F# User Group in May.

I find F# a very good fit for Machine Learning; in fact, my first use of F# was for the player matchmaking on Halo 3.

The goal of the session was to create a digit recognizer using Kaggle’s competition data set.


The first part of the session was to parse and transform the provided CSV files:

open System.IO

let path = @"c:\Digits\digitssample.csv"
let lines = File.ReadAllLines(path)
// Split each line into its comma-separated fields
let fields = lines |> Array.map (fun line -> line.Split(','))
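Splitting is only half of the transform; the fields then need turning into something a classifier can use. The sketch below assumes the sample file follows Kaggle's digit recognizer layout (a header row, then one digit per line with the label first and the pixel values after it); the `Observation` record and `toObservations` function are hypothetical names, not the session script's.

```fsharp
// Hypothetical record: a digit label and its pixel intensities
type Observation = { Label : string; Pixels : int[] }

/// Assumes a header row, then one digit per line: label first, then pixel values
let toObservations (lines: string[]) =
    lines
    |> Array.skip 1                                   // drop the header row
    |> Array.map (fun line -> line.Split(','))
    |> Array.map (fun fields ->
        { Label = fields.[0]                          // first column is the label
          Pixels = fields.[1..] |> Array.map int })   // the rest are pixel values

// A tiny made-up sample in the same shape
let sample = [| "label,pixel0,pixel1,pixel2"; "1,0,255,128"; "7,12,0,3" |]
let observations = toObservations sample
```

Keeping the label as a string avoids committing to a numeric type for what is really a category.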

The next step was to implement the k-nearest neighbours (KNN) algorithm to classify digits. KNN is the first algorithm explained in Manning’s Machine Learning in Action book.
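KNN is short enough to sketch in a few lines. The version below is a minimal illustration, not the session's script: it assumes a hypothetical `Observation` record (repeated so the snippet stands alone), Euclidean distance between pixel arrays of equal length, and a majority vote among the `k` nearest training examples, with no special handling for tied votes.

```fsharp
// Hypothetical record: a digit label and its pixel intensities
type Observation = { Label : string; Pixels : int[] }

/// Euclidean distance between two pixel arrays of equal length
let distance (a: int[]) (b: int[]) =
    Array.map2 (fun x y -> float ((x - y) * (x - y))) a b
    |> Array.sum
    |> sqrt

/// Label chosen by majority vote among the k nearest training observations
let knn k (training: Observation[]) (pixels: int[]) =
    training
    |> Array.sortBy (fun obs -> distance obs.Pixels pixels)
    |> Array.truncate k
    |> Array.countBy (fun obs -> obs.Label)
    |> Array.maxBy snd
    |> fst

// Made-up three-pixel "images" for illustration
let training =
    [| { Label = "0"; Pixels = [| 0; 0; 0 |] }
       { Label = "0"; Pixels = [| 10; 0; 0 |] }
       { Label = "1"; Pixels = [| 255; 255; 255 |] }
       { Label = "1"; Pixels = [| 250; 255; 240 |] } |]

let dark = knn 3 training [| 5; 5; 5 |]        // "0": two of the three nearest are "0"
let bright = knn 1 training [| 252; 250; 250 |] // "1"
```

This sorts the whole training set for every query, which is simple and fine for a workshop-sized sample; with k = 1 it reduces to plain nearest-neighbour classification.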

[Image: Machine Learning in Action book cover]

In the session we used a guided script that takes you through the problem in small, manageable tasks, each one introducing the F# language constructs it needs; you can work through it at home too.

Thanks for all the kind feedback:

  1. Finn Neuik @finnneuik

    great evening at #fsharp UG courtesy of #kaggle, @skillsmatter and @ptrelford : I do like a bit of machine learning!

  2. James Crowley @JamesCrowley

    Great evening learning some F# and machine learning with the help of @ptrelford @skillsmatter - thanks Phil!

  3. Andy Brackley @andybrackley

    Thanks @ptrelford for a great session on machine learning in f#. Excellent content and presentation

  4. Chris Austin @cja117

    @ptrelford thanks for a great #fsharp workshop at #SkillsMatter in London.


If you’re interested in learning more, check out the Machine Learning with F# page on the F# Software Foundation site, which includes plenty more tutorials.