Phillip Trelford's Array

POKE 36879,255

k-means clustering

The machine learning theme continues to be popular at the F#unctional Londoners meetup group. Last night Matt Moloney gave a great hands-on session on k-means clustering. Matt has worked on large machine learning systems at eBay. More recently he has been working on the Tsunami IDE, an extensible REPL environment for the desktop and cloud.

Tsunami provides a lightweight environment focused on interactive development, well suited to machine learning. And with F# 3 Type Providers you get typed access to a diverse set of data, from CSV files all the way up to Hadoop. Interestingly, Tsunami can be embedded into Excel and used as a replacement for VBA.

Greg Young describes Tsunami as a REPL on steroids.


k-means clustering has a number of interesting application areas, from search to pharmaceuticals. For the session Matt provided an F# script to analyse the canonical iris data set (flowers). The script also produces a variety of charts for visualizing the data including animated gifs showing the centroid positions at each iteration:

Animated chart of the iris centroid positions at each iteration

The FSharp.Data CSV Type Provider, available on NuGet, gives typed access to CSV files and was used to extract the values from the iris data file:

type Iris = CsvProvider<irisDataFile>
let iris = Iris.Load(irisDataFile)
let irisData = iris.Data |> Seq.toArray

/// classifications
let y = irisData |> Array.map (fun row -> row.Class)
/// feature vectors
let X = irisData |> Array.map (fun row -> 
  [|row.``Sepal Length`` 
    row.``Sepal Width`` 
    row.``Petal Length`` 
    row.``Petal Width``|])

Computing k-means centroids:

let K = 3 // The Iris dataset contains 3 species (classes)

let seed = 
  [|X.[0]; X.[1]; X.[2]|]  // pick bad centroids on purpose

let centroidResults = 
  KMeans.computeCentroids seed X |> Seq.take iterationLimit

I was particularly impressed by the conciseness of Matt’s implementation of the algorithm:

(* K-Means Algorithm *)

/// Group all the vectors by the nearest center. 
let classify centroids vectors = 
  vectors |> Array.groupBy (fun v -> centroids |> Array.minBy (distance v))

/// Repeatedly classify the vectors, starting with the seed centroids
let computeCentroids seed vectors = 
  seed |> Seq.iterate (fun centers -> classify centers vectors 
                                      |> Array.map (snd >> average))
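The snippet leans on helpers the post doesn't show: distance, average and Seq.iterate (the last is not part of FSharp.Core). Here is a self-contained sketch that fills them in under stated assumptions, namely Euclidean distance and a component-wise mean; these helper definitions are my guesses, not necessarily the ones in Matt's script. The two functions from the post are repeated so the block runs on its own.

```fsharp
// Hypothetical helpers: the post uses distance, average and Seq.iterate
// without showing them, so these definitions are assumptions.

/// Euclidean distance between two feature vectors (assumed metric).
let distance (a: float[]) (b: float[]) =
    Array.fold2 (fun acc x y -> acc + (x - y) * (x - y)) 0.0 a b |> sqrt

/// Component-wise mean of a group of vectors, i.e. the new centroid.
let average (vectors: float[][]) =
    vectors
    |> Array.reduce (Array.map2 (+))
    |> Array.map (fun total -> total / float vectors.Length)

module Seq =
    /// Infinite sequence: seed, f seed, f (f seed), ...
    let iterate f seed = Seq.unfold (fun state -> Some(state, f state)) seed

// The two functions from the post, repeated so this block runs on its own.

/// Group all the vectors by the nearest center
/// (Array.groupBy compares the float[] keys structurally).
let classify centroids vectors =
    vectors |> Array.groupBy (fun v -> centroids |> Array.minBy (distance v))

/// Repeatedly classify the vectors, starting with the seed centroids.
let computeCentroids seed vectors =
    seed |> Seq.iterate (fun centers -> classify centers vectors |> Array.map (snd >> average))

// Toy data: two obvious clusters, seeded with one point from each.
let points = [| [|0.0; 0.0|]; [|0.0; 1.0|]; [|10.0; 10.0|]; [|10.0; 11.0|] |]

computeCentroids [| points.[0]; points.[2] |] points
|> Seq.take 3
|> Seq.iter (printfn "%A")
```

Because Seq.iterate produces an infinite sequence, a bounded prefix has to be taken, as the post does with Seq.take iterationLimit. On this toy data the centroids settle at (0, 0.5) and (10, 10.5) after a single step. One caveat of this formulation: if a centroid attracts no vectors its group simply vanishes, leaving fewer than K centroids; a fuller implementation would re-seed it.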

Thanks again to Matt for giving a really interesting session.

Learning Machine Learning


If you’re interested in learning more, Matt’s also giving an in-depth session on machine learning at the Progressive F# Tutorials in London at the end of October:

ProgFsharp London 2013 

And if you’re in New York next week you can catch Rachel Reese giving an introduction to data science, followed by an introduction to machine learning with Mathias Brandewinder and me at the Progressive F# Tutorials NYC.

Walkie Scorchie

From the window at the office I’ve seen a series of futuristic buildings erected, first the Gherkin, then the Shard and now the Walkie Talkie:

London skyline 

The last of these has recently been re-dubbed the Walkie Scorchie, as it produces a supercharged solar ‘death ray’ that has burned holes in carpets, melted furniture and even the interior of a Jaguar parked nearby.

It feels almost reminiscent of the dystopian future portrayed in the cult film Idiocracy:

Idiocracy skyline

Though our current society is probably closer to the surveillance society of 1984 with the omnipresent Big Brother watching over you:

Big brother is watching you

The everyday software we work with is no less broken, or so Scott Hanselman concludes in everything's broken and nobody's upset.

Software is increasingly a world of broken windows where developers are more accustomed to working around issues than giving feedback, let alone tackling the root causes.

waiting for background operation to complete modal dialog 

Anyone else struggling with the cognitive dissonance of a modal dialog waiting for a background operation to complete?

XAML

I’ve been using XAML in its various guises, off and on, for over 5 years now. Over that time I’ve used WPF, Silverlight and Metro. It feels like little has changed in that time. It’s still probably the best thing out there for desktop app development, but surely we could do better.

XML

I’m still stuck editing views in XML, the poor man’s DSL. Like the developers I know, I rarely use the designer view. When we do, it frequently hangs and we’re left manually killing XDesProc.exe from Task Manager.

Data Binding

Data binding is central to XAML. With it you can easily bind a collection to a data grid and get sorting and filtering for free. But binding comes with a high performance cost, and its stringly typed nature means you’re left finding binding errors in the output window at runtime.

Dynamically binding a view to a model would make more sense to me if I could edit the view at runtime, but you can’t; it’s compiled in.

Value Converters

If I want to transform a model value bound to the view into a different form, I have to create a class that implements the IValueConverter interface, then reference it as a static resource in the resources section. It has all the elegance of C++ header files. Achieving the same thing in ASP.NET is much simpler: you can just write some code inline in the view or call a function.

INotifyPropertyChanged

There’s a plethora of libraries and frameworks dedicated to working around the design of INotifyPropertyChanged, from MVVM frameworks using LINQ expressions to PostSharp and attributes. The C# 5 compiler’s support for the CallerMemberName attribute has finally brought some sanity to the situation. But I know many developers, myself included, who have wasted countless hours dealing with it and trying to think of ways of subverting it.

Just about every view model class ends up inheriting from some ViewModelBase or ObservableObject class so that it can implement INotifyPropertyChanged.
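To make that boilerplate concrete, here is a minimal F# sketch of such a base class and a view model built on it; ObservableObject and PersonViewModel are illustrative names, not taken from any particular framework. Note the property name passed as a string: that is exactly the literal C# 5’s CallerMemberName lets the compiler fill in for you.

```fsharp
open System.ComponentModel

/// Illustrative base class that raises PropertyChanged for a named property.
type ObservableObject() =
    let propertyChanged = Event<PropertyChangedEventHandler, PropertyChangedEventArgs>()

    /// The name is just a string: a typo still compiles and simply
    /// produces a binding that never updates.
    member this.OnPropertyChanged(propertyName: string) =
        propertyChanged.Trigger(this, PropertyChangedEventArgs(propertyName))

    interface INotifyPropertyChanged with
        [<CLIEvent>]
        member this.PropertyChanged = propertyChanged.Publish

/// Illustrative view model: every settable property repeats this pattern.
type PersonViewModel() =
    inherit ObservableObject()
    let mutable name = ""

    member this.Name
        with get () = name
        and set (value: string) =
            // Only notify when the value actually changes.
            if value <> name then
                name <- value
                this.OnPropertyChanged("Name")
```

Multiply the Name property by every field on every screen and it becomes clear why so much tooling exists to hide it.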

Inheritance

It is often said that object-oriented programming is well suited to graphical user interfaces. Perhaps, but I start to have doubts when I see WPF’s Button class with 9 layers of inheritance and over 160 members.

Visual Studio 2010 introduced improved IntelliSense with substring search on members to partially work around this, but honestly I’d prefer to see buttons be closer to having content and a clicked event than their current sprawl of responsibilities.

Null Reference Exceptions

By far the most common error I see in C# applications is the NullReferenceException. Hardly a day goes by without yet another null check being added to the codebase. This needs to be fixed at the language level; functional-first languages like Scala and F# show it’s possible.
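As a small illustration of the F# side of that claim (the Customer type and both functions are hypothetical), an option type makes a missing value explicit in the type, so callers deal with None up front instead of discovering a null at runtime.

```fsharp
/// Hypothetical record: a customer may or may not have an email address.
/// "No value" is expressed with the option type rather than null.
type Customer = { Name: string; Email: string option }

/// The domain part of the customer's email, if there is one.
/// (If the address has no '@', IndexOf returns -1 and the whole string is kept.)
let emailDomain (customer: Customer) =
    customer.Email
    |> Option.map (fun email -> email.Substring(email.IndexOf('@') + 1))

/// Both cases are handled explicitly; leaving one out draws a compiler warning.
let describe (customer: Customer) =
    match emailDomain customer with
    | Some domain -> sprintf "%s (%s)" customer.Name domain
    | None -> sprintf "%s (no email)" customer.Name

printfn "%s" (describe { Name = "Ada"; Email = Some "ada@example.com" })
printfn "%s" (describe { Name = "Bob"; Email = None })
```

Values arriving from .NET APIs can still be null, so this helps most inside the code you control, with null checks confined to the boundaries.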

HTML5 & JavaScript

HTML5 and JavaScript are now being pushed as an alternative environment for desktop development. JavaScript has come a long way in terms of performance and libraries, and I like the HTML5 Canvas model. However, I’m not yet convinced this scales well to multi-window applications that need to interact between processes.

Windows 8

I get that Metro style interfaces are good for tablets and touch, but for multi-window desktop applications it feels like a non-starter.


I’ve seen traders use 9 displays simultaneously, running a multitude of different apps from execution platforms and news services to Excel, Outlook and instant messengers.

To me Windows 8 appears to be a consumer-oriented release. I’m hoping the next version will bring something for the larger customer base of business users.

Vendors

One person’s problem is another’s opportunity. Third-party vendors like JetBrains are grasping the opportunity with both hands by patching flaws in Visual Studio and C# with tools like ReSharper. They’re now starting to provide elastoplasts over XAML editing too. JetBrains are not the only ones; Telerik and others are making hay selling themes and datagrids.


As a temporary workaround for performance issues caused by the cost of the mouse event handlers in one of these products, we were forced to throttle the number of mouse events bubbling through the system.
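The post doesn’t say how that throttling was implemented. As a hedged illustration of the general technique, a simple time-based throttle forwards an event only if a fixed interval has passed since the last forwarded one and drops the rest; the throttle function and its injected clock are hypothetical, not the code we used.

```fsharp
open System

/// Wraps handler so that an event is forwarded only if at least interval
/// has passed since the last forwarded event; events in between are dropped.
/// The clock (now) is injected so the behaviour can be tested with fake time.
let throttle (interval: TimeSpan) (now: unit -> DateTime) (handler: 'T -> unit) : 'T -> unit =
    // A ref cell, because F# closures cannot capture local mutable variables.
    let lastForwarded = ref DateTime.MinValue
    fun event ->
        let t = now ()
        if t - lastForwarded.Value >= interval then
            lastForwarded.Value <- t
            handler event

// Usage with the real clock: forward at most one mouse move every 50 ms.
let onMouseMove =
    throttle (TimeSpan.FromMilliseconds 50.0) (fun () -> DateTime.UtcNow)
        (fun (position: int * int) -> printfn "mouse at %A" position)

onMouseMove (10, 20)
```

Dropping intermediate events suits mouse moves, where only the latest position matters; for events that must all be processed, batching would be the safer choice.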

Duct tape is a good short term solution, but surely at some point we should consider building a stronger foundation.

Simplicity

I would not give a fig for the simplicity this side of complexity, but I would give my life for the simplicity on the other side of complexity. - Oliver Wendell Holmes, Jr.

XAML is huge and bloated (WPF 4.5 Unleashed is 864 pages long), yet XAML’s focus on data binding makes it feel like a one-trick pony.

I think a modern desktop UI environment needs to cover a diverse range of product scenarios. There should be a low-level core that can be easily accessed with code for high-performance features, all the way up to a designer view for tacking databases onto views, and it should all interoperate seamlessly.

Sections of the development community are striving to bring simplicity to our environments. Language designers like Rich Hickey (Clojure) and Don Syme (F#) are bringing us simplicity at the language level. I’d love to see these thought processes applied to UI environments too.

To tackle the root causes of problems, sometimes you need to stop continuously patching over the cracks, step back and take time to look at the big picture. To create Clojure, Rich Hickey practised Hammock-driven development. I’d love to see more of this kind of thoughtful design and less Mortgage-driven development.

Tackling the root causes of complexity and defects in our software is not an easy choice; it requires investment and changes to the way we do things. But not changing is a choice too.

They live out their lives in this virtual reality as they would have around the turn of the 20th and 21st centuries. The time period was chosen because it is supposedly the pinnacle of human civilization. – Agent Smith

Must Java like languages and XML be considered the pinnacle of software development for time immemorial?

Try 10 Programming Languages in 10 minutes

There are a lot of interesting programming languages out there, but downloading and setting up the environment can be very time consuming when you just want to try one out. The good news is that you can try out many languages in your browser straight away, often with tutorials which guide you through the basics.

Following the pattern of the Seven Languages in Seven Weeks book, here’s a somewhat abridged version.

Dynamic Languages

Fed up of long compile times, want a lightweight environment for scripting? Dynamic languages could be your new friend.

Try Lua

Lua is a lightweight dynamic language with excellent coroutine support and a simple C API making it hugely popular in video gaming for scripting. Have fun with game engines like LÖVE and Marmalade Quick.

Try Clojure

Clojure is the brainchild of the hugely charismatic speaker Rich Hickey; it is a descendant of LISP, one of the earliest programming languages. There’s a really rich community around Clojure; one of my favourite projects is Sam Aaron’s Overtone live coding audio environment.

Try R (quick registration required)

R is a free environment for statistical computing and graphics, with a huge range of user-submitted packages. Ever wondered how to draw an egg?

Functional Languages

Aspects of functional programming have permeated most mainstream languages from C++ to VB. However to really appreciate the expressiveness of the functional approach a functional-first language is required.

Try Erlang

Erlang is a really interesting language for building fault-tolerant concurrent systems. It also has great pattern matching capabilities. It has many industrial applications and tools, including the RabbitMQ messaging system and the distributed database Riak.

Try Haskell

Haskell is heavily based on the Miranda programming language, which was taught in British universities in the 80s and 90s. Haskell added monads and type classes; it is still taught in a few universities and remains quite popular in academic research.

Try OCaml

OCaml, like Miranda, is based on the ML programming language, and adds object-oriented constructs. F# is based on OCaml; there is even an OCaml compatibility mode. OCaml still has industrial applications, for example at Jane Street Capital and XenSource.

Web Languages

There’s a plethora of languages out there that compile to JavaScript. Also worth a look are the new features in JavaScript itself; see Brendan Eich’s talk at Strange Loop last year on The State of JavaScript. Here are 3 *Script languages I find particularly interesting:

LiveScript

LiveScript is an indirect descendant of CoffeeScript with features to assist functional programming like pattern matching and function composition. Check out 10 LiveScript one liners to impress your friends.

Try Elm

Elm is a functional reactive language for creating highly interactive programs, including games. Reactive programming is an interesting direction and I think languages designed specifically for this are worth investigating.

PogoScript

Unfortunately there’s currently no online editor for this one, but there is a command line REPL. PogoScript is DSL friendly allowing white space in function names.

Esoteric Languages

Esoteric languages tend to be write-only, a bit like Perl but just for fun.

Try Brainfuck

Brainfuck is the Rubik’s cube of programming languages. I built the site last year with the interpreter written in plain old JavaScript, check out the fib sample.

Browser IDEs

With so many programming language experimentation environments available online, the next logical step is to host the IDE there. Imagine not having to wait 4 hours for Visual Studio to install.

Cloud 9 is an online environment for creating Node.js apps, pulling together sets of relevant packages. Tools like Sploder let you build games online.

The Try F# site offers arguably the most extensive online learning features of any language. Cloud Tsunami IDE also offers a rich online development experience for F#. In the near future CloudSharper will offer an online IDE experience for developing web applications with F# using WebSharper.

Scaling up

Once you’ve completed some basic tasks in a new language you’ll want to move on to slightly larger tasks. I like to use exercises from the coding Kata Catalogue like FizzBuzz, the Game of Life and Minesweeper.
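For a flavour of what a first kata looks like, here is FizzBuzz in F#. It is one straightforward take among many, matching on the pair of remainders rather than chaining if/else.

```fsharp
/// FizzBuzz kata: multiples of 3 become "Fizz", multiples of 5 "Buzz",
/// multiples of both "FizzBuzz"; every other number is printed as-is.
let fizzBuzz n =
    match n % 3, n % 5 with
    | 0, 0 -> "FizzBuzz"
    | 0, _ -> "Fizz"
    | _, 0 -> "Buzz"
    | _ -> string n

[1 .. 15] |> List.map fizzBuzz |> String.concat " " |> printfn "%s"
```

Order matters in the match: the (0, 0) case must come first, otherwise 15 would be caught by the "Fizz" case.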

Some people enjoy going through the Project Euler problems, others have their own hello world applications. For Martin Trojer it’s a Scheme interpreter and Luke Hoban often writes a Ray Tracer.

I’d also recommend joining a local meetup group. The London Scala meetup has a coding dojo every month, and the F#unctional Londoners meetup has a hands-on session in the middle of the month; the next one is on Machine Learning.

Programming language books that include questions at the end of sections are a good way to practice what you’ve learned but are few and far between. The recent Functional Programming with F# book is an excellent example of what can be done with questions at the end of each chapter.

While the basics of a language can be picked up in a few hours, expect it to take a few weeks before you’re productive and at least a few months before you start to gain mastery.

Want to write your own language? Pete Sestoft’s Programming Language Concepts book offers a good introduction to the subject.