Phillip Trelford's Array

POKE 36879,255

Parsing with SNOBOL

Just before Christmas I came across some Java source code by “Uncle” Bob Martin aimed at “demystifying compilers” which expends about 600 lines of code to parse the following simple finite state machine:

Actions: Turnstile
FSM: OneCoinTurnstile
Initial: Locked
{
Locked Coin Unlocked {alarmOff unlock}
Locked Pass Locked  alarmOn
Unlocked Coin Unlocked thankyou
Unlocked Pass Locked lock
}

For fun I knocked up a broadly equivalent parser in F# using FParsec which was just under 40 lines of code, and posted the code on this blog.

The post generated some interest, and I even got a mention on Twitter from “Uncle” Bob himself:

I’d not seen SNOBOL before, but given Mr Martin’s recommendation I popped over to the SNOBOL page on Wikipedia and liked what I saw:

SNOBOL rivals APL for its distinctiveness in format and programming style, both being radically unlike more "standard" procedural languages such as BASIC, Fortran, or C.

SNOBOL first appeared in 1962 and appears to have been popular in US universities as a text manipulation language in the 1970s and 1980s. The language supports pattern matching over text combined with assembler-like control flow using labels and goto (like C#).

As a text manipulation language, SNOBOL code feels a little more readable than the new norm: regular expressions.

If you’d like to take it out for a spin there’s an open source SNOBOL IDE, with syntax colouring support, for Linux and Windows called TkS*LIDE.

SNOBOL Interpreter

Nowadays when I want to learn a new language I often start by implementing it. To this end, over the course of about a week I built a SNOBOL interpreter with just enough functionality to run the samples on the Wikipedia page, along with some more involved samples from other sources.

The SNOBOL interpreter is about 400 LOC and available as an F# Snippet.

Finite State Machine in SNOBOL

Armed with a basic knowledge of SNOBOL, I could now answer the question: is the implementation even smaller in SNOBOL?

The answer is a resounding yes, and here are the 34 lines of SNOBOL code that prove it:

SNOBOL4 FSM

Disclaimer: unlike the FParsec version there’s no error handling or error messages, and the FSM must be laid out in a specific format.
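To illustrate what relying on a fixed layout means, here is a minimal sketch in plain F# that reads only the transition lines of the turnstile example. It is neither the SNOBOL code nor the FParsec parser from the earlier post; the `Transition` type and `parseTransition` function are hypothetical names introduced for illustration, and the sketch assumes each line holds exactly three names followed by zero or more actions.

```fsharp
// Hypothetical record for one row of the FSM table (not from the original post).
type Transition =
    { State : string; Event : string; Next : string; Actions : string list }

// Parse one transition line such as
//   "Locked Coin Unlocked {alarmOff unlock}"
// Assumption: three names come first, then zero or more actions,
// optionally wrapped in braces. Anything shorter yields None.
let parseTransition (line : string) =
    let tokens =
        line.Split([|' '; '{'; '}'|], System.StringSplitOptions.RemoveEmptyEntries)
        |> List.ofArray
    match tokens with
    | state :: event :: next :: actions ->
        Some { State = state; Event = event; Next = next; Actions = actions }
    | _ -> None

// The four transitions from the turnstile example.
let sample =
    [ "Locked Coin Unlocked {alarmOff unlock}"
      "Locked Pass Locked  alarmOn"
      "Unlocked Coin Unlocked thankyou"
      "Unlocked Pass Locked lock" ]

// Lines that do not fit the layout are silently dropped: no error messages.
let transitions = sample |> List.choose parseTransition
```

Like the SNOBOL version, this approach trades error reporting for brevity: a malformed line simply fails to match.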

Conclusions

The SNOBOL finite state machine parser, like the FParsec based parser, fits on a page and is an order of magnitude shorter than the broadly equivalent clean Java implementation written by Uncle Bob Martin that aimed to demystify compiler writing.

Will I be switching from FParsec to SNOBOL for parsing? Probably not: FParsec is at least as expressive, provides pretty good error messages for free and runs on the CLR.

Special thanks to Uncle Bob Martin for the SNOBOL tip :)

Top 100 .Net Bloggers from 2014

In my last post I covered the top 100 .Net bloggers since 2008, based on links posted on Alvin Ashcraft's Morning Dew. This (intentionally) captured many bloggers that are no longer actively blogging, but equally still have interesting content to consume.

For completeness here's the ranking for the years 2014 and 2015 (up to last Friday) which may better capture active .Net bloggers:

Rank Name 2014 2015 Total
1 Sean Sexton 195 0 195
2 Raymond Chen 86 17 103
3 Greg Duncan 74 14 88
4 Scott Hanselman 50 7 57
5 Peter Vogel 44 12 56
6 Brian Harry 46 8 54
7 Ricardo Peres 38 13 51
8 Oren Eini 32 12 44
9 Eric Lippert 44 0 44
10 Sacha Barber 31 7 38
11 Martin Hinshelwood 25 5 30
12 Eric Battalio 27 2 29
13 Carl Franklin & Richard Campbell 16 10 26
14 Jonathan Allen 17 9 26
15 Sasha Goldshtein 19 7 26
16 Dhananjay Kumar 25 1 26
17 James Montemagno 17 7 24
18 Jimmy Bogard 18 6 24
19 Willy-P. Schaub 19 4 23
20 Mike Taulty 21 1 22
21 Nicholas Blumhardt 18 3 21
22 S.Somasegar 17 3 20
23 Rob Eisenberg 13 7 20
24 Kathleen Dollard 20 0 20
25 Jeremy Clark 10 9 19
26 Jon Skeet 16 3 19
27 Phillip Trelford 17 2 19
28 Michael Crump 13 5 18
29 Immo Landwerth 13 5 18
30 Rory Becker 18 0 18
31 Rowan Miller 15 2 17
32 Sanjay Sharma 17 0 17
33 Jesse Liberty 15 1 16
34 Charles Sterling 15 1 16
35 Miguel de Icaza 12 3 15
36 Steve Smith 15 0 15
37 Bnaya Eshet 5 9 14
38 Scott Guthrie 12 2 14
39 Gael Fraiteur 11 3 14
40 Bill Wagner 11 3 14
41 Mary Jo Foley 12 2 14
42 Rick Strahl 7 7 14
43 Kim Spilker 14 0 14
44 Tatworth 14 0 14
45 MS Downloads 13 0 13
46 John Montgomery 8 4 12
47 Jeff Martin 9 3 12
48 Kerry Meade 10 2 12
49 Latish Sehgal 12 0 12
50 Richard Carr 12 0 12
51 Jonathan Wood 8 3 11
52 K. Scott Allen 8 3 11
53 Susan Ibach 7 4 11
54 Filip Ekberg 11 0 11
55 Mads Kristensen 8 2 10
56 Robert Green 9 1 10
57 Bertrand Le Roy 8 2 10
58 Daria Dovzhikova 10 0 10
59 CodePlex 10 0 10
60 Laurent Bugnion 6 3 9
61 Erik EJ 8 1 9
62 Iris Classon 6 3 9
63 Pete D. 4 5 9
64 DevToolsGuy 3 6 9
65 Dave M. Bush 7 2 9
66 Cameron Taggart 8 1 9
67 Deborah Kurata 8 1 9
68 Julie Lerman 7 2 9
69 Anand Narayanaswamy 9 0 9
70 Philip Fu 9 0 9
71 Glenn Block 6 2 8
72 The .NET Team 6 2 8
73 Jeremy Likness 5 3 8
74 Shawn Wildermuth 6 2 8
75 Ondrej Balas 7 1 8
76 Kunal Chowdhury 6 2 8
77 Adam Anderson 8 0 8
78 Jeremy D. Miller 8 0 8
79 Schabse Laks 8 0 8
80 Sam Sabri 8 0 8
81 Frans Bouma 5 2 7
82 Jean-Marc Prieur 5 2 7
83 Sergio De Simone 6 1 7
84 David Voyles 4 3 7
85 Dmitri Nesteruk 2 5 7
86 Nick Randolph 5 2 7
87 Alois Kraus 6 1 7
88 Jef Claes 6 1 7
89 Eric Sink 6 1 7
90 Josh Morales 6 1 7
91 Terje Sandstrom 7 0 7
92 Xinyang Qiu 7 0 7
93 Jon Galloway 7 0 7
94 John Papa 7 0 7
95 Daniel Rubino 7 0 7
96 Matthieu Mezil 7 0 7
97 Angelos Petropoulos 3 3 6
98 Peter Kellner 3 3 6
99 Dror Helper 5 1 6
100 Tom Warren 3 3 6

 

This definitely brings up some new names alongside the old familiar ones :)

Script

For the analysis we employed a simple F# script, using FSharp.Data’s CSV Type Provider for types over the data set and Taha Hachana’s XPlot library for charting.

Here’s the code for the top 100:

open FSharp.Data

let [<Literal>] path = @"LinksTo2015.csv"
type Posts = CsvProvider<path>
let posts = Posts.Load(path)

let topAuthors n =
   posts.Rows
   |> Seq.where (fun row -> row.Year >= 2014)
   |> Seq.where (fun row -> row.Tag.Contains ".NET" || row.Tag.Contains "Top")
   |> Seq.groupBy (fun row -> row.Author) 
   |> Seq.map (fun (author,rows) -> author, rows |> Seq.toArray)
   |> Seq.sortBy (fun (_,rows) -> -rows.Length)
   |> Seq.take n
   |> Seq.toList

let top100 = topAuthors 100

For the table I simply used another short snippet to transform the results to text for an HTML table.
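That snippet isn't shown in the post, so the following is only a sketch of what such a transformation could look like. The `toHtmlTable` helper is a hypothetical name, and it assumes the results have already been reduced to (name, count) pairs.

```fsharp
open System.Net

// Hypothetical helper: renders ranked (name, count) pairs as an HTML table.
// Names are HTML-encoded so entries containing '&' stay valid markup.
let toHtmlTable (rows : (string * int) list) =
    let body =
        rows
        |> List.mapi (fun i (name, count) ->
            sprintf "<tr><td>%d</td><td>%s</td><td>%d</td></tr>"
                (i + 1) (WebUtility.HtmlEncode name) count)
        |> String.concat "\n"
    "<table>\n<tr><th>Rank</th><th>Name</th><th>Total</th></tr>\n" + body + "\n</table>"

// Example with two rows taken from the table above.
let html = toHtmlTable [ "Sean Sexton", 195; "Raymond Chen", 103 ]
```

With the script above, the input could be produced by something like `top100 |> List.map (fun (author, rows) -> author, rows.Length)`.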

Most Prolific Bloggers on .Net

On Saturday I headed down to Thoughtworks in Soho, London for an F# Open Data Hackathon organized by Thoughtworker Sean Newham. We started with questions we’d like to answer using open data. I was interested in finding the most prolific bloggers in .Net, and formed a team with Adam Kosiński, Emmet Cassidy and my son Sean. We had from around 11am to 3pm to answer the question and present the results.

Dew Drop

We used data mined from Alvin Ashcraft’s Morning Dew site, which provides a labelled list of top links almost every weekday since 2008.

Here’s Alvin’s activity since 2008:

Dew Drop Calendar 2008-2015

Alvin’s links cover many topics including .Net, Web, Mobile and XAML:

Dew Drop Tags 2008-2015

Top 100 .Net Bloggers

For this analysis we’re looking only at links labelled as “Top Links” or “.NET”. Between 2008 and 2015 there were over 20,000 links from over 3,000 unique author names.

Interestingly the top 100 bloggers account for roughly half of all posts, and here’s the table of top 100 .Net bloggers based on data extracted from the Morning Dew:

Rank Name 2008 2009 2010 2011 2012 2013 2014 2015 Total
1 Greg Duncan 2 16 74 142 116 86 74 14 524
2 Oren Eini 31 71 76 42 94 49 32 12 407
3 Sean Sexton 0 0 0 0 0 193 195 0 388
4 Zain Naboulsi 0 0 236 56 17 37 3 0 349
5 Richard Carr 6 13 78 77 90 58 12 0 334
6 Eric Lippert 7 48 47 46 37 68 44 0 297
7 Raymond Chen 0 0 0 20 75 92 86 17 290
8 Scott Hanselman 28 16 38 39 44 37 50 7 259
9 MS Downloads 27 35 42 48 46 23 13 0 234
10 CodePlex 36 12 34 70 49 13 10 0 224
11 Sasha Goldshtein 4 16 30 31 22 25 19 7 154
12 Brian Harry 0 0 0 0 35 58 46 8 147
13 Julie Lerman 19 45 15 16 19 16 7 2 139
14 Scott Guthrie 3 12 47 18 19 24 12 2 137
15 Martin Hinshelwood 3 12 12 20 27 32 25 5 136
16 Mike Hadlow 6 16 30 21 25 22 4 0 124
17 Dhananjay Kumar 0 0 0 60 26 7 25 1 119
18 Derik Whittaker 24 21 21 21 18 8 6 0 119
19 Gunnar Peipman 1 40 36 9 9 15 4 1 115
20 Abhijit Jana 0 0 18 87 0 3 2 3 113
21 Ricardo Peres 0 0 0 6 15 31 38 13 103
22 Jimmy Bogard 19 18 14 12 7 5 18 6 99
23 Dennis Delimarsky 0 0 50 28 16 5 0 0 99
24 Peter Vogel 0 0 0 0 13 27 44 12 96
25 Jonathan Allen 0 0 11 19 25 15 17 9 96
26 Matthew Podwysocki 27 45 21 1 1 0 0 0 95
27 Rockford Lhotka 8 19 17 20 20 5 3 0 92
28 K. Scott Allen 11 8 14 15 19 11 8 3 89
29 Shai Raiten 0 0 18 40 22 9 0 0 89
30 Charles Sterling 3 8 4 6 25 24 15 1 86
31 Deborah Kurata 0 24 36 2 9 3 8 1 83
32 Phil Haack 3 5 10 31 13 11 6 0 79
33 James Michael Hare 0 0 8 38 24 3 0 3 76
34 Peter Kellner 2 18 10 8 22 9 3 3 75
35 Sacha Barber 7 4 5 9 4 8 31 7 75
36 Kunal Chowdhury 0 0 14 13 20 20 6 2 75
37 Pete Brown 3 6 28 17 16 3 2 0 75
38 Mike Taulty 7 9 9 10 13 4 21 1 74
39 Glenn Block 12 9 18 11 9 5 6 2 72
40 Miguel de Icaza 0 0 25 19 8 5 12 3 72
41 Davy Brion 7 30 26 9 0 0 0 0 72
42 Mary Jo Foley 0 3 1 20 17 16 12 2 71
43 Rick Strahl 9 13 1 11 15 8 7 7 71
44 Alex Skorkin 0 0 11 51 9 0 0 0 71
45 Jesse Liberty 3 0 10 13 19 4 15 1 65
46 Tatworth 0 0 0 14 30 7 14 0 65
47 Daniel Moth 17 23 0 12 6 3 4 0 65
48 Abhishek Sur 0 1 20 35 2 3 1 2 64
49 Clemens Reijnen 11 12 19 5 8 3 5 1 64
50 Justin Etheredge 25 20 16 3 0 0 0 0 64
51 Jeremy Likness 0 0 18 12 21 3 5 3 62
52 Iris Classon 0 0 0 0 37 16 6 3 62
53 Bill Wagner 0 7 2 21 11 7 11 3 62
54 Mark Needham 0 31 31 0 0 0 0 0 62
55 Harry Pierson 17 37 0 2 3 0 1 0 60
56 Rob Eisenberg 4 1 14 5 3 12 13 7 59
57 Steven Sinofsky 0 0 1 22 31 3 1 0 58
58 John Papa 8 3 7 3 13 15 7 0 56
59 Patrick Smacchia 9 8 11 12 10 4 2 0 56
60 Jon Skeet 6 1 0 15 9 5 16 3 55
61 Steve Smith 0 1 3 22 8 6 15 0 55
62 Bnaya Eshet 0 0 0 12 18 9 5 9 53
63 Carl Franklin & Richard Campbell 0 0 3 0 7 17 16 10 53
64 Shawn Wildermuth 7 13 8 6 10 1 6 2 53
65 Rory Primrose 0 0 19 18 7 7 2 0 53
66 Willy-P. Schaub 0 0 0 3 17 9 19 4 52
67 Kirill Osenkov 1 18 12 5 9 4 3 0 52
68 S.Somasegar 0 0 0 0 15 16 17 3 51
69 Bart de Smet 21 24 5 1 0 0 0 0 51
70 Laurent Bugnion 0 0 7 17 13 4 6 3 50
71 Eric Battalio 0 0 0 0 6 15 27 2 50
72 Maarten Balliauw 2 0 6 16 7 15 3 1 50
73 Don Syme 1 2 4 15 13 14 1 0 50
74 Gil Fink 1 4 24 20 1 0 0 0 50
75 Cameron Skinner 11 11 21 6 1 0 0 0 50
76 Dave M. Bush 7 25 5 0 0 3 7 2 49
77 Stephen Forte 4 21 21 3 0 0 0 0 49
78 Michael Crump 0 0 2 11 13 4 13 5 48
79 Kim Spilker 0 0 2 16 7 9 14 0 48
80 Jura Gorohovsky 0 0 5 12 15 12 4 0 48
81 Wally McClure 0 4 11 17 15 0 1 0 48
82 Jason Zander 0 10 13 5 18 0 1 0 47
83 Chris Sells 2 22 10 4 9 0 0 0 47
84 Marcelo Lopez Ruiz 1 2 37 5 1 0 0 0 46
85 Nicholas Blumhardt 0 4 6 6 1 7 18 3 45
86 Rob Reynolds 0 12 11 9 6 3 4 0 45
87 Dmitri Nesteruk 0 1 1 1 14 20 2 5 44
88 Rory Becker 0 0 13 11 1 1 18 0 44
89 Filip Ekberg 0 0 0 0 11 21 11 0 43
90 G. Andrew Duthie 2 12 4 4 20 1 0 0 43
91 Grigori Melnik 0 0 8 13 8 8 4 1 42
92 Jonathan Wood 0 0 6 21 4 0 8 3 42
93 Peter Ritchie 3 1 14 6 13 2 3 0 42
94 Rob Conery 10 2 4 15 5 1 5 0 42
95 Gian Maria Ricci 0 0 1 37 4 0 0 0 42
96 Anoop Madhusudanan 0 0 14 8 11 8 0 0 41
97 Jeff Blankenburg 0 1 15 6 19 0 0 0 41
98 Rowan Miller 0 0 5 4 4 10 15 2 40
99 Hadi Hariri 0 1 4 21 11 2 0 0 39
100 Yochay Kiriaty 3 15 15 6 0 0 0 0 39
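
The per-year columns and the "roughly half of all posts" figure could be derived along the following lines. This is a sketch rather than the script used at the hackathon: the `Link` record is a hypothetical in-memory stand-in for the CSV rows, and `countsByYear` and `topShare` are illustrative names.

```fsharp
// Hypothetical stand-in for one CSV row: who wrote the link and in which year.
type Link = { Author : string; Year : int }

// For each author, count links in each of the given years and sum them,
// then sort authors by descending total. Years outside the list are ignored.
let countsByYear (years : int list) (links : Link list) =
    links
    |> List.groupBy (fun link -> link.Author)
    |> List.map (fun (author, authorLinks) ->
        let perYear =
            years
            |> List.map (fun year ->
                authorLinks |> List.filter (fun l -> l.Year = year) |> List.length)
        author, perYear, List.sum perYear)
    |> List.sortByDescending (fun (_, _, total) -> total)

// Fraction of all counted links contributed by the first n authors.
let topShare n (counts : (string * int list * int) list) =
    let total = counts |> List.sumBy (fun (_, _, t) -> t)
    let top = counts |> List.truncate n |> List.sumBy (fun (_, _, t) -> t)
    float top / float total

// Tiny made-up data set, purely to exercise the functions.
let demo =
    [ { Author = "A"; Year = 2014 }; { Author = "A"; Year = 2014 }
      { Author = "A"; Year = 2015 }; { Author = "B"; Year = 2014 } ]

let demoCounts = countsByYear [ 2014; 2015 ] demo
let demoShare = topShare 1 demoCounts
```

On the real data set, `topShare 100` applied to the full counts would give the share attributed to the top 100 bloggers.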

Data

You can download the data from http://trelford.com/DewDropTo2015.csv

I’d be interested in hearing about what you find :)