Just before Christmas I came across some Java source code by “Uncle” Bob Martin aimed at “demystifying compilers” which expends about 600 lines of code to parse the following simple finite state machine:
Actions: Turnstile
FSM: OneCoinTurnstile
Initial: Locked
{
Locked Coin Unlocked {alarmOff unlock}
Locked Pass Locked alarmOn
Unlocked Coin Unlocked thankyou
Unlocked Pass Locked lock
}
For fun I knocked up a broadly equivalent parser in F# using FParsec which was just under 40 lines of code, and posted the code on this blog.
The post generated some interest, and I even got a mention on Twitter from “Uncle” Bob himself:
I’d not seen SNOBOL before, but given Mr Martin’s recommendation I popped over to the SNOBOL page on WikiPedia and liked what I saw:
SNOBOL rivals APL for its distinctiveness in format and programming style, both being radically unlike more "standard" procedural languages such as BASIC, Fortran, or C.
SNOBOL first appeared in 1962 and appears to have been popular in US Universities as a text manipulation language in the 70s and 80s. The language supports pattern matching over text combined with assembler like control flow using labels and goto (like C#).
As a text manipulation language, SNOBOL code feels a little more readable than the new norm - regular expressions .
If you’d like to take it out for a spin there’s an open source SNOBOL IDE, with syntax colouring support, for Linux and Windows called TkS*LIDE.
SNOBOL Interpreter
Nowadays when I want to learn a new language I often start by implementing it, to this end over the course of about a week I built a SNOBOL interpreter with just enough functionality to run the samples on the Wikipedia page along with some more involved samples from other sources.
The SNOBOL interpreter is about 400 LOC and available as an F# Snippet.
Finite State Machine in SNOBOL
Armed with a basic knowledge of SNOBOL, I could now answer the question, is the implementation even smaller in SNOBOL.
The answer is a resounding yes, and here’s the 34 lines of SNOBOL code that proves it:
Disclaimer: unlike the FParsec version there’s no error handling/error messages and the FSM must be layed out in a specific format.
Conclusions
The SNOBOL finite state machine parser, like the FParsec based parser, fits on a page and is an order of magnitude shorter than the broadly equivalent clean Java implementation written by Uncle Bob Martin that aimed to demystify compiler writing.
Will I be switching from FParsec to SNOBOL for parsing? Probably not, FParsec is at least as expressive, provides pretty good error messages for free and runs on the CLR.
Special thanks to Uncle Bob Martin for the SNOBOL tip