Moving away from flex to a handwritten lexer instead

Written by Walter on 1/1/2015

< >

Sometimes people hold on to standards and forget computer science evolves constantly. Way back I wrote an example interpreter called wsbasic and tried to make it as simple and readable as I possibly could and thereby wrote a lexer without using flex just to keep things simple. For MPL around 2003 I started tuning our flex based lexer class which was based on that (reason being the environment needed to be ported to multiple OS's like windows, linux, os x, etc.).

So yesterday while surfing around I found on the gnu site they've done the same. So that gives me a lot of hope it was a good path to follow and move away from flex+bison. New Languages and Language specific improvements in their advice says for C and Objective-C : The old Bison-based C and Objective-C parser has been replaced by a new, faster hand-written recursive-descent parser. Found the link here: Advice 2 .

Update in 2022 I've tried to recompile some of this old code. And now it's even harder. Bison++ is not found with homebrew and flex++ throws errors on the generated output files.

Luckily I've still got one of the latest stable versions of bison++ for download here : bison++-1.21-8.tar.gz

However when you do the standard ./configure + make you get the following error:

files.c:321:3: error: implicit declaration of function 'unlink' is invalid in C99 [-Werror,-Wimplicit-function-declaration]

I have to look through some of my backed up sourcecode from 2000-2005 and run another benchmark again to know for sure. Ofcourse, you won't have full regular expressions with this wsbasic style lexer class but from experience most of the time you don't need them. (you could do full regexps with boost libs also if you need a full regular expression engine but that would likely end up being slower again than a flex generated regexp parser).

At Emory University I gave a talk about MPL around 2005 and also had the same question from someone in the audience about why I didn't use flex for lexing. And my answer was mostly because we needed to be cross platform but also in our case it turned out to be quite clean and maintanable code as well. Hope I can explore a lot more in 2015 and maybe even more magic will happen ;).

Anyway just to keep all options open. It's still possible to get it all working after some patching.

So I've made 2 github repositories. One with a bison++ version that actually compiles and works (with some warnings) on macOS in 2022.

Another is a simple example that uses both flex++ and bison++ in a simple calculater program that parses expressions using flex++ and bison++:

$  ./calc
|                  Simple calculator                   |
|             Author: Walter Schreppers                |
 Give empty line to exit                                
 [a-zA-Z]* specifies a variable name for storing        
 ANS is special variable which always stores last result
 for float expressions add a decimal point in constants 
 start with option --help to see cmdline usage...       


You can clone and build calculator++ here:

However like I tried to explain in the beginning. Often (as for this above example) you can get away with a handwritten parser and lexer. And this greatly reduces issues to get your parser working across multiple platforms and also seems to be a lot eadier to maintain. For instance this bison++ was tricky to get up and running again (just to get that simple calculator++ example working) even though it worked fine a few years ago. A nice example of such a handwritten lexer and parser class is shown in my little wsbasic project. And I've pretty much never needed to update it in over 20 years and it still compiles without warnings:

It also pretty much runs the above examples just fine (and loads more as wsbasic is turing complete):

$ cat test.bas 
println 2*3+21
println 2*(3+21)

println c

$./wsbasic test.bas 

Back to archive