Wednesday, August 29, 2012

Making a programming language: Part 1 - how to start

Table of contents

Source: github

Source version for this post


Lately I gained some interest in programming languages and compilers. Those seem like quite some daunting monsters - just consider the amount of different features and more importantly, the vast infinity of possible programs.
So where to start? I have a course about compilers at my college, but I have to wait another year to enroll into that. So welcome Coursera. It's currently available only in self-study but that's enough for me. I should mention the Dragon Book, but I didn't read that(yet) so I can't comment.

Compiler is basicaly lexer -> parser -> optimiser -> code generator.

The regex approach

I made it through introduction and lesson on lexical analysis and recognized finite automata as something from college(thank your professor!) and finnaly understood the connection from them to regex in java and the like(professor mentioned regular expressions but no-one figured out what was the connection). 
Feeling empowered by the newly obtained knowledge I set out to make a lexer for my language.  I had no clear specification in mind since this is supposed to be a fun and creative project...I intended to just invent as I go.
My lexer was something like that(won't post the real code since I'm embarrassed)

  • create a bunch of java regex objects that all match to the start of the string
  • take the string and try to match it against all regexes
  • return an identifier object corresponding to the match
  • remove matched part of the string
  • loop
Yeah, pretty horrible.
I kinda abandoned this idea.

The recursive descent parser approach

By now I was into functional programming and scala language.  I also watched lesson on recursive descent on coursera. The code was fugly, bunch of c++ with pointer arithmetic and side effects. I wan't pretty functions :(
I considered doing a framework for this in scala or perhaps java but...
Scala (programming language)
Scala (programming language) (Photo credit: Wikipedia)
Enter scala's parser combinators. I suggest reading this if you arent familiar. One of the reasons scala is awesome. You get parser "generation" baked into the standard library. Granted, it's a bit slow, but who cares - this is a project for fun not for general usage. On the PLUS side you get productions in the laguage itself and parsing is almost like magic.
And in scaladoc you even get a nice code snippet. What more can a geek ask for.

Well....an AST would be nice. Fortunately scala also provides action combinators that allow to pattern match the parsers themself and map them into an AST. And it even isn't that complicated. 

I recreated the grammar from the scaladoc example and added the action combinators. The code block is kinda long, but I promise there is a short explanation at the bottom.


Notice the ^^ symbol. It sends the output from the parser  combinator into my mapping function(anonymous). And I just create the case class.
There is also an evaluator at the and...but that's in part 2.

next -> Part 2: something that kinda works

Enhanced by Zemanta

No comments:

Post a Comment