Friday, August 31, 2012

Making a programming language: Part 3 - adding features

Table of contentsWhole project on github

So now I have a repl that can evaluate stuff like
(2+3)*(7/2-1)
Not much of a programming language - more like a calculator, and not even a good one. Lets add some features!

Constants

Like pi, e and such. I have to change the grammar to match identifiers to.
Now I have
private def factor: Parser[Expression] = number | ("(" ~> expr <~ ")")

And I change that to
private def factor: Parser[Expression] = value | parenExpr 

private def value: Parser[Expression] = number | identifier

private def identifier: Parser[Identifier] = "[a-zA-Z]\\w*".r ^^ {
  s => Identifier(s)
}
and I also added a new token type
case class Identifier(id: String) extends Expression

and enabled this in evaluator
case Identifier(name) => {
  StdLib(name) match {
    case Some(v: Double) => v
  case _ => throw new SemanticError(name + " not found")
  }
}
StdLib is just a map object. I went the python way - variables(and constants) are entries in a global dictionary. Just a decision I made whil implementing this. As I said, I don't have a plan, I don't know what I'm doing and I don't know how stuff is done. How hard can it be?! (Top Gear anyone?)

Exponentiation

Another math feature. It's more of a test for myself to see if I understand grammars. Especially how to make a grammar not left-recursive. Because apparently such grammars don't work with RDP. I turned out I don't understand grammars.
private def exponent: Parser[Expression] = (value | parenExpr) ~ "^" ~ (value | parenExpr) ^^ {
  case a ~ "^" ~ b => Exponent(a,b)
}

private def factor: Parser[Expression] = (value ||| exponent) | parenExpr

The ||| operator is an ugly hack. It tries both sides and returns the longer match. By the time I was writing this I didn't know that order is importat. If I just wrote exponent | value it would have worked, because expoenent would match a value anyway and then failed on missing ^.

Token and evaluation(uses math.pow) for this are quite trivial.

Function calls

case class ExpList(lst: List[Expression]) extends Expression
case class FunctionCall(name: Identifier, args: ExpList) extends Expression
Simple: function call is a name and list of expressions to evaluate for arguments(wrapped because even an expression list is an expression):

Parser:
private def arglist: Parser[ExpList] = "(" ~> list <~ ")"

private def functionCall: Parser[FunctionCall] = identifier ~ arglist ^^ {
  case id ~ args => FunctionCall(id, args)
}

private def value: Parser[Expression] = number | (identifier ||| functionCall)
Again, I was having trouble - parser just didn't work and resorted to |||. functionCall should come before identifier.

Evaluating this is more interesting. I decided to make functions be values too for obvious reasons -> higher order functions(I'm into functional programming, remember?). So function values must be stored in same "namespace". StdLib(the only "namespace") required to become of type Map[String,Any]. I will have to do pattern matching anyway since this will be dynamic-typed language. (Yes this is a plan, I think it's easier to implement. Static typing ftw, but that's next project). And I needed a type for function values to pattern match on - I went with Any=>Any and sending in List(arg0,arg1,...) doing more pattern matching inside the function. Will be slow but hey...dynamic typing!

from evaluator
case FunctionCall(name, args) => {
  StdLib(name.id) match {
    case Some(f: FunctionVarArg) => f.apply(apply(args))
    case None => throw new ScratSemanticError("function " + name + "not found")
  }
}

and and example function in StdLib
type FunctionVarArg = Any => Any

lazy val ln: FunctionVarArg = {
  case (d: Double) :: Nil => math.log(d)
  case other => throw ScratInvalidTypeError("expected single double but got " + other)
}

Conclusion

As clearly illustrated above, not planning your grammar results in constant changes in many places. So if you're doing something serious just make the whole fricking grammar on a whiteboard beforehand. Seriously. 

Anyway..now I still only have a calculator, but a much more powerful one. I can write expressions like
e^pi
ln(10+2)
1+2*3/4^5-log(e)
But that's nearly not enough. I want to be Touring-complete an ideally to be able to compile/interpret itself.

Enhanced by Zemanta

Thursday, August 30, 2012

Making a programming language: Part 2 - something that kinda works

Table of contents, Whole project on github, relevant version on github

In the Part 1 I posted a working repl(read-eval-print-loop) for simple math expressions but I kinda cheated and only explained how I built the AST.

AST elements

Just scala case classes
sealed trait Expression
case class Number(n: Double) extends Expression
case class Add(left: Expression, right: Expression) extends Expression
case class Subtract(left: Expression, right: Expression) extends Expression
case class Multiply(left: Expression, right: Expression) extends Expression
case class Divide(left: Expression, right: Expression) extends Expression

Parser combinators revisited

I use power of scala library to cheat a bit and do lexing and parsing in one step.

Basic parser combinators from scala api documentation, everything you need to define productions in your grammar.
p1 ~ p2 // sequencing: must match p1 followed by p2
p1 | p2 // alternation: must match either p1 or p2, with preference given to p1
p1.?    // optionality: may match p1 or not
p1.*    // repetition: matches any number of repetitions of p1

However, to transform the matched string to an AST you need something more
private def number: Parser[Expression] = """\d+\.?\d*""".r ^^ {
  s => Number(s.toDouble)
}

Firstly, in the RegexParser class is an implicit conversion from Regex to Parser. So I could write
private def number: Parser[String] = """\d+\.?\d*""".r

Notice the type annotation. Inferred type would be Regex, since this function is private I can still have implicit conversion, but I rather have all parsers be of type Parser[_].

The ^^ part is an action combinator - a map function. But as ^^ is only available on Parser instances my regex has already been implicitly converted. So in my lambda I already know(scala can infer) the type of s to be String.

One last example
private def term: Parser[Expression] = factor ~ rep(("*" | "/") ~ factor) ^^ {
      case head ~ tail => {
        var tree: Expression = head
        tail.foreach {
          case "*" ~ e => tree = Multiply(tree, e)
          case "/" ~ e => tree = Divide(tree, e)
        }
        tree
      }
    }

Function rep is also from Parsers class matches any number of repetitions(including 0). Here's the type signature
def rep[T](p: ⇒ Parser[T]): Parser[List[T]]

The catch here is that ~ returns a single parser that matches both sides, but fortunately it can be pattern matched to extract both sides. And I can even use meaningful names since I am in fact matching a head with an optional tail.
Inside the case statement I used more imperative style to build a tree, nothing fancy here. Folding was a bit awkward in this case for me(maybe I'm just incompetent) so I went with a for each loop.

Apply the same pattern to +/- part and you have yourself a tree.

Oh, yeah...and the top parser function. A bit changed from last time, to yield useful error messages
def apply(s: String): List[Expression] = parseAll(expr, s) match {
    case Success(tree, _) => Right(tree)
    case NoSuccess(msg, _) => Left(msg)
}

Evaluator

Now what I promised in previous post - evaluation. At first I planned on compiling the code for JVM but I just wanted to see some results first so I decided to do a simple interpreter, no compiling whatsoever - for now.

My first approach was to modify the Expression
sealed trait Expression{
    def eval(): Double  
}

and implement this on all case classes hardcoding the double type and coupling AST representation and evaluation together. Yikes. Granted, it worked, but what an ugly way to do it.

So I did a hard reset(always use git folks! or something similar) and went about doing a standalone evaluator. Since scala's pattern matching skills are awesome and I'm already using case classes why not just do that.
object Evaluator {

    import Tokens._

    def apply(e: Expression): Double = e match {
      case Number(n) => n
      case Add(l, r) => apply(l) + apply(r)
      case Subtract(l, r) => apply(l) - apply(r)
      case Multiply(l, r) => apply(l) * apply(r)
      case Divide(l, r) => apply(l) / apply(r)
    }
  }

This is all the code. Just pattern matching and recursion. But yes, still hardcoded double as the data-type. Looking back, not a great decision...but hey I got something working and this is fun.


next time: Adding features(contsnts, exponents, function calls)
Enhanced by Zemanta

Wednesday, August 29, 2012

Making a programming language: Part 1 - how to start

Table of contents

Source: github

Source version for this post


Lately I gained some interest in programming languages and compilers. Those seem like quite some daunting monsters - just consider the amount of different features and more importantly, the vast infinity of possible programs.
So where to start? I have a course about compilers at my college, but I have to wait another year to enroll into that. So welcome Coursera. It's currently available only in self-study but that's enough for me. I should mention the Dragon Book, but I didn't read that(yet) so I can't comment.

Compiler is basicaly lexer -> parser -> optimiser -> code generator.

The regex approach

I made it through introduction and lesson on lexical analysis and recognized finite automata as something from college(thank your professor!) and finnaly understood the connection from them to regex in java and the like(professor mentioned regular expressions but no-one figured out what was the connection). 
Feeling empowered by the newly obtained knowledge I set out to make a lexer for my language.  I had no clear specification in mind since this is supposed to be a fun and creative project...I intended to just invent as I go.
My lexer was something like that(won't post the real code since I'm embarrassed)

  • create a bunch of java regex objects that all match to the start of the string
  • take the string and try to match it against all regexes
  • return an identifier object corresponding to the match
  • remove matched part of the string
  • loop
Yeah, pretty horrible.
I kinda abandoned this idea.

The recursive descent parser approach

By now I was into functional programming and scala language.  I also watched lesson on recursive descent on coursera. The code was fugly, bunch of c++ with pointer arithmetic and side effects. I wan't pretty functions :(
I considered doing a framework for this in scala or perhaps java but...
Scala (programming language)
Scala (programming language) (Photo credit: Wikipedia)
Enter scala's parser combinators. I suggest reading this if you arent familiar. One of the reasons scala is awesome. You get parser "generation" baked into the standard library. Granted, it's a bit slow, but who cares - this is a project for fun not for general usage. On the PLUS side you get productions in the laguage itself and parsing is almost like magic.
And in scaladoc you even get a nice code snippet. What more can a geek ask for.

Well....an AST would be nice. Fortunately scala also provides action combinators that allow to pattern match the parsers themself and map them into an AST. And it even isn't that complicated. 

I recreated the grammar from the scaladoc example and added the action combinators. The code block is kinda long, but I promise there is a short explanation at the bottom.


Notice the ^^ symbol. It sends the output from the parser  combinator into my mapping function(anonymous). And I just create the case class.
There is also an evaluator at the and...but that's in part 2.

next -> Part 2: something that kinda works

Enhanced by Zemanta