My programming escapades

Thursday, September 27, 2012

Making a programming language: Part 8 - going faster

Table of contents, Whole project on github

First of all, I wrote some tests for scrat. That was a bit challenging to get started. How do you test a language? I decided to write a bunch of programs that exercise and combine different language features. And then stare into code until I was absolutely sure they are correct. That's the problem with implementing a new language, nobody can tell you if your code is correct but your software, but you don't even know if software is correct.
So I wrote some test programs and parsed and evaluated them in my mind to produce final result. And then I wrote a class using ScalaTest that generates tests by iterating over the array of these tuples. Quite cool, oh and I also included some descriptions into the tuples. So I get nice output. I pondered this idea quite some time ago but finally implemented it a bit after functions and objects.

So now I can do crazy refactorings while still maintaining the language - I trust my tests! But first of all, some benchmarks. I decided to measure execution time of parsing and evaluation on the linked-list consturctor. To my surprise, parsing was quite slow. It started on about 800ms and dropped down to 200ms after a few executions(first time object creation and JIT I suppose). 200ms for 67 lines of code? That would mean about 3 seconds on a 1000 line file IF complexity is linear(which I later leard it isn't!). And that's 3 seconds on fifth run or so, first run(which is the only one when doing real stuff) would be 10 seconds +, unacceptable. (evaluation takes a few ms)

Research

At first I just gave it some thought. Well it's a recursive descent parser, and from what I remember it back traces on failure. So efficiency has a lot to do with grammar structure. And it won't be linear because you have to do (usually) more back tracing when dealing with longer input.

Internets here I come.

My thoughts were confirmed. I found some guys on forums complaining over speed and then Mr. Odersky himself commented something like this(from memory):

Well the parsers in scala standard library are more like an example how to do recusive descent, functional style. They are perfectly usable for parsing command lines but not for long files. You should use a parser generator for that.

I was bummed. The reason I didn't use a parser generator was this close integration that parser as a library could provide me. By the way implementation of Parsers is remarkably short, ~800 lines but most of them are comments. But it has quite some problems. A lot of object creation - every time a parsing function is invoked many object are created. Each parser is an object and each combinator is an object too. This in itself is not a big deal, but no memoization is performed, so it becomes a big deal. Now many objects are created on each try, so when backtracing you have to GC all these object and then recreate them. It's clean but it's no wonder it's slow.
Some time passed by and I accidentally found out about Packrat Parsing. This paper provides details but the gist of it is to use lazy evaluation and memoization to reduce object creation and speed things up.

Conversion and results

Conversion is dead easy. It's fully described in scala api documentation. Basically you mix in PackratParsers and change def : Parser[] = ... to lazy val : PackratParser[] = ... and that's it. Mixed in trait provides the necessary implicit conversions, lazy makes sure the creation is only done once and new implementation of parseAll does some clever parsing. Oh and you needn't convert all parsers, the paper says the optimal perfomance is achieved with the right mix of standard recursive descent parsers and packrat(packrat does include some overhead on specific grammars and inputs). But I just converted all and run the benchmark again. And behold...11ms. On the best run. More like 15 on average. But that's still more than a whole order of magnitude faster. And it should scale better.

next: don't know yet. I caught up with my implementation(finally!). I thinking about making functions stronger by relaxing the rules of invocation and doing some syntax sugar for lambdas. That and java interop. Or possibly compiling to bytecode.

Making a programming language: Part 7a - objects

Table of contents, Whole project on github

My goal in this post is for this to compile

func create(n){this}

and a call to it to return a reference to an object that contains "n".

Functions are a bit different from the rest of the language. Not by implementation or usage but by thought process behind designing them. I actually thought about objects and functions before implementing any of them. Considering my implementation of scopes(which I like) and shadowing I got this great idea that functions, scopes and objects are just many faces of the same thing. Or "could be" many faces of the same thing. Something a bit more powerful than a function being the "thing".

Let's see how that works. A function in this language needs a new local scope for every execution(not all functions, but this is a simplification because I don't care about performance, see previous post for details). New scope. New something. New. Bells should be ringing right now. I'm creating objects. I could just as well pass the reference to that object. I even have the reference. It's current scope - "this" in scala code. So I just need a language construct to access that. How about "this".

//in SScope body
map.put("this", this)

Not even a language construct, just a magic variable. You can even shadow it.
Anticlimactic? I hope so. The whole point of object is that it's just another view on function.

I could end the post here but I want to show why this is awesome(and therefore why I'm proud of it)

Privates

You don't even need access modifiers, you can use shadowing and nesting

func private(n) {
    that = this
    func (){ 
        func get() { that.n }
        func set(v) { that.n = v }
        this
    }()
}

This is a constructor that returns an object with functions get and set(but no n!). Outer object is available through its alias via nesting. And returned value is the last expression - an immediately invoked lambda(cumbersome syntax..gotta do something about that).
However, this is not bullet-proof

o = private(1)
o.that = func(){
           n=3
           this
         }
o.get()

Returns 3. Though knowledge of implementation is needed to execute such attack.

update: I kinda sorta forgot to include how I made that dot-access thingy-o.get() to work. Continuation below

next: using objects

Wednesday, September 26, 2012

I don't like banks

Image via CrunchBase

This kinda doesn't belong on this blog but I need a way to vent.

I wanted to buy a certain laptop. But no Slovenian company is selling that specific model. Or anything like it. Guenstiger to the rescue - luckily I had German classes in high school and I remembered enough to get by.

So I find this company(computeruniverse) who has it by good price, would ship it to me and even offers a (mostly translated) english page. Awesome. Not to the painful part - payment. I usually do business online through PayPal and on rare occasions directly with my credit card. But no can do. Limit exceeded. I quickly fire up my online banking portal but I can't order a higher limit because I don't have a premium account(I refuse to pay my bank for something that's mandatory today).

The bank

So I make a trip to the bank. Luckily it's just 15 min away using public transportation. First time being there(I moved to Ljubljana for college less than a year ago) I'm kinda impressed. It looks modern and stylish. And best of all, a good-looking well-dressed young lady sits down next to me to service me. I think "This is what a bank should look like". She fills out the application for me and is really nice over all. Props to you lady, sadly I forgot your name.
But here comes the first strike: she told me my limit will be raised starting next week - in 4 days. Reason: I opened my account at another branch. I'm like "wtf? it's the same bank?" but okay...I can hold another four days.
Let me make this clear. I explicitly told her I want this for one-time online payment.

Four days later

I try paying with PayPal. Failure. Oh heck...let's try credit card. Limit exceeded. They didn't raise my limit for single transaction. Strike two.

I'm tired of credit cards and waiting games. Luckily computeruniverse accepts advance payment by wire transfer. I do a bit of research and choose to trust them. Another trip to the bank. Grrr.

I get there, a bit angry and ask the teller to do the transaction and give her the details on my smartphone. She gives me a patronizing look and a form to fill out. Now listen to this. I needed to copy the details from my phone to the form - same data in many places(banks had high replication and eventual consistency before NoSQL :P). And she then typed the data into the computer. Putting my form into the drawer. And PRINTS out the same data on another form and asks me to sign it. Now I'm puzzled. WHY IN THE WORLD I HAD TO COPY THE DATA FROM THE PHONE TO THE FORM IF YOU COULD JUST TYPE AND LOOK AT MY PHONE!?!??!?!

Strike three.

Aftermath

I paid for my laptop successfully. But I also made an account at another bank. They gave me electronic banking for free and I'm in process of transferring my income to that account.

disclaimer: the bank at fault is NKBM. And for the record, my father has their online banking and it's quite bad.

Tuesday, September 25, 2012

Making a programming language: Part 6 - functions

Illustration of Function (mathematics). (Photo credit: Wikipedia)

Table of contents, Whole project on github

Long overdue, I'm finally writing about the most interesting part - user defined functions. Objects should be in the next post as they are a natural extension of what I'm about to do. And because I'm to lazy to write a post that long.

What's a function?

A function is something that takes parameters and returns a result. And I'm opting for side-effects as this is simpler to get something working that doing the whole referential transparency and IO monad(Haskell). Although it would be interesting to have sort of dynamically typed Haskell thingy...maybe in the future.

So - side-effect-ful functions mean function body is a block of expressions, not just an expression, if I'm to do useful side-effects. This also means I want lexical scope.

Body scope

Let's think about that. A function body needs a scope. It's parent scope is kinda obvious - it's the scope in which the function was defined. I want closures! It's local scope gets kinda tricky. I implemented read-only cascading and shadowing on write. If expressions can also have side-effects. So in different executions a parent variable may be shadowed or not. This means I cannot reuse the body scope, as the shadows need to be cleaned out(I'm considering an implementation that reuses it currently, but that's another story). As I'm not after performance I can simply create a new scope for each invocation of the function.

Parameters can be implemented as regular variables pre-put into the local scope before executing the function body.

Parsing

That was the hardest part. I had quite some problems in my grammar as I tried to introduce blocks. Mostly ambiguity and infinite recursion. I'll just post the interesting bits here - see the full commit if you're interested in details.

The function definition boils down to:

private def exprList: Parser[List[Expression]] = repsep(expr, "\\n+".
private def block: Parser[List[Expression]] = 
  """\{\n*""".r ~> exprList <~ """\n*\}""".
private def functionDef: Parser[FunctionDef] =
  "func" ~> identifier ~ ("(" ~> repsep(identifier, ",") <~ ")") ~ block ^^ {
    case id ~ args ~ body => FunctionDef(id, args, body)
  }

Oh yes, and since I use newlines as separators now, they aren't whitespace and I have to handle them explicitly.

override protected val whiteSpace = """[ \t\x0B\f\r]""".r

And later on I added lambdas which have optional identifier - so the only shcange is opt(identifier) in the parser.

Evaluation

It's just another node in the AST - a case class.

case class FunctionDef(name: Identifier, args: List[Identifier], body: List[Expression]) extends Expression

Now I needed something to execute the definition and create a function value in the enclosing scope. (this is in Evaluator class)

def createFunFromAst(arglist: List[Identifier], 
  body: List[Expression], scope: SScope): FunctionVarArg =
    (args: Any) => args match {
      case lst: List[Any] => {
        if (lst.length != arglist.length) {
          throw new ScratInvalidTypeError(
           "expected " + arglist.length + " arguments, but got " + lst.length)
        } else {
          val closure = new SScope(Some(scope))
          (arglist zip lst) foreach {
            t => closure.put(t._1.id, t._2)
          }
          apply(body)(closure)
        }
      }
      case other => throw new ScratInvalidTypeError(
        "expected list of arguments but got" + other)
    }

So, what am I doing here? I'm taking a part of the function definition(all except name - which is optional in the definition and not needed here) and the parent scope and then returning "FunctionVarArg" which is the same as native functions in standard library. This new function relies heavily on scala's closures(this would not be possible in this way in java!). First it checks if got a list of arguments(case clauses) or it throws an exception. Then it checks the arity. Scrat is dynamically typed, but not sooo dynamically typed. If everything matches up it creates a new scope("closure"), and inserts key-value pairs for arguments(zip+foreach). And then it evaluates it's body - apply(body)(closure). Mind you, this happens on every execution as createFunFromAst return a function value that, upon execution, does this.
Oh yes, there is also a case clause in Evaluator's apply that invokes createFunFromAst, again trivial.
Such functions are indistinguishable to native functions from scrat's point of view and are invoked by same syntax and by same code in Evaluator.

A sample

First thing I tried to implement(literaly) was fibonaci's sequence

func fib(n) { if n==0 then 1 else if n==1 then 1 else fib(n-1) + fib(n-2) }
println("20th fibbonacci number is", fib(20))

Excuse me for the ugly nested if's, but this was neccessary as I have not implemented < yet. But hey, it works.

Sneak peak

At this point I realised an awesome way to implement objects. With constructors like:

func create(n){this}

next objects and costructors

Friday, September 21, 2012

Power sets

It all started out when a friend of mine(check him out) told me a story about someone having an interview at Google. He was live coding and asked to implement a function that computes a power set of a given set. He totaly over-engineered it and after an hour of fiddling came up with 4-liner in python. Ok. Big deal. How hard can it be? I immediately started to brainstorm a solution of my own(and Matevž helped). Paraphrased train of thought below.

So what is a power set? Say a power set of set A. It's a set of all subsets of A. It's all possible combinations of including and excluding single elements. So it's like applying filters to the original set. Wait a sec, a set of size n has 2 to the n-th power subsets. It all matches up. N-bit number are said filters. So you count from 0 to 2^n-1 you enumerate exactly all the filters and therefore all the subsets. You just calculated the power set. This took about 30sec in real time.

Another minute to sketch the implementation.
You implement set as Lists. Let's say Java is the language of choice. An
y reasonable input would be shorter than 64 elements, at least on current machinery. So long can be used as counter. And now a nested for loop to iterate over elements and appending them if counter && (1 << n). Simple.

Won't post the code since I never wrote it.

The original blogpost

This happened in a lab at Faculty for Computer and Information science. On my way home I started thinkking....what is that 4-line implementation? It can't be my algorithm, that won't fit - it just isn't that elegant. Matevž also told me who was the original author and it was someone whose blog I was subscribed to(I still am, it's pretty awesome, check it out). So I searched for the original blogpost. And found it. It wasn't four lines. And it was(as Swizec admits) wrong.

def powerset(set):
      binaries = [bin(a) for a in range(2^len(set))]
      power = []
      for yeses in binaries:
        subset = []
 yeses = str(yeses)
 for i in range(len(yeses)):
                 if yeses[i] == “1”:
          subset.append(set[i])
 power.append(subset)
      return power

But more importantly, it's essentially my algorithm. (And it didn't took him an hour, read the post, it's interesting)

And then I realised. This is ugly. Quite ugly. It may be efficient and allow for doing iteration and evaluating it lazily, but oh boy it's ugly. Can't it be done in a simpler way?

Recursive solution

So by asking how I compute a power set I obtained the imperative algorithm above.

Balanced tree (Photo credit: Wikipedia)

Time to change the question. What is a power set?

It's a set of leaves of a balanced binary decision tree. Every inner node represents a decision to either take or leave out the element at respective index(I assume elements are indexed). That is root node responds to first element, it's children to second element, etc. Every leaf can be computed by following the decision path. It's a recursive structure. So the tree can be decomposed to left and right subtree and recombined. An this is repeated until you get to leaves. So a power set of A is an union of the power set of "A minus the first element" and the same power set again but with every subset added the first element. Subtrees are represented by the two powersets and the recombination by adding head element to one set of sets and then doing the union. And to stop the recursion: power set of an empty set is a set containing just an empty set.

My attempt to formally express what I just written:

$A = \{a_{1}, a_2{2}, ... , a_{n}\} \newline \newline P(A) = \left\{\begin{matrix} \{ \{\} \} & ;& A=\{\}\\ P(A-a_{1}) \bigcup \{ s \bigcup a_{1} | s \epsilon P(A-a_{1})\} & ; & otherwise \end{matrix}\right.$

Now let's do that in code, my language of choice now is scala

def power[A](set: List[A]): List[List[A]] = set match {
    case Nil => List(Nil)
    case head::tail => power(tail) flatMap (set => List(set, head::set))
}

Yay just four lines. and the last one barely matters. But I can do even better. The same code in haskell

power [] = [[]]
power (head:tail) = power tail >>= \set ->[set, head:set]

Both code snippets don't comply exactly to the equation. I swapped in flat-map instead of union. It gets rid of repetition(but I don't know the math notation, sadly). So you apply the lambda that creates both sides of the union to your list and then flatten it to produce the same result.

This code may look a bit cryptic at first sight but once you parse the syntax you can grasp the intention much more quickly than the imperative version. At least I can. Recursion is quite nice once you get used to it.

Thursday, September 27, 2012

Making a programming language: Part 8 - going faster

Research

Conversion and results

Related articles

Making a programming language: Part 7a - objects

Privates

Related articles

Wednesday, September 26, 2012

I don't like banks

The bank

Four days later

Aftermath

Related articles

Tuesday, September 25, 2012

Making a programming language: Part 6 - functions

What's a function?

Body scope

Parsing

Evaluation

A sample

Sneak peak

Related articles

Friday, September 21, 2012

Power sets

The original blogpost

Recursive solution

Related articles