Monday, November 5, 2012

Cool Monday: Hindley-Milner on a dynamic language

So I'm getting into type theory. Slowly. Note to self: read a proper book on this topic. I'm getting familiar with it through some practical applications. Namely scala and haskell. 
That same discussion about design patterns also included dynamic vs static typing. And I asked twitter about it. HairyFotr link this amazing talk about type inference to me. Basically there are two conclusions to be drawn
  • Every static typed language should have at least limited type inference. It's compiler's job to do so and quite trivial to implement.
  • Properly done static typed language provides all features the that dynamic typed languages can. Safely.
As I'm (still) implementing a language that happens to be dynamic(because I was too lazy to look-up how to do type checking) second point interests me more. 

Can I turn my language into a static one without changing syntax(adding type annotations) and losing features? That would be awesome!
After a day of thinking, answer seems to be YES!

Global type inference in a nutshell

So I want a dynamic-like syntax(no types anywhere). Good news is I don't have nominal types, so I can use inference to get structural types. 
Java, C# and many other mainstream languages use nominal typing at the level of the vm. This means that type A is a subtype of type B precisely when name of A is a subtype of name ob B. For example 
interface Foo{
    void method(int a);
}


interface Bar{
    void method(int b);
}
If you have a method that takes in an instance of Foo, you cannot pass an instance of Bar. Because Bar isn't subtype of Foo. Even though they are structurally the same. Fun fact: because JVM is designed like this, scala cannot have global type inference.
See, global inference(or "type reconstruction") looks at usages and reconstructs properties and structure.  Another example
a = b.c + 1
That would be legal scrat code(or python, ruby or many other things). You take b's property named c, add one to it and assign the result to a. So b must have property c. That's the first structural requirement. From the + operator you can see that this c must be numeric. And from this follows that a is also numeric as it's the result of this computation.
If this were a body of a method and b it's parameter, type requirement for be would be(in made up syntax) b: { c: Number } - object with member of Number type. But that doesn't give you no class names.
So why is this global inference? It goes through the whole block of code(usually a function body) and puts in stub types where it doesn't have enough info and then solves the system of requirements and substitutes back.

Possible problems

First of all, I have objects. This means I have to reconstruct object structure. This shouldn't be too bad. And you can quite easily figure out that A is subtype of B if set of requirements for A is a superset of requirements for B. 
Then there's "mutable types". I concluded that following code should be illegal
a = 1
a = "two"
as type of a should remain the same as in first assignment. This lets you reason about the code much more. But there's a hidden mutability. I use regular functions as object constructors returning a keyword "this" that evaluates to current scope. But throughout the body you still access this scope and it's type(it's an object after all) changes with every new (first) assignment and function definition. But this should be tracked through all possible code paths. Only problem is an if expression. Type of an if(and it's side-effects) can only be common super type of both then and else branch - an intersection of requirement sets. In a dynamic language you can reason about conditions and conclude when something should definitely be in scope and use it. Automatic reasoning about conditions? This could turn out tricky. Perhaps in later implementation.
An there's third an final problem(that I can see). Infinite types. I have not seen a practical usage but it doesn't work and that bothers me. Dynamic code in scrat just works, but same code translated to haskell yields a compiler error - "can't instantiate infinite type". But apparently infinite types can be detected, so maybe I can find a way to present them and it will compile. More about this soon.

Conclusion

There are some problems but I believe I can make it work. It would be super awesome to have a language that feels dynamic but gives you all benefits of static typing. Compilers should do the hard work after all! And they should be capable of inferring general enough types that all correct programs type check.
Or am I missing some important aspect that works only with dynamic types?

Otherwise...code coming soon.
Enhanced by Zemanta

12 comments:

  1. Try the Go programming language. It's known to be a statically typed that feels like a dynamic one.

    ReplyDelete
    Replies
    1. I might give t a try. Looking at the samples at golang it seems to me that go has only local type inference. You still need type parameters for function parameters. Anyway, this is about learning. But if there exists production ready language with such feature I would gladly use it

      Delete
  2. BTW, have you tried the Dynamic type in scala? The rules for implementation look funny at first, but you can then write obj.something, and decide what to do, based on what the string "something" is...

    But honestly, I haven't found a real use for it yet :)

    ReplyDelete
    Replies
    1. I heard dynamic can be used as glue with dynamic languages. And quickly concluded it must be just sugar for Map. Perhaps it need to look into it

      Delete
  3. You might be interested in RPython, the toolchain the pypy developers have made to compile their python interpreters to C and JVM and .net bytecode. It uses global type inference, but it is really, really slow, but it is dynamic and it is python (as in: any rpython program will run just fine in the interpreter) and they have the added bonus that it can generate a JIT compiler for you.

    However, it's by no means meant for general purpose programming. It is way too painful for that :) It's good enough for writing interpreters of dynamic languages, though.

    ReplyDelete
    Replies
    1. NICE!
      but it's a subset. i'm going for full language. still, this is quite close to what i'd like

      Delete
  4. I recommend two papers:
    Pluggable Types: http://bracha.org/pluggable-types.pdf

    Gradual Typing for Objects: http://ecee.colorado.edu/~siek/gradual-obj.pdf

    ReplyDelete
    Replies
    1. thanks for that. i definitely need to do some reading

      Delete
  5. >I concluded that following code should be illegal
    >a = 1
    >a = "two"
    > as type of a should remain the same as in first assignment.

    That doesn't need to be that way - and if it were, you would just have found something that wouldn't give you your second conclusion (from the beginning of your text): Then, a dynamic language would be most definitely more powerful than your static language (as you can plainly see).

    What it could do instead (and people usually do) is make the type of a a "sum type".

    So
    a = 1

    a :: Int provisional type

    a = "hello"

    a :: (Int|String) corrected provisional type

    So it uses (Int|String) also for the first assignment, and the second assignment. Problem solved.

    >Type of an if(and it's side-effects) can only be common super type of both then and else branch - an intersection of requirement sets.

    The type should be a sum (union) of the type of the "then" expression and the type of the "else" expression.

    ReplyDelete
    Replies
    1. i was looking at this through the eyes of scala hacker..but i see your point. union types are nice. but the problem is you must then preform tests in order to access data inside such variables. even in a dynamic lanugage. if something can be of different types, you're just asking for trouble if you don't preform a test

      Delete
  6. "because JVM is designed like this, scala cannot have global type inference": This is not true, I believe. It is up to the language to choose type inference, not because of JVM. For example, Frege, a Haskell-like language, compiles to JVM.

    ReplyDelete
    Replies
    1. there was a debate about this on reddit.
      jvm has an internal type system. a nominal one. you can do hacks do work around this but you loose java interop.

      Delete