Hindley-Milner type inference in Scala

I’m working on a spreadsheet application at the moment. (Very exciting.) Part of the implementation obviously includes an expression language (so you can write things like total = sum(numbers) or vat = price × 17.5%).

Part of the design is to disallow things like "text" ÷ 11 or apples + oranges, and for that I need a type system.

So I’m investigating type systems, and rules for inferring types, and I’m looking at the algorithm they call Hindley-Milner type inference. I found an implementation of the algorithm in Perl by Nikita Borisov. This was in turn based on a Modula-2 implementation described in a paper by Luca Cardelli, Basic Polymorphic Typechecking (1987/’88). Given that I read maths only very painfully and slowly, it’s a very clear and readable paper.

I have reimplemented the algorithm in Scala (the language I’m using for my application).

Because Scala is itself a statically-typed language, some of the logic becomes clearer than the Perl version (for example, it is obvious where type variables are expected as opposed to type terms). Scala is also somewhat syntactically lighter than Perl, and a lot lighter and more expressive than Modula-2, so you may find it easier to read too.

The essential algorithm is elegant: given an expression in the form of an abstract syntax tree (AST), it recursively creates a tree of types in the expression, inserting placeholder ‘type variables’ for all the unknowns. It then ‘unifies’ sub-types, for example, ensuring that a function call’s result type is the same as the function definition’s result type. The final unification creates the most general type tree possible which accurately captures the expression type. The final unification may include still-unbound type variables, which would indicate that the expression is polymorphic in these type variables.

My code is available for download in the hope that others find it as useful as I found Luca Cardelli’s paper and Nikita Borisov’s Perl implementation:

http://dysphoria.net/code/hindley-milner/HindleyMilner.scala

You can run it as a script to see it analyse some example expressions: scala HindleyMilner.scala

Note: I believe that there was a mistake in the original Perl code; when unifying two variables, it tried to ensure that generic type variables were always bound to non-generic ones, not the other way about. This was in order to satisfy the requirement “In unifying a non-generic variable to a term, all the type variables contained in that term become non-generic.” However, it does not matter the order in which they are bound. Once the ‘bindee’ is further bound to a term, they both become bound to the same term. The original code omitted a call to prune in the method occursintype which (I believe) lead to a fault, for which the mistaken ‘fix’ was added.

Update 15 Sep 2013: Now compatible with Scala 2.10!

17 thoughts on “Hindley-Milner type inference in Scala

  1. Christian Klauser

    Hi,
    I would like to thank you for publishing your implementation of the hindley-milner algorithm. It has been a great help!

    I’ve been playing around with the idea of using type-inference myself for a while now but couldn’t make sense of Milner’s original paper, in particular I didn’t understand the unification step. (Even though it is actually quite simple in an impure environment)

    I have since implemented my own version in C# (and yes, it is a pain to work without tuples, type inference, pattern matching and discriminated unions) and am now working “more advanced” features like pattern matching and case expressions.

  2. Andrew Post author

    Glad to be of help, Christian.

    Yeah, took me a while (and the act of debugging my own implementation) before I started to really understand it.

    Cheers,

    –A

  3. Peter Verswyvelen

    Hi Andrew & Christian,

    Great work!

    I’m also trying to understand HM type inference from a pragmatic point of view.

    I only know Scala theoretically, but I have strong skills in C#.

    So Christian, are you also sharing your C# implementation?

    Thanks,
    Peter

  4. Bill Hails

    Hi.
    I’m also investigating HM type inference, trying to code it into a language I’m developing as a hobby project. I’m a little concerned that the “prune” operation figures in the algorithm at all to be honest. Pruning logic variables is an optimisation step, and if it affects the outcome in any way then there is a flaw somewhere else in the code.

    In the Perl from Nikita Borisov, I *think* the flaw is in the direct assignment to a logic variable instance:

    $type1->{instance} = $type2;

    on line 271 in the code I found. Assigning to a logic variable is best done only by a logic variable method that can protect its own integrity.

    I’m also a bit disappointed having read the Cardelli paper (my maths is probably worse than yours) to find that the algorithm requires a distinction between let and lambda. I’m already firmly in the Scheme camp where “let x = y in z” gets rewritten to “fn (x) {z} (y)” (i.e. let is just lambda) before the AST is even built.

    Still, thanks very much for your implementation, I was getting hung up on the Perl one, and any thoughts about this?

  5. Andrew Post author

    Hi Bill,

    It’s been a while since I’ve looked at this, but I think I’m right in saying:

    The core type inference algorithm certainly does not require a distinction between let and lamda. That is: you’re perfectly free to add in a rewrite step which transforms ‘let’ into ‘lambda’, and remove ‘let’ handling by the type inferencer completely. The included AST is just a little toy language to show off that the type inference works.

    Line 271 of the Perl code implements unifies a type variable with its inferred value, so it’s quite essential :)

    You could wrap it in an setter method if you like (is that what you meant?), but that doesn’t change the behaviour of the algorithm.

    If more generally you mean that you’d rather have an implementation which only used immutable objects, and did not have the mutation which you point out on line 271… that too is possible. As Christian Klauser points out above, it’s trickier in a pure functional environment, but quite possible; you just have to maintain a map of type variables to their bound types. The Cardelli paper calls this the ‘set of assumptions’ (page 13). However, the Perl implementation, and my Scala implementation, mutate the type variable objects instead of maintaining a separate map.

    Hope this helps!

    –A

  6. Bill Hails

    Hi Andrew

    > Line 271 of the Perl code implements unifies a type variable with its
    > inferred value, so it’s quite essential :)

    Absolutely agree, but… :-)

    > You could wrap it in an setter method if you like (is that what you
    > meant?), but that doesn’t change the behaviour of the algorithm.

    Yes that’s what I meant. My real argument is that it’s dangerous *not* to wrap the assignment to a logic variable in a setter method.

    In the standard(?) implementation of Logic Variables, they arrange themselves in chains. For instance assume some language with LVs:

    # create an empty LV
    declare A;
    # result A->(empty)

    # create another empty LV
    declare B;
    # result: B->(empty)

    # unify A with B
    A = B;
    # result: A->B->(empty)

    # unify A with 1
    A = 1;
    # result: A->B->1

    The point then is that naive assignment directly to A here would not achieve the desired result, instead you would end up breaking the association of A and B giving

    A->1
    B->(empty)

    The setter method must chain down the list from A to B until it finds a pointer to (empty) and assign there.

    “Pruning” as I understand it is just that action of chaining down the list, but it is presented in the Perl algorithm at least as being a separate step from the assignment. So if as in the perl case it was forgotten, we get the bug you detected.

  7. Andrew Post author

    Bill,

    I see what you’re saying.

    Though in actual fact, the missed call to prune was in ‘occursintype’ which is used in a couple of places, notably ‘fresh’ (which creates new type variables from generic ones if necessary), not just in ‘unify’.

    The way the algorithm works, the assignment to ‘instance’ should never be performed on anything other than an unbound type variable. So your setter method could just assert(instance == null); instance = newValue;. The ‘Type’ class is already tightly bound to the type inference algorithm, so you might as well implement ‘unify’ and ‘prune’ (and possibly some other functions) as methods on Type, instead of exposing ‘instance’ at all.

    Cheers,

    –Andrew

  8. Andrew Post author

    Actually, let me retract that: ‘Type’ is not necessary coupled to the type inference algorithm, so, yes, a ‘bind’ method, which checks that the variable is not already bound (or which walks the list to the last unbound variable in the chain) before assigning ‘instance’ would make sense.

  9. Christian Klauser

    I keep my source code in a repository at assembla:

    http://www.assembla.com/wiki/show/tincal

    It is not a working compiler and a lot of things are missing. I haven’t settled on how to encode product and sum types which in turn determines how “value deconstructors” will be implemented. Those in turn are necessary for pattern matching.

    At the moment, it is a personal project that I work on from time to time. Don’t know how far I’ll get.

  10. Dima

    Hi Andrew,

    I tried to run ur code using eclipse under windows. I got the following error

    java.lang.NoClassDefFoundError: HindleyMilner

    Can you help me regarding this issue. Besides does the code enables inputting an expression and getting its type as output.
    Regards

  11. Andrew Post author

    Hi Dima,

    ’Fraid I can’t help; I don’t use Eclipse. Have you tried it from the command-line? Perhaps Eclipse is confused because it is written as a script (see line 304)…?

    The code outputs the type of an expression, given an abstract syntax tree representing that expression. However, it does not parse string expressions to create the AST, if that’s what you’re asking. So you can’t run it on arbitrary expressions from the command-line, or anything like that.

    Hope that helps,

    –Andrew

  12. davips

    How hard would it be to convert it to Scala 2.10.x?
    Is the line below correct ?
    var _nextVariableName = ‘α';

  13. Andrew Post author

    Hi there. Should be easy enough to convert to Scala 2.10, but I leave that as an exercise for the reader :)

    The weird line you spotted (and there are a few others) was a result of the source code being delivered with the wrong character encoding. (Should be UTF-8 of course.) I’ve fixed it. The line you mentioned should read:

    var _nextVariableName = 'α';

    If it still looks wrong in your web browser, try choosing a ‘Text Encoding’ of ‘UTF-8’ in the browser.

  14. davips

    Scala 2.10.0 code:
    I just had to change two lines to add [Type]:
    val Integer = Oper(“int”, Array[Type]())
    val Bool = Oper(“bool”, Array[Type]())

  15. Andrew Post author

    Thanks Davi. Took me rather longer than it should have, but I’ve modified the code to be Scala 2.10-friendly.

    Rather than changing Array() to Array[Type], I instead changed it to Seq(), which I think is a bit neater.

  16. Pingback: A Hindley-Milner type inference implementation in OCaml | Kwang Yul Seo

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>