I’m working on a spreadsheet application at the moment. (Very exciting.) Part of the implementation obviously includes an expression language (so you can write things like total = sum(numbers)
or vat = price × 17.5%
).
Part of the design is to disallow things like "text" ÷ 11
or apples + oranges
, and for that I need a type system.
So I’m investigating type systems, and rules for inferring types, and I’m looking at the algorithm they call Hindley-Milner type inference. I found an implementation of the algorithm in Perl by Nikita Borisov. This was in turn based on a Modula-2 implementation described in a paper by Luca Cardelli, Basic Polymorphic Typechecking (1987/’88). Given that I read maths only very painfully and slowly, it’s a very clear and readable paper.
I have reimplemented the algorithm in Scala (the language I’m using for my application).
Because Scala is itself a statically-typed language, some of the logic becomes clearer than the Perl version (for example, it is obvious where type variables are expected as opposed to type terms). Scala is also somewhat syntactically lighter than Perl, and a lot lighter and more expressive than Modula-2, so you may find it easier to read too.
The essential algorithm is elegant: given an expression in the form of an abstract syntax tree (AST), it recursively creates a tree of types in the expression, inserting placeholder ‘type variables’ for all the unknowns. It then ‘unifies’ sub-types, for example, ensuring that a function call’s result type is the same as the function definition’s result type. The final unification creates the most general type tree possible which accurately captures the expression type. The final unification may include still-unbound type variables, which would indicate that the expression is polymorphic in these type variables.
My code is available for download in the hope that others find it as useful as I found Luca Cardelli’s paper and Nikita Borisov’s Perl implementation:
http://dysphoria.net/code/hindley-milner/HindleyMilner.scala
You can run it as a script to see it analyse some example expressions: scala HindleyMilner.scala
Note: I believe that there was a mistake in the original Perl code; when unifying two variables, it tried to ensure that generic type variables were always bound to non-generic ones, not the other way about. This was in order to satisfy the requirement “In unifying a non-generic variable to a term, all the type variables contained in that term become non-generic.” However, it does not matter the order in which they are bound. Once the ‘bindee’ is further bound to a term, they both become bound to the same term. The original code omitted a call to prune
in the method occursintype
which (I believe) lead to a fault, for which the mistaken ‘fix’ was added.
Update 15 Sep 2013: Now compatible with Scala 2.10!
Hi,
I would like to thank you for publishing your implementation of the hindley-milner algorithm. It has been a great help!
I’ve been playing around with the idea of using type-inference myself for a while now but couldn’t make sense of Milner’s original paper, in particular I didn’t understand the unification step. (Even though it is actually quite simple in an impure environment)
I have since implemented my own version in C# (and yes, it is a pain to work without tuples, type inference, pattern matching and discriminated unions) and am now working “more advanced” features like pattern matching and case expressions.
Glad to be of help, Christian.
Yeah, took me a while (and the act of debugging my own implementation) before I started to really understand it.
Cheers,
–A
Hi Andrew & Christian,
Great work!
I’m also trying to understand HM type inference from a pragmatic point of view.
I only know Scala theoretically, but I have strong skills in C#.
So Christian, are you also sharing your C# implementation?
Thanks,
Peter
Hi.
I’m also investigating HM type inference, trying to code it into a language I’m developing as a hobby project. I’m a little concerned that the “prune” operation figures in the algorithm at all to be honest. Pruning logic variables is an optimisation step, and if it affects the outcome in any way then there is a flaw somewhere else in the code.
In the Perl from Nikita Borisov, I *think* the flaw is in the direct assignment to a logic variable instance:
$type1->{instance} = $type2;
on line 271 in the code I found. Assigning to a logic variable is best done only by a logic variable method that can protect its own integrity.
I’m also a bit disappointed having read the Cardelli paper (my maths is probably worse than yours) to find that the algorithm requires a distinction between let and lambda. I’m already firmly in the Scheme camp where “let x = y in z” gets rewritten to “fn (x) {z} (y)” (i.e. let is just lambda) before the AST is even built.
Still, thanks very much for your implementation, I was getting hung up on the Perl one, and any thoughts about this?
Hi Bill,
It’s been a while since I’ve looked at this, but I think I’m right in saying:
The core type inference algorithm certainly does not require a distinction between let and lamda. That is: you’re perfectly free to add in a rewrite step which transforms ‘let’ into ‘lambda’, and remove ‘let’ handling by the type inferencer completely. The included AST is just a little toy language to show off that the type inference works.
Line 271 of the Perl code implements unifies a type variable with its inferred value, so it’s quite essential :)
You could wrap it in an setter method if you like (is that what you meant?), but that doesn’t change the behaviour of the algorithm.
If more generally you mean that you’d rather have an implementation which only used immutable objects, and did not have the mutation which you point out on line 271… that too is possible. As Christian Klauser points out above, it’s trickier in a pure functional environment, but quite possible; you just have to maintain a map of type variables to their bound types. The Cardelli paper calls this the ‘set of assumptions’ (page 13). However, the Perl implementation, and my Scala implementation, mutate the type variable objects instead of maintaining a separate map.
Hope this helps!
–A
Hi Andrew
> Line 271 of the Perl code implements unifies a type variable with its
> inferred value, so it’s quite essential :)
Absolutely agree, but… :-)
> You could wrap it in an setter method if you like (is that what you
> meant?), but that doesn’t change the behaviour of the algorithm.
Yes that’s what I meant. My real argument is that it’s dangerous *not* to wrap the assignment to a logic variable in a setter method.
In the standard(?) implementation of Logic Variables, they arrange themselves in chains. For instance assume some language with LVs:
# create an empty LV
declare A;
# result A->(empty)
# create another empty LV
declare B;
# result: B->(empty)
# unify A with B
A = B;
# result: A->B->(empty)
# unify A with 1
A = 1;
# result: A->B->1
The point then is that naive assignment directly to A here would not achieve the desired result, instead you would end up breaking the association of A and B giving
A->1
B->(empty)
The setter method must chain down the list from A to B until it finds a pointer to (empty) and assign there.
“Pruning” as I understand it is just that action of chaining down the list, but it is presented in the Perl algorithm at least as being a separate step from the assignment. So if as in the perl case it was forgotten, we get the bug you detected.
Bill,
I see what you’re saying.
Though in actual fact, the missed call to prune was in ‘occursintype’ which is used in a couple of places, notably ‘fresh’ (which creates new type variables from generic ones if necessary), not just in ‘unify’.
The way the algorithm works, the assignment to ‘instance’ should never be performed on anything other than an unbound type variable. So your setter method could just
assert(instance == null); instance = newValue;
. The ‘Type’ class is already tightly bound to the type inference algorithm, so you might as well implement ‘unify’ and ‘prune’ (and possibly some other functions) as methods on Type, instead of exposing ‘instance’ at all.Cheers,
–Andrew
Actually, let me retract that: ‘Type’ is not necessary coupled to the type inference algorithm, so, yes, a ‘bind’ method, which checks that the variable is not already bound (or which walks the list to the last unbound variable in the chain) before assigning ‘instance’ would make sense.
I keep my source code in a repository at assembla:
http://www.assembla.com/wiki/show/tincal
It is not a working compiler and a lot of things are missing. I haven’t settled on how to encode product and sum types which in turn determines how “value deconstructors” will be implemented. Those in turn are necessary for pattern matching.
At the moment, it is a personal project that I work on from time to time. Don’t know how far I’ll get.
Hi Andrew,
I tried to run ur code using eclipse under windows. I got the following error
java.lang.NoClassDefFoundError: HindleyMilner
Can you help me regarding this issue. Besides does the code enables inputting an expression and getting its type as output.
Regards
I reimplemented your Scala code in Python for a project I am working on.
http://www.smallshire.org.uk/sufficientlysmall/2010/04/11/a-hindley-milner-type-inference-implementation-in-python/
Thanks for providing a clean implementation in Scala for me to work from; it was much more comprehensible than the Modula-2 or Perl, even though I don’t know Scala and do know Perl.
Hi Dima,
’Fraid I can’t help; I don’t use Eclipse. Have you tried it from the command-line? Perhaps Eclipse is confused because it is written as a script (see line 304)…?
The code outputs the type of an expression, given an abstract syntax tree representing that expression. However, it does not parse string expressions to create the AST, if that’s what you’re asking. So you can’t run it on arbitrary expressions from the command-line, or anything like that.
Hope that helps,
–Andrew
How hard would it be to convert it to Scala 2.10.x?
Is the line below correct ?
var _nextVariableName = ‘α’;
Hi there. Should be easy enough to convert to Scala 2.10, but I leave that as an exercise for the reader :)
The weird line you spotted (and there are a few others) was a result of the source code being delivered with the wrong character encoding. (Should be UTF-8 of course.) I’ve fixed it. The line you mentioned should read:
If it still looks wrong in your web browser, try choosing a ‘Text Encoding’ of ‘UTF-8’ in the browser.
Scala 2.10.0 code:
I just had to change two lines to add [Type]:
val Integer = Oper(“int”, Array[Type]())
val Bool = Oper(“bool”, Array[Type]())
Thanks Davi. Took me rather longer than it should have, but I’ve modified the code to be Scala 2.10-friendly.
Rather than changing
Array()
toArray[Type]
, I instead changed it toSeq()
, which I think is a bit neater.Pingback: A Hindley-Milner type inference implementation in OCaml | Kwang Yul Seo
Pingback: JEP 286 を調べていたはずなのにいつのまにか Kotlin で Hindley-Milner 型推論を実装していた | IT技術情報局