I’m working on a spreadsheet application at the moment. (Very exciting.) Part of the implementation obviously includes an expression language (so you can write things like total = sum(numbers) or vat = price × 17.5%).
Part of the design is to disallow things like "text" ÷ 11 or apples + oranges, and for that I need a type system.
So I’m investigating type systems and rules for inferring types, and I’m looking at the algorithm known as Hindley-Milner type inference. I found an implementation of the algorithm in Perl by Nikita Borisov, which was in turn based on a Modula-2 implementation described in a paper by Luca Cardelli, Basic Polymorphic Typechecking (1987/’88). For someone like me, who reads maths only very painfully and slowly, it’s a very clear and readable paper.
I have reimplemented the algorithm in Scala (the language I’m using for my application).
Because Scala is itself a statically typed language, some of the logic becomes clearer than in the Perl version (for example, it is obvious where type variables are expected as opposed to type terms). Scala is also somewhat syntactically lighter than Perl, and a lot lighter and more expressive than Modula-2, so you may find it easier to read too.
The essential algorithm is elegant. Given an expression in the form of an abstract syntax tree (AST), it recursively builds a tree of the types in the expression, inserting placeholder ‘type variables’ for all the unknowns. It then ‘unifies’ sub-types. For example, it ensures that a function call’s result type is the same as the function definition’s result type. The final unification produces the most general type tree that accurately captures the expression’s type. That result may still contain unbound type variables, which indicates that the expression is polymorphic in those type variables.
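The following is a minimal Scala sketch that makes the description concrete. It follows the structure of Cardelli’s paper and is not a copy of my downloadable code. It makes several assumptions. The AST is a toy one (Ident, Apply, Lambda, Let), not the spreadsheet’s real expression tree. The two base types, Number and Text, are hypothetical stand-ins. The environment names (price, rate, label, times, divide) are illustrative only.

```scala
// Minimal Hindley-Milner sketch in the style of Cardelli's paper.
// Assumption: a toy AST and hypothetical base types Number and Text.

sealed trait Expr
case class Ident(name: String) extends Expr
case class Apply(fn: Expr, arg: Expr) extends Expr
case class Lambda(param: String, body: Expr) extends Expr
case class Let(name: String, defn: Expr, body: Expr) extends Expr

sealed trait Type
// Placeholder for an unknown type; `instance` is set once it is bound.
final class TypeVariable(val id: Int) extends Type { var instance: Option[Type] = None }
// A type constructor such as Number, Text or the function arrow "->".
final case class TypeOperator(name: String, args: List[Type]) extends Type

object HM {
  private var counter = 0
  def newVar(): TypeVariable = { counter += 1; new TypeVariable(counter) }
  def fn(from: Type, to: Type): Type = TypeOperator("->", List(from, to))
  val Num: Type = TypeOperator("Number", Nil)
  val Text: Type = TypeOperator("Text", Nil)

  // Follow chains of bound variables to the representative type.
  def prune(t: Type): Type = t match {
    case v: TypeVariable if v.instance.isDefined =>
      val p = prune(v.instance.get); v.instance = Some(p); p
    case _ => t
  }

  def occursIn(v: TypeVariable, t: Type): Boolean = prune(t) match {
    case w: TypeVariable => w eq v
    case TypeOperator(_, args) => args.exists(occursIn(v, _))
  }

  // Make two types equal by binding type variables; fail on a constructor clash.
  def unify(a: Type, b: Type): Unit = (prune(a), prune(b)) match {
    case (v: TypeVariable, t) =>
      if (!(v eq t)) {
        if (occursIn(v, t)) throw new Exception("recursive unification")
        v.instance = Some(t)
      }
    case (t, v: TypeVariable) => unify(v, t)
    case (TypeOperator(n1, as1), TypeOperator(n2, as2)) =>
      if (n1 != n2 || as1.length != as2.length)
        throw new Exception(s"type mismatch: $n1 vs $n2")
      as1.zip(as2).foreach { case (x, y) => unify(x, y) }
  }

  // Copy a type, giving fresh variables to the generic (let-bound) ones only.
  def instantiate(t: Type, nonGeneric: Set[TypeVariable]): Type = {
    val mapping = scala.collection.mutable.Map.empty[TypeVariable, TypeVariable]
    def copy(ty: Type): Type = prune(ty) match {
      case v: TypeVariable =>
        val generic = !nonGeneric.exists(ng => occursIn(v, ng))
        if (generic) mapping.getOrElseUpdate(v, newVar()) else v
      case TypeOperator(n, args) => TypeOperator(n, args.map(copy))
    }
    copy(t)
  }

  def analyse(e: Expr, env: Map[String, Type], nonGeneric: Set[TypeVariable] = Set.empty): Type =
    e match {
      case Ident(name) =>
        instantiate(env.getOrElse(name, throw new Exception(s"undefined symbol $name")), nonGeneric)
      case Apply(f, arg) =>
        val fType = analyse(f, env, nonGeneric)
        val argType = analyse(arg, env, nonGeneric)
        val result = newVar()
        unify(fn(argType, result), fType) // call's result type must match the function's
        result
      case Lambda(p, body) =>
        val pType = newVar() // lambda parameters are non-generic inside the body
        fn(pType, analyse(body, env + (p -> pType), nonGeneric + pType))
      case Let(name, defn, body) =>
        analyse(body, env + (name -> analyse(defn, env, nonGeneric)), nonGeneric)
    }

  // Print a type; only the arrow shows its arguments in this sketch.
  def show(t: Type): String = prune(t) match {
    case v: TypeVariable => "t" + v.id
    case TypeOperator("->", List(a, b)) => s"(${show(a)} -> ${show(b)})"
    case TypeOperator(n, _) => n
  }
}

import HM._
val env: Map[String, Type] = Map(
  "price" -> Num, "rate" -> Num, "label" -> Text,
  "times" -> fn(Num, fn(Num, Num)), "divide" -> fn(Num, fn(Num, Num)))

println(show(analyse(Apply(Apply(Ident("times"), Ident("price")), Ident("rate")), env))) // Number
println(show(analyse(Lambda("x", Ident("x")), env))) // polymorphic: both sides share one variable
try analyse(Apply(Apply(Ident("divide"), Ident("label")), Ident("price")), env)
catch { case e: Exception => println(e.getMessage) } // type mismatch: Text vs Number
```

The last call fails inside unify because Text and Number are different type operators, which is the kind of check that rules out "text" ÷ 11. The identity function keeps an unbound type variable on both sides of the arrow, so it is polymorphic. Generalisation follows Cardelli: a variable introduced by a lambda parameter is non-generic, so instantiate gives fresh copies only of the let-bound (generic) variables.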
My code is available for download in the hope that others find it as useful as I found Luca Cardelli’s paper and Nikita Borisov’s Perl implementation:
You can run it as a script to see it analyse some example expressions:
Note: I believe that there was a mistake in the original Perl code. When unifying two variables, it tried to ensure that generic type variables were always bound to non-generic ones, not the other way about. This was intended to satisfy the requirement “In unifying a non-generic variable to a term, all the type variables contained in that term become non-generic.” However, the order in which the two variables are bound does not matter: once the ‘bindee’ is further bound to a term, both become bound to the same term. The original code omitted a call to prune in the method occursintype, which (I believe) led to a fault, and the mistaken ‘fix’ was added to work around it.
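The following small sketch shows why the prune call matters in the occurs check. It assumes a minimal Cardelli-style representation: a TypeVariable with a mutable instance, and a TypeOperator for type constructors. occursInTypeNoPrune is a hypothetical faulty variant written only for comparison. It is not the original Perl code. In the sketch, b has already been unified with a, so the term b -> Number really means a -> Number.

```scala
// Occurs check with and without `prune`.
// Assumption: minimal Cardelli-style types; names are illustrative.

sealed trait Type
final class TypeVariable(val name: String) extends Type { var instance: Option[Type] = None }
final case class TypeOperator(name: String, args: List[Type]) extends Type

// Follow bound variables to the type they ultimately stand for.
def prune(t: Type): Type = t match {
  case v: TypeVariable if v.instance.isDefined => prune(v.instance.get)
  case _ => t
}

// Correct: prune first, so a variable already bound to `v` counts as `v`.
def occursInType(v: TypeVariable, t: Type): Boolean = prune(t) match {
  case w: TypeVariable => w eq v
  case TypeOperator(_, args) => args.exists(occursInType(v, _))
}

// Hypothetical faulty variant: compares variables by identity without following bindings.
def occursInTypeNoPrune(v: TypeVariable, t: Type): Boolean = t match {
  case w: TypeVariable => w eq v
  case TypeOperator(_, args) => args.exists(occursInTypeNoPrune(v, _))
}

val a = new TypeVariable("a")
val b = new TypeVariable("b")
b.instance = Some(a) // b has been unified with a
val term = TypeOperator("->", List(b, TypeOperator("Number", Nil))) // b -> Number, i.e. a -> Number

println(occursInType(a, term))        // true: binding a := term would be circular
println(occursInTypeNoPrune(a, term)) // false: the cycle slips through
```

Without the prune, unifying a with this term would succeed and create the circular type a = a -> Number. That is consistent with the kind of fault the ordering rule might have been masking.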
Update 15 Sep 2013: Now compatible with Scala 2.10!