Removing nulls from Scala, some thoughts

I’ve written one or two small pieces of software in the new upstart programming language Scala, and I love it. It takes all of what is good and right about Java and C#, removes a lot of the cruft, and introduces powerful new bits from modern functional languages. It’s a pleasure to write in.

Unfortunately, one bit of cruft from C#/Java is still there: the concept of ‘null’, a value which can legally be assigned to any reference type, but which causes an exception if you try to dereference it. It’s an ugly carbuncle on the type system, but, for compatibility reasons, it has never yet been removed.

Here I present one way of ridding the Scala world of nulls—whilst remaining compatible and efficient. I wish for World Peace and for this to be implemented in Scala 3…

[If you don’t care about programming, type systems and language implementation, I heartily suggest you skip this article. I’ll review a film soon. Promise.]

Note: I’ve since thought a bit more about this, and submitted it to the Scala Improvement Process. You can read my current draft. It’s refined a bit from this one, particularly as regards arrays. (Also it pays more attention to variable initialisation which, it turns out, is particularly important for this sort of thing.) However, the version below is chattier and probably explains my line of thinking a bit better.

In Java (and similar languages), any reference type may hold the value ‘null’. ‘Null’ means ‘no object referred to’, and any attempt to dereference ‘null’ causes an error, a NullPointerException (or NullReferenceException in .NET*). ‘Null’ is used for two purposes:

  1. To provide a default value for uninitialised variables;
  2. To act as a ‘marker’ for missing values when the value is optional.

Scala provides a better, type-safe mechanism for the latter purpose, called Option (with subtypes ‘Some’, for present values, and ‘None’, for missing ones). Idiomatic Scala code uses Option instead of testing for null values. Unfortunately, Java code still uses nulls, and since Scala strives to be compatible with Java, nulls (and consequently NullPointerExceptions) are still present in Scala.
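Here is a small sketch of the contrast in today’s Scala. It uses java.util.HashMap, whose get returns null for a missing key, and the standard Option.apply, which maps null to None and any other value to Some. The object name NullVsOption is purely illustrative.

```scala
import java.util.{HashMap => JHashMap}

object NullVsOption {
  private val ages = new JHashMap[String, Integer]()
  ages.put("alice", Int.box(42))

  // Java style: a missing key comes back as null, and nothing forces the caller to check.
  def ageJavaStyle(name: String): Integer = ages.get(name)

  // Scala style: Option(...) turns null into None, so the caller must handle absence.
  def ageScalaStyle(name: String): Option[Int] = Option(ages.get(name)).map(_.intValue)
}
```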

My main thesis here is that null can be replaced with None, in all cases, throughout Scala, whilst maintaining Java compatibility, Scala compatibility and performance.

* Note that when I talk about Java, and the Java Virtual Machine (JVM), below, my arguments apply equally to .NET and the Common Language Runtime (CLR). They are very similar. However, one important difference between the CLR and the JVM is that the CLR reifies parameterised types and the JVM erases them. Another key difference is that .NET already has a convention for representing and boxing nullable primitive types. I have tried to ensure that the proposal below accommodates both systems.

In brief

Nulls are bad. So we get rid of them. If I declare a variable v of type T, this means that the variable is never ‘null’, and always contains a valid reference to a T. If we want to declare that an expression sometimes has a value of type T, and sometimes has no value, we must use the type Option[T].

Any hitherto nullable values passed to and from Java now become Options, allowing Scala code still to interface cleanly and type-safely with Java.

The advantages:

  1. No NullPointerExceptions (NullReferenceExceptions on .NET). We’ve removed a whole chunk of type unsafety (‘type dangerousness’?) from the language.

Disadvantages:

  1. Hold on a minute… all reference values passed to and from Java are now Options? This makes the Java APIs hideously unwieldy.
  2. Contents of (yet) unassigned instance variables (of reference types) now become undefined. We quite liked the old idea of integer variables being initialised to zero and so on, but what value does an unassigned reference variable carry now? It cannot be null, so what is its value?
  3. More worryingly, contents of newly initialised arrays (of a reference type) now become undefined. Are arrays to be limited to containing Options? In which case what about arrays of basic types, like ints?

We’ll address all of these problems below, but first we look at a side effect of getting rid of null: we can now use the underlying virtual machine ‘null’ to represent the Option value None.

Internal representation

For reference types, Option[T] is now represented at the virtual machine level by an ordinary (nullable) reference to type T. Some(x) is now represented at the JVM level by (an ordinary reference to) x, and None represented by JVM null.

This means that…

  1. To other JVM languages, a Scala method which accepts an Option[String], say, would appear to accept a (nullable) java.lang.String reference, and thus be usable by non-Scala code (without that code needing to know about Option types);
  2. Option[AnyRef] is encoded as efficiently in the virtual machine as a traditional, nullable reference type.
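Something close to this encoding can already be approximated in current Scala with a value class that wraps a possibly-null reference. The sketch below defines a hypothetical NullOpt type; it is not part of any library. An empty NullOpt holds JVM null, and a full one holds the value itself. Wherever the compiler can avoid boxing the value class, no extra Some object is allocated. The sketch handles reference types only.

```scala
object NullOptSketch {
  // Hypothetical Option-like type whose only field is a possibly-null reference.
  // Extending AnyVal lets the compiler use the bare reference in many positions.
  final class NullOpt[+A <: AnyRef](val orNull: A) extends AnyVal {
    def isEmpty: Boolean = orNull eq null
    def isDefined: Boolean = !isEmpty
    def get: A =
      if (isEmpty) throw new NoSuchElementException("NullOpt.empty.get") else orNull
    def getOrElse[B >: A](fallback: => B): B = if (isEmpty) fallback else orNull
    def map[B <: AnyRef](f: A => B): NullOpt[B] =
      if (isEmpty) NullOpt.empty[B] else new NullOpt(f(orNull))
    def toOption: Option[A] = Option(orNull)
  }

  object NullOpt {
    // "None" is represented by a JVM null...
    def empty[A <: AnyRef]: NullOpt[A] = new NullOpt[A](null.asInstanceOf[A])
    // ...and "Some(x)" by x itself; a null argument is treated as empty.
    def apply[A <: AnyRef](a: A): NullOpt[A] = new NullOpt(a)
  }
}
```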

Boxed Option[primitive] types (i.e., Options of primitives cast to ‘Any’) would be represented as nullable references to the boxed type (as with other reference types). This is how Nullable<primitive> values are boxed in .NET, and it would also achieve good compatibility with Java.

Unboxed Option[primitive] types, Option[P], could be represented as above (as a nullable reference to the boxed primitive), or as a pair of type (P, Boolean). On .NET, Option[primitive] would explicitly be implemented as System.Nullable<primitive>.
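Both primitive representations can be sketched in ordinary Scala. The helpers below are hypothetical. boxOpt and unboxOpt show the boxed encoding, where None is null and Some(n) is a java.lang.Integer. toPair and fromPair show the (P, Boolean) encoding. The Tuple2 is only for illustration: a compiler using this encoding would presumably keep the value and the flag in two separate primitive slots rather than allocate a tuple.

```scala
object PrimitiveOptions {
  // Boxed encoding: None <-> null, Some(n) <-> a java.lang.Integer holding n.
  def boxOpt(o: Option[Int]): java.lang.Integer = o.map(Int.box).orNull
  def unboxOpt(i: java.lang.Integer): Option[Int] = Option(i).map(_.intValue)

  // Unboxed (P, Boolean) encoding: the Int slot is meaningless when the flag is false.
  def toPair(o: Option[Int]): (Int, Boolean) = o.fold((0, false))(n => (n, true))
  def fromPair(p: (Int, Boolean)): Option[Int] = if (p._2) Some(p._1) else None
}
```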

For Option[Option[T]] (i.e., nested Options), the outermost ‘Option’ would be represented as a nullable reference (as with the other types), and inner Option arguments would be represented by explicit JVM ‘Option’/‘Some’/‘None’ objects. This is essential so that we can distinguish Some(None) from None.
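The need for explicit inner objects can be checked with today’s Option. In the hypothetical sketch below, encodeOuter applies the proposed encoding to the outermost layer only (None becomes null, Some(x) becomes x) using the standard orNull, and decodeOuter reverses it with Option.apply. Because the inner None stays an ordinary object, Some(None) and None remain distinguishable. If the inner layer were also mapped to null, the two would collide.

```scala
object NestedEncoding {
  // Outermost layer only: None becomes JVM null, Some(x) becomes x itself.
  def encodeOuter(o: Option[AnyRef]): AnyRef = o.orNull

  // Inverse: null becomes None, any other reference becomes Some(reference).
  def decodeOuter(raw: AnyRef): Option[AnyRef] = Option(raw)
}
```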

Given the foregoing, and the fact that (for reference types) Option[T] is represented identically to T—the only difference between them being the static type—we have essentially performed another level of type erasure (in addition to that performed by the generic type system).

// Under the proposed encoding (these assertions do not hold in today’s Scala):
val s = "string"
val opt1: AnyRef = Some(s)
assert(s eq opt1)
assert(s == opt1)
assert(opt1.isInstanceOf[String])
assert(opt1.isInstanceOf[Option[String]])
// But nested Options are represented differently, so:
assert(!opt1.isInstanceOf[Option[Option[String]]])

However, we’re already living with regular Java type erasure, so Scala programmers have already learned that isInstanceOf doesn’t necessarily return what you might expect from a naïve analysis of the static types.
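For instance, in today’s Scala a type test on a generic type checks only the erased class. The type argument is not checked at runtime (scalac reports the test as unchecked), so a list of integers passes a test for List[String]:

```scala
object ErasureToday {
  val numbers: Any = List(1, 2, 3)
  // Only the List class survives erasure, so this is true; the compiler warns the test is unchecked.
  val passesAsStrings: Boolean = numbers.isInstanceOf[List[String]]
}
```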

Default values

We introduce the idea of a default value for a type. For integers, the default is 0; for booleans, false; for floating point numbers, 0.0.

…For strings it should be the empty string; for sets, the empty set; for lists, Nil; for Options, None.

The default value is an immutable ‘zero’ for the type, which may be safely shared by many references.

Some types will never have defaults: in particular, types for which a default value would be expensive to construct, would be mutable, or would simply not make sense. Font, for example, or File, would be unlikely to have default values.

Getting the default value

The language runtime and user code are able to obtain the default value for any type which defines one. We add a function to Predef, default[T], which returns the default value of T, and a method defaultIfAny[T], which returns an Option[T] containing the default if T has one (and None otherwise).
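One way to approximate default[T] and defaultIfAny[T] in current Scala is a type class. This is a swapped-in technique, not the companion-object mechanism proposed below, and the Default trait and its instances are hypothetical names. defaultIfAny relies on an implicit parameter with a default argument: the compiler uses that default when no Default[T] instance is found.

```scala
// Hypothetical type class meaning "T has a default value".
trait Default[T] { def value: T }

object Default {
  def instance[T](v: T): Default[T] = new Default[T] { def value: T = v }

  // Instances for some of the types listed above.
  implicit val intDefault: Default[Int] = instance(0)
  implicit val booleanDefault: Default[Boolean] = instance(false)
  implicit val doubleDefault: Default[Double] = instance(0.0)
  implicit val stringDefault: Default[String] = instance("")
  implicit def listDefault[A]: Default[List[A]] = instance(Nil)
  implicit def setDefault[A]: Default[Set[A]] = instance(Set.empty[A])
  implicit def optionDefault[A]: Default[Option[A]] = instance(None)
}

object DefaultOps {
  // Compiles only when T has a Default instance.
  def default[T](implicit d: Default[T]): T = d.value

  // Falls back to the null default argument when no instance exists, and reports None.
  def defaultIfAny[T](implicit d: Default[T] = null): Option[T] = Option(d).map(_.value)
}
```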

DefaultOption

We add a new type DefaultOption[T] with a similar interface to Option[T]. It wraps a value, v of type T. It also itself has a default value, Default. DefaultOption has an accessor, as Option does, called get, which returns the wrapped value, or its default, or throws a NoSuchElementException if there is no wrapped object and no default.

DefaultOption may be implemented directly as the underlying type in the JVM. It kind of ‘gives back’ nulls for situations, like array initialisation, which demand them. Its important difference from Option is that for types with default values, DefaultOption is never empty.
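Below is a minimal sketch of the DefaultOption accessor semantics. It assumes the element type’s default (if any) is supplied explicitly as an Option, rather than looked up by the runtime. The class shape and the fallback parameter are hypothetical.

```scala
// Hypothetical DefaultOption: `wrapped` is the stored value,
// `fallback` stands in for default[T] (None if T has no default).
final case class DefaultOption[T](wrapped: Option[T], fallback: Option[T]) {
  // Never empty for types with a default; empty only with neither a value nor a default.
  def isEmpty: Boolean = wrapped.isEmpty && fallback.isEmpty

  // Returns the wrapped value, else the default, else throws NoSuchElementException.
  def get: T = wrapped.orElse(fallback).getOrElse(
    throw new NoSuchElementException("DefaultOption has no value and no default"))
}
```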

Defining defaults

Defaults for user-defined types are not required to remove nulls from Scala, but they do erase one significant difference between the built-in primitive types and types defined by the user.

How is the default specified for a user-defined class? The best mechanism I can come up with is a public val on the companion object to the class/trait, of the correct type, called ‘default’. For example:

// Companion object for the ‘Set’ trait.
object Set {
   val default: Set[Nothing] = Set() // Default value
   …
}

If your class is generic, this limits the default to being parameterised with a single set of type arguments (probably Nothing). However, given that this practically limits the programmer to implementing default values as immutable ‘zeros’, this may not be a bad thing.

Defining defaults (alternative)

Alternatively, the default value could be specified as an annotation on the type definition:

// The ‘Set’ trait, annotated with its default value.
@default(Set())
trait Set[A] {
   …
}

This does seem cleaner (though I’m not sure whether annotations can be used in this way).

Initialising instance variables

In Scala, all local variables must be assigned a value when declared, so although we have removed nulls, there is no problem proving that all local variables always have a value.

However, instance variables need not be assigned a value when declared, and this causes a problem. Reference types which do not declare a default value have no value to which unassigned instance variables of that type may safely be set.

The solution is simple:

  1. If an instance variable is of a type with a default, it need not be explicitly assigned a value;
  2. If an instance variable is of a type without a default, it does need to be explicitly assigned a value.

This may entail changing existing code. For example, a class with a mixture of ‘defaultable’ and ‘non-default’ types might be changed from:

import java.io.File

class BufferedThing(size: Int) {
   var buffer = new Array[Byte](size)
   var posn: Int = _    // defaults to 0
   var name: String = _ // defaults to null
   var file: File = _   // defaults to null
}

to

class BufferedThing(size: Int) {
   var buffer = new Array[Byte](size) // Already initialised to a value.
   var posn: Int = _          // Still defaults to 0.
   var name: String = _       // Now defaults to "".
   var file: Option[File] = _ // Defaults to None.
}
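In current Scala, the two initialisation rules can be approximated by having every field initialiser ask for a type-class default. HasDefault and initial are hypothetical names used for illustration. Fields of defaultable types initialise from their default. A field of a type with no instance, such as java.io.File, fails to compile unless it is given an explicit value or wrapped in Option.

```scala
import java.io.File

// Hypothetical evidence that T has a default value.
final case class HasDefault[T](value: T)

object HasDefault {
  implicit val int: HasDefault[Int] = HasDefault(0)
  implicit val string: HasDefault[String] = HasDefault("")
  implicit def option[A]: HasDefault[Option[A]] = HasDefault(None)
}

object Init {
  // Only compiles when a HasDefault[T] instance exists.
  def initial[T](implicit d: HasDefault[T]): T = d.value
}

class BufferedThing(size: Int) {
  import Init.initial
  var buffer = new Array[Byte](size)
  var posn: Int = initial[Int]                   // 0
  var name: String = initial[String]             // ""
  var file: Option[File] = initial[Option[File]] // None
  // var raw: File = initial[File]               // would not compile: no HasDefault[File]
}
```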

Arrays

Arrays are an important systems building block. They are rarely used by application code, but they are essential in building the classes that many applications do use: buffers, queues, hashtables, images. Many of these—for example hashtables—are generic types, so the declaring class may not ‘know’ the ultimate type of the array.

If we disallow nulls, and if not all of our types have default values, with what value do we initialise a new Array[T](size)?

One possibility would be to require that all arrays have element type Option[SomeType]. However, this would cause a lot of problems and make a lot of people very angry. Array[Option[Byte]](1000000) would require (given most implementations) somewhat in excess of 5 megabytes, and would enforce (un)boxing on every element access. We never used to have a problem with primitive arrays, so why should they be punished by our new, null-free system?

Our requirements for arrays are:

  1. They be efficient, and not force us into boxing things which shouldn’t need to be boxed.
  2. They be compatible with Java (more or less).
  3. But they don’t have to be completely beautiful, since they are mostly used by people building other, more beautiful abstractions, like HashTables and StringBuffers on top.

We base our array implementation on the idea of storing types with default values.

For Array[TD], where the element type has a default value, the array appears to be initialised to that default. The JVM already initialises primitive values for us, and, for reference types with default values, the Scala runtime can provide the default value when the application looks up an otherwise ‘null’ array cell.

For Array[TND], where the element type has no default value, the array is initially ‘empty’. Calling Array.apply for an array index which hasn’t been assigned a value will throw a NoSuchElementException (rather than, as in the past, returning null). It’s as if the array is implemented as a collection of DefaultOptions.

The client code can call isDefinedAt(index: Int): Boolean to test whether that array index has a value. (For arrays of defaultable types, isDefinedAt returns true as long as the index is within the bounds of the array.)

Any implementation of a mutable collection type will want to clear unused array cells when elements are removed from the collection, to avoid accumulating uncollectable garbage. We add a new method, clear(index: Int): Unit which empties array cell ‘index’. For arrays of a defaultable type, TD, this has the same effect as setting cell ‘index’ to default[TD].
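These array semantics can be prototyped as a wrapper over an ordinary JVM array of references, which the JVM initialises to null. In the hypothetical SafeArray below, the fallback parameter stands in for default[T]. A null cell reads as the default when there is one and throws NoSuchElementException otherwise. clear simply writes null back, so the old element can be garbage collected. Primitive element types are left out, since the JVM already zero-initialises them.

```scala
import scala.reflect.ClassTag

// Hypothetical sketch of the proposed array semantics, for reference element types only.
final class SafeArray[T <: AnyRef : ClassTag](size: Int, fallback: Option[T]) {
  private val cells = new Array[T](size) // every cell starts as JVM null ("empty")

  def length: Int = cells.length

  // True for every in-bounds index of a defaultable type; otherwise only for assigned cells.
  def isDefinedAt(i: Int): Boolean =
    i >= 0 && i < length && (fallback.isDefined || (cells(i) ne null))

  // An empty cell yields the default, or throws if the element type has none.
  def apply(i: Int): T = {
    val v = cells(i) // out-of-range indices still throw ArrayIndexOutOfBoundsException
    if (v ne null) v
    else fallback.getOrElse(throw new NoSuchElementException(s"array cell $i is empty"))
  }

  def update(i: Int, v: T): Unit = {
    require(v ne null, "null is not a valid element")
    cells(i) = v
  }

  // Empties a cell so the old element can be garbage collected.
  def clear(i: Int): Unit = cells(i) = null.asInstanceOf[T]
}
```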

We also define an overload of Array[T].update which takes a DefaultOption[T] rather than just a T.

If the array-using code wants to explicitly create an array (of any type, including defaultable types) which may contain missing values, it should use Array[Option[T]].

In summary, the new array behaviour:

  1. Keeps current, efficient, array characteristics for native types,
  2. Allows implementors of collection classes to clear unused elements to allow objects to be garbage collected, but
  3. Forces the ‘null’ (now ‘empty’) test to occur at the earliest possible moment, array lookup.

Java and legacy Scala compatibility

One of Scala’s main selling points is its strong compatibility with existing Java code. In addition, the Scala language itself has been around for a few years now, and any proposed change must accommodate existing Scala code too.

On the other hand, we don’t want to bring nulls back into our newly null-free language just to accommodate legacy code.

Calling Null-Safe Scala code from other JVM languages

Scala code should never throw a NullPointerException, but, if called from legacy code which attempts to pass a null in as a reference parameter, the Scala runtime must do something: it throws an IllegalArgumentException (on .NET, ArgumentNullException).

That is to say: nulls (or at least, missing values) are checked at the edge of the Scala code, ensuring that the type-system guarantees are maintained within the code.

If Scala code declares a parameter of type Option, external code may pass in nulls. As described above, Options are mapped directly to (nullable) object references. A null in external code in this case maps to a Scala None.
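Written out by hand, the checks a compiler might insert at the Scala boundary could look like the sketch below (the object and method names are hypothetical). A parameter declared as a plain String gets a null check that throws IllegalArgumentException. A parameter declared as Option[String] would be a nullable String in the erased JVM signature under the proposal, so here it is modelled as a raw String converted with Option.apply.

```scala
object EdgeChecks {
  // Non-Option parameter: reject null at the boundary so the body can trust `name`.
  def greet(name: String): String = {
    if (name eq null) throw new IllegalArgumentException("name must not be null")
    s"Hello, $name"
  }

  // Option[String] parameter, modelled by its proposed erased form: a nullable String.
  def greetMaybe(rawName: String): String = Option(rawName) match {
    case Some(n) => s"Hello, $n"
    case None    => "Hello, stranger"
  }
}
```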

Scala instance fields are never directly exposed, but are controlled by accessor methods, so the same rules apply to setting instance variable values as apply to method parameters.

The last way that external code may attempt to pass ‘null’ to Scala is by setting elements of a shared array of a reference type to ‘null’. As implied above in the section on Arrays, this would cause one of two things to happen when Scala code attempts to dereference the modified array element:

  1. If the array is of a type with a default value (for example, it is an Array[String]), the Scala code will return the default value (in the example, “”);
  2. If the array is of a type without a default value (for example, it is an Array[StringBuffer]), the Scala code will throw a NoSuchElementException.

If a Scala method has a return type of Option[T], other JVM languages will see the return type as of reference type T, and a returned None value as null.

Calling other JVM code from Scala

Conversely, other JVM APIs which are written to take reference type T for a particular parameter, will appear to client Scala code to accept an Option[T].

We supply an implicit conversion in Predef to turn value v into Some(v), making passing parameters to legacy code fairly painless.

However, reference values returned from all legacy code will also appear to be Options. This means that all return values from Java code must be explicitly checked to ensure that they are not None. This could be unwieldy, especially when chaining Java API calls together, so we look at mechanisms to make this easier below.
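The hypothetical facade below sketches how a Java API might look from Scala under these rules. It wraps java.util.HashMap#get so that both the key parameter and the result are seen as Option[String]. The liftToSome conversion stands in for the proposed Predef implicit; it is not in today’s Predef. Option(...) performs the null check on the returned value.

```scala
import java.util.{HashMap => JHashMap}
import scala.language.implicitConversions

object JavaFacade {
  // Stand-in for the proposed Predef conversion from v to Some(v).
  implicit def liftToSome[A](a: A): Option[A] = Some(a)

  // java.util.HashMap#get viewed through the proposal: Option in, Option out.
  def lookup(m: JHashMap[String, String], key: Option[String]): Option[String] =
    Option(m.get(key.orNull))
}
```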

Calling precompiled ‘Nullable’ Scala from new Scala

Note that ‘legacy’ code here could include precompiled code from current (nullable) versions of Scala. However, current idiomatic Scala code avoids nulls in its public APIs anyway, preferring Option for optional parameters and for returned results.

For this reason, when interfacing to older, null-enabled code, unless told otherwise, the compiler would assume that the older code was ‘well behaved’, i.e., avoids nulls in its external interfaces.

Method parameters of older Scala code would appear to be Options or of ordinary (non-Option) types, as originally specified. Return values passed from older Scala code to newer code would be null-checked by the compiler (possibly throwing a NullPointerException if the older code fails to fulfil this contract and inadvertently returns null).

The user could indicate, via a compiler option, whether the older Scala code is ‘misbehaved’ with respect to null (i.e., accepts/returns nulls in its public interface), in which case it is treated similarly to non-Scala legacy code.

When passing Options (Some or None) to older Scala code, or receiving Option results, newer Scala code would supply or receive the parameters according to the legacy conventions, passing/expecting explicit Option objects instead of JVM nulls.

Calling source ‘Nullable’ Scala from new Scala

Legacy code which makes use of ‘null’ should still be compilable with the new compiler, via a compile-time switch. This way, legacy Scala code could be gradually migrated to the new, null-free world.

The only difference in semantics from code compiled using the old compiler would be that it would be impossible to distinguish at runtime between ‘null’ and ‘None’.

Making Java/.NET APIs more pleasant to use

All this is perfectly workable as described, but given that Scala prides itself on its JVM integration, it seems ugly to treat every reference accepted or returned by ‘foreign’ code as an Option.

For example, one would like to be able to write

val bounds = myComponent.getBounds()
val x = bounds.x
…

rather than

val bounds = myComponent.getBounds().get
val x = bounds.x
…

…since the API getBounds is defined to always return a (non-null) reference.

We would like a way to annotate external APIs as to whether they accept or return null values, and hence whether their parameters and returned results need to be exposed as Options. However, we cannot usually depend upon the writers of external APIs to provide this information.

We propose the following mechanisms, allowing API writers, Scala distributors and end programmers to provide the required information:

  1. There are several existing or proposed Java-level schemes to annotate nullable and non-null values, for example IntelliJ, FindBugs and JSR 305. The Scala compiler should recognise these annotations and treat parameters/returned values/fields as Options or non-Options as appropriate.
  2. As the Java API, for example, does not currently include such annotations, the Scala compiler should ship with built-in nullability rules for (the most-used parts of) the host API.
  3. The end programmer may, if they wish, provide nullability information for the particular Java/.NET API which they are using. This will be a lightweight, easy-to-type text format; it could even be generated semi-automatically by a Scala IDE plugin. The format would be the same as that used for Scala-vendor-provided nullness mark-up.

The compiler stacks up all this provided nullability information, and complains if any of it is contradictory.
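The stacking step is essentially a merge that reports conflicts. The minimal sketch below assumes each source (annotations, built-in rules, the programmer’s own file) has already been parsed into a map from a member key to a nullness verdict. The key format and all names are hypothetical.

```scala
// Hypothetical nullness verdicts for one parameter, return value or field.
sealed trait Nullness
case object MayBeNull extends Nullness
case object NeverNull extends Nullness

object NullnessDb {
  // Member key (format hypothetical) -> verdict, from one source of nullability information.
  type Db = Map[String, Nullness]

  // Stack all sources; any member given two different verdicts is reported as a conflict.
  def merge(sources: Seq[Db]): Either[List[String], Db] = {
    val verdicts: Map[String, Seq[Nullness]] =
      sources.flatMap(_.toSeq).groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).distinct }
    val conflicts = verdicts.collect { case (k, ns) if ns.size > 1 => k }.toList.sorted
    if (conflicts.nonEmpty) Left(conflicts)
    else Right(verdicts.map { case (k, ns) => k -> ns.head })
  }
}
```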

The whole point of this is to make it as easy and painless as possible to mark up external APIs as accepting/returning nulls. The programmer could mark up a little-used corner of the Java API in their own file. If this was later added to the built-in nullness information, the programmer could choose to discard their local version.

The Scala website, and possibly the sbaz tool, could provide a repository of nullness databases for common APIs, such as Hibernate and Spring.

At worst, if none of this information is provided, the Scala programmer has some additional Option dereferences to make when using an unannotated API.

Note that it is quite possible for the nullness information for a legacy API to be wrong, or for the external code to contain bugs causing it to return nulls when it should not. For this reason, the compiled Scala code tests every value received from external code and throws a NullPointerException at any point it receives a null value where the API was marked up to indicate ‘not null’.
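Each per-value check could amount to no more than the sketch below, which uses the standard java.util.Objects.requireNonNull(obj, message). The scenario is hypothetical: suppose a nullness database wrongly marked the return value of java.util.HashMap#get as never null. The wrapper then fails fast with a NullPointerException at the boundary, instead of letting the null travel into Scala code.

```scala
import java.util.Objects

object ReceivedValueChecks {
  // Return value marked "not null" (wrongly, in this scenario): check it as soon as it is received.
  def getMarkedNonNull(m: java.util.HashMap[String, String], key: String): String =
    Objects.requireNonNull(m.get(key), s"HashMap.get($key) returned null despite a not-null mark-up")
}
```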

Summary

We’ve explored a possible mechanism for removing ‘null’ values from the type system of Scala.

The mechanism:

  • Maintains strong compatibility with existing Java/.NET code;
  • Maintains Java-like performance for reference types, value types and arrays of either;
  • Requires no changes to Scala syntax, some small additions to the API and very minimal changes to existing Scala programs;
  • Allows gradual migration of existing code, and piecemeal mark-up of external APIs;
  • Trades some type erasure for these goals.

Now if only I had more time on my hands and more compiler knowledge I might attempt to implement it myself, but I hereby throw it out there in the hope that some other soul might attempt it.

Further ideas (syntax)

One form of syntax, used by the languages C# and Nice, is to annotate typenames with a question mark to indicate nullness. For example, in Nice, ?String means the same as (Scala) Option[String]; in C#, int? means the same as Nullable<int>, (which in Scala would be Option[Int]).

It would be possible (though certainly not required for this proposal) to allow ?T to mean the same as Option[T]. This could appreciably reduce source code size where Option is frequently used, and may make code easier to read.

As a counterargument, however, excessive use of Option is not necessarily a good thing, and the ‘?’ could increase code ‘noisiness’.

10 thoughts on “Removing nulls from Scala, some thoughts”

  1. Sergey

    I like the idea, and I think the language would really benefit from this change. You write that you submitted this to SIP, but why don’t I see it on the scala-lang.org site?

  2. Andrew Post author

    Hi Sergey,

    Thanks. It’s submitted (you can read the full draft SIP on this site), and hasn’t officially been rejected, so in principle it’s still under consideration.

    I doubt that they’ll go with my solution exactly as presented here. In particular, the Scala language designers are very conservative about making wide-ranging changes to the type system. There is an implementation of one non-Null mechanism currently in Scala, the trait NotNull (which you can mix in with other types, for example “String with NotNull”), however this is incomplete and not fully supported. Judging by the noises at the moment, some kind of non-Null support will eventually make its way into the language, but probably not until after 2.8.

    –Andrew

  3. Andrew Bate

    What ever happened to this? Monads (i.e. Option) are the number one reason why Scala is slow… Why was this idea rejected?

  4. Mechanical snail

    I didn’t see this on the current list of SIPs, so I filed a bug (https://github.com/scala/scala.github.com/issues/105) in case it vanished. The reply was: “That was submitted before the current SIP process. I believe Martin is actively entertaining the idea for Scala 2.10 now that we have value classes.

    I’m not sure exactly where Andrew may have posted his SIP or why it didn’t make the translation, as I wasn’t involved in the previous process. I’d encourage Andrew to re-submit an altered SIP taking into account already accepted SIPs.”

  5. Andrew Post author

    Thanks for your comments Andrew and, er, Mechanical. ’Fraid I had a bit of a backlog on blog comments, which I’m just clearing out now.

    Andrew, do you have any evidence that Option makes Scala ‘slow’? I know that boxing of primitive types causes some performance problems, as does java bytecode generated for ‘for’ loops… Perhaps that’s what you mean by Monads, since the ‘for’ loop is Monad-tastic?

    Mechanical Snail: It was submitted when SIPs were just getting started. Was discussed a bit at the time, and Martin Odersky pointed out some bits of it which he philosophically disagreed with. I’ve since realised how to simplify it, and address many (all?) of Martin’s objections, so perhaps I should revise it and resubmit.

  6. Andrew Bate

    At the time when I wrote my comment I was working on a piece of software that made heavy use of functions whose return value was Option[T] for some T. This had a high impact on performance in this situation, a slowdown of 10x. The ugly solution was to have methods that returned null in place of None and a non-null reference x in place of Some(x).
    The performance of the for loop is also frustratingly slow. I wish that some of the implementation work of http://code.google.com/p/scalacl/wiki/ScalaCLPlugin could be integrated into the scalac compiler (less the OpenCL-specific optimizations).
    In summary, I would love to see your (possibly updated) suggestion as a SIP – and get rid of those nulls!

  7. Andrew Bate

    I have just remembered, your suggestion would also remove the need for the @NotNull annotation, for which there has been very little progress (and nothing really working) since 2.5.

  8. Me

    This is a message and warning to others who find this page: please don’t use Scala. It is an abomination. While most experienced programmers understand why Scala sucks, we are seeing some novice programmers being taught to use Scala by academics who don’t work in the industry.

    “Unfortunately there is one of the bits of cruft from C#/Java which is still there: the concept of ‘null’—a value which can legally be assigned to any reference type, but which causes an exception if you try to dereference it. It’s an ugly carbuncle on the type system, but, for compatability reasons, it’s never yet been removed.”

    Wikipedia currently states:
    “A null pointer has a value reserved for indicating that the pointer does not refer to a valid object. Null pointers are routinely used to represent conditions such as the end of a list of unknown length or the failure to perform some action; this use of null pointers can be compared to nullable types and to the Nothing value in an option type.”

    Obviously someone from the Scala community has already edited Wikipedia, because “option types” are not used anywhere except Scala, and F#, also according to Wikipedia. Besides the fact that the author of this post confounds the concept of “null” with the specific concept of a “null pointer”, there is no difference between an option type representing a null pointer, and a null pointer, other than a lot of overhead and the belief that option types somehow escape all of the reasons for having a null pointer. If you have an invalid pointer, what is the difference between that and the option type? Perhaps next we should put asserts into production code?

    If you don’t like null pointers, get out of the industry. Programming is hard sometimes. Sometimes you have to deal with unexpected return values. And by sometimes, I mean always.

    Scala is the solution to a problem that no one’s ever had.

  9. Andrew Post author

    Gosh, you’re angry.

    Yes, Option types (the way they’re currently represented) have a runtime overhead compared to null pointer values. The whole point of this post was a suggestion to remove that overhead, and make them equivalent to nulls at runtime, but without losing the nice properties of the Option type.
