Removing `null` from the language

I call it my billion-dollar mistake. It was the invention of the null reference in 1965. —Tony Hoare, 2009

Abstract

In Java (and similar languages), any reference type may hold the value null. null means ‘no object referred to’, and any attempt to dereference null causes an error (a NullPointerException, or a NullReferenceException in .NET). null is used for two purposes:

To provide a default value for uninitialised variables (and array elements) of reference types;
To mean ‘missing value’ in cases when a value is optional or not available.

Scala provides a better, type-safe mechanism for the latter purpose, called Option (with subtypes Some, for present values, and None for missing ones). Idiomatic Scala code uses Option instead of testing for null values. Unfortunately, in order to maintain interoperability with Java, Scala also supports null. Consequently Scala is also prone to NullPointerExceptions.

This proposal addresses removing null from Scala entirely. It does so while maintaining the ability to use Java code and to be called from Java code with minimal boilerplate, and while improving Scala’s runtime efficiency. It aims to require as few changes to the Scala language as possible.

Document conventions

For ease of exposition, the following conventional terms are used:

RefType: ‘RefType’ is used as a placeholder to mean any reference type, that is, in Scala a subclass of AnyRef, and in Java a subclass of java.lang.Object.
ValueType: ‘ValueType’ is used as a placeholder for any value type in Scala (and Java), that is, in Scala a subclass of AnyVal and in Java any primitive type.
ValueTypeExceptOption: A placeholder for any value type and not Option, None or Some. (Note that we discuss below making Option a subclass of AnyVal.)
Nullable reference: An ordinary object reference in Java or in the Java Virtual Machine, which is allowed to have the value null.

Note that the discussion here is centred on the JVM implementation of Scala. However, except where noted, it applies equally to the .NET CLR implementation. For ‘Java’, ‘JVM’ and ‘NullPointerException’, feel free to substitute ‘C#’, ‘CLR’ and ‘NullReferenceException’.

Background

The Scala type Option (and the object None) has much in common with nullable reference types (and the value null), to the extent that in the absence of an Option type, nullable references can in many cases be used instead (and in Java code are used instead). At the risk of stating the obvious:

Use case	`Option` (Scala)	nullable (Java)
Declare an optional string value.	`var aName: Option[String]`	`String aName;`
Assign a concrete value to a variable	`aName = Some("Some value")`	`aName = "Some value";`
Assign the absence of a value to a variable	`aName = None`	`aName = null;`

This equivalence is not complete. Options are richer in that:

They can contain values of all types, not just reference types;
They can be nested. In other words Some(Some(value)) is valid. Moreover Some(None) is distinct from None.
None is an ordinary object with methods. It does not throw an exception when dereferenced, (but null does).
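
The three differences listed above can be observed in present-day Scala. The following is a minimal sketch; the object and value names are illustrative only and are not part of the proposal.

```scala
object OptionVersusNull {
  // Options can wrap value types, which a nullable reference cannot do without boxing.
  val count: Option[Int] = Some(3)

  // Options nest: Some(None) and None are distinct values.
  val nested: Option[Option[String]] = Some(None)
  val missing: Option[Option[String]] = None

  // None is an ordinary object, so calling methods on it is safe.
  val noName: Option[String] = None
  val lengthOrZero: Int = noName.map(_.length).getOrElse(0)
}
```

A nullable Java reference has no way to distinguish the two states held by nested and missing.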

Specification

The Scala language

The keyword null and type Null are entirely removed from the Scala language. (Consequently, reference type variables and array elements no longer have a default value.)

Option is redefined as a value type (a subclass of AnyVal), with the default value None.

Further sections below address issues such as: initialisation of reference variables; array semantics; Java compatibility and migration of existing Scala code.

`Option` as a value type

We (re)define Option as a value type. This is so that:

The implementation of Option is not constrained by Java Object semantics, and does not need to maintain its object identity.
Options must have a value.
Option may have a default value (None, unsurprisingly).

Option is defined to have a boxed and an unboxed form. The unboxed form achieves compatibility with Java by representing an Option[RefType] as a nullable reference to RefType; the boxed form allows the additional rich semantics of the Scala Option type, and allows Options to be treated as normal objects including being cast to Any.

When an Option[RefType] is unboxed, None is represented as JVM null and Some(refVal) is represented as a plain JVM reference to refVal.
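
The mapping between the boxed and unboxed forms can be sketched in current Scala as a pair of conversions. This is only an illustration of the representation: the names box and unbox are hypothetical, and under the proposal the compiler would perform the conversion implicitly rather than through library calls.

```scala
object UnboxedOption {
  // Boxed -> unboxed: None becomes JVM null, Some(ref) becomes the bare reference.
  def unbox[T >: Null <: AnyRef](opt: Option[T]): T = opt match {
    case Some(ref) => ref
    case None      => null
  }

  // Unboxed -> boxed: JVM null becomes None, any other reference becomes Some(ref).
  def box[T <: AnyRef](ref: T): Option[T] = Option(ref)
}
```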

Additionally, in a CLR implementation, an Option[ValueTypeExceptOption] is unboxed as a System.Nullable<ValueTypeExceptOption>.

The unboxed representation of Option[ValueType] must have a JVM default value which is interpreted as None. This is important for array semantics (below). Otherwise, the unboxed representation of an Option[ValueType] is not specified here. Potentially it could consist of two slots, (ValueType, Boolean). In the case of nested Options, Option[Option[…[BaseValueType]…]], the representation could consist of (BaseValueType, Int). Alternatively, it could be represented as a nullable reference to the boxed ValueType.

When boxed, an Option is represented as an instance of the JVM class Some[T] or as the JVM object None.

The implementation is generally free to choose boxed or unboxed representations, depending upon ease of implementation or efficiency. However, an Option is always boxed when it is cast to a type higher than Option. Conversely, an Option is always unboxed when a) it contains a reference type (in a CLR implementation, any type other than Option) and b):

It is passed as a parameter to, or received as a result from a method which is callable outside the compilation unit.
It forms the element type of an array. That is, an Array[Option[String]] is represented as the JVM type [Ljava.lang.String; (an ordinary array of String).

Initialising instance variables (fields)

Because variables are no longer allowed to hold the value null, and because, as a consequence, reference type variables no longer have a default value, reference type fields must be initialised to a definite value by the program.

This requires that:

The constructor must assign a value to every non-abstract reference-type field (val and var);
During object construction, only assigned fields may be read by the constructor and by methods called by the constructor;
References to partially-constructed objects cannot be allowed to ‘escape’.

1. All variables must have an assigned value

Within a constructor (object/class body), every non-abstract field must be assigned an (initial) value. This rule is already enforced for ‘val’ variables, but we here extend it to ‘var’ variables too.

Variables need not explicitly be assigned a value; if the type has a default value, and if the program does not explicitly supply a value, it is implicitly assigned the default value for the type. Any such implicit assignments are considered to occur before the call to the superclass constructor.

All value types have a default value; no reference type does. [See Appendix B for a proposal to allow types other than value types to have default initial values.]
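
For comparison, this is how default initialisation behaves in Scala today, written with the Scala 2 `= _` syntax. The class and field names are illustrative. Under the rule above, the two value-type fields would still receive their defaults, whereas the reference-type field would have to be assigned explicitly.

```scala
object DefaultValues {
  class Counters {
    var hits: Int = _        // value type: default is 0
    var ratio: Double = _    // value type: default is 0.0
    var label: String = _    // reference type: null today; rejected under this proposal
  }
}
```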

2. Unassigned fields may not be read

Any expression called by the constructor (that is, variable values, or expressions which form part of any statement within the object/class body, and not within a ‘def’), may only refer to variables previously assigned a value.

Any constructor expression may only call methods which refer to variables already declared before the call, and which call other methods which refer to already-declared variables, (and so on recursively). This is complicated by the presence of subclasses which may override the methods. We outline a mechanism of enforcing this below.
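
The hazard that subclass overrides introduce can be reproduced in current Scala. In this sketch (class names are illustrative), the superclass constructor calls an overridable method, and the override reads a field which its own constructor has not yet assigned.

```scala
object EarlyRead {
  class Base {
    def describe: String = "base"
    // Calls an overridable method before any subclass constructor has run.
    val summary: String = "summary of " + describe
  }

  class Derived extends Base {
    val name: String = "derived"
    // Runs inside Base's constructor, before `name` has been assigned.
    override def describe: String = "name=" + name
  }
}
```

Here new EarlyRead.Derived().summary evaluates to "summary of name=null": the JVM null has leaked into ordinary Scala code, which is what the rules in this section are intended to rule out.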

The initialisation sequence of an object of class C with super-class B and super-super-class A is as follows:

C early definitions

B early definitions

A early definitions

A constructor

B constructor

C constructor

Early definitions are already restricted such that they may not refer to uninitialised fields, nor call any methods, (nor leak references to this).

We hereby change the language rules for constructors to also prevent them from reading uninitialised fields (and from leaking references to this), while maintaining some programmer flexibility, chiefly the ability to call (overridable) helper methods.

Note that in the diagram above, A’s constructor can safely read (at least) the variables initialised by all of the early definitions. B’s constructor can safely read (at least) the fields initialised by all of the early definitions, and all (concrete) fields inherited from A. C’s constructor can safely read (at least) the fields initialised by all of the early definitions, and all (concrete) fields inherited from A and B.

We declare two method annotations @inConstructor and @beforeConstructor(T: Class). The first indicates that a method is called from within the current class’s constructor, possibly after some fields have been assigned, but before others have been. The second indicates that the method may validly be called before the constructor for class T has started (and therefore refers to none of T’s fields, except those assigned in early definitions and by super-class constructors).

Expressions within the constructor for class T, and any methods annotated @inConstructor (for T) must obey the following rules:

They may only call methods which:
1. Have the annotation @beforeConstructor(classOf[U]) where U >: T, or
2. Have the annotation @inConstructor (for class U) where U is a strict superclass of T, or where U≣T and it can be statically proved that it only reads fields assigned before the first point at which the call is made, or
3. Are declared in the same template, are final and follow the rules given here, as if they had an @inConstructor annotation.
They may only read fields which were declared in an early definition, or were concretely declared in a strict superclass of T, or which are definitely assigned before the expression is evaluated, or the method called.
Any override of such a method must be annotated @beforeConstructor(classOf[V]) where V >: T. (That is, an override of the method is more restricted, as it should not need to know the order in which its super-class initialises its fields.)
They may not refer to this. (See next section.)

Any expressions within the call to the superclass constructor, and any methods annotated @beforeConstructor(classOf[T]) must obey the following rules:

They may only call methods which:
1. Have the annotation @beforeConstructor(classOf[U]) where U >: T, or
2. Have the annotation @inConstructor (for class U) where U is a strict superclass of T, or
3. Are declared in the same template, are final and follow the rules given here, as if they had a @beforeConstructor(classOf[T]) annotation.
They may read only fields which were declared in an early definition, or were concretely declared in a strict super-class of T.
Any override of such a method must also be annotated @beforeConstructor(classOf[V]) where V >: T.
They may not refer to this. (See next section.)

As a special case, if any abstract field in class T is read from a constructor expression in T, an expression within the call to T’s super-class constructor or from any method annotated @inConstructor for T or @beforeConstructor(classOf[T]), the first concrete implementation of the field must be assigned a value in an early definition. This could be achieved by annotating such abstract fields with a new annotation @assignEarly.
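
To make the rules concrete, here is a sketch of how the two annotations might be declared and applied. The declarations are assumptions: the proposal does not fix their package or exact signature, and today's compiler treats them as inert markers and enforces none of the rules.

```scala
import scala.annotation.StaticAnnotation

object ConstructorAnnotations {
  // Hypothetical declarations of the proposed annotations.
  class inConstructor extends StaticAnnotation
  class beforeConstructor(owner: Class[_]) extends StaticAnnotation

  class Shape {
    // Constructor expression: allowed to call defaultSides, because that method is
    // marked @beforeConstructor for Shape and so reads none of Shape's fields.
    val sides: Int = defaultSides

    @beforeConstructor(classOf[Shape])
    def defaultSides: Int = 4
  }

  class Triangle extends Shape {
    // An override must carry the annotation too, and may not read
    // fields assigned by the constructors of Shape or Triangle.
    @beforeConstructor(classOf[Shape])
    override def defaultSides: Int = 3
  }
}
```

Because the override reads no fields, the call made from Shape's constructor is safe even though Triangle's constructor has not yet run.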

3. Partially-constructed objects may not ‘escape’ the constructor

As a consequence of how the Java memory model treats constructors, constructors should avoid passing the object-under-construction to a scope visible to other threads. Otherwise they risk that other threads may see the object in a state other than the constructed state (for example, final fields may appear to change their value), thus violating class invariants. As we here specify an additional constraint, that reference variables never be JVM null, there is an additional risk with under-construction objects: that it becomes possible to observe the JVM null before it is replaced by the variable’s correct initial value.

Therefore we propose that passing the under-construction object outwith the scope of the class be disallowed. (Note that this change is likely to be the most onerous of those proposed here, and to require the most invasive changes to existing Scala programs.)

The constructor, and any method annotated with @inConstructor or @beforeConstructor, is disallowed from passing a reference to this to any method (or object constructor). This rule additionally disallows passing inner object or class instances which dereference their implicit reference to the class-under-construction.

Note that we allow references to closures and anonymous functions within the constructor (and special methods), so long as the closure body either a) follows the rules as regards referring to assigned variables, and not passing out this, or b) is not itself invoked from within the constructor.

Note that lazily-evaluated variables may be used to defer a ‘dangerous’ use of this until after the constructor has completed.
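
A minimal sketch of the lazy-variable technique follows. The registry and the Widget class are hypothetical stand-ins for any scope visible outside the class.

```scala
import scala.collection.mutable.ListBuffer

object LazyRegistration {
  // Hypothetical global registry of constructed widgets.
  val registry: ListBuffer[Widget] = ListBuffer.empty

  class Widget(val name: String) {
    // Not evaluated during construction; `this` is passed out only on first access.
    lazy val registered: Boolean = {
      registry += this
      true
    }
  }
}
```

Constructing a Widget leaves the registry untouched; the reference escapes only when registered is first read, by which time the constructor has completed.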

If an object needs to register a newly-constructed object, or store it in a global variable, this must be done after construction is complete. One possible pattern is:

class MyClass protected (/* args */) {
	// Perform construction here. Cannot pass out ‘this’ or store it
	// in global state. (Constructor is declared as ‘protected’ to prevent
	// clients from calling it directly.)

	protected def initialize: this.type = {
		// Perform any post-constructor initialisation here.
		// CAN pass references to ‘this’, or register UI listeners.
		this
	}
}

object MyClass {
	// Clients should call this method to construct MyClass instances
	// (instead of using ‘new’ directly):
	def apply(/* args */): MyClass = new MyClass(/* args */).initialize
}

…where new objects would be constructed by clients using the companion object— MyClass(/*args*/) —instead of using new and calling the constructor directly.

Importantly, the constructor and the initialize method may be overridden independently by subclasses. When initialize is called, the programmer knows that all of the constructors have run to completion.

It would be possible for the language to sweeten this construct with some syntactic sugar, but in the spirit of changing the language as little as possible, no such syntax is proposed here.

3a. Finalisation

The AnyRef.finalize method allows an object holding (usually native) resources to release these resources when the object is garbage collected.

As explained elsewhere, if a constructor throws an exception, a finalize method called on that object could see a partially constructed object. If the object has not been fully constructed, some of its reference type fields could be unassigned (null in the underlying representation). However, the finalize method may legitimately need to access some of these fields.

This implies that:

We must take special steps to ensure that the finalize method does not directly or indirectly throw JVM NullPointerExceptions, while somehow letting it inspect the entire state of the object so that it can perform its cleanup task; and
The finalize method may not allow the, possibly incompletely constructed, object to escape, in case it violates its class invariants. (In any case, it is regarded as bad form for the finalize method to resurrect an object once the garbage collector has marked it for disposal.)

The finalize method in AnyRef is annotated with @calledByFinalize. Methods with this annotation:

May only be overridden by a subclass if the overriding method is either called ‘finalize’, or explicitly declares a @calledByFinalize annotation.
May not pass this to other methods or constructors (because this may not be a properly constructed object). Consequently Scala objects are prevented from ‘resurrecting’ themselves after garbage collection.
May only call methods annotated @calledByFinalize.
Are prone to throwing exceptions on attempts to fetch uninitialised reference-type variables. A field access of an uninitialised reference field throws scala.UninitializedFieldError (not NullPointerException), and it throws the exception when the variable is fetched, (not when it is dereferenced).

We define a new method in Predef, getIfInitialized[T <: AnyRef](v: => T): Option[T]. This can be used to ‘wrap’ an access to a possibly-uninitialised reference field, v, returning None if it is uninitialised, and Some(v) if it is initialised. It could be (inefficiently) implemented as follows, (but would ideally be an intrinsic function):

def getIfInitialized[T <: AnyRef](v: => T): Option[T] = try {
	Some(v)
} catch {
	// Thrown when v fetches a reference field that has not yet been assigned
	case _: UninitializedFieldError => None
}

Subverting these safeguards

We note here that it is always possible to subvert these safeguards via Java subclasses or superclasses of Scala objects. For example, a Java subclass of a Scala class with a method annotated @inConstructor could override that method to call another, arbitrary method on the class, breaking the contract, and potentially causing a NullPointerException to be thrown from within Scala code.

Given that the JVM class-verifier does not enforce Scala rules, this is impossible to prevent. In addition, Java code can already subvert the type-soundness of other parts of the Scala language.

Array semantics

The section above outlines rules to ensure that every instance variable has a defined (non-null) value whenever it is dereferenced. However, these rules cannot ensure that array elements have defined values when dereferenced.

Some non-nullable language proposals have specified that arrays be limited to nullable (in our case, Option) types. However, this poses a problem for value types; we should be able to create an array of value types (not necessarily Array[Option[ValueType]]), and we should like to avoid different restrictions for value and reference type arrays.

Also, there is much existing code which uses arrays, including most implementations of collection classes. We should avoid changing array semantics excessively.

Hence we modify array semantics slightly for reference-type arrays, allowing arrays of reference types to have ‘missing’ values. That is, an array is redefined, from a function defined over all of the elements [0, length), to a function which is allowed to be undefined at some indices. Indeed, an array of reference type is initially undefined at all indices. (A missing array element is naturally modelled in the underlying JVM implementation as a null value.)

For value-type arrays, newly allocated arrays initialise each element to the default value for the type, as currently. (For the avoidance of doubt, value-type arrays never have missing values. A value-type array is a function defined at all of its indices.)

This rule for initialisation of value-type arrays extends to arrays of Options. As mentioned above, arrays of Option[RefType], store their elements unboxed. A new instance of such an array has all of its elements initialised to None. (This is implied by the JVM representation.)

The following Array[A] methods are introduced or redefined:

Array-type methods

apply(index:Int): A: Returns the element at index if there is a value in that array slot, that is, if isDefinedAt(index) would return true. If isDefinedAt(index) would return false, throw a NoSuchElementException.
update(index:Int, newValue:A): Unit: Sets element at index to value newValue. (Same as preexisting behaviour except that newValue may not be null.) Note that this implicitly marks array cell at index as containing a value.
isDefinedAt(index:Int):Boolean: True iff: index is ≥0 and <length, and if the representation does not have a JVM null at array slot index.
clear(index:Int): Unit: (New method.) Reset element at index to its default. For value-type arrays, sets the array slot to the default value for the type. (Specifically, for Option arrays, sets it to None.) For reference-type arrays, makes that array slot empty (such that isDefinedAt would return false). This method is necessary to allow implementers of collections classes to remove references to contained objects.
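
The four methods above can be emulated today by a small wrapper around a JVM array. This is a sketch under stated assumptions: PartialArray is a hypothetical name, it handles reference types only, and it uses JVM null internally to mark a missing element, as the text suggests.

```scala
object PartialArrays {
  // Hypothetical wrapper emulating the proposed reference-array semantics.
  final class PartialArray[A <: AnyRef](val length: Int) {
    private val slots = new Array[AnyRef](length)   // every slot starts out missing

    def isDefinedAt(index: Int): Boolean =
      index >= 0 && index < length && slots(index) != null

    def apply(index: Int): A =
      if (isDefinedAt(index)) slots(index).asInstanceOf[A]
      else throw new NoSuchElementException("no element at index " + index)

    def update(index: Int, newValue: A): Unit = {
      require(newValue != null, "newValue may not be null")
      slots(index) = newValue
    }

    // Makes the slot empty again, releasing the reference it held.
    def clear(index: Int): Unit = {
      slots(index) = null
    }

    def indices: Iterator[Int] = Iterator.range(0, length).filter(i => isDefinedAt(i))
    def elements: Iterator[A] = indices.map(i => apply(i))
  }
}
```

A collection class built on such an array would call clear when removing an element, so that the removed object can be garbage collected.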

Array collection-type methods

++[B >: A](that: Iterable[B])

Returns an array containing the defined elements from this Array plus the elements of that Iterable. This is the same result as would be returned by Array.concat(this, that). Note that undefined elements will have been removed from the resulting sequence, so elements in the returned array may not appear at the same index as they did in the original this Array. This behaviour is subtly different from that in old, nullable Scala.

elements: Iterator[A]

An iterator over all of the array elements containing values; that is, the elements at indices for which isDefinedAt would return true. In other words, for(e <- array.elements) yield e produces the same result as for(i <- array.indices) yield array(i).

indices: Iterator[Int]

An iterator over all of the array indices, in order, for which isDefinedAt would return true.

map[B](f: (A) ⇒ B): Array[B]

Returns the array resulting from applying the given function f to each element of this array. f will not be applied to undefined elements in the source array, hence undefined elements in the source array will be mapped to undefined or default elements in the destination array. It could be implemented as:

def map[B](f: (A) ⇒ B) = {
	val result = new Array[B](this.length)
	for(i <- this.indices)
		result(i) = f(this(i))
	result
}

Importantly, the result Array will have the same length as this array.

zip[B](that: Array[B]): Array[(A, B)]

Returns an array formed from this array and the specified array that by associating each element of the former with the element at the same position in the latter, ignoring pairs for which one or both values are undefined. Hence, could be written:

def zip[B](that: Array[B]) = {
	val result = new Array[(A, B)](Math.min(this.length, that.length))
	for(i <- this.indices if that.isDefinedAt(i))
		result(i) = (this(i), that(i))
	result
}

Importantly, the length of the result array is always the minimum of the lengths of the two source arrays, and the result contains missing elements wherever either of its two source arrays contained missing elements.
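
The zip behaviour can be checked against ordinary JVM arrays in current Scala, with null standing in for a missing element. The helper name zipDefined is hypothetical, and this is an emulation, not the proposed library code.

```scala
object ZipSketch {
  // Plain JVM arrays stand in for the proposed arrays; null models a missing element.
  def zipDefined[A <: AnyRef, B <: AnyRef](xs: Array[A], ys: Array[B]): Array[(A, B)] = {
    val result = new Array[(A, B)](math.min(xs.length, ys.length))
    // Only fill a slot when both sources have a value at that index.
    for (i <- result.indices if xs(i) != null && ys(i) != null)
      result(i) = (xs(i), ys(i))
    result
  }
}
```

Zipping a three-element array with a missing value at index 1 against a four-element array with a missing value at index 2 gives a three-element result in which only index 0 is defined.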

Java interoperability

Having removed null from the Scala language, a Scala method (or function) with the signature (R1) ⇒ R2 where R1 and R2 are reference types

Cannot be passed a null value for its argument, and
Cannot return a null result.

Moreover, when Scala code calls a Java method, R2 method(R1 p), where R1 and R2 are reference types, it will never pass a null argument, and cannot accept a null result.

As described above, nullable Java references to RefType may be treated as if they were of type Option[RefType], mapping Some(refVal) ↔ a non-null Java reference and None ↔ null.

However, Java methods may be specified as not accepting or not returning null (informally, in documentation, or more formally, using annotations such as @Nonnull). In this case we would like to be able to pass and receive non-nullable references to and from Java code without wrapping them in Options.

There are 4 cases:

Passing parameters from Scala to Java
Returning results from Scala to Java: See ‘Transferring values from Scala to Java’ below
Passing parameters from Java to Scala
Returning results from Java to Scala: See ‘Transferring values from Java to Scala’ below

Transferring values from Scala to Java

In the absence of any @Nonnull annotations, the Scala compiler shall assume that a Java method parameter of a reference type R is nullable, and is therefore represented in Scala as an Option[R].

For example, Java method

File[] listFiles(FileFilter filter)

will appear to Scala code to have the signature:

listFiles(filter: Option[FileFilter]): Option[Array[File]]
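
In current Scala the same view of the method can be written by hand, which may help to show what the compiler would do automatically. This wrapper is a sketch: the extra dir parameter and the object name are illustrative only.

```scala
import java.io.{File, FileFilter}

object ListFilesSketch {
  // Hand-written stand-in for the signature the compiler would synthesise.
  def listFiles(dir: File, filter: Option[FileFilter]): Option[Array[File]] = {
    val javaFilter: FileFilter = filter.orNull   // None is passed to Java as null
    Option(dir.listFiles(javaFilter))            // a null result comes back as None
  }
}
```

File.listFiles returns null when the path does not denote a directory, so that case surfaces in Scala as None rather than as a null array.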

An implicit conversion shall be provided in Predef from any reference type T to Option[T], to ease writing Scala code which calls Java code with nullable parameters.

implicit def anyRef2Option[T <: AnyRef](ref: T): Option[T] = Some(ref)

When Java code has a @Nonnull annotation on a method parameter, the parameter will appear to Scala code to be of an ordinary, reference type. The Scala compiler guarantees never to pass a null reference to such parameters. For example, the Java method:

void fillPolygon(@Nonnull Polygon p)

will appear to Scala to have the signature:

fillPolygon(p: Polygon): Unit

Conversely, Scala methods which return results of a type Option[RefType] appear to Java code to return ordinary nullable references of RefType. Java code will receive null when Scala returns None and will receive someValue when Scala returns Some(someValue).

Scala methods which return results of a type RefType also appear to Java code to return ordinary nullable references of RefType. However, the Scala method is guaranteed never to return null to the Java code.

Boxing of primitive options

Note that if Java accepts a nullable Object reference (which Scala interprets as Option[Any]), and Scala provides an Option[ValType] value, (where ValType may be a further Option), Scala type rules (specifically the fact that Option is covariant in its type parameter), imply that the Scala value be cast from an Option[ValType] to Option[Any], (which is represented in the JVM as a nullable reference to Any). Essentially in this case the contents of the Option must be boxed before passing them to Java code.

This corresponds to the .NET rules for boxing Nullable<ValueType> values; the inner ValueType value is boxed when casting to a (nullable) Object type.
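
The boxing step can be illustrated in current Scala, where widening an Option[Int] to Option[Any] and then viewing its contents as a Java Object yields a boxed java.lang.Integer. The names below are illustrative.

```scala
object BoxedOptionSketch {
  val maybeCount: Option[Int] = Some(42)

  // Covariance lets an Option[Int] be used where an Option[Any] is expected.
  val widened: Option[Any] = maybeCount

  // Handing the contents to Java as a nullable Object boxes the Int.
  val asJavaObject: AnyRef = widened.map(_.asInstanceOf[AnyRef]).orNull

  // A missing value is handed to Java as null.
  val missingAsJavaObject: AnyRef =
    (None: Option[Any]).map(_.asInstanceOf[AnyRef]).orNull
}
```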

Transferring values from Java to Scala

When a Scala method is defined as taking a parameter of type Option[RefType], it will appear to Java to take a normal nullable reference to RefType*. Java code may safely pass null, which will appear to the Scala code as None.

When a Scala method is defined as taking a parameter of type RefType, it will also appear, to Java code, to take a normal (nullable) reference to RefType. However, the Scala code cannot accept null values. The Scala compiler could annotate such parameters as @Nonnull, but it must emit null-checks for each such parameter at the top of the method. This null check throws a NullPointerException if the argument is JVM null.

Note that this null check could be omitted if the compiler can prove that the method is only ever called from Scala code (based on the visibility of the method and its class).

Note also that the requirement is that the potential NullPointerException is thrown before executing any Scala code in the method. The compiler can possibly optimise this away if all of the reference type parameters are dereferenced before the method does anything else.
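
Written out by hand, the check that the compiler would synthesise might look like the following sketch. The method greet is hypothetical, and the use of java.util.Objects.requireNonNull is one possible encoding rather than something the proposal prescribes.

```scala
import java.util.Objects

object NullCheckSketch {
  // What the compiler might emit for `def greet(name: String): String`.
  def greet(name: String): String = {
    // Synthesised check: throws NullPointerException before any user code runs.
    Objects.requireNonNull(name, "name")
    "Hello, " + name
  }
}
```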

*(An important implication of this is that a Scala object may not have two methods overloaded on the same name, whose parameters differ only in whether or not they take Option[RefType] or RefType—since such types are represented by identical JVM types. We address a possible way round this in Appendix C.)

In the absence of any @Nonnull annotation on the Java method, it can be assumed to return a nullable reference. Such a method appears to Scala code to return a value of type Option[RefType], which the Scala code likely pattern-matches to test for a value. For example, Java method

Object Dictionary.get(@Nonnull Object key)

appears to Scala code to have the signature

get(key: AnyRef): Option[AnyRef]

If the Java method is declared (via @Nonnull annotation) not to return null, the Scala compiler represents it as returning a plain reference. The Scala compiler must emit code to check the return result, either explicitly or implicitly (by ensuring that the result is immediately dereferenced), and throw a NullPointerException immediately if it receives null.
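
Both cases for Java results can be sketched by hand in current Scala. The helper names are hypothetical; under the proposal the compiler would insert the equivalent code at each call site.

```scala
import java.util.{Dictionary, Objects}

object JavaResultsSketch {
  // Un-annotated Java result: treated as nullable, so it is wrapped in an Option.
  def get(dict: Dictionary[AnyRef, AnyRef], key: AnyRef): Option[AnyRef] =
    Option(dict.get(key))

  // Result declared @Nonnull: kept as a plain reference, but checked on receipt.
  def checkedResult[T <: AnyRef](result: T): T =
    Objects.requireNonNull(result)
}
```

A lookup of an absent key then yields None, while a Java method that breaks its @Nonnull promise fails immediately at the boundary instead of at some later dereference.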

Accessing (Java) public fields

Java (unlike Scala) can expose public fields without accessor methods. Java fields of type RefType appear to Scala to be of (unboxed) type Option[RefType].

If they are marked as non-nullable, they are represented in Scala as being of type RefType. The Scala compiler must emit a null-check on attempts to read the field.

@Nonnull annotations for the benefit of Java

The Scala compiler shall mark all of its method parameters and return results for reference types as not-nullable, with annotations defined by JSR-305 (as soon as JSR-305 becomes finalised).

Other changes to Scala library

Other changes to the Scala library (except for Arrays) are expected to be minimal. In most cases, Scala modules are already written using Option instead of nullable types.

Annotating 3rd-Party Java code

As outlined in the specification, above, Java methods may be annotated to declare that they do not accept or return, null references.

Unfortunately there is no common standard for such annotations, and much existing Java code is not annotated in this way. However, annotations would greatly ease non-null Scala integration with Java code (and avoid an explosion of Options when interoperating with Java).

We outline a general mechanism here for applying such annotations to existing, compiled, 3rd-party Java code (including the standard Java libraries).

Choice of annotations

There are several candidate annotation schemes for marking Java fields, method parameters and method results as non-null. The most prominent appear to be:

IntelliJ IDEA
FindBugs
JSR-305

The general philosophy taken by this SIP is that as many kinds of non-null annotation as possible are accepted by the Scala compiler. The Scala compiler shall emit JSR-305 annotations—at least once that standard has been finalised. It could emit other annotations too, perhaps by a compiler flag.

IntelliJ IDEA

The JetBrains Java IDE, IntelliJ IDEA, can detect nullability issues through static analysis of Java code decorated with nullability annotations. The annotations are released under an Apache licence, and have been submitted to Sun for possible inclusion in the standard Java SDK. The annotations are in the Java package org.jetbrains.annotations.

@Nullable: Indicates that a type (of a field, method parameter or method result) may carry the value null. Assumed in the absence of any other annotation.
@NotNull: Indicates that a type may not carry the value null.

FindBugs

FindBugs is “a program which uses static analysis to look for bugs in Java code. It is free software, distributed under the terms of the Lesser GNU Public License.” It uses annotations to indicate nullability of fields, method parameters and method results. All annotations are in the Java package edu.umd.cs.findbugs.annotations.

@Nullable: Indicates that a type (of a field, method parameter or method result) may carry the value null. Assumed in the absence of any other annotation. “In practice this annotation is useful only for overriding an overarching NonNull annotation.”
@NonNull: Indicates that a type may not carry the value null.
@DefaultAnnotation: Indicates that all members of the class or package should be annotated with the default value of the supplied annotation classes. This would be used for behaviour annotations such as @NonNull, […]. In particular, you can use @DefaultAnnotation(NonNull.class) on a class or package, and then use @Nullable only on those parameters, methods or fields that you want to allow to be null.
@DefaultAnnotationForFields: This is same as the @DefaultAnnotation except it only applies to fields.
@DefaultAnnotationForMethods: This is same as the @DefaultAnnotation except it only applies to method [result]s.
@DefaultAnnotationForParameters: This is same as the @DefaultAnnotation except it only applies to method parameters.

JSR-305

“This JSR will work to develop standard annotations (such as @NonNull) that can be applied to Java programs to assist tools that detect software defects.”

The following annotations are applicable. They were interpreted from the current (as of 2 January 2009) SVN checkout of the JSR-305 source code. All annotations shown below are in the Java package javax.annotation.

@Nullable

Indicates that a type (of a field, method parameter or method result) may carry the value null. Assumed in lieu of any other annotation.

@Nonnull

Indicates that a type may not carry the value null.

@ParametersAreNonnullByDefault

Can be applied to a package, class or method to indicate that the method parameters in that element are non-null by default unless:

There is an explicit nullness annotation;
The method overrides a method in a superclass (in which case the annotation of the corresponding parameter in the superclass applies);
There is a default parameter annotation applied to a more tightly nested element.

@ParametersAreNullableByDefault

Can be applied to a package, class or method to indicate that the method parameters in that element are nullable by default (with analogous exceptions to those noted for @ParametersAreNonnullByDefault).

This annotation implies the same ‘nullness’ as no annotation. However, it is different from having no annotation, as it is inherited and it can override a @ParametersAreNonnullByDefault annotation at an outer scope.

Applying post hoc annotations to 3rd-party code

It does not seem a great stretch to suggest that most Java code is not annotated for nullability. Most importantly, the standard Java APIs are certainly not annotated. As an (important) part of this proposal, we outline a mechanism by which preexisting Java APIs may be annotated, without recompiling, and without requiring access to the source code.

The user (or Scala developers) can provide an ‘annotations’ file which overlays annotations onto an existing, compiled Java library. The annotations file is essentially minimal Scala code describing an entire package, without any method bodies. In addition, it need not duplicate anything which cannot be copied or inferred from the Java library itself.

It uses a ‘?’ prefix before type names to indicate that in Java they may be null, and in Scala they will be represented as an Option[T]. (Additionally, see Appendix A.)

For example, an annotations file covering java.util might look, in part, like this:

// Comments are allowed
package java.util

class BitSet {
	// These methods’ parameters not allowed to be null:
	and(BitSet): Unit
	andNot(BitSet): Unit
	//…etc
}

class Date {
	this(String)	// Constructor does not accept nulls
}

class Calendar {
	fields: Array[Int]	// Non-null protected fields
	isSet: Array[Boolean]
}

object Currency {
	// Parameter not null; result may be null.
	getInstance(Locale): ?Currency
}

Note that:

The syntax is based on Scala syntax. This means that:
1. It’s light-weight, and semicolon-free
2. Static methods and fields appear as members of a companion object
3. Java interfaces are supported by the keyword ‘trait’
4. Java constructors are supported by the keyword ‘this’ (even for the default constructor)
5. It does not commit to a particular annotation scheme for nullability
6. Unfortunately it is less useful for programmers in other JVM languages, including, probably, the authors of the original library being annotated. Potentially, conversion tools could be written to annotate the original source code, based on the annotation file.
The following syntactic elements are omitted:
1. The keywords ‘var’ or ‘def’
2. Method bodies (and expressions generally)
3. Parameter names
4. Protection modifiers
5. import statements. The compiler looks at the original Java bytecode to determine what the candidate types are. In cases of ambiguity (if a package uses path.package1.ClassName as well as path.package2.ClassName) the annotation file may use the fully qualified type name, or as much of the end of the fully qualified path as is required to disambiguate, for example package2.ClassName.
6. Annotations (except as noted below)
Empty parameter lists, ‘()’, must be included, so as to distinguish methods from fields.
Class type parameters must be included because, in .NET, class names may be overloaded on the number of type parameters.
Not all classes or methods need appear in the file. The Scala compiler treats any class or method that is missing from the file as if its fields, parameters and results are nullable.
Methods may also be annotated with @beforeConstructor (where the class parameter is taken as being the class in which it appears) and @calledByFinalize. It is anticipated that this will be rarely used.
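
To illustrate what the ‘?Currency’ entry in the example file above would mean on the Scala side, here is a minimal sketch in present-day Scala. It does by hand the wrapping that the annotation file is intended to make automatic. The object name CurrencyLookupSketch and the helper currencyFor are hypothetical names chosen for this illustration; only Option, Currency.getInstance and Locale.Builder are existing APIs.

```scala
import java.util.{Currency, Locale}

object CurrencyLookupSketch {
  // Hypothetical helper: wraps the Java result by hand.
  // Option(x) yields None when x is null, and Some(x) otherwise.
  def currencyFor(locale: Locale): Option[Currency] =
    Option(Currency.getInstance(locale))

  // The Javadoc for Currency.getInstance(Locale) names Antarctica as an
  // example of a territory without a currency, for which null is returned.
  val antarctica: Locale = new Locale.Builder().setRegion("AQ").build()
}
```

A caller would then pattern-match or map over the result, for example CurrencyLookupSketch.currencyFor(Locale.US).map(_.getCurrencyCode), rather than testing for null.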

It is anticipated that a Scala compiler would ship with annotation files for several of the core JDK packages, though perhaps not all of them. (There are a lot.)

In practice, the Scala developer will wish to write annotation files for lesser-used JDK libraries, third-party libraries (for example, Hibernate), and any internally developed or proprietary libraries.

The Scala compiler will search for annotation files—named “packagename.scalax”—for all Java packages used, in the following locations:

In the same directory as the library itself;
Elsewhere in the current classpath;
Possibly additionally in a location specific to the Scala compiler.

Annotation files for the same Java package ‘stack’. So if the compiler ships with an annotation file for an older version of a particular Java package, the developer can augment this with a very limited annotation file which addresses only certain new classes and methods.

It may be necessary to develop rules whereby more specific, developer annotation files override more general, compiler-supplied ones. However, these rules are not specified here.
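
The stacking behaviour can be pictured with a small model. This is only a sketch under stated assumptions: annotation files are reduced to a map from a member signature to ‘result may be null’, and later files are assumed to override earlier ones, which is one possible rule and not one the proposal fixes. All names here (AnnotationStackSketch, stack, resultIsNullable) are hypothetical.

```scala
object AnnotationStackSketch {
  // Hypothetical model of one annotation file:
  // member signature -> whether its result may be null.
  type Annotations = Map[String, Boolean]

  // Assumption: files later in the sequence override earlier ones.
  def stack(files: Seq[Annotations]): Annotations =
    files.foldLeft(Map.empty[String, Boolean])(_ ++ _)

  // Members that appear in no file are treated as nullable.
  def resultIsNullable(stacked: Annotations, member: String): Boolean =
    stacked.getOrElse(member, true)
}
```

With this model, a compiler-supplied file and a small developer-supplied file combine into one view, and any member mentioned in neither falls back to nullable.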

Impact on existing Scala code

I had the chance during the last MVP summit in March 2007 to talk about non-nullable types with the C# team. […] Indeed, we agreed that something like 70% of references of C# programs are likely to end-up as non-nullable ones. —Patrick Smacchia, codebetter.com, 2007

Idiomatic Scala code is likely to benefit from the proposed changes without changing greatly—as it makes use of Option in preference to null. However, code which makes heavy use of Java libraries is likely to have to undergo significant repairs.

This blow is softened by several advantages to removing null:

Removing a source of bugs
Simplifying the language. There is now no requirement to explain to new Scala users that None is sorta-kinda the same thing as null. Also, the type system is now simpler.
Improving runtime efficiency. The unboxed representation of Option[RefType] is as space-efficient as a conventional object reference.
Reducing the semantic gap between Java and Scala (and C# and Scala), since optional types are now compatible with Java nullable references.

There is persuasive evidence to suggest that removing null from a language flushes out faults from existing code. Anecdotally, according to the creator of C#, “50% of the bugs that people run into today, coding with C# in our platform, and the same is true of Java for that matter, are probably null reference exceptions.”

Where analyses have been performed of use of null in Java, it turns out that non-nullable references predominate, approximately 70%:30%. [REFERENCE FROM JAVA]

It is clear that this change would significantly break binary and source compatibility. It would also potentially break several existing assumptions about Scala language and library behaviour, including, but not limited to, the assumptions of:

The existence of the value null and the type Null.
Every array being defined over all of its length elements.
A nullable reference being clearly distinct from an Option.
Option being a subclass of AnyRef.
Variables of type Option having an initial value of null.
All reference type variables being initialised to null.
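
Some of these assumptions can be observed directly in present-day Scala. The sketch below (the object name NullDefaultsSketch is hypothetical) shows that newly allocated arrays of reference types, including arrays of Option, currently start out filled with null.

```scala
object NullDefaultsSketch {
  // A freshly allocated array of a reference type is filled with null.
  val names: Array[String] = new Array[String](2)

  // Option is itself a reference type today, so the elements of a new
  // array of Option start as null, not as None.
  val maybeNames: Array[Option[String]] = new Array[Option[String]](2)
}
```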

It is clear that this is a significant change which, if it were to be made, would be best made before Scala had accumulated much ‘legacy’ code.

Migration path

The clearest migration path would be one in which there were an intermediate form of the Scala language in which current constructs were allowed (but warnings flagged), while the new, non-nullable constructs were also allowed.

Suggested steps on migration path

1. New initialisation rules are enforced

A version of the compiler enforces (or at least emits warnings to encourage) the new instance variable initialisation rules. This is an essential part of allowing a language without null, and likely to cause the most (potentially awkward) impact to existing code.

The Array class adds the clear method (as described above), and its use is encouraged. However, other array semantics remain unchanged.

2. Binary change uniting `null` and `None`

A version of the compiler emits binaries in the new format (with unboxed Option objects), but Scala code is still allowed to manipulate nullable reference types. Array has the new semantics (except where noted below). Depending on a compiler flag, either:

Java libraries appear to Scala code to take and return nullable reference types (as before); the compiler emits warnings if the Scala code assigns null to any Scala variable or passes null to any Scala method (including storing null in arrays); Array.apply may return null.
Java libraries appear to Scala code to take and return Option objects; warnings are emitted for any use of null in Scala code; Array.apply never returns null (but throws NoSuchElementException instead).
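
The second behaviour of Array.apply can be approximated in current Scala with a helper method. This is a sketch only: checkedApply and CheckedArraySketch are hypothetical names, and the real change would live inside Array itself.

```scala
object CheckedArraySketch {
  // Hypothetical stand-in for the proposed Array.apply semantics:
  // an uninitialised (null) element is reported as a missing element
  // instead of being handed back to the caller.
  def checkedApply[T <: AnyRef](array: Array[T], index: Int): T = {
    val element = array(index)
    if (element == null)
      throw new NoSuchElementException("no element at index " + index)
    element
  }
}
```

The point of the design is that the failure happens at the array access, close to the cause, and not later when the null is dereferenced.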

3. Scala becomes fully non-nullable

In this step, depending upon a compiler flag:

Java libraries appear to Scala code to take and return Option objects; warnings are emitted for any use of null in Scala code. Array.apply never returns null. (As per the second option in step 2.)
null is still a reserved word, but any use of it causes a compiler error.

Optional features

Though not essential to implementation of this SIP, there are 3 features which could make working with Options simpler and less verbose:

A. Syntactic shortcut for `Option`

An alias for Option[Type] would be ?Type. Similarly Option[Option[Type]] could be written ??Type, and so on, for any depth of nested Options. If Option is named without a type parameter [is this even possible?], the ‘?’ notation cannot be used.

This would be only a syntactic shortcut. Option[T] and ?T could be used interchangeably. That is, for a given type, T, Option[T]≣?T

Note that this is the syntax used for nullable types in the language Nice. It is also similar to the syntax used for nullable value types by C#, and nullable types in JavaFX Script (which place the question mark after the type name: T?).

B. Default values for non-value types

Removing null from the Scala language means that, potentially, more class instance variables must be assigned values, whereas before they could be left to the default value, null.

It could be convenient to allow classes to specify, through their companion object, a default value. For example, imagine that the programmer writes the following code:

var someList: List[String]

Under the new requirements that all instance variables be assigned a value, this would be disallowed. In this example, an appropriate default value would be Nil, and it could be convenient to have the compiler supply this automatically. The compiler could check the companion object of the type for a no-argument method called ‘apply’. In the example above, the code would be rewritten by the compiler as:

var someList: List[String] = List()

Assuming that the types could be inferred and result type matched (in this example, ‘apply’ would need to return a value compatible with List[String]), the code would compile.

Most default values would be simple immutable constants and the call might be inlined by the compiler:

var someList: List[String] = Nil

Note that arrays of reference types would not be initialised by default, both for performance reasons, and to avoid changing array semantics too much.
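
As an illustration of the companion-object convention, the following sketch writes out by hand what the compiler would generate. The class Settings, its fields and its default values are invented for this example; only the idea of a no-argument ‘apply’ on the companion object comes from the proposal.

```scala
final class Settings(val name: String, val retries: Int)

object Settings {
  // The no-argument 'apply' that the compiler would look for.
  def apply(): Settings = new Settings("default", 3)
}

object DefaultValueSketch {
  // Under the proposal the programmer would write only
  //   var current: Settings
  //   var someList: List[String]
  // and the compiler would supply the right-hand sides below.
  var current: Settings = Settings()
  var someList: List[String] = List()
}
```

Because the default comes from an ordinary method, a type with no sensible default simply does not define a no-argument ‘apply’, and the uninitialised declaration remains an error.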

C. Name mangling to allow method overloading

As discussed above, overloaded Scala methods would no longer be able to differ only in whether a parameter’s type is Option[RefType] or a naked RefType. In other words, a class may not have both meth(s: String): Unit and meth(s: Option[String]): Unit.

We might relax this restriction by mangling the names of methods with RefType parameters. The most general method (from the point of view of a Java program wishing to call it), would have an Option[RefType] in the place of every RefType, as this would allow a Java caller to call it with any value, including null. The most restrictive method would not allow Java null, and hence would be declared with (non-nullable) reference types.

Where two methods are identical in respect of JVM parameter types, but differ in Scala type, each parameter which is either a RefType, or an Option[RefType] is numbered r₀ to r_n. Assume that ‘isOption’ is a function within the compiler which returns 1 if the given numbered parameter is of type Option[RefType], and 0 if it is of type RefType. The method is then assigned an unsigned integer number, rank, which is formed by taking isOption(r₀) as the most-significant bit, isOption(r₁) as the next-most-significant bit, up to isOption(r_n) as the least significant bit. We rename the method as “originalMethodName$rank” in the generated bytecode. We add a method with the original, unmangled name, as an alias to the highest-ranked (most general) method, since the most general method has an Option[RefType] in every position and therefore every bit set.
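
The rank calculation described above can be sketched as follows. The types Param, Ref, OptRef and Other, and the object MangleSketch, are hypothetical modelling devices and not compiler internals; a Long is used for the rank, which is an assumption that limits this sketch to 63 numbered parameters.

```scala
object MangleSketch {
  sealed trait Param
  case object Ref extends Param    // a naked RefType parameter
  case object OptRef extends Param // an Option[RefType] parameter
  case object Other extends Param  // any other parameter; not numbered

  // isOption(r0) becomes the most-significant bit and
  // isOption(rn) the least-significant bit of the rank.
  def rank(params: Seq[Param]): Long =
    params
      .collect {
        case Ref    => 0
        case OptRef => 1
      }
      .foldLeft(0L)((acc, bit) => (acc << 1) | bit)

  // Name of the method as it would appear in the generated bytecode.
  def mangledName(name: String, params: Seq[Param]): String =
    name + "$" + rank(params)
}
```

For the two overloads used as an example earlier, meth(s: String) has rank 0 and meth(s: Option[String]) has rank 1, so under this sketch they would be emitted as meth$0 and meth$1.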

Footnotes & references

Presentation: Null References: The Billion Dollar Mistake, QCon, Tony Hoare, 2009
The Java Memory Model, information website
Java theory and practice: Safe construction techniques, subtitled “Don’t let the this reference escape during construction”, IBM DeveloperWorks, Brian Goetz, 2002
JSR 133 (Java Memory Model) FAQ: How can final fields appear to change their values?
How to Handle Java Finalization’s Memory-Retention Issues, Sun Developer Network, Tony Printezis, 2007
Secure Coding Antipatterns: Avoiding Vulnerabilities, JavaOne 2006 Technical Session (PDF)
Destroying and Finalizing Objects (Java in a Nutshell), (notes that “Resurrecting an object is never a useful thing to do”)
Difficulties with non-nullable types (part 4), Cyrus’ Blather (MSDN Blogs), Cyrus Najmabadi, 2005
Nullable How-To, describing JetBrains IntelliJ IDEA nullability annotations, JetBrains website
FindBugs™—Find Bugs in Java Programs, FindBugs website
JSR 305: Annotations for Software Defect Detection, status page.
JSR 305: Annotations for Software Defect Detection in Java, source download page on Google Code.
I want non-nullable types in C#4, codebetter.com, Patrick Smacchia, 2007
The A-Z of Programming Languages: C#, Computerworld, Anders Hejlsberg interview with Naomi Hamilton, 2008
The Nice programming language (home page); see also Twice as Nice, IBM Developerworks, Andrew Glover, 2004
