Non-nullable reference types in C# & .NET

Note: This article/proposal is an aggregation of several previous blog posts written over the past few years. The content is mostly the same, but has been edited for consistency and clarity.

Abstract

In the C# programming language, there are two kinds of data types: value types and reference types. Reference types can hold either pointers to objects, or the special value null, which is used to indicate a ‘missing’ reference. This feature of implicitly allowing nulls in reference types has some well-recognised problems and the .NET team are already exploring ways of enforcing non-nullable reference types; this paper explores one possible design. In summary:

  • C# should eventually get rid of implicitly-nullable reference types.
  • .NET can and should represent optional references the same way that it already represents optional structs (that is: Nullable<>).
  • We can and should represent non-nullable references and optional (e.g., string?) references using distinct types in the type system.
  • Guaranteeing that non-nullable references are never null comes at two costs:
    • default(T) is not available for non-nullable types, and this has implications for generic code.
    • Reading an uninitialised, non-nullable-reference-type field must throw an exception so that nulls are not passed around the code.

Within these constraints, there is an elegant evolution path for the language, which preserves binary and source compatibility with existing code, without burdening new code with possible NullReferenceExceptions or unwieldy syntactical noise.

Background

Existing languages, such as C# and Java, have a legacy of allowing any reference value either to contain a valid reference or to be null. In other words: reference types are implicitly nullable. null is used by programmers to mark a ‘missing value’ (like optional parameters, or a variable not required or yet to be initialised). However, this has several implications:

  • There’s nothing in the type system to say whether a value is intended by the programmer to be optional or not, so:
    • Programmers are required to explicitly document optionality in code comments, and to enforce rules and conventions manually in code, and the failure to do so is a very common source of errors.
  • The language runtime must potentially check for a null reference before every object dereference, which implies a runtime cost at each dereference, and:
    • A NullReferenceException may be thrown at each and every dereference site.

.NET 2.0 introduced the idea of nullable value types. This allows ints, doubles, bools, and all other passed-by-value (struct) types, to be marked as optional. C# uses the same term, ‘Nullable’, to express this behaviour, but ‘nullable’ value types are significantly different from nullable reference types:

  1. Optional values have a different static type from non-optional values; an optional int has the type Nullable<int> (usually abbreviated to int?) and is distinct from the non-nullable equivalent, int. Optional value types are actually structures which contain either no value or one value.
  2. You cannot implicitly dereference a nullable value type; you must use its Value property to explicitly coerce to the raw value type.
  3. The type system does not allow a null to be assigned to a non-nullable value type.
  4. Nullable numeric types have some special semantics. For example, nullableA + nonNullableB has a well-defined result, and does not throw a NullReferenceException. null propagates through maths expressions, similarly to NULL values in SQL.
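These differences can be observed in today's C#. The following is a minimal sketch; the class and method names are illustrative only and do not come from the proposal. Point 3 cannot be demonstrated at runtime, because a statement such as int x = null; is rejected by the compiler.

```csharp
using System;

public static class NullableValueDemo
{
    // Point 1: int? is shorthand for Nullable<int>, a distinct type from int.
    public static bool SameType() => typeof(int?) == typeof(Nullable<int>);

    // Point 2: the raw value must be requested explicitly through .Value,
    // which throws InvalidOperationException when no value is present.
    public static bool ValueThrowsWhenEmpty()
    {
        int? empty = null;
        try
        {
            _ = empty.Value;
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    // Point 4: lifted arithmetic. A null operand makes the whole result null
    // instead of throwing.
    public static int? Add(int? a, int b) => a + b;
}
```

Calling NullableValueDemo.Add(null, 2) yields null, while NullableValueDemo.Add(1, 2) yields 3.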

There have already been a few initiatives to control implicitly-null reference types in C#:

  • The Spec# language is an experimental variant of C# which includes non-nullable references in addition to design-by-contract facilities. It introduces special rules to ensure that non-null fields are never accessed before being initialised, and prohibits arrays of non-nullable reference types.
  • ReSharper and other compiler add-ons introduce static flow analysis of programs to track potential null reference exceptions. They use existing implicitly-nullable types, augmented with programmer-supplied annotations.

There are some significant challenges to changing the null behaviour in C# and .NET code. Notably: the C# ecosystem is large and any change to the language or runtime has to offer solid backwards-compatibility. In addition, there are many places in the language specification where the program can encounter uninitialised values. We have to do something about this if we’re to stop nulls getting back into a program by the back door.

Requirements for eliminating unwanted nulls

The first, obvious requirement for enforcing non-nullable references is that there needs to be some mechanism to indicate reference types without nulls. This would guarantee that dereferencing such types never results in a NullReferenceException.

This would make something explicit in the language (a value must never be null) which is currently implicit, thus removing a documentation and coding burden. For example, annotating a method parameter as non-nullable would document and enforce that nulls are disallowed, and would obviate the need for the method to test if (p == null) throw new ArgumentNullException("p"); on the first line of the method.

The second requirement is that we still do need a way to represent optional references sometimes (just not all the time).

We could employ the existing, implicitly-nullable reference types to indicate optional references. However, there is an opportunity to introduce:

  • Explicitly-nullable reference types with the same semantics as nullable value types, and with them
  • Type uniformity between explicitly nullable reference types and value types. That is to say: the type System.Nullable<T> should be able to accept any T regardless of whether T is a reference type or a value type.

Why? Explicitly-nullable reference types add several things to the language:

  • Safety: Explicitly-nullable reference types require explicit testing and dereferencing, for example: if (n.HasValue) DoSomethingWith(n.Value). This highlights where values may be missing, and forces the programmer to test for them.
  • Explicit intent: Marking a parameter as explicitly nullable is a stronger indication of intent than an unannotated reference-type parameter which merely defaults to permitting null.
  • Uniformity: As optional values and references are treated the same way, it makes the language simpler to learn and makes generic code less complicated.
  • Potential: An explicit syntax for optional references raises the possibility of changing the default behaviour in future versions of the language, from reference types being nullable by default, to them being non-nullable by default.

Uniformity makes writing type-generic code easier. It allows us to define, for example, an interface IParser<T> with method T? TryParse(string str). That method returns an optional result, regardless of whether T is a reference type or a value type.
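Current C# cannot express T? uniformly for an unconstrained T, so the sketch below substitutes a hypothetical Optional<T> struct for the proposed any-T Nullable<T>. Only IParser<T> and TryParse come from the text; Optional<T>, IntParser and NameParser are assumptions made for illustration.

```csharp
using System;

// Hypothetical stand-in for the proposed Nullable<T> that accepts any T.
public readonly struct Optional<T>
{
    private readonly T _value;
    public bool HasValue { get; }

    public Optional(T value)
    {
        _value = value;
        HasValue = true;
    }

    // Mirrors Nullable<T>.Value: throws when no value is present.
    public T Value =>
        HasValue ? _value : throw new InvalidOperationException("No value present.");

    public static Optional<T> None => default(Optional<T>);
}

public interface IParser<T>
{
    Optional<T> TryParse(string str);
}

// Value-type instantiation.
public sealed class IntParser : IParser<int>
{
    public Optional<int> TryParse(string str) =>
        int.TryParse(str, out int n) ? new Optional<int>(n) : Optional<int>.None;
}

// Reference-type instantiation: same interface, same calling convention.
public sealed class NameParser : IParser<string>
{
    public Optional<string> TryParse(string str) =>
        string.IsNullOrWhiteSpace(str)
            ? Optional<string>.None
            : new Optional<string>(str.Trim());
}
```

Callers test HasValue and read Value in exactly the same way for both parsers, which is the uniformity the proposal is after.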

In many cases the programmer should not have to care whether a particular type they are using has value semantics or reference semantics (at least, if the type is immutable). By introducing uniformity between nullable-value and nullable-reference types, we remove one artificial distinction between value types and reference types.

Note that being able to use Nullable<T> for any type T implies that it would be possible to declare Nullable<Nullable<T>>. This is rarely required, but again, for reasons of uniformity, it is a useful addition to the language.

At this point Nullable<T> is a misleading name, and Optional<T> would be a better name (but I suspect that that ship has already sailed).

Interoperability with old .NET libraries

There is one other important requirement: interoperability.

The new language feature must interoperate cleanly with legacy code, and must not break source or binary compatibility. At a minimum:

  • New, null-aware code should be able to pass parameters to older, non-null-aware methods in other libraries. This implies that there must be a way to pass (for example) a string! or a string? to a method which accepts an (implicitly-nullable) string.
  • Conversely, new code must be able to accept an implicitly-nullable reference result from a method and convert it into a null-aware-reference form.

To make it easier to migrate to the new style:

  • Old, non-null-aware code should be able to call code written with the new null-aware style. (This implies that old code which attempts to pass a string to a method which accepts a string! may have a runtime exception thrown by the runtime system if it attempts to pass in null where it is not permitted.) Similarly, old code must be able to accept a null-aware reference type result and use it as it would an ordinary implicitly-null result.
  • Ideally, it should be possible to gradually upgrade old code by adding ? and ! annotations to method parameters and results, without breaking source or binary compatibility—even overriding base class non-null-aware methods with null-aware ones.

Terminology

Reference types: References to objects in the .NET virtual machine. Reference types are declared in C# with the class keyword and stored on the heap.
Value types: Types whose instances are passed by value (‘on the stack’) in the .NET virtual machine. Value types are declared in C# with the struct keyword. Value types are not inherently nullable in .NET, but can be wrapped up in a Nullable<T> structure to mark them as optional.
Implicitly-nullable [reference] types: Up until now, the only form of reference types in .NET. Each value of the type may be either a valid (non-null) reference, or else the special null value. Implicitly-nullable references may be dereferenced without first checking for null (which leads to NullReferenceExceptions).
Non-nullable reference types: Reference types proposed in this document, which exclude the special value null.
Non-nullable types: Non-nullable reference types, plus struct types, but not Nullable<T>.
Explicitly-nullable types: Exactly Nullable<T> where T is a non-nullable type. (Previously in .NET Nullable<T> was only allowed for value types T.)
Null-aware types: The set of non-nullable and explicitly-nullable types (i.e., all types except implicitly-nullable ones).

Design

Principles

The design proceeds from a few main principles:

  1. Non-nullable types are never null, period. Dereferencing them can never throw a NullReferenceException. This is a real change to the type system, not a work-around sticking-plaster.
  2. Uniformity & consistency: We’re improving the language, not adding more warts. With the proposed changes, the new language should feel consistent and straightforward.
  3. Runtime efficiency: There should be no runtime performance penalty for adopting non-nullable, or explicitly-nullable reference types.

New Types

We add two new kinds of types to the type system, null-aware reference types:

  1. Non-null reference types T, denoted as T!, and
  2. Explicitly-nullable reference types, Nullable<T>, with syntax shortcut T?.

To avoid ambiguity, we will write existing, implicitly-nullable reference types, as T|null.

For example, strings:

string!      notNullable;
string?      explicitlyNullable;
string|null  implicitlyNullable;

The main rule is that a value of reference type T! cannot have the value null. This is in contrast to reference type T|null which can of course have the value null (in addition to any value of T!).

Implications

There are many subtle implications which we’ll explore shortly (generic code; backwards compatibility with existing code and others). However, there is one big issue that immediately arises from disallowing nulls— well, two closely-related issues:

  1. default(T) is undefined for non-nullable references.
  2. In .NET it is not possible to guarantee that every field and every array element is assigned a (non-null) value before it is accessed.

They’re closely related issues because .NET initialises all fields and array elements to default(T) before they are visible to the program. This way it avoids undefined behaviour if the program attempts to read an uninitialised field.

All implicitly-nullable reference types and existing value types in .NET have well-defined default values, but there is no obvious default value for a non-nullable reference type.

For example, what value should default(Stream!) return? By definition it cannot be null. And the language cannot just return new Stream() (not least in this case because Stream is an abstract type). Without null, reference types simply do not have sensible default values.

However, without default values, we need to define what happens when the program reads an unassigned field (or array element).

The solution we propose is as follows:

  1. For non-nullable reference types, we (statically) disallow default(T).
  2. Fields of non-nullable reference types are initialised to a special ‘not defined’ value. Any attempt to read an unassigned field causes an UnassignedFieldException to be thrown. Think of it as kind of like null, except you get an exception when you read it, not when you dereference it.

Let’s explore the rationale and implications in a bit more detail.

default()

There is no natural definition of default(T) for non-nullable types. We could redefine it to return a value of type T|null or T?. We could make it throw a runtime exception if called on a non-nullable type.

However, these would quickly complicate generic (type-parameterised) code, which must work with all kinds of T—value types, and reference types of all kinds.

Instead we disallow default for non-nullable reference types, and provide a couple of other mechanisms for dealing with the use cases of default(T):

  • we propose a mechanism to allow generic code to ‘forget’ about references to avoid memory leaks;
  • and for cases where generic search methods return default(T) to mean ‘not found’ (like Linq’s FirstOrDefault()), new APIs should instead just return Nullable<T>. We expand upon this later.

UnassignedFieldException

There are some situations in which non-nullable reference type fields & array elements can be referenced before the code has assigned them a value.

Consider the case when you allocate an array with: streams = new Stream![11];. That’s an array with 11 elements, each of which is a non-nullable Stream object. What should the .NET runtime do when you read streams[2] before you’ve assigned a value to it?

It cannot be null, because we’ve excluded nulls from the array element type; it cannot be default(Stream!) because default is not defined for non-nullable reference types.

This is a significant problem because in general we cannot force the programmer to assign a value to a field or array element before reading it. There are a few situations where unassigned values exist in C#:

  1. A non-nullable reference instance variable in a class, where that variable has not (yet) been assigned a value by the constructor.
  2. Where the constructor does assign a value, but the this reference ‘escapes’ from the constructor before the constructor has finished executing. (This is bad practice, but it’s quite easy to do accidentally.)
  3. A non-nullable reference instance variable in a struct, when the struct’s constructor has not been called. For example: default(StructType) or new StructType[20];
  4. Arrays of non-nullable reference types (as per the example): initially none of the array elements will contain a valid reference.

The solution is to accept that some values in a live program may be ‘unassigned’. The language cannot realistically prevent programs from reading unassigned fields. The best we can do is to throw an UnassignedFieldException if the program attempts to read from an unassigned field. In fact, not only is that the ‘best’ we can do; it’s the only thing we can do without significantly changing the language.

Some languages do manage to enforce that all fields are assigned before they are read, but doing so in C# would require significant changes to the semantics of the language, and would be easy to circumvent from legacy code.

How is this actually better than allowing uncontrolled nulls, and ending up with NullReferenceExceptions in the first place? We seem to have just substituted the old familiar NullReferenceExceptions with new-fangled UnassignedFieldExceptions.

There are a couple of important differences:

Unexpected null values can propagate far through a program before they eventually trigger a NullReferenceException. The exception (if it occurs) is far removed from the fault in the code which accessed the unassigned value. Null values infect every code path.

On the other hand, an UnassignedFieldException happens as soon as you try to read the unassigned variable; the program fails as early as possible and the finger of blame points more accurately at what went wrong.

Another important difference from NullReferenceException is that UnassignedFieldExceptions never affect local variables. Every local variable (including method parameters) is guaranteed to have a valid value. UnassignedFieldExceptions only ever affect reads of fields and array elements.

Furthermore, it is easy to determine statically which code is prone to UnassignedFieldExceptions. Programmers can eliminate them completely by ensuring that:

  1. Constructors assign all non-nullable-reference fields before returning and before calling any method which reads these fields, and before calling any virtual methods, and before passing this as a parameter to another method, and before making this visible to any other thread. (Phew!) These things are good practice and may be statically checked.
  2. Structs containing reference types should either make them (explicitly) nullable references, or include a flag to say whether the constructor was called.
  3. Most code avoids using arrays. Array code is low-level code; it should be carefully audited and tested. Most code should use more suitable collection types.

All of these cases can be identified statically, controlled by good programming practice, and violations flagged by static analysis tools.

To summarise: UnassignedFieldExceptions are thrown when programs attempt to use values before assigning them. They are very much more limited in their scope than NullReferenceExceptions and they are possible to avoid with reasonable coding practices.

A new operator: ! (the ‘damnit’ operator)

C# of course already has a prefix Boolean operator called !, which performs a logical ‘not’. This new ! operator is a postfix operator and means ‘assert not null, then interpret as a non-nullable reference’:

var notNull = possiblyNull!;

When applied to an explicitly nullable expression it returns expression.Value (which may throw an InvalidOperationException if the value is null).

When applied to an implicitly-nullable expression it returns the result of the expression typed as not-nullable or else throws a NullReferenceException:

Object|null obj = …;
var r0 = obj!; // r0 is of type ‘object!’. Throws exception if obj == null
var r1 = obj!.ToString(); // Throws NullReferenceException if obj == null
var r2 = obj.ToString(); // Precisely the same behaviour as r1 example.

When applied to a non-nullable expression it is a no-op. (The compiler should emit a warning, “Redundant ! operator”, since the ! could be safely removed without altering the program semantics.)

For example:

string?     a = null;
string?     b = "B";
string|null c = null;
string|null d = "D";
string!     e = "E";

var r1 = a!; // throws InvalidOperationException
var r2 = b!; // r2 == "B"
var r3 = c!; // throws NullReferenceException
var r4 = d!; // r4 == "D"
var r5 = e!; // r5 == "E" (compiler warning: “redundant ! operator”)
// In all cases r1…r5 are inferred as ‘string!’

Rather than using the ! operator, the same operation can be written as an explicit type cast:

// Same semantics as ‘implicitlyNull!’:
nonNull = (string!)implicitlyNull;

// Same semantics as ‘explicitlyNull!’ or ‘explicitlyNull.Value’
nonNull = (string!)explicitlyNull;

Or, performing a type-cast and non-null coercion in a single operation:

var obj = (object|null)"Hello world";
//...other code...
var nonNullString = (string!)obj; // Could throw NullReferenceException or InvalidCastException
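The behaviour described for the postfix ! operator can be approximated in current C# with a pair of extension methods. This is only a sketch under stated assumptions: NotNull is a hypothetical name, it cannot change the static type the way the proposed operator would, and it is unrelated to the compile-time-only null-forgiving operator that C# 8 later introduced.

```csharp
using System;

// Classic implicitly-nullable semantics for reference types in this file.
#nullable disable

public static class NotNullExtensions
{
    // Emulates `expr!` on an implicitly-nullable reference (T|null).
    public static T NotNull<T>(this T reference) where T : class
    {
        if (reference == null) throw new NullReferenceException();
        return reference;
    }

    // Emulates `expr!` on an explicitly-nullable value (T?): same as .Value,
    // which throws InvalidOperationException when no value is present.
    public static T NotNull<T>(this T? optional) where T : struct
    {
        return optional.Value;
    }
}
```

With these in scope, d.NotNull() on a non-null string returns the string unchanged, and calling it on a null reference or an empty int? throws the corresponding exception from the examples above.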

New Syntax

What we’ve proposed lets the programmer specify non-nullable and explicitly-nullable reference types in C#, which is very powerful.

However, it is inconvenient to write ! after every type, and it clutters up the code. Non-nullable types should be the default, not an optional setting.

We propose a compiler setting to flip the default, so that reference types without any annotation are assumed to be non-nullable. We will discuss this in more detail later.

Implementation

Importantly for runtime efficiency, implicitly-nullable reference types, non-nullable reference types and explicitly-nullable reference types (Nullable<T> where T : class) are all encoded the same way in memory: as an object pointer or handle.

Kind of type | Valid reference | No reference
Implicitly-nullable | Object reference | null
Non-nullable | Object reference | (Only possible for array elements & fields.) Throws UnassignedFieldException upon reading
Explicitly-nullable reference type | HasValue: true; Value: object reference | HasValue: false; Value: throws InvalidOperationException

For this reason—in the absence of generic types—the proposed changes could be implemented without any alteration to the .NET runtime/VM:

Field/array reads where the compiler cannot guarantee that the field is assigned, would be generated as: a read, a check for null, and a throw of an UnassignedFieldException if null.

Every method would be generated to check each non-nullable reference argument and throw an ArgumentNullException if it is null.

Some of these checks could be omitted, for example, private methods would not need to check their parameters, and field read checks can be omitted if the compiler can guarantee that the value is definitely assigned at the point of reading.
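As a rough illustration of that lowering, the sketch below hand-writes in current C# the checks a compiler might emit. UnassignedFieldException is the exception named in this proposal and does not exist in .NET, so it is declared here; the Document class, its members and the choice of base class are assumptions made for the example.

```csharp
using System;

// Classic implicitly-nullable semantics: null encodes 'unassigned' in storage.
#nullable disable

// Hypothetical: the exception type named by the proposal.
public sealed class UnassignedFieldException : InvalidOperationException
{
    public UnassignedFieldException(string fieldName)
        : base("Field '" + fieldName + "' was read before it was assigned.") { }
}

public sealed class Document
{
    // Proposed source: private string! _title;
    private string _title;

    // Proposed source: public string! Title { get { return _title; } }
    public string Title
    {
        get
        {
            // Read, check for null, throw: the check emitted when the compiler
            // cannot prove the field is definitely assigned.
            string value = _title;
            if (value == null) throw new UnassignedFieldException(nameof(_title));
            return value;
        }
    }

    // Proposed source: public void Rename(string! title)
    public void Rename(string title)
    {
        // Emitted guard for a non-nullable argument of a public method.
        if (title == null) throw new ArgumentNullException(nameof(title));
        _title = title;
    }
}
```

Reading Title on a freshly constructed Document throws immediately at the read site, whereas an ordinary null would have been handed to the caller and failed somewhere later.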

Implementation in generic code

In type-parameterised (generic) code, the runtime code generation must vary, depending on the kind of types which have been passed as type arguments. When the type arguments are explicitly-nullable or non-nullable, code is generated according to the above rules. When the type arguments are implicitly-nullable reference types, or struct types, code is generated as before.

The .NET runtime already generates different executable code depending upon the kind of the type arguments. For example, in existing generic code, the native code to evaluate (T)t == null is different depending upon whether T is instantiated as a struct, a class or a Nullable<struct>.
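A small example in current C# shows one generic source method whose null comparison behaves differently for each kind of type argument. The NullChecks and IsNull names are illustrative only.

```csharp
public static class NullChecks
{
    // A single generic method. For a non-nullable struct the comparison is
    // always false; for Nullable<struct> it tests HasValue; for a class it is
    // an ordinary reference comparison.
    public static bool IsNull<T>(T t) => t == null;
}
```

For instance, NullChecks.IsNull(0) is false, while NullChecks.IsNull<int?>(null) and NullChecks.IsNull<string>(null) are both true.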

This seems straightforward. There are however a few subtle implications for writing and using generic code, since generic code by definition must work with potentially different kinds of type arguments, and we will examine this next.

Implications for generic code

We’ve described some design requirements for implementing non-nullable and explicitly-nullable reference types in C#, and a design which meets those requirements.

However, there is one major area we’ve not yet delved into: how these null-aware types interact with .NET generic types.

Generic types (and methods) are in a slightly different category than the kind of code we’ve considered so far. Up until now we’ve been talking about adding constraints to our own types in our own code to disallow nulls or control them. We can update the types within our own code as we like. There is even the possibility of automatic conversions between null-aware and implicitly-null reference types at the boundary of old and new code.

With generic code, it’s slightly more complicated: some of the types which generic code manipulates are specified by client code when the class is used. Programmers aim to write generic code to be robust, and avoid NullReferenceExceptions, but our client code may want to supply implicitly nullable reference types as type parameters. (Or not.) Conversely, if existing generic code restricts a parameter to be a class, it probably expects to be able to make use of null.

These and other assumptions are challenged if non-nullable reference types are passed as type parameters to existing generic code, or if the generic code wants to restrict its type parameters to non-nullable types.

New constraints on type parameters

In order to allow generic code to peacefully coexist with null-aware code, and to protect legacy code from null-aware code breaking its assumptions, we add a few new type constraints to generic type parameters:

  • default and ~default declare whether or not a type parameter needs to support a default value;
  • |null, ~null and ! declare whether a type parameter must allow implicit null, may contain implicit null, or must not allow implicit null.

default and ~default

For generic code to be allowed to call default(T), (where T is one of its type parameters), the provided type T must have a default value. So we introduce a new type parameter constraint default, which specifies that the argument type must have a default value:

public class Fooo<T> where T: default {
   public T Brrrr() { return Something() ?? default(T); }
}

In previous versions of C#, every type had a default value, so there was never a need to declare it as a type constraint, so no existing generic code declares ‘default’ constraints for type parameters.

For compatibility with existing code, default is assumed by the compiler for type parameters (except for type parameters on interfaces). We introduce its antithesis, the keyword ~default to denote that a type parameter does not assume a default value. For example:

public class List<T> where T: ~default {
   // Great! We can now store non-nullable ref types in generic Lists
   …
   // We’ll deal with this tricky bugger later:
   public T Find(Predicate<T> match) { … }
}

Just to be clear, that ~default is not prohibiting type T from having a default value, just declaring that it doesn’t need to have one. It’s removing a constraint, not adding one.

It means “We welcome null-aware code here! We’re happy to accept a non-nullable type parameter.” It allows the type parameter to be of any type whatsoever (subject to other type constraints of course).

Value types always have default values, so if a type parameter has a constraint of struct, the default is implied and need not be specified.

The compiler can be set to issue warnings for generic type parameters which have not been explicitly declared as either default or ~default (except where other constraints imply one or the other).

Type parameters on interfaces are implicitly ~default and ~null, unless declared otherwise. Classes implementing the interface may choose to require or not require default values when they implement the interface. (Type parameters in implementations of interfaces are still constrained to ‘default’ unless specified otherwise.)

|null, ~null and !

These type parameter constraints specify that the parameter must be implicitly-null, may be implicitly null, or cannot be implicitly null, respectively.

For backwards compatibility, |null is assumed for parameters constrained to be reference types, and ~null is assumed otherwise.

|null is only allowed when the parameter is constrained to be a class type, and it implies default.

! implies ~default.

Null-aware coercion

As we described above, the programmer can declare that a generic type parameter must be null-aware, with the following syntax:

public struct Nullable<T!> { … }
or
public struct Nullable<T> where T! { … }

This can be thought of as a type constraint on T, or it can be thought of as transforming type arguments for T.

When T is already null-aware, no transformation occurs.

When T’s argument is an implicitly-nullable reference type, it is transformed to T!. Another way of looking at it: it’s as if every reference to the type name T throughout the class (or method, or interface) is replaced with T!.

This reflects how certain values in generic code may not be allowed to be (implicitly) null. For example, the existing IDictionary<KeyType, ValueType> interface does not allow null values for keys. In this new version of .NET, it would be defined as IDictionary<KeyType!, ValueType>.

Note that the ! constraint only removes implicit nulls, and still allows explicitly null reference types.

Some examples

Here are some examples of how the default constraint and null-aware coercion interact:

// T1 is implicitly ~default, and coerced to a null-aware type:
public class Generic1<T1!> { … }

// T2 must be an explicitly-nullable or a value type (or have a default value).
public class Generic2<T2!> where T2: default { … }

// T3 must be an implicitly-nullable reference type (or have a default value):
public class Generic3<T3> where T3: class, default { … }

// A non-nullable reference type with a default? Only if we allow non-nullable
// reference types to define default values. It’s an unlikely combination:
public class Generic4<T4!> where T4: class, default { … }

In reality, type-generic programs should usually completely avoid default, because default(T) is usually used to mean ‘no value’, and that’s much better represented by an explicitly-nullable result or parameter. (And the compiler should almost certainly produce a warning for generic code which constrains any type parameters to class, default.)

Respecting the garbage collector

When you’re finished with a (reference) object, you should dispose of that reference, otherwise the garbage collector will be unable to reclaim it.

With implicitly nullable types, just assign null to it. With explicitly-nullable reference types, just assign null to it. For value types (if you need to), you can assign default(T) to it. However, if you don’t know the exact type (because you’re writing type-generic code), it’s trickier to ‘forget’ a value (which might or might not contain an object reference).

Boxes

One way to hold on to a value, and ‘unset’ it easily, without default() being available, and without possibly incurring additional memory overhead (for value types, T? is larger than T), is with a Box:

public struct Box<V!> where V: ~default {
   private readonly V _value;
   public Box() {} // Value is null or unassigned.
   public Box(V v) { _value = v; }
   public V Value { get { return _value; } } // Might throw UnassignedFieldException
   public bool IsAssigned; // False if getting Value would throw exception
   public bool IsNull => IsAssigned && Value == null;
}

// Value we wish to be able to ‘unset’ without knowing its exact type:
Box<T> box;

box = Box.Of(newValue); // Set the value
var v = box.Value; // Get the value
box = default(Box<T>); // Unset the value
var v2 = box.Value; // Will throw UnassignedFieldException if no default

You can use Boxes to ask for the default value of a type—in a way which will always compile (no matter the type), but will fail at runtime if the type does not have a default. We’ll show this trick later to maintain source & binary compatibility with old code:

// Gets the default value of T— but may fail at runtime:
var defaultTIfAvailable = default(Box<T>).Value;

Arrays

If you’re building a generic collection class (List<T>, for example), you’re probably using an array for storage. If you allow elements to be removed from your collection, you are faced with disposing of array elements somehow, because your collection could be used to store references, and these references need to be garbage-collected. With array elements, as with single values, it’s easy enough to dispose of references if you know the type, but in generic code it’s trickier.

You could choose an implementation in which the elements are stored in an array of Box<T>. To ‘forget’ an element, you would assign the value ‘default(Box<T>)’ to its array slot. This technique has other advantages, too, such as avoiding the implicit check for ArrayTypeMismatchException by avoiding C# array covariance.

However, to avoid the need to implement (slightly-obscure) tricks, we redefine Array.Clear():

  • For arrays of types with default values, it behaves as it does now, and sets the elements in the given range to that default value.
  • For arrays of types without defaults, it makes the given range of array elements ‘unassigned’, the same state they were in when the array was allocated originally.
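
For comparison, generic collections in present-day C# can already use Array.Clear() to drop a reference from a vacated slot without naming default(T). The following is a minimal sketch only: SimpleStack<T> is a hypothetical class invented for illustration, and it shows today's behaviour, not the redefined Array.Clear() proposed here.

```csharp
using System;

// Hypothetical minimal collection, for illustration only.
public class SimpleStack<T>
{
    private T[] _items = new T[4];
    private int _count;

    public int Count => _count;

    public void Push(T item)
    {
        if (_count == _items.Length)
            Array.Resize(ref _items, _items.Length * 2);
        _items[_count++] = item;
    }

    public T Pop()
    {
        if (_count == 0)
            throw new InvalidOperationException("Stack is empty.");
        T item = _items[--_count];
        // Forget the vacated slot without writing default(T) ourselves.
        // Today this resets the slot to default(T); under the proposal it
        // would make the slot 'unassigned' for types without a default.
        Array.Clear(_items, _count, 1);
        return item;
    }
}

public static class Program
{
    public static void Main()
    {
        var stack = new SimpleStack<string>();
        stack.Push("a");
        stack.Push("b");
        Console.WriteLine(stack.Pop()); // prints "b"
        Console.WriteLine(stack.Count); // prints 1
    }
}
```

Because the collection never mentions default(T), the same source would in principle keep working if T were later allowed to be a type without a default.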

Some compatibility concessions

Default default parameters

We do allow one use of the default() operator on ~default generic type parameters: as the default value of optional parameters. For example:

public class Generic<T> where T: ~default {
   // Clients can only take advantage of this default parameter when T
   // has a default value, otherwise they need to specify it explicitly:
   public Generic(T initialValue = default(T)) { … }
}

public class ClientCode {
   void Method1() {
      //var m1 = new Generic<FileInfo!>(); // Disallowed default parameter
      var m2 = new Generic<FileInfo!>(new FileInfo(Path)); // Explicit: OK
      var m3 = new Generic<int>(); // OK because default(int) is defined.
   }
   void Method2<U!>() where U: new() {
      //var m4 = new Generic<U>(); // Disallowed default parameter
      var m5 = new Generic<U>(new U()); // Explicit: OK
      var m6 = new Generic<U?>(); // Also OK, because default(U?) is defined
   }
}

This rule exists to maintain source compatibility with old clients of generic code.

Living without default()

Recall the example above, where we showed a null-aware version of the standard .NET collection class List<T>:

public class List<T> where T: ~default {
   // Great! We can now store non-nullable ref types in generic Lists
   …
   // Problematic, since it’s supposed to return default(T) if no match:
   public T Find(Predicate<T> match) { … }
}

The rest of the public API is fine, but two methods make use of the default() operator to return an ‘unknown’ result: Find() and FindLast(). These methods could have been defined to throw an exception if they cannot find a match. And with explicitly-nullable reference types, they could be redefined to return T?… but that would break backwards compatibility.

We can work around this, using Boxes to get a default value without using the default keyword:

public class List<T> where T: ~default {
   [Obsolete("Use TryFind instead")]
   public T Find(Predicate<T>! match) {
      var d = DefaultOrThrowNotSupported("Find");
      return TryFind(match) ?? d;
   }

   // A new, more useful Find method. New code should use this:
   public T? TryFind(Predicate<T!> match) {
      // Return an element from the list or (T?)null.
   }

   private T DefaultOrThrowNotSupported(string! methodName) {
      try {
         // The default value of T, or an UnassignedFieldException.
         return default(Box<T>).Value;

      } catch (UnassignedFieldException) {
         throw new NotSupportedException(
            string.Format(
               "{0} method not supported for types without a default value",
               methodName));
      }
   }
}

Newer code should use the newer TryFind() method, whose result is always T?, regardless of what T is.
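
The TryFind() idea can be loosely approximated in present-day C#. The sketch below is an assumption-laden illustration, not the proposed API: ListExtensions and its TryFind extension method are hypothetical, and the helper is restricted to reference types because today's C# cannot express a single T? that means Nullable<> for both classes and structs.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class ListExtensions
{
    // Hypothetical helper, not part of .NET. Returns the first matching
    // element, or null when nothing matches.
    public static T? TryFind<T>(this List<T> list, Predicate<T> match)
        where T : class
    {
        foreach (T item in list)
        {
            if (match(item)) return item;
        }
        return null; // No match: an explicit 'nothing', never default(T).
    }
}

public static class Program
{
    public static void Main()
    {
        var names = new List<string> { "ann", "bob" };
        Console.WriteLine(names.TryFind(n => n.StartsWith("b")) ?? "(none)"); // prints "bob"
        Console.WriteLine(names.TryFind(n => n.StartsWith("z")) ?? "(none)"); // prints "(none)"
    }
}
```

The caller chooses the fallback with ??, so the collection itself never has to invent a default value.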

Uses of default() operator in List<T>

Looking at the source code of the current .NET List<T> implementation, there are a couple of other uses of default(T):

  • Within the List’s Enumerator struct, to reset the ‘current element’ variable for the List’s IEnumerator. To fix this, the ‘current element’ instance variable can instead be stored as a Box<T> and the Current property changed to return current.Value. Legacy code (or any code using types with default values) will behave exactly as before. If the type does not have a default value, calling Current when its value is undefined will throw an UnassignedFieldException. This is acceptable since the behaviour in that case is undefined and, importantly, only newer code will be affected.
  • default() is also used in various methods which take object parameters, to test whether the type accepts null: if ((object)default(T) == null). This would be changed to if (default(Box<T>).IsNull).

Backwards—and forwards—compatibility

Among the requirements we identified earlier was compatibility: new code must be able to interoperate with old code. Old libraries should be able to be upgraded to null-aware without breaking backwards compatibility with existing client code.

And we should also look specifically at generic code again, since that adds further complexity: newer generic code may need to work with implicitly-null type parameters specified by legacy clients, for example.

Terminology

Null-aware code: Code written for newer versions of .NET which support the proposed non-nullable reference types.

Non-null-aware code: Legacy code which only knows implicitly-nullable reference types.

Migrating code from implicit-null to null-aware

So far we’ve talked about adding non-nullable references and explicitly-nullable references to C#, with a backwards compatible syntax: non-nullable reference types are expressed with the type name and a postfix !, and explicitly-nullable reference types with the type name and a postfix ?.

However, eventually we want implicitly-nullable types to go away; most codebases should be able to do without them completely. And we don’t want newer code cluttered up with (redundant) ! type modifiers. We should be able to write code which assumes non-nullable reference types.

Of course, old code should, absent any other changes, continue to compile, pass its tests and keep its existing execution semantics.

So we define two slightly different versions of the syntax: one for backwards-source-compatibility with old code, which assumes implicitly-nullable types as the default (the Old Syntax); one for new code, which assumes non-nullable types as the default (the New Syntax).

  1. We define an unambiguous superset syntax, which may be used to annotate both implicitly-nullable and null-aware types, and to indicate the new type constraints for generic code. We’ve used that syntax in this document, to unambiguously distinguish between implicitly-nullable (T|null) and non-nullable (T!) types.
  2. Old Syntax: Same as the unambiguous syntax, except that implicitly-nullable types are assumed where not specified: an unadorned type T is interpreted as T|null.
  3. New Syntax: Same as the unambiguous syntax, except that non-nullable types are assumed where not specified: an unadorned type T is interpreted as T!.

Old code runs as before, and may be gradually enhanced with null-aware reference types over time, using the Old Syntax.

At some point the programmer can switch over the codebase (possibly a file at a time) to using the New Syntax.

In legacy code, using the Old Syntax:

  • An unadorned reference type name T means T|null
  • Literal strings are inferred to be of type String|null
  • The expression x as T returns a result of type T|null
  • In general, reference type expressions are inferred to be implicitly-nullable.

And eventually we intend all C# programmers to migrate to the New Syntax, in which:

  • An unadorned reference type T means T!
  • Literal strings are inferred to be of type String!
  • The expression x as T returns a result of type T?
  • Where possible, reference type expressions are inferred to be non-nullable.

In summary:

CONCEPT                                 OLD SYNTAX    NEW SYNTAX
Implicitly-nullable reference type T    T             T|null
Non-nullable reference type T           T!            T
Explicitly-nullable reference type T    T?            T?

How to manage this migration in syntax?

We propose that C# projects will have a per-project setting to switch them to the new syntax, much like the existing choice of C# language level. This would be overridable on a file-by-file basis with a #pragma line near the top of any .cs source file.

We expect that Visual Studio, ReSharper, Visual Studio Code and others would have a command for refactoring a project wholesale from legacy syntax to new, null-aware syntax.

Legacy code using null-aware APIs

Legacy code (non-null-aware code) has only one kind of reference type: the implicitly-nullable kind. We would like to be able to upgrade legacy APIs to null-aware, but have existing non-null-aware code continue to use them.

Changing types in an API from implicitly-null types to null-aware types should be a backward-compatible operation.

So we define the following rules for how they interact:

Reading fields/array elements

Legacy code which reads a field/array element with null-aware reference type T! has the field value implicitly cast to type T|null. Note that reading a non-nullable reference field may throw an UnassignedFieldException if the field has never been initialised.

Writing fields/array elements

Legacy code which writes to a field/array element of type T? has the value implicitly cast from type T|null to type T?. (This is almost certainly compiled away to a no-op.)

Legacy code which writes to a field/array element of type T! has the implicitly-nullable value automatically coerced to non-nullable (as if with the ! post-fix operator) at the point of assignment. This will throw a NullReferenceException at runtime if the value is null.

Methods which return null-aware types

Legacy code which calls a method/property-getter which returns a null-aware reference type T! or T?, has that return value implicitly converted to T|null on return of the method. (This is almost certainly a no-op in the virtual machine.)

Methods which accept null-aware types

Legacy code which calls a method or property-setter with nullable reference type parameter T?, implicitly casts the argument from T|null to T? at the call site.

Legacy code which calls a method/property-setter which takes a parameter of non-nullable reference type T! acts as-if it calls a (synthetic) overloaded method, accepting implicitly-nullable parameters.

For example, if an API contains a method:

public int Compare(String! first, String! second);

Then legacy code would call the following (synthetic, created at runtime) method on the API:

public int Compare(String|null first, String|null second) {
  if (first == null) throw new ArgumentNullException("first");
  if (second == null) throw new ArgumentNullException("second");
  return this.Compare(first!, second!); // Calls original method.
}

That is to say: when legacy code calls a method or property setter with a null value, where that method or property setter does not accept nulls, the runtime will throw an ArgumentNullException, with the stack trace indicating that the exception was thrown within the called method or setter.

This creates a smooth migration path whereby old APIs which throw ArgumentNullExceptions manually may be upgraded to declaring their arguments as non-nullable. Legacy code will still receive ArgumentNullExceptions as before, maintaining the legacy behaviour — whereas newer code is statically prevented from passing a null argument.

Generic Code

Existing generic code should be able to accept non-nullable type arguments, with relatively few changes to the existing generic code.

As discussed before, for generic code to accept a non-nullable type argument, the type parameter needs to be declared as not requiring default() (since non-nullable reference types cannot have default values).

Additionally, newer generic code might choose to exclude implicitly-nullable types from its parameters.

CONCEPT                                               OLD SYNTAX            NEW SYNTAX
Generic type parameter U constrained to support       (none required)       where U: default
  default(U)
Generic type parameter is required to be              where U: class        where U: class|null
  implicitly-nullable reference type (also implies
  default)
Generic type parameter is allowed to be               where U: class ~null  where U: class ~null
  implicitly-nullable or non-nullable reference
  (also implies ~default)
Generic type parameter is allowed to be any type      where U: ~default     where U: ~null
  at all (nullable, non-nullable, reference,
  struct, whatever)
Generic type argument is coerced to be                where U: class!       where U: class
  non-nullable reference type (also implies
  ~default)
Generic type argument is coerced to be a              where U!              (none required)
  null-aware type (value or reference) (also
  implies ~default)

This is a little complex — but on the other hand, you’d only write generic code which deals with nullable and non-nullable types if you’re writing low-level library or framework code. Most programmers (and code-bases) would move from generic code which assumes implicitly-nullable-references, to generic code which assumes null-aware references. (And most programmers don’t write much generic code.)

The last 2 cases in the table above require a bit of explanation: they say that type arguments are ‘coerced’ to be non-nullable. That means that users of the generic class are allowed to supply implicitly-null type arguments, but the compiler rewrites them to the equivalent non-nullable reference type when instantiating the generic class.

For example, dictionary keys are not allowed to be (implicitly) null, while dictionary values are. The IDictionary type could be declared like this:

public interface IDictionary<KeyType, ValueType>
    where KeyType!
    where ValueType: ~null ~default

Keys are constrained to not allow (implicit) nulls; Values are allowed to be any type whatsoever, including implicitly-null and non-nullable.

Legacy code (using the Old Syntax) is still able to instantiate a dictionary type using its existing syntax:

// We’re using Old Syntax, so this is equivalent to writing
// “new Dictionary<string|null, int>()”
// …But the first type parameter is coerced to “string!”.
var dict = new Dictionary<string, int>();

// Even though the first type parameter is non-nullable, and hence
// the first parameter of ‘Add’ is non-nullable, legacy callers are
// allowed to pass implicit nulls, which result in a runtime
// exception at the call-site:
var myKey = (string)null;
dict.Add(myKey, 22); // ← Throws ArgumentNullException here
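
The runtime behaviour that legacy callers would keep can be observed in present-day .NET, where Dictionary<TKey, TValue>.Add rejects a null key with an ArgumentNullException. The sketch below only demonstrates today's behaviour; NullKeyDemo and AddIsRejected are hypothetical names used for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class NullKeyDemo
{
    // Returns true if adding the key was rejected with ArgumentNullException.
    public static bool AddIsRejected(string key)
    {
        var dict = new Dictionary<string, int>();
        try
        {
            dict.Add(key, 22);
            return false;
        }
        catch (ArgumentNullException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(AddIsRejected("answer")); // prints "False"
        Console.WriteLine(AddIsRejected(null));     // prints "True"
    }
}
```

Under the proposal, the same check would move from hand-written library code into the declared type of the key parameter.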

Summary

We’ve shown how we can enhance the strength of library code, in a backwards-compatible way: new code gets strong compile-time guarantees; old code gets these strong guarantees enforced at the boundary of the old and the new code.

Existing code continues to compile and run with its existing semantics, but can be gradually migrated to stronger types, using the backwards-compatible Old Syntax. Over time it may be migrated to the more succinct New Syntax, possibly a single file at a time.

Old, compiled code continues to run, even against upgraded, null-aware libraries.

And generic code can be written which accepts either or both implicitly- and non-nullable types.

All proposed new (and changed) language elements

|null type modifier

The type T|null is the new way to unambiguously refer to an implicitly-nullable reference type (when an unadorned T means ‘non-nullable reference to T’).

This is not often required, since new, null-aware code can do without implicitly-nullable types, and old, implicit-null code would generally be written in the Old Syntax anyway.

! type modifier

The type T! is…

  • the non-nullable version of T, when T is an implicitly-nullable reference type; otherwise
  • T

Only required in the old, implicit-null syntax, since in the new, null-aware syntax, an unadorned reference type name means ‘non-nullable reference’.

! post-fix operator

When called on an expression ‘e’ of kind…    Returns:
non-nullable                                 e (no effect, compiler warning)
explicitly-nullable                          e.Value (throws an InvalidOperationException if null)
implicitly-nullable                          e cast to the equivalent non-nullable type (throws a NullReferenceException if null)
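
The throwing behaviour of the proposed post-fix ! can be roughly imitated with helper methods in present-day C#. This is a sketch under stated assumptions: BangOperatorSketch, NotNull and ValueOf are hypothetical names, and today's built-in ! (the null-forgiving operator) does not perform any runtime check. The one grounded fact used here is that Nullable<T>.Value throws an InvalidOperationException when there is no value.

```csharp
#nullable enable
using System;

public static class BangOperatorSketch
{
    // Hypothetical stand-in for 'e!' on an implicitly-nullable reference:
    // yields the reference, or throws NullReferenceException if it is null.
    public static T NotNull<T>(T? e) where T : class
        => e ?? throw new NullReferenceException();

    // Stand-in for 'e!' on an explicitly-nullable value: simply e.Value,
    // which throws InvalidOperationException when there is no value.
    public static T ValueOf<T>(T? e) where T : struct
        => e.Value;

    public static void Main()
    {
        Console.WriteLine(NotNull("text")); // prints "text"
        Console.WriteLine(ValueOf<int>(5)); // prints 5

        try { ValueOf<int>(null); }
        catch (InvalidOperationException) { Console.WriteLine("no value"); }

        try { NotNull<string>(null); }
        catch (NullReferenceException) { Console.WriteLine("null reference"); }
    }
}
```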

! generic type parameter modifier

Applied to the declaration of generic type parameters. For example:

public class Generic<T!> {
or
public class Generic<T> where T! {

When T is already null-aware it has no effect; when T is an implicitly-nullable reference type, it’s turned into T!. It’s as if every reference to the type name T throughout the class (or method, or interface) is replaced with T!. Not required in New Syntax.

Nullable<T!> aka T?

A generic struct which can contain 0 or 1 instances of T, as per previous versions of .NET.

However, it now applies to all null-aware types (i.e., all types except implicitly-nullable reference types).

Ideally the Nullable<> struct should turn into a proper option monad, with all the methods that Java’s Optional<> class has, but that’s not essential for this proposal to work.

Importantly, the types T?!, T!? and T? are all the same type.

as operator

The current semantics of x as T specify that it either returns a result with the type given by the second argument, or it returns null; its result type is T|null. In the New Syntax, as is an operator which returns T?. Additionally, it can be applied to explicitly-nullable value arguments, again returning an explicitly-nullable result.
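
Present-day C# with nullable annotations enabled gives a rough feel for this. The sketch below is an analogy only: AsOperatorDemo is a hypothetical name, and string? in today's C# is a compiler annotation rather than the distinct Nullable<> type this proposal describes.

```csharp
#nullable enable
using System;

public static class AsOperatorDemo
{
    // 'as' with a reference type: the result is annotated as string?.
    public static string? AsString(object value) => value as string;

    // 'as' with a nullable value type: the result is int? (Nullable<int>).
    public static int? AsInt(object value) => value as int?;

    public static void Main()
    {
        Console.WriteLine(AsString("hello") ?? "(null)"); // prints "hello"
        Console.WriteLine(AsString(42) ?? "(null)");      // prints "(null)"
        Console.WriteLine(AsInt(42).HasValue);            // prints "True"
        Console.WriteLine(AsInt("hello").HasValue);       // prints "False"
    }
}
```

In both cases the caller must deal with the ‘no result’ case before using the value, which is the property the New Syntax would enforce through the type system.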

default generic type constraint

Indicates that the constrained type parameter must contain default values. That is: the default(T) expression must be valid. Explicitly required in New Syntax, implied in Old Syntax. Implied for struct types.

~default generic type unconstraint

Indicates that the ‘constrained’ type parameter need not contain default values. That is: default(T) need not be valid. Implied in New Syntax, explicitly required in Old Syntax.

|null generic type constraint

Indicates that the constrained type parameter must contain implicit nulls. Only valid for reference (class) parameters. Explicitly specified in New Syntax. Implied (for reference types) in Old Syntax.

~null generic type unconstraint

Indicates that the ‘constrained’ type parameter is allowed to contain implicit nulls (though need not). Redundant if the |null constraint is also present. Explicitly specified in New Syntax. Implied (for reference types) in Old Syntax.

System.Box<T> struct

New helper type, part of the System namespace (i.e., a built-in type). Used in generic utility classes (like collection classes) which must deal with type parameters which might be non-nullable references, and which need a way to ‘forget’ references to allow them to be garbage collected.

namespace System;
public struct Box<T> {
   /**
    * Constructs an empty Box. Same as 'default(Box<T>)'.
    */
   public Box();

   /**
    * Constructs a Box containing a value.
    */
   public Box(T value) {
      this.Value = value;
   }

   /**
    * Value held within the Box. Getting the value when the Box is
    * not initialised will throw an UnassignedFieldException.
    */
   public T Value {get;}

   /**
    * Whether the Box contains a defined value (one which may be retrieved
    * from the Value property without throwing an exception).
    */
   public bool IsAssigned {
      get {
         // This would be an intrinsic implementation of the runtime,
         // but equivalent to:
         try {
            T ignored = this.Value;
            return true;
         } catch (UnassignedFieldException) {
            return false;
         }
      }
   }

   /**
    * If and only if 'IsAssigned && Value == null'. (Does NOT throw an exception
    * if the value is undefined.) Can be used to find out if a type allows null:
    * var allowsNull = new Box<T>().IsNull; 
    */
   public bool IsNull => IsAssigned && Value == null;
}
public static class Box {
   public static Box<T> Of<T>(T value) => new Box<T>(value);
}

// For example:
var c = Box.Of(obj);

// Then use the contained value:
a = c.Value.ToString();

// Later, when we want to 'forget' c to allow garbage collection— but we
// don’t know what type it is, because its type comes from a generic type
// parameter:
c = default(Box<object>);

// If we subsequently try to use the contained value, a runtime error
// MIGHT occur (depending upon the type):
b = c.Value.ToString(); // Could throw UnassignedFieldException. Or not.

UnassignedFieldException class

New class in the System namespace, thrown when attempting to access a field or array element which does not have a default value, and which has never been assigned.

Array.Clear() method

A clarification of the existing semantics: the Array.Clear() method resets the specified array elements to their initial state, which will be default(T) for types with a default, and ‘uninitialised’ for types without a default value.

Summary

We’ve covered a design to introduce non-nullable and explicitly-nullable reference types to C# and .NET, including:

  • A syntax to declare them (in fact, one backwards-compatible and one new syntax)
  • Semantics, including some of the new rules of assigning values and calling methods
  • Semantics of uninitialised non-nullable reference fields (and UnassignedFieldExceptions)
  • The impact on generic types and new ways of constraining generic parameters
  • The missing default(T) operator… and what to use instead
  • Backwards compatibility
  • Forwards compatibility and evolving public APIs

The proposal is simple (introduces few new concepts) and, importantly, leads to a simpler C# language.

I look forward to reading your comments in the feedback section. (Comments saying ‘it won’t work’ when the commenter has clearly not read the article will be mercilessly mocked.)
