Nullable reference types in C#: backwards compatibility

C# Non-Nullable Types: Backwards Compatibility

In previous episodes I proposed a set of requirements for non-nullable reference types in C#/.NET, and a design to meet them, including how they interact with generic (polymorphic) types.

There is also something to be said about backwards and forwards compatibility:

How can legacy code — using implicitly-nullable reference types — be migrated to using non-nullable references without causing disruption?
How should legacy code interact with APIs which accept or return non-nullable reference types?
What are the implications of generic code for backwards compatibility?

Terminology

Null-aware code: Code written for newer versions of .NET which support the proposed non-nullable reference types.
Non-null-aware code: Legacy code which only knows implicitly-nullable reference types.

Recap of the proposal

In addition to legacy, implicitly-nullable reference types (sometimes here unambiguously denoted T|null), we add two new kinds of reference type. The first is non-nullable reference types, denoted T!. The second comes from extending struct Nullable so that it may contain reference types: these are explicitly-nullable reference types, denoted T?.

Newer code will exclusively use ‘null-aware’ types (everything except implicitly-nullable references), while old code is ‘non-null-aware’: it continues to work with implicitly-nullable references.

Migrating code from implicit-null to null-aware

So far we’ve talked about adding non-nullable references and explicitly-nullable references to C#, with a backwards-compatible syntax: non-nullable reference types are expressed with the type name and a postfix ‘!’, and explicitly-nullable reference types with a postfix ‘?’.

However, eventually we want implicitly-nullable types to go away: most codebases should do without them completely. And we don’t want newer code cluttered up with (redundant) ‘!’ type modifiers. We should be able to write code which assumes non-nullable reference types.

Of course, old code should, absent any other changes, continue to compile, pass its tests and keep its existing execution semantics.

So we define two slightly different versions of the syntax: one for backwards-source-compatibility with old code, which assumes implicitly-nullable types as the default (the Old Syntax); one for new code, which assumes non-nullable types as the default (the New Syntax).

We define a superset unambiguous syntax, which may be used to annotate both implicitly-nullable and null-aware types, and to indicate the new type constraints for generic code. We’ve used that syntax in this document, to unambiguously distinguish between implicitly-nullable (T|null) and non-nullable (T!) types.
Old Syntax: Same as the unambiguous syntax, except that implicitly-nullable types are assumed where not specified.
New Syntax: Same as the unambiguous syntax, except that non-nullable types are assumed where not specified.

Old code runs as before, and may be gradually enhanced with null-aware reference types over time, using the Old Syntax.

At some point the programmer can switch over the codebase (possibly a file at a time) to using the New Syntax.

In legacy code, using the Old Syntax:

An unadorned reference type T name means ‘T|null’
Literal strings are inferred to be of type ‘String|null’
In general, reference type expressions are inferred to be implicitly-nullable.

And eventually we intend all C# programmers to migrate to the New Syntax, in which:

An unadorned reference type T means ‘T!’
Literal strings are inferred to be of type ‘String!’
Where possible, reference type expressions are inferred to be non-nullable.

In summary:

| Concept | Old Syntax | New Syntax |
| --- | --- | --- |
| Implicitly-nullable reference type T | `T` | `T\|null` |
| Non-nullable reference type T | `T!` | `T` |
| Explicitly-nullable reference type T | `T?` | `T?` |

How to manage this migration in syntax?

We propose that C# projects will have a per-project setting to switch them to the new syntax, much like the existing choice of C# language level. This would be overridable on a file-by-file basis with a ‘#pragma’ line near the top of any .cs source file.

We expect that Visual Studio, ReSharper, Visual Studio Code and others would have a command for refactoring a project wholesale from legacy syntax to new, null-aware syntax.

Legacy code using null-aware APIs

Legacy code (non-null-aware code) has only one kind of reference type: the implicitly-nullable kind. We would like to be able to upgrade APIs to null-aware, and have existing non-null-aware code continue to use them.

Changing types in an API from implicitly-nullable types to null-aware types should be a backward-compatible operation.

So we define the following rules for how they interact:

Reading fields/array elements

Legacy code which reads a field/array element with null-aware reference type T! has the field value implicitly cast to type T|null. Note that reading a non-nullable reference field may throw a FieldUnassignedException if the field has never been initialised.
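To make the read rule concrete, here is a rough emulation in today's C#. It is only a sketch under some loud assumptions: FieldUnassignedException is not an existing .NET type, so it is declared here as a hypothetical stand-in, and a property getter stands in for the runtime check on a field of type "string!". The class names Node and ReadDemo are invented for illustration.

```csharp
using System;

// Hypothetical: the proposal names this exception, but it does not exist in .NET.
public sealed class FieldUnassignedException : InvalidOperationException
{
    public FieldUnassignedException(string fieldName)
        : base("Field '" + fieldName + "' was read before it was assigned.") { }
}

public sealed class Node
{
    // Stands in for a field declared as "string!" in the proposed syntax.
    private string _label;

    public string Label
    {
        get
        {
            // Emulates the runtime check on reading a never-initialised field.
            if (_label == null) throw new FieldUnassignedException(nameof(Label));
            return _label;
        }
        set { _label = value; }
    }
}

public static class ReadDemo
{
    public static void Main()
    {
        var node = new Node();
        try
        {
            Console.WriteLine(node.Label);
        }
        catch (FieldUnassignedException ex)
        {
            Console.WriteLine(ex.Message);
        }

        node.Label = "root";
        // Legacy code simply sees an ordinary, implicitly-nullable string here.
        string legacyView = node.Label;
        Console.WriteLine(legacyView);
    }
}
```

Once the field has been assigned, the legacy reader cannot tell the difference from an ordinary string field, which is the point of the implicit cast to T|null.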

Writing fields/array elements

Legacy code which writes a field/array element of type T? has the value implicitly cast from type T|null to type T?. (This is almost certainly compiled away to a no-op.)

Legacy code which writes to a field/array element of type T! has the implicitly-nullable value automatically coerced to non-nullable (as if with the ! postfix operator) at the point of assignment. This will throw a NullReferenceException at runtime if the value is null.
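The write rule for T! can be approximated in current C# as follows. This is a sketch, not the proposed mechanism: today's runtime cannot intercept plain field writes, so a property setter stands in for the coercion, and the names Customer and WriteDemo are hypothetical.

```csharp
using System;

public sealed class Customer
{
    // Stands in for a field declared as "string!" in the proposed syntax.
    private string _name = "unknown";

    public string Name
    {
        get { return _name; }
        set
        {
            // Emulates the coercion applied at the point of assignment:
            // a null from legacy code is rejected before it is stored.
            if (value == null) throw new NullReferenceException("Name is non-nullable.");
            _name = value;
        }
    }
}

public static class WriteDemo
{
    public static void Main()
    {
        var customer = new Customer();
        customer.Name = "Ada";

        try
        {
            customer.Name = null; // legacy code passing an implicit null
        }
        catch (NullReferenceException)
        {
            // The failed write leaves the previous value untouched.
            Console.WriteLine("rejected null; Name is still " + customer.Name);
        }
    }
}
```

The useful property is that the failure happens at the assignment itself, rather than at some later read far from the cause.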

Methods which return null-aware types

Legacy code which calls a method/property-getter which returns a null-aware reference type T! or T?, has that return value implicitly converted to T|null on return of the method. (This is almost certainly a no-op in the virtual machine.)

Methods which accept null-aware types

Legacy code which calls a method or property-setter with nullable reference type parameter T?, implicitly casts the argument from T|null to T? at the call site.

Legacy code which calls a method/property-setter which takes a parameter of non-nullable reference type T! acts as if it calls a (synthetic) overloaded method, accepting implicitly-nullable parameters.

For example, if an API contains a method:

public int Compare(String! first, String! second);

Then legacy code would call the following (synthetic, created at runtime) method on the API:

public int Compare(String|null first, String|null second) {
  if (first == null) throw new ArgumentNullException("first");
  if (second == null) throw new ArgumentNullException("second");
  return this.Compare(first!, second!); // Calls original method.
}

That is to say: when legacy code passes a null value to a method or property setter which does not accept nulls, the runtime will throw an ArgumentNullException, with the stack trace indicating that the exception was thrown within the called method or setter.

This creates a smooth migration path whereby old APIs which throw ArgumentNullExceptions manually may be upgraded to declaring their arguments as non-nullable. Legacy code will still receive ArgumentNullExceptions as before, maintaining the legacy behaviour — whereas newer code is statically prevented from passing a null argument.
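What a legacy caller would observe can be emulated with ordinary C# today. In this sketch the private CompareCore method stands in for the null-aware "Compare(String! first, String! second)", and the public Compare stands in for the synthetic overload. The names Api, CompareCore and LegacyCaller, and the choice of an ordinal comparison, are assumptions made only for illustration.

```csharp
using System;

public static class Api
{
    // Stand-in for the null-aware method. It assumes both arguments are
    // non-null, which the proposal would guarantee statically for new code.
    private static int CompareCore(string first, string second)
    {
        return string.CompareOrdinal(first, second);
    }

    // Stand-in for the synthetic overload that legacy callers bind to.
    public static int Compare(string first, string second)
    {
        if (first == null) throw new ArgumentNullException(nameof(first));
        if (second == null) throw new ArgumentNullException(nameof(second));
        return CompareCore(first, second);
    }
}

public static class LegacyCaller
{
    public static void Main()
    {
        Console.WriteLine(Api.Compare("a", "a")); // equal strings compare as 0

        try
        {
            Api.Compare("a", null);
        }
        catch (ArgumentNullException ex)
        {
            // ParamName identifies which argument was null.
            Console.WriteLine("null argument: " + ex.ParamName);
        }
    }
}
```

From the legacy caller's side this is indistinguishable from an API that performs its null checks by hand, which is why the upgrade should not break existing callers.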

Generic Code

Existing generic code should be able to accept non-nullable type arguments, with relatively few changes to the existing generic code.

As discussed before, the type parameter(s) do need to declare that they can be assigned types without default values (since non-nullable reference types cannot have default values).

Newer generic code might want to exclude implicitly-nullable types from its allowed parameter types.

| Concept | Old Syntax | New Syntax |
| --- | --- | --- |
| Generic type parameter U constrained to support `default(U)` | (none required) | `where U: default` |
| Generic type parameter is required to be implicitly-nullable reference type (also implies `default`) | `where U: class` | `where U: class\|null` |
| Generic type parameter is allowed to be implicitly-nullable or non-nullable reference (also implies `~default`) | `where U: class ~null` | `where U: class ~null` |
| Generic type parameter is allowed to be any type at all (nullable, non-nullable, reference, struct, whatever) | `where U: ~default` | `where U: ~null` |
| Generic type argument is coerced to be non-nullable reference type (also implies `~default`) | `where U: class!` | `where U: class` |
| Generic type argument is coerced to be a null-aware type (value or reference) (also implies `~default`) | `where U!` | (none required) |

This is a little complex — but on the other hand, you’d only write generic code which deals with nullable and non-nullable types if you’re writing low-level library or framework code. Most programmers (and code-bases) would move from generic code which assumes implicitly-nullable-references, to generic code which assumes null-aware references. (And most programmers don’t write much generic code.)

The last two cases in the table above require a bit of explanation: they say that type arguments are ‘coerced’ to be non-nullable. That means that users of the generic class are allowed to supply implicitly-nullable type arguments, but the compiler rewrites them to the equivalent non-nullable reference type when instantiating the generic class.

For example, dictionary keys are not allowed to be (implicitly) null, while dictionary values are. The IDictionary type could be declared like this:

public interface IDictionary<KeyType, ValueType>
    where KeyType!
    where ValueType: ~null

Keys are constrained to never be implicitly null; Values are allowed to have any value whatsoever, including null.

Legacy code (using the Old Syntax) would still be able to instantiate a dictionary type using its existing syntax:

// We're using Old Syntax, so this is equivalent to writing
// "new Dictionary<string|null, int>()"
// ...but the first type parameter is coerced to "string!".
var dict = new Dictionary<string, int>();

// Even though the first type parameter is non-nullable, and hence
// the first parameter of 'Add' is non-nullable, legacy callers are
// allowed to pass implicit nulls, which will result in a runtime
// exception at the call site:
var myKey = (string)null;
dict.Add(myKey, 22); // ← Throws ArgumentNullException here
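As it happens, the existing System.Collections.Generic.Dictionary already rejects null keys at run time, so the behaviour legacy callers would see under the proposal matches what they get today. The small check below uses only the real Dictionary API. The wrapper class DictionaryNullKeyDemo and its AddNullKey method are hypothetical names added for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class DictionaryNullKeyDemo
{
    // Returns the name of the exception type thrown when adding a null key,
    // or "none" if the call unexpectedly succeeds.
    public static string AddNullKey()
    {
        var dict = new Dictionary<string, int>();
        string myKey = null;
        try
        {
            dict.Add(myKey, 22);
            return "none";
        }
        catch (ArgumentNullException ex)
        {
            return ex.GetType().Name;
        }
    }

    public static void Main()
    {
        Console.WriteLine(AddNullKey()); // prints "ArgumentNullException"
    }
}
```

Under the proposal, the difference would be that null-aware callers get this rejected at compile time, while legacy callers keep the run-time exception shown here.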

Summary

We’ve shown how we can enhance the strength of library code, in a backwards-compatible way: new code gets strong static guarantees; old code gets these strong guarantees enforced at the boundary of the old and the new code.

Existing code continues to compile and run with its existing semantics, but can be gradually migrated to:

stronger types, using the backwards-compatible Old Syntax, and
eventually migrated to the more succinct New Syntax, possibly a single file at a time.

Old, compiled code continues to run, even against upgraded, null-aware libraries.

And generic code can be written which accepts implicitly-nullable types, non-nullable types, or both.

The next post will pull all of this together into a single (fairly informal) specification.
