In my previous post, I outlined a list of requirements for non-nullable (and explicitly-nullable) reference types in C#. In this post we’ll dive into some further design decisions. Subsequent posts will look at the impact on generic types, plus backward-compatibility and some corner cases.
Expanding the type system
We add two new main concepts to the type system:
- non-null reference types T, denoted as ‘T!’, and
- explicitly-nullable reference types, ‘T?’.
For example, strings:
string! notNullable;
string? explicitlyNullable;
These we’ll call ‘null-aware’ types. The traditional, legacy, old-fashioned references we’re all used to, which can implicitly be null, we’ll call ‘implicitly-null’ types. (Occasionally I’ll denote them here as ‘T|null’ just to be unambiguous; for example, string|null denotes the implicitly-nullable string type.)
The main rule is that a slot of reference type T! cannot have the value ‘null’. (Whereas reference type T|null can of course have the value ‘null’ in addition to any value of T!.)
‘T?’ means precisely the same as ‘Nullable<T!>’.
Some examples of usage:
// Declare a function which deals with non-null values:
public string! Truncate(string! input, int maxLength) {
// Guaranteed to never throw a NullReferenceException here:
return input.Length <= maxLength ? input : input.Substring(0, maxLength);
}
string! a = "Bob";
string? b = "Bob";
string c = "Bob";
//Truncate(null, 10); // Illegal. Does not compile.
var r1 = Truncate("", 10); // Allowed
var r2 = Truncate(a, 10); // Allowed
//Truncate(b, 10); // Illegal
var r3 = Truncate(b.Value, 10); // Allowed, but may throw exception at runtime.
var r4 = Truncate(c, 10); // Warning. May throw null-ref exception at runtime.
As you can see in the example of ‘c’ above, you can pass an implicitly-null type (T|null) to a function (or property) expecting a T!. (This rule is important for backward-compatibility.) However, if the caller tries to pass a null value at runtime, the environment will automatically throw an ArgumentNullException on behalf of the method, so that the null is never actually assigned to the parameter.
This allows legacy APIs to have their parameters changed from implicitly-null (|null) to non-null (!) without breaking source compatibility of old code. (In addition, the compiler could do some basic flow analysis to check that no obvious nulls will be passed to the function, and raise warnings as appropriate.)
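To make the runtime check concrete, here is a rough sketch in today's C# of what the environment would effectively be doing on the method's behalf. Current C# has no ‘string!’, so the hand-written guard below stands in for the automatic check; the class name TruncateSketch is purely illustrative and not part of the proposal.

```csharp
using System;

public static class TruncateSketch
{
    // Today's C# equivalent of a 'string!' parameter: the guard is written by
    // hand here, whereas the proposal has the runtime perform it automatically.
    public static string Truncate(string input, int maxLength)
    {
        if (input == null)
            throw new ArgumentNullException(nameof(input));

        // Past the guard, 'input' is known to be non-null.
        return input.Length <= maxLength ? input : input.Substring(0, maxLength);
    }
}
```

The point is that the failure happens at the call boundary, naming the offending parameter, rather than somewhere inside the method body.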
Apart from this rule for method calls, T|null values cannot be assigned to T! variables. (The other way around is fine, though.)
string! nonNull = "Bob";
string? explNull = "Jeff";
string implNull = "Sam";
implNull = nonNull; // Allowed
implNull = explNull; // Allowed
//nonNull = explNull; // Disallowed (obviously!)
//nonNull = implNull; // Disallowed
explNull = nonNull; // Allowed
explNull = implNull; // Allowed
Some extra rules
Type T!! is the same as T!
Type T?! is the same as T?
ValueType! is the same as ValueType
In other words, the ! suffix on a type which is already null-aware has no effect whatsoever.
A new operator: !
C# of course already has a prefix Boolean operator ‘!’, which performs a logical ‘not’. This new ! operator is a postfix operator and means ‘assume not null’:
var notNull = possiblyNull !;
When applied to an explicitly nullable expression it returns ‘expression.Value’ (which may throw an InvalidOperationException if the value is null).
When applied to an implicitly null expression it returns the result of the expression typed as not-nullable. If the value is null, it throws a NullReferenceException:
var obj = (object)…;
var r0 = obj!; // r0 is of type ‘object!’. Throws exception if obj == null
var r1 = obj!.ToString(); // Throws NullReferenceException if obj == null
var r2 = obj.ToString(); // Precisely the same behaviour as r1 example.
When applied to a non-nullable expression it is a no-op. (The compiler may emit a warning, “Redundant ! operator”, since the ! could be safely removed without altering the program semantics.)
For example:
string? a = null;
string? b = "B";
string c = null;
string d = "D";
string! e = "E";
var r1 = a!; // throws exception at runtime
var r2 = b!; // r2 == "B"
var r3 = c!; // throws exception at runtime
var r4 = d!; // r4 == "D"
var r5 = e!; // r5 == "E" (& compiler warning, “redundant ! operator”)
// In all cases r1…r5 are inferred as ‘string!’
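For readers who want to experiment with these semantics today, the implicitly-null case can be approximated with an ordinary helper method. This is only a sketch under my own assumptions: the names NotNullSketch and AssumeNotNull are hypothetical, and a plain method obviously cannot change the static type of its result to ‘string!’ the way the proposed operator would.

```csharp
using System;

public static class NotNullSketch
{
    // Hypothetical stand-in for the proposed postfix '!' applied to an
    // implicitly-null reference: return the value, or fail immediately.
    public static T AssumeNotNull<T>(T value) where T : class
    {
        if (value == null)
            throw new NullReferenceException("Expression was null.");
        return value;
    }
}
```

For example, NotNullSketch.AssumeNotNull(d) plays the role of ‘d!’ above, and NotNullSketch.AssumeNotNull(c) throws at runtime just as ‘c!’ would.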
Rather than using the ! operator, the same operation can be written as an explicit type cast:
// Same semantics as ‘implicitlyNull!’:
nonNull = (string!)implicitlyNull;
// Same semantics as ‘explicitlyNull!’ or ‘explicitlyNull.Value’
nonNull = (string!)explicitlyNull;
You could do a type cast and non-null coercion in a single operation:
var obj = (object)"Hello world";
var nonNullString = (string!)obj; // Could throw NullReferenceException or InvalidCastException
Implicitly-typed local variables
Implicitly-typed variables become slightly trickier, for reasons of backwards compatibility with older versions of C#:
var d = "Bob"; // Is this inferred as a ‘string|null’ or ‘string!’ ?
The inferred type of ‘d’ above should generally be (non-nullable) string!. However, this could break legacy code if the code expects it to be inferred as an implicitly-nullable string and subsequently assigns a null to it (d = null;). Therefore the type inference looks at all the places which assign to ‘d’. If any of them could assign a null, or an implicitly-null value, ‘d’ is typed as (implicitly-null) ‘string|null’; otherwise it’s inferred as string!.
This type inference rule may occasionally be too complex for the compiler, in which case it will infer as ‘implicitly null’ by default. If this happens, you can help the compiler by declaring it explicitly non-null like this:
var d = "Bob"!;
(That’s not doing anything very special; just using the ! operator to ensure that an unambiguously not-null result type is inferred for the string expression.)
In a later article we’ll suggest a way to tell the compiler that the code is null-aware, and that such implicitly-typed literals should be inferred as not-nullable.
default(T)
The C# ‘default’ operator takes a type name and returns the default value of that type.
What is the default value of a non-nullable reference type?
For example, what value should ‘default(Stream!)’ return? By definition it cannot be ‘null’. And the language can’t just return new Stream() (not least because Stream is an abstract type). Without null, reference types simply do not have sensible default values.
We disallow default() for non-nullable reference types; it is a compile-time error to write default(Stream!).
(This creates interesting issues for generic code, but I’ll cover that in a future article.)
A related issue is:
Missing values
Unfortunately, there are some situations in which non-nullable reference type variables cannot have a value.
Consider the case when you allocate an array with: “streams = new Stream![11];”. That’s an array with 11 elements, each of which is a non-nullable Stream object. What should the .NET runtime do when you ask for streams[2] (before you’ve assigned a value to it)? It can’t be null, because we’ve excluded nulls from the array; it can’t be default(Stream!) because default is not defined for non-nullable reference types. We can’t (in general) force the programmer to assign it before reading it.
There are a few situations where something similar can happen:
- A non-nullable reference instance variable in a class, where that variable has not (yet) been assigned a value by the constructor.
- A special case of the previous item is where the this reference has ‘escaped’ from the constructor before the constructor has finished executing. (This is bad practice, but it’s quite possible to do accidentally or on purpose.)
- A non-nullable reference instance variable in a struct: it’s perfectly possible to instantiate a struct without its constructor ever being called. (In fact, it’s really easy.)
- Arrays of non-nullable reference types (as per the example): initially all of the array elements will be unassigned.
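All of these situations can be observed in today's C#, where the ‘missing’ value simply shows up as null. The sketch below is ordinary current C# (no ‘!’ types); the type names Holder, Widget and MissingValuesDemo are made up for illustration.

```csharp
using System;
using System.IO;

public struct Holder
{
    public readonly Stream Source;
    public Holder(Stream source) { Source = source; }
}

public class Widget
{
    public readonly string Label;
    public readonly string LabelSeenEarly;

    public Widget(string label)
    {
        // 'this' escapes to another method before Label has been assigned.
        LabelSeenEarly = Peek(this);
        Label = label;
    }

    private static string Peek(Widget w) { return w.Label; }
}

public static class MissingValuesDemo
{
    // A struct created without running its constructor has unassigned fields.
    public static bool StructFieldIsNull() { return default(Holder).Source == null; }

    // Freshly allocated array elements have never been assigned.
    public static bool ArrayElementIsNull()
    {
        var streams = new Stream[11];
        return streams[2] == null;
    }

    // The escaped 'this' was read before the constructor assigned Label.
    public static bool EscapedThisSawNull() { return new Widget("w").LabelSeenEarly == null; }
}
```

Each of the three methods returns true today. Under the proposal, the same reads would have to produce something other than null if the fields and elements were declared non-nullable.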
The solution is to accept that some values in a live program may be ‘unassigned’. The language cannot realistically prevent programs from dereferencing ‘unassigned’ fields. The best we can do is to throw a ‘FieldUnassignedException’ if the program attempts to read from an unassigned field. In fact, not only is that the ‘best’ we can do; it’s the only thing we can do without significantly changing the language.
How is this actually better than allowing uncontrolled nulls, and ending up with NullReferenceExceptions in the first place?
We seem to have just substituted the old familiar NullReferenceExceptions with newfangled FieldUnassignedExceptions, but there is an important difference: unexpected null values can propagate far through a program before they eventually trigger a NullReferenceException. The error (if it occurs) is far removed from the fault in the code which accessed the unassigned value. A FieldUnassignedException happens as soon as you try to read the unassigned variable; therefore the program fails faster.
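The fail-fast behaviour can be approximated in today's C# with a small wrapper type. This is a sketch of the idea rather than how the runtime would actually implement it: NonNullSlot<T> is a name I have made up, and the FieldUnassignedException class is defined here only so the example is self-contained.

```csharp
using System;

// Illustrative definition; the proposal only names this exception.
public class FieldUnassignedException : InvalidOperationException
{
    public FieldUnassignedException(string message) : base(message) { }
}

// Hypothetical wrapper emulating a non-nullable slot that may be unassigned.
public struct NonNullSlot<T> where T : class
{
    private T _value; // null internally means "unassigned"; callers never see it

    public T Value
    {
        get
        {
            // Fail at the read itself instead of handing out a null.
            if (_value == null)
                throw new FieldUnassignedException("Read of an unassigned slot.");
            return _value;
        }
        set
        {
            if (value == null)
                throw new ArgumentNullException(nameof(value));
            _value = value;
        }
    }
}
```

With an array such as new NonNullSlot<string>[11], reading element 2 before assigning it throws straight away at the read, whereas a plain string[] would quietly hand back a null to be tripped over later.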
Further, it is easy to prove which bits of code could be prone to FieldUnassignedExceptions. Programmers can eliminate them totally by ensuring that:
- Constructors assign all non-nullable reference fields before returning, before calling any method which reads these fields, before calling any virtual methods, before passing ‘this’ as a parameter to another method, and before making ‘this’ visible to any other thread. (Phew!) These things are good practice anyway.
- Structs containing reference types should either make them (explicitly) nullable references, or include a flag to say whether the constructor was called.
- Array code should be carefully audited, and most code should avoid using arrays at all (which is good practice: there are better collection types available), or at least avoid arrays of non-null reference types.
All of these cases can easily be identified and violations flagged as warnings by static analysis tools.
(It is very much easier to avoid a FieldUnassignedException than a NullReferenceException!)
Summary
We’ve covered an approach to introducing non-nullable and explicitly-nullable reference types to C# and .NET, including how to declare them, some of the new rules of assigning values and calling methods, and the consequences of uninitialised values in the face of non-nullable references.
There are a few other wrinkles to discuss: generic types, and backwards (and forwards) source and binary compatibility. I’ll cover these in future articles.
You’ve reinvented Spec#, Sing# (used for the Singularity operating system), and M# (used for the Midori operating system).
Not quite!
Spec# has non-null and explicitly-null reference types (and a bunch of unrelated stuff too). However Spec# is fundamentally a research system which doesn’t need to be compatible with the large amount of existing C# code. It also puts some heavy constraints on the programmer to guarantee that non-nullable references always have an assigned value when they’re read.
My proposal is designed to work with existing C# code and provide the advantages of programmer control over nullability, without entirely changing the language, breaking any existing code or putting an undue burden on the programmer.
(Note too that there are other proposals for adding explicit nullability and/or non-nullable reference types to C#, not just Spec#. Many of them start with ! to mean ‘non-null’ and ? to mean ‘explicitly nullable’, but then quickly diverge from Spec# and from my proposal.)