Typed collection keys

Preamble

I seem to write on this theme a lot: how to leverage the type system of modern programming languages to reduce programming errors, and make application code less verbose. Well, this is another example of that, and it’s a technique I find myself using quite often in Java & C# to really simplify code, increase type-safety (i.e., reduce the opportunity for dumb errors) and make business logic more understandable.

The gist is that if you do the same thing every time you access particular keys/variables/lookup items, then you should encapsulate these actions within the key, rather than spreading them throughout the code.

It’s not an original idea, and we’ll point to some examples in a future post.

Motivating example:

Let’s say you have a map of strings to objects (in Java: Map<String, Object>), and you’re storing in it values of several different types:

‘sessionId’: Long
‘username’: String
‘loginDate’: Date object (representing a point-in-time)
‘userStaffId’: Integer (optional)

And let’s say that there could be a bunch of other such keys too—dozens or hundreds—it’s an open-ended list.

This situation occurs with session attributes, and configuration settings, and key-value databases.

If it is a small, relatively-fixed list of keys, the solution might be a wrapper class around the Map, with appropriately-typed getters and setters for each value. But the keys are an open-ended set. The code to retrieve a session-id might be:

long sessionId = (Long)map.get("sessionId");

Let’s imagine that we’re doing that a few times throughout the codebase.

And to set it:

map.put("sessionId", /* generate the session id somehow */);

Already, there are a couple of problems with this code, so let’s address them one at a time.

The string literal

A simple (and obvious) one to start:

We’re repeating the same constant strings throughout the codebase. This is error-prone, and makes the code fragile & hard to change.

It’s error-prone—because a typo at any of the string literals will never be caught at compile time, but will cause the code to fail at runtime.

It’s fragile—because these errors are easy to introduce accidentally.

It’s hard to change—because changing any key-name requires changing it at many places throughout the code.

So the obvious first step is to define each of the key-names we’re using as string constants:

public class Keys {
  public static final String SESSION_ID = "sessionId";
  public static final String USERNAME = "username";
  // etc.
}

And we change our access code to make use of the string constants:

long sessionId = (Long)map.get(Keys.SESSION_ID);

That’s an improvement. It’s one that you’d expect most people to make. Fairly uncontroversial.

However, there is another bit of repetition & fragility in the code as it stands, and it’s not immediately obvious how to refactor this one:

The type of each key

The repetition is the cast-to-long (or whatever the type of the key is) at each access point. This repetition leads to the same issues as the literal strings: fragility, error-proneness & difficulty in changing the code.

It’s error-prone, because a wrong cast at any access point will never be caught at compile time, but will cause the code to fail at runtime. For example, there’s no compile-time guarantee that the type we cast to when reading a value is the same type we wrote to the key.

It’s fragile—because these errors are easy to introduce accidentally.

It’s hard to change—because changing any key’s type could require changing it at many, many places throughout the code.

Additionally, the type cast, which is inherent to each key, is nevertheless written out explicitly at each access point, and so clutters the code.
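
To make the failure mode concrete, here is a minimal sketch of a mismatched cast. The class and method names (CastMismatchDemo, wrongCastFailsAtRuntime) are made up for illustration; the point is only that the compiler accepts the wrong cast and the mistake surfaces when the line runs.

```java
import java.util.HashMap;
import java.util.Map;

final class CastMismatchDemo {
    /** Returns true if reading "sessionId" with the wrong cast fails at runtime. */
    static boolean wrongCastFailsAtRuntime() {
        Map<String, Object> map = new HashMap<>();
        map.put("sessionId", 42L);               // written as a Long

        try {
            // Compiles without complaint, because map.get returns Object.
            Integer sessionId = (Integer) map.get("sessionId");
            return false;                        // not reached
        } catch (ClassCastException e) {
            return true;                         // the mistake only surfaces here
        }
    }
}
```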

The idea

So can we somehow encapsulate the string key name with the type of the variable…?

Perhaps something like:

class Key<T> {
  final String keyName;
  final Class<T> valueType;

  Key(String keyName, Class<T> valueType) {
    this.keyName = keyName;
    this.valueType = valueType;
  }
}

We can then construct a mechanism for getting and setting these variables:

final class KeyMapAccessor {
  static <T> T getFrom(Key<T> key, Map<String, Object> map) {
    Object untypedValue = map.get(key.keyName);
    return key.valueType.cast(untypedValue);
  }

  static <T> void setIn(Key<T> key, Map<String, Object> map, T value) {
    map.put(key.keyName, value);
  }
}

If we then declare SESSION_ID as a variable of type Key<Long>, we can get its value like this:

long sessionId = getFrom(SESSION_ID, map);

And set it like this:

setIn(SESSION_ID, map, generateSessionId());

We’ve accomplished two big improvements here:

Removed the visual clutter of type conversions from the ‘getter’ code.
Made accesses of the value consistently typesafe. (This applies to the getter and the setter.)

Just to drive that last point home, you would get an error at compile time if you wrote:

setIn(SESSION_ID, map, "string value, should be a long!");
// or
int sessionId = getFrom(SESSION_ID, map); // int ≢ long

If you ever needed to change the type of a keyed variable, for example, changing the session ID from a Long to a GUID, you’d change the declaration of SESSION_ID, and the compiler would point out all the places in the code that needed to change.
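
Putting the pieces together, a self-contained sketch might look like the following. The Key constructor, the SessionKeys holder class and the TypedKeyDemo class are my assumptions for the sake of a runnable example; only Key, KeyMapAccessor, getFrom and setIn come from the text above.

```java
import java.util.HashMap;
import java.util.Map;

final class Key<T> {
    final String keyName;
    final Class<T> valueType;

    Key(String keyName, Class<T> valueType) {
        this.keyName = keyName;
        this.valueType = valueType;
    }
}

final class KeyMapAccessor {
    static <T> T getFrom(Key<T> key, Map<String, Object> map) {
        // Class.cast throws ClassCastException if the stored value has another type.
        return key.valueType.cast(map.get(key.keyName));
    }

    static <T> void setIn(Key<T> key, Map<String, Object> map, T value) {
        map.put(key.keyName, value);
    }
}

// Hypothetical holder for the key declarations.
final class SessionKeys {
    static final Key<Long> SESSION_ID = new Key<>("sessionId", Long.class);
    static final Key<String> USERNAME = new Key<>("username", String.class);
}

final class TypedKeyDemo {
    static String describeSession() {
        Map<String, Object> map = new HashMap<>();
        KeyMapAccessor.setIn(SessionKeys.SESSION_ID, map, 42L);
        KeyMapAccessor.setIn(SessionKeys.USERNAME, map, "alice");

        // No casts here: the key carries the type.
        long sessionId = KeyMapAccessor.getFrom(SessionKeys.SESSION_ID, map);
        String username = KeyMapAccessor.getFrom(SessionKeys.USERNAME, map);
        return username + "#" + sessionId;
    }
}
```

One caveat with this sketch: reading a key that was never set returns null, so assigning the result straight to a primitive long would throw a NullPointerException when it is unboxed.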

Alternatively

Another, and common, approach to achieving type-safety is to wrap the (untyped) map in a typed wrapper, and provide accessors for each of the values. For example:

public class SessionMap {
  private final Map<String, Object> underlyingMap = new HashMap<>();
  private static final String SESSION_ID = "sessionId";
  // etc

  public Long getSessionId() { return (Long) underlyingMap.get(SESSION_ID); }
  public void setSessionId(Long value) { underlyingMap.put(SESSION_ID, value); }
  // etc
}

That’s very viable, especially when the number of keys/variables is bounded and/or small.

However, as soon as the number of keys is open-ended—or the keys don’t belong together, and putting them all in the same class would entail mixing of concerns—this becomes a less practical solution.
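
By contrast, typed keys can live wherever they belong. In the rough sketch below, two unrelated pieces of code each declare their own keys and share one map, with no central wrapper class that has to know about both. LoginKeys, StaffKeys and SeparateOwnersDemo are hypothetical names, and the Key constructor is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

final class Key<T> {
    final String keyName;
    final Class<T> valueType;

    Key(String keyName, Class<T> valueType) {
        this.keyName = keyName;
        this.valueType = valueType;
    }
}

final class KeyMapAccessor {
    static <T> T getFrom(Key<T> key, Map<String, Object> map) {
        return key.valueType.cast(map.get(key.keyName));
    }

    static <T> void setIn(Key<T> key, Map<String, Object> map, T value) {
        map.put(key.keyName, value);
    }
}

// Hypothetical: keys owned by the login code.
final class LoginKeys {
    static final Key<String> USERNAME = new Key<>("username", String.class);
}

// Hypothetical: keys owned by unrelated staff-directory code.
final class StaffKeys {
    static final Key<Integer> USER_STAFF_ID = new Key<>("userStaffId", Integer.class);
}

final class SeparateOwnersDemo {
    static String summarise() {
        Map<String, Object> session = new HashMap<>();
        KeyMapAccessor.setIn(LoginKeys.USERNAME, session, "alice");

        // The optional key was never set, so the typed read yields null.
        Integer staffId = KeyMapAccessor.getFrom(StaffKeys.USER_STAFF_ID, session);
        String username = KeyMapAccessor.getFrom(LoginKeys.USERNAME, session);
        return username + ":" + (staffId == null ? "no staff id" : staffId.toString());
    }
}
```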

Summary & next steps

We’ve looked at the general model of associating type conversion logic and type information with a collection key name.

This technique can:

Increase code clarity, by moving type conversion code out of the business logic.
Increase code safety & robustness, by removing the fragile repetition of type conversion code throughout the codebase.

There are a few other variations on the technique that I might look at in future blog posts.

Strongly-typed database ids

More adventures in strongly-typed database ID fields. A follow-up (of sorts) to a post from 2013.

In that previous post I described a way of adding strong typing to database ID references in C# code, with almost no runtime overhead and with easy interoperability with existing code that passes database IDs as integers. This post presents a refinement which is more flexible, and produces less cluttered code.

Background

In a lot of database-heavy apps, at least the ones I’ve been involved in, you spend a lot of time passing database IDs around in the code. Usually these are integers (32- or 64-bit), but they could also be UUIDs or strings.

The trouble is that an integer representing a customer ID has the same static type as an integer representing a user ID, invoice line ID, product ID, or for that matter an integer representing a quantity. The compiler will not complain at you when you pass an integer representing a user ID to a function expecting an integer representing a customer ID—because they are all just undifferentiated integers.

So it’s an appealing idea to somehow introduce static type checking for database entity IDs. Of course, we should avoid bloating the code or introducing any runtime overhead and it should easily interoperate with whatever the native key type is for the database entities.

Ideally the scheme should even cope with composite primary keys, though in my experience composite primary keys are pretty rare (at least when using an ORM which doesn’t directly expose joining tables).
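
The problem is not specific to C#. As a small illustration (written in Java, and not the design described in this post), here is the mix-up the compiler lets through, next to a hand-written wrapper type that stops it. CustomerId, UserId and the two load methods are invented for this sketch.

```java
final class CustomerId {
    final int value;
    CustomerId(int value) { this.value = value; }
}

final class UserId {
    final int value;
    UserId(int value) { this.value = value; }
}

final class IdMixupDemo {
    // Untyped version: any int is accepted, whatever it was meant to identify.
    static String loadCustomerUntyped(int customerId) {
        return "customer " + customerId;
    }

    // Typed version: only a CustomerId is accepted.
    static String loadCustomer(CustomerId id) {
        return "customer " + id.value;
    }

    static String demo() {
        int userId = 7;
        // Compiles, although a user ID is being passed as a customer ID.
        String wrong = loadCustomerUntyped(userId);

        // loadCustomer(new UserId(7));   // would not compile: UserId is not a CustomerId
        String right = loadCustomer(new CustomerId(12));
        return wrong + " / " + right;
    }
}
```

Writing one such class per entity by hand is exactly the kind of bloat we want to avoid, which is what motivates a generic scheme.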

Previous approach

In ID: Type-safety in database code, I described a C# generic struct type, ID<> for representing strongly-typed database IDs. It worked, but had the following shortcomings:

It was verbose: ID types look like ID<Customer> or ID<Invoice>, which is awkward to type and visually messy.
It was limited: It assumed that database IDs are always 32-bit integers. Different types of keys (for example, some tables with string keys and others with integers) cannot be mixed in a single project without creating multiple, differently named ID classes.

On the positive side:

IDs were ‘struct’ objects, and hence caused zero space overhead and minimal speed overhead.

New approach

Ideally key types would be named EntityName.ID, but how can we do that while keeping them as structs, and without requiring each entity to define its own ID struct?

The answer is to make it an inner type of a parameterised Entity class (parameterised by database key type and Entity subtype). Subclasses instantiate the parameter types, and get an Id struct type strongly typed with respect to their key and Entity type.

ID types now look like Customer.ID or Invoice.ID—which is visually less noisy, and puts the entity name first.
ID (entity key) types can be anything—Int32, Int64, String, anything—which implements IEquatable.
Entities have an ‘Id’ property which is of type EntityName.ID.
Entities have a ‘Key’ property which is of the underlying primary key type.

The downside is that all entity classes must inherit from the same Entity<> base class in order to have strongly-typed ID types. However, since the entity ‘knows’ about its ID type, it can expose an Id property of that type.

It’s possible for many entities which share the same underlying key type (and key field name) to inherit from a common subclass of Entity, specialised to their key type.

The Code

// base class of all entities:
public abstract class Entity<K, E>: Entity<K, E>.IDOrEntity
	where E: Entity<K, E>
	where K: IEquatable<K>
{
	public ID Id => new ID(Key);

	// Subclasses must implement this:
	public abstract K Key { get; set; }

	public bool IsNot(Entity<K, E> other) => !Is(other);

	public bool Is(Entity<K, E> other) => this.Id == other.Id;

	// Union type of Entity and ID
	public interface IDOrEntity
	{
		ID Id { get; }

		K Key { get; }
	}

	// The ID type, unique to the Entity type:
	public struct ID: IEquatable<ID>, IDOrEntity
	{
		private readonly K _key;

		public ID(K key)
		{
			this._key = key;
		}

		public K Key => _key;

		ID IDOrEntity.Id => this;

		public override bool Equals(object obj) => this == (obj as ID?);

		public static bool operator !=(ID first, ID second) => !(first == second);

		public static bool operator ==(ID first, ID second) => first.Equals(second);

		public bool Equals(ID other) => this.Key.Equals(other.Key);

		public override int GetHashCode() => Key.GetHashCode();

		public override string ToString() => Key.ToString();

		public static implicit operator ID(K value) => new ID(value);
	}
}

You’ll notice that there is one abstract property on Entity: Key; this represents the entity’s (primary) key as its underlying type. Making this abstract allows subclasses to decide how they want to store all their fields—the Entity class itself does not store any state.

(It might be possible to make this Key field protected.)

Examples

var cust = new Customer();
var cust1 = new Customer();
var custId = (Customer.ID)89;
var order = new Order();

// Given a method declared as: public bool CheckCustomer(Customer.ID id)
var ok = CheckCustomer(cust.Id);
// var doesNotCompile1 = CheckCustomer(order.Id);
var idsMatch = custId == cust1.Id;
// var doesNotCompile2 = custId == order.Id;
var sameEntity = cust.Is(cust1); // Compares Id values for equality.
Customer.ID custId2 = 33; // Allowed (if key type is integer).
// Customer.ID custId3 = order.Id; // Not allowed (even if they share key types).

Entities which use the same key type/key name

I have a few line-of-business applications most of which use 32-bit integer entity IDs. Table key names are almost always ‘Id’, and they use Microsoft’s EntityFramework for database access. We can abstract the common bits of the 32-bit-ID database entities like this:

// All (or most) entities in the application inherit (directly) from this:
public abstract class BaseEntity<E> : Entity<int, E>
	where E: BaseEntity<E>
{
	[Key, Column("Id")]
	public override int Key { get; set; }
}

This says that all inheriting entities have an Int32 key field (and hence an ID type based on ints), represented in the database as a field called ‘Id’.

Accepting IDs or entities

As with my previous approach, this version includes a mechanism for methods to receive as parameters objects which can be either an ID or a whole entity.

This is useful because frequently business logic already has an entity object, and it’s a useful optimisation for called methods not to have to retrieve the same entity again from the database.

We specify an interface to represent the union of an Entity type and its corresponding ID type, called EntityName.IDOrEntity. Entities and their IDs implement this interface, and an extension method on the interface, GetInstance(Func<ID, Entity>), provides a mechanism to either return the entity, or to look up the entity from its ID.

In other words, if you provide an entity, the method can use it directly; if you provide just an ID, it can look up the entity itself.

public static class IDOrEntityExtensions
{
	public static EntityType GetInstance<K, EntityType>(
		this Entity<K, EntityType>.IDOrEntity idOrEntity,
		Func<Entity<K, EntityType>.ID, EntityType> getter)
		where EntityType : Entity<K, EntityType>
		where K : IEquatable<K>
	{
		return (idOrEntity as EntityType)
			?? getter(idOrEntity.Id);
	}
}
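
For comparison, the same ‘ID or entity’ idea can be sketched in Java. This is only an analogue under my own assumptions: it uses a generic Id<E> (closer to the older ID<Customer> style) rather than a nested Customer.ID, and IdOrEntity, resolve, fetch and greet are hypothetical names. The fetch method stands in for a database lookup and counts how often it is called.

```java
import java.util.function.Function;

// Hypothetical union of "an entity" and "the ID of that entity".
interface IdOrEntity<E> {
    Id<E> id();
    E resolve(Function<Id<E>, E> lookup);
}

// Phantom-typed ID: E only distinguishes Id<Customer> from other IDs.
final class Id<E> implements IdOrEntity<E> {
    final int key;
    Id(int key) { this.key = key; }

    @Override public Id<E> id() { return this; }

    // An ID has no entity to hand, so it has to ask the lookup function.
    @Override public E resolve(Function<Id<E>, E> lookup) { return lookup.apply(this); }
}

final class Customer implements IdOrEntity<Customer> {
    final int key;
    final String name;
    Customer(int key, String name) { this.key = key; this.name = name; }

    @Override public Id<Customer> id() { return new Id<>(key); }

    // An entity is already loaded, so the lookup is never called.
    @Override public Customer resolve(Function<Id<Customer>, Customer> lookup) { return this; }
}

final class IdOrEntityDemo {
    static int lookups = 0;

    // Stand-in for a database fetch; counts how often it runs.
    static Customer fetch(Id<Customer> id) {
        lookups++;
        return new Customer(id.key, "loaded-" + id.key);
    }

    // Accepts either a Customer or an Id<Customer>.
    static String greet(IdOrEntity<Customer> who) {
        return "hello " + who.resolve(IdOrEntityDemo::fetch).name;
    }
}
```

Calling greet with a Customer leaves the lookup counter untouched, while calling it with an Id<Customer> triggers exactly one fetch.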

Summary

It’s only a single class (plus one extension class), but it provides a nice (and simple) mechanism for enforcing a bit more compile-time safety on a database application.

The new version is more intuitive too and makes the code clearer and cleaner.

Non-nullable reference types in C# & .NET

Abstract

In the C# programming language, there are two kinds of data types: value types and reference types. Reference types can hold either pointers to objects, or the special value null, which is used to indicate a ‘missing’ reference. This feature of implicitly allowing nulls in reference types has some well-recognised problems and the .NET team are already exploring ways of enforcing non-nullable reference types; this paper explores one possible design.

Nullable reference types in C#: backwards compatibility

C# Non-Nullable Types: Backwards Compatibility

In previous episodes I proposed a set of requirements for non-nullable reference types in C#/.NET, and a design, including their interaction with generic (polymorphic) types.

There is also something to be said about backwards and forwards compatibility:

How can legacy code — using implicitly-nullable reference types — be migrated to using non-nullable references without causing disruption?
How should legacy code interact with APIs which accept or return non-nullable reference types?
What are the implications of generic code to backwards compatibility?
