*The Java Programming Language and JVM, that is. Sorry for the very pompous title, but I want to look at what—with the benefit of modern best-practice knowledge and hindsight—could have been different and better about 1995’s Java Programming Language.
Honestly, this post feels a bit unfinished. I still agree with it, but it’s a bit of a list without a start, an end, or… much of a point. Anyway, do enjoy, and maybe I’ll pull together more of a coherent argument in a future post.
Motivation
“There are only two kinds of languages: the ones people complain about and the ones nobody uses.”
—Bjarne Stroustrup
(God, that quote is overused.)
Java is a very, very successful programming language by any measure. It’s also one that many smart, motivated people have attempted to improve or reinvent. (See Scala, Kotlin, C# and other languages.)
What I want to look at in this post is not how I would reinvent Java or the JVM nowadays, but how Java might have been different if its creators had known then what we know now about:
- How Java was and is used in practice;
- What modern Java best-practice looks like;
- How the language went on to evolve;
- Which original features turned out to be practically bad (or good) ideas.
I don’t want to expand the original scope of the language. For example: Java did not get parameterised (generic) types until version 5 of the language in 2004, probably to keep the initial version of the language and its virtual machine simple. So a reimagined version of Java in 1995 wouldn’t necessarily get them (until a later version) either.
I also don’t want to introduce any ideas which could not reasonably have been predicted or implemented at the time. For example, there has been loads of language research into type systems for safer concurrency. Java’s original inventors did not have the benefit of that knowledge at the time.
So we’re going to create an alternate universe with some knowledge from 2021, but we’re not going to create a language which would have been unrecognisable or anachronistic in 1995.
Essentially we’re going to examine the ‘bets’ that the original language developers made, and reevaluate them—and see what alternative-universe Java emerges.
Sketch
Syntax and surface detail
Octal literals—really?!
In v1, Java had the same way of writing numeric literals as C did, including some of its weird historical quirks.
For example, `027` is interpreted as octal 27 (= decimal 23) instead of, as you might naively expect, being interpreted the same as `27` (which is decimal).
This is a crazy, obscure source of bugs. Nobody nowadays really uses octal, certainly not in everyday programming. For low-level programming, hexadecimal is often useful, and hexadecimal constants are clearly distinguished by an unambiguous prefix, `0x`, so `0x27` is visually very distinct from `27`, whereas the octal prefix is silent and deadly.
This is an example of wrongheaded consistency with what came before.
I’d argue that the designers of Java should have known better, even in 1995. If they wanted to support octal constants, `0o` would have been a better prefix (e.g., `0o27`), and if they wanted to prevent C programmers from inadvertently thinking they were writing `0`-prefixed octal constants, they could have disallowed numeric constants which begin with `0` (or issued a compilation warning).
And while supporting octal constants prefixed with `0o`, they could have usefully added binary numeric literals prefixed by `0b` (e.g., `0b010111011`). That would have been useful for low-level code, and easy to implement at the time.
Modern languages which have adopted exactly this very sensible syntax include ES2015 JavaScript, Python, Swift, Rust, Ruby, Haskell…
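For what it’s worth, real-world Java kept the C behaviour, and later added a binary prefix anyway (Java 7). A minimal sketch of the pitfall and the eventual binary literal, runnable as-is:

```java
public class OctalPitfall {
    public static void main(String[] args) {
        int a = 27;          // decimal 27
        int b = 027;         // octal! equals decimal 23
        int c = 0x27;        // hexadecimal, equals decimal 39
        int d = 0b0101_1101; // binary literal with underscores, added in Java 7 (== 93)
        System.out.println(a + " " + b + " " + c + " " + d); // prints: 27 23 39 93
    }
}
```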
Types before values (which prevents eliding the types).
Nearly every modern language (from the last 10 years) uses a variable typing syntax of the form `var someName: TypeName`, unlike Java’s `TypeName someName`, because it’s more regular, easier to teach, clearer, and easier to parse; and, if in a later version of the language we were to introduce type inference, we could simply omit the type part and keep the rest of the syntax the same.
Java followed the syntax from C, I guess to keep the familiarity of that language. Though it’s so different in so many other ways that, honestly, I don’t think that detail would have affected its initial popularity at all.
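As it happens, Java did eventually grow local type inference (the `var` keyword, Java 10), but as a separate special form rather than by dropping one clause of a uniform declaration. A rough sketch of the contrast; the commented-out lines show the name-then-type style this section argues for and are not valid Java:

```java
import java.util.ArrayList;

public class DeclarationStyles {
    public static void main(String[] args) {
        ArrayList<String> names = new ArrayList<>(); // classic Java: type comes first
        var inferred = new ArrayList<String>();      // Java 10+: a separate `var` form

        // Hypothetical name-then-type style (not valid Java), where inference
        // just means omitting the ": Type" clause:
        //   var names: ArrayList<String> = new ArrayList<>();
        //   var names = new ArrayList<String>();

        System.out.println(names.size() + " " + inferred.size()); // prints: 0 0
    }
}
```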
Fields, Properties and JavaBeans
Ugh! `getAge()`, `setOffset()`. They clutter up Java programs. C# got this so, so much righter than Java did. The useless verbs ‘get’ and ‘set’, and the associated method-call syntax, add immeasurable visual noise to most substantial Java code bases.
Java chose the path of inventing fewer concepts—using methods to alter state, instead of a special new ‘property’ syntax. But our experience from the last 30 years shows that, actually, ‘properties’ are a highly used and useful concept, and a special syntax for them pays for itself.
Also, we know now that fields should practically never be `public`, and that any exposed values (exposed either to other classes, or to subclasses) should be exposed via an abstraction (‘property accessors’), so building this into the language as a first-class abstraction would have been a wonderful service to the world.
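To make the noise concrete, here is the boilerplate a single exposed value costs in Java, with the C#-style property it could have been shown in a comment. The `Person`/`age` names are made up purely for illustration:

```java
// Hypothetical example class, purely for illustration.
public class Person {
    private int age;

    // JavaBeans convention: a pair of methods per exposed value.
    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }

    // The C# equivalent collapses to roughly:
    //   public int Age { get; set; }
    // and call sites read `person.Age = 42;` rather than `person.setAge(42);`.
}
```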
Lack of unsigned primitive types.
I know this causes quite a few problems for some (lower-level) users of Java, and it seems like it would have been a fairly straightforward problem to solve back in 1995.
Just as we tend to think of bytes as being unsigned quantities, it seems like it would not have been unreasonable to define the integer types as follows:
Bit size | Signedness | Java name | C# name |
---|---|---|---|
8 | unsigned | – | `byte` |
8 | signed | `byte` | `sbyte` |
16 | unsigned | (`char`) | `ushort` |
16 | signed | `short` | `short` |
32 | unsigned | – | `uint` |
32 | signed | `int` | `int` |
64 | unsigned | – | `ulong` |
64 | signed | `long` | `long` |
C# got this completely right. ‘Bytes’ are (conventionally) unsigned quantities, 0–255, but C# also allows signed bytes, -128 to 127. The other integer sizes are signed by default (as per the C language), but unsigned versions are available. It’s consistent and complete.
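In real-world Java the gap is papered over with helper methods rather than types. A small sketch of the usual workaround; the `Byte.toUnsignedInt` and `Integer.toUnsignedString` helpers arrived in Java 8:

```java
public class UnsignedWorkarounds {
    public static void main(String[] args) {
        byte raw = (byte) 0xFF;                    // Java's byte is signed: this is -1
        int asUnsigned = Byte.toUnsignedInt(raw);  // 255; Java 8+ helper
        int allBits = -1;                          // all 32 bits set
        System.out.println(raw + " " + asUnsigned + " "
                + Integer.toUnsignedString(allBits)); // prints: -1 255 4294967295
    }
}
```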
One additional quirk of Java is that the `char` type is supposed to represent a Unicode code point, but it behaves just like an unsigned 16-bit integer. You can add, subtract, multiply and divide them (and use bit manipulation on them…), so in Java `char` arrays can be, and are, used as arrays of unsigned 16-bit integers. That encourages unscrupulous programmers to use the `char` type as a general 16-bit numeric type (in a similar way that `char` is a synonym for ‘byte’ in the C language), which just makes programs harder to understand.
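A small sketch of the quirk, using made-up variable names; every line is legal Java:

```java
public class CharAsNumber {
    public static void main(String[] args) {
        char c = 'A';                   // the UTF-16 code unit 65
        char next = (char) (c + 1);     // arithmetic is fine: 'B'
        int shifted = c << 4;           // so is bit manipulation: 1040
        char[] counts = new char[128];  // (ab)used as an unsigned 16-bit array
        counts[c]++;                    // legal, but reads oddly as numeric code
        System.out.println(next + " " + shifted + " " + (int) counts[c]); // B 1040 1
    }
}
```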
Safety
No overflow checking by default.
This is a strange one. Java was quite radical at the time, in that every single object dereference implies a null check of the reference, for safety.
However, all arithmetic in Java is modulo, or ‘clock’, arithmetic. If you add 1 to an integer with the value `Integer.MAX_VALUE`, the result is not a failure but `Integer.MIN_VALUE`; the value wraps around.
This is a source of rare, unusual bugs, and security failures.
In retrospect, it would have been much cleaner to make all arithmetic fail on overflow, with alternative operators, or intrinsic methods, to perform wrapping arithmetic for when that is required.
The Zig language includes alternate arithmetic operators (`+%`, `-%`, `*%`) which wrap instead of failing, and defines the ‘normal’ arithmetic operators to fail on overflow (in its safety-checked build modes).
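Java itself later gained checked arithmetic, but as library intrinsics rather than as the default. A minimal sketch using the Java 8+ `Math.addExact` family:

```java
public class OverflowChecks {
    public static void main(String[] args) {
        int max = Integer.MAX_VALUE;
        int wrapped = max + 1;               // silently wraps to Integer.MIN_VALUE
        System.out.println(wrapped);         // prints: -2147483648
        int checked = Math.addExact(max, 1); // throws ArithmeticException (Java 8+)
        System.out.println(checked);         // never reached
    }
}
```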
Inheritance defaults are the ‘wrong’ way around.
We now know that C# got this right, and Kotlin got it better.
Generally, inheritance is overused, and most classes and most methods should be final. Most modern guidance on writing OO code stresses composition over inheritance (and the open-closed principle).
Would the original designers of Java have understood this? It’s hard to say; the influence of Smalltalk (where every class is open for modification) was strong in Java.
But maybe we can bring a stack of mid-2010s programming books back in time to convince them.
What I’m saying is: Java should have done what Kotlin does: classes and methods should be implicitly ‘final’, and overridable methods should be explicitly annotated with `open`.
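Spelled out in today’s Java, the ‘right default’ has to be written by hand on every class; the class below is a made-up example of opting in to it:

```java
// Hypothetical example: what disciplined Java looks like when you opt in to
// the defaults this section argues should have been implicit.
public final class Invoice {            // closed for extension
    private final long totalCents;

    public Invoice(long totalCents) {
        this.totalCents = totalCents;
    }

    public long totalCents() {          // cannot be overridden: the class is final
        return totalCents;
    }
}
// In the reimagined language (as in Kotlin), `final` would be implicit and a
// separate `open` modifier would mark the rare classes and methods that are
// genuinely designed for overriding.
```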
Next time…
Mutability, error-handling, null-safety, oh my!
Some billion-dollar mistakes that might have been prevented.