In this relatively non-technical post, I want to show you a simple yet surprisingly deep language design problem our team tackled. I remember the thought-provoking discussion we had about it, and I believe it is worth sharing for two reasons.
First, you will see an example of how Duckling meets its priorities.
More importantly, though, you will catch a glimpse of how we decide on the design of the language. In particular, you will see the reasoning behind our choice regarding one of the most commonly used elements of Duckling's syntax: variable declarations.
Why declare variables?
For our team at DuckType, it is clear as day that enforcing explicit variable declarations is a desirable trait of a general-purpose language. It brings structure to the code, making it easier to analyse for both the reader and the compiler.
Let's see what happens when a language does not enforce variable declarations, taking Python as an example.
A silly mistake
Consider the following code snippet in Python.
if input("y/N") == "y":
    x = 42
else:
    pass
print(x)
This program asks the user for some input, and if it is equal to "y", the variable named x is assigned the value 42 and printed. However, if the user inputs anything else, x is never assigned a value, and an error (featured below) is thrown.
Traceback (most recent call last):
File "/no/path/to/see/here/test.py", line 6, in <module>
print(x)
^
NameError: name 'x' is not defined
Python does not provide a guarantee that a name is defined when it is used. This is a silly reason to have a program occasionally crash. Clearly, it is not very smart to consider such programs valid (i.e. error-free).
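For contrast, here is a minimal Python sketch of the usual workaround, which is to bind the name on every path before it is read. To keep it free of interactive input, the answer is passed in as a string, and both function names are hypothetical. Inside a function, Python reports the failure as UnboundLocalError, which is a subclass of NameError.

```python
from typing import Optional


def crashes_sometimes(answer: str) -> int:
    # Mirrors the snippet above: `x` is only bound on the "y" path.
    if answer == "y":
        x = 42
    # For any other answer this read raises UnboundLocalError (a NameError subclass).
    return x


def always_defined(answer: str) -> Optional[int]:
    # Binding `x` up front guarantees the name exists on every path.
    x = None
    if answer == "y":
        x = 42
    return x


if __name__ == "__main__":
    print(always_defined("y"))
    print(always_defined("n"))
```

Nothing in Python pushes you towards the second version; binding the name up front is purely a habit the programmer has to remember.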
A realistic problem
The example featured above is somewhat unrealistic. It breaks a couple of sanity rules which programmers typically adhere to, even when they are not enforced. To consider a more grounded scenario, let us assume that we are working with a Python-like language which enforces some light static checks:
- The first assignment to a name counts as its declaration.
- The name cannot be used outside of the scope in which it has been declared (for example, outside of the if branch in the example above).
Essentially, let's say we are dealing with a typical language with variable declarations that enforces variable initialisation, but has no static typing and no variable declaration keywords like val, var, let, etc.
Even in such a relatively strict language, we may encounter problems. Barring typing hiccups, one issue which immediately comes to my mind is accidental shadowing, or perhaps worse: accidental not-shadowing. To put this in a realistic context, consider the following piece of Python-like code:
class BigResultComputer:
    result = 0
    # A lot of omitted code...
    def computeSomething():
        result = 42
        log(result)
        return result
    # More omitted code...
In this example, we deal with a big class called BigResultComputer whose task is to... well... compute something big. It has a class attribute¹ called result and a long definition. Now suppose that a programmer comes along intending to change the algorithm of the class by introducing some sub-computation in the computeSomething method. It is only natural that the result of this sub-computation be stored in an aptly named result variable.
Uh-oh. There are two mutually exclusive things which the programmer may want to achieve with this:
- The sub-computation should overwrite the result field of BigResultComputer, or
- The sub-computation is a very small part of the whole algorithm, and the local result variable should shadow the class field.
Both possibilities are rather common and so there is no obvious way to interpret this piece of code. A further change to the programming language is necessary.
A convenient solution?
One remedy is to make it necessary to access object fields explicitly via a reference to the object. In the case of Python, this reference appears as the first argument to a method, and is conventionally (although not necessarily) called self.
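With an explicit object reference, the two intentions from the list above become two different spellings. The following is a minimal Python sketch rather than code from the original example: it stores result as an instance attribute set in __init__ (sidestepping the class-attribute caveat from footnote 1), and the method names overwrite_field and shadow_locally are hypothetical.

```python
class BigResultComputer:
    def __init__(self) -> None:
        # The object field lives on the instance and is reached through `self`.
        self.result = 0

    def overwrite_field(self) -> int:
        # Intention 1: the sub-computation overwrites the object field.
        self.result = 42
        return self.result

    def shadow_locally(self) -> int:
        # Intention 2: assigning to a bare name creates a local variable,
        # so the object field is left untouched.
        result = 42
        return result


if __name__ == "__main__":
    computer = BigResultComputer()
    computer.shadow_locally()
    print(computer.result)
    computer.overwrite_field()
    print(computer.result)
```

Assigning to a bare result always creates a local variable, while self.result always refers to the object's attribute, so neither intention can be mistaken for the other.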
This is a neat and theoretically perfect solution. But in practice, I know for a fact that I am not the only person who gets irrationally angry after forgetting, for the tenth time in a single day, to precede every field access with self. For example, this is how I might attempt to write a method of a Rectangle class that describes the shape.
class Rectangle:
    # Some omitted code...
    def describe(self):
        print(
            f"Rectangle with dimensions {width} by {height} has:\n"
            f"- Area: {width * height}\n"
            f"- Perimeter: {2 * (width + height)}\n"
            f"- Aspect Ratio: {width / height}"
        )
There are 8 missing self references in these few lines of code. After I notice the error, I have to go through each occurrence one by one and painstakingly correct it. Occasionally I even experience semantic satiation, and the word "self" loses all meaning to me for a moment. Call it a skill issue if you want; I admit that I do not code in Python often.
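For reference, here is the same method with all eight self references in place. It is a minimal sketch: the constructor stands in for the omitted code and is an assumption, and describe returns the text instead of printing it so that the result is easy to check.

```python
class Rectangle:
    def __init__(self, width: float, height: float) -> None:
        # Assumed constructor standing in for the omitted code.
        self.width = width
        self.height = height

    def describe(self) -> str:
        # Every field access goes through the explicit `self` reference.
        return (
            f"Rectangle with dimensions {self.width} by {self.height} has:\n"
            f"- Area: {self.width * self.height}\n"
            f"- Perimeter: {2 * (self.width + self.height)}\n"
            f"- Aspect Ratio: {self.width / self.height}"
        )


if __name__ == "__main__":
    print(Rectangle(4.0, 2.0).describe())
```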
Another solution is to make variable declarations explicit with dedicated syntax.
This is embraced by many popular general-purpose languages, including C, C++, Java, Scala, Kotlin, Rust, and even JavaScript, in a sense. Explicit declarations have the additional benefit of providing the perfect place for specifying the type of a variable, its mutability, and any other desired properties. Here's how it works in C++:
#include <iostream>
struct Rectangle {
    // Clear list of the object fields.
    double width;
    double height;
    // Constructor: some linting policies
    // would forbid these shadows.
    Rectangle(double width, double height)
        : width(width), height(height) {}
    // Method to describe the rectangle.
    void describe() const {
        // Simple use of object fields.
        std::cout
            << "Rectangle with dimensions " << width
            << " by " << height << " has:\n"
            << "- Area: " << (width * height) << "\n"
            << "- Perimeter: " << (2 * (width + height)) << "\n"
            << "- Aspect Ratio: " << (width / height) << "\n";
    }
    // Method to construct a modified copy.
    Rectangle expand(double by) const {
        // Admittedly poorly named variables,
        // but clear indication of shadowing.
        double width = this->width + by;
        double height = this->height + by;
        return Rectangle(width, height);
    }
};
The above code demonstrates how shadowing is clearly indicated. Explicit declarations contribute to legibility and maintainability by enforcing a single source of truth about a variable. Duckling follows this path.
Denoting mutability
All the languages mentioned above, except for Python, provide some way to denote the mutability of variables, i.e. whether the value of a variable is allowed to change during the program's execution. This is also a functionality we want to see in Duckling, so let's examine what is already present on the market.
Established practices
C and C++ qualify the type of a variable with the const keyword to specify that the value of that variable cannot be changed by accessing it through that name. Java uses the final keyword in a similar manner, but with much weaker guarantees. In Java, since all values are either primitives or references, immutability is never transitive: a final variable cannot be reassigned, but the object it refers to can still be modified.
int mutable_int = 0;
const int immutable_int = 42;
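Python has no final keyword, but the same shallow, non-transitive kind of immutability described for Java is easy to observe with a tuple that holds a list: the tuple's slots cannot be rebound, yet the list inside can still be mutated. This is a Python analogy for illustration, not Java code, and the variable name frozen is arbitrary.

```python
# A tuple's slots are fixed, much like a `final` reference in Java...
frozen = ([1, 2],)

try:
    frozen[0] = [3]  # ...so rebinding a slot is rejected.
except TypeError as err:
    print(f"rejected: {err}")

# ...but the object the slot refers to is still fully mutable.
frozen[0].append(3)
print(frozen)  # ([1, 2, 3],)
```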
On the other hand, Rust assumes by default that a variable is immutable. In order to indicate otherwise, a variable's declaration must additionally use the mut specifier.
let mut mutable_int : i32 = 0;
let immutable_int : i32 = 42;
Notice that C, C++, Java, and Rust all assume some default behaviour. They expect the programmer to make an effort if different behaviour is desired. This inconvenience creates an incentive to stick to the default. Rust makes sure that the programmer is aware of all the variables which may change in value by making it a conscious decision to allow mutability. The other three languages generously provide a mechanism that may prevent programmers from shooting themselves in the foot (Java much less so than C and C++), if they really care to use it.
Finally, Scala and Kotlin do not assume a default. These two modern evolutions of Java² use the var keyword for mutable variables and the val keyword for immutable values. This way, they do not create an incentive to use either, apart from the inevitable hints provided by the IDE nagging the programmer to change vars to vals where possible.
And JavaScript... well... it has three different keywords used for declarations. It nearly avoids creating any incentive in either direction, with const being only two characters longer than var and let. But it's all sorts of weird, and we had better not stare at it too closely, lest it stare back.
The decision
Now that I have familiarised you with around 1500 words' worth of setting, it is time to recount how the team decided on Duckling's handling of variable declarations. I hope I didn't overhype this part.
Our initial instinct was that we wanted to incentivise good habits: we care about the ergonomics of programming languages, and how humans interact with them. We are obviously not the only ones. Let's see a couple of examples from the industry.
For 30 years now, Java has had "checked exceptions", which force programmers to clearly indicate which exceptions can be thrown by a method. While enforcing exception awareness seems like a helpful feature on the surface, it turned out to be largely unpopular.
Scala 3 introduces "open classes", which are supposed to help control class extensions by warning if a class which isn't marked open is extended. This stands in contrast to the typical approach of making programmers remember to mark their classes final (which they won't). I personally believe this is a great idea, but weirdly, I only learnt about this feature when writing this blog post. Its introduction seems to be hindered by backwards compatibility, and I found it surprisingly difficult to enable.³
Finally, as mentioned earlier, Rust incentivises strict control over variable mutability by making immutability the default. Again, this is the opposite of the standard practice of having programmers remember to mark their variables const or final (which, again, they won't). Unlike the previous two examples, however, Rust has seen a lot of success with this approach.
When considering the ergonomics of Duckling, we were heavily inspired by some of these ideas. Observing the undeniable helpfulness and verified success of Rust's strategy, we wanted immutability to be the default. We intended for mutability to require a bit more conscious effort from the programmer.
We quickly noticed, however, that this is inconvenient.
OK, yes, of course it is… but in a slightly different way than you might expect. In our opinion it was totally fine, and even desirable, to incentivise immutability by default in large projects. Making mutable variables inconvenient in this context was a great idea.
Unfortunately, we don't want our language to be usable only in large projects. Our hope with Duckling is that it will be easy to use in scripts. We want our language to scale well to large codebases while keeping prototyping reasonably frictionless. This combination is one of our intended selling points, and since its absence is one of our gripes with many contemporary languages, we care about Duckling meeting both requirements. We are doing it for other programmers, but also for ourselves.
So let's take a step back and ask the question: how do we incentivise a programmer to keep their variables immutable in large projects, but not inconvenience them when they are experimenting with a script? Clearly, making a mutable variable declaration require less work than an immutable one would go against the first requirement. But making it less convenient would contradict the second.
Since we must keep both options equally convenient, we opted for two three-letter keywords for the two kinds of declarations. My personal favourites are those used in Scala and Kotlin, but my suggestion to copy them was assertively overruled, because var and val "look too similar" and "would be confusing"… whatever that means. Instead, Duckling will use the keywords var and let for mutable and immutable declarations, respectively. We also anticipate a const keyword for constants which are computed at compile time and can never change.
var mutable_int : i32 = 0;
let immutable_int : i32 = 42;
const GLOBAL_CONSTANT : i32 = 255;
There is one remaining issue: how does this accomplish the goal of incentivising a programmer to use immutability by default in large projects? To tell the truth, it doesn't.
We instead keep in mind that modern tools are very happy to point out unnecessary mutability. There are few things more frustrating than the incessant nagging of a compilation warning or a yellow squiggle in the editor. And there are few things more satisfying than getting rid of said frustrations. We leave it to these tools (some of which we are developing ourselves) to incentivise immutability. In extreme cases, we leave it to company policy to enable compiler options which turn warnings about unnecessary mutability into errors.
1. In Python, the result field would be a class attribute, shared between all instances of the class. While this may be considered strange, I implore you to either suspend your disbelief for the sake of the example, or pretend that the language used is actually not Python, but some weird amalgam of Python and a derivative of C++ or Java, where object fields are declared in the class body.
2. Javalutions, if you will.
3. The official Scala docs state that open classes should be enabled by default starting in Scala 3.4. Perhaps I am misunderstanding something, but despite working with Scala 3.6.4, I only saw the warning with my own eyes after using scalac directly and following advice from this Medium article to enable the -source future flag. Ultimately, however, I failed to get the warning to appear inside IntelliJ.