Typo

March 26, 2007 Programming

One of the great holy wars in programming concerns itself with “type systems”, usually in the sense of “static typing” versus “dynamic typing”, and from time to time it flares up again. Unfortunately, most of the loudest voices are quite content to argue without really understanding the subject, and so proceed to build straw-man-style arguments based on what they think they know. Most often this seems to be a result of inexperience — far too many people have only ever worked seriously with one type of language, and so have no practical understanding of how “those other languages” really do things.

So before you go jumping into the fray, do everyone a favor and familiarize yourself with the way things actually work. I’ll even help you out a bit with a basic overview, if you’d like to read on (and I do mean “basic” — normally I’m all for pedantry, but if I wanted to really do justice to type systems I’d be writing for weeks).

Typing doesn’t refer to your keyboard

At their most basic level, type systems are rules about types of data — things like numbers, strings, etc. — and what you can do with them. The canonical example of why a language needs to have some sort of rules about data types is a program like this (expressed here in pseudo-code):

x = "Hello";
y = 3;
z = x + y;

One of the most important questions answered by a type system is: what does this program do? If y was, say, the string “world”, it’d be easy to figure out: the value of z would be the string “Helloworld” (and, in fact, plenty of languages use + as a string concatenation operator to do just this). If x was a number, like 5, then the value of z would be the number 8. But here we’re trying to “add” a string and a number; how do we do that?

The types are strong with this one

One spectrum of type systems runs from strong typing to weak typing. With strong typing, the program above would die with an error; for example, if you try this in Python (a strongly-typed language), you’ll get back a TypeError such as cannot concatenate ‘str’ and ‘int’ objects (the exact wording varies between Python versions). Switch the order of the operands and it’ll be TypeError: unsupported operand type(s) for +: ‘int’ and ‘str’, because putting the number first causes Python to treat the + operator as meaning addition, not concatenation. With weak typing, the language may “coerce” one or the other of the values into a compatible type, and then do what you asked; for example, JavaScript will coerce y into the one-character string “3”, and then store the string “Hello3” in z.
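
Here is a minimal Python sketch of the strong-typing behavior just described. The helper name try_add is made up for illustration, and because the text of the error message differs between Python versions, the sketch only records which exception was raised rather than what it says.

```python
def try_add(left, right):
    """Return left + right, or the name of the exception Python raises."""
    try:
        return left + right
    except TypeError as exc:
        return type(exc).__name__


x = "Hello"
y = 3

mixed = try_add(x, y)          # str + int: Python refuses to guess
explicit = try_add(x, str(y))  # explicit conversion, so plain concatenation
numbers = try_add(5, y)        # int + int: ordinary addition
```

The point of the middle call is that a strongly-typed language doesn’t forbid the operation; it just makes you say which conversion you meant.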

And this is generally the biggest difference between strong and weak typing: strong typing will simply raise an error and refuse to perform an operation on a data type which doesn’t support that operation, while weak typing may try to munge it into something sensible. That probably makes weak typing sound dangerous, but keep in mind that weakly-typed languages nearly always have well-defined rules for how they’ll do this; learn the rules and you’ll be able to predict what your program will do.

And some languages also do bits of both; for example, consider this snippet which might occur in a C program:

int x = 3;
double y = 5.0;

What happens if you try to add x and y? They’re of different types, but C will let you get away with it by temporarily “promoting” x to a double in order to get compatible types (so you could do it with double z = x + y;, trusting that x would get promoted appropriately). There are specific rules which define which types can be “promoted” (sometimes you’ll hear it called “widening”, which is closer to what actually happens under the hood), in which circumstances, and what the result will be. C is sometimes labeled a weakly-typed language because of this and a few other features (casting and pointer manipulation being the usual culprits, though the latter is more to do with safety than with weak typing), but most of its descendants, among them obviously strongly-typed languages like Java and C#, support this feature too.
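
Python, usually described as strongly typed, does much the same thing for mixed numeric operands, which makes it a handy way to see the idea in action. This is only a sketch of the analogous behavior in Python, not of C’s actual promotion rules.

```python
x = 3      # an int
y = 5.0    # a float

# The int operand is converted to a float for the addition;
# x itself is left alone and stays an int.
z = x + y

x_type = type(x).__name__
z_type = type(z).__name__
```

So a language can refuse to add a string to a number and still quietly widen an integer to a floating-point value.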

Similarly, Java and C# both have a distinction between “primitive” or “value” types (such as numbers) and “reference” types (such as classes), and support “boxing”, a way of wrapping a primitive value in an intermediary object “box” so that it can be used in operations which are normally only allowed on reference types.

Generally, proponents of strong typing argue that it’s better because it’s “safer”: by raising errors on these types of operations, the language helps you avoid certain types of bugs. If you inadvertently tried to add a number and a string, for example, it’d be better to get an error than to end up with a value that wasn’t what you expected, especially if you’re going to store that data or do other work with it. Fans of weak typing argue that the increased flexibility of implicit type conversion lets you accomplish things with less code (since you don’t have to perform as many explicit conversions to get the correct types), and that a little discipline on the part of the programmer will avoid the bugs strong typing is meant to prevent.

Static cling

The other major spectrum of type systems runs from static typing to dynamic typing, and in the vast majority of cases people who argue about “strong” and “weak” typing are really talking about these without realizing it. Let’s look at another example:

x = "Hello";
# ...some other intervening code...
x = 3;

In a statically-typed language, this code will raise an error. In a dynamically-typed language it’ll be fine. A good way to think of the difference is this:

In a dynamically-typed language, the value bound to a variable has a type (e.g., 3 is an integer, “Hello” is a string, and so on), and nothing can change the type of that value, but a value of a different type can be bound to the same variable later on (assuming the language is not purely functional; pure functional languages disallow any form of reassignment to a variable and, sometimes, the existence of any variables at all, but not on grounds of type safety).
In a statically-typed language, the variable itself has a type, and nothing can change that type, but another value of the same type can (usually; again, pure functional languages are an exception) be bound to the same variable later on.
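
The dynamic half of that distinction can be sketched in Python, where the name x carries no type of its own and the built-in type() reports the type of whatever value is currently bound to it. The names first_type and second_type are just for illustration.

```python
x = "Hello"
first_type = type(x)    # str: the type belongs to the value "Hello"

# ...some other intervening code...

x = 3                   # same name, now bound to a value of a different type
second_type = type(x)   # int

# The string "Hello" never changed type; only the binding of x moved.
```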

Some statically-typed languages require you to explicitly declare the types of your variables (and, often, they also require you to declare the type of the values returned by functions and methods); that’s why in some languages you’ll see things like int x = 3 instead of just x = 3. Not all statically-typed languages require this, though; some (like OCaml, for example) will infer types automatically for you in most cases, only requiring an explicit declaration when there’s an ambiguity.

Proponents of static typing again argue that it’s “safer” — you can typically “type-check” a statically-typed program fairly easily without having to actually execute any of it (and nearly all compilers and interpreters for statically-typed languages do this) and so be notified immediately of any type-related errors (for example, a compiler can easily spot that, say, x was declared an int and y a string, and refuse to compile a program which contains the operation x + y).
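
To make “type-check without executing” concrete, here is a toy checker written in Python. Everything in it is an assumption made for illustration: the DECLARED table stands in for the programmer’s declarations, PLUS_RULES is a made-up rule table for +, and expressions are represented as nested tuples. Real compilers are far more involved, but the shape is the same: the checker only looks at declared types and never computes a value.

```python
# Hypothetical declarations: the type written down for each variable.
DECLARED = {"x": "int", "y": "string", "n": "int"}

# Hypothetical rule table: which operand types "+" accepts, and the result type.
PLUS_RULES = {
    ("int", "int"): "int",
    ("string", "string"): "string",
}


def check(expr):
    """Return the static type of expr without evaluating it.

    An expression is either a variable name (a str) or a tuple
    ("+", left, right). Raises TypeError on an ill-typed expression.
    """
    if isinstance(expr, str):
        if expr not in DECLARED:
            raise TypeError(f"undeclared variable {expr!r}")
        return DECLARED[expr]
    op, left, right = expr
    if op != "+":
        raise TypeError(f"unknown operator {op!r}")
    operand_types = (check(left), check(right))
    if operand_types not in PLUS_RULES:
        raise TypeError(
            f"cannot apply + to {operand_types[0]} and {operand_types[1]}"
        )
    return PLUS_RULES[operand_types]


def is_well_typed(expr):
    """True if check() accepts expr, False if it reports a type error."""
    try:
        check(expr)
        return True
    except TypeError:
        return False
```

With these declarations, is_well_typed(("+", "x", "n")) accepts the expression and is_well_typed(("+", "x", "y")) rejects it, and at no point has anyone had to know what values x, n or y actually hold.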

They also usually point out the range of things which can be easily automated in a statically-typed language (for example, many common refactorings, like breaking up a single large method into several smaller ones, or breaking out a set of values and methods into their own class); these sorts of things are hard to do in dynamically-typed languages, since you don’t necessarily have all the relevant information until the program is running.

Those who prefer dynamic typing again argue in favor of flexibility and discipline; they’ll say that some things which are impossible or which require the programmer to jump through lots of type-related hoops in a static language are possible or much easier in a dynamic language and, much as in the strong/weak debate, that a little discipline on the part of the programmer will go a long way.

Putting it together

A full description of a programming language’s type system usually takes one adjective from each spectrum: Python, for example, has “strong dynamic” typing, JavaScript has “weak dynamic” typing, C# has “strong static” typing, C arguably has “weak static” typing (though this one’s a matter of some debate), and so on. There is a third spectrum, sometimes labeled “safe” and “unsafe”, but these terms are so loaded and nebulous (depending on your personal opinions, it’s possible to argue either side for the same language) that I’m going to stay away from them here, except to note that a common myth of some type systems is “if it passes type checks, it must be correct”. Most people who fall into that pit learn sooner or later that type errors are just one class of errors, and that the strictest, “safest” typing in the world can’t stop you from writing a program that does something stupid.
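
A small illustration of that last point, sketched in Python with type annotations. Both functions are invented for this example. The annotations on the first one are consistent, so a checker such as mypy should have nothing to complain about, and yet the function computes the wrong thing.

```python
def average(a: float, b: float) -> float:
    # Type-correct but wrong: operator precedence makes this a + (b / 2).
    return a + b / 2


def average_fixed(a: float, b: float) -> float:
    # Same types in and out, but this time the arithmetic is right.
    return (a + b) / 2


wrong = average(2.0, 4.0)
right = average_fixed(2.0, 4.0)
```

Both versions have exactly the same type signature; only one of them is an average.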

And almost without fail, debates about type systems will involve at least one person confusing the different classes of typing; the most common error is to assume that static typing is the same as strong typing, and that dynamic typing is the same as weak typing; in effect assuming that type systems are one-dimensional when, in fact, they’re (at the very least) two-dimensional. The c2 wiki has a nice graph showing this with examples.

And now you know

Or, at least, you know the basics. There’s a whole heck of a lot more that’s useful to know about type systems (for example, the varying definitions of “safe” and “unsafe” and what makes different languages fall into those categories), but strong/weak and static/dynamic is the bare minimum you need to understand to get by, and if you can master that you’ll be doing better than an awful lot of the folks who argue about them…