One of the great holy wars in programming concerns itself with “type systems”, usually in the sense of “static typing” versus “dynamic typing”, and from time to time it flares up again. Unfortunately, most of the loudest voices are quite content to argue without really understanding the subject, and so proceed to build straw-man-style arguments based on what they think they know. Most often this seems to be a result of inexperience — far too many people have only ever worked seriously with one type of language, and so have no practical understanding of how “those other languages” really do things.
So before you go jumping into the fray, do everyone a favor and familiarize yourself with the way things actually work. I’ll even help you out a bit with a basic overview, if you’d like to read on (and I do mean “basic” — normally I’m all for pedantry, but if I wanted to really do justice to type systems I’d be writing for weeks).
At their most basic level, type systems are rules about types of data — things like numbers, strings, etc. — and what you can do with them. The canonical example of why a language needs to have some sort of rules about data types is a program like this (expressed here in pseudo-code):
x = "Hello"; y = 3; z = x + y;
One of the most important questions answered by a type system is: what does this program do? If y was, say, the string “world”, it’d be easy to figure out: the value of z would be the string “Helloworld” (and, in fact, plenty of languages use + as a string concatenation operator to do just this). If x was a number, like 5, then the value of z would be the number 8. But here we’re trying to “add” a string and a number; how do we do that?
One spectrum of type systems runs from strong typing to weak typing. With strong typing, the program above would die with an error; for example, if you try this in Python (a strongly-typed language), you’ll get back TypeError: cannot concatenate ‘str’ and ‘int’ objects (switch the order of the operands and it’ll be TypeError: unsupported operand type(s) for +: ‘int’ and ‘str’ — putting the number first will cause Python to treat the + operator as meaning addition, not concatenation). With weak typing, the language may “coerce” one or the other of the values into a compatible type, and then do what you asked; for example, JavaScript will coerce y into the one-character string “3”, and then store the string “Hello3” in z.
And this is generally the biggest difference between strong and weak typing: strong typing will simply raise an error and refuse to perform an operation on a data type which doesn’t support that operation, while weak typing may try to munge it into something sensible. That probably makes weak typing sound dangerous, but keep in mind that weakly-typed languages nearly always have well-defined rules for how they’ll do this; learn the rules and you’ll be able to predict what your program will do.
And some languages also do bits of both; for example, consider this snippet which might occur in a C program:
int x = 3; double y = 5.0;
What happens if you try to add x and y? They’re of different types, but C will let you get away with it by temporarily “promoting” x to a double in order to get compatible types (so you could do it with double z = x + y;, trusting that x would get promoted appropriately). There are specific rules which define which types can be “promoted” (sometimes you’ll hear it called “widening”, which is closer to what actually happens under the hood) in which circumstances, and what the result will be. C is sometimes labeled a weakly-typed language because of this and and a few other features (casting and pointer manipulation being the usual culprits, though the latter is more to do with safety than with weak typing), but most of its descendents — among them obviously strongly-typed languages like Java and C# — support this feature.
Similarly, Java and C# both have a distinction between “primitive” or “value” types (such as numbers) and “reference” types (such as classes), and support “boxing”, a method of performing an operation on a primitive type — by way of an intermediary object “box” — which are normally only allowed on reference types.
Generally, proponents of strong typing argue that it’s better because it’s “safer”; they claim that by raising errors on these types of operations, the language helps you avoid certain types of bugs (for example, if you inadvertently tried to add a number and a string, it’d be better to get an error than to end up with a value that wasn’t what you expected it to be, especially if you’re going to store that data or do other work with it), while fans of weak typing argue that the increased flexibility of implicit type conversion allows you to accomplish things with less code (since you don’t have to perform as many explicit conversions to get the correct types), and that a little discipline on the part of the programmer will work around the bugs strong typing is meant to prevent.
The other major spectrum of type systems runs from static typing to dynamic typing, and in the vast majority of cases people who argue about “strong” and “weak” typing are really talking about these without realizing it. Let’s look at another example:
x = "Hello"; # ...some other intervening code... x = 3;
In a statically-typed language, this code will raise an error. In a dynamically-typed language it’ll be fine. A good way to think of the difference is this:
Some statically-typed languages require you to explicitly declare the types of your variables (and, often, they also require you to declare the type of the values returned by functions and methods); tThat’s why in some languages you’ll see things like int x = 3 instead of just x = 3. Not all statically-typed languages require this, though; some (like OCaml, for example) will infer types automatically for you in most cases, only requiring an explicit declaration when there’s an ambiguity.
Proponents of static typing again argue that it’s “safer” — you can typically “type-check” a statically-typed program fairly easily without having to actually execute any of it (and nearly all compilers and interpreters for statically-typed languages do this) and so be notified immediately of any type-related errors (for example, a compiler can easily spot that, say, x was declared an int and y a string, and refuse to compile a program which contains the operation x + y).
They also usually point out the range of things which can be easily automated in a statically-typed language (for example, many common refactorings, like breaking up a single large method into several smaller ones, or breaking out a set of values and methods into their own class); these sorts of things are hard to do in dynamically-typed languages, since you don’t necessarily have all the relevant information until the program is running.
Those who prefer dynamic typing again argue in favor of flexibility and discipline; they’ll say that some things which are impossible or which require the programmer to jump through lots of type-related hoops in a static language are possible or much easier in a dynamic language and, much as in the strong/weak debate, that a little discipline on the part of the programmer will go a long way.
A full description of a programming language’s type system usually takes one adjective from each spectrum: Python, for example, has “strong dynamic” typing, JavaScript has “weak dynamic” typing, C# has “strong static” typing, C arguably has “weak static” typing (though this one’s a matter of some debate), and so on. There is a third spectrum, sometimes labeled “safe” and “unsafe”, but these terms are so loaded and nebulous (depending on your personal opinions, it’s possible to argue either side for the same language) that I’m going to stay away from them here, except to note that a common myth of some type systems is “if it passes type checks, it must be correct”. Most people who fall into that pit learn sooner or later that type errors are just one class of errors, and that the strictest, “safest” typing in the world can’t stop you from writing a program that does something stupid.
And almost without fail, debates about type systems will involve at least one person confusing the different classes of typing; the most common error is to assume that static typing is the same as strong typing, and that dynamic typing is the same as weak typing; in effect assuming that type systems are one-dimensional when, in fact, they’re (at the very least) two-dimensional. The c2 wiki has a nice graph showing this with examples.
Or, at least, you know the basics. There’s a whole heck of a lot more that’s useful to know about type systems (for example, the varying definitions of “safe” and “unsafe” and what makes different languages fall into those categories), but strong/weak and static/dynamic is the bare minimum you need to understand to get by, and if you can master that you’ll be doing better than an awful lot of the folks who argue about them…
Comments for this entry are closed. If you'd like to share your thoughts on this entry with me, please contact me directly.
Nice overview. Surprisingly, I actually did understanding typing correctly (although I wouldn’t have been confident enough that I did to say it out loud ten minutes ago). Thanks for the confirmation.
But I do have a n00b question. It seems to me that languages like Python (strong/dynamic) have a bit of a discrepancy on their hands. It feels funny to argue in favor of flexbility on the dyanmic/static axis and in favor of safety on the strong/weak plane — doesn’t it?
The canonical answer to your question is “ask Guido”, because he’s the only one who could really explain why Python made that design choice :)
There are some relevant bits in the Zen of Python (particularly “In the face of ambiguity, refuse the temptation to guess” and “Explicit is better than implicit”) which go along with strong typing but not necessarily static typing.
Personally, I think going strong/dynamic strikes a nice balance: you get the ease of understandability which comes from strong typing (since you don’t need to memorize rules about what can be munged and when and how), without the cumbersome nature of (most) static type systems.
And — again, this is purely subjective — I think there are some useful patterns which are so much easier in a dynamically-typed language that it’s worth giving up static analysis. For example, you can just do duck-typing stuff, where you don’t care at all about the type of the thing passed to you, and not have to worry. In a statically-typed language, at best you’ll have to jump through some hoops to satisfy the type system, and at worst you just won’t be able to do it.
I need to write one of those “five things that annoy me about Python” lists, but one of them is related to this exact topic. It’s would be nice if Python’s comparison operators respected the strong typing behaviour a bit more. It’s always struck me as borderline insane that None < 5 < “a string”, when they are really incomparable (and for those saying “strong typing only applies to operators”, I will point out that a comparator is a mapping from two types to a boolean — ergo, an operator). This is the sort of thing that the c2wiki gets into with strong typing without loopholes a bit, too. Python has loopholes like this, even for some of the fundamental types.
@Malcolm, that’s scheduled to be fixed in Python 3000. See http://www.python.org/dev/peps/pep-3100 (“Comparisons other than == and != between disparate types will raise an exception unless explicitly supported by the type”).
Nice!
I am used to a different but related meaning for the phrase dynamically typed. You say that the values have types but the variables do not. I normally use the expression to suggest that the values themselves may change type.
In Ruby, arrays define the
*operator on arrays to take an integer as a parameter. It is an error if you pass another array, as in[1, 2] * [3, 4]. However, you can open theArrayclass orEnumerablemodule and redefine*so that it produces the Cartesian Product of the arrays.Thus, an expression that is an error suddenly becomes correct. Arrays have ‘changed type’ in the sense that the set of correct operations has changed.
Also, when a language allows heterogeneous values to be bound to a variable at run time, it is still very possible to use static typing. Languages with type inference can create a new type on the fly
(int | string)and check that every user of the variable handles both possibilities. The Java variant Nice can do this to support value types that are nevernilseparately from value types that might benil.Be very careful about the definition of strong typing vs weak typing. A lot of people think that strong typing means “no implicit conversions”, and weak typing means “allows implicit conversions”. This is used commonly enough that Wikipedia’s (non-authoritative!) definition equates weak typing with implicit conversion, as you have done above.
However, this is only one interpretation. Almost all type theorists (who study types to a degree that would scare even guru programmers) generally define strong typing to mean “the type system cannot be evaded”, and weak typing to mean “the type system can be evaded”. A cast in C or C++ is an example of evading the type system: a cast means “you, the type system, think it’s of this type, but I as the programmer am forcing you to think of this type”. It’s therefore possible to trick the type system. Under this definition, C and C++ are two of the very few weakly-typed languages around.
I’ve wound up in enough typing debates that I now always define strong/weak/static/dynamic typing before using those terms, and I usually define it in the way that the type theorists do, because, well, that’s what they do for a living. I can always say “implicit type conversion” if I mean implicit type conversion, rather than saying “weak typing”, which is often ambiguous. Even Benjamin Pierce, arguably the leading authority on type systems, has stated that the various typing terms are ambiguous.
Great article. Simply and clearly written (evidenced by the fact that even i could cruise along fairly fast :)) Keep it up and apply those pedagogical skills of yours to more articles!
I have to strongly disagree with this statement. You can in fact have type systems that are rich enough to describe arbitrary theorems about the behaviour of programs. Coq has just such a type system, and it is not alone (although it is a bit lonely).
Andre, I understand where you’re coming from, certainly, but I’ve usually seen “the type system cannot be evaded” associated more with “safe” typing than with “strong” typing. And, of course, that’s another fuzzy area: just as many people speak interchangeably of “weak” and “dynamic” typing, so too do many speak interchangeably of “safe” and “strong” typing. I’ve tried to walk a fine line here, because I know a lot of my readers are users of Django who are often new to Python and to programming in general, and so I deliberately left off safe/unsafe and higher discussion of what those terms can mean — if I were writing for C programmers I probably would have handled it differently ;)
Regardless of which definition we go with, I think C falls into the “weak” category, but I’d also put several other languages in there under either definition: implicit conversion carried out by the language, in order to avoid what would otherwise be a type error, seems to me to be “evading” the type system.
Gavin: I’ve yet to see a “provably correct” program of non-trivial size (though “non-trivial” is a somewhat nebulous term), regardless of type system or programming paradigm (typically that’s touted by people newly come to the FP bandwagon, which can make for some amusing arguments). Got one lying around?
James,
Actually I do have some “lying around”. Now, “non-trivial” is a bit hard to work with, since you can always move the bar on me. In any case, you don’t need to demonstrate a huge program to see that in fact arbitrary properties can be expressed in the type system, and that you can then connect the pieces and prove properties about them in turn. So the approach certainly scales to some extent.
There are limitations and caveats to the approach of course, but I think it stands as a clear contradiction to the notion that types can not capture all of the important properties that you want to express.
http://coq.inria.fr/V8.1/stdlib/Coq.Sorting.Heap.html
Especially look at the type of the final function “treesort”.
Theorem treesort : forall l:list A, {m : list A | sort leA m & permutation _ eqA_dec l m}.
treesort is a function with the following property. Given any homogeneous list of A’s (named l) we can supply a list (named m) of A’s such that the list is sorted according to some binary predicate leA and the “m list” is a permutation of the “l list”. Of course in order to be sure that it is in fact a permutation we have to be able to compare the elements of both lists, so we also require an additional decidable equality predicate.
I hope this example serves to dispell the myth that types can’t capture important properties. It simply isn’t true. Full Martin Lof style type theories can bring the full weight of mathematical sophistication to bear on the problem of software correctness.
“The c2 wiki has a nice graph showing this with examples.”
Which was hacked out of Cardelli’s Handbook of Computer Science and Engineering, Chapter 103”, and in the original the quadrants were formed from safe/unsafe and typed/untyped.
And seen again in Krishnamurthi’s “Programming Languages: Application and Interpretation” p263
Cardelli says some languages “can be euphemistically called weakly checked (or weakly typed…”
Krishnamurthi says - So what is “strong typing”? This appears to be a meaningless phrase, and people often use it in a non- sensical fashion. To some it seems to mean “The language has a type checker”. To others it means “The language is sound” (that is, the type checker and run-time system are related). To most, it seems to just mean, “A language like Pascal, C or Java, related in a way I can’t quite make precise”. If someone uses this phrase, be sure to ask them to de?ne it for you. (For amusement, watch them squirm.)
Gavin, Coq looks nice (and awfully familiar — my degree was in philosophy, btw), but I still have a feeling that these sorts of techniques are going to get awfully unwieldy as the size and functionality of a program grows. Mathematical proofs have been tending toward that for a while now, with some growing to and beyond the bounds of what human beings can reasonably manage, and I have a feeling that proofs of computer programs will naturally go the same way (quite possibly much sooner than their “pure” mathematical counterparts).
Isaac, I believe I did define “strong typing” and related it to the notion of a type checker which does not permit implied coercion of values into compatible types.
“Safe” typing, I’d argue, is a far more nebulous term — does it mean that the language checks types for compatibility before each operation, and raises an error when they’re incompatible? Does it mean that the language will work to prevent incompatibilities by coercing types into compatibility before carrying out certain operations? Does it mean that the language forbids operations which can’t reliably be type-checked? Does it mean something else, none of the above, all of the above?
Dismissiveness is not a proper attitude to take here, especially given the existence of an extremely large gap between practical and theoretical computer science — many experts in one area would be hopelessly adrift if dropped into the other, and broad dismissals which reach outside a person’s area of expertise are bound to be dangerous.
James wrote “Safe” typing, I’d argue, is a far more nebulous term” As befits a chapter in a “Handbook” Cardelli provides a glossary- Type safety : The property stating that programs do not cause untrapped errors.
which is much the same view as Krishnamurthi Type safety is the property that no primitive operation ever applies to values of the wrong type.
Your comments on type safety seem to focus on how it could be implemented rather than what it achieves.
James wrote I believe I did define “strong typing” afaict you gave an example - not a definition. Can you define strong typing in terms of what it achieves?
Isaac, again this strikes me as a somewhat dangerously dismissive attitude — I could define a dichotomy of languages “safe” and “unsafe” and just state that “safe” languages do not permit the programmer to make errors, but that’s far from useful. The implementation, or even suggested implementation, is what makes it useful (and, historically, the “safest” languages have also been of somewhat limited practical utility).
Which is why I ask what exactly is meant by “safe” — there are several classes of things to which the terms “safe” and “unsafe” might apply, and replying that “programs do not cause untrapped errors” does very little to answer the question. How do we define an “untrapped error”, for example? If the language is dynamically typed and so has no static checking, would the language be “unsafe” if it raised an exception at runtime due to a mistake from the programmer? Or, in the name of “safety”, would it be required to implicitly munge things to avoid the exception?
These are questions which largely can’t be answered without speaking to implementation, which is why I mentioned the gap between theoretical and applied CS.
And for the record, I would define strong typing largely in terms of implementation — a language which is “strongly typed” will:
Whether the language permits the programmer to work around this — e.g., through “loopholes” in the type system like casts and other explicit conversions — can go either way.
Andre Pang, as far as I know, there are no practical programming languages where the type system can’t be evaded in any way whatsoever. Even ultra-strong languages usually have an “unsafeCoerce”lying around somewhere. Explicit escaping from the type system is practically always possible.
But there’s a long way from that to languages like C, which have basic features which are type-unsafe (unions, casts and void pointers in C, variant records in Pascals, I hear)
James Bennet, exceptions at run time are of course trapped errors. In practice, I think the best test is whether it can cause silent memory corruption or execution of arbitrary memory addresses without resorting to very explicitly unsafe features.
James wrote … just state that “safe” languages do not permit the programmer to make errors, but that’s far from useful. In contrast it would be useful to know that we had a “Safe language: A language where no untrapped errors can occur.”
James wrote How do we define an “untrapped error”, for example? As befits a chapter in a “Handbook” Cardelli provides a glossary- Untrapped error: An execution error that does not immediately result in a fault.
James wrote …would the language be “unsafe” if it raised an exception at runtime due to a mistake from the programmer? As Harald Korneliussen said exceptions at runtime are of course trapped errors.
James wrote …questions which largely can’t be answered without speaking to implementation I agree that when we speak about a language being type safe or not type safe, we mostly don’t have a specification of behaviour that in itself allows us to answer that question without talking about a particular implementation of the language.
(This - strikes me as a somewhat dangerously dismissive attitude - seems to be against the person rather than against the argument they make.)
James wrote When faced with an operation on incompatible types, never implicitly coerce or convert types to obtain compatibility…
I don’t think we can understand that definition without a definition of “incompatible types”.
I don’t think we can understand that definition without a definition of “operation” - is assignment an operation?
Harald Korneliussen wrote … there are no practical programming languages where the type system can’t be evaded in any way whatsoever.
I don’t think “practical programming languages” has a fixed well-understood meaning that would allow us to judge if some specific language was in fact a “practical programming language”.
I’m not sure we are talking about any way whatsoever. I think we’re talking about whether we can write programs that evade the type system in the language itself - we could do all kinds of things with a FFI.
The fun really starts when we look at a language where everything has the same type or to put it another way, an untyped or typeless language.
We can do that without dropping down to assembler, we can use BCPL.
Everything has the same type - so no BCPL program has “incompatible types” - but somehow I don’t think this is really what you wanted strongly typed to mean!
Let’s try your pseudo-code in BCPL - it has a quite amusing result
and when we write out the values of x and z in different ways
I think that program might be exhibiting untrapped errors ;-)
James,
I knew you would move the bar on me!
I don’t think that a nebulous supposition that the methodology doesn’t scale counts for much especially given the number of projects that are being done in Coq. Take a look at this lambda calculus compiler that takes lambda calculus to the metal.
http://www.cs.berkeley.edu/~adamc/papers/CtpcPLDI07/
What is a real project? Is a compiler too simple to count?
In any case, I think I’ve shown that in fact your original statement is false as it didn’t make any mention of scaling.
Gavin
Isaac, you seem to be saying very little of substance, so I’ll humor you one last time:
An operation on incompatible types is an attempt by a programmer to invoke a function with arguments of types for which the function is not defined. Let’s illustrate by example:
Suppose that we have a language with a function
add, and thataddis of typeint -> int -> int ->. Suppose that we attempt to write a program like so:The function
addis undefined for this pair of arguments, so we have here an operation on incompatible types. A strongly-typed language has no available course of action and so raises an error. A weakly-typed language, however, has a course of action open to it: it may note that the value ofy, though a string, consists entirely of digits, and coerce anintargument out of them, at which point it would return the integer 7 as a result.You seem to be doing your best to Quine this discussion, however, so I’m going to leave you to your appeals to authority and just say that for some definitions of “safe”, “safe” typing and “strong” typing are similar or synonymous. Not all definitions of “safe” typing, however, exhibit similarity or synonymy with “strong” typing, and those definitions which do exhibit similarity or synonymy with “strong” typing do not always catch or forbid some cases which would be universally considered “unsafe” (for example, it’s possible to do stupid pointer tricks in C which are perfectly acceptable in “strong” typing, but cause “unsafe” behavior).
Gavin: I’m largely concerned with practicality and always have been. I’ve yet to see a language which can boast that its type system makes all well-typed programs execute “correctly”, and I still feel that “if it compiles, it’s correct” is a dangerous attitude — an algorithm which passes type checks may still return the wrong result, for example (say, by returning the correct type, but the wrong value), and for the “type system” to catch this (remember — “correct” is a very large term which does not end at “the program does not raise errors”), the type system itself must evolve into a programmable language of its own. In the world of practical programming, we tend to call such “type systems” by a different name: unit tests.
James wrote An operation on incompatible types is an attempt by a programmer to invoke a function with arguments of types for which the function is not defined. Is your definition of strong typing even applicable to untyped languages?
James wrote appeals to authority Hardly! If someone already has workable definitions I see no reason not to use them.
James wrote stupid pointer tricks in C which are perfectly acceptable in “strong” typing, but cause “unsafe” behavior As your notion of “strong typing” does not include type safety we should ask - what does it guarantee that a practical programmer would actually care about?
James wrote … you seem to be saying very little of substance, so I’ll humor you … You are quick to judge that others have a dismissive attitude - have you examined your own comments for that fault?