Wat’s up, doc?

In much the same vein as Gary Bernhardt’s wonderful talk about JavaScript, there’s a collection of Python “wat” moments which goes around every so often. There’s also an associated quiz linked from that page (which I won’t spoil; you can read through it yourself and then check your answers). Every language has some unintuitive — or at least seemingly-unintuitive — bits, and Python is no exception. But if you’re working in Python, understanding *why* (or, perhaps more appropriately, *wy*) these snippets of code behave the way they do is interesting and potentially useful (OK, probably not useful, but at least it’s interesting). So let’s take a look at them and see what’s really going on.

### “Converting to a string and back”

The given example is:

```pycon
>>> bool(str(False))
True
```

This one’s pretty straightforward: `str(False)` is `"False"`, and `bool("False")` is `True`, because any non-empty string is `True` (“truthy”, if you want to be precise, since Python boolean checks rarely use the actual `bool` instances).
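If you actually need the round trip, you have to compare against the strings that `str()` produces rather than relying on truthiness. A minimal sketch; `parse_bool` is a hypothetical helper, not a standard-library function:

```python
def parse_bool(text: str) -> bool:
    """Hypothetical inverse of str() for booleans: accepts only 'True' or 'False'."""
    if text == "True":
        return True
    if text == "False":
        return False
    raise ValueError(f"not a boolean literal: {text!r}")

print(bool(str(False)))        # True: 'False' is a non-empty string
print(bool(""))                # False: only the empty string is falsy
print(parse_bool(str(False)))  # False
```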

### “Mixing integers with strings”

The example:

```pycon
>>> int(2 * 3)
6
>>> int(2 * '3')
33
>>> int('2' * 3)
222
```

This one’s a little more interesting, and causes people to argue about Python’s type system. The behavior here comes from the fact that Python supports operator overloading, and doesn’t restrict what types you’re allowed to define your operators on. In this case, the `*` operator is implemented on the numeric types, where it is a multiplication operator (and, obviously, requires the other operand to be numeric). But it’s also implemented on the sequence types (remember, `str` is a sequence type in Python), where it’s a repetition operator and requires the other operand to be an integer.

So when given this operator with one operand that’s an integer and another operand that’s a sequence, Python applies the repetition behavior: `2 * '3'` is `'33'` and `'2' * 3` is `'222'`, and `int()` then simply parses those strings.
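A small sketch of both halves. The built-in types show that repetition wants an integer count, and the hypothetical `Meters` class shows how any type can opt into `*` by defining `__mul__` and `__rmul__`; Python tries the left operand’s method first and falls back to the right operand’s reflected method when the first returns `NotImplemented`.

```python
class Meters:
    """Hypothetical length type that supports scaling by a number."""

    def __init__(self, value: float) -> None:
        self.value = value

    def __mul__(self, factor: float) -> "Meters":
        if not isinstance(factor, (int, float)):
            return NotImplemented  # let Python try the other operand
        return Meters(self.value * factor)

    # For 3 * Meters(2), int.__mul__ returns NotImplemented, so Python calls this.
    __rmul__ = __mul__

print(int(2 * '3'))           # 33: repetition gives '33', then int() parses it
print(int('2' * 3))           # 222
print((3 * Meters(2)).value)  # 6

try:
    '3' * 2.0                 # repetition needs an integer count
except TypeError as exc:
    print("TypeError:", exc)
```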

### “The undocumented converse implication operator”

Fun times:

```pycon
>>> False ** False == True
True
>>> False ** True == False
True
>>> True ** False == True
True
>>> True ** True == True
True
```

Understanding this one requires a bit of trivia about Python’s history. Originally there was no built-in boolean type and so (as in plenty of other languages which lack booleans) the convention was to use integer `1` as the “true” value and integer `0` as the “false” value. Python 2.2.1 introduced `bool()` as a built-in function but no boolean type; rather, it defined `True` and `False` as built-in aliases for `1` and `0`. The `bool()` function would return `1` for “truthy” values and `0` for “false-y” values. Python 2.3 implemented the `bool` type, as a subclass of `int` with only two instances: `True` and `False`, which had integer values `1` and `0`. This has stuck around into the Python 3 line, as you can verify for yourself:

```pycon
$ python
Python 3.5.0 (default, Sep 26 2015, 18:41:42)
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> issubclass(bool, int)
True
>>> isinstance(True, int)
True
>>> isinstance(False, int)
True
>>> True + True
2
>>> True - False
1
```

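Because `True` and `False` really are the integers `1` and `0`, `**` on them is plain integer exponentiation: `0 ** 0` is `1` (Python follows the usual convention there), `0 ** 1` is `0`, and `1` raised to any power is `1`. Those four results happen to match the truth table for converse implication, which is false only when the left operand is `False` and the right operand is `True`.

For more details about the weirdness surrounding the introduction of the `bool` type, see this blog post from Guido.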
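Since `bool` subclasses `int`, `**` on two booleans is integer exponentiation on `0` and `1`. A short check of that against the truth table of converse implication (“p if q”: false only when `q` is true and `p` is false); the `converse_implication` helper is written here purely for illustration:

```python
from itertools import product

def converse_implication(p: bool, q: bool) -> bool:
    """p <- q: false only when q is true and p is false."""
    return p or not q

for p, q in product([False, True], repeat=2):
    result = p ** q  # int exponentiation on 0/1; note 0 ** 0 == 1
    print(p, q, result, type(result).__name__)
    assert bool(result) == converse_implication(p, q)
```

Note that the result is an `int`, not a `bool`, because `bool` inherits `__pow__` from `int` unchanged.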

### “Mixing numerical types”

```pycon
>>> x = (1 << 53) + 1
>>> x + 1.0 < x
True
```

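The author says “Note: this is not simply due to floating-point imprecision.” Which is technically true, I guess, but slightly misleading: the trick here is pushing past the range in which a double-precision float can represent every integer (the 53-bit shift gives it away, as double-precision floats have only 53 bits of precision in the significand). If you toy with it, you’ll find that you only get even numbers past that point, as expected for this range: in IEEE 754, from `2**51` to `2**52`, double-precision floats are spaced by 0.5, switching to spaced by 1 (i.e., all integers and only integers can be represented) up to `2**53`, and beyond `2**53` they’re spaced 2 apart so only even integers can be represented.

So `x`, which is `2**53 + 1`, has no exact float representation. Evaluating `x + 1.0` converts `x` to a float first, which rounds it to `2**53`; adding `1.0` then lands exactly halfway between `2**53` and `2**53 + 2`, and round-half-to-even takes it back down to `2**53`. The final `<` is where the “not simply imprecision” part comes in: Python compares an `int` and a `float` exactly instead of converting the `int` to a float, so the comparison is really `2**53 < 2**53 + 1`, which is `True`.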
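A sketch that checks the spacing figures and each step of the example. It assumes Python 3.9 or later for `math.ulp()`, which returns the gap between a float and the next representable float above it.

```python
import math

x = (1 << 53) + 1             # 9007199254740993, an odd integer

# Spacing between adjacent doubles just above each power of two.
print(math.ulp(2.0 ** 51))    # 0.5
print(math.ulp(2.0 ** 52))    # 1.0
print(math.ulp(2.0 ** 53))    # 2.0

# x isn't representable, so converting it rounds (ties-to-even) to 2**53.
print(float(x) == 2 ** 53)    # True
# Adding 1.0 lands exactly halfway again and rounds back to 2**53.
print(x + 1.0 == 2 ** 53)     # True
# int/float comparison is exact: 2**53 < 2**53 + 1.
print(x + 1.0 < x)            # True
```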

### “Operator precedence?”

```pycon
>>> False == False in [False]
True
```

This isn’t exactly about precedence; rather, it’s about Python’s support for chained comparison operators. We’re used to being able to do things like `if x < y <= z` in Python, and we’re used to it doing the right thing with constructs like that. That chain of operators is equivalent to `if (x < y) and (y <= z)`, but with `y` only being evaluated once.

And since `==` and `in` are comparison operators, the same holds here: `False == False in [False]` is equivalent to `(False == False) and (False in [False])`. Both comparisons are true, so the overall result is true.
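Parenthesizing the expression either way shows that neither “precedence” reading produces `True`; only the chained expansion does. The `middle` function at the end is a throwaway helper, written for illustration, that counts how many times the middle operand of a chain is evaluated.

```python
# Chained form: (False == False) and (False in [False])
print(False == False in [False])      # True

# Either explicit grouping gives a different answer.
print((False == False) in [False])    # True in [False] -> False
print(False == (False in [False]))    # False == True -> False

calls = 0

def middle() -> int:
    """Hypothetical helper that counts how often it is evaluated."""
    global calls
    calls += 1
    return 2

result = 1 < middle() <= 3
print(result, calls)                  # True 1: middle() ran only once
```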

### “Iterable types in comparisons”

```pycon
>>> a = [0, 0]
>>> (x, y) = a
>>> (x, y) == a
False
>>> [1,2,3] == sorted([1,2,3])
True
>>> (1,2,3) == sorted((1,2,3))
False
```

This one is a bit of a reach. What’s really going on in the first example is that `a` is a list, and `(x, y)` is a tuple. A list and a tuple will not compare equal, even if their contents are identical. Similarly, `sorted()` returns a list, so you’ll only get a successful equality comparison when comparing the result to a list.
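Converting one side so that both operands have the same type makes the comparisons behave the way you’d expect:

```python
a = [0, 0]
(x, y) = a                   # unpacking: x and y are bound to the list's items

print((x, y) == a)           # False: tuple vs. list
print(list((x, y)) == a)     # True: same type, same contents
print((1, 2, 3) == sorted((1, 2, 3)))         # False: sorted() returns a list
print((1, 2, 3) == tuple(sorted((1, 2, 3))))  # True
```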

### “Types of arithmetic operations”

```pycon
>>> type(1) == type(-1)
True
>>> 1 ** 1 == 1 ** -1
True
>>> type(1 ** 1) == type(1 ** -1)
False
```

Python allows arithmetic comparisons of floats and ints to work, so `1 == 1.0` is `True`. And `1 ** -1` is `1.0`, because raising an `int` to a negative `int` exponent always returns a `float`. But `int` and `float` are not the same type, so the type-equality check fails.
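A sketch showing the type change. The `Fraction` lines are an extra comparison point, not part of the original quiz: they show that the `float` result comes from `int`’s power operation specifically, not from negative exponents in general.

```python
from fractions import Fraction

print(type(1 ** 1))             # <class 'int'>
print(type(1 ** -1))            # <class 'float'>
print(1 ** -1 == 1)             # True: equal values, different types
print(2 ** -1)                  # 0.5

# Other numeric types keep their own type under a negative exponent.
print(type(Fraction(2) ** -1))  # <class 'fractions.Fraction'>
print(Fraction(2) ** -1)        # 1/2
```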

### “Fun with iterators”

```pycon
>>> a = 2, 1, 3
>>> sorted(a) == sorted(a)
True
>>> reversed(a) == reversed(a)
False
>>> b = reversed(a)
>>> sorted(b) == sorted(b)
False
```

This is fun with types again. Python’s `sorted()` built-in takes any iterable, and returns a new list containing the same values in sorted order. But `reversed()` returns an iterator object which will traverse the sequence in reverse order.

The iterator returned by `reversed()` does not implement its own `__eq__()` method, so it inherits the default from `object`, which compares by identity: an object is equal only to itself. Since each call to `reversed()` creates a new iterator instance, the results of two calls to `reversed()` on the same sequence will compare unequal.
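Comparing an iterator with itself versus with a fresh one shows the identity-based default, and converting to lists compares the contents instead:

```python
a = 2, 1, 3

r = reversed(a)
print(r == r)                                  # True: same object
print(reversed(a) == reversed(a))              # False: two distinct iterators
print(list(reversed(a)) == list(reversed(a)))  # True: compares contents
```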

The comparison of results of `sorted()` in the last example is trickier: the first call to `sorted()` consumes the iterator returned by `reversed()`, and produces the sorted list `[1, 2, 3]`. But the second call to `sorted()` has nothing left to consume, and returns the empty list `[]`, and it is the case that `[1, 2, 3] != []`.
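Pulling the two `sorted()` calls apart makes the exhaustion visible; if you need to traverse the values more than once, materialize them into a list first:

```python
a = 2, 1, 3

b = reversed(a)
first = sorted(b)     # consumes the iterator
second = sorted(b)    # nothing left to consume
print(first, second)  # [1, 2, 3] []

# A list can be iterated as many times as you like.
values = list(reversed(a))
print(sorted(values) == sorted(values))  # True
```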

### “Circular types”

```pycon
>>> isinstance(object, type)
True
>>> isinstance(type, object)
True
```

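This is just one of those things :) Every class, `object` included, is an instance of its metaclass, which by default is `type`. And `type` is itself a class that inherits from `object` (its own metaclass is `type` too), so it is an instance of `object` like everything else in Python.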
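Poking at the two classes directly shows how the loop closes: `object`’s metaclass is `type`, `type`’s metaclass is itself, and `type` inherits from `object`.

```python
print(type(object) is type)      # True: object is a class, so its metaclass is type
print(type(type) is type)        # True: type is its own metaclass
print(issubclass(type, object))  # True: type inherits from object
print(issubclass(object, type))  # False: object is not a metaclass
print(object.__bases__)          # (): object is the root of the hierarchy
```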

### “extend vs +=”

```pycon
>>> a = ([],)
>>> a[0].extend([1])
>>> a[0]
[1]
>>> a[0] += [2]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>> a[0]
[1, 2]
```

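Python won’t let you directly assign to indices in a tuple, either via the normal or augmented (`+=` and friends) syntax. But it will let you call methods of the objects in the tuple, and if those objects happen to be mutable and happen to define methods which let you mutate them without using assignment syntax, it’ll work.

That also explains the strangest part of the example: `a[0]` changed even though `+=` raised an error. Augmented assignment on a subscript does two things. `a[0] += [2]` first calls `a[0].__iadd__([2])`, which for a list extends it in place and returns the same list, and only then tries to store the result back with `a[0] = ...`. The mutation has already happened by the time the tuple refuses the assignment.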
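Spelling out the augmented assignment by hand makes the order of events visible. This is a rough approximation: the interpreter does the equivalent in bytecode rather than literally calling these names.

```python
a = ([],)
a[0].extend([1])

# a[0] += [2] behaves roughly like these two steps:
tmp = a[0].__iadd__([2])  # list.__iadd__ extends the list in place...
print(a[0])               # [1, 2]: the mutation has already happened
try:
    a[0] = tmp            # ...then Python tries to store the result back
except TypeError as exc:
    print("TypeError:", exc)

print(a[0])               # [1, 2]
```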

### “Indexing with floats”

```pycon
>>> [4][0]
4
>>> [4][0.0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list indices must be integers, not float
>>> {0:4}[0]
4
>>> {0:4}[0.0]
4
```

This is slightly sneaky: the first two examples use a list, and list indices must be integers. The second two examples use a dictionary, and any hashable type can serve as a dictionary key.

As to why `0` and `0.0` return the same value: a dictionary lookup treats two keys as the same key if they have the same hash and compare equal, and since `hash(0) == hash(0.0)` and `0 == 0.0`, you get the result in the example. The documentation for `dict` says as much: numeric keys that compare equal, such as `1` and `1.0`, can be used interchangeably to index the same entry.
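A sketch of that rule: having the same hash isn’t enough on its own, the keys must also compare equal. The `Key` class is a made-up example that gives every instance the same hash on purpose.

```python
d = {0: "int key", 0.0: "float key"}
print(d)  # {0: 'float key'}: one entry; the first key is kept, the value replaced

class Key:
    """Hypothetical key type: constant hash, equality decided by `tag`."""

    def __init__(self, tag: str) -> None:
        self.tag = tag

    def __hash__(self) -> int:
        return 42  # every instance collides deliberately

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Key) and self.tag == other.tag

lookup = {Key("a"): 1, Key("b"): 2}
print(len(lookup))       # 2: same hash, but not equal
print(lookup[Key("a")])  # 1: same hash and equal, so the lookup finds it
```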

### “all and emptiness”

```pycon
>>> all([])
True
>>> all([[]])
False
>>> all([[[]]])
True
```

Tricky. The argument to `all()` is an iterable. So in the first example, we’re asking it to evaluate an empty sequence; `all()` is defined to return `True` for an empty iterable. The second one gets a sequence containing one item, an empty list, which evaluates `False`, so `all()` returns `False`. The third one gets a sequence containing one item, a list containing an empty list, which evaluates `True` (because the list containing the empty list is itself nonempty), and so `all()` returns `True`.
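The documentation for `all()` gives a pure-Python equivalent, reproduced here under a different name so it doesn’t shadow the built-in. It makes the empty case obvious: the loop body never runs, so the function falls through to `return True`.

```python
def all_equivalent(iterable) -> bool:
    """Same logic as the pure-Python equivalent in the docs for all()."""
    for element in iterable:
        if not element:  # any falsy item short-circuits to False
            return False
    return True          # reached immediately for an empty iterable

for arg in ([], [[]], [[[]]]):
    print(arg, all(arg), all_equivalent(arg))
```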

### “sum and strings”

```pycon
>>> sum("")
0
>>> sum("", ())
()
>>> sum("", [])
[]
>>> sum("", {})
{}
>>> sum("", "")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sum() can't sum strings [use ''.join(seq) instead]
```

This is another one where a quick look at the documentation for the function reveals what’s going on.

When given an empty iterable, `sum()` will return `0`, and the empty string is an empty sequence. When given two arguments, `sum()` treats the second argument as the starting value of the accumulator, which is returned unchanged when the supplied iterable is empty (in fact, the signature is effectively `sum(iterable, start=0)`, so really in the empty-sequence case with one argument it’s just returning the default value of `start`); that’s what’s going on in the second, third and fourth examples. In the fifth example, `sum()` complains that it can’t work with a string value for the second argument, since `sum()` explicitly refuses a string start value and points you to `''.join()` instead.

There is some wartiness here, though; `sum()` only type-checks its second argument (if you want to verify, it’s `builtin_sum()` on Python 2, and `builtin_sum_impl()` on Python 3, and in either version is located in `Python/bltinmodule.c` in the source tree). On Python 2, it short-circuits to a `TypeError` if the second argument is an instance of `basestring`; on Python 3, it short-circuits to `TypeError` when the second argument is an instance of `str`, `bytes` or `bytearray`.

But it never checks the type of the first argument, or of the items in that argument (if it’s iterable); it simply relies on the fact that iteration over a non-iterable will raise `TypeError`, and addition of a string to an integer will raise `TypeError` (the latter because you can’t pass a string value as the second argument, and that argument defaults to `0` when not specified).
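A sketch of that gap: only the start value is checked, so a non-string start whose `__add__` hands back its right operand lets strings through untouched. `PassThrough` is a made-up class for illustration; `''.join()` remains the right tool for actually concatenating strings.

```python
print(sum([[1], [2]], []))  # [1, 2]: any start value that supports + works

try:
    sum(["a", "b"])         # start defaults to 0, and 0 + "a" fails
except TypeError as exc:
    print("TypeError:", exc)

class PassThrough:
    """Hypothetical start value: adding anything to it returns that thing."""

    def __add__(self, other):
        return other

# Not a str, so the explicit check never fires; the strings then add normally.
print(sum(["a", "b", "c"], PassThrough()))  # abc
```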

### “Comparing NaNs”

```pycon
>>> x = 0*1e400  # nan
>>> len({x, x, float(x), float(x), 0*1e400, 0*1e400})
3
>>> len({x, float(x), 0*1e400})
2
```

`NaN` is weird. IEEE 754 tells us that comparisons with `NaN` are unordered; `NaN` is neither greater than, less than nor equal to any floating-point value, including itself.

So in the first `len()` call, in theory we should expect an answer of 6; all the values are `NaN` and none of them are equal to any of the others, so the set literal shouldn’t prune out any “duplicate” values. Similarly, the second `len()` call should return 3.

What actually seems to be happening is that Python is considering `x`, `x`, `float(x)` and `float(x)` all to be “duplicate” values, while the two `0*1e400` values are “distinct”. ~~Why that is I’m not quite sure. I suppose it’s possible there’s some sort of tricky single evaluation thing going on, but that would require Python to know that `float(x)` always returns the same value for the same `x` (and in this case it’s not true in the sense that the calls both return `NaN` values that compare unequal).~~

*Edit:* a comment on reddit hits on the solution. Python does appear to be using identity as an optimized short-circuit to avoid doing a potentially-expensive equality check. And indeed, `x is x` and `float(x) is float(x)` both return `True` with `x = 0*1e400` (`float()` hands back the very same object when its argument is already a `float`, so `float(x) is x` as well), but `0*1e400 is 0*1e400` returns `False`. If someone else wants to have a bit of fun, take a look into *why* `0*1e400 is not 0*1e400`.
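The same identity-before-equality shortcut applies to list membership and set construction. A sketch, using a hypothetical `NeverEqual` class that, like `NaN`, is never equal to anything, and that counts its `__eq__` calls to show the comparison is skipped entirely for the identical object:

```python
nan = float("nan")
print(nan == nan)                         # False: plain == has no shortcut
print(nan in [nan])                       # True: identity checked first
print(float("nan") in [nan])              # False: different object, not equal
print(len({nan, nan}))                    # 1
print(len({float("nan"), float("nan")}))  # 2

class NeverEqual:
    """Hypothetical type that is never equal to anything, itself included."""
    eq_calls = 0

    def __eq__(self, other: object) -> bool:
        NeverEqual.eq_calls += 1
        return False

    __hash__ = object.__hash__  # defining __eq__ would otherwise unset it

n = NeverEqual()
found = n in [n]
print(found, NeverEqual.eq_calls)         # True 0: __eq__ was never called
```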

I’d call this one a wart, but I’m also willing to just call `NaN` itself, and probably IEEE 754, a wart (it’s just that it’s less warty than the alternatives).