We're back after a server migration that caused effbot.org to fall over a bit harder than expected. Expect some glitches.

Why must dictionary keys be immutable?

The hash table implementation of dictionaries uses a hash value calculated from the key value to find the key. If the key were a mutable object, its value could change, and thus its hash could also change. But since whoever changes the key object can’t tell that it was being used as a dictionary key, it can’t move the entry around in the dictionary. Then, when you try to look up the same object in the dictionary it won’t be found because its hash value is different. If you tried to look up the old value it wouldn’t be found either, because the value of the object found in that hash bin would be different.

If you want a dictionary indexed with a list, simply convert the list to a tuple first; the function tuple(L) creates a tuple with the same entries as the list L. Tuples are immutable and can therefore be used as dictionary keys.

Some ideas for alternative implementations have been proposed:

Hash lists by their address (object ID). This doesn’t work because if you construct a new list with the same value it won’t be found; e.g.:

d = {[1,2]: '12'}
print d[ [1,2] ]

would raise a KeyError exception because the id of the [1,2] used in the second line differs from that in the first line. In other words, dictionary keys should be compared using ==, not using is.

Make a copy when using a list as a key. This doesn’t work because the list, being a mutable object, could contain a reference to itself, and then the copying code would run into an infinite loop.

Allow lists as keys but tell the user not to modify them. This would allow a class of hard-to-track bugs in programs when you forgot or modified a list by accident. It also invalidates an important invariant of dictionaries: every value in d.keys() is usable as a key of the dictionary.

Mark lists as read-only once they are used as a dictionary key. The problem is that it’s not just the top-level object that could change its value; you could use a tuple containing a list as a key. Entering anything as a key into a dictionary would require marking all objects reachable from there as read-only — and again, self-referential objects could cause an infinite loop.

There is a trick to get around this if you need to, but use it at your own risk: You can wrap a mutable structure inside a class instance which has both a __cmp__ and a __hash__ method. You must then make sure that the hash value for all such wrapper objects that reside in a dictionary (or other hash based structure), remain fixed while the object is in the dictionary (or other structure):

class ListWrapper:
     def __init__(self, the_list):
           self.the_list = the_list
     def __cmp__(self, other):
           return self.the_list == other.the_list
     def __hash__(self):
           l = self.the_list
           result = 98767 - len(l)*555
           for i in range(len(l)):
                     result = result + (hash(l[i]) % 9999999) * 1001 + i
                     result = (result % 7777777) + i * 333
           return result

Note that the hash computation is complicated by the possibility that some members of the list may be unhashable and also by the possibility of arithmetic overflow.

Furthermore it must always be the case that if o1 == o2 (i.e. o1.__cmp__(o2)==0) then hash(o1)==hash(o2) (i.e., o1.__hash__() == o2.__hash__()), regardless of whether the object is in a dictionary or not. If you fail to meet these restrictions dictionaries and other hash based structures will misbehave.

In the case of ListWrapper, whenever the wrapper object is in a dictionary the wrapped list must not change to avoid anomalies. Don’t do this unless you are prepared to think hard about the requirements and the consequences of not meeting them correctly. Consider yourself warned.

CATEGORY: general