What kinds of global value mutation are thread-safe?
The Global Interpreter Lock (GIL) is used internally to ensure that only one thread runs in the Python VM at a time. In general, Python offers to switch among threads only between bytecode instructions; how frequently it switches can be set via sys.setcheckinterval. Each bytecode instruction and therefore all the C implementation code reached from each instruction is therefore atomic from the point of view of a Python program.
In theory, this means an exact accounting requires an exact understanding of the PVM bytecode implementation. In practice, it means that operations on shared variables of builtin data types (int, list, dict, etc) that “look atomic” really are.
For example, the following operations are all atomic (L, L1, L2 are lists, D, D1, D2 are dicts, x, y are objects, i, j are ints):
L.append(x) L1.extend(L2) x = L[i] x = L.pop() L1[i:j] = L2 L.sort() x = y x.field = y D[x] = y D1.update(D2) D.keys()
These aren’t:
i = i+1 L.append(L[-1]) L[i] = L[j] D[x] = D[x] + 1
Operations that replace other objects may invoke those other objects’ __del__ method when their reference count reaches zero, and that can affect things. This is especially true for the mass updates to dictionaries and lists. When in doubt, use a mutex!
CATEGORY: library cleanup
Comment:
Technically, none of the operations specified are atomic, not even the base L.append(x) . You first need to look up the 'L' variable in the namespace, then the 'append' attribute on L, then you need to look up the 'x' variable in the namespace, then you need to do the call. All of the other examples have similar multi-part executions that make them not atomic.
As for +=, that's not atomic either.
x = 0
def foo():
global x
for i in xrange(1000000):
x += 1
import threading
threading.Thread(target=foo).start()
threading.Thread(target=foo).start()
In one example run, I got x == 1393260 after the threads were done, in another, 1401567.
If you really want to make sure that your operations are atomic in every sense, guard everything with a...
lock.acquire()
try:
#operation
finally:
lock.release()
Or write yourself a synchronized context manager and use 2.5 with statements.
Posted by Josiah Carlson (2006-11-20)
Comment:
The thread safety of python operations depends on the compilation of python statements into byte-codes, which is an implementation detail and should not be relied upon. It is also important to note that even on a byte-code level, the implementation of the byte-code may call back into python. Virtually all dict operations call __hash__ and __eq__, for instance, and __del__ methods can be triggered virtually anywhere. Consequently, reasoning about the thread safety of relying on the GIL is rather difficult and relies on a deep understanding of python internals. Think of the GIL as a guarantee that your concurrent operations won't cause corruption or a segmentation fault (it protects the sanctity of the python interpreter, _not_ the thread-safetly of your code!)
Posted by Mike Klaas (2006-11-20)
Comment:
To follow up on Josiah's comment, you can make some part of a function atomic in 2.5 by doing:
from __future__ import with_statement
import threading
some_rlock = threading.RLock()
def atomic_op():
with some_rlock:
print "some_rlock is locked while this executes"
Posted by Nick Coghlan (2006-11-21)
For the record, Josiah did mention 'with', but that paragraph disappeared during rendering. Sorry for that. /F
Comment:
I agree with Mike Klass: these are unspecified implementation details of the CPython bytecode-compiler and VM. If you depend on these, only people who share this deep knowledge will be able to maintain your code. And these details might change, given enough time. At least one operation in the "safe" list will no longer be atomic in Python 3.0, according to the Python team's current plans. More philosophically, I think relying on opcode atomicity, rather than a mutex, doesn't really remove the mutex-induced complexity and fragility from your code--only the visual warning signs thereof. Btw, as of 11/21/2006, the article doesn't explicitly state *all* the requirements. L, L1, and L2 in some cases must only contain subobjects that don't have __eq__, __cmp__, __lt__ and/or __del__ methods that could execute Python bytecode. Same goes for the keys of D, D1, and D2, but they also care about __hash__, and not __lt__.
Posted by Jason (2006-11-21)
Comment:
Also see this python-dev thread about this entry: http://mail.python.org/pipermail/python-dev/2006-November/thread.html#69981
Posted by Fredrik (2006-11-25)
Comment:
In some sense, this is a list of "how many lookups have to agree?"
if i and j are not really integers, then there arbitrary things might happen before you ask for L1[i:j], but the eventual slice of L1 will still be a single consistent slice.
D[x]=y is atomic for setting D[x], but a strange comparison or hash could sneak in before the assignment transaction, and the deletion of the original value could will not be part of the assignment transaction.
The four "not-atomic" examples all look up the same *name* twice, so there is a chance that L or D could refer to different objects on each lookup.
To make this more concrete:
D1.update(D2) ==>
get D2
get D1
*other thread interrupts, setds D1 to something else*
update the objects already fetched (which is admittedly no longer bound to the name D1)
compared to
L1.append(L[-1]) ==>
get L
get the last element of that list
*other thread interrupts, sets L to something else*
append the element (from the old L) to (the *new*) L
Posted by JimJJewett (2006-11-27)

Comment:
What about x += 1 or D[x] += 1?
Posted by Marius Gedminas (2006-11-12)