A question I’ve gotten from several readers recently is “how does the lock statement actually work in C#?” It appears to be somewhat magical; if you didn’t already have a lock statement, how would you implement it?
Before I go on, I should make one thing very clear: this article is going to be chock full of lies. The mechanisms that the CLR actually uses to implement locks are quite a bit more complicated than the oversimplified sketch I’m presenting here. The intended takeaway here is not that you understand precisely what the CLR does, but rather that you understand how a very simple lock could be built out of even simpler parts.
Let’s start by clearly describing what a lock statement does. It is documented by the specification as having two main properties. First, a lock statement takes a reference to an object which may be “acquired” by a thread; such an object is called a “monitor” for historical reasons. Only one thread may acquire a particular monitor at one time. If a second thread attempts to acquire a particular monitor while a first thread is holding it, the second thread blocks until such time as the first thread releases the lock and the second thread can acquire the monitor. The question then is how this behaviour can be implemented without locking, since that’s what we’re trying to implement. Second, the C# specification states that certain special side effects in multithreaded programs are always observed to be ordered in a particular way with respect to locks; we won’t discuss this aspect of locking in this article.
Once more, before I go on I want to clarify a few other differences between what I’m presenting today and reality. In the real C# language you can lock on any instance of a reference type; we won’t consider that. In the real C# language the same thread can acquire the same monitor twice:
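That is, code along these lines is legal and does not deadlock; the class and field names here are illustrative, not from the original:

```csharp
using System;

class NestedLockDemo
{
    private static readonly object locker = new object();

    public static string Run()
    {
        lock (locker)
        {
            // Re-acquiring a monitor that the current thread already
            // holds succeeds immediately rather than deadlocking.
            lock (locker)
            {
                return "still running";
            }
        }
    }
}
```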
We won’t consider that either. And in the real C# language, there are more operations you can perform on monitors than just acquiring and releasing them, such as waiting and pulsing. I’m not going to discuss any of these today; remember, this is a pile of lies intended to get across the idea that locks are built out of more fundamental parts. The real implementation is far more complex than what I’ll present here.
OK, now that we’ve got that out of the way, the next thing to discuss is what the lock statement actually means in C#. When you say
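something like this, say; the `Counter` class and its members are illustrative names of my own, not from the original:

```csharp
// An illustrative class of my own; the point is any statement of the
// form "lock (expression) { body }".
class Counter
{
    private readonly object locker = new object();
    private int count = 0;

    public void Increment()
    {
        lock (this.locker)
        {
            // Only one thread at a time may execute this body.
            this.count += 1;
        }
    }

    public int Count
    {
        get { lock (this.locker) { return this.count; } }
    }
}
```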
the C# compiler does a very simple transformation of that code into:
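roughly the following; this is a sketch of the expansion being described (the pre-C# 4 form, as noted below), wrapped in an illustrative class of my own so that the fragment is self-contained:

```csharp
using System.Threading;

class LockExpansion
{
    private static readonly object x = new object();
    public static int BodyRunCount = 0; // only here so the sketch is observable

    public static void Body()
    {
        // What "lock (x) { ... }" compiled to before C# 4:
        object temp = x;       // evaluate the lock expression exactly once
        Monitor.Enter(temp);   // acquire the monitor; blocks while another thread holds it
        try
        {
            // ...the body of the lock statement goes here...
            BodyRunCount += 1;
        }
        finally
        {
            Monitor.Exit(temp); // release the monitor even if the body throws
        }
    }
}
```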
As you can see, Monitor.Enter and Monitor.Exit are the names in the .NET framework for the “acquire” and “release” operations on a monitor.
Here we have our first major lie. That was the code generated before C# 4, and it has some reliability problems. See my 2009 article on this subject for the real codegen.
To illustrate how a lock is not magical, I need to show how to implement those enter and exit methods without using locks. So again, let’s simplify. Instead of any object, let’s make a specific class of monitor objects to illustrate how it might work. Here’s a naive and broken implementation:
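Something along these lines: a Boolean flag that records whether the monitor is taken, and a sleep loop that waits for it to become free. The class names NaiveMonitor and Demo are my own, and the Foo method matches the scenario discussed next:

```csharp
using System.Threading;

// BROKEN: the test of 'acquired' in Enter and the assignment that follows
// are two separate operations, so two threads can both "acquire" the monitor.
class NaiveMonitor
{
    private bool acquired = false;

    public void Enter()
    {
        // Wait for the monitor to appear to be free...
        while (this.acquired)
            Thread.Sleep(1);
        // ...then claim it. Another thread may have run this same code
        // between the check above and the assignment below!
        this.acquired = true;
    }

    public void Exit()
    {
        this.acquired = false;
    }
}

// A caller brackets its critical section with Enter and Exit:
class Demo
{
    private static readonly NaiveMonitor monitor = new NaiveMonitor();

    public static void Foo()
    {
        monitor.Enter();
        try
        {
            // ...code that must not run on two threads at once...
        }
        finally
        {
            monitor.Exit();
        }
    }
}
```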
It should be clear why this implementation is unacceptable. Suppose two threads, A and B, both call a method Foo that enters this monitor. Both enter Enter, both discover that acquired is false, both skip the loop, both set acquired to true, and both enter the body of the lock. How do we fix this problem?
The method we need is int Interlocked.CompareExchange(ref int variable, int newValue, int oldValue). This method takes an integer variable (by reference) and two values. If the variable is equal to the second value then it is set to the first value; otherwise, it stays the same. Regardless of whether the variable is changed or not, the original value of the variable is returned. All of this is done atomically. This is the magic building block that we need to build a monitor:
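Here is one way such a monitor might be written. The class and member names are mine; an int field stands in for the flag, with 0 meaning “free” and 1 meaning “acquired”, and as promised we are ignoring memory-ordering issues:

```csharp
using System.Threading;

class SimpleMonitor
{
    // 0 means the monitor is free; 1 means some thread has acquired it.
    private int acquired = 0;

    public void Enter()
    {
        // Atomically: if 'acquired' equals 0, set it to 1 and return the
        // original value 0; otherwise leave it alone and return 1.
        // Only the one thread that sees 0 come back owns the monitor.
        while (Interlocked.CompareExchange(ref this.acquired, 1, 0) != 0)
            Thread.Sleep(1);
    }

    public void Exit()
    {
        this.acquired = 0;
    }
}
```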
Now if you’re reading carefully you should at this point be protesting that the question has been thoroughly begged. We have implemented an exceedingly simple monitor, yes, but we’ve just moved the magic atomicity into Interlocked.CompareExchange! How then is that method implemented atomically?
By the hardware! The Intel chip has an instruction, CMPXCHG, which does an atomic compare-and-exchange. Interlocked.CompareExchange can be thought of as a thin wrapper around a machine code routine that uses this instruction. How the hardware achieves an atomic compare-and-exchange is up to Intel. Ultimately, all the magic in any computer program comes down to the hardware. (Of course that instruction has not existed in every chipset since the invention of monitors in the 1970s. Implementing an atomic compare-and-exchange on hardware that lacks such an instruction is an interesting challenge, but well beyond the scope of this article. On modern hardware we rely on these sorts of instructions.)
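To see the contract in miniature, here is a small single-threaded sketch; since only one thread is involved, the atomicity itself is not being exercised, just the compare-and-exchange semantics:

```csharp
using System.Threading;

class CompareExchangeDemo
{
    public static string Run()
    {
        int variable = 0;

        // variable equals the comparand 0, so it is set to 1;
        // the original value, 0, is returned.
        int first = Interlocked.CompareExchange(ref variable, 1, 0);

        // variable is now 1, not 0, so it is left unchanged;
        // the current value, 1, is returned.
        int second = Interlocked.CompareExchange(ref variable, 2, 0);

        return $"{first} {second} {variable}"; // "0 1 1"
    }
}
```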
The monitor implementation I’ve presented here would work, but it is extremely inefficient compared to real monitors; the real implementation does not simply sit in a loop calling Thread.Sleep(1). Suppose we have a hyperthreaded processor, that is, two threads of execution going in one physical processor. Suppose further that one thread has acquired the monitor and its lock body is extremely short: on the order of nanoseconds. If the second thread has the bad luck to ask for the monitor a couple of nanoseconds before the monitor is going to be released by the first thread, then the second thread ends up ceding its time to some other thread in the system. It would be better for it to burn those couple of nanoseconds doing no real work and then try again; this avoids the cost of a context switch to another thread.
The .NET Framework gives you multiple tools you could use to build a more sophisticated waiting strategy:
- Thread.SpinWait puts the processor into a tight loop, allowing you to wait a few nanoseconds or microseconds without ceding control to another thread.
- Thread.Sleep(0) cedes control to any ready thread of equal priority, or keeps going on the current thread if there is none.
- Thread.Yield cedes control to any ready thread associated with the current processor.
- And as we’ve seen, Thread.Sleep(1) cedes control to any ready thread of the operating system’s choice.

By carefully choosing a mix of these calls and doing performance testing under realistic conditions you could build a high-performance implementation, and of course this is what the CLR team has actually done.
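A sketch of what such a mixed strategy might look like, built on the same compare-and-exchange flag as before; the attempt thresholds and spin counts below are arbitrary values of my own, not the CLR’s actual tuning:

```csharp
using System.Threading;

class SpinThenSleepMonitor
{
    private int acquired = 0; // 0 = free, 1 = acquired

    public void Enter()
    {
        int attempts = 0;
        while (Interlocked.CompareExchange(ref this.acquired, 1, 0) != 0)
        {
            attempts += 1;
            if (attempts <= 10)
                Thread.SpinWait(100);  // burn a few cycles; the holder may be nearly done
            else if (attempts <= 20)
                Thread.Yield();        // let another ready thread on this processor run
            else
                Thread.Sleep(1);       // give up our time slice entirely
        }
    }

    public void Exit()
    {
        this.acquired = 0;
    }
}
```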
So that’s it; an extremely simple monitor can be built out of an atomic test-and-set of a flag that indicates whether the monitor is taken or available, and a strategy for waiting that gives good performance in the (hopefully unlikely!) event that the monitor cannot be acquired immediately. We’d need to add a small amount of extra gear to allow the same monitor to be taken in “nested” scenarios, but perhaps you can see how that could be done from this sketch. In order to make a monitor that is robust in the face of thread abort exceptions we’d need even more gear, but again, it could all be built out of judicious uses of CompareExchange. You might have also noticed that our oversimplified implementation is in no way “fair”: if there are ten threads waiting for a monitor there is no guarantee that the one that has been waiting the longest gets it; this could cause “thread starvation” in practice. And finally, in the real CLR any object of reference type can be used as a monitor. The exact details of how the CLR does so efficiently are beyond the scope of this article; suffice it to say that the CLR’s implementation is heavily optimized for the case where a monitor is (1) used only for locking, and (2) never contended.
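For instance, the “nested” support mentioned above might be sketched by recording which thread owns the monitor and how many times it has entered. Here I use the managed thread id as the owner marker (assuming, as is true in practice, that it is never zero), and I continue to ignore both memory-ordering and fairness concerns:

```csharp
using System.Threading;

class ReentrantMonitor
{
    private int owner = 0;      // managed thread id of the owner; 0 means "free"
    private int entryCount = 0; // how many times the owner has entered

    public void Enter()
    {
        int me = Thread.CurrentThread.ManagedThreadId;
        if (this.owner == me)
        {
            // We already hold the monitor; just note the extra entry.
            this.entryCount += 1;
            return;
        }
        // Otherwise, wait until we can atomically claim ownership.
        while (Interlocked.CompareExchange(ref this.owner, me, 0) != 0)
            Thread.Sleep(1);
        this.entryCount = 1;
    }

    public void Exit()
    {
        this.entryCount -= 1;
        if (this.entryCount == 0)
            this.owner = 0; // last exit: release the monitor
    }
}
```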