Race conditions

From what we've said already, it should be clear that the operating system may interrupt any process at practically any time, simply because interrupts can happen at practically any time. A process that is interrupted might be in the middle of a system call that is updating some system data structures, or it might be modifying a value that it shares with another process (section 2.2.1).

Generally, when concurrent processes wish to make use of shared structures, there is a potential 'race condition'. To take the simplest possible case, imagine that two processes both use a shared variable called 'count', which holds the number of users logged on to the system. One process adds users. Its code looks like this:

int count; // Shared by both processes: number of users logged on.

void addAUser(void){
    register int temp1;
    temp1 = count; // Get the current count.
    temp1++;
    count = temp1; // Set the new count.
}

It may look inefficient, but even if the code were written "count++;", this is still essentially what the processor does: it fetches a copy of the variable from memory into a processor register, increments the value in the register, and writes it back to memory.

The code to remove a user looks like this:

void removeAUser(void){
    register int temp2;
    temp2 = count; // Get the current count.
    temp2--;
    count = temp2; // Set the new count.
}

All will be fine most of the time: the processes are usually busy doing other things, and the chances of the add and remove functions being called at the same moment are very slim. Sooner or later, though, something like the following will happen:

Let's assume that count initially has the value 10.

'add' process      'remove' process      value of 'count'
temp1 = count                            10
temp1++                                  10
PROCESS SWITCH!
                   temp2 = count         10
                   temp2--               10
                   count = temp2         9
PROCESS SWITCH!
count = temp1                            11

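So 'count' ends up as 11 when it should have stayed at 10 (start with 10, add one, remove one, and you still have 10). The 'remove' process's update has simply been lost: the 'add' process wrote back a value it had read before the decrement took place.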

This is called a 'race condition', because the outcome depends on the exact timing of the two processes, in effect on which of them 'wins the race' to write 'count' last. It is a potential problem for any concurrent processes that share resources in this way. It is especially a problem in operating systems, since the OS is inherently concurrent and must always preserve the integrity of its data structures (or else the whole system is likely to fall over).

A region of code where a race condition may occur is called a 'critical region'. Generally, one avoids race conditions by enforcing mutual exclusion on critical regions: making sure that at most one process at a time can be executing inside a given critical region.


last updated 13 February 1997