The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Race Conditions

In the thread about multi-threaded applications there was much discussion of volatile variables, but there was also a term new to me: 'race conditions'.

Not being a C++ programmer, and probably nowhere near the same league as most others here, could someone please explain the term to me?
Greg Kellerman
Thursday, October 07, 2004
Vague but friendly answer: it's all about the ugly, hard-to-find bugs you get when things happening in different threads overlap or coincide in weird or undesirable ways that you didn't anticipate or allow for, e.g. one thread writing to some shared memory or disk while another thread is in the middle of reading from it. Often these bugs only show up when the system is under a lot of load. Because they depend on coincidences of timing, they can be sporadic and hard to track down or replicate.

There are methods you can use to avoid this kind of crap, of course. Get a good book on concurrent programming; it really is the kind of thing where you'd save time by just reading up on it rather than trying to teach yourself by trial and error or from little separate scraps of knowledge gleaned from websites.
Matt
Thursday, October 07, 2004
It goes back to the Uncertainty Principle in physics (strictly speaking, the observer effect)...

By measuring the system, you're changing the system.

Now if you're timing car traffic across a bridge, a person standing by the side of the road counting is not going to change much.

If you have an application writing to a log file/database, though, that extra work may shift the timing enough to prevent the threads from conflicting, so the bug can vanish as soon as you try to watch for it.
KC
Thursday, October 07, 2004
We say that two concurrently executing threads are in a "race condition" if the order in which the OS schedules them affects their final result. For example:

some_global_variable a

piece_of_code1() {   // runs in one thread
  // bunch of stuff done here
  a = 1;
}

piece_of_code2() {   // runs in another thread
  // another bunch of stuff done here
  a = 2;
}

// wait for both piece_of_code1 and
// piece_of_code2 to complete

print(a) // what's the value of "a" here?

In this kind-of-pseudo-code example, the piece_of_code1 and piece_of_code2 methods execute concurrently, each trying to set the value of "a". The value that sticks is whichever was assigned last, and that depends on the order in which the OS schedules the threads of our program, which is not something the programmer can rely on.
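For anyone who wants to try it, here is a minimal Python sketch of the pseudo-code above. The function names mirror the post. The use of Python's threading module and the run_once helper are assumptions made purely for illustration.

```python
import threading

a = 0  # the shared global variable


def piece_of_code1():
    global a
    # bunch of stuff done here
    a = 1


def piece_of_code2():
    global a
    # another bunch of stuff done here
    a = 2


def run_once():
    """Run both pieces concurrently and return whichever value stuck."""
    t1 = threading.Thread(target=piece_of_code1)
    t2 = threading.Thread(target=piece_of_code2)
    t1.start()
    t2.start()
    # wait for both piece_of_code1 and piece_of_code2 to complete
    t1.join()
    t2.join()
    return a


if __name__ == "__main__":
    print(run_once())  # either 1 or 2, depending on scheduling
```

With functions this tiny you will most likely see the same value on every run, which is exactly what makes such bugs easy to miss: nothing in the program guarantees that ordering.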

Thursday, October 07, 2004
Technically a "race condition" or a "race hazard" is where another process just might squeeze in and upset what you're doing.

They turn up in all sorts of critical security and reliability scenarios. Here's an example:

When a machine wants to connect to a network, it broadcasts a request for a DHCP server and assumes the first response it gets is genuine. Suppose a machine on the network is malicious. It could try to send DHCP responses ahead of the real DHCP server, which would mean it could give the machine false "gateway" settings and grab its traffic.

This is a race condition: the two activities (real DHCP and fake DHCP) are "racing" each other.

The assumption that causes these sorts of errors is that "short enough" bits of code are safe from being interrupted. The way to avoid them is to lock out interruptions during critical pieces of code. Imagine a naive bank system which does the following:

1: Look at the current bank balance.
2: Subtract the withdrawal amount from the current balance.
3: Store the result as the new balance.

This contains a race condition (amongst other flaws): if a second transaction starts between lines 1 and 3 of this transaction, it will read a balance which is about to be overwritten and do its work on that stale data.
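The three steps above can be sketched in Python as follows. Everything here (the class names, the amounts, and the use of threading.Lock as the way to "lock out interruptions") is an assumption for illustration, not how any real bank system works. The first demo replays the bad interleaving by hand so the result is repeatable. The second runs real threads but holds a lock across all three steps.

```python
import threading


class NaiveAccount:
    """The three steps from the post, with no protection."""

    def __init__(self, balance):
        self.balance = balance

    def read_balance(self):            # step 1
        return self.balance

    def store_balance(self, value):    # step 3
        self.balance = value


class LockedAccount:
    """Same steps, but the whole read-subtract-store sequence holds a lock."""

    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw(self, amount):
        with self._lock:
            current = self.balance     # step 1
            current -= amount          # step 2
            self.balance = current     # step 3


def lost_update_demo():
    """Replay the bad interleaving by hand so the outcome is deterministic."""
    account = NaiveAccount(100)
    seen_by_a = account.read_balance()      # transaction A, step 1
    seen_by_b = account.read_balance()      # transaction B starts in between
    account.store_balance(seen_by_a - 30)   # A stores 70
    account.store_balance(seen_by_b - 20)   # B stores 80, erasing A's withdrawal
    return account.balance


def locked_demo():
    """Two concurrent withdrawals that cannot interleave mid-sequence."""
    account = LockedAccount(100)
    threads = [threading.Thread(target=account.withdraw, args=(amount,))
               for amount in (30, 20)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account.balance
```

In the hand-replayed case 50 was withdrawn in total but the balance only drops by 20. With the lock, the second transaction cannot read the balance until the first has stored its result.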

These errors can be VERY hard to detect. As people have said, they don't necessarily show up in testing, because they depend on precise timing.[1] They are not always catastrophic, either: they won't necessarily cause a program to crash. They can be VERY subtle, as in the example above, which has the potential to cause data errors which are themselves hard to detect (would the bank even notice if its turnover was large enough and the withdrawals were small enough?).

Really, the only way to make sure they don't happen is to reason very carefully about the code. Unfortunately this takes a rather distinct sort of personality: mathematicians and logicians, as opposed to the "multiply it by four for safety" kind of engineers.

{I must admit to being the latter... my other half is the former. When we're solving race conditions, I'll lock everything in sight and get a faster CPU to cope whereas my other half will carefully solve the problem properly...}

It's not, by the way, a C++ problem. Ever wondered what happens if you send a new bid to eBay, but the auction ends AFTER your new bid starts being transmitted and before all the new bid information arrives? HTTP transmissions can take a while if servers are heavily loaded. Presumably something has decided that your bid is valid, but what happens if not everything at eBay agrees?

Could you be the winning bidder, but at your losing bid price?

eBay have probably solved the problem, but I promise you a lot of people haven't.

[1] This is one reason why XP's reliance on testing to detect errors introduced into the code is not necessarily the panacea that it's made out to be.
Katie Lucas
Thursday, October 07, 2004
If you lock everything you'll just end up with a bunch of deadlocks rather than race conditions. At least the deadlocks will be easier to find and debug.
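To make the deadlock risk concrete: the classic case is two threads that each need the same two locks but grab them in opposite orders. One common remedy, which is a standard technique and not something proposed in this thread, is to always acquire locks in one fixed global order. A minimal Python sketch, with all names hypothetical:

```python
import threading


def in_fixed_order(*locks):
    """Return the locks sorted into one global order (here: by id())."""
    return sorted(locks, key=id)


def use_both(name, first, second, done):
    # Whatever order the caller names the locks in, acquire them in the
    # same global order, so two threads can never each hold one lock
    # while waiting for the other.
    low, high = in_fixed_order(first, second)
    with low:
        with high:
            done.append(name)


def run():
    lock_a = threading.Lock()
    lock_b = threading.Lock()
    done = []
    # The two threads ask for the locks in opposite orders on purpose.
    t1 = threading.Thread(target=use_both, args=("t1", lock_a, lock_b, done))
    t2 = threading.Thread(target=use_both, args=("t2", lock_b, lock_a, done))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return sorted(done)
```

Without the sorting step, t1 could hold lock_a while t2 holds lock_b, and each would wait on the other forever.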

As for testing, race conditions do show up during testing; it just depends how thorough the tests are.

    Flava Flav!
Flava Flav
Thursday, October 07, 2004
Here at Phase 2 Solutions our whole application is a race condition! That’s right: rather than take the time to learn how to use synchronization, we simply toss in some thread sleeps when there seems to be a problem. So if we need to wait for another app to start up... thread sleep. If it looks like there’s a race condition... make one of the threads sleep.

Yes, no programming problem is so great that it can’t be solved by adding some sleeps to the code.
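For contrast, here is a rough Python sketch of what waiting on an explicit signal looks like instead of sleeping and hoping. It assumes both sides are threads in one process (the post talks about a separate app, which would need an inter-process mechanism instead), and every name in it is made up for illustration.

```python
import threading
import time


def start_service(ready, log):
    time.sleep(0.05)              # stands in for slow startup work
    log.append("service started")
    ready.set()                   # announce that startup has finished


def client(ready, log):
    # Instead of sleeping for a guessed interval, block until the service
    # says it is up (with a timeout so we cannot hang forever).
    if ready.wait(timeout=5):
        log.append("client connected")
    else:
        log.append("client gave up")


def run():
    ready = threading.Event()
    log = []
    threads = [threading.Thread(target=client, args=(ready, log)),
               threading.Thread(target=start_service, args=(ready, log))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

The client proceeds as soon as the service is ready, no sooner and no later, regardless of how fast the hardware is.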
Dennis Kozura
Thursday, October 07, 2004
Ha!  I've also seen the "fix it with a thread sleep" solution to any and all synchronization problems.  This was a server application that was deployed and maintained primarily by one developer for many years, during which the hardware got orders of magnitude faster.  It did seem odd that the application never performed any faster than "just barely tolerable."

It wasn't until that developer left and another picked up maintenance that we discovered the thread.sleep() calls all over everything.  Ten minutes with a profiler indicated that, sure enough, the app was slow because it spent most of its time waiting, by design.

I submit that this is the alternative to Katie Lucas's "lock everything" approach for those who don't have the masochistic personality to truly solve the problem! :)
Ian Olsen
Thursday, October 07, 2004
Draw UML sequence diagrams which illustrate each lock and each thread which accesses the lock. It's the only way to make sure your design is correct.
hoser
Thursday, October 07, 2004

I am not sure if the above answers are clear enough for someone coming from a non-specialist background, so I will try to throw in my 2 cents.

What are Race Conditions? These are problems that relate (typically) to multi-threaded programming. In a single-threaded computer program, each line of code can be seen to execute sequentially, in the same way as the code would be read by fellow programmers. However, in a multi-threaded program, the code can effectively spawn off different paths (threads) of execution, with each path doing completely independent tasks. Your programmer can no longer read the code sequentially. He must now consider the fact that two or more sequences of execution are running on the computer at the same time.

So how is multi-threading related to race conditions? Race conditions occur when shared data must be processed. Consider writing a program that analyses a file line-by-line, which your boss says must be fast and provide user feedback. To make it fast, you realise that you can spawn multiple threads, each analysing a different line. To provide user feedback, you decide to show a counter on the screen giving the total number of lines processed. Each thread manages the counter by incrementing a shared variable when a line is successfully processed and then printing the value to the screen. Now imagine that ThreadA reads the value of the counter and increments it, but is delayed before it can print the value "2" to the screen. In the meantime, ThreadB, C and D have managed to increment the counter to "5", so when ThreadA regains control and prints its value to the screen, the value is out of date: "2" instead of "5".
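A small Python sketch of the line counter, fixed so that the increment and the "print" happen as one step. The LineCounter class, the shown list (standing in for the screen) and the worker layout are all hypothetical, chosen only to illustrate the scenario described here.

```python
import threading


class LineCounter:
    def __init__(self):
        self._count = 0
        self._lock = threading.Lock()
        self.shown = []   # stands in for the screen

    def line_done(self):
        # Increment and display under one lock, so no other thread can
        # bump the counter between the two steps.
        with self._lock:
            self._count += 1
            self.shown.append(self._count)


def process_lines(lines, workers=4):
    counter = LineCounter()

    def worker(chunk):
        for _line in chunk:
            # analyse the line here
            counter.line_done()

    # Deal the lines out to the workers round-robin.
    chunks = [lines[i::workers] for i in range(workers)]
    threads = [threading.Thread(target=worker, args=(chunk,))
               for chunk in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.shown
```

Because each thread holds the lock from increment through display, the values shown only ever go up by one at a time. If the display step were moved outside the lock, a delayed thread could show a stale "2" after others had already shown "5".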

What I provided was the simplest of Race Condition examples, where one thread used an incorrect value because another thread was quicker. Real life problems are rarely this trivial. Which leads to the next topic of Thread Synchronization....

Simon Hayden
Thursday, October 07, 2004
Thanks everybody. I see it's kind of like a 'Dirty Read' from a database due to asynchronous reads and writes.  Somehow I thought it might be something that caused the code to speed up to oblivion and stop the CPU in its tracks.  Again, thanks for the clarification.
Greg Kellerman
Thursday, October 07, 2004
Over on IBM developerWorks, David Wheeler has written an article about race conditions.
Andreas Sikkema
Sunday, October 10, 2004

This topic is archived. No further replies will be accepted.
