?

Log in

No account? Create an account
Race conditions - 'Twas brillig, and the slithy toves did gyre and gimble in the wabe [entries|archive|friends|userinfo]
Thomas

[ website | Beware the Jabberwock... ]
[ deviantArt | the-boggyb ]
[ FanFiction | Torkell ]
[ Tumblr | torkellr ]

Links
[Random links| BBC news | Vulture Central | Slashdot | Dangerous Prototypes | LWN | Raspberry Pi]
[Fellow blogs| a Half Empty Glass | the Broken Cube | The Music Jungle | Please remove your feet | A letter from home]
[Other haunts| Un4seen Developments | Jazz 2 Online | EmuTalk.net | Feng's shui]

Race conditions [Monday 24th November 2008 at 11:03 pm]
Thomas

boggyb
[Playing |Fast Track [Fast Track.mo3]]

So, today I managed to find a really obscure race condition that can only happen under rare circumstances (which occur maybe once every few years for a typical customer), is extremely unlikely to be spotted by a customer if it does happen, and vanishes within a few days of happening leaving no trace. Unlike most obscure bugs, this one is actually tricky to fix.

For those who have no idea what I'm talking about, a race condition is when you have two bits of code that both want to do the same thing. The classic example is a piece of code that wants to add 1 to a number. Let's say the code looks something like this:

Read from memory location 367 into register A
Add 1 to register A
Write register A into memory location 367

Simple enough? Now let's say you've got two of these bits of code, and they're running at the same time. Now, on a classic computer that's not quite true - what happens is your computer is continually switching between little bits of each program you've got running. Each program gets its own set of registers, so our threads each have their own register A that nothing else can touch. Anyway, the two threads could run like this (I'll put one thread in italics):

Read from memory location 367 into register A
Add 1 to register A
Write register A into memory location 367
Read from memory location 367 into register A
Add 1 to register A
Write register A into memory location 367


This works with no problems, and A gets increased by two. Or they could run like this:

Read from memory location 367 into register A
Add 1 to register A
Read from memory location 367 into register A
Add 1 to register A

Write register A into memory location 367
Write register A into memory location 367

And location 367 only gets increased by one, which usually horribly confuses some piece of code anywhere from seconds to weeks later. That's a classic example of a concurrency problem. A race condition is usually a bit more than this - what would happen if one piece of code wanted to add 1, while the other wanted to subtract 1? You could get this:

Read from memory location 367 into register A
Add 1 to register A
Read from memory location 367 into register A
Subtract 1 from register A

Write register A into memory location 367
Write register A into memory location 367

Or you could get this:

Read from memory location 367 into register A
Add 1 to register A
Read from memory location 367 into register A
Subtract 1 from register A
Write register A into memory location 367

Write register A into memory location 367

Both of these get it wrong, and cause problems. Programs tend to want to do this a lot, to keep track of how many things are using a particular widget so the code knows when it can get rid of the widget. In the first case, your widget counter is too small, so the program will get rid of the widget while something else is still using it. In the second case, the widget counter is too large, and so the widget will never be deleted because it always looks like something's using it. This is typical race condition - it's a race to see who can get their bit of code to finish first.

The bug I found was more complicated than this, but the concept is the same. At least it wasn't caused by my changes :)
Link | Previous Entry | Share | Next Entry[ 2 pennies | Penny for your thoughts? ]

Comments:
[User Picture]From: olego
Wednesday 26th November 2008 at 1:44 am (UTC)
And here I was, hoping that after describing race conditions, that you would actually post the bug you found, or what made it more obscure than a simple missing lock... :)
(Reply) (Thread)
[User Picture]From: boggyb
Wednesday 26th November 2008 at 6:00 pm (UTC)
It's in commercial code so I can't really post the actual code. In essence, there's a list of versioned items that one thread keeps track of. That thread populates the list with the existing items when it starts. Another thread uses this list to work whether an item exists (and what version it is), or if it needs to create the item. If it creates the item, it always creates the latest version of it (which has a different name to older versions). The list is fully synchronised, but the thread startup wasn't. This meant that you could get two of the same item but with different versions, which horribly confused code in a completely different module.
(Reply) (Parent) (Thread)