Thomas (boggyb) wrote,
Thomas
boggyb

  • Music:

Race conditions

So, today I managed to find a really obscure race condition that can only happen under rare circumstances (which occur maybe once every few years for a typical customer), is extremely unlikely to be spotted by a customer if it does happen, and vanishes within a few days of happening leaving no trace. Unlike most obscure bugs, this one is actually tricky to fix.

For those who have no idea what I'm talking about, a race condition is when you have two bits of code that both want to do the same thing. The classic example is a piece of code that wants to add 1 to a number. Let's say the code looks something like this:

Read from memory location 367 into register A
Add 1 to register A
Write register A into memory location 367

Simple enough? Now let's say you've got two of these bits of code, and they're running at the same time. Now, on a classic computer that's not quite true - what happens is your computer is continually switching between little bits of each program you've got running. Each program gets its own set of registers, so our threads each have their own register A that nothing else can touch. Anyway, the two threads could run like this (I'll put one thread in italics):

Read from memory location 367 into register A
Add 1 to register A
Write register A into memory location 367
Read from memory location 367 into register A
Add 1 to register A
Write register A into memory location 367


This works with no problems, and A gets increased by two. Or they could run like this:

Read from memory location 367 into register A
Add 1 to register A
Read from memory location 367 into register A
Add 1 to register A

Write register A into memory location 367
Write register A into memory location 367

And location 367 only gets increased by one, which usually horribly confuses some piece of code anywhere from seconds to weeks later. That's a classic example of a concurrency problem. A race condition is usually a bit more than this - what would happen if one piece of code wanted to add 1, while the other wanted to subtract 1? You could get this:

Read from memory location 367 into register A
Add 1 to register A
Read from memory location 367 into register A
Subtract 1 from register A

Write register A into memory location 367
Write register A into memory location 367

Or you could get this:

Read from memory location 367 into register A
Add 1 to register A
Read from memory location 367 into register A
Subtract 1 from register A
Write register A into memory location 367

Write register A into memory location 367

Both of these get it wrong, and cause problems. Programs tend to want to do this a lot, to keep track of how many things are using a particular widget so the code knows when it can get rid of the widget. In the first case, your widget counter is too small, so the program will get rid of the widget while something else is still using it. In the second case, the widget counter is too large, and so the widget will never be deleted because it always looks like something's using it. This is typical race condition - it's a race to see who can get their bit of code to finish first.

The bug I found was more complicated than this, but the concept is the same. At least it wasn't caused by my changes :)
Subscribe
  • Post a new comment

    Error

    default userpic
    When you submit the form an invisible reCAPTCHA check will be performed.
    You must follow the Privacy Policy and Google Terms of use.
  • 2 comments