'Twas brillig, and the slithy toves did gyre and gimble in the wabe

[Thursday 28th November 2013 at 11:12 pm]

[Playing |The Heist ~ Hans Zimmer/Mission: Impossible 2]

Since I'm sure talismancer will post even more snark if I just leave it at the two previous posts for today, have a not-even-slightly-epic snippet of computer fail.

There's an unexpected downside to reducing the number of servers you need by consolidating several under-used ones into a bunch of virtual machines on one real server: when (not if) the server fails, you lose all the virtual machines.

In this case it was a disk failure that took out the server. Now, disk failures had been planned for by creating a RAID-5 array, which uses MATHS to ensure that you can lose a disk in the array without losing any data.
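For the curious, the MATHS in question is mostly just XOR. Here's a minimal sketch of the idea in Python, assuming a toy array of three data "disks" plus one parity block. The `xor_blocks` helper and the sample data are made up for illustration, and a real controller stripes data and rotates parity across all the disks rather than keeping it in one place.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data blocks, one per disk, plus their parity block.
data = [b"disk", b"fail", b"oops"]
parity = xor_blocks(data)

# Disk 1 dies: XOR the surviving blocks with the parity to get it back.
rebuilt = xor_blocks([data[0], data[2], parity])

# Disks 1 and 2 both die: all that can be recovered is their XOR,
# which is not enough to tell what either one contained.
tangled = xor_blocks([data[0], parity])
```

With one disk missing, `rebuilt` comes out identical to the lost block. With two missing, `tangled` is only the XOR of the two lost blocks, which is why a second failure is fatal.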

So Murphy ensured that on the server in question two disks failed a few hours apart, at which point the RAID controller threw its hands up in the air and gave up. IT then had to spend several hours rebuilding the various VMs from backups.

I wonder how common double disk failures actually are? I do recall reading something about this a few years ago - someone looked at the specified uncorrectable error rate (how likely the drive is to be unable to read a sector). They worked out that with high-capacity hard disks (on the order of 1TB) there's a scarily high chance of a RAID-5 rebuild failing due to an uncorrectable error on one of the remaining disks, since the rebuild has to read every sector of every surviving disk.
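A back-of-the-envelope version of that calculation, as a rough sketch only: I'm assuming an uncorrectable error rate of 1 in 10^14 bits read (a figure commonly quoted on consumer drive datasheets, not one taken from this server), and that errors are independent of each other, which real disks don't promise. The function name and parameters are my own invention.

```python
import math

def rebuild_failure_probability(surviving_disks, disk_bytes, ure_per_bit=1e-14):
    """Chance of at least one unreadable bit while reading every surviving disk.

    Assumes each bit fails independently with probability ure_per_bit,
    which is a simplification of how real drives behave.
    """
    bits_read = surviving_disks * disk_bytes * 8
    # 1 - (1 - p)^n, written with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))

# A four-disk RAID-5 array of 1TB disks that has just lost one disk:
# the rebuild has to read the three survivors end to end.
p = rebuild_failure_probability(surviving_disks=3, disk_bytes=1e12)
print(f"Chance of the rebuild hitting an unreadable sector: {p:.0%}")
```

Under those assumptions the answer comes out at roughly one in five, and it only gets worse with bigger disks or more of them in the array.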

[User Picture]From: jecook
Friday 29th November 2013 at 2:29 pm (UTC)
Yep. We had that happen to our backup server's storage array one fine morning: we had a disk failure, and as the array was rebuilding onto the hot spare, a second drive failed.

This is the most common failing of RAID 5, by the way. It's why most disk array vendors recommend and/or default to RAID 6, aka RAID-DP, aka 'RAID 5 with an extra parity disk to withstand a double failure'.