The lights are on, but no one is home…

Server rooms are often reminiscent of a carnival. Lots of flashing lights in an assortment of pretty colors. Oh, and lots of scruffy guys with greying beards mumbling under their breath while they shuffle about looking important, but I digress…

While this can be quite the sight, not all of these flashing lights bring about joy and happiness. Take this one for example:

If the IT guys (or gals) aren't running to fix this, fire them!

That pretty-looking orange light is an indication that the drive has failed. I often take it for granted that everyone would realize that a single orange light among a sea of green might just indicate a problem, but we at FYIG run into this all too often. One drive fails, then another, then another. The IT guys involved either do not realize that this means badness, do not care, or (most recently) attempt to ignore the issue altogether. In one recent incident, pointing out the drive failure resulted in a frenzied attempt to change the subject and pretend that nothing had happened.

Why is this bad, though? This is why I have RAID, right? Not exactly. While RAID does protect you from immediate badness in the case of a failed hard drive, it is intended only as a temporary solution. There are a number of reasons for this, but the most pervasive is the simple fact that hard drives fail in groups. The vast majority of servers (and even many high-end storage arrays) utilize drives that were manufactured at nearly the same time (in some cases, we have even observed sequential serial numbers). Any small defects or variations in the drives’ construction are therefore likely to affect all of the drives in a particular device at nearly the same time. In practice, this means that a dead drive is the surest indicator that another drive is going to fail in short order.

Couple this with the proliferation of RAID 5 in recent years, and the situation becomes even worse. Not only does a RAID 5 array suffer a severe performance hit while operating with a failed drive, but the surviving drives also operate under considerably higher load, further increasing the odds of a subsequent failure.
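To put some rough numbers on that, here is a back-of-the-envelope sketch in Python. It is not a real reliability model. It assumes each surviving drive fails at a constant rate derived from an annual failure rate, and it lumps the "same batch" and "higher load" effects into a single `stress_multiplier` fudge factor. The function name, the 8-drive array, the 3% annual failure rate, and the multiplier of 5 are all made-up illustrative values, not measurements.

```python
import math

HOURS_PER_YEAR = 24 * 365


def second_failure_probability(remaining_drives, annual_failure_rate,
                               exposure_hours, stress_multiplier=1.0):
    """Chance that at least one surviving drive fails while the array is degraded.

    Assumes a constant (exponential) failure rate per drive. The
    stress_multiplier is a crude, hypothetical stand-in for batch
    correlation and the extra load on a degraded array.
    """
    if not 0.0 <= annual_failure_rate < 1.0:
        raise ValueError("annual_failure_rate must be in [0, 1)")
    # Convert the annual failure rate into an hourly hazard rate.
    hourly_rate = -math.log(1.0 - annual_failure_rate) / HOURS_PER_YEAR
    # Any one of the surviving drives failing is enough to hurt us.
    combined_rate = remaining_drives * hourly_rate * stress_multiplier
    return 1.0 - math.exp(-combined_rate * exposure_hours)


if __name__ == "__main__":
    # Hypothetical 8-drive array with one dead drive, so 7 survivors.
    for days in (1, 7, 30, 90):
        p = second_failure_probability(
            remaining_drives=7,
            annual_failure_rate=0.03,   # illustrative guess
            exposure_hours=24 * days,
            stress_multiplier=5.0,      # illustrative guess
        )
        print(f"degraded for {days:>2} days: {p:.1%} chance of a second failure")
```

Whatever inputs you plug in, the shape of the result is the same: the longer the orange light stays on, the worse the odds get.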

Side note: Don’t use RAID 5. Details in another post.

Ultimately, hard drives are cheap. Really cheap. Even if you’re paying a SAN vendor’s drive tax, they are *still* really cheap, compared to the cost of a rebuild from backup.

So when you see the flashy orange drive light, please, shell out the $300 and replace the darn thing.
