11 April 2008

Server Crashes : Life Sux

One of our key webservers in the United States suffered a hard disk failure on Wednesday. Our hosting provider did a horrendous job of communicating anything to us (which leaves us in a difficult position when it comes to our own clients).

After 24 hours, we pulled the plug on the old provider and decided to go out, spend a much bigger chunk of money with a new provider and purchase a new high-powered dedicated server in order to get things running again.

That's where we are now at 2:30 p.m. on a Friday...trying to get things running. Some of our biggest clients are offline and life is not fun. We're providing recurring updates to clients every five hours or so, even if there's nothing particularly new to report. I've learned that providing any information, even if it's not necessarily good information, is better than not communicating anything at all. Some might disagree and argue that we're calling attention to our ongoing inability to fix the issue. I believe they're wrong, and anyone who works in this industry knows how long it can take to resolve certain problems.

Why am I taking the time to write about this in the midst of such a disaster? Well...I'm not one of our developers and it's not my job to actually get these servers back online. It's my job to communicate what we're doing about the issue.

As frustrating as this is for us, and most certainly for our clients who are impacted, I still tend to view these times as the best test of our company's ability to respond to and resolve problems. The failed hard disk was not our fault, of course, but that has little bearing on our clients' interest in being back online. Our responsibility is to stay focused on the task at hand.

I hate these times so much, yet we'll walk away from the issue with more knowledge than we had on Tuesday, before this rollercoaster of horror began. That is one positive in a sea of worry.
