Monday, June 02, 2008

"Catastrophic Power Outage in Houston" Means SteakBurger Website Down

I noticed on Saturday that we weren't getting any SteakBurger emails, and then Sunday night I got a call from my brother informing me that the website was down. I tried but could not even get to the hosting company's website. I tracked down the admin's Gmail address and sent him an email asking for a status update. I got this reply last night at 2 AM.


Hello Andrew,

There was a major power outage in one of the data centers where we maintain many of our servers, in this case the H1 data center in Houston. Although we own our servers and have our own data center staff, we lease access in the data center to supply our access point to the Internet backbones (Level 3, Savis, etc.) and to provide us with power for our server racks.

The data center has N+1 redundancy and does have UPS battery backup and backup generator capability, but a completely unusual occurrence happened: the main step-down power transformer shorted, resulting in an explosion and small fire in the electrical interface room. Normally when power goes out, the network switching switches over to the UPS, which is a short-term 2-hour battery system, while the backup generators go online and are cut into the master power bus.

The problem was that the explosion destroyed the power panels where the generators connect to the main power buses. This outage affected not only Lowesthosting's servers but 40 other major hosts and over 9000 servers, so it was a major unexpected outage.

The good news is that the new transformer has finally been acquired and installed, the switches, routers, and HVAC have been restored, and power is now being restored in the data center. We are told it will be a systematic power-up, with various switches and routers brought back online and tested before power is sequenced to our server racks, so it should not be more than another 2-3 hours from now (12:10 AM PST) before your server comes back online.

Once power is back, your server and all data will come online as before. Anyone's mail server that has attempted to send you emails during the outage window will automatically attempt redelivery, so you may get a few 18-hour-old emails, but none will be lost.

Our main Lowesthosting.com website does have an emergency backup server in our other data center, but it takes 24 hours for that to propagate, so our main customer portal has been down. This has frustrated our staff and many clients, because we have been unable to post news or reply to tickets during the propagation to our emergency server.

We apologize for this window where you were unable to reach us. Although we have never had a catastrophic outage in 8 years and pride ourselves on an average 99.84% client server uptime, we will be installing a new high-availability cluster on our main website, so that should we ever have a major outage, our website, news, and ticketing system would not go down for even a minute.

We realize the problem is not so much that the server is down as that our main site was unreachable and many people didn't print out our phone number.

You should know that we not only back up all of your data in our data center but also maintain an emergency archival copy off network, so we really do maintain the integrity of your data.

Again, we apologize for any inconvenience. It's been a long, difficult Sunday for our staff as well, and we will be posting a full story and pictures on our website shortly.

Regards,

Ray Alin
Assistant Manager
Lowesthosting.com

I did verify for myself that there was such a catastrophe. Here is the news story from www.allheadlinenews.com:


Texas-Based Server Provider Explosion Affects 9,000 Servers, 7,500 Customers

June 1, 2008 12:50 p.m. EST

Amy Beeman - AHN
Houston, TX (AHN) -- An explosion Saturday evening at a Texas-based, privately held server hosting provider has caused server outages affecting 9,000 servers and 7,500 customers.

According to The Planet's website, at about 5 p.m. Saturday electrical gear shorted, creating an explosion and fire that knocked down three walls surrounding their electrical equipment room.

No injuries were reported and no servers were damaged or lost.

The Houston company, which provides servers for small and medium-sized companies, said it has its entire support team working around the clock to get the servers back online.

They estimate they will be up and running by Sunday afternoon.