As noted in the MOTD last night, we had a brief cooling failure in our Santa Rosa datacenter. This turned out fine. We had staff on site, and we have learned some things that will prevent this particular failure in the future.
For those interested in the technical cause of the failure: during the multiple power transitions from utility to generator and back, the variable frequency drives (VFDs) on the four redundant air handlers sensed an over-voltage condition and shut down to protect themselves. To address this, the drives have now been reconfigured; if they shut down again, they will wait eighty seconds for power to stabilize and then restart automatically.
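For the curious, the delayed restart behavior can be sketched in a few lines of Python. This is only an illustration of the idea, not the drives' actual firmware or configuration: the `RestartTimer` class and its `trip` and `poll` methods are hypothetical names, and the sketch assumes that a fresh shutdown during the wait starts the eighty seconds over.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RestartTimer:
    """Hypothetical model of one drive's auto-restart delay (not vendor firmware)."""

    delay_s: float = 80.0               # the eighty-second wait described above
    tripped_at: Optional[float] = None  # time (seconds) of the last protective shutdown

    def trip(self, now: float) -> None:
        """Record a protective shutdown, e.g. on a sensed over-voltage."""
        # Assumption: a new trip during the wait restarts the countdown.
        self.tripped_at = now

    def poll(self, now: float) -> bool:
        """Return True exactly once, when the delay has elapsed and the drive restarts."""
        if self.tripped_at is None:
            return False  # running normally, nothing to do
        if now - self.tripped_at < self.delay_s:
            return False  # still waiting for power to settle
        self.tripped_at = None  # restart and clear the fault
        return True
```

With several power transitions in a row, each `trip` call pushes the restart out again, so under these assumptions the drive only comes back once eighty seconds have passed since the last shutdown.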
The interesting thing, though, was that this presented an opportunity to see what really happens in a large datacenter without AC for a brief period of time. Total cooling downtime was 15 to 30 minutes, and during that time, the temperature rose 15 degrees. The room is typically kept at 69 degrees Fahrenheit, so this pushed the ambient room temperature to about 85.
Meanwhile, in-cabinet temperatures for heavily loaded cabinets nearly touched 100 degrees F. That’s just ten to twenty degrees below the point where we expect equipment to begin failing, so this was a close call for us.
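To put "close call" in rough numbers, here is a back-of-the-envelope sketch in Python. It rests on two loud assumptions that the figures above do not establish: that the temperature rises linearly, and that in-cabinet temperatures climb at the same rate as the room. The function name `minutes_of_headroom` is made up for this illustration, and the results are estimates, not measurements.

```python
def minutes_of_headroom(rise_f: float, downtime_min: float, margin_f: float) -> float:
    """Minutes until margin_f degrees are used up, assuming the same linear rate of rise."""
    if rise_f <= 0 or downtime_min <= 0:
        raise ValueError("rise and downtime must be positive")
    rate_f_per_min = rise_f / downtime_min  # observed average rate of rise
    return margin_f / rate_f_per_min


if __name__ == "__main__":
    # 15 degrees of rise, for downtimes of 15 and 30 minutes,
    # against the ten- and twenty-degree margins mentioned above.
    for downtime in (15, 30):
        for margin in (10, 20):
            minutes = minutes_of_headroom(15, downtime, margin)
            print(f"downtime {downtime} min, margin {margin} F: about {minutes:.0f} more minutes")
```

Under those assumptions the remaining headroom works out to somewhere between 10 and 40 more minutes without cooling, depending on which downtime and which margin you take.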
Datacenters are challenging environments to design. You need fully physically redundant Internet connections, plus fire suppression, physical and electronic security, power backup and redundant cooling. We’re very pleased with the efficiency of our new AC system and its VFDs, and this incident made clear how critical that cooling is.
- Row E hot
- Row E normal
- Equipment temperature graph