May 22, 2015 – 9:45 am
One of the three UPSes that handle load in our Santa Rosa datacenter failed early this morning and tripped into bypass. Unfortunately, the internal failure is significant and involves at least the primary IGBTs. We are exploring our repair options, but the most likely outcome is that we will accelerate the planned decommissioning of this UPS and the migration of its associated PDU to one of our other two UPSes. We had planned to complete this at some point in the next six to twelve months but have not yet scheduled or scripted it. It is a relatively straightforward procedure but must be executed with great care to ensure both the safety of our workers and that live load in the datacenter is not dropped. Updates will be posted as needed.
Current status: Our standby generator is running so that the ATS can transfer load without interruption if our primary PG&E power feed drops.
Update: Friday 14:00, we have electricians on site placing the cable to move the PDU from the failed UPS to one of our other UPSes. We plan to complete the migration as soon as the cable is staged and ready to go. Once the cable is placed, the new target UPS will be put into maintenance bypass. This allows us to transition the PDU from the old bypassed UPS to the new UPS without dropping its load. Once the cable is terminated and the breaker on the target UPS is closed, the old breaker can be opened, completing the transition. At this point, the target UPS will be restarted.
Update: Friday 15:05, we’re beginning the bypass procedure now.
Update: Friday 15:15, unfortunately, load on the PDU was dropped momentarily, but we are continuing to complete the migration. Power was lost to several of our single-PSU systems, but most affected services have already been restored. More information forthcoming.
-Kelsey and Russ