Mail Cluster Service Interruption

June 22, 2009 – 10:22 am by Kelsey

One of the four heads in our redundant NFS filer clusters, which handle all of our email storage, crashed this morning due to FCAL bus instabilities after a disk failure. Its partner attempted to take over its operation but was unable to because of the specific nature of the failure. This is one of the few edge cases where the redundancy built into the system can't help, as the only way to reestablish service is to power-cycle all of the disk shelves to completely reset the FCAL buses. No email or data was lost, and the systems otherwise performed as expected. During the 15 minutes the filer was down, approximately one quarter of our users would have been unable to check their email. At this time all services have been restored.

Update Mon Jun 22 10:41:48 PDT: IMAP users whose message stores were on the affected filer may have continued to be unable to check their mail until a few minutes ago due to clock skew between the filer and the servers.
