Mail Service Interruption.

September 19, 2008 – 1:20 am by Augie Schwer

Around midnight one of our clustered NetApp filers suffered a critical failure which caused one of its Ethernet interfaces to lock up. Some customers may have noticed timeouts or other errors while trying to check their mail. The total downtime for the service was around 15 minutes. We will be investigating the problem further with our vendor in the waking hours. –Augie and Don.

Intermittent mail delivery problem from Customer Web Cluster.

September 17, 2008 – 12:35 pm by Augie Schwer

Today we fixed a problem on the Customer Web Cluster that affected mail sent by customer scripts using the local Sendmail program. The problem began on Sunday and would have appeared to customer scripts as an intermittent failure, since only two of the servers in our Web Cluster displayed the problem.

We will be adding more detailed monitoring so that we can resolve this kind of problem much faster in the future.

–Augie and William.

VPN Concentrator Certificate Expiry

September 15, 2008 – 4:58 pm by jared

Today the identity certificate for our VPN concentrator expired. This would have prevented new VPN sessions from being established. We have renewed the certificate, and the VPN concentrator is now accepting connections normally.

-Jared and Nathan
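
For readers who run their own services, an expiry like this can usually be caught ahead of time with a simple check. The following is only a minimal sketch, not a description of our monitoring: the function names, the 30-day warning window, and the sample date are all hypothetical, and it assumes the certificate's "notAfter" value is available as a string in the usual OpenSSL text format.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Return the number of days between `now` and the certificate's expiry.

    `not_after` is the certificate's notAfter field in OpenSSL text form,
    e.g. "Sep 15 00:00:00 2008 GMT" (a made-up sample value).
    """
    # ssl.cert_time_to_seconds converts that text form to seconds since the epoch.
    expires = ssl.cert_time_to_seconds(not_after)
    return (expires - now.timestamp()) / 86400.0


def needs_renewal(not_after: str, now: datetime, warn_days: int = 30) -> bool:
    """True when the certificate expires within `warn_days` (hypothetical threshold)."""
    return days_until_expiry(not_after, now) <= warn_days


if __name__ == "__main__":
    sample_not_after = "Sep 15 00:00:00 2008 GMT"  # hypothetical value
    today = datetime(2008, 8, 31, tzinfo=timezone.utc)
    if needs_renewal(sample_not_after, today):
        print("certificate is due for renewal")
```

Run on a schedule, a check along these lines gives a warning some weeks before expiry rather than on the day itself.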

Upstream Routing Issue

September 12, 2008 – 9:56 am by tdo

Starting this morning, we observed a transient problem with one of our upstream providers. The impact would have been an inability to connect to certain websites. We have taken action to correct this issue. -Nathan, Tim and the NOC.

SpamAssassin Outage

September 11, 2008 – 9:59 pm by williamt

SpamAssassin may have stopped filtering mail for some users this afternoon. No mail was lost during this period. Service has now been restored and mail is being filtered for all users again. We apologize for any inconvenience this may have caused. -William

Los Angeles DHCP Failure

September 11, 2008 – 8:36 am by williamt

This morning one of the DHCP servers in our Los Angeles PoP suffered a disk failure that prevented it from storing the leases that customers obtained. This should not have affected customers' connectivity to the Internet. To resolve the issue, we have swapped over to spare hardware that we had on site. A small minority of customers may have experienced a brief hiccup during the transition; most customers should not have noticed it at all. -William and Jared

Sonic’s New Member Tools are now Member Tools!

September 10, 2008 – 6:33 pm by Dianne

Today we finished the process of switching our Member Tools system from its old home on http://sonic.sonic.net to https://members.sonic.net/. A few tools still remain on the old server, but https://members.sonic.net/ is now the URL from which you will access all tools.

Moving to our new server will allow us to add new tools more easily and scale out our systems as needed. We’ve also tried to make our tools more reliable and easier to use as part of the move.

If you’ve never tried our tools before, please do check out https://members.sonic.net/. You may be surprised to find how easily they enable you to customize your account to your specific needs.

–Dianne

Mail Cluster Maintenance

September 10, 2008 – 5:31 pm by Kelsey

Tonight, shortly after midnight, we will replace the failing disk shelf in one of our Network Appliance filers responsible for the POP, IMAP and Webmail service interruptions earlier today. Replacing the shelf is not expected to take more than 30 minutes. During the maintenance, users may not be able to check their email. However, new mail will be queued for delivery on our MX cluster and all outbound email will continue to flow unaffected. -Kelsey

Update - The faulty shelf has been replaced and all services have been fully restored. Total downtime for POP, IMAP and Webmail was less than 20 minutes. -Kelsey

Multiple Service Interruption

September 10, 2008 – 12:56 pm by Don Forbes

Early this afternoon, we experienced a failure in one of our NetApp filers which caused some content to be unavailable and a few key systems to return timeout errors. Downtime is estimated to have been between 5 and 7 minutes. We have identified the problem and restored all services. –William, Augie, and Don

LATA1 AT&T ATM Outage

September 4, 2008 – 5:19 pm by jared

At 5:03 PM today, all of our DSL and Business-T subscribers in LATA1 went offline. The problem appeared to be internal to AT&T’s ATM network. As of 5:12 PM, the problem in AT&T’s network appears to have been resolved, and DSL and Business-T customers are back online. We are continuing to monitor the situation and work with AT&T to find out what happened in their network. More details will be forthcoming as we get them.

-Jared, and everyone in the NOC and Support

Update: Word from AT&T and ASI (AT&T’s ATM division) is that they suffered a major fiber issue in one of their hub Central Offices in San Francisco. Traffic has been routed around the problem, and they are currently working on resolving the fiber issue. There is a small chance that traffic may be interrupted as AT&T works on the issue, but they will be doing everything they can to prevent that. We will be monitoring our ATM links very closely for the next 24 hours to mitigate any potential problems.

Update: AT&T has determined that the outage was caused by a failing ATM fabric card. Service was restored when traffic was moved to the spare fabric card. The failed card has been replaced, with no further service interruption.
