Month: October 2008

Transport Link Upgrade

Tonight at 12:01 AM we will be upgrading the transport link that connects our San Francisco POP to our San Jose POP. We will be moving to a new physical transport link, which will allow us to upgrade the capacity of this link in the future.

All customer traffic will be routed away from this link prior to the operation, so there should be no customer impact. The operation is expected to take 30 minutes to complete.

-Jared and Nathan

Update: The transport link upgrade has been completed. Because a tech in San Francisco accidentally moved the wrong piece of fiber, there was a 5-10 minute interruption of traffic from one of our core routers. We apologize for this interruption. All network links have been restored to service and appear to be functioning normally at this time.

Momentary ATM Outage

While troubleshooting our new ATM OC-12 circuit in San Francisco, an AT&T tech inadvertently removed the wrong fibers from a multiplexer card in the SNFCCA17 central office. This caused two of our other ATM OC-12 circuits to go off-line. These links serve DSL, Business-T and Frame Relay customers. The AT&T tech quickly realized his error and fixed the problem. Total customer downtime was approximately 75 seconds, lasting from 1:29:05pm to 1:30:20pm. We’re currently in touch with management at AT&T to ensure that this does not happen again. -Nathan and Jared

Customer SQL v4 system maintenance

Tonight I will be performing maintenance on custsql.sonic.net, our MySQL 4.x host. Just after 12:01AM I will be performing a necessary reboot of the system. Estimated customer impact is 10 minutes at most. This does not affect MySQL v5 databases. -Don

Update: Maintenance completed. Customer impact was just under three minutes.

Squirrelmail config database maintenance

This morning at 12:01AM I performed some needed maintenance on our Squirrelmail configuration database. Some customers may have been unable to update their settings in Squirrelmail during this time. Customer impact was approximately 3 minutes. During this time, I also upgraded our Squirrelmail installation to the latest stable production version.  -Don

Sebastopol DSL Outage

A hardware failure at the Sebastopol Central Office has caused DSL customers to lose connectivity. We are working with AT&T to resolve the issue, and hope to have an estimated time of repair shortly.

-Adam, John and Steve

Update: As of 5:00PM, service appears to be restored to all affected customers.

DSL Aggregation Router Reboot

The DSL aggregation router that serves the Chico area rebooted itself approximately 20 minutes ago, causing about 5 minutes of downtime for all DSL customers in that area. Traffic levels and customer connectivity currently look normal. We will continue to monitor the router and investigate the cause of the spontaneous reboot.

We apologize for any inconvenience this outage may have caused.

-Jared

Trouble with the Sonic.net Website.

Fri Oct 10 10:05:08 PDT 2008 — Trouble with the Sonic.net Website. We are currently experiencing trouble with our corp.sonic.net Web Server that serves part of the Sonic.net Home Page and all of our Blogs. We are working on the problem and will have it back soon. -Augie, William, and Kelsey.

Update: We have restored service and determined that the problem was a hardware failure. We apologize for the inconvenience this may have caused. -Don and William

Mail Storage Maintenance.

Tonight at midnight we will be performing maintenance on the NetApp Filers that store Customer E-Mail. No downtime is expected, and the work should take less than an hour to complete.

During this period we will be adding additional capacity to these Filers; this capacity will allow for future growth of Customer E-Mail storage and improved performance for all of Sonic.net’s E-Mail Customers.

–Augie, Don, William, Sal, and Kelsey.

Update: maintenance has been completed; no problems were encountered (other than starting later than scheduled); capacity was nearly doubled on our E-Mail Storage System.  –Augie

Webmail IMAP performance problems solved.

Separately from our earlier post about slow imap.sonic.net performance (http://corp.sonic.net/status/2008/09/26/imap-performance-problems-solved/), we have also received reports of slow Webmail IMAP performance and timeouts from Customers using the Webmail clients at http://webmail.sonic.net.

We believe we have isolated the problem, which was a bug in our IMAP Proxy software, and have not received any reports of new problems since the beginning of the week when we implemented a fix for the problem software.

If you see timeouts when using http://webmail.sonic.net, please contact Technical Support (support@sonic.net or 1.707.547.3400) immediately, and provide the error message you receive and the time at which the problem occurred.

Webmail Web Site Time-Warp.

A misguided attempt to update some software on our Webmail Cluster inadvertently took the software, associated web pages, and server configuration back to January of this year.

As a result Customers would have seen inconsistent or broken behavior while trying to access the website from around 2:30am to 8:00am, at which point the data was restored from backups.

We apologize for any inconvenience this caused to our Customers; we will be reviewing our documented procedures so that this type of mistake does not occur in the future.

–Augie