Categories
Announcements Status

Outage Notice Nov. 22, 2009

There was a power outage today at 11:57 UTC-8 caused by this:

[Photo: DCP_2642]

We’re actually about a mile away from it, but as you can see it hit the end of the line, popped the top middle phase, and tripped the whole line we happen to be on. Everything was fine until 14:33 UTC-8, when our new Eaton 9355 installation decided to cut power to the protected load. It raised simultaneous “Input AC Over Voltage” and “Utility Fail” alarms, followed by “DC Link Over Voltage”. We’re guessing the generator momentarily went outside of tolerance. We’re not clear why that would cause the UPS to shut off (isn’t riding through that the whole point of a UPS?), but it refused to turn back on and kept reporting “DC Link Over Voltage”. Because we were offline at this point, we activated the external bypass on the paralleling cabinet to tie the generator directly to the server room distribution and get things back online ASAP. We called the Eaton support hotline, and the tech advised us to reset the logic by cycling all the breakers open and then closed. The unit still failed to start, citing the “DC Link Over Voltage” alarm. We do have a 24-hour service contract for the unit, so someone is on the way to take a look at it.

[Nov. 23 00:49] Eaton took a look at the unit and was unable to duplicate the problem. The one difference from the failure conditions is that the unit was supplied by utility power rather than by the generator, because we do not wish to experiment without a dummy load and risk another outage. At this time the bypasses have been removed, the generator has been stopped, and the UPS is back online supplying power to the protected load. We will also have the generator looked at, as we suspect the most likely cause of the trouble is either the voltage regulator losing lock or a problem with the electronic governor. The units we have are transformerless, and the “DC Link Over Voltage” error can be caused by the neutral reference point going out of bounds.

[Nov. 23 13:00] Our diesel mechanic will be replacing all of the injectors on our generator. One of them had a blown port due to water.

[Nov. 24 10:28] According to the news, the driver of the van dropped something. (Source: KOLO) West Huffaker Lane is 5 lanes wide, four travel and one center turn lane.



Categories
Status

Move (mostly) Complete

We successfully moved the entire facility over the weekend with a few minor hiccups with hosted mail service that didn’t show up during testing. In any case, everything is back to normal as of Monday morning. There’s still a lot of detail work to do and everything else (like office furniture) still needs to make its way over to the new place, but at this point there is nothing left to move that could impact our normal operations.

There is, however, one negative aspect to report: because Verizon is still unable to provide service to us, we are running single-homed on our Sprint service. This should not be a problem as long as Sprint doesn’t have any outages, but it is not something we’re comfortable with. We’ve tried to make it clear to Verizon that whatever their policy is, it doesn’t change the fact that we need the service and that someone will provide it.

Categories
Announcements Status

Roller Network Facility Move Notice

The time for us to move our facility has come. We originally planned to move the last week of August in order to open in October, but if you’ve been following our story with Verizon you already know why that didn’t work out. The Verizon circuit was to be used to keep disruption to a minimum by tunneling between the two sites. However, they are still not following through, and we can’t wait forever.

As such, we will be performing a hot-cut of our Sprint service on Friday, October 16, 2009 at 14:00 US/Pacific time (UTC-8). All traffic will be rerouted over our lower capacity SAVVIS circuit beginning at 13:30 UTC-8. We will begin announcing our address space via Sprint at the new location once at least 50% of the equipment has been moved and stabilized. We have timed the hot-cut to correspond with the beginning of the weekend, when our average utilization falls within what the SAVVIS circuit can handle, although the slowdown may still be noticeable. Unfortunately this is unavoidable due to the Verizon trouble.

THERE WILL NOT BE A COMPLETE OUTAGE. Groups of equipment are staged so that their backup/secondary remains online while the primary is moved. Once the primary is moved the secondary will follow. You may notice some brief hiccups as traffic is rerouted between locations, but you should not expect any substantial outages. Non-critical services (such as our company website and the account control center) do not have redundant counterparts and will be offline for 15 to 30 minutes while the servers are moved. Critical services (mail services, hosted mail, DNS services) will not be interrupted.

Because the move will affect our network in a substantial way we have moved our status page to an offsite location served by third-party DNS. You can follow the move progress online at:

www.rollernetstatus.com

This new status page will be permanently replacing the old status page at status.rollernet.us. If you have bookmarked the old status page, please update your bookmarks.

Our goal is to ensure that the move is as painless and transparent as possible. If you have any questions or concerns please feel free to contact us. Further updates will be posted on the newspipe (www.rollernet.us/wordpress) and status page (rollernetstatus.com) as we progress.

Categories
Status

Slashdotted!

We were slashdotted today after our woeful story of Verizon made the front page. A basic WordPress install is very hard on server performance, and since our public-facing pages are otherwise all static, our web server wasn’t quite up to the task. But after some tuning and static caching, we’re in better shape.

Technology: Verizon Refuses To Provide Complete IPv6

Slashdot is welcome here; it’s a rare honor to be taken down by a front page link. 😉

Categories
Status

Redundant Switch Fault

On Saturday evening (September 3, 2009) we experienced a short general outage caused by our DNS and SQL cache pool servers rebooting. Normally this wouldn’t be a problem, but the PowerDNS Recursor package didn’t start at boot time, and a lack of DNS meant a general lack of anything useful taking place.
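
A post-boot check that the resolver actually answers queries would catch this kind of failure early. The following is only a minimal sketch, not the monitoring we actually run: the function names `build_query` and `resolver_answers` are hypothetical, the server is assumed to be reachable at an IPv4 address on UDP port 53, and `example.com` is just a placeholder name to look up.

```python
import socket
import struct


def build_query(hostname, txid=0x1234):
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: transaction ID, flags (recursion desired), one question, no other records.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A record), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def resolver_answers(server, hostname="example.com", timeout=2.0):
    """Return True if `server` replies to a DNS query within `timeout` seconds."""
    query = build_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:
            # Covers timeouts and unreachable or refused destinations.
            return False
    # A valid reply has a full header, echoes our transaction ID, and sets the QR bit.
    return len(reply) >= 12 and reply[:2] == query[:2] and bool(reply[2] & 0x80)
```

Run from cron or a boot-time script, a `False` result from `resolver_answers` could trigger an alert or a service restart. It only confirms that something answered the query; it does not validate the contents of the answer.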

It was odd that some equipment rebooted itself while other equipment didn’t, so after a lot of thinking we decided that one of the APC 30A redundant switches we use had probably faulted, because this isn’t the first time we’ve seen this. We pulled the switch from service (causing another reboot, although if we were right we didn’t want it putting the system at risk any longer) and opened the cover. Inside we found relay contacts with pitting and arcing:

[Photos: DCP_2588, DCP_2590, DCP_2591]

The other relays we opened up weren’t nearly as bad, but they still exhibited discoloration and pitting on the contacts. The one in the pictures was loaded between 10 and 15 amps and it’s supposed to be rated for 30 (or 24 derated). Because this is the second failure we’ve had with this device we’ve decided to remove the remaining ones from service as they are likely to suffer the same fate in the future.
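
To put those numbers in perspective, here is a rough sketch of the arithmetic. It assumes "derated" means the common 80% continuous-load rule, which is consistent with the 30 A and 24 A figures above; the function names are illustrative only.

```python
RATED_AMPS = 30.0
DERATING_FACTOR = 0.8  # assumption: 80% continuous-load derating (30 A -> 24 A)


def derated_capacity(rated_amps, factor=DERATING_FACTOR):
    """Continuous-load capacity after applying the derating factor."""
    return rated_amps * factor


def utilization(load_amps, rated_amps, factor=DERATING_FACTOR):
    """Fraction of the derated capacity used by a given load."""
    return load_amps / derated_capacity(rated_amps, factor)


if __name__ == "__main__":
    # The failed switch carried between 10 and 15 amps.
    for load in (10.0, 15.0):
        share = utilization(load, RATED_AMPS)
        print(f"{load:.0f} A load: {share:.1%} of derated capacity")
```

Even at the top of that range the switch was carrying under two thirds of its derated capacity, so overload does not explain the contact damage.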

We apologize for the recent bumps in the normally smooth operation you’ve come to expect from us. We understand that your mail and DNS service is important to you, and to us, since we use the same services for our mail. As such, a discount/credit will be forthcoming.