Beginning on October 24th at 02:36:56 PDT during a scheduled maintenance window, our circuit to Charter (Spectrum) AS20115 entered a loss of service state and was not restored until 57 hours later. Roller Network is disappointed with the delayed response from Charter (Spectrum) to an issue that was created by their own maintenance activities, and with the failure of the maintenance group to ensure that circuits they removed from service are restored following such activities.
- At 10:02 Oct. 24 we called to notify Charter (Spectrum) that our circuit never recovered following the maintenance in which Charter (Spectrum) intentionally placed the circuit into a loss of service state. We were informed that Charter (Spectrum) would not individually troubleshoot our issue because there was a possibly related outage, and they referenced ticket 50233491.
- At 14:45 Oct. 24 we called again asking for an ETA on handling our outage. No ETA was given; however, we insisted that a ticket be opened (ticket 50234369) and linked to ticket 50233491. We were assured that someone would follow up with us (this did not happen).
- The following day at 09:26 on Oct. 25 we called to inquire 1) why our circuit was still down and 2) why nobody had followed up with us. We were informed that no follow-up was made because their notes said the circuit was restored the previous night around 9pm. However, it was not actually restored, and we suggested that, since we had an open ticket, someone should have contacted us to ask whether it was indeed restored.
- At 12:16 Oct. 25 we called again to inquire about the status. We were informed that, according to the notes, nobody had looked at it since our last call at 09:26. No reason was given. At this point the circuit had been down for 33 hours.
- At 12:48 Oct. 25, a full 34 hours after first loss of service, we finally received a callback asking us to verify site power (which of course we had) before they could send a tech out.
- At 13:34 Oct. 25 we received a call from a tech indicating they were en route.
- At around 15:10 Oct. 25 the tech concluded that our outage persisted because our circuit had been migrated to a new core router and all related configuration was discarded in the migration, specifically all of our BGP configuration for both IPv4 and IPv6. Without BGP the circuit is useless.
- At 15:21 Oct. 25 we placed our BGP neighbors into “shutdown” state. If Charter (Spectrum) maintenance, or whoever is responsible for such work, deleted our configurations, they would have to recreate them, and since the circuit could no longer be trusted we would require an audit of their new configuration before restoring BGP in a controlled manner.
- At 21:20 Oct. 25 we stopped actively requesting updates while waiting for Charter (Spectrum) to pass the request to whatever group handles new configurations, since Charter (Spectrum) maintenance had failed to include migration of existing configurations in their process. However, Charter (Spectrum) decided overnight to un-migrate our circuit back to its original router and original configuration rather than attempt to recreate our configurations on the new router, the move to which had caused the loss of service condition.
- At 11:51 on Oct. 26 we were finally able to obtain written confirmation from Charter (Spectrum) that our circuit had been un-migrated and that the original configurations, which we had last audited with Charter (Spectrum) on August 3rd, were back in place, and we returned our BGP neighbors to active state. The total outage duration was 2 days, 9 hours (57 hours) from first loss of service to final confirmation that the circuit was restored to its pre-migration condition.
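The elapsed times quoted in the timeline above can be checked with a few lines of arithmetic. The following is only a sketch: the year is a placeholder (the timeline does not state it, and the result does not depend on it), the helper names `at` and `elapsed_hours` are made up for illustration, and the final confirmation is taken as 11:51 on Oct. 26, which is the only date consistent with the stated 57-hour total. All times are treated as PDT with no daylight saving change in between.

```python
from datetime import datetime

# Placeholder year: not stated in the timeline; elapsed times are the same for any year.
YEAR = 2000


def at(day, hour, minute, second=0):
    """Build a naive local (PDT) timestamp for a day in October."""
    return datetime(YEAR, 10, day, hour, minute, second)


def elapsed_hours(start, end):
    """Hours between two timestamps, as a float."""
    return (end - start).total_seconds() / 3600


loss_of_service = at(24, 2, 36, 56)

# Checkpoints taken from the timeline entries.
events = {
    "status call": at(25, 12, 16),
    "callback to verify site power": at(25, 12, 48),
    "written confirmation": at(26, 11, 51),  # assumed Oct. 26, see lead-in
}

for label, when in events.items():
    hours = elapsed_hours(loss_of_service, when)
    print(f"{label}: {hours:.1f} hours after loss of service")
```

Truncated to whole hours, the three checkpoints come out to 33, 34, and 57 hours, matching the figures given in the timeline.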
Service was ultimately restored after 57 hours; however, internally Charter (Spectrum) does not recognize this since it overlapped two maintenance windows. Since the circuit was physically restored to a location that it was intentionally moved away from, we fully expect Charter (Spectrum) to make a second attempt at a maintenance window for another migration. Whether Charter (Spectrum) will be able to perform this task correctly remains to be seen.
Roller Network disagrees with Charter (Spectrum)’s position that “maintenance” is not responsible for failing to return a circuit to service, and we further assert that maintenance which takes a circuit down is still an outage, whether it is planned or, as in this case, clearly poorly planned. The sole difference is contractual: what refunds may be owed, and whether the outage could be considered a default of contract. Our circuit went into loss of service state directly due to “maintenance” and was not returned to service, thus “maintenance” is the root cause. From a customer service perspective the ethical course of action when maintenance fails to complete within its designated window would be to cancel any future maintenance and revert all changes performed, rare or not (it was argued that doing so is unnecessary because maintenance failing to successfully complete a task is a “rare” occurrence). Roller Network does not believe it is a customer’s responsibility to make sure “maintenance” performs their job(s) correctly.
Editorial Note: This incident highlights why working with a small business like Roller Network is better than working with a large company. At no time did our account manager (who was CC’d on all correspondence) offer to step in to help or escalate our case, nor did they follow up to see if our issue was being handled properly. Charter (Spectrum)’s maintenance group, the group one would expect to know exactly what they did to break our circuit, disregarded our issue as a problem for another group, since it ceases to be their problem past 6AM even if they fail to restore it to working condition by that time. At Roller Network, we do not pass blame between departments, and we always strive to make sure our customers are in working order; it’s literally our job. Our business with Charter (Spectrum) was treated as unimportant and ultimately irrelevant.