One of the servers in an active/standby pair that is part of our hosted mail service locked up; the cause is not yet known. The standby did not take over automatically, so we cut power to the active server to force the standby to promote itself to active. The standby has now taken over successfully and services are fully restored. Master ticket ID 2125.
Timeline
[2010-06-09 13:06:04] First soft alert of POP3/IMAP connections refused.
[2010-06-09 13:08:04] Hard outage alert for NOC response.
[2010-06-09 13:13:36] Powered off “active” server in the troubled pair after no login response (SSH and console) and failure of the standby to self-promote.
[2010-06-09 13:14:10] Successful promotion of standby to active. POP3/IMAP connections accepted.
Total outage time for hosted mailboxes on this pair, measured from the first soft alert to the successful promotion of the standby, was approximately 8 minutes, 6 seconds.
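For anyone who wants to check the figure, the outage duration can be recomputed from the timeline. This is a minimal sketch: it assumes the outage runs from the first soft alert to the successful promotion, and the variable names are illustrative rather than taken from any of our tooling.

```python
from datetime import datetime

# Timestamps copied from the timeline entries above.
FMT = "%Y-%m-%d %H:%M:%S"
first_alert = datetime.strptime("2010-06-09 13:06:04", FMT)
service_restored = datetime.strptime("2010-06-09 13:14:10", FMT)

# Assumption: outage = first soft alert to successful standby promotion.
outage = service_restored - first_alert
minutes, seconds = divmod(int(outage.total_seconds()), 60)
print(f"Outage: {minutes} minutes, {seconds} seconds")
```

Measuring from the hard outage alert at 13:08:04 instead would give a shorter figure, so the choice of start point matters when comparing incidents.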