Sincere apologies for the disruption to service on the morning of 15/08/2014. The disruption occurred because of a loss of internet connectivity for Netsight, the hosting provider who operate the datacentre where your services are hosted. The internet connectivity for Netsight is provided by Level 3, a large and sophisticated internet services provider who market their services based on reliability.
A loss of internet connectivity of this kind is an unusual, significant and unacceptable problem. All of Netsight's customers were affected, suffering business continuity issues. It is likely that other Level 3 customers were also affected. Netsight will obviously be asking Level 3 for a full explanation of the outage, and taking whatever steps they judge necessary.
Cause of the issue
Our understanding is that the following occurred:
- Around 10.30 AM UK time on 15th August 2014, a Level 3 networking switch lost power, causing the failure of internet connectivity for all customers connected using that switch.
- Level 3 attempted to activate the backup switch which is on standby to minimise the effect of such failures.
- Level 3 engineers were unable to access the secure location where the switches are located, possibly due to lost or misplaced keys.
- Level 3 were unable to provide an ETA for restoring service during the outage.
- Level 3 engineers gained access to the secure location and restored service around 1.05 PM on 15th August.
Failures in network equipment such as switches do inevitably take place. However, this should usually cause minimal disruption, as backup systems take over. Level 3 clearly experienced problems activating their backup systems, causing unacceptable disruption.
Our next steps
We will be carrying out our own retrospective into this downtime to identify steps we can take to reduce the likelihood of this happening in future. Like you, we want to make sure this doesn’t happen again, and we will keep you updated with any outcomes or changes we make following this.