Mass service outage - CityFibre ELITE leased line and broadband services - 14/10/20
Postmortem

CityFibre have now provided us with a reason for outage.

Cause

Human error. A CityFibre network engineer failed to follow standard process when commissioning a new service, and caused a misconfiguration that resulted in widespread disruption to all layer two provisioned CityFibre services nationwide.

This affected all Giganet CityFibre services, including our ELITE (leased lines) as well as FTTP broadband services.

Rectification

The misconfiguration was rolled back by CityFibre network engineers as soon as they realised that human error was the cause. The delay of around 30 minutes in rectifying the situation was due to their investigation into where the problem originated.

Key Learnings

CityFibre will be conducting a full investigation to ensure this cannot happen again.

Outage duration

12:27 - 13:04 on Wednesday 14th October 2020
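
For anyone wanting to check the figures, the durations can be worked out from the times given in the timeline on this page (12:27 start, 12:40 fault raised to CityFibre's NOC, 13:04 restoration). The short Python sketch below is illustrative only; the helper names `at` and `minutes_between` are our own and are not part of any Giganet or CityFibre tooling.

```python
from datetime import datetime

# Date of the incident, taken from this post.
INCIDENT_DAY = "2020-10-14"


def at(hhmm: str) -> datetime:
    """Hypothetical helper: turn an HH:MM time on the incident day into a datetime."""
    return datetime.strptime(f"{INCIDENT_DAY} {hhmm}", "%Y-%m-%d %H:%M")


def minutes_between(start: datetime, end: datetime) -> int:
    """Hypothetical helper: whole minutes elapsed between two timestamps."""
    return int((end - start).total_seconds() // 60)


# Times as reported in the updates on this page (all BST).
outage_start = at("12:27")
raised_to_noc = at("12:40")
service_restored = at("13:04")

outage_minutes = minutes_between(outage_start, service_restored)
minutes_to_escalation = minutes_between(outage_start, raised_to_noc)
minutes_escalation_to_restore = minutes_between(raised_to_noc, service_restored)

print(f"Total outage: {outage_minutes} minutes")
print(f"Start to NOC escalation: {minutes_to_escalation} minutes")
print(f"NOC escalation to restoration: {minutes_escalation_to_restore} minutes")
```

This gives a total outage of 37 minutes, of which 24 minutes fell after the fault had been raised to CityFibre's NOC.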

Giganet’s feedback

Human error happens. Even with the automation systems that many of us operate, there’s often a human touch somewhere (even if it’s in designing the automation systems). It was disappointing that this incident occurred and that it took as long as it did to restore full service.

We’ll be following up with CityFibre to ensure that they have improved monitoring, so that any future misconfigurations are rolled back sooner.

Our monitoring systems highlighted the problem quickly, and after a few minutes of our own internal troubleshooting, we escalated and raised the fault to CityFibre. We also raised the Status Page incident very soon after the incident began (within 5 minutes), as can be seen from the timeline of events.

We will also be going into more detail with CityFibre on their configuration routines, to understand how ‘routine’ their configuration change on the 14th was and, if it was routine, why it caused all services to go offline. Naturally, ‘business as usual’ provisioning tasks are usually extremely low-risk and only impact a single circuit at a time.

Customers who had a managed automated failover service add-on from Giganet would have been unaffected during this incident (aside from a BGP failover delay of up to 180 seconds).

We apologise for the disruption that this outage caused.

Posted Oct 19, 2020 - 11:22 BST

Resolved
No further updates have been detected and CityFibre have confirmed that their side is stable.

Their initial feedback is that a config error was to blame.

We shall await the full reason for outage (RFO) and will endeavour to update this incident post once this is received.

We apologise for any inconvenience this outage caused.
Posted Oct 14, 2020 - 14:44 BST
Update
CityFibre's NOC declared an incident affecting their CityFibre on-net circuits as of 13:09.

We await further updates.

At this time, however, all our customer CityFibre circuits have been restored as of 13:04.

We continue to advise caution until CityFibre's NOC has given the all-clear.

Therefore, all CityFibre circuits will continue to be 'at-risk'.

Next update from 14:00, or earlier if there is a major issue to report.
Posted Oct 14, 2020 - 13:17 BST
Monitoring
We have just seen all CityFibre circuits restore, as of 13:04.

We have not received any further updates from CityFibre's NOC as to the status of this incident, so please consider the circuits still 'at-risk'.

Further updates shall be provided as we learn more.
Posted Oct 14, 2020 - 13:06 BST
Update
We continue to see all CityFibre on-net circuits down across Ethernet and FTTP broadband connections.

We have raised this to CityFibre's NOC as of 12:40.

We are awaiting an update from their NOC as to the status of this incident.

Our NNIs (the interconnects between CityFibre's network and ours) are up and running, and there are no other known issues on our network.

We will continue to update this status page as we learn more.

We're sorry for the outage experienced.
Posted Oct 14, 2020 - 12:56 BST
Investigating
We are currently aware of a mass service outage (MSO) affecting all of our CityFibre ELITE (leased line) and UltraBEAM FTTP broadband services.

Start Time: 12:27

Carrier Affected: CityFibre

Areas Affected (if known): UK

Service Impact: Total loss of service.

Further updates will be provided as we learn more.

We apologise for any inconvenience this outage causes.
Posted Oct 14, 2020 - 12:33 BST
This incident affected: M12 Giganet - Internet Services (Carrier - ELITE/IGNITE (Leased line)).