Salisbury - primary backhaul circuit failover - degraded performance
Incident Report for Giganet Status Page
Resolved
Traffic engineering has now been removed.
This incident is now being closed.
Further updates and the RFO (reason for outage) will be provided once the carrier has made this information available.

We apologise once again for the duration of this incident.
Posted Nov 22, 2020 - 08:17 GMT
Monitoring
A damaged fibre splice in the Epsom joint was repaired overnight, and our primary circuit came back online at 02:27.

We observed all traffic automatically switch back to the primary path.

We shall now remove the traffic engineering that we applied to a subset of customer circuits to mitigate the worst of the congestion. A confirmation will be sent on this ticket once this is complete.

Full redundancy has now been restored.

We apologise for the length of this outage/degraded performance. We shall be following up with the carrier to understand why this took so long to repair, as this is considerably outside their SLA. A full RFO will be published in due course.
Posted Nov 22, 2020 - 07:37 GMT
Update
Our carrier identified a fibre break in Epsom earlier; however, they couldn’t extend the disruption because the circuit provides access to ‘blue light’ (emergency) services.

The investigation and repair will therefore recommence from 00:01 tonight.
Posted Nov 21, 2020 - 22:24 GMT
Update
Our carrier’s scheduled investigation works overnight did not take place because they were experiencing a separate major service outage (MSO), which resulted in a lack of engineering resource.
Why a carrier of their size does not have sufficient resource to deal with both is unknown at this stage, and this will be followed up with them after this incident.

The carrier have now rescheduled their investigation works for 12:00 on the Horsham<>Bromley fibre span where they are seeing lower light levels.

We’ve been advised that this outage is affecting over a dozen other network operators in addition to Giganet. However, because some services are still operating over this span, they are unable to interrupt it outside pre-defined maintenance windows, so 12:00 is the next opportunity.

Yesterday evening we experienced some slight added congestion on our backup backhaul link in Salisbury as we entered the peak period, but traffic has since remained low enough that most customers will not experience any adverse effects.

We are incredibly frustrated that this is taking so long to be resolved, and have escalated the issue to the carrier on multiple occasions.

Further updates will be provided after 12:00, when we hope that their investigations will provide some feedback.
Posted Nov 21, 2020 - 09:19 GMT
Update
Following further escalations to the carrier, they have confirmed that the amplification change they made yesterday didn't resolve the problem for all affected customers. It resolved it for a few, but not all.

They are now planning an emergency change control for 21/11/20 00:01 to authorise further investigations on their fibre link between Bromley and Hersham, where the loss of optical power resides. This work will interrupt the fibre cable so they can run further light level/OTDR checks on the fibre and locate the source of the problem.

We are awaiting further information on whether this change control has been authorised. However, they have informed us that they have the engineering resource on standby.

Because they do not yet know where the fault lies or what has caused it, there is currently no estimated time for resolution.

The fault within their optical network is affecting multiple customers.

At this time, our Salisbury exchange presence is still operating from our diverse backup backhaul circuit.
We observe occasional congestion to Salisbury which, when it occurs, causes reduced performance, packet loss and higher latency.
The scale of this is very minimal at the moment, so most customers will not experience any issues.

We do however apologise for any inconvenience this is causing.

The Salisbury network remains 'at-risk' due to the loss of our primary backhaul circuit.
Posted Nov 20, 2020 - 12:47 GMT
Update
The carrier confirmed overnight that they had increased amplification levels and believed this had resolved the problem. However, we are still observing the circuit as down.
We have fed this information back to them and it is being investigated further.
At this time we’re continuing to rely on the backup backhaul circuit and all performance metrics are nominal.
Traffic engineering is still in place for a subset of customers. However, the vast majority should experience no adverse effects.
Our Salisbury network remains ‘at risk’ due to the loss of our primary backhaul circuit.
Posted Nov 20, 2020 - 08:23 GMT
Update
We have escalated this issue once more with the carrier as they have yet to provide any indication of restoration.

The latest is that an engineer is on-site at an optical site (in/near Hersham) where the low light levels are being reported.
At this time they are unsure whether this is a fibre break/damage, or a problem with the optical amplifier.
This is also affecting multiple circuits/customers and has been declared a major service outage.

At this time, our backup backhaul circuit is operational and all customers are online. The earlier traffic management has limited the degradation of performance, so most customers should experience an acceptable level of service given this incident.

The traffic management and monitoring shall remain in place overnight.

The next update will be at 08:00 on 20/11, if not before.
Posted Nov 19, 2020 - 23:20 GMT
Update
We are still awaiting the repair on the primary backhaul circuit that provides connectivity to the Salisbury exchange.
The latest update from the carrier was at 15:35 when they indicated ‘[low light on their fibre]’.

Since the last update, we have regrettably implemented some traffic engineering on a minority of customer connections so that the majority of customers in Salisbury remain as unaffected as possible during this incident. When this was implemented, at approximately 17:09, some Salisbury customer connections would have dropped briefly (for a few seconds) before automatically re-establishing.
While this incident is ongoing, some customers may notice a decrease in available bandwidth (speed) as a result of the incident and the traffic engineering.
These mitigation steps will ensure that packet loss and overall performance for the majority of customers remain as close to nominal as possible.

We regret this traffic engineering, but this was done with the best intentions for the majority of customers.
Once the incident is resolved, the traffic engineering will be removed, causing a brief outage before normal service is resumed.

Further updates will be provided as we learn more from our primary backhaul carrier.
Posted Nov 19, 2020 - 17:31 GMT
Identified
The carrier for our primary backhaul circuit has confirmed that there are low light levels over the circuit, which they speculate is due to a potential fibre disturbance.

They have dispatched field engineers to their hub site to locate the fault.

We are starting to observe some slight degradation creep in as all traffic is routed via the backup backhaul circuit.

If the primary circuit is not restored as we enter the peak period, we may have to introduce traffic engineering to ensure the majority of customers have an acceptable service. Currently, no traffic engineering is in place, as per our normal policies.

We apologise for any inconvenience this incident causes.

Further updates will be provided as we learn more from the carrier.
Posted Nov 19, 2020 - 15:53 GMT
Update
Our backhaul provider are currently investigating the problem.
They have provided no estimated time for resolution.

However, our backup backhaul circuit continues to function well. Given current customer utilisation out of Salisbury, customers should experience no degradation of service right now, although this may change as traffic levels increase.
All services are considered ‘at-risk’ due to the outage.

Further updates will be provided as we learn more.
Posted Nov 19, 2020 - 13:53 GMT
Investigating
We are currently aware of an incident affecting our primary backhaul circuit in the Salisbury exchange.

All traffic has automatically rerouted over a backup link.

As such, there may be degraded performance (slow transfers, higher latency or packet loss) until the primary circuit has been repaired.
The backhaul carrier has been notified and we are awaiting a further update from them.

Start Time: 11:41

Affecting: all broadband and Ethernet circuits terminating in the Salisbury exchange.

Service Impact: degraded performance for affected customers.

Further updates will be provided as we learn more.

We apologise for any inconvenience this outage causes.
Posted Nov 19, 2020 - 12:22 GMT
This incident affected: Giganet - Broadband and Internet (Carrier - SuperBOLT/SuperBEAM/UltraBEAM/UltraBOLT/Legacy (Broadband)) and Giganet - Data Centres & Points of Presence (Giganet Local - Salisbury).