We are still waiting for our network rack to regain power while Equinix and their contractors migrate power supplies onto the new infrastructure following the earlier fault.
Sadly there is still no estimated fix time, which is most frustrating. Equinix have assured us that they will provide this information as soon as they can, and we are continually chasing them for updates.
As you can appreciate, this is a P1 issue affecting many hundreds of other carriers/ISPs, so it has been given the maximum priority.
Summary:
1. We lost both A and B power feeds to one of our two Equinix LD8 racks at approximately 4.23am. According to reports from Equinix, this follows a UPS failure which then triggered the fire alarm in the data centre. The affected rack houses our core Juniper MX router and Cisco LNS. The Juniper MX router is our core device and is needed for everything in LD8 to function, including terminating a number of leased line connections and providing connectivity to our vDC platform. All of our equipment power supplies are dual-fed with 'diverse' A+B power feeds provided by the data centre; however, this incident suggests a lack of resiliency on the data centre side, which we will be raising once service is restored, as a power outage of this gravity is clearly unacceptable.
2. Customers with diverse/resilient leased line circuits should have remained operational throughout this incident, as their traffic will have re-routed via our alternate THN data centre. If your circuit is still down and you have managed resiliency, please let us know so we can investigate.
Customers with single-fed leased lines that terminate in LD8 will be offline at this time. We are aware that many of our carriers in LD8 are also offline, so even if our rack were powered, our NNI interconnects would still be down.
3. All 'broadband' customers (Openreach/CityFibre/Glide ADSL2+/FTTC/G.Fast/FTTP) should have remained online throughout, as our THN data centre is our primary broadband termination location, with LD8 being backup.
4. Customers with single-homed DBX private cloud phone systems hosted in LD8 will be down; however, we have been arranging network diverts to ensure inbound calls still reach customers. DBX private cloud systems are split roughly 50/50 between LD8 and THN, so around half are unaffected. The vDC infrastructure powering our DBX platform is powered on, as it sits in the rack unaffected by the outage, but we cannot communicate with it as it is networked via the other rack.
5. Customers with managed PWANs all have N+1 resiliency throughout their design, so the HA firewalls and backup connectivity re-routed automatically to THN where applicable. The PWAN core, internet breakout and traffic routing to sites are fully operational, albeit potentially at reduced bandwidth for sites whose backup circuits have less bandwidth than their primaries.
6. Our secondary core application services, such as DNS03, NTP03, SMTP02 and RADIUS02, are currently down as they are hosted in LD8. The primary instances are in THN and operational, so all customers should be unaffected (see the short illustration after this summary). The vDC infrastructure powering our core applications is powered on, as it sits in the rack unaffected by the outage, but we cannot communicate with it as it is networked via the other rack.
7. Some M12/Giganet hosted services, such as our Giganet availability checker, are currently down as they are hosted in LD8. The vDC infrastructure powering the availability checker is powered on, as it sits in the rack unaffected by the outage, but we cannot communicate with it as it is networked via the other rack.
8. As reported in the previous update, we have seen two brief interruptions in service affecting broadband and leased line circuits routing via THN. We suspect these were caused by our carriers'/suppliers' network equipment powering back up and re-learning routes in LD8, and the knock-on effects on traffic passing over those links and devices. Customers should therefore expect that there may still be brief outages or packet loss as services are restored.
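To illustrate point 6: here is a minimal sketch, assuming a client configured with both of our resolvers, of how a DNS lookup falls back to the operational THN primary when the LD8 secondary does not respond. The IP addresses and hostname are placeholders rather than our real resolver details, and it uses the third-party dnspython library, not any of our own tooling.

    import dns.resolver  # third-party dnspython (2.x) library

    PRIMARY_DNS = "192.0.2.1"    # placeholder for the THN primary (operational)
    SECONDARY_DNS = "192.0.2.2"  # placeholder for the LD8 secondary (currently down)

    def lookup(hostname):
        # Try each nameserver in turn; a dead resolver simply times out and we move on.
        for server in (SECONDARY_DNS, PRIMARY_DNS):
            resolver = dns.resolver.Resolver(configure=False)
            resolver.nameservers = [server]
            resolver.timeout = resolver.lifetime = 2
            try:
                answer = resolver.resolve(hostname, "A")
                return server, [rr.address for rr in answer]
            except Exception:
                continue  # no answer from this resolver, so fall back to the next
        raise RuntimeError("no resolver answered")

    print(lookup("example.com"))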
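On point 8, here is a rough sketch of the kind of check customers could run to see whether they are still experiencing brief blips while routes re-converge; the endpoint and port below are placeholders, so substitute a service that normally routes via us.

    import socket
    import time

    HOST, PORT = "service.example.com", 443   # placeholder endpoint
    INTERVAL, ATTEMPTS = 5, 60                # one check every 5 seconds, ~5 minutes

    failures = 0
    for _ in range(ATTEMPTS):
        try:
            # A short TCP connection attempt; success means the path is up right now.
            with socket.create_connection((HOST, PORT), timeout=2):
                pass
        except OSError:
            failures += 1                     # a timeout or refusal counts as a blip
        time.sleep(INTERVAL)

    print(f"{failures}/{ATTEMPTS} checks failed ({failures / ATTEMPTS:.0%})")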
We are absolutely focused on restoring services as soon as we can; however, we are at the mercy of Equinix and their contractors.
You can find out more about Equinix here:
https://www.equinix.co.uk/

We are sorry for the continued disruption. We are doing all we can to apply pressure for a swift resolution, and of course there will be a lot of analysis afterwards.
We will continue to post regular updates as we learn more.