|We are currently investigating a network interruption in the cloud datacentre that is causing some servers to be down - more info will follow as available.|
|Updated by Sam Pizzey on 28th Apr 2021 @ 17:29pm|
|The root cause of the issue has been identified and we are currently restoring connectivity to the affected servers.|
|Updated by Sam Pizzey on 28th Apr 2021 @ 17:34pm|
|Network connectivity is being restored to the affected servers in batches. We are working through these as quickly as possible and expect all batches to be complete shortly.|
|Updated by Sam Pizzey on 28th Apr 2021 @ 19:26pm|
| This process is now complete and all servers have connectivity again. We apologise for the inconvenience caused by this outage. We are continuing to monitor the network closely and will investigate further to analyse the root cause and determine what we can do to prevent similar incidents in future.
If you continue to experience issues with your server, they are likely unrelated to this incident. Please open a support ticket as normal so that we can look into it further for you.
|Updated by Sam Pizzey on 28th Apr 2021 @ 20:26pm|
|Our sys-admin team are compiling information from the investigation and will release an RFO as soon as possible.|
|Updated by James Scott on 29th Apr 2021 @ 09:33am|
| RFO: Cloud v2 Network - 28/04/2021
| Outage announcement:
28/04/2021 17:05 - 28/04/2021 21:00
| Initial Investigation:
Our network monitoring tools flagged alerts for multiple servers in the Cloud Network with networking issues. Our on-shift engineer investigated, diagnosed the issue at 17:34 and contacted senior sys-admins, who were deployed to the datacentre.
| Root Cause:
The third-party cloud virtualisation software, running on our old cloud platform, attempted to regain control of all the IPs which were previously assigned to it. This caused a null-routing scenario by creating false entries in the switches' ARP tables, sending traffic to an old (non-existent) location rather than the live cloud platform.
| Actions Taken:
All servers' primary and additional IP configurations were refreshed in the switches so that traffic was routed to the correct location and connectivity was restored.
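For readers who want a picture of the failure and the fix, the following is a minimal sketch only, not our actual switch configuration. It models an ARP table as a simple lookup from IP address to location; the names `LIVE`, `OLD`, `deliver` and `refresh`, and the example addresses, are hypothetical and chosen purely for illustration.

```python
# Toy model of a switch ARP table: IP address -> location traffic is sent to.
# All names and addresses here are hypothetical illustrations.

LIVE = "live-cloud-platform"
OLD = "old-cloud-platform"  # retired location that no longer hosts servers


def deliver(arp_table, ip):
    """Return where traffic for `ip` ends up, or None if it is lost."""
    location = arp_table.get(ip)
    # Traffic sent anywhere other than the live platform reaches nothing.
    return location if location == LIVE else None


def refresh(arp_table, live_ips):
    """Rewrite the entry for every live IP so it points at the live platform."""
    for ip in live_ips:
        arp_table[ip] = LIVE
    return arp_table


live_ips = ["192.0.2.10", "192.0.2.11"]
arp_table = {ip: LIVE for ip in live_ips}

# The old software reclaims its former IPs, overwriting the correct entries.
for ip in live_ips:
    arp_table[ip] = OLD
outage = [deliver(arp_table, ip) for ip in live_ips]

# Refreshing the entries restores delivery to the live platform.
refresh(arp_table, live_ips)
restored = [deliver(arp_table, ip) for ip in live_ips]
```

In this model, every lookup fails while the false entries are in place, and delivery resumes as soon as the entries are refreshed, which mirrors the batch-by-batch restoration described in the updates above.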
| Further Work (COMPLETED):
Emergency work was carried out to remove the old virtualisation software completely. Existing customers were migrated within the last week, and the software and hardware were retired ahead of schedule. Our teams will continue to monitor the network, and our networking team has confirmed that all routing within the network is now working as expected.
|Updated by James Scott on 4th May 2021 @ 11:44am|