On July 25th, Google Cloud launched a new region with all sorts of fanfare about how the new facility – australia-southeast2 in Melbourne – would accelerate the nation's digital transformation and make the world a better place in myriad ways.
And on August 24th, the region went down quite hard. Late in the afternoon, local time, users of the region lost the ability to create new VMs in Google Compute Engine. Load balancers became unavailable, as did cloud storage. In all, 13 services experienced issues.
Things improved an hour or so later, with some services resuming – but the number of services impacted blew out to 17.
That list grew by one by the time all services were restored, and Google’s final analysis of the incident named 23 impacted services.
That analysis stated that while the underlying issue lasted 40 minutes, some services remained degraded for a couple of hours afterwards.
Google says the core of the incident was a failure of “Public IP traffic connectivity” and its preliminary assessment of the cause was “transient voltage at the feeder to the network equipment, causing the equipment to reboot.”
“Transient voltage” refers to a very brief but very large spike in voltage, sometimes caused by events such as lightning strikes.
Data centres are built to survive them … or at least they’re supposed to be. Yet within a month of opening its virtual doors, australia-southeast2 succumbed to one.
Google hasn’t said if the networking equipment that rebooted belonged to it, or a supplier. Either way, it’s another lesson that clouds are far from infallible. ®