Facebook has struggled back online today, though at the time of writing glitches are still very much a part of The Social Network™ experience.
WhatsApp and Facebook became available to users at around 2210 UTC on October 4 after falling off the internet about six hours prior. Instagram and Facebook Messenger should not be far behind.
In the past hour, Facebook tweeted: “To the huge community of people and businesses around the world who depend on us: we’re sorry. We’ve been working hard to restore access to our apps and services and are happy to report they are coming back online now. Thank you for bearing with us.”
CTO Mike Schroepfer earlier said: “Sincere apologies to everyone impacted by outages of Facebook powered services right now. We are experiencing networking issues and teams are working as fast as possible to debug and restore as fast as possible.”
Founder Mark Zuckerberg chipped in: “Facebook, Instagram, WhatsApp and Messenger are coming back online now. Sorry for the disruption today – I know how much you rely on our services to stay connected with the people you care about.” His otherwise most recent missive was a video of him on a yacht.
The Register staff in the United States and Australia have experienced different levels of service since the resumption.
One Vulture in the USA was able to post to Facebook without issue. Antipodean staff were unable to post and saw a variety of errors.
Attempts to view notifications produced a “query error” dialog. WhatsApp was flaky – linking devices took over a minute. Instagram was down, at least in Australia, where its favicon loaded in a browser tab, but the site produced only the message, “Oops, an error occurred.”
Theories about the cause of the outage have focused on Facebook’s seemingly accidental and sudden withdrawal of its BGP routes from the rest of the internet, causing a loss in connectivity with its peers as well as the knock-on effect of kicking over its own DNS. This apparently even caused door keycards to stop working on Facebook’s campus, and staff were left to use Outlook and Discord to organize the recovery of their systems.
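For readers who want to see why pulling routes would also take DNS down, here is a minimal Python sketch of that theory. It is a toy model and not a description of Facebook's actual network: the RoutingTable class, the resolve function, the example.com hostname, and the documentation-range addresses (192.0.2.0/24 and 198.51.100.0/24) are all assumptions made purely for illustration.

```python
import ipaddress


class RoutingTable:
    """Toy view of the routes the rest of the internet holds for one network."""

    def __init__(self):
        self.prefixes = set()

    def announce(self, prefix):
        # Advertise a prefix to peers (hypothetical stand-in for a BGP announcement).
        self.prefixes.add(ipaddress.ip_network(prefix))

    def withdraw(self, prefix):
        # Pull the prefix back (hypothetical stand-in for a BGP withdrawal).
        self.prefixes.discard(ipaddress.ip_network(prefix))

    def reachable(self, address):
        # An address is reachable only if some announced prefix covers it.
        addr = ipaddress.ip_address(address)
        return any(addr in net for net in self.prefixes)


def resolve(name, nameservers, records, table):
    """Return the address for name, or None if no nameserver can be reached."""
    for ns in nameservers:
        if table.reachable(ns):
            return records.get(name)
    return None


# Illustrative setup: one prefix for nameservers, one for web servers.
table = RoutingTable()
table.announce("192.0.2.0/24")
table.announce("198.51.100.0/24")
nameservers = ["192.0.2.53"]
records = {"www.example.com": "198.51.100.10"}

before = resolve("www.example.com", nameservers, records, table)

# Withdraw every route: the nameservers vanish along with everything else.
table.withdraw("192.0.2.0/24")
table.withdraw("198.51.100.0/24")

after = resolve("www.example.com", nameservers, records, table)
print(before, after)
```

In this sketch the DNS records themselves never change. The lookup fails only because no route leads to the nameserver that holds them, which is the knock-on effect the theory describes.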
In May this year, Facebook announced it had built an automated peering configuration system. This software may or may not have been at the heart of today’s outage. Someone claiming to work at Facebook posted on Reddit, and since deleted their missive, that Facebook’s peering routers went down likely due to a configuration blunder. Which makes sense given the circumstances.
The IT breakdown was such that engineers needed to get physical access to the routers to fix and restart them, and a team was sent into Facebook’s Santa Clara data center to do that, according to the New York Times.
How could a company of Facebook’s scale get BGP wrong? An early candidate is that aforementioned peering automation gone bad. The astoundingly profitable internet giant hailed the software as a triumph because it saved a single network administrator over eight hours of work each week.
Facebook employs more than 60,000 people. If a change designed to save one of them a day a week has indeed taken the company offline for six or more hours, that’s quite something.
The outage comes at a terrible time for Facebook, which in recent days has been the subject of damning leaks that suggest the company is knowingly indifferent to harms its platforms can create, including increased likelihood of self-harm by users, facilitating human trafficking, and ineffectual efforts to suppress hate speech and misinformation.
Documents shared by whistleblower Frances Haugen, a former Facebook employee, have also suggested The Social Network™ ignored rules about content for high-profile users, and employed woefully insufficient numbers of staff who speak users’ native languages, thereby allowing vile content to circulate without checks.
Haugen has filed a complaint with the United States’ Securities and Exchange Commission, suggesting Facebook withheld information investors need to make informed decisions. That’s the kind of indirect but effective tactic that sees authorities chase mobsters for unpaid taxes rather than trying to secure evidence of murders.
The Register does not suggest Facebook has murdered anyone.
But today’s outages may have been extremely serious for those who rely on its services for day-to-day communications, both in their personal lives and for businesses that have gone all-in on Facebook as a customer communication and sales channel. ®