Tip Sheets

Better failsafes could prevent future Facebook outages

Media Contact

Abby Butler

Facebook, Instagram and WhatsApp experienced an extended outage yesterday.


Nate Foster

Associate Professor of Computer Science

Nate Foster is a professor of computer science who works to develop languages and tools that make it easy for programmers to build secure and reliable systems. 

Foster says:

Cause of the Outage: "Facebook is likely to publish a detailed 'incident postmortem' about this outage in the coming days. But based on information released so far, it seems it was caused by a bug in the configuration for an Internet router (i.e., BGP) that also disconnected Facebook's domain name servers (i.e., DNS) from the rest of the world. In addition, there have been some reports that the bug was introduced by a flaw in an automated network management system, though these reports haven't been publicly confirmed. So far, there is no indication that the outage was due to malicious activity."

Impact of the Outage: "The impact of the outage was significant, both externally and internally. For external users, the outage meant that Facebook, Instagram, and WhatsApp were all unavailable for much of the day. In addition, Facebook is now used by many small businesses (e.g., on their marketplace) and as an authentication service for other websites. Within Facebook, there were some reports, again unconfirmed, that employees could not get access to buildings at corporate sites, presumably because the ID card system relies on Facebook's network. So the impact of this disruption was enormous."

Preventing Future Outages: "There are technical approaches that could be used to prevent outages like this in the future. For instance, one can imagine redesigning Facebook's network architecture to provide better failsafes, even if the main routes to the Internet are disconnected. Another approach is to use formal verification to validate the router configurations before they are deployed -- an idea that has been used by other large network operators in an attempt to reduce the frequency and severity of outages."

 

Cornell University has television, ISDN and dedicated Skype/Google+ Hangout studios available for media interviews.