What Is BGP and How Its Failure Took Facebook Down?

October 08, 2021 · 4 min read

On October 4, 2021, Facebook and all the major services it owns went down for approximately six hours. The social media “blackout” started at around 11:40 Eastern Time (ET), right after Facebook’s Domain Name System (DNS) records became unavailable.

Cloudflare’s incident analysis details that Facebook’s DNS names simply stopped resolving and that the social media giant’s infrastructure IPs became unreachable. However, the DNS issues appear to be a consequence rather than the root cause of the problem. The initial failure occurred in the Border Gateway Protocol (BGP) routing to Facebook’s web resources.

What is BGP?

Border Gateway Protocol (BGP) is a standardized protocol for exchanging routing information between autonomous systems (AS) on the Internet. Because separate networks need to peer with each other to form a global web, each one advertises its presence by communicating routing information to its peers. This data is then stored in a routing information base (RIB).

The RIB acts as a huge, constantly updated map that charts paths to a variety of destinations. BGP consults the RIB, which lists the known routes to each destination, and chooses the most efficient one for delivering data. If BGP fails, a network (Facebook in this case) can no longer advertise its presence, so other networks can no longer reach it. As a result, the affected network appears to be cut off from the Internet.
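To make the idea concrete, here is a minimal Python sketch of a toy RIB. It is an illustration under simplifying assumptions, not how real BGP implementations work: the `ToyRib` class and its methods are hypothetical names, route selection is reduced to "shortest AS path wins" (real BGP applies a longer list of tie-breakers), and the prefix and AS numbers come from ranges reserved for documentation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

AsPath = Tuple[int, ...]


@dataclass
class ToyRib:
    """Hypothetical toy RIB: prefix -> {neighbor AS: AS path learned from it}."""
    routes: Dict[str, Dict[int, AsPath]] = field(default_factory=dict)

    def announce(self, prefix: str, neighbor_as: int, as_path: AsPath) -> None:
        # A neighbor advertises that it can reach `prefix` via `as_path`.
        self.routes.setdefault(prefix, {})[neighbor_as] = as_path

    def withdraw(self, prefix: str, neighbor_as: int) -> None:
        # A neighbor says it can no longer reach `prefix`.
        paths = self.routes.get(prefix, {})
        paths.pop(neighbor_as, None)
        if not paths:
            # No neighbor offers a route any more: drop the prefix entirely.
            self.routes.pop(prefix, None)

    def best_path(self, prefix: str) -> Optional[AsPath]:
        # Simplified selection: shortest AS path, ties broken by lowest neighbor AS.
        paths = self.routes.get(prefix)
        if not paths:
            return None  # destination is unreachable from this network
        neighbor = min(paths, key=lambda n: (len(paths[n]), n))
        return paths[neighbor]


rib = ToyRib()
rib.announce("192.0.2.0/24", 64501, (64501, 64510, 64496))
rib.announce("192.0.2.0/24", 64502, (64502, 64496))
before = rib.best_path("192.0.2.0/24")  # the two-hop path via AS 64502

# The origin network withdraws its routes from every neighbor.
rib.withdraw("192.0.2.0/24", 64501)
rib.withdraw("192.0.2.0/24", 64502)
after = rib.best_path("192.0.2.0/24")   # None: no route left
```

Once every neighbor has withdrawn the prefix, `best_path` has nothing to return. This is the toy equivalent of a network disappearing from the Internet even though its servers are still running.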

Why Facebook Was Down

According to Facebook’s explanatory blog post, the problem occurred after a major configuration change. It affected the system that manages the capacity of Facebook’s global backbone network, which links all of the company’s data centers. This configuration change resulted in Facebook’s routes being withdrawn and the social media giant’s servers going offline.

With those configuration changes and route withdrawals, Facebook effectively cut itself off from the Internet, along with its popular services Instagram, WhatsApp, and Oculus VR. Besides disappearing from the Internet, Facebook left its employees unable to enter office buildings, since the smart card access system was also affected by the outage. Moreover, Facebook’s internal workflow platform, Workplace, was also unavailable, leaving employees unable to carry on with their daily tasks.

Since the problem appears to have been caused by an incorrect configuration update made by Facebook network engineers, the fix also came from technicians, who accessed the routers locally to resolve the issues. Six hours after the outage started, Facebook resources were restored, and puzzled users were able to access their social media accounts again. As of October 8, 2021, Facebook systems are fully functional.

Detecting BGP Failures

Given that even slight BGP routing issues may cause major problems within your infrastructure, it is important to track any changes to the BGP configuration. To monitor BGP outages and failures, Massimo Candela, a Senior Software Engineer at NTT Global Networks, developed a dedicated tool called BGPalerter. It is a self-configuring tool that analyzes BGP data streams from various sources on the fly. It enables real-time detection of visibility loss, RPKI-invalid announcements, hijacks, and more.
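The idea behind visibility-loss detection can be sketched in a few lines of Python. This is only an illustration of the concept and does not reflect BGPalerter’s actual implementation, configuration, or data format: the `low_visibility` function, the event tuples, and the `min_peers` threshold are all hypothetical, and the prefixes are reserved documentation ranges.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical event shape: (peer name, prefix, "announce" or "withdraw").
Event = Tuple[str, str, str]


def low_visibility(
    monitored: Iterable[str],
    events: Iterable[Event],
    min_peers: int = 2,
) -> List[str]:
    """Return monitored prefixes currently seen by fewer than `min_peers` peers."""
    seen_by: Dict[str, Set[str]] = defaultdict(set)
    for peer, prefix, kind in events:
        if kind == "announce":
            seen_by[prefix].add(peer)
        elif kind == "withdraw":
            seen_by[prefix].discard(peer)
    return sorted(p for p in monitored if len(seen_by[p]) < min_peers)


events = [
    ("peer-a", "192.0.2.0/24", "announce"),
    ("peer-b", "192.0.2.0/24", "announce"),
    ("peer-a", "198.51.100.0/24", "announce"),
    ("peer-b", "198.51.100.0/24", "announce"),
    # Both peers later lose the first prefix.
    ("peer-a", "192.0.2.0/24", "withdraw"),
    ("peer-b", "192.0.2.0/24", "withdraw"),
]
alerts = low_visibility(["192.0.2.0/24", "198.51.100.0/24"], events)
```

In this sketch, a prefix that every observing peer has withdrawn ends up in `alerts`, while a prefix that is still widely announced does not. A real monitor would additionally deal with timing, peer reliability, and alert severity.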

To make BGP outage tracking even easier, the SOC Prime Team released a Sigma rule that detects high and critical events generated by BGPalerter. The rule is available for free download from the SOC Prime platform upon registration.

BGP Suspicious Changes (via BGPalerter tool)

The detection has translations for the following SIEM and security analytics platforms: Azure Sentinel, ELK Stack, Chronicle Security, Sumo Logic, ArcSight, QRadar, Humio, FireEye, Carbon Black, LogPoint, Graylog, Regex Grep, Microsoft PowerShell, RSA NetWitness, Apache Kafka ksqlDB.

The rule is mapped to the MITRE ATT&CK framework, addressing the Impact tactic and the Network Denial of Service technique (T1498).

Register on the SOC Prime platform to make threat detection easier, faster, and simpler. Instantly hunt for the latest threats within 20+ supported SIEM & XDR technologies, automate threat investigation, and get feedback and vetting from a community of 20,000+ security professionals to boost your security operations. Eager to craft your own detection content? Join our Threat Bounty program, share your Sigma and Yara rules in the Threat Detection Marketplace repository, and get recurring rewards for your individual contribution! Keen to enhance your threat hunting skills? Learn what Sigma rules are and how to start creating them with our guide for beginners.
