Abstract
As the sole inter-domain routing protocol, BGP plays an important role in today's Internet. Its failures threaten network stability and usually result in large-scale packet loss. Thus, the non-stop routing (NSR) capability, which protects inter-domain connectivity from being disrupted by various failures, is critical to any Autonomous System (AS) operator. Replicating the state of BGP and of the underlying TCP connections is key to realizing NSR. But existing NSR solutions, which rely heavily on OS kernel modifications, have become impractical due to providers' adoption of virtualized network gateways for better scalability and manageability.
In this paper, we tackle this problem by proposing TENSOR, which incorporates a novel kernel-modification-free replication design and a lightweight architecture. More concretely, the kernel-modification-free replication design removes the reliance on OS kernel modification and hence allows the network gateway to be virtualized. Meanwhile, lightweight virtualization provides strong performance guarantees and improves system reliability. Moreover, TENSOR provides a solution to the split-brain problem that affects NSR solutions. Through extensive experiments, we show that TENSOR realizes NSR while incurring little overhead compared to open-source BGP implementations. Further, our two-year operational experience on a fleet of 400 servers controlling over 31,000 BGP peering connections demonstrates that TENSOR significantly reduces development, deployment, and maintenance costs, by factors of at least 20, 5, and 10, respectively, while retaining the same SLA as NSR-enabled routers.
Original language | English (US) |
---|---|
Title of host publication | Proceedings of the ACM SIGCOMM 2023 Conference |
Publisher | ACM |
State | Published - Sep 2023 |
Bibliographical note
KAUST Repository Item: Exported on 2023-09-04

Acknowledgements: We sincerely thank the anonymous reviewers for their valuable feedback on earlier versions of this paper. We also thank the teams at Tencent for their contributions to the work. Congcong Miao and Jilong Wang are the corresponding authors. This work was supported in part by the National Key Research and Development Program of China under Grant No. 2020YFE0200500.