Recently, approaches based on tree-decompositions (TDs) of control-flow graphs (CFGs) have been introduced for many classical problems in compiler construction. Some found practical use in SDCC, a mainstream C compiler for embedded systems. Using TD-based algorithms, SDCC generates faster and often smaller code for its target architectures than other compilers. The width of the TD crucially influences runtime, hence good algorithms computing TDs of small width are essential. The current standard approach to obtain TDs of CFGs is Thorup's heuristic, which was also used in SDCC. Thorup claims that his heuristic will give TDs of width at most 6 for CFGs from C code that contains no goto statements. We present a counterexample resulting in a tree-width of 7. We show how to construct C code without goto statements for which the CFG has tree-width 3, but Thorup's heuristic will yield TDs of arbitrary width. We demonstrate how this flaw adversely affects the compilation of real-world code. We present a constructive proof giving a tight bound on the tree-width of CFGs of C programs. This corrects Thorup and shows the effect that goto has on the tree-width. We empirically evaluate various approaches to finding TDs of CFGs and their impact on compiler runtime and code quality in SDCC. Our research resulted in the replacement of the unconditional use of Thorup's heuristic by a better approach in SDCC 3.7.0, drastically reducing compiler runtime.
Bibliographical noteKAUST Repository Item: Exported on 2020-10-01
Acknowledgements: This research was supported by Deutsche Forschungsgemeinschaft, Projekt Graphstrukturtheorie im Übersetzerbau, KR 4970/1-1.