Originally published on ieee.org.
37th IEEE/ACM International Conference on Software Engineering, ICSE 2015, Florence, Italy, May 16-24, 2015, Volume 1
Denis Gopan, Evan Driscoll, Ducson Nguyen, Dimitri Naydich, Alexey Loginov and David Melski
Detecting memory-safety violations in binaries is complicated by the lack of knowledge of the intended data layout, i.e., the locations and sizes of objects. We present lightweight, static, heuristic analyses for recovering the intended layout of data in a stripped binary. Comparison against DWARF debugging information shows high precision and recall rates for inferring source-level object boundaries. On a collection of benchmarks, our analysis eliminates a third to a half of incorrect object boundaries identified by an IDA Pro-inspired heuristic, while retaining nearly all valid object boundaries. In addition to measuring their accuracy directly, we evaluate the effect of using the recovered data for improving the precision of static buffer-overrun detection in the defect-detection tool CodeSonar/x86. We demonstrate that CodeSonar’s false-positive rate drops by about 80% across our internal evaluation suite for the tool, while our approximation of CodeSonar’s recall only degrades about 25%.