Originally published by 2025 IEE Symposium on Security and Privacy (SP)
Authors:
Antonio Flores-Montoya, Junghee Lim, Adam Seitz, Akshay Sood, Edward Raff, James Holt
Abstract:
Disassembly is the first step of a variety of binary analysis and transformation techniques, such as reverse engineering, or binary rewriting. Recent disassembly approaches consist of three phases: an exploration phase, that overapproximates the binary’s code; an analysis phase, that assigns weights to candidate instructions or basic blocks; and a conflict resolution phase, that downselects the final set of instructions. We present a disassembly algorithm that generalizes this pattern for a wide range of architectures, namely x86, x64, arm32, and aarch64. Our algorithm presents a novel conflict resolution method that reduces disassembly to weighted interval scheduling. Additionally, we present a weight assignment algorithm that allows us to learn optimal weights for the various disassembly heuristics in the analysis phase. Learned weights outperform manually tuned weights in most cases while reducing the number of necessary heuristics by 40% (by setting their weights to zero). Our implementation, built on top of Ddisasm, outperforms state-of-the-art disassemblers in several metrics and achieves the largest proportion of perfectly disassembled binaries by a wide margin in all evaluated datasets.