Software systems face the threat of reverse engineering. Given enough time and resources, a determined attacker can recover the design of a program by examining its executable. The consequences can be severe: the attacker may gain unauthorized access to sensitive computer systems and cause serious damage.
In previous research, GrammaTech developed a software protection platform called DARE to help software developers respond to the threat of reverse engineering. Because the tool constructs a high-quality IR (internal representation), it can support a wide range of high-level transformations. These include transformations that relocate code and data in memory, fine-tune individual instruction selection, shuffle instruction ordering or basic block ordering, and embed extra code inline with the original code. Such 'aggressive' code transformation capabilities can support a wide array of defensive techniques.
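As an illustration of one such transformation, the sketch below shuffles the layout of basic blocks in a toy IR while preserving behavior. It is not DARE's IR or API: the block format, the `run` interpreter, and the helper names `make_fallthrough_explicit` and `shuffle_blocks` are all hypothetical and chosen only for this example. The key step is turning implicit fall-through into explicit jumps before reordering, so that no block depends on what follows it in memory.

```python
import random

# Toy IR (hypothetical, for illustration only): a program is an ordered list
# of (label, ops, terminator) blocks. Each op is (dst, fn, srcs). A terminator
# is None (fall through to the next block in layout order), ("jmp", label),
# ("br", var, label_if_true, label_if_false), or ("ret", var).

def run(blocks, entry, env):
    """Interpret the toy IR starting at the block labeled `entry`."""
    env = dict(env)
    index = {label: i for i, (label, _, _) in enumerate(blocks)}
    i = index[entry]
    while True:
        _, ops, term = blocks[i]
        for dst, fn, srcs in ops:
            env[dst] = fn(*(env[s] for s in srcs))
        if term is None:
            i += 1                      # implicit fall-through depends on layout
        elif term[0] == "jmp":
            i = index[term[1]]
        elif term[0] == "br":
            i = index[term[2] if env[term[1]] else term[3]]
        else:                           # ("ret", var)
            return env[term[1]]

def make_fallthrough_explicit(blocks):
    """Replace every implicit fall-through with a jump to the next block.

    Assumes the last block of a well-formed program does not fall through.
    """
    out = []
    for i, (label, ops, term) in enumerate(blocks):
        if term is None:
            term = ("jmp", blocks[i + 1][0])
        out.append((label, ops, term))
    return out

def shuffle_blocks(blocks, seed):
    """Return a randomly reordered layout with the same behavior."""
    explicit = make_fallthrough_explicit(blocks)
    random.Random(seed).shuffle(explicit)
    return explicit

# Example program: sum the integers n, n-1, ..., 1.
PROGRAM = [
    ("entry", [("acc", lambda: 0, ())], None),
    ("loop", [("cond", lambda n: n > 0, ("n",))], ("br", "cond", "body", "exit")),
    ("body", [("acc", lambda a, n: a + n, ("acc", "n")),
              ("n", lambda n: n - 1, ("n",))], ("jmp", "loop")),
    ("exit", [], ("ret", "acc")),
]

original = run(PROGRAM, "entry", {"n": 5})
shuffled = run(shuffle_blocks(PROGRAM, seed=1), "entry", {"n": 5})
```

In this sketch `original` and `shuffled` are equal for any seed, because every control transfer names its target explicitly once fall-through has been removed. A real binary rewriter must also fix up addresses, relocations, and indirect branch targets, which is where an accurate IR becomes essential.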
Unfortunately, the original DARE tool was hampered by the way it constructed its IR. Because it relied solely on disassembly for information about a program's structure and behavior, DARE often produced an imperfect IR. The resulting applications, while protected, frequently contained critical semantic errors that rendered them useless. The root cause is that disassembly is an imperfect science: even the best disassemblers make heuristic guesses about the structure of a program that occasionally prove to be incorrect. Our experience shows that any individual guess is rarely wrong, but errors are still common enough that the IR constructed by DARE for any one program is likely to include at least one, and a single error is sufficient to break a protected application.
The focus of this project was to build on GrammaTech's existing executable-rewriting technology (DARE) to develop an improved software protection platform. The platform can still operate directly on executables, but it is designed to be flexible enough to draw information about an executable's behavior and components from a variety of sources, including source code, compiler information, and debugging information. By drawing on these other sources, the new system can eliminate much of the guesswork performed by a disassembler. As a result, the IR used by the tool is more accurate, which helps avoid semantic errors in transformed applications.
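One way to picture how extra sources reduce guesswork is a merge step in which facts from more reliable sources override disassembler heuristics. The sketch below is only an illustration under stated assumptions: the `TRUST` ranking, the `merge_facts` function, and the fact format are hypothetical and do not describe how the platform actually combines its inputs.

```python
# Hypothetical trust ranking (an assumption for this sketch): when two
# sources disagree about an address, the higher-ranked source wins.
TRUST = {"disassembler": 0, "compiler": 1, "debug_info": 2}

def merge_facts(facts):
    """Merge (address, kind, source) facts into {address: (kind, source)}.

    For each address, keep the classification from the most trusted source.
    """
    merged = {}
    for address, kind, source in facts:
        current = merged.get(address)
        if current is None or TRUST[source] > TRUST[current[1]]:
            merged[address] = (kind, source)
    return merged

facts = [
    (0x1000, "code", "disassembler"),
    (0x1040, "code", "disassembler"),  # heuristic guess that turns out wrong
    (0x1040, "data", "debug_info"),    # e.g. a jump table embedded in the text section
    (0x1080, "code", "compiler"),
]
ir_facts = merge_facts(facts)
```

Here the disassembler's guess that address `0x1040` holds code is overridden by the debugging information, while addresses with no better source keep the disassembler's classification. This mirrors the idea that the platform can still work from the executable alone but becomes more accurate as more information is available.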