CodeSurfer/x86

Overview

CodeSurfer/x86 is a prototype system for analyzing x86 executables. It is the outcome of joint research conducted by the University of Wisconsin and GrammaTech. The goal of this research is to provide a platform that an analyst can use to understand the workings of COTS components, plugins, mobile code, and DLLs, as well as memory snapshots of worms and virus-infected code. The research is sponsored by several government agencies, including the US Air Force, the US Navy, the Office of the Secretary of Defense, and the Department of Homeland Security.

Motivation

While many static-analysis tools have been developed, nearly all focus on analyzing source code (e.g., C, C++, or Java). In contrast, CodeSurfer/x86 analyzes executables (i.e., program binaries). The ability to analyze executables has important advantages, including:

  • Binaries reflect actual behaviors that may arise during program execution because machine code is what is really executed. In particular, many key decisions that can affect security or reliability are made by the compiler. These include memory layout (e.g., assignment of stack-frame offsets for variables and introduction of padding between structure fields), register allocation, and optimization. The results of these decisions are visible in the executable, but a source-code analyzer must make assumptions about these aspects (which may be either uncheckable or difficult to check).
  • Commercial-off-the-shelf (COTS) components for which the source code is unavailable can be analyzed.
  • Source code may contain inlined assembly code. Source-level analysis tools typically either skip over inlined assembly or do not push the analysis beyond it. This issue disappears when the analysis is performed directly on an executable.
  • Applications written in a wide variety of languages can be analyzed.
  • There is no need to trust the compiler.
  • It is not necessary to rely on the potentially unsound models of library functions that are typically used for source-level analysis because an executable analyzer can examine external libraries.

The following example illustrates how direct analysis of an executable is more accurate than source-level analysis. Consider the following lines of source code taken from a login program:

    memset(password, '\0', len);
    free(password);

The login program must (temporarily) store the user's password in clear text. To minimize the lifetime of sensitive information, a conscientious programmer has zeroed out the password buffer before returning it to the heap. Unfortunately, a compiler that performs "dead-code" elimination will see that the program never uses the values written by the memset statement and will remove the statement, thereby leaving sensitive information exposed in the heap. This vulnerability is invisible in the source code; it can only be detected by examining the low-level code emitted by the optimizing compiler. This example is not just hypothetical; a similar vulnerability was discovered during the Windows security push in 2002.
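
The compiler's reasoning can be reproduced on a toy intermediate representation. The sketch below is a much-simplified dead-store elimination pass written for illustration only; it is not the algorithm of any particular compiler, and the instruction format and the names `Instr`, `eliminate_dead_stores`, `read_password`, and `check` are assumptions that do not come from the login program. A write is treated as dead when no later instruction reads the object it writes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    text: str                   # source-like rendering, for display only
    reads: frozenset            # memory objects whose contents are read
    writes: frozenset           # memory objects whose contents are overwritten
    side_effect: bool = False   # True if the instruction must always be kept


def eliminate_dead_stores(program):
    """Backward pass that drops pure writes whose values are never read later."""
    live = set()                # objects whose current contents are read later
    kept = []
    for ins in reversed(program):
        if not ins.side_effect and ins.writes and not (ins.writes & live):
            continue            # nothing observes these values: remove the write
        kept.append(ins)
        live -= ins.writes      # an overwrite ends the liveness of older contents
        live |= ins.reads
    kept.reverse()
    return kept


buf = frozenset({"*password"})  # the bytes of the heap buffer, not the pointer
nothing = frozenset()
login_tail = [                  # hypothetical tail of the login routine
    Instr("read_password(password, len)", nothing, buf, side_effect=True),
    Instr("ok = check(password)", buf, frozenset({"ok"}), side_effect=True),
    Instr("memset(password, '\\0', len)", nothing, buf),
    Instr("free(password)", nothing, nothing, side_effect=True),
]

optimized = eliminate_dead_stores(login_tail)
for ins in optimized:
    print(ins.text)
```

In this toy model, free(password) uses the pointer but never reads the buffer contents, so the memset is the only instruction the pass removes. An analysis that works on the emitted machine code sees the program after such a transformation has been applied.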

Technology

CodeSurfer/x86 uses a variety of static-analysis algorithms to recover intermediate representations that are similar to those that a compiler creates for a program written in a high-level language. A key feature of CodeSurfer/x86 is its ability to understand memory-access operations. Understanding memory-access operations in executables is challenging because:
  • Many memory operations use indirect addressing via address expressions.
  • Arithmetic on addresses is pervasive. For instance, even when the value of a local variable is loaded from its slot in an activation record, address arithmetic is performed.
  • There is no notion of type at the hardware level, so address values cannot be distinguished from integer values.
  • Memory accesses do not have to be aligned, so word-sized address values could potentially be cobbled together from misaligned reads and writes.
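
The last two points can be made concrete with a small simulation. The sketch below models memory as a flat byte array using Python's standard `struct` module; the addresses and values are invented for illustration. Nothing in the bytes distinguishes the word used as an address from the word used as an integer, and a misaligned four-byte read assembles a new word from parts of both.

```python
import struct

memory = bytearray(8)                          # eight bytes of simulated memory
struct.pack_into("<I", memory, 0, 0x00403000)  # word later used as an address
struct.pack_into("<I", memory, 4, 0x00000010)  # word later used as an integer

# Both words are plain little-endian bytes; no tag marks the first as a pointer.
word0, word1 = struct.unpack_from("<II", memory, 0)

# A misaligned read at offset 2 combines the upper half of word0
# with the lower half of word1.
(straddling,) = struct.unpack_from("<I", memory, 2)

print(hex(word0), hex(word1), hex(straddling))
```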

An additional challenge is that for many programs, symbol-table and debugging information is entirely absent. Even if it is present, the executable may be untrustworthy, in which case symbol-table and debugging information cannot be relied upon. For these reasons, CodeSurfer/x86 employs value-set analysis (VSA), an algorithm developed by Balakrishnan (Wisconsin) and Reps (Wisconsin/GrammaTech), to recover information about the contents of memory locations and how they are manipulated by the executable.

A key feature of VSA is that it tracks address-valued and integer-valued quantities simultaneously. VSA is related to pointer-analysis algorithms that have been developed for programs written in high-level languages, which determine an over-approximation of the set of variables whose addresses each pointer variable can hold:

  • VSA determines an over-approximation of the set of addresses that each data object can hold at each program point.

At the same time, VSA is similar to range analysis and other numeric static-analysis algorithms that over-approximate the integer values that each variable can hold:

  • VSA determines an over-approximation of the set of integer values that each data object can hold at each program point.
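
The idea of tracking addresses and integers in one abstraction can be sketched as follows. This is a simplified illustration, not the data structures or algorithm of CodeSurfer/x86: it assumes a value is abstracted by a strided interval per memory region, and the class `StridedInterval`, the helper `join_value_sets`, and the region names `AR_main` and `Global` are hypothetical. A real analysis would also need widening to guarantee termination on loops; here the loop is simply unrolled a fixed number of times.

```python
from dataclasses import dataclass
from math import gcd


@dataclass(frozen=True)
class StridedInterval:
    """The set {lo, lo + stride, ..., hi}; stride 0 means the single value lo."""
    stride: int
    lo: int
    hi: int

    def concretize(self):
        if self.stride == 0:
            return {self.lo}
        return set(range(self.lo, self.hi + 1, self.stride))

    def add_const(self, c):
        return StridedInterval(self.stride, self.lo + c, self.hi + c)

    def join(self, other):
        """A strided interval that covers every value of both operands."""
        lo, hi = min(self.lo, other.lo), max(self.hi, other.hi)
        stride = gcd(gcd(self.stride, other.stride), abs(self.lo - other.lo))
        return StridedInterval(stride, lo, hi)


def join_value_sets(a, b):
    """Join two region -> StridedInterval maps region by region."""
    out = dict(a)
    for region, si in b.items():
        out[region] = out[region].join(si) if region in out else si
    return out


# A pointer p starts at frame offset -40 and advances by 4 bytes per iteration;
# an integer counter i starts at 0 and advances by 1.
p = {"AR_main": StridedInterval(0, -40, -40)}
i = {"Global": StridedInterval(0, 0, 0)}
p_all, i_all = p, i
for _ in range(4):
    p = {r: si.add_const(4) for r, si in p.items()}
    i = {r: si.add_const(1) for r, si in i.items()}
    p_all = join_value_sets(p_all, p)
    i_all = join_value_sets(i_all, i)

# Joining {1, 4} with {5} loses precision: the result also admits 2 and 3.
approx = StridedInterval(0, 1, 1).join(StridedInterval(0, 4, 4)).join(StridedInterval(0, 5, 5))

print(p_all, i_all, sorted(approx.concretize()))
```

The pointer and the counter are handled by the same operations; only the region differs. The last join shows what over-approximation means in practice: the abstract result contains every value that can actually occur, plus some that cannot.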

More information about VSA can be found in the papers listed below.

An Analysis Platform

CodeSurfer/x86 provides an analyst with a powerful and flexible platform for investigating the properties and behaviors of an x86 executable, using either (i) CodeSurfer/x86's GUI, (ii) CodeSurfer/x86's scripting language, which provides access to all of the intermediate representations that CodeSurfer/x86 builds for the executable, or (iii) GrammaTech's Path Inspector, which is a tool that uses a sophisticated pattern-matching engine to answer questions about the flow of execution in a program.

One of the core program representations calculated by CodeSurfer/x86 is the system dependence graph (SDG). The CodeSurfer/x86 GUI supports browsing ("surfing") of an SDG, along with a variety of operations for making queries about the SDG, such as slicing and chopping. The GUI allows a user to navigate through a program's disassembler listing using these dependences in a manner analogous to navigating the World Wide Web. (The CodeSurfer/x86 GUI is very similar to the CodeSurfer/C GUI, but has been augmented to provide executable-specific information.)
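
Slicing and chopping can be sketched as reachability queries on a dependence graph. The example below is a minimal illustration under simplifying assumptions: it uses a single flat graph with invented node names and instructions, and it ignores calling context, which a slicer for a full interprocedural SDG has to handle. The function names are hypothetical and are not CodeSurfer/x86 operations.

```python
from collections import deque


def reachable(graph, start):
    """Nodes reachable from `start` by following the edges of `graph`."""
    seen, work = {start}, deque([start])
    while work:
        node = work.popleft()
        for succ in graph.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                work.append(succ)
    return seen


def reverse(graph):
    """The same graph with every edge flipped."""
    rev = {}
    for src, dsts in graph.items():
        for dst in dsts:
            rev.setdefault(dst, []).append(src)
    return rev


def forward_slice(deps, node):
    return reachable(deps, node)            # everything `node` can affect


def backward_slice(deps, node):
    return reachable(reverse(deps), node)   # everything that can affect `node`


def chop(deps, source, target):
    """Nodes through which `source` can affect `target`."""
    return forward_slice(deps, source) & backward_slice(deps, target)


listing = {                                 # invented disassembly fragment
    "n1": "mov eax, [ebp+8]",
    "n2": "mov ebx, [ebp+12]",
    "n3": "add eax, ebx",
    "n4": "mov [ebp-4], eax",
    "n5": "mov ecx, ebx",
}
deps = {"n1": ["n3"], "n2": ["n3", "n5"], "n3": ["n4"]}  # definition -> use

for name in sorted(chop(deps, "n2", "n4")):
    print(name, listing[name])
```

The chop from n2 to n4 contains only the instructions that carry the second loaded value into the final store, while the chop from n1 to n5 is empty because no dependence path connects them.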

CodeSurfer's scripting language provides a programmatic interface to queries, as well as to lower-level information, such as the individual nodes and edges of the program's SDG, call graph, and control-flow graph, and a node's sets of used, killed, and possibly-killed variables. By writing programs that traverse CodeSurfer's IRs to implement additional program analyses, the scripting language can be used to extend CodeSurfer's capabilities.
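
To illustrate how per-node sets of used, killed, and possibly-killed variables can drive an additional analysis, the sketch below derives flow-dependence edges for a straight-line instruction sequence. It is written in Python on invented data and does not use or imitate CodeSurfer's scripting interface; the function `flow_dependences` and the tuple layout of a node are assumptions. A killed variable is definitely overwritten, so earlier definitions stop reaching later uses; a possibly-killed variable may be overwritten, so the new definition is added alongside the earlier ones.

```python
def flow_dependences(nodes):
    """Compute (definer, user, variable) edges for straight-line code.

    nodes: list of (name, used, killed, possibly_killed) in execution order.
    """
    reaching = {}   # variable -> names of nodes whose write may reach this point
    edges = []
    for name, used, killed, possibly_killed in nodes:
        for var in sorted(used):
            for definer in sorted(reaching.get(var, ())):
                edges.append((definer, name, var))
        for var in killed:
            reaching[var] = {name}                     # definite write
        for var in possibly_killed:
            reaching.setdefault(var, set()).add(name)  # possible write
    return edges


nodes = [
    ("n1", set(), {"a"}, set()),             # mov [a], 1
    ("n2", {"ecx"}, set(), {"a", "b"}),      # mov [ecx], 2   (ecx may point to a or b)
    ("n3", {"a"}, {"eax"}, set()),           # mov eax, [a]
    ("n4", {"eax"}, {"a"}, set()),           # mov [a], eax
    ("n5", {"a"}, {"ebx"}, set()),           # mov ebx, [a]
]

edges = flow_dependences(nodes)
for edge in edges:
    print(edge)
```

The load at n3 depends on both n1 and n2 because the indirect store at n2 only possibly overwrites a, whereas the load at n5 depends on n4 alone because n4 definitely overwrites it.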

Status

CodeSurfer/x86 is a research prototype and not a commercial product at this time. However, it is being used experimentally at a number of sites. For more information about CodeSurfer/x86, please contact Mark Zarins via email at mzarins@grammatech.com or via telephone at 408-246-9100.

Bibliography

Balakrishnan, G. and Reps, T. Analyzing memory accesses in x86 executables. In Proc. Int. Conf. on Compiler Construction, Springer-Verlag, New York, NY, 2004, 5-23. (Awarded the EAPLS Best Paper Award at ETAPS 2004.)

Balakrishnan, G., Gruian, R., Reps, T., and Teitelbaum, T. CodeSurfer/x86 -- A platform for analyzing x86 executables (tool-demonstration paper). In Proc. Int. Conf. on Compiler Construction, April 2005.

Balakrishnan, G., Reps, T., Kidd, N., Lal, A., Lim, J., Melski, D., Gruian, R., Yong, S., Chen, C.-H., and Teitelbaum, T. Model checking x86 executables with CodeSurfer/x86 and WPDS++ (tool-demonstration paper). In Proc. Computer-Aided Verification, 2005.

Balakrishnan, G., Reps, T., Melski, D., and Teitelbaum, T. WYSINWYX: What You See Is Not What You eXecute. To appear in Proc. IFIP Working Conference on Verified Software: Theories, Tools, Experiments, Zurich, Switzerland, Oct. 10-13, 2005.

Reps, T., Balakrishnan, G., Lim, J., and Teitelbaum, T. A next-generation platform for analyzing executables. In Proc. 3rd Asian Symposium on Programming Languages and Systems, Tsukuba, Japan, Nov. 3-5, 2005.


© 2007, GrammaTech, Inc. All rights reserved.