Sponsored Research
Our expertise in software analysis and binary transformation comes from decades of experience of high-tech research with the U.S. government and other organizations. Over the past two decades, we have partnered with several groups to help solve some of the most complex software challenges that impact devices' resiliency, safety, and security. Our work has been focused in three areas:
- Software Assurance: new techniques and technologies for analyzing and correcting software to ensure runtime integrity and prevent unplanned system breaches and failures.
- Software Hardening: technologies solely focused on system resiliency.
- Autonomic Computing: providing software systems with the ability to ‘self-protect’
Current and Completed Research Projects
Below is a listing of current and completed projects with our government sponsors, which highlights the areas in which we have a particular focus.

Automatic Artificial Diversity for Virtual Machines
Air Force Research Laboratory - Rome
We propose to introduce artificial diversity to each installation of a standard platform by running the system using a combination of hardware virtualization and software dynamic translation. Automatic, transparent diversification offers powerful protection for systems that would otherwise remain homogenous. Code exploits are usually highly dependent on the details of the software and the vulnerability they target. Diversification ensures that those details change from one instance to the next, thereby requiring that a customized exploit be developed for each machine - frequently an insurmountable challenge for the attacker. Diversification is also attractive because it offers some protection against unknown attack vectors and methodologies.
BENEFIT: Standardization of computer platforms is an important tool for improving security. Up to 80% of the vulnerabilities that are exploited during penetration testing of government networks result from misconfigured software. Standardized platforms allow security experts to ensure that these vulnerabilities are closed. Unfortunately, wide distribution of a standard platform also means wide distribution of any vulnerability in that platform. While adoption of a standard platform may be the only hope an enterprise has of managing and avoiding known vulnerabilities, it also dramatically increases the potential damage from exploits of newly discovered vulnerabilities: a novel attack may subvert or disable all standardized machines. Our approach to artificial diversity will enable the security benefits of a standardized computing platform without the coincident standardization of security vulnerabilities.

Deep Static Analysis – Software Binaries
Air Force Research Laboratory - Rome
Market forces are increasingly pushing companies to deploy COTS software when possible—for which source code is typically unavailable—and to out-source development when custom software is required. Moreover, a great deal of legacy code—for which design documents are usually out-of-date, and for which source code is sometimes unavailable and sometimes non-existent—will continue to be left deployed. An important challenge during the coming decade will be how to identify bugs and security vulnerabilities in such systems. Methods are needed to determine whether third-party and legacy application programs can perform malicious operations (or can be induced to perform malicious operations), and to be able to make such judgments in the absence of source code.
Recent research in programming languages, software engineering, and computer security has led to new kinds of tools for analyzing code for bugs and security vulnerabilities. In these tools, static analysis is used to determine a conservative answer to the question "Can the program reach a bad state?" In principle, such tools would be of great help to an analyst trying to detect malicious code hidden in software, except for one important detail: the aforementioned tools all focus on analyzing source code written in a high-level language. Even if source code were available, there are a number of reasons why analyses that start from source code do not provide the right level of detail for checking certain kinds of properties, which can cause bugs, security vulnerabilities, and malicious behavior to be invisible to such tools.
In contrast, our work addresses the problem of finding bugs and security vulnerabilities in programs when source code is unavailable. Our goal is to create a platform that carries out static analysis on executables and provides information that an analyst can use to understand the workings of potentially malicious code, such as COTS components, plug-ins, mobile code, and DLLs, as well as memory snapshots of worms and virus-infected code.

Cognitive Techniques for Analysis of Complex Software Systems
Air Force Research Laboratory - Rome
The problem of finding flaws in large complex software systems is acute and getting worse because many systems are employing new cognitive techniques to increase their capabilities. Such techniques are typically highly dynamic and concurrent, which increases the complexity of the system and makes it correspondingly harder to debug. Traditional approaches fail because they are incapable of handling such levels of complexity. Often the cause of many complex flaws can be traced to errors in how the components of the system communicate and interact.
Recently new methods have emerged that use sophisticated data mining and machine-learning techniques to automatically locate the source of flaws. These work by learning the rules for legal interactions between components by observing the behavior of the system during normal operations. These rules can then be automatically checked either statically or dynamically. We propose to create a prototype of a system that uses these techniques. It will learn temporal properties from traces, and feed the resulting rules to a static checker. The checker will report if any part of a component may violate these rules. This is feasible because we are able to leverage a great deal of existing technology and expertise.
The software we propose to develop will reduce the time required to develop and debug large software-based systems, especially those that are highly dynamic and adaptive. It will be applicable across a wide range of markets including aerospace and defense, finance, engineering, and medical.

Source Code Vulnerability Detection
Air Force Research Laboratory - Rome
The growing dependence of the economy and the armed forces on networked information systems has increased the importance of information security. Programming errors cause many security vulnerabilities. Popular languages like C and C++ are particularly prone to programming errors that expose systems to attack. For example, a commonly exploited vulnerability is inadequate bounds checking on C/C++ buffers. By overrunning a stack buffer, an attacker can overwrite critical system bookkeeping information and take control of a system. Other programming errors, such as null pointer dereferences, can also create vulnerabilities.
The seriousness of the problem has led to the development of a variety of tools. Some of these tools do run-time monitoring—but such tools require significant computational overhead and/or miss classes of vulnerabilities. Furthermore, run-time tools do not completely eliminate the vulnerability, so it can still be exploited through a denial-of-service attack. In contrast, source code scanning tools have the potential to completely eliminate many types of vulnerabilities, without run-time overhead.
Unfortunately, simple scanning technology is often very inaccurate in practice. This means that users must spend an inordinate amount of time on code audits. Often, the overwhelming number of potential vulnerabilities that must be investigated makes a code audit totally impractical; the audit is abandoned and the vulnerabilities remain.
Under an Air Force SBIR contract, GrammaTech is developing a solution to automatically detect vulnerabilities using advanced static analysis techniques. The technology has the potential to greatly increase the accuracy of automatic vulnerability detection, drastically reducing the amount of manual source code analysis required. Furthermore, the remaining manual investigation will be greatly simplified by GrammaTech’s program understanding tool, CodeSurfer®. If this project is successful, the technology will enable programmers to rapidly identify and fix vulnerabilities before applications are deployed.

Detecting Malicious Code in Firmware
Air Force Research Laboratory - Rome
The problem of information security has become critical because of the growing dependence of the economy and the armed forces on complex networked information systems. Of particular concern is malicious code, which has been described as code that is intentionally added to or changed in a software system to cause harm or to subvert the intended function of the system. Examples of malicious code include time-bombs, viruses, worms, and Trojan horses.
Unfortunately, most ongoing security research efforts into type-safe languages, proof-carrying code, static scanning etc. do not address the problem of firmware security, as was noted in a recent report by the Infosec Research Council.
Existing technologies for detecting malicious code such as computer viruses, are generally based on naïve techniques that scan a program looking for surface-structure features such as lexical patterns. These techniques are best suited for looking for viruses that have already been detected by other means. They are not well suited to detecting entirely new viruses or polymorphic viruses which can modify themselves to elude detection. Furthermore, they are entirely unsuitable for detecting other forms of malicious code, such as trojan horses inserted via insider attack.
We believe that only techniques based on inspecting a program's deep structure stand a chance of being effective at helping to reliably detect malicious code. Deep structure representations, such as dependence graphs, capture a program's essential semantics. Once in such a form, malicious code detection is a matter of pattern matching to identify anomalous properties of the graph. Two powerful techniques are are available to help in this task: context-free language reachability, and model checking.
Under an Air Force Research Laboratory SBIR contract, GrammaTech is developing techniques for using context-free language reachability and model checking to detect malicious code in firmware. The proposed system will disassemble/decompile firmware into a high-level representation such as control-flow graphs or C code that can be examined by a security analyst using CodeSurfer®. A sophisticated querying mechanism will be used to enable detection of deep-structure patterns indicative of malicious code. A new high-level user interface will permit the analyst to easily access this querying capability to search for tell-tale signs of possibly malicious code, and view the relevant portion of the suspicious firmware. To make the solution scalable to large pieces of firmware, we will leverage existing technology in CodeSurfer, such as slicing and chopping, to reduce the size of code that must be examined when a tell-tale sign is found.

Fault Isolation in Hypervisors with Live Migration
Air Force Research Laboratory - Rome
Cloud computing promises dramatic savings through economies of scale for the government and private sectors, but is also brings security concerns. There is a particular need to prevent faults in or attacks occuring on a virtual machine (VM) from affecting the virtual machines of other users. GrammaTech proposes to address this problem by developing a more secure virtualization system which isolates virtual machines from the controlling hypervisor. Our approach also augments security through dynamic monitoring. Although isolated, the system will maintain the ability perform control tasks such as live migration to relocate virtual machines. This Phase I SBIR research will include development of a Phase I virtualization system prototype with live migration capability. The performance and security of this prototype will be evaluated and compare to existing hypervisor systems.
GrammaTech's approach has the benefit of providing strong isolation between VM's while not sacrificing features such as live migration needed for for practical cloud management. The approach also has the additional benefit of allowing the virtual machines to perform near the level achieved without virtualization. Finally, the approach allows the addition of software protection techiques to automatically attached to the virtual machine using GrammaTech's KATE technology. This technology could be used by cloud providers to secure customer virtual machines and data from attack by other customers. It could also be used by enterprises to protect VMs which provide services which may be attacked, such as public webservers or other network accessible services. This technology can also help such enterprises satisfy regulatory requirements for risk reduction.

Covert Loading and Execution of Software Protections to Reduce Adversarial Detection
Air Force Research Laboratory - Rome
A number of software defenses exist that frustrate attempts to examine or tamper with a protected application. However, attackers are unlikely to approach the defenses head-on. One weak point is during installation and deployment of defenses. There is sort of a "who came first" game played between the protected application and the attacker. If the attacker arrives first, then they can observe the defenses as they are setup, gaining great insight into how the protections can be subverted. This opening must be prevented.
We envision a technique that enhances existing defenses by protecting the loading phase of the sensitive application. Under this system, attackers are denied access to system initialization and sensitive software is made inseparable from the OS, thereby preventing attack before protection can be raised against it.

Run-Time Process Monitoring
Air Force Research Laboratory - Rome
Malicious logic may be intentionally inserted into software as is the case with insider attack, or innocent mistakes may open vulnerabilities to worms and the like. We propose a double edged approach to inserting monitoring logic for machine code on all operating systems in order to ensure that: (1) Interactions with the operating system are consistent with the original code, and (2) the interactions with the operating system obey arbitrary security policies that may be specified at will by the user. Approach (1) restricts the program to its intended behavior; this prevents attacks from worms and viruses and offers protection against unanticipated attacks that cause the program to behave in an abhorrent fashion. Approach (2) prevents insider attack by explicitly disallowing malicious behavior that the attacker has managed to insert into the "intended" program behavior.

Data Sandboxing
DARPA
Data-rich applications, such as web servers, web browsers, and document editors, are inherently vulnerable to non-control data attacks, i.e., attacks that exploit memory corruption vulnerabilities to leak or corrupt sensitive data. In this project, we are building Dobby, a tool that statically transforms software binaries to ensure the integrity and confidentiality of the data they operate on by preventing illicit data flows caused by memory corruption errors. To minimize the impact of dynamic checking on software runtime, the tool will apply protections heterogeneously—stronger, and thus more expensive, protections will be used to secure the data that is security-sensitive, while low-cost, coarse sandboxing will protect the rest of the data. Additionally, Dobby will attempt to detect and isolate from each other private data contexts in data-rich applications, such as, independent requests handled by a web server or pages visited by a web browser.

ConSec
DARPA
Modern network composed systems—even the systems used in critical applications—incorporate many general-purpose commercial off-the-shelf (COTS) components that must be configured appropriately for the larger system to meet its operational requirements. The configuration of such systems is often done in an ad-hoc way with many critical parameters left in their factory settings, exposing unnecessary attack surfaces and weakening the security of the system. Under ConSec, we are building ConfINE (Configuration Identification, Normalization and Enforcement), a tool that enables rapid, rigorous, and largely automated generation and deployment of secure configurations for both industrial systems and critical military systems.
ConfINE analyzes a complex system — its implementation, user manuals, standard operating procedures (SOPs), and other available documentation—to infer formal specifications of required functionality and to generate a configuration-aware model of the system’s behavior. The system’s specification and model are used [via the technology built by other ConSec participants] to construct system-wide configuration settings that allow the system to meet its objectives while reducing attack surfaces and eliminating configuration-based vulnerabilities. ConfINE then automatically deploys the generated configuration settings throughout the system and continuously monitors their integrity. In this effort, GrammaTech collaborates with teams of researchers from LGS, SRI, and AIS.

Software Protection to Fight through an Attack
Air Force Research Laboratory - Rome
New vulnerabilities and attacks on software applications and the underlying systems are discovered daily. While most security research focuses on early detection and prevention of attacks, in reality, successful attacks will continue to be carried out. Recently, the Confiker worm demonstrated that standard software defenses can still be breached with ease. The next line of defense is placed at the application level. At that level, defensive techniques monitor application behavior for abnormalities (e.g., unexpected control transfers or unexpected sequences of system calls) and respond when an abnormality is detected. A typical response is to terminate and restart the attacked program. While such a response eliminates the threat of being compromised, it is not appropriate for many types of systems. In particular, for safety-critical systems, systems that must remain operational for long periods of time, and systems that use persistent data (such as file systems and databases), rebooting to restore a safe state (even when an attack is detected) is often not an option. Such systems need to be able to maintain a safe operating environment through attacks and environment faults.
A particularly nefarious class of attacks is the so-called non-control-data attacks. Instead of directly modifying targets of control-flow transfers, non-control-data attacks corrupt application data in a way that makes an apparently normal execution of the application carry out the attacker's goals. Work at the University of Illinois and NCSU demonstrated the existence and exploitability of non-control-data attacks in FTP, SSH, Telnet, and HTTP servers. It is hard to recover from non-control-data attacks because they may corrupt data structures whose consistency is crucial for the safe execution of a program and the underlying systems code. For instance, free-heap data structures maintained by operating-system code responsible for memory management, request queues maintained by web servers, and flight paths maintained by aviation controllers, are all essential to the correct execution of their respective systems.
There is a need for tools and techniques that allow critical applications to recover and maintain safe operational state while under cyber attacks and, in particular, under non-control-data attacks. We envision a tool that will allow critical applications to recover from attacks and remain operational. The tool will operate as follows: it will analyze the application offline to learn important data-structure invariants of the application. Online, the tool will monitor the operation of the program to detect anomalies that may be indicative of attacks or faults. Once an anomaly is detected, the tool will use the invariants learned offline to repair the corrupted data structures and restore the application to its operational state.

Deobfuscating Tools for the Validation & Verification of Tamper-Proofed Software
Air Force Research Laboratory - Rome
Recently, there has been an increase in the use of anti-tamper techniques (e.g., obfuscation) in all types of software. However, applying anti-tamper techniques is technically challenging, and when applied to large, sophisticated software, there is a danger of introducing subtle bugs, or not introducing sufficient protection. The existing state of anti-tamper technology is undesirable in that it (a) is much too effective at protecting (small) malware samples, but (b) does not offer sufficient guarantees of correctness and protection for (large) legitimate applications.
We propose a deobfuscation tool that uses machine-code analysis to check that the (self-protecting) program output from a tamper-proofing tool is indeed protected, and has the same behavior as the input program. This deobfuscator leverages concolic analysis techniques. Program analysis techniques can be divided into dynamic analyses that observe the subject program when executed on a set of inputs, and static analyses that consider all possible executions, without executing the subject program on any particular input. Each approach has its strengths and weaknesses. Concolic analysis combines static and dynamic analyses and leverage the strengths of each to counter the weaknesses of the other.

Software Protection to Deter Malicious Forensic Data Collection & Exploitation
Air Force Research Laboratory - Rome
Global networks bring both new power and new risks. Mission-critical applications are increasingly spread over national or global networks to enable information gathering and sharing on an unprecedented scale. While this large-scale information sharing promises new efficiencies, each endpoint in a network represents a new point of attack. Sensitive applications running on a network must cope with the possibility that some endpoints have been compromised and are gathering forensic data for malicious ends.
We propose a combination of technologies and techniques for protecting sensitive software from malicious forensic data collection. Our approach incorporates the following ideas: Offloading portions of the code to reconfigurable hardware; disguising sensitive applications as benign; using covert channels to communicate with relocated code; and providing artifacts that demonstrate the correctness of the protection.

Sanitizing Software of Malicious and Unauthorized Code
Air Force Research Laboratory - Rome
Software systems face the threat of reverse engineering. Given enough time and resources, a determined hacker can recover the design of a software program by examining its executable. The consequences of this can be dramatic: the hacker may gain unauthorized access to sensitive computer systems, allowing him to wreak untold havoc.
Under previous research, GrammaTech had developed a software protection platform called DARE in order to assist software developers in responding to the threat of reverse engineering. Due to the high quality of the IR (internal representation) constructed by the tool, a wide range of high-level transformations can be supported. These include transformations that relocate code and data in memory, fine-tune individual instruction selection, shuffle instruction ordering or basic ordering, and embed extra code inline with the original code. Such 'aggressive' code transformation capabilities can support a wide array of defensive techniques.
Unfortunately, the original DARE tool was hampered by the method in which it constructed its IR. Relying solely on disassembly as the source for information concerning a program's structure and behavior, DARE was often subject to an imperfect IR that resulted in applications that, while protected, often incurred critical semantic errors that rendered the protected application useless. This is due primarily to the fact that disassembly is an imperfect science. Even the best disassemblers make heuristic guesses about the structure of a program that can, from time to time, prove to be incorrect. Our experience shows that such errors are rare, but they are still prevalent enough that the IR constructed by DARE for any one program is likely to include at least one such error, and that is sufficient to bring a protected application to its knees.
The focus of this project was to build on GrammaTech's existing executable-rewriting technology (DARE) in order to develop an improved software protection platform. The platform is still capable of operating directly on executables, but is designed to be flexible enough to draw information about the executable's behavior and components from a variety of sources including source code, compiler information, and debugging information. By drawing on these other sources of information, the new system can eliminate much of the guesswork performed by a disassembler. As a result, the IR used by the tool is more accurate, thus avoiding semantic errors in transformed applications.

Light-Weight Virtualization as a Defense against Reverse Engineering
Air Force Research Laboratory - Rome
The victor in a conflict is often the party that can gain the "upper hand". In the battle between software protection and reverse engineering, however, the victor is the party that gains the "lower hand". We propose a novel software protection system based on hypervisors that gives the advantage to the defended application by lowering it below the operating system, effectively making the application inaccessible to attackers.

Trace-Based Disassembly
Air Force Research Laboratory - Rome
We propose to develop a trace-based disassembler and integrate it with our break-through binary analysis tool, CodeSurfer®/x86. A trace-based disassembler builds a trace of the instruction sequence that is executed at run time (during one or more runs of the program). The trace is analyzed to construct control-flow graphs for each of the procedures, which are then used to generate an assembly listing. This approach will make CodeSurfer/x86 applicable to binaries that employ anti-tamper techniques known as control-flow obfuscations. In particular, we will focus on the potential to unravel self-modifying code. CodeSurfer/x86 will also be used to drive a static disassembler to help "flesh out" the program listing generated by the trace-based disassembler.

Software Protection through Specialized Commodity Processors
Air Force Research Laboratory - Rome
The protection of intellectual property is a key problem being addressed both in public and private research. However, even with the amount of resources given to further research into the state-of-the-art, existing technologies are far short of being sufficiently protected against reverse engineering. Of particular concern is the use of software-only techniques, which are susceptible to a variety of known attacks through emulation, debugging, and disassembly. Additionally, hardware-only defenses are vulnerable to mimicry and circumvention; the most effective solution to the problem will evolve as a hybridization of both software- and hardware-based methods.
Modern computing systems are not monolithic devices, and contain several commodity processors that perform various operations such as network, sound and graphics processing. Monolithic software protections are limited in their effectiveness due to relatively low complexity; that is, the reverse engineer has a single program to attack, and must only emulate known system architectures and behavior if necessary. The work proposed herein will demonstrate the feasibility of offloading software protections to commodity processors using the existing anti-tamper framework at GrammaTech. In doing so, it will be possible to establish stronger and more efficient software protections due to the expanded computational abilities afforded by commodity hardware.

Hardware-Assisted Software Anti-Tamper
Air Force Research Laboratory - Rome
Defense research and development is increasingly using software to perform critical roles. It is important that these assests be protected. Great progress has been made in providing in-band protection of software, including on-going work at GrammaTech. However, strong protection requires defensive measures at both the hardware and software levels. By adding additional reconfigurable hardware components to a system, a reverse engineer is confronted with additional devices to decipher, more hardware requirements for analysis, and the added complexity of additional inter-device communication to understand.
We expect our Phase I effort to result in a prototype that will: (1) identify portions of a software executable that can be translated to an FPGA hardware environment; (2) perform sound binary-to-binary transformations to lift software algorithms out of a sensitive program and onto an FPGA.

Reverse Engineering Kernel Mode Rootkits
Air Force Research Laboratory - Rome
While the tools to combat malicious rootkits have evolved over the years, their effectiveness has paled before the increasing stealth of malware designed specifically for hiding from such tools. A next-generation reverse-engineering tool is needed to counter the growing threat of modern rootkits. This new forensics tool will utilize a powerful set of techniques to identify a broad class of malicious rootkit behavior.

Defense Against Reverse Engineering
Air Force Research Laboratory - Rome
Existing software systems face the threat of reverse engineering. Given enough time and resources, a determined hacker can recover the design of a software program by examining its binary. The consequences of this can be dramatic: the hacker may gain unauthorized access to sensitive computer systems, allowing him to wreak untold havoc. At worst, this may allow him to compromise national security, or to perpetrate a terrorist attack. A more common result is for the hacker to "crack" software protection, thereby enabling illegal and widespread dissemination of the intellectual property rights found in, or protected by, the hacked software.
GrammaTech is investigating tools and techniques for reverse engineering, and developing innovative defenses that prevent reverse engineering. To accomplish this work, GrammaTech is drawing on its expertise in static analysis, expertise in dynamic analysis, and experience building code understanding tools for C/C++ and x86 binaries. The commercial potential for strong software protection is significant. Software companies lose billions of dollars to software piracy. Furthermore, techniques and tools that offer protection against reverse engineering would be invaluable for increasing the security of software systems. Thus, there are both DoD and commercial applications.

Inserting Code into Firmware Images
United States Navy
We see an opportunity to develop a tool that enables developers to fix and update firmware for embedded systems by abstracting away low-level details of the target system. Presenting a source-level view of the firmware image and allowing the developer to create patches at that same level of abstraction will greatly reduce the barrier-to-entry for firmware modification and minimize the need for architecture-specific knowledge.
GrammaTech proposes SCALPEL, a tool for rewriting embedded firmware using high-level views of the unpacked target image. SCALPEL will enable a developer to understand and alter a device’s firmware by:
- Providing a simple API for patch development via a high-level, architecture-independent language.
- Providing a suite of locators to help identify points of interest in the target image.
- Presenting a readable, architecture-independent view of the system via decompilation to C.
- Automatically determining how to safely translate a high-level patch description to a sequence of low-level machine code rewriting operations.
Using SCALPEL to develop a firmware patch, developers will be able to focus on what effect the patch should have and where the patch should be applied — the myriad low-level details about how to safely implement the patch are carried out automatically by SCALPEL. By describing a patch in terms of source locations appearing in the firmware’s decompiled C code, it is even possible to develop safe firmware patches with SCALPEL without the developers’ knowledge of the underlying CPU architecture!

Multi-Abstractions System Reasoning Infrastructure toward Achieving Adaptive Computing Systems (OATS)
Department of Homeland Security (DHS)
GrammaTech proposes to develop a marketable tool for protecting and monitoring systems subject to cyber attack. This tool uses a combination of binary rewriting and system-wide event tracking to both harden systems and detect sophisticated attacks. GrammaTech’s binary rewriting technology is the result of multiple research projects on hardening systems, and is based on GrammaTech’s leading binary analysis toolkit. The resulting tool will allow government and industry users to detect and block cyber attacks that exploit software vulnerabilities not anticipated by the original software developers.

Static Tool Analysis Modernization Project (STAMP)
Department of Homeland Security (DHS)
GrammaTech, along with teammates Fraunhofer USA and Secure Decisions, propose to develop and implement a repeatable methodology for testing, evaluating, and modernizing existing open-source static-analysis tools. The selected tools, targeting multiple programming languages, will be greatly improved by this effort and the updated tools will be open-sourced. It is our position that commercial participation is critical for long-term adoption and viability of open-source static-analysis tools.
Our overall STAMPOut framework will prioritize tests, evaluation effort, and tool improvements based on the needs of the targeted languages, application domains, and strengths of the tools in question. We will improve static-analysis tools across multiple important languages, including static and dynamic languages.

WARP: Weaving Application-specific Runtime Policies in Embedded System Binaries
Air Force Research Laboratory
GrammaTech proposes WARP, a tool for Weaving Application-specific Runtime Policies into embedded systems binaries. Perimeter defenses such as firewalls or network filters provide insufficient security and generic inline defenses such as control flow or memory protection do not address application specific behavior. Instead, WARP will weave runtime checks enforcing userdefined application-specific security policies into the fabric of existing embedded systems firmware without requiring help from the original developers or access to source code.

Mnemosyne
Air Force Research Laboratory – Rome
Mnemosyne brings Program Synthesis research to the modern IDE, helping software developers write and maintain program code, tests, and types. Mnemosyne leverages statistical machine learning, formal methods, and Search Based Software Engineering (SBSE). Mnemosyne speaks LSP, meaning it integrates seamlessly with most popular text editors and IDEs. Mnemosyne is extensible, easily collaborating with varied program synthesis modules or "Muses."

Virtual Trusted Platform Module
Air Force Research Laboratory – Rome
Trusted platform module (TPM) devices provide the core root of trust for modern computer systems. These devices add trust in secure systems by supporting device identification, authentication, encryption, measurement, and device integrity. However, many systems are now executed in a virtualized environment. Currently hypervisor technologies either do not provide guests with the needed TPM functionality, or provide a limited and insecure virtual TPM approach.
GrammaTech’s virtual trusted platform module (vTPM) technology is being developed to provide the same level of security and features as that of a physical TPM to virtualized environments. GrammaTech’s solution will provide virtual machine clients in QEMU, Xen, OpenXT, and SecureView with either a TPM 2.0 device instance or TPM 1.2 device instance that, from the client’s perspective, is equivalent to a physical TPM. The technology fully emulates the physical device which means no special modification of the client to support our vTPM is needed. Of-the-shelf Linux and Windows virtual machine clients can immediately leverage our vTPM technology. This technology could also be ported to embedded or mobile environments in the future.
Although our TPM instances are virtual and can be migrated with the virtual machine, they are also secured using a physical TPM to prevent the use of the TPM instance outside of an approved system. Security is a key design requirement. This means that no vTPM data is shared between vTPM instances and the instance data is never exposed outside of the vTPM server, not even to the host system. The vTPM server itself is isolated in its own virtual machine and provides the actual TPM functionality to clients.
During Phase I, GrammaTech developed a secured virtual TPM server technology, which was leveraged by a modified QEMU system to provide multiple guests with virtual TPM instances. This version supported the TPM 1.2 standard. Also developed was a test methodology to support further development and testing of the system.
Phase II continues this effort to develop stronger security and new capabilities. As part of this effort, TPM 2.0 support has been added. Also being added is support for vTPM instance migration. Our technology has been ported to Xen and is in the process of being transitioned to OpenXT and SecureView hypervisor systems.
GrammaTech’s vTPM technology will enable sensitive systems which now operate on dedicated hardware to be virtualized without sacrificing the security of the data on those systems. Virtualizing these systems can lead to more flexible deployment options and more efficient utilization of resources. Further, this work is an effort to help support directives to use TPM enabled security in all DoD systems and enable advanced features such as secure boot and BitLocker in virtual systems. By making this technology available to the Air Force’s SecureView hypervisor, we believe the benefit can be leveraged across DoD interests.

CyDir: In-flight Rapid Cyber Diagnostics and Response for Mission Assurance
Air Force Research Laboratory – Rome
CyDir is a system for automatically detecting anomalous behavior in avionics and ISR systems. CyDir will automatically diagnose anomalous behavior’s cause and immediately direct a response to maximize mission success by minimizing the effect of the anomalous behavior (and any malware which caused this behavior). CyDir is specifically designed to detect and mitigate malicious behavior, but can also provide responses to hardware malfunctions, misconfigurations, and other non-malicious faults.
CyDir combines synchronous and asynchronous anomaly detection with a system reasoner and GrammaTech’s efficient monitoring and instrumentation technology. If CyDir is certain of a fault, it can prevent the attempted action from occurring, preventing malicious actions from occurring. Where the cause of fault is initially uncertain, CyDir passes information to the system reasoner until it can determine the proper response to mitigate the threat and maximize the probability of mission success. When the system reasoner is uncertain, it can present all information relevant to a potential fault to knowledgeable operators along with a recommendation for action. CyDir will then learn the proper response by observing the operator, increasing its future response capabilities.

Preventing Exploits Against Software of Uncertain Provenance (PEASOUP)
Air Force Research Laboratory - Wright-Patterson
PEASOUP is part of the Securely Taking On New Executable Software of Uncertain Provenance (STONESOUP) program, an initiative of the Intelligence Advanced Research Projects Activity (IARPA) Office of Safe and Secure Operations and administered by the Air Force Research Lab (AFRL).
STONESOUP seeks to address a key problem in today's world: How can we use software securely if we do not know how or by whom the software was created or where its component parts originated? Software is produced around the world; component parts come from many different places and are integrated into larger systems. The production of software increasingly involves contract software engineers and off-shore suppliers because it is often prohibitively expensive to generate a major system completely in-house. Accordingly, security-conscious users require ways to assure that the software they utilize performs no malicious actions. GrammaTech, Raytheon, the University of Virginia, and the Georgia Institute of Technology will combine state-of-the-art technologies that together will make a significant contribution to solving this problem.

Prioritization of Weapon System Software Assurance Assessment
Air Force Research Laboratory – Wright-Patterson
The cost and timeliness of deployment of weapons software may benefit greatly from the use of shareware, freeware, open-source, and COTS components. However, critical, safety-of-flight, and sensitive-data applications require higher assurance than can be expected of many commercial components.
To make the use of commercial software cost-effective, providing technical assurance of software quality (and assessing risk in its deployment) requires automatic tools. Unfortunately, no single existing tool captures the broad spectrum of security vulnerabilities that need to be analyzed to assess risk in sensitive use cases. In fact, all possible attacks against a component may not be known. Also, most security-analysis techniques do not scale to today's complete software systems. (Theoretical limitations exist on properties that can be detected or verified, as well.) Finally, security-analysis tools generally rely on the use of source code, which precludes evaluating shareware, freeware, and COTS components available in binary form only.
We propose to address the above challenges with a comprehensive solution based on a hierarchy of automated program-analysis techniques that provide varying levels of detail about the analyzed software. The coarser (and computationally cheaper) analysis techniques will provide rough estimates of risk; their answers will inform the choice of finer (and computationally more expensive) analysis techniques that will yield more precise estimates of risk, limiting human review to only the most suspicious components. A hierarchical solution is necessary for a technique to be applicable to non-trivial applications. Such a solution can focus the power of more expensive resources (deeper analysis, testing, or human review) on the riskiest and most complex components.

Analysis Tools for System Surety (ANTSS)
United State Air Force
This project will improve both requirements management and program analysis tools for critical software development and V & V. First, building from existing products and technology, GrammaTech will construct an integrated tool suite of program analysis and test management tools: checking compliance against software coding standards, checking for safety and security programming flaws, verifying software against formally expressed requirements, managing automated test execution and generation, and measuring test coverage. Tools that exist for C programs will be enhanced, adapted, and integrated, as needed, for the critical-software domain. Corresponding tools for Fortran programs generally do not exist and will be designed and prototyped to provide analogous functionality. Second, the requirements tracking and change management tool chain will be enhanced with automated consistency and change checking, based on a SysML infrastructure. In particular, we will connect formal verification tools, which are currently not integrated into requirements management processes, to other requirements management tools. The result will be an integrated tool suite that provides better requirements and test management and contains improved tools for C and Fortran program analysis, enabling engineers to be more efficient and more accurate in creating and certifying critical software.
The project will generate these new capabilities, as assistants to human creativity and insight in producing critical software systems: - The ability to automatically trace requirements and manage change across the application life-cycle tool chain, including into software coding and formal verification - An integrated suite of tools supporting program analysis and test management for both C and Fortran The effect of these capabilities will be more efficient and more accurate development and V & V of critical software. The immediate customers for the project are organizations performing nuclear-critical software development or independent V & V; the capabilities are equally beneficial for any safety- or security-critical software development. GrammaTech will integrate the technology resulting from this project into its current program-analysis product line and market it to its existing and new customers developing embedded software systems.

Cyber Grand Challenge
DARPA
Leading software analysis experts from GrammaTech and UVA came together to compete in DARPA's Cyber Grand Challenge, in which machines played an automated game of capture-the-flag in the name of cyber security research and development.
DARPA challenged the global innovation community with a $2M prize to build a computer that could hack and patch unknown software with no one at the keyboard. On August 4, 2016, DEF CON 24 hosted the world's first all-machine hacking tournament. Seven systems battled against each other using advanced autonomous technologies including: reverse engineering unknown binary software, authoring new IDS signatures, probing the security of opponent software, and re-mixing defended services with machine-generated patches and defenses. GrammaTech's team TECHx – with their robot Xandra – earned 2nd prize, securing a $1M prize.
To learn more about our work in Cyber Grand Challenge, click here.

Static Analysis Tools for SWRL
DARPA
The semantic web is a worldwide effort aimed at attaching machine-understandable semantic information to content, and to develop applications that make use of that information. SWRL (Semantic Web Rule Language) is a logic programming language that is becoming a standard for providing reasoning capabilities on semantic-web information. Like any other language, programs written in SWRL are not immune to programming flaws and security vulnerabilities.
Under this DARPA-sponsored SBIR contract, GrammaTech is developing a tool for allowing the static analysis of SWRL programs. This draws on work on the analysis of logic programs, and on GrammaTech's experience with the static analysis of sequential imperative programs. The tool will allow users to reason about the behavior of their SWRL programs and permit queries aimed at finding programming errors and security vulnerabilities. It will allow users to proactively assess the security and assurance properties of their programs before they are deployed. An additional benefit will be facilities for helping programmers understand SWRL programs, and for assessing the effect of proposed changes.
The tool will benefit the development of the semantic web by allowing deployers of semantic-web applications to reason about the behavior of their applications in advance. This will allow them to reduce error rates and identify security vulnerabilities.

Verification of Hierarchical Graph Structures
DARPA
Embedded systems software is being used for increasingly complex and safety-critical applications (intercontinental ballistic missile interception, landing a probe on a comet, unmanned flight control, etc). In order to ensure the safe and successful operation of these applications we must verify their safety and mission-critical properties, and reduce reliance on a testing process that can only detect the presence of faults. To ensure the safety properties of these complex systems we must augment traditional testing techniques with software verification.
GrammaTech believes that the current software validation process needs to be augmented with verification of safety and mission-critical properties. We propose to use model checking to address this verification need. Model checking is a verification technique that can be used to check cross-cutting concerns of a system such as liveness and deadlock-freedom. Model checking has been applied successfully to hardware designs, where the state space is flat. However, it has yielded only limited success in the software-verification realm due to the fact that the global state space of a program is inherently non-flat—it has a hierarchical form caused by the procedure call-return patterns corresponding to feasible executions of the program. Attempts to get around this by inlining procedures leads to an explosion in the size of the model to be checked, which causes severe scalability problems.
Exploitation of modularity has been recognized as a means of increasing the scalability of model checking. GrammaTech co-founder and President Tom Reps has exploited the inherent modularity in procedural abstraction in his work on dataflow analysis. Under this DARPA SBIR contract, GrammaTech is developing similar methods for model checking. GrammaTech is developing a novel technique for model checking of hierarchically-structured graphs based on the concept of context-free-language reachability. This technique will be applicable to cross-cutting concerns of programs. Such concerns are inherently global properties of the system, and hence must be inter-procedural in nature.

Multi-Lingual Dependence-Graph Components for Software and Hardware Analysis, Design, and Specialization
DARPA
With SBIR support from DARPA and ONR, GrammaTech is developing a toolkit of reusable, commercial-quality, multi-lingual components for building and operating on the dependence-graph representations of programs and program specifications.
Current analysis and specialization tools for software programs and hardware-design specifications use simple-minded methods that are too coarse-grained to provide adequate levels of precision and completeness. An extreme example of the consequence of imprecise and incomplete analysis is the Ariane 5 rocket failure, but less extreme examples are commonplace in both the hardware and software domains, and in the aggregate, just as costly. As systems become larger, more complex, and more expensive to correct or replace, new methods will be not merely desirable, but absolutely necessary. The need for better methods is an opportunity for GrammaTech to develop and market tools employing a new technology: deep-structure analysis. The successful introduction of this new technology to the commercial marketplace would benefit both industry and the DoD.
The difficulties inherent in the design and implementation of hardware and software systems stem from their size, their intricacy, and the interactions among separate entities within them — in short, from their enormous mass of detail. Because even the most minute detail can have a system-wide influence, it is notoriously difficult to understand, predict the behavior of, and modify software programs and hardware-design specifications. (For our purposes, both software programs and hardware-design specifications are considered "programs".) To address these issues, analytic tools are essential.
Historically, analysis and development tools have operated on the level of a program’s surface structure, with the simplest ones operating only on flat text. Tools based on abstract syntax are more precise and powerful, but even though GrammaTech has pioneered the development of such tools, we recognize that abstract syntax is still too coarse and superficial to provide more than the first steps toward a fully satisfactory analysis. Both interactive and non-interactive tools must move beyond the inherent limitations of surface-level methods.
There is a mismatch between the superficial analytic capabilities of today's interactive tools and the deep structure of the programs on which they operate; that is, their complex networks of interconnections and dependences. The reason that such tools are useful at all is that substantial manual effort can transcend their limitations. Skillfully employing clumsy tools, users build up and operate from mental maps of deep structure. Even for moderately complicated programs, however, users are often thwarted by obscure organization and the daunting mass of detail in the deep structure. Too much of the burden falls on human memory and reasoning power. For truly complex programs, analysis is so limited in scope and depth that it breaks down. More powerful tools are required both to inspect program structure with greater precision and completeness, and to organize the detail that is revealed.
Non-interactive tools are also limited by the lack of analytic depth. For example, consider a class loader for object-oriented code. Typically, only a fraction of the functionality of a general-purpose class is used by a given client, leading to the opportunity for specialization to achieve a smaller footprint and faster download time. Analysis at the level of a procedure call graph only enables removal of dead methods. Deep-structure analysis at the statement and expression level is required to identify and remove dead fields and dead code within live methods.
The need for better tools is common to both the DoD and commercial spheres, is present for all programming languages, and indeed is not restricted to software: hardware-design languages are structured symbolic systems and their analysis is subject to the same difficulties as software. The problem is fundamental: it reflects a commonality that cuts across all domains of computing, namely structural complexity that is inadequately addressed by surface-structure methods.
GrammaTech's approach to the problem is equally fundamental: to achieve the next level of sophistication, tools based on the semantics of programming and hardware-specification languages will operate directly on a representation of the program’s deep structure.

Dependence Graphs for Information Assurance of System
DARPA
Under the DARPA Information Assurance Science and Engineering Tools (IASET) Program, GrammaTech is developing a tool for static analysis of dependencies in systems, and mechanisms that exploit such dependence models to address Information Assurance & Security (IA&S) needs during design and assessment.
A system's information assurance and survivability (IA&S) requires the prevention of certain information flows and the preservation of other flows. For example, we must prevent flows of classified information to unauthorized users, and we must preserve flows of essential command and control information even when some subsystems fail or are selectively shut down in response to an attack. Dependences among and within system components provide a good basis for understanding which information flows in a system are possible, and which are not. Under IASET (Design and Assessment Tools), GrammaTech proposes to develop SystemSurfer, a tool to model dependences in systems (both inter- and intra-component), and the Information Assurance Workbench, a tool to support reasoning about those dependences to meet IA&S needs.
SystemSurfer will enable an analyst to browse and manipulate the system dependence graph (SDG) for a system. SDG vertices represent the individual actions (e.g., program statements as well as events) and predicates (conditions affecting whether an action is taken) of all the components, hardware and software, making up the system. SDG edges represent the dependences among the actions and predicates, including data and control dependences, and other system-level dependences such as synchronization and communication dependences.
SystemSurfer will support heterogeneous models, with components defined by UML, code, and HDL descriptions. Design-centered users will view SystemSurfer as a smart OO modeling tool that allows some components to be already implemented in code. Code-centered users will view SystemSurfer as a smart code browser in which the external environment can be modeled by the finite-state diagrams of an OO modeling tool.
The key technical challenge is to extend CodeSurfer®, our existing dependence technology for sequential programs, so that it is effective for concurrent systems. Ongoing improvements to CodeSurfer, funded by GrammaTech IR&D and a DARPA SBIR Phase II contract, are complementary and will directly benefit SystemSurfer. But concurrency is a new dimension to the problem of precise modeling of dependences, and presents separate challenging problems.
The Information Assurance Workbench will provide specialized operations for information assurance analysis, and will be layered on SystemSurfer's open APIs and scripting language.

Critical Vulnerability Discovery Using Big Code
DARPA
Software vulnerabilities with known exploits pose an imminent threat to software systems, needing urgent attention. "Using Components with Known Vulnerabilities" is one of the critical software flaws identified in the security advisory list OWASP Top 10. A recent Verizon report on data breaches found that 99.9% of the exploited vulnerabilities were compromised more than a year after the vulnerability was made public. Most modern software systems have significant third-party dependencies, where source code is potentially unavailable, and developers and release managers are often unaware of lurking exploitable bugs in these dependencies.
Effectively finding and highlighting these known critical vulnerabilities is very difficult using only existing traditional program analysis techniques because: they suffer from scalability and precision issues due to the lack of source code, they only check a limited set of vulnerability classes, and they are not geared to look for known vulnerabilities. GrammaTech proposes Discover, a vulnerability assessment tool to scan software binaries to find known critical bugs that are potentially susceptible to existing exploits. Discover works by building a database of known critical vulnerabilities, along with an array of their representative machine code features, and using machine learning to train a system so it effectively searches a given collection of software binaries for known vulnerabilities.

Bug Injector: Injecting Vulnerabilities for Configurable Cyber Defense
DARPA
GrammaTech is working on BUG-INJECTOR, a tool for generating cyber defense evaluation benchmarks. BUG-INJECTOR works by injecting vulnerabilities into existing software. BUG-INJECTOR is highly configurable, providing users the transparency required of a tool used to compare commercial products and perform security audits, and the customizability to enable focused evaluation of specific defensive tools, host programs, domains, classes of vulnerabilities, and even specific vulnerabilities. BUG-INJECTOR is implemented independent of leading cyber-defensive techniques, thus avoiding the circularity in which technical shortcomings limit the generated benchmark used to evaluate those very same techniques resulting in a false sense of security.
Computational systems are increasingly ubiquitous, networked, and subject to attack. The rise in cyber-attacks has spurred research and development of cyber-defensive tools. The corresponding ability to evaluate cyber-defensive tools both in general, and in specific operational scenarios is now needed. Existing cyber-defensive test suites are artificial, insufficient, static, and ad hoc, typically used beyond their original intent, and lacking the transparency and customizability needed to evaluate quality, guide development, and enable decisions in the diverse and quickly moving field of cyber-defense. For example, many government funded research programs are forced to manually build custom one-off benchmarks in order to evaluate their performers, a time- and labor-intensive effort requiring high skill levels. A Tool like BUG-INJECTOR which automates benchmark construction will permit more thorough and customized evaluation of commercial products and research results at lower cost.
BUG-INJECTOR injects bugs into dynamic traces. A "guess and check" heuristic evolutionary search allows BUG-INJECTOR to "get ahead" of existing and future cyber-defensive tools (i.e., inject bugs into concretely observed states which may be difficult to predict abstractly), ensuring the injected bugs are *independent* of leading cyber-defensive techniques, and resulting in a proof of vulnerability (PoV) (i.e., the input which caused the dynamic trace) for each injected vulnerability. Implementation on top of Clang's libtooling ensure robust parsing and manipulation of C/C++ source. I highly-configurable design increases end user trust of and control over the resulting benchmark.

GrammaTech Application Security Platform (GRASP)
DARPA
The GrammaTech Application Security Platform (GRASP) is a tool for enabling app store administrators to specify custom security policies. GRASP will automatically re-write mobile application code to ensure security policies are enforced at run-time, using a combination of static analysis and injected run-time instrumentation. GRASP will Leveraging GrammaTech's experience on performing static analysis on ARM binaries in order to target Apple's iOS platform. Many of the components of GRASP will nevertheless be target-neutral, making it a product that can inject new security policy enforcement mechanisms into arbitrary desktop or mobile application code. GRASP will make extensive use of GrammaTech's state-of-the-art binary analysis and rewriting capabilities, ensuring that administrators can add customized security policy enforcement and monitoring to COTS applications, even in the absence of the original source code.

Regenerative, INtent-Guided Systems (RINGS)
DARPA
BAE Systems, together with GrammaTech, MIT (led by Adam Chlipala) and UPenn (led by Oleg Sokolsky), proposes Regenerative, INtent-Guided Systems (RINGS) to develop an adaptive regime of regenerative programming. Regenerative programming allows an executing program to self-diagnose changes in its operational environment and automatically adapt its code to accommodate those changes. Our approach combines recent advances in formal specifications with emerging techniques for generating code variants. To detect changes in the ecosystem and/or failures in execution, RINGS incorporates runtime monitoring and verification using an intent language to guide code regeneration. Collectively, these elements promise to both increase availability of systems and reduce programmer maintenance costs.

A Framework for Aspect-Oriented Programming of Embedded Systems
DARPA
Commercial programming frameworks for embedded software that enable aspect-oriented software development and property-based debugging will require semantically rich deep-structure program representations. Such representations convey the preprogrammed generic analyses, and enable the subsequent application-driven analyses and transformations, that are needed to support weaving and composition of crosscutting software aspects. In contrast, it has been expedient, until now, for researchers in aspect-oriented programming to use only the simplest surface-structure representations of programs.
Research in aspect-oriented programming, which is still in its infancy, will advance considerably in the coming years, spurred on, for example, by DARPA's investments in the PCES project. We fully expect that effective mechanisms for aspects, such as concurrency, efficiency, effective use of the memory hierarchy, etc., will emerge in the next 2-4 years. Tools that support the automation of such mechanisms will have the potential to cut costs and errors dramatically throughout the embedded systems arena.
Under DARPA's Program Composition for Embedded Systems (PCES) Project, GrammaTech and its partner the University of Wisconsin (Madison) are studying how GrammaTech's dependence-graph technology can be adapted and applied for aspect-oriented programming.
Application of Horwitz-Prins-Reps program integration (which is based on program dependence graphs and slicing to the problem of merging independently woven aspects has particular appeal. Its potential benefits include automatic detection of interference between aspects, automatic merging of non-interfering aspects, and reduced occurrences of aspect interference.

Static Analysis of Multi-Core Applications
DARPA
The goal of this project is to develop a static analysis tool that identifies program flaws that may arise when executing on a multi-core processor. In particular, we will target flaws that arise in lock-free algorithms. Concurrent programs are often plagued by race conditions on shared data. Lock-based solutions to this problem are conceptually simple and work reasonably well. However, they typically do not perform well for the fine-grained concurrency that is often needed on multi-core processors. As the number of processor cores increases, lock contention also increases and many processors are left idle.
Lock-free algorithms achieve much greater processor utilization (and better performance) as the number of cores increases. However, lock-free algorithms are difficult to reason about. (To make matters worse, multi-core processors often use a relaxed memory model.) This project will focus on using bounded model checking in order to verify libraries that implement lock-free algorithms.

GenPatcher: Automatically Evolving Invulnerable Systems
Department of Homeland Security (DHS)
Modern civilization relies on a network of embedded devices that are riddled with security vulnerabilities and remotely hackable. Unfortunately, there seems to be near universal neglect for ensuring the security of these devices. Moreover, the existing IT solutions, such as antivirus software and intrusion-detection systems, do not provide substantial benefits for protection of embedded systems.
To address this problem, GrammaTech is developing GenPatcher, a fully automated end-to-end solution for securing embedded-systems software. To create GenPatcher, GrammaTech is integrating ongoing research on vulnerability discovery and patch evaluation with recent breakthroughs in genetic program repair. The key advantage of GenPatcher is its ability to operate completely autonomously, without requiring human interaction. Thus, GenPatcher will significantly reduce the costs of vetting and deploying critical software.
GenPatcher will operate directly on software binaries, even in the absence of source code, protecting newly developed software, third-party codes, and legacy software equally well. The technology will also be easily retargetable to different instruction sets to accommodate a variety of platforms employed in the embedded systems domain.

Multi-Platform Program Analysis
Department of Homeland Security (DHS)
The current generation of advanced static-analysis tools find vulnerabilities by exploring all possible executions of a program as configured for a single platform. Phase I research (CodeSonar® with Metronome) confirmed that a significant number of platform-specific defects may be missed if analysis is restricted to a single platform.
The next quantum leap in capability will be a system that will explore all executions for many different platforms simultaneously. We propose to develop such a system by combining a number of state-of-the-art techniques. Novel continuous integration technology will allow distribution of concurrent analyses across a farm of heterogeneous machines. Advances in our static-analysis engine will exploit machine-code analysis to ferret out subtle platform-specific differences in behavior. Intelligent test-case-generation technology will find test inputs that trigger platform-specific defects. The results of these analyses will be collated, filtered, ranked, and presented to the analyst as a single combined report. The resulting analysis system will appeal to software producers in many market segments, including communications, medical electronics, avionics, and industrial control.

Concolic Testing with Metronome
Department of Homeland Security (DHS)
We propose to build a system that combines novel automatic test generation techniques with state-of-the-art multi-platform continuous integration technology. The proposed system will automatically generate test data by using a combination of symbolic and concrete executions to intelligently explore the space of inputs. The continuous integration technology will enable the system to detect defects very early in the development cycle.

CodeSonar with Metronome
Department of Homeland Security (DHS)
The current generation of advanced static-analysis tools find vulnerabilities by exploring all possible executions of a program as configured for a single platform. The next quantum leap in capability will be a system that will explore all executions for many different platforms simultaneously. We propose to develop such a system by combining a number of state-of-the-art techniques. Novel continuous integration technology will allow distribution of concurrent analyses across a farm of heterogeneous machines. Advances in our static-analysis engine will exploit machine-code analysis to ferret out subtle platform-specific differences in behavior. The results of these analyses will be collated, filtered, ranked, and presented to the analyst in a single combined report.

Model Checking Software Binaries
Department of Homeland Security (DHS)
There is an urgent need for better tools and services to detect security vulnerabilities in software. The potential market includes software consumers as well as producers. Producers need to harden products during development and QA—for competitive advantage, and to reduce costs (in dollars and reputation) of repeated security patches during deployment. Consumers need better assurance that the COTS products they use (often in secure and critical information systems) are trustworthy—to reduce costs and risks from vulnerable software.
Under this Homeland Security Advanced Research Projects contract, GrammaTech is developing a software analysis tool that will enable third parties to examine software packages for security weaknesses. The tool will use model-checking techniques to verify security properties of software binaries. The feasibility of software model checking (for a large class of queries, and on large programs) has been demonstrated on source code, but there are many advantages to performing such an analysis on binary code. In particular, the binary code is more complete (because libraries have been linked in, or are otherwise available) and more accurately reflects what the program will do (because compiler-specific artifacts are visible).
GrammaTech will subcontract portions of the effort to its research partners, the University of Wisconsin and Symantec Corporation. This contract was awarded by the DHS Science and Technology (S&T) Directorate.

Insider Threat (Information Assurance)
Missile Defense Agency
Insider threat has been an increasingly important problem given low employee morale in today's tight economy and the increasing reliance on third-party software (COTS components) that are generally only available in machine-code form and are in many cases produced oversees. We believe that a key to addressing the problem of insider threat is the use of powerful program analyses in order to automate and simplify the jobs of security analysts as much as possible.
Further, we believe that no solution to the problem of insider threat is complete, unless it addresses the possibility of insider attacks embedded in COTS components. We propose to build on several GrammaTech technologies for static and dynamic program analysis in an effort to identify insider threat in source code, as well as in machine code.

Anti-Tamper Technology for Missile Defense
Missile Defense Agency
A reverse engineer who has the ability to examine software in an off-line lab has an enormous advantage over one who is forced to work "in the field". In contrast to the software's intended environment, a reverse engineer can work undetected, and can more easily make use of analysis tools such as debuggers or emulators if they have fine control over the system that is executing the software. One of the most challenging problems in protecting software is ensuring that it cannot be executed within such an environment.
We propose to develop a new protection system that tightly binds software to a specific hardware environment. The system relies on taking a fingerprint of unique characteristics of the hardware such as timing. Building on GrammaTech's existing tools, DARE and gtSDT, this new system will bind software such that it only operates correctly when run on hardware exhibiting the correct fingerprint, thereby preventing the software from executing if the underlying hardware is altered or replaced.

Automated Cybersecurity Testing Tools
Department of Defense
Software systems that contain known vulnerabilities are readily exploitable. Such vulnerable systems pose an imminent security threat and need urgent attention. Most modern software systems have significant third-party dependencies, where source code is often unavailable or incomplete, and both developers and software users are often unaware of lurking exploitable bugs in these third-party binary components. Multiple studies have shown that computer systems are often riddled with known vulnerabilities, and are exposed to an increasing number of attacks. Large number of known vulnerabilities are publicly disclosed in various openly-available databases, and such information is growing rapidly. According to the Verizon 2018 Data Breach Investigations Report, cyber-criminals are continuing to exploit such known vulnerabilities.
Once a vulnerability is made known, there is a race between malicious actors trying to exploit the vulnerability, and the defenders of critical infrastructure. Deployed systems must be continuously scanned for such known vulnerabilities, and repaired with patches before the attackers breach them. We are building Discover: a continuous and deep binary scanner for identifying the presence of known vulnerable components through binary software composition analysis. Discover will "scan what you execute", by deeply understanding the binaries and their library dependencies, without the source code, build information, or any other metadata. By using a combination of lightweight binary analysis and machine learning, Discover scans binaries for presence of n-day vulnerabilities, providing a comprehensive and up-to-date cybersecurity assessment, and helping users prioritize where to focus on. As new vulnerabilities are disclosed, Discover will quickly find which of the previously scanned binaries are affected by these new vulnerabilities, thus rapidly locating high risk code in software systems.

A Refactoring Environment for Ada
Missile Defense Agency
The missile defense community currently has millions of lines of Ada code in systems that are still undergoing development, or that must be maintained for decades to come. Of these systems, THAAD is among the most highly visible. Unfortunately, the use of Ada is in decline. Fewer programmers are being trained in Ada, and many vendors have cut back on their tool support. However, in few cases does it make both economic and technical sense to migrate to a new language. More often, code written in Ada must continue to be developed and maintained.
Engineers working with Ada should have the best tools available for working with legacy code. Such tools should help with program understanding, software assurance, and the disciplined modification of programs to adapt to evolving needs. For this contract, GrammaTech developed a toolset to help with modifying legacy code in a manner that reduces the risk of introducing flaws to a minimum.
In this project, we retargeted GrammaTech's CodeSurfer® and CodeSonar®to handle Ada programs. We also implemented an interface between CodeSurfer and ASIS – a standard for retrieving semantic information about Ada programs. This work has applications beyond refactoring. For example, CodeSonar/Ada can automatically find flaws in large bodies of Ada code.

Ballistic Missile Defense Innovative Anti-Tamper Techniques
Missile Defense Agency
Adversaries reverse engineer weapons systems to replicate a system's advanced capabilities or discover its weaknesses. Many of these critical systems are real-time systems. The restrictions of real-time have a significant impact on the implementation of anti-tamper technology.
This work will build on our prototype for transforming source code and machine code to add state-of-the-art protection against reverse engineering. An optimizing planner will determine how to best apply the defensive transformations in order to maximize the degree of protection without violating the real-time constraints. We will also investigate transforms that are directed specifically toward protecting architectural features of real-time platforms.

Advanced Static Analysis for Software Assurance
Missile Defense Agency
Software continues to be deployed with large numbers of flaws. Existing approaches for detecting flaws in software are mostly dynamic: they rely on the executing the software on a particular set of inputs. In contrast, static approaches consider all possible executions of the program. Static approaches have achieved some success, but to date have not realized their full potential because they are based on analysis of superficial surface structures, are not interprocedural, not whole-program, and are blind to aliasing effects.
We have developed highly-advanced static analysis technology for reverse engineering that addresses these shortcomings, and we now propose to apply that technology to finding flaws in software. Our technology computes the dependence graph representation of programs, which captures their semantics at a much deeper level. We propose a tool that will address a wide range of flaws including resource mismanagement errors, failure mode checking, division by zero defects, and illegal conversions. The challenge is to achieve accuracy (fewer false positives) and completeness (fewer false negatives), while maintaining the ability to scale to very large programs. We propose to prototype the system, and develop an experimentation plan to measure its efficacy. The proposed tool will be used to significantly enhance the quality of software systems. It will allow software developers to develop higher-quality software with lower software assurance costs.

Specification Editing and Discovery Assistant for C/C++ Software Development (SPEEDY)
NASA
GrammaTech will prototype a specification editing and discovery tool (SPEEDY) for C/C++. The tool will assist software developers with modular formal verification tasks by providing active user interface guidance in writing and editing software specifications, integrated into a common, open IDE (Eclipse). Built on an architecture that will unify source and machine code verification, the tool will provide automated suggestions of specifications for given contexts. It will be packaged as a plug-in to Eclipse's C/C++ development environment.
The result will be a tool that facilitates using formal methods by all software developers, improving efficiency and accuracy. The resulting specifications will also serve as machine-readable documentation of the software, simplifying and accelerating the task of independent V&V.

Continuous Integrated Invariant Inference (Ciii)
NASA
The project will develop a new technique for invariant inference and embed this and other current invariant inference and checking techniques in an easy-to-use tool. The result will enhance an engineer's ability to use formal methods — generating, editing, reviewing, proving and testing invariants — and improve productivity in verification and validation of safety and correctness properties software.
Currently, invariants that represent such properties require extensive human effort to write; automated techniques, though improving, are still insufficiently capable of automatically inferring them.
The project will develop innovative techniques to infer logical invariants describing the behavior of individual software modules by combining static (analyzing a program without running it) and hybrid analysis (inferring invariants from observations of executing software). In particular, the project will (a) combine concolic execution and hybrid analysis to find candidate invariants from high-branch-coverage test suites, (b) apply that combination to obtain invariants for individual functions and data structures, (c) iterate the analysis to broaden data coverage of the test suite and improve the accuracy of invariants, and (d) create early prototypes and development plans to integrate the resulting tools in selected IDEs (Eclipse and GrammaTech's CodeSonar® tool).

Static Detection of Bugs in Embedded Software Using Lightweight Verification
NASA
Validating software is a critical step in developing high confidence systems. Typical software development practices are not acceptable in systems where failure leads to loss of life or other high costs. Software best practices for high confidence systems are often codified as coding rules. Adhering to these practices can increase software readability and predictability, thereby enhancing quality.
However, adherence is limited by the lack of high-quality tools to measure adherence automatically. Checking rule conformance requires a diverse set of software analysis technologies, ranging from syntactic analysis to sophisticated inference of runtime behavior. By combining lightweight verification techniques with other scalable analysis techniques that target syntactic and other static properties, we will create a tool that flags violations for almost all the rules typically applied to high-assurance code.
The presence of a tool to define and check coding rules will make it easier for engineers and project managers to apply best practices to projects that would otherwise ignore them because manual inspections are too cumbersome. We expect the Phase II work will be applicable in any industry that develops high confidence software. Department of Defense projects often adopt their own coding standards for safety-critical software (for example, the Joint Strike Fighter project). Defense contractors can use the proposed tool to identify non-compliant code cheaply, increasing productivity and software quality. Other industries that develop high-assurance code, such as the automotive, medical device, and banking industries, can apply the tool to enhance their own development processes.
The proposed tool will be applicable to all NASA centers and contractors that develop or validate flight software written in C/C++. Flight software must meet a higher standard of quality than typical general-purpose software, and a common technique for encouraging quality software is by applying coding rules. For example, Holzmann's Ten Rules are being adopted for mission-critical flight software across JPL. The proposed tool will search source code and flag those parts that do not conform to the rules.

Static Analysis for Automatic Differentiation
NASA
Scientific computing is an important branch of computer science with applications in many different industries including aerospace, defense, automotive, financial, environmental and energy. Scientific computing algorithms such as optimization, nonlinear equation solving, and modeling require the use of derivative functions.
There are many options for computing these derivatives. They can be created manually, but this is highly tedious and error prone. They can be created symbolically using a mathematics package such as Maple or Matlab, but this is not directly applicable to computer programs written in a language such as C, and either fails or becomes computationally infeasible for large programs. They can be approximated using finite differencing, but this can be inaccurate and slow.
Automatic differentiation (AD) (also known as computational differentiation or algorithmic differentiation) has emerged as an alternative technique for computing derivatives. Given a program that computes a numerical function F(x), an automatic differentiation tool creates a related program that computes F'(x) (the derivative of F). Automatic differentiation works by systematically applying the chain rule of differential calculus at the level of the built-in operators and libraries of the language. Because of this, it is not subject to the same numerical errors as finite differencing.
Recently, automatic differentiation has seen a surge of interest, corresponding to the appearance of automated techniques for applying it. Its advantages are clear and proven. However, it is still used in only a handful of research laboratories. The low level of uptake can be attributed to the fact that the tools are still in their infancy, and still suffer from scalability, usability, and performance problems.
Automatic differentiation can be implemented either as a source-to-source translation, or by means of overloaded operators and reinterpreted operands. The advantage of using overloaded operators is that it is easy to implement and the source code does not need to change much. However, it has the disadvantage that it is inefficient. There are two main sources of inefficiency in the overloaded-operator approach. First, the approach is context-insensitive —it cannot exploit local context-sensitive information to improve efficiency. Second, the technique must be applied globally to all statements, regardless of whether they contribute to the computation of the derivative or not.
The great advantage of source-to-source translation is that analysis of the source code can be used to greatly improve the efficiency of the derivative. The chief disadvantage is that static source-code analysis tools and source-to-source transformation tools are generally large, complicated, and difficult to implement.
GrammaTech is a leading provider of static analysis tools for computer programs. Our technology computes the dependence-graph representation of a program, in which program points such as statements, functions, parameters, etc., are joined by an edge if there is a data or a control dependence between them. Queries are provided for inspecting the graph—for example, a slice takes as input a set of program points and returns all program points that contribute to the values computed at those input points. The query algorithms are not simple straightforward graph reachability—they must only return results that correspond to feasible executions of the program.
Many of the questions that must be answered in order to do efficient AD can be cast in terms of queries on the dependence graph. For example, identifying the set of variables that contribute to the derivative can be done by doing point-to-point graph reachability between the independent variables and the dependent variables.
We propose to develop a system for automatic differentiation using source-to-source transformation methods. The system will make use of our advanced static analysis capabilities to generate efficient derivatives.

A Refactoring Environment for Incremental Migration of Legacy Code
NASA
The rate of rapid change in the computing industry in the last few decades has meant that systems with long lifetimes often rely on software written in languages that are no longer being actively or widely used. Typical of such systems are large military and aerospace applications such as the Air Force's B2 bomber, which relies on software written in Jovial.
Similarly, many Navy applications are programmed using CMS2, and NASA has systems written in HAL/S. Older systems, such as for the Navy's E2C aircraft, are even written in custom assembly languages for processors that are no longer manufactured. None of these languages are used in the mainstream. Programs written in these languages are expensive to maintain for several reasons.
We propose to develop a system for the incremental transitioning of programs from one language to another that uses Refactoring as a fundamental operation. The system will consist of three parts. A legacy source refactoring editor will allow a user to transform selected portions of the code written in the legacy language into a form suitable for translation to the new. An incremental translator will translate these selections into the new language. Finally a target language refactoring editor will allow the user to transform the translated code so that it is understandable and maintainable.
The resulting tool will be of use not only for translating code from one language to another, but also for helping maintain code that is not translated.
We propose to develop technology that will be applicable to many languages. However, for this project, we will focus on the translation of Ada to Java. Despite its clear technical superiority to many other languages, Ada (at least in the U.S.) has the reputation of being a niche language. Tools for Ada have traditionally been much more expensive than similar tools for more popular languages. Trained developers are in short supply, and since the advent of Java, few schools teach Ada, so the supply is getting tighter. Since the withdrawal of the congressional Ada mandate that all defense contractors use Ada for military systems, many contractors who had been chafing at the requirement are now starting to transition their systems to other languages. Although this is a sad state of affairs for the Ada market, it represents a significant commercial opportunity for legacy code translation tools.

Inlined Reference Monitors for Java Bytecode
NIST
The problem of information security has become critical because of the growing dependence of the economy on complex networked information systems. Specification and enforcement of security policies is difficult even when policy-setting authorities have complete control over and knowledge of the target software. In an environment where mobile code is being used, security policy enforcement is even more difficult because little is known about the code being executed.
Under an NIST SBIR contract, GrammaTech will develop mechanisms for specifying and enforcing security policies for mobile code that work by inserting fragments of code into programs in order to monitor their state and prevent them from violating security policies. The proposed system will allow arbitrary policies to be specified independently by different policy-setting authorities. We will apply this approach, named Inlined Reference Monitors (IRMs), to Java bytecodes. We believe that advanced static-analysis techniques, in particular those embodied in our own dependence-graph technology, are crucial to allow this to be done efficiently and fully automatically.
The IRM approach is important because administrators and users can transparently tailor policies on a per-application basis, without requiring access to source code or operating system internals. We believe that this approach holds great promise for enforcing security policies, and propose to transition these techniques from academia and commercialize them for widespread use.

Crowdsourcing Static Analysis Annotations
National Science Foundation
As powerful and useful as modern static analysis tools are, their impact could be even more significant with the addition of comprehensive annotations on many widely used libraries. Static-analysis tools analyze the code that is provided to the analysis. So, when the analyzed code interacts with an external module – or system call, or hardware device – the analysis tool must either make assumptions about how that module operates or rely on annotations that describe the behavior of the external components. For this reason, the effectiveness of a static analysis tool grows with the availability of correct annotations.
Library annotations are necessary for almost any check that a static analysis tool might apply. Sometimes, the need for annotations on the library interface can be addressed by supplying the source code for the library, but source code is not always available and some library code is too complicated to analyze automatically. Maximizing the effectiveness of state-of-the-art flaw-finding tools requires a large body of annotations for the multitude of libraries in use today.
The benefits of annotations in the process of creating secure software are known, but the practice needs to be invigorated through a combination of tools and support systems. Within this project, we will devise tools, technology, and infrastructure that will support and encourage participation in the creation and maintenance of software annotations.

Defenses Against Malicious Code
National Science Foundation
Malicious code, including worms and viruses, causes huge economic damage every year. Estimates of the worldwide economic impact of malicious-code attacks vary, but Computer Economics, a leading independent research organization, estimates that the impact was $10.7 billion in 2002.
The Code Red worm (2001) infected fewer than 7,000 machines during its first three hours. In contrast, the Blaster worm (2003) infected over 100,000 machines in its first three hours. Research conducted by Weaver at U.C. Berkeley suggests that attackers may soon launch worms capable of infecting all vulnerable hosts on the Internet in 15 minutes to an hour. Paxson and Weaver estimate that a worst-case worm could cause as much as $50 billion in economic damage in the US.
The key idea of this proposal is to protect programs in advance by making system calls inaccessible to an attacker. System calls are the mechanism by which processes interact with their environment (e.g., to do input or output, create new processes, change permissions, etc.). Malicious code usually works by taking control of a legitimate process and using the process to either execute attack code directly or spawn another process that executes attack code. Both approaches rely heavily on the ability to use system calls. If system calls are inaccessible to an attacker, it will be extremely difficult to use a hijacked process to cause harm.
The proposed system will restrict access to system calls using a multi-layer hardening strategy that employs security in depth: remapping of system calls, dual (source and binary) obfuscation, and monitoring run-time behavior by comparing it against an expected, pre-computed model.
The technology will perform transformations on a program's binary and source code (if available). Scrambling the system call mapping can be performed through a binary-to-binary transformation. Good obfuscation can be performed with binary-to-binary transformations, as well. However, better obfuscation can be performed when source-to-source transformations are added to the binary-to-binary transformations, resulting in a higher degree of assurance. A better model of a program's behavior can be constructed from source, as well.

An Aspect-Oriented Unit Testing Solution
National Science Foundation
As software systems become more complicated, the problem of testing them effectively gets harder. Unit testing is a methodology for testing small parts of an application independently of whatever application uses them. It is time consuming and tedious to write unit tests, and it is especially difficult to write unit tests that model the pattern of usage of the application they will be used in. It is common for an application to expose a bug in a module that the unit tests for that module missed entirely. A better approach is to monitor the application as it runs its regression tests, and to capture the events at the boundaries of the module of interest. This log of events can then be replayed to a test harness that exercises the module independently of the environment in which it was originally tested.
A system for implementing such a strategy would need to analyze the application code, identify the sites that constitute the boundaries of the module, and then essentially rewrite the application code to capture occurrences of such events and log them to a file. Such a system would be highly complicated and expensive to produce. Fortunately, we can leverage work done in a field of computing that addresses just this issue-the field of Aspect-Oriented Programming.
Aspect-Oriented Programming (AOP) addresses the problem of separation of concerns in programs. An aspect is defined as a crosscutting property of a program. For example, an aspect might be the property that states that all exceptions must be logged to a file. Traditional programming methodologies require that the code to implement the aspect be replicated everywhere the aspect should be enforced. This makes it tedious and time consuming to do in the first place, or to modify after the fact. AOP is novel because it allows the user to specify the aspect in one location, independently of the target code. An aspect weaver is software that then merges the aspect with the original code.
The unit test problem is one that is well suited to AOP. Roughly speaking, the aspect would specify that all events at the boundary of a module, e.g., a call to a method of a class in that module, would be logged to a file. The weaver would insert the code to do the logging into the application. Then, when the application is run with its own test data, its input and output to the module would be captured.
The pre-eminent tool for doing AOP is AspectJ, which is being developed by Gregor Kiczales (the inventor of AOP) and his team at Xerox PARC. Under an NSF SBIR contract, GrammaTech is studying how AspectJ can be used to develop a tool to capture an application's interaction with a module, then to allow that interaction to be played back to the module to confirm that its behavior is identical.

Dependence Graphs for Dynamic Internet Technologies
National Science Foundation
The Internet has been a significant force in propelling a strong U.S. economy for the last decade by enabling e-commerce and increasing industrial productivity. The Internet technologies (Java, Active Server Pages, Web scripting languages, etc.) that have enabled this boom are used in a manner that is quite different from more traditional technologies. They are highly dynamic and interpretive, and programs often involve concurrency.
However, the software tools used to construct these systems, and to reason about their behavior generally do not provide capabilities for effectively dealing with highly dynamic, concurrent systems of this kind. This leads to systems that have software faults, security vulnerabilities, or are simply inefficient. The resultant system crashes, security breaches, poor performance, and general lack of business confidence about security and privacy on the web can have a significant impact on the overall U.S. economy. Given the extra complexity found in dynamic and concurrent systems, the need for better software tools is critical.
Under this contract, GrammaTech is studying how GrammaTech's dependence-graph technology can be adapted and applied to highly dynamic internet technologies such as Java, JavaScript, and ASP. If dependence-graphs can be successfully applied to dependence graphs a new-generation of software development tools will emerge that can deal with the dynamism and concurrancy of internet applications.

Software System Reliability Analysis
United States Army
System reliability is a fundamental requirement for safety-critical weapons systems. A key challenge is identifying reliability problems early so that they can be fixed quickly and cheaply. Reliability problems are often integration problems: integration often reveals that flaws that seemed minor in isolated components but lead to serious system-wide reliability problems. The combination of static source code analysis with data mining offers the potential to identify reliability problems before integration. By applying data mining to the data generated through static analysis, we can recognize patterns that indicate when a software system is becoming a reliability risk.
We propose a tool that will flag components that are likely to cause integration problems later in development, giving engineers time to investigate and solve problems before the expensive integration step. The proposed tool will collect metrics from the target system; learn patterns for correct and faulty system both passively and interactively; refine the patterns when new data is gathered; and issue reports pointing out software modules that pose reliability risks.

Decompilation-Enhanced Binary Static Analysis (DECOMP)
United States Army
Software defects are frequently being exploited for malicious purposes, leading many organizations to now mandate static inspection of software for cyber vulnerabilities. Static analysis of source code during the development stage is a common practice. However, developers often don’t have access to source code for third-party libraries. Access to any source code during software deployment is rare. This leaves cyber security analysts with the challenge of inspecting binary software. While powerful analysis tools that operate on machine code have emerged, using these tools is not trivial. Inspecting analysis results in machine code requires highly skilled users. And even a skilled user will require more time to review logic represented in machine code than a similar logic in source code. This gets even worse once static analysis warnings are reflected on top of the machine code. As a result, binary software vulnerability inspections are costly endeavors.
We propose to combine machine code vulnerability analysis with decompiler technology to make it easier for analysts to understand static analysis results. This will enable users to review analysis results in a high-level language that is more familiar—and faster to understand—than the machine code the analysis is performed on.
Expected Commercial Benefits
Conducting binary analysis with today’s tools requires access to highly-skilled analysts and a substantial investment in time. These costs inhibit many organizations from adopting binary static analysis tools, leaving them unable to effectively audit binary software for vulnerabilities.
The proposed solution will directly enhance today’s binary analysis technology to lower the barrier to adoption and reduce the cost of using the technology. This will make it easier for customers to analyze binaries for cyber security vulnerabilities and evaluate vulnerabilities against MISRA, CWE, OWASP, SEI CERT and DISA STIG among others.

Immersive Environments for Visualizing Software Tradeoffs
United States Army
Building on a prototype created in previous work, GrammaTech is developing a tool to help users explore various aspects of extremely large software systems.
The tool will combine GrammaTech's powerful software analysis technologies with a 3-D visualization platform that allows users to examine a variety of properties of their software systems at many levels of detail.

Software Anti-Tamper for Real-Time Systems
United States Army
We are developing a tool that protects software from reverse engineering by analyzing and rewriting software binaries with anti-tamper techniques.
Existing approaches to rewriting binaries are typically limited to simple patches. In contrast, GrammaTech's approach involves creation of a high-precision fine-grained intermediate representation of the binary that allows massive modification and restructuring of the program.

Semantics-Aware Malware Detection
United States Army
For concerns of intellectual property, software companies generally distribute their applications and COTS components in binary form-without their original source-code implementation. Analyzing software binaries is important for a number of reasons including quality assurance, software protection, and reverse engineering. For this contract, GrammaTech leveraged the success of CodeSonar®/C to produce a binary-analysis tool – CodeSonar/SWYX – that can satisfy these needs. The mnemonic SWYX (See What You eXecute) captures the notion that the analysis is performed on the actual binary code that executes, rather than the source-code version that is later translated by the compiler. (Analyzing the code that executes may be preferable even if source code were available.).
A salient feature of CodeSonar/SWYX is the use of TSL – Transformer Specification Language – for defining the semantics of machine-code instructions. TSL provides an architecture-independent way to specify analyses: it enables an analysis, defined only once, to be applicable to machine code of any computer architecture for which the semantics of the instruction set is defined. Thus, our approach to the implementation of CodeSonar/SWYX is expected to produce vulnerability detection for additional architectures with a lesser investment than would be required to produce such a tool in ad-hoc manner. Our use of TSL is the result of a close collaboration with colleagues at the University of Wisconsin-Madison.
CodeSonar/SWYX can detect familiar vulnerabilities, such as buffer overruns, null pointer dereferences, division by zero, and unreachable code.
Additionally, tight integration with the CodeSonar/C COTS product gives CodeSonar/SWYX the ability to analyze projects comprising binary and source-code components. In particular, the reuse of CodeSonar/C's models of library functions enables CodeSonar/SWYX to detect vulnerabilities due to misuse of common API, such as double-frees, uses of memory after it has been freed, and file-access race conditions. More generally, any project relying on a COTS component provided as a library, may benefit from the mixed-language capability.

Information Retrieval Techniques for Software Design Visualization
United States Army
Large software systems typically comprise many components written in many different languages that communicate through complex interfaces. The code base will consist of a mixture of source code, configuration files, test vectors and other artifacts. Tools are needed to help software engineers understand the architecture of the software and the role of all of the components in the code base. We propose the development of a tool that will support a wide variety of software engineering activities, and which will eventually scale to tens of millions of lines of code.
Our approach is to provide two different features: an intelligent search engine for the code base that uses latent semantic indexing, and a hierarchical visualization tool that will present clustered views of items in an intelligent and interactive manner. These seemingly disparate features will both use the same underlying mechanism: information retrieval techniques that compute statistical measures of similarity between documents. We will leverage our experience and capabilities in static analysis to adapt these techniques so that useful facts are extracted from the code base. Our previous experience with the automatic generation of UML diagrams from code will inform the design of the visualization component.

Mathematically Rigorous Analysis of Software Binaries
United States Navy
Software is rarely written entirely from scratch. Typically, third-party commercial off-the-shelf (COTS) components are integrated into larger software systems used both in the commercial sector and in critical infrastructure. Third-party components often come in binary form, e.g., as dynamically linked libraries, Active X controls, or plain executables. That is, the source code for those components is typically unavailable and the debug information is stripped. Additionally, to hamper reverse-engineering attempts, the binaries of those components are often further protected with anti-tamper techniques and obfuscations. The lack of source code for third-party components prevents most existing security-analysis tools from exposing the vulnerabilities and malicious behaviors harbored by those components themselves, as well as by software systems that integrate those components.
We propose to design and build a tool that will conduct rigorous analysis of machine code to assess its quality. The tool will automatically identify vulnerabilities in third-party components and will assist security analysts in spotting unexpected and potentially malicious behavior in the third-party code. The proposed tool will integrate with existing GrammaTech source-code-analysis tools to boost their effectiveness in dealing with third-party components and libraries.

Deep Understanding of Complex High-Assurance Hypervisor Source Code
United States Navy
Hypervisors offer a virtualization platform that is cost effective and attractive from a security point of view because guest operating systems are independent of each other. However, these claims of independence must be certified before it is permitted to use a hypervisor in a security-critical environment. The cost to perform a Common Criteria security evaluation of such low-level system code is very high, and the complexity of the code often thwarts automated tools that could help.
We propose to work on advanced static analysis techniques to help reduce this cost by providing user interfaces that aid a user gain understanding of the functionality of the code. When risky features are identified, the hypervisor can be refactored to remove them. The same static analysis techniques can be used to help assess the impact of the refactoring on the remainder of the code. These techniques will include advanced versions of program slicing and chopping, and software model checking. Variations of more superficial techniques will also be explored. The work will build on our existing static analysis platform. We will work closely with the customer and with existing customers involved in performing such certifications.
The work described in this proposal is expected to lower the cost to perform a detailed security evaluation of low-level system code, especially for Common Criteria certification. This includes embedded systems for weapons, avionics, cryptographic devices, and communications. The same techniques will useful for doing safety analysis, and so will be applicable in non-military contexts, including medical devices, transportation, and process control.

Tools for Software Architecture Visualization
United States Navy
The problem of supporting large mixed-language software systems is of great importance to the Navy and beyond. Often the only reliable source of information about their architecture is the source code itself. Having a reliable way to automatically create visualizations of the architecture is key to understanding the system, and for planning changes. Under this contract, GrammaTech is developing of a set of tools for automatically extracting UML design diagrams directly from the source code. The techniques are mostly based on a static view of how objects in the code interrelate, and will allow the extraction of the class diagram, the object diagram, the interaction diagram, and the state diagram.
We are also exploring additional techniques for helping users understand the structure of less well-structured programs. These tools will be based on our existing static analysis technology, which is already highly developed for C and C++, and for which a version for Ada 95 is currently under development. The tool will produce output in the form of XML objects in a standard format that can imported into existing UML design tools.
Many businesses have a need for tools to help support the maintenance of legacy mixed-language systems. The tool we propose to develop will reduce the expense of adapting existing software for new purposes. It will support C/C++ and Ada, so it will have wide appeal in the commercial and military/aerospace sectors.

Model Checking UML Designs
United States Navy
Real-time embedded systems are increasingly prevalent in military and civilian applications. They are often both complex and safety critical in both military applications (missile guidance systems, weapon control systems, etc.) and civilian applications (flight control systems, automotive braking systems, etc). To deal with their inherent complexity and safety requirements, these applications require an advanced design process and sophisticated software-engineering tools to support the process. Such a process would be characterized by rigorous and complete specification capabilities, robust automated translation from design to implementation, support for debugging of the designs at the semantic level of the design specification language, and sophisticated verification and analysis of design models.
In the domain of embedded-systems design, the Unified Modeling Language (UML) is a first step towards these goals; however it does not adequately support the specification requirements of complex embedded systems. Rational Software has attempted to address this by embedding the ROOM methodology for concurrent embedded systems in its UML-RealTime extension to UML and Rose RealTime UML tool. Rose RT is a best of breed COTS tool that builds upon UML-RealTime by adding robust automated code generation and support for simulation and debugging at the UML semantic level.
Current best practices and tools are inadequate for complex mission critical embedded applications because they do not provide adequate verification and analysis of the design models. This lack of support for automated verification and analysis undermines the specification capabilities of current COTS tools. Rational Rose for RealTime allows users to specify the language (in state machine form) of the communication protocol with which components (referred to as capsules in Rose for Realtime) interact. Despite being a powerful design specification capability, this is rarely used in practice because there is no way of ensuring that the components' behaviors will actually obey this protocol. The lack of formal verification capabilities has reduced a powerful design specification technique to mere documentation. This reduced level of specification and complete lack of verification capabilities undermines the safety critical requirements of today's complex embedded systems.
Under this NAVSEA contract, GrammaTech is integrating model checking Rational Rose for RealTime. Our vision is to have model-checking technology tightly integrated with cutting edge design environments that support automatic code generation. The goal is to decrease the error rate in software designs, and then to use automatic code generation to ensure design-to- implementation consistency. We believe this will have a significant impact on both software engineer productivity and overall software quality.

Lean and Efficient Software: Whole-Program Optimization of Executables
United States Navy – Office of Naval Research
Modern software is typically produced using home-grown or third-party libraries and pre-existing components. Consequently, a finished executable often contains unneeded code, duplicate defensive checks, and extra layers of procedure calls. Such bloat contributes to excess memory footprint, slower performance, and security vulnerabilities (by hosting more return-oriented-programming gadgets an attacker can hijack).
The Layer Collapsing project is devising and prototyping techniques to substantially improve the performance, size, and robustness of binary executables. We are using static and dynamic binary program analysis techniques to perform whole-program optimization directly on compiled programs: specializing library subroutines, removing redundant argument checking and interface layers, eliminating dead code, and improving computational efficiency. A tool that successfully implements this goal will dramatically improve the way software is developed and deployed, providing new optimizations available late in the development process or even by the end user.

ARTCAT: Autonomic Response to Disruptive Cyber Attacks
Office of Naval Research
Cyber-security is widely recognized as a pressing problem for national security and the economy in general. Software is notoriously hard to perfect before it is deployed, leaving our critical software infrastructure open to attackers with the patience and skills to detect and exploit vulnerabilities. The resulting cyber-attacks impose an ongoing cost in privacy violations, lost data, and disrupted businesses and lives. In a crisis, such attacks could endanger lives or national security by disabling or subverting the weapons or logistics systems that our citizens and warfighters rely on. System operators need security solutions that can detect, understand, and respond to sophisticated cyber-attacks quickly, before damage is severe.
Modern security monitoring and response systems’ limitations often prevent effective response. They try to detect programs deviating from their intended behavior, but they usually have little knowledge of what constitutes desired behavior. Instead, monitors check for a very rough approximation of intent, allowing clever attackers to subvert programs undetected. If an attack is noticed, the true status, severity and damage are difficult to learn from an avalanche of cryptic event messages. When an attack is detected and understood, operators still must wait for the vulnerable software to be patched before they can defend successfully. Above all, current responses require humans “in the loop” to diagnose the problem, create patches, and reconfigure the system, making successful attacks inherently disruptive and costly.
ARTCAT has several advantages over current practice. We use a runtime monitoring approach—dynamic translation—that gives us full visibility and control over a running system, including the operating system kernel and all processes. ARTCAT’s monitors can use a model of expected behavior—created by application developers with assistance from code analysis—to distinguish good behavior from bad, detecting attacks even when they do not use standard hijacking techniques like control-flow diversion.1 ARTCAT incorporates a System Reasoner monitor for collating information from many applications to give a holistic picture of mission health rather than merely grouping suspicious system events and asking operators to determine their importance and impact. When an attack is suspected, the system can adjust its monitoring strategies to ensure critical events are observed. When an attack is confirmed, ARTCAT can respond automatically. First, it quarantines the affected components. Next, it attempts to automatically harden the targeted program against similar future attacks by rewriting the binary to block or mitigate the attack.
If successful, this project will develop and demonstrate an integrated system where critical software can run safely and harden itself against attack with minimal human oversight. Defensive responses to attacks will be executed at cyber speed, without human intervention. ARTCAT will reduce the burden on cyber-security specialists, allowing them to manage many more systems, and allow users with less experience and expertise to protect systems that would otherwise be unmanaged and very vulnerable to attack. Ultimately, this approach will reduce the damage caused by cyber-attacks—and the cost of preventing cyber-attacks—by making these attacks easier to detect and minimize, and more expensive to mount.

Enhancing Code Awareness in Software Development Environment
United States Navy – Office of Naval Research
Large software systems are built by many people collaborating over long periods of time. Most large pieces of software are sufficiently complex that no individual completely understands all of its subparts. Yet in order to make progress, programmers must operate in this environment of incomplete information. In many cases, programmers make poor design and implementation decisions because of a lack of awareness of how pieces of code interact with each other. These poor design decisions can lead to defects that delay deployment, create security vulnerabilities, cause failures in the field and badly bloat development budgets.
Advances in program analysis and visualization now give software engineers powerful tools for improving software understanding and quality. Modern static analysis tools can scan millions of lines of code and identify subtle but dangerous problems like buffer overruns and race conditions. Techniques like concolic execution, machine learning, and SMT solvers have enabled new techniques for extracting logical and structural information from source code. Modern graphics hardware has the power to display complex large structures in an easy-to-use form. However, these tools are typically available as standalone applications; using them requires a deliberate choice by an engineer to stop coding and start analyzing. Consequently, analysis and understanding tools are often underutilized. Even when they are used, there is a gap between time-of-writing and time-of-analysis, in which information is forgotten.
In this project, we will develop the Code Awareness Assistant (CA2), a virtual coding partner that looks over an engineer's shoulder and gives extra context while pointing out problems as they are introduced. CA2 will be integrated with the engineer's development environment, and give the engineer a broader view of the code than is available from the editor window. CA2 will visualize dependence information at both small scale (what functions call/are called by this code) and large (what packages are required for/affected by this code), and support interactive visual exploration of the system's structure. CA2 will automatically extract key facts about code such as: conditions required for safe execution, use of common idioms and patterns, effects on variables, and required components. CA2 will flag changes to key facts, indicating a possible deviation from original design goals (for example, a procedure that is called in more contexts than before, and may not have been written with generality in mind). CA2 will also flag common software flaws like buffer overruns, memory leaks, and unreachable code sections. This immediate feedback will allow developers to quickly converge on cleaner, safer code; security, maintainability and other software quality issues will no longer get short shrift.

Stealth and Real-Time Program Execution Monitoring
United States Navy – Office of Naval Research
Modern computer systems involve complex arrangements of many software components. It has proven difficult to secure such systems from attack by finding and closing all security holes. Dynamic monitoring techniques that detect intrusions have been developed to defend against latent, unknown vulnerabilities. However, to date these monitoring techniques have focused too narrowly on specific detection strategies and can often be sidestepped.
We propose a next-generation system monitoring platform capable of supporting a wide variety of monitoring strategies. Moreover, our approach provides comprehensive protection for the entire computer system rather than guarding individual processes one-at-a-time. The proposed system monitoring tool incorporates stealth to inhibit an adversarys ability to disable it, and dynamic optimization to ensure minimal performance overhead on the protected system.
Computer security is critical to both national security as well as the private sector. Breaches in security may result in loss of sensitive data and compromise the operation of critical infrastructure. The proposed technology will provide a next-generation tool for detecting and preventing such attacks. Systems deployed with the proposed monitoring system will be more resilient to attack than those systems using older protection systems or none at all.

Deterministic Detection for Hijacked Program Execution
United States Navy – Office of Naval Research
Real-time detection of attacks against software has the potential to revolutionize the software industry. Insecure software is an enduring problem that contributes to huge financial losses and endangers national security. The persistence of the software security problem can be understood as a consequence of the economics of (i) software production and (ii) exploit production.
Software producers have few incentives to ensure that their software is secure. The average customer will usually be unaware that his/her system has been compromised. Even if they determine that their system is compromised, they will rarely know how it was compromised. They will have no idea what software contained a vulnerability that allowed the intrusion. Mention "malicious PDF" to an average user and you are likely to get a wide-eyed look and the question, "PDF can be malicious??" In this marketplace, vendors are rewarded for providing features and rarely punished for providing insecure software.
Hackers can often achieve high (financial, political, or other) rewards with very low risk. Even when their exploits are detected, it is often long after the fact. This vastly simplifies the job of eradicating any evidence that may implicate the hacker, or fabricating evidence to point to someone else.
Real-time detection of attacks has the potential to dramatically change this situation. Real-time detection will give consumers immediate feedback that they are under attack — and demonstrate which vendor(s) put them at risk of infection. Professional embarrassment, if nothing else, will cause software producers to pay the appropriate attention to security. Real-time detection will also make it much more likely the attackers will be caught. In summary, real-time detection of software hijacking will dramatically change the incentives of software and exploit production, which may be the only way to achieve realistic improvements in software security. These goals may be realized even if adoption of the detection technology is not universally deployed.
We propose technology to automatically detect software exploits that hijack the execution of a running Linux kernel or one of its hosted applications. The proposed technology will be integrated with other technology at GrammaTech for monitoring and modifying the execution of a guest virtual machine (provided by KATE, the Kernel Analysis and Translate Engine), analyzing, diversifying, and confining execution of software of uncertain provenance to prevent exploits against certain classes of vulnerabilities (provided by the PEASOUP tool for Preventing Exploits Against Software Of Uncertain Provenance), and diversifying entire systems (provided by the PEASYS tool for Preventing Exploits Against SYStems).

Safety in Numbers
United States Navy – Office of Naval Research
The global information grid has changed the nature of what is possible in a very short time span. One important effect is the new ways in which communities form and gain leverage on previously intractable problems. For example, it is now possible for an encyclopedia written by volunteers (Wikipedia) to rival those that are collated by professionals.
Similarly, Linux has demonstrated that operating systems are no longer solely the domain of well-funded, commercial enterprises. Google's online email system, Gmail, leverages feedback from its user community to create effective spam filters. Programs such as Folding@Home rely on a community willing to donate spare computational cycles to help with research on causes of disease and the development of new treatments. On the darker side, malicious actors are able to co-opt tens of thousands of user machines to create artificial communities (botnets) that launch cyber attacks or send spam.
This work is concerned with developing a new approach for improving the security of computer systems, one that leverages the new realities of the global information grid, rather than fighting them. As mentioned above, online communities can provide new leverage on intractable problems. In this proposal, we focus on the problem of detecting malicious code, although we anticipate that the work will also be applicable to defect detection.
This effort will build on prior research in program analysis and in community-based computation. The relevant research on program analysis includes work on machine-code analysis, automatic malware detection, and defect detection. The relevant research on community-based computation includes work on grid computation.

Modernization of Legacy Software
United States Navy – Office of Naval Research
As hardware platforms age, manufacturers are less willing to support them and related technologies. Legacy weapons systems must eventually be modernized, but re-implementation of legacy software by hand is prohibitively expensive. A system is needed for automatically translating software written in obsolete languages like CMS-2 or assembly to a modern language like C.
Previous attempts at translation have amounted to transliteration: they produced "high-level" code that closely mimics the low-level code. Some attempts even use global variables for registers. Translators that do more than this often assume that the low-level code was generated by a compiler, and hence adheres to certain conventions; such assumptions do not hold for legacy code. Under this OSD/NAVY SBIR contract, GrammaTech is developing a system for modernizing legacy software based on state-of-the-art static analysis.

Programmable Interfaces for Advanced Static Analysis
United States Navy – Office of Naval Research
The security and prosperity of the nation are increasingly dependent on information systems. They are critical to the operation of our civil infrastructures, as well as the armed services. The need to ensure the security of these systems is of paramount importance.
Software security is compromised by both deliberate and inadvertent acts. Malicious code is code that has been intentionally added to or changed in a software system to cause harm or to subvert the intended function of the system. Examples of malicious code include time bombs, viruses, worms, and backdoors. Bugs are the unintended result of innocent human frailty, unsafe programming languages, and inadequate tool support—but bugs lead to exploitable vulnerabilities. Once an adversary has discovered an exploitable bug, its potential for harm is as great as malicious code inserted by insider attack. Indeed, malicious code is often disguised to look like a run-of-the-mill bug to cover the tracks of the perpetrator.
Using advanced static analysis offers the potential to find both bugs and security vulnerabilities in code. GrammaTech’s system, named CodeSurfer®, is being applied to solving these problems. For example, the University of Wisconsin is using CodeSurfer in its Critical Infrastructure Protection, Adaptable Software (CIP/SW) University Research Initiative (URI) project, “Vulnerability and Information Flow Analysis for COTS”. In this Office of Naval Research-sponsored CIP/SW project, Wisconsin is using CodeSurfer to develop techniques for detecting vulnerabilities in both C source code and x86 machine code (COTS binaries).
To facilitate software engineering and security applications like the Wisconsin work, GrammaTech is creating a premier toolkit for static program analysis. Under an Office of Naval Research (ONR) contract, we are building mechanisms and interfaces to support a language-independent, iterative, and programmable approach to the organization of static program analyses and transformations of intermediate representations. With the API, users of the system will be able to program their own custom static analyses. The architecture is very flexible; user code will access the API in whatever iteration scheme is most suitable for the application, be it code inspection, code understanding, bug detection, malicious-code detection, or another application.

Model Checking of Software Designs
United States Navy – Office of Naval Research
With ONR support, GrammaTech and sub-contractor Formal Systems (Europe) Ltd. are integrating FDR2, Formal Systems' advanced state space exploration tool, with the Rational Rose RealTime UML environment. This integration will enable software engineers to formally verify critical properties of the design of embedded systems. Design verification allows developers and designers to detect errors early in the development life-cycle, reducing the substantial time and cost of correction, making "on-time and under-budget" a reality.
Formal methods offer great promise for the elimination of errors in distributed, realtime, networkcentric warfare software, but before they can be widely adopted in industry, they must be supported by good tools readily acceptable to professional programmers. Existing formalmethods tools are hobbled by weak userinterfaces, and severely limited capabilities for integration with other tools. In an existing SBIR Phase II project, GrammaTech is seeking to remedy these problems by developing a professionalquality, standardscompliant, customizable, portable, common interface tailored to formalmethods applications. GrammaTech proposes to complement the existing SBIR project by building a vendor-independent UML communication layer, so that formal methods tools can be integrated with different vendor's UML implementations with minimal effort; by integrating FDR2 with Rational's Rose; by adapting GrammaTech's CodeSurfer® system to provide slicing and navigation in dependence graphs manipulated by Formal Methods tools; and by interacting closely with potential industrial users of the technology.
DoD is moving rapidly to put networks, with their ability to disseminate information quickly, at the center of its military strategy during the next decade. Although DoD will make heavy use of commercial products, it is no ordinary commercial customer. Warfighting networks demand high connectivity, reliability, and redundancy, as well as data integrity and data security; singlepoint failures can not be tolerated, and systems must degrade gracefully in the event of a failure. Getting distributed realtime software right is notoriously difficult.
Because installation of networkcentric warfighting infrastructure represents such a large and long term investment, it must be done right the first time. Thus, in (say) the Navy's IT21 project, the system and software production methodologies themselves need high assurance. Designs must be thoroughly simulated, debugged, validated, and (if possible) verified before implementation. Design flaws discovered during or after implementation can be horrifically expensive to correct. UML. The Unified Modeling Language (UML) is used for specifying, visualizing, constructing, and documenting the artifacts of software systems. It simplifies the complex process of software design, making a "blueprint" for construction. UML incorporates Harel's Statecharts for finite state modeling. Although the UML definition was led by three software methodologists (Booch, Jacobson, and Rumbaugh) from one company (Rational Software), its adoption as an standard throughout the objectoriented analysis and design (OOAD) industry has been a watershed event of considerable importance to DoD. Rational Software's recent acquisition of ObjecTime has merged Selic's ROOM methodology with UML, a combination uniquely suited for embedded systems.
Although DoD contractors are heavy users of OOAD systems, their value has been limited by the absence of powerful reasoning and formal analysis techniques that validate the models after they have been produced. Indeed, the "analysis" in OOAD is informal "problem analysis" that precedes design. Once a model has been created, it is rarely subjected to any formal analysis. This accounts, in part, for OOAD's poor reputation in some circles as "mere documentation". Without formal analysis of models, OOAD has not been yielding a full return on its investment.

User Interface for Rule-Based Formal Methods Systems
United States Navy – Office of Naval Research
With SBIR support from ONR, GrammaTech is working to improve the quality and reliability of software by speeding the adoption of formal methods in industry.
The common intermediate representation of the Toolkit is based on system dependence graphs (SDGs). SDGs are a semantically rich representation that describes programs in terms of their deep structural properties. In contrast, abstract syntax trees are mere surface structure. SDGs deliver the results of sophisticated static analyses in a form that substantially raises the level of discourse for the subsequent analyses, transformations, and compositions needed by formal methods. Thus, they have a far greater potential as a common intermediate representation for the FME than attributed abstract syntax trees.

Language-Based Software Environments
United States Navy – Office of Naval Research
With ONR Support, GrammaTech developed a general tool, the Synthesizer Generator™, for creating language-based environments from formal specifications. Using a declarative language, an editor designer prepares a specification that includes rules defining a language's context-free abstract syntax, context-sensitive attributions, concrete input syntax, concrete display format, and transformations.
From this specification, the Generator creates a full-screen editor for manipulating objects according to these rules. The Synthesizer Generator uses attribute grammars to compute derived information about the object being edited. The fundamental algorithmic importance of the Synthesizer Generator are its algorithms that incrementally recompute attribute values after an editing change. These incremental algorithms permit immediate response when objects are being manipulated through a generated editor.

HACCS TA2: Automatically Generating Agent Injection Methods
Defense Advanced Research Projects Agency
Botnets and similar large-scale malware present an ever-present therat. When a new botnet surfaces, today's defenders must use manual efforts to neutralize it. An automated neutralization system will improve reaction time, decrease the number of systems infected by the botnet, and decrease the botnet's impact. Under HACCS, GrammaTech and AIS will help develop a system to create safe and reliable autonomous software agents that can effectively counter such malware.

Vetting Commodity IT Software and Firmware
Defense Advanced Research Projects Agency
The VET (Vetting Commodity IT Software and Firmware) program seeks to help U.S. government agencies address the threat of malicious code and hidden “backdoor” access in commodity IT devices. Mobile phones, network routers, computer workstations, and other networked devices can be secretly modified to function in unintended ways or spy on users. To address this danger, GrammaTech will be developing a tool that tackles this supply-chain risk-management issue for software and firmware.
GrammaTech’s tool will automatically discover malicious functionality in these devices. The technology will take in two inputs: software in binary form and a list of properties to check in the software, in order to ensure a secure system. It will then run a series of sound analyses of the binary code to check all of the required properties, and produce a set of reports of potential property violations, or prove that all of the properties hold.
The deep technical challenges inherent in the VET project require solutions that must surpass present state-of-the-art machine-code analysis technologies. For instance, current static analyses are either scalable but heuristic (unsound), or sound but unscalable to the program sizes needed for VET. To achieve the project goals, GrammaTech will be developing sound, yet scalable binary-analysis techniques. Additionally, analyses are often tied to specific properties to be checked and developed only for a specific instruction set. GrammaTech’s technology will be able to check broad collections of properties and it will be readily retargetable to new instruction sets.

Automatic Detection and Patching of Vulnerabilities in Embedded Systems
Defense Advanced Research Projects Agency
Recent studies have shown that embedded systems are extremely vulnerable to security attacks. Some published exploits include remote hijacking of the electronic systems in a modern car and using IP phones and smart televisions to perform covert surveillance of their owners.
In this project, we are developing a protection system that automatically detects and removes vulnerabilities from embedded software. The system will be based on static rewriting of the software prior to deployment. The rewriting will render the known vulnerabilities unexploitable and will add protections to prevent exploits of undiscovered vulnerabilities.
The system will operate directly on software binaries, even in the absence of source code or symbol information. Thus, the system will protect equally well both the newly developed software and legacy software. We will build the system to be easily retargetable to different instruction sets to accommodate a variety of platforms employed in the embedded systems domain. To make sure that added protections do not break the functionality of a program, the system will include a component for verifying that the rewritten program is semantically equivalent to the original program.

Mining and Understanding Software Enclaves (MUSE)
Defense Advanced Research Projects Agency
Mining and Understanding Software Enclaves (MUSE) is an initiative that seeks to gather hundreds of billions of lines of publicly available open-source computer code to mine in an effort to create a searchable database of properties, behaviors, and vulnerabilities. PLINY is a joint effort among GrammaTech, Rice University, the University of Texas-Austin, and the University of Wisconsin-Madison.
GrammaTech’s role in the PLINY project focuses on analyzing code to extract program elements that represent computational paradigms, as well as identifying syntactic and semantic features that characterize the program elements. Accurate identification and characterization of program elements is essential to enabling organization and search of PLINY’s code database. GrammaTech’s CodeSonar static analysis tool will be at the center of the effort to generate features. CodeSonar provides extraordinary scalability, an ability to analyze diverse code samples, and an analysis engine with proven capabilities for discovering subtle program properties. For more information about the PLINY project, visit DARPA’s MUSE page.

Calculating and Understanding Resource Bounds to Detect Space/Time Vulnerabilities (STAC)
Defense Advanced Research Projects Agency
The STAC program seeks to enable analysts with the ability to identify two important classes of vulnerabilities that are based on an application's usage of space and time. GrammaTech is building technology to automatically detect these vulnerabilities.
The first type, Algorithmic Complexity Vulnerabilities, allow an attacker to create inputs that can cause excessive resource consumption, frequently used to mount Denial-of-Service (DoS) attacks that disrupt a software application's responsiveness. The second type, Side-Channel Leaks, allow an attacker to infer confidential information by observing an application's usage of time and space.
GrammaTech's technology will detect these classes of vulnerabilities in Java bytecode, without requiring access to program source code. GrammaTech will also collaborate with researchers at Yale University, who will contribute recent breakthroughs in amortized resource-bound analysis, as well as researchers at the University of Wisconsin-Madison, who will contribute seminal work in shape analysis, enabling the combined technology to capture the dependence of resource use on linked data structures.

Automatic Software Diversity for Execution-Time Protection (CFAR)
Defense Advanced Research Projects Agency
The CFAR (Cyber Fault-Tolerant Attack Recovery) project combines advanced binary analysis and transformation technology with new approaches to binary diversification. Software diversification creates small variations in a programs implementation to thwart potential attacks yet maintain its original functionality. The resulting technology seeks to allow a system to understand when one of the program variants has been compromised.
Subcontractors New York University and the University of Iowa will assist GrammaTech in developing algorithms that will ensure that the automatically-created variants function in the same way.

Automated Exploitability Reasoning
The U.S. Army
Fuzzing techniques will often produce a large enough number of crashing inputs for the program under test that it is important to prioritize them in terms of impact. One natural axis of a bug's impact is whether it can be used in a security exploit. Determining whether a crash is exploitable, however, is a complex and multi-layered problem. Within this project, GrammaTech is developing Chase, a tool suite for automatically triaging crashes reported in a program depending on the degree to which a crash appears indicative of an exploitable security vulnerability. In the long term, Chase will combine information about the crash itself, analyses to determine what data values are particularly important, computations of how much influence the attacker has over those important values (i.e., channel capacity), fault localization techniques, static analysis for "proving" unexploitability, taint analysis, automatic exploit generation, and domain-specific knowledge about exploitability. Chase will analyze a stripped binary in the context of a particular crashing input, compute or record the above information, and produce an estimate of the likelihood that the bug is exploitable. Users of Chase will be able to use the results to help prioritize which crashes deserve particular attention.
These tools will be able to be used standalone, but many of the tools and techniques will be useful for ranking warnings from static analysis tools such as GrammaTech's CodeSonar. CodeSonar is a leading commercial tools for finding program vulnerabilities and will be an intended transition target for this technology. In addition to this specific application, we will be advancing technology for symbolic execution, fault localization, and other analyses, which will benefit other projects and application areas as well.

Progressive Model Generation for Adaptive Resilient System Software
United States Navy
Today’s software applications, expected to operate with minimal human oversight, have the potential to act erratically after being hacked, exposing critical data or harming physical equipment and humans. To solve this issue, future autonomic ‘intelligent’ software applications will operate with self-awareness. Such autonomic applications monitor for deviations from expected behavior, assess mission health, and react to preserve objectives – raising alerts, initiating recovery processes, or shutting the system down. The Scenario-Based Modeling and Checking (SMAC) project is aimed at helping software developers efficiently create models of ‘correct behavior’ that their applications will be required to follow.
Most systems are just implemented in code. A central problem in this approach to cyber security is defining a precise model that lets the monitor distinguish good behavior from bad. When the implementation is complete, there is no higher-level description of correct behavior that can serve as a model. The essence of GrammaTech's approach is to help programmers define the models while they are coding. This will facilitate ‘smart monitoring’ that will allow the Navy, DoD, and other software users to deploy systems with more confidence that their systems will survive attacks and behave as intended.

Multi-Abstractions System Reasoning Infrastructure toward Achieving Addaptive Computing Systems (SyRes)
The U.S. Office of Naval Research
The complexity of modern computer systems has grown to the point of stressing human ability to understand their behavior completely. The sheer number of software components (and the myriad interactions between them) that are present on a single desktop computer presents a difficult security challenge that continues to confound modern protection technologies. Every day, new exploits are created that take advantage of obscure combinations of software bugs and unexpected behavior to sidestep existing defenses. Response is slow, requiring human effort to diagnose and develop new counter-measures to each new threat.
GrammaTech envisions a new paradigm for building system security that endows the computer itself with the tools to diagnose new attacks, reason about their impact to the system, and implement countermeasures in an automatic fashion. Autonomy-oriented computation offers the potential to allow complex computer systems to police themselves, detecting intrusion, performing self-healing, and directly countering cyber threats.
Computer security is critical to both national security as well as the private sector. This technology will provide next-generation technology for creating autonomic systems capable of detecting and closing breaches in security in an automated fashion.