Our Contender for DARPA’s AI Cyber Challenge

We at GrammaTech, together with our academic teammates at UVA and CMU, were thrilled to receive a $1M prize on the Small Business Track of DARPA’s new AI Cyber Challenge.

In this post, we’d like to introduce the event and ourselves. We will also give a high-level description of our approach and why we think it’s the right way to meet DARPA’s ambitious challenge.

Who We Are

GrammaTech is a cybersecurity company with a decades-long history of government-funded research and development. We have unique capabilities for analysis and automation at every stage in the technology stack, from source code to binaries and hardware. And we take our research into the real world, transitioning projects into COTS products and open-source tools. Our colleagues at UVA and CMU bring further expertise in program analysis and repair.

The AI Cyber Challenge

DARPA’s AI Cyber Challenge (AIxCC) aims to harness recent developments in AI – such as Large Language Models (LLMs) – to secure critical-infrastructure software, by creating systems that automatically detect and repair vulnerabilities at unprecedented scale.

DARPA’s Challenge competitions have a proven history of advancing the state of the art and turning theory into practice. This makes the AI Cyber Challenge important not just for the competitors, but for everyone. To give a sense of just how critical “critical” infrastructure can be, consider that after DARPA’s challenge was announced, ARPA-H – DARPA’s health-care technology peer – also committed its resources to the challenge, recognizing that hospitals, life-saving medical equipment, and life-sustaining implantable devices are all vulnerable to cyberattack.

In the competition, each participating Cyber-Reasoning System (CRS) will be presented with a series of challenge projects, each incorporating vulnerabilities seeded into the project’s Git history. The goal is to detect and repair each vulnerability. For a vulnerability to count as detected, we must generate two pieces of evidence. The first is a Proof of Vulnerability, i.e., an input that triggers a particular sanitizer. A sanitizer is a vulnerability detection tool that makes the program crash, instead of silently continuing, when an error introduces a vulnerability. The second is a Proof of Understanding, which shows we understand the vulnerability by naming both the Git commit where it was introduced and the sanitizer that it triggered.

Once we’ve identified the vulnerability, and proven we understand it, the final step is to fix it by generating a patch at the source code level. The patch must be selective and precise, removing the vulnerability without affecting the functionality of the challenge project (and without introducing new vulnerabilities!).

Introducing VERSATIL

Our solution for the AI Cyber Challenge is VERSATIL: “Vulnerability Exploration and Repair through Static Analysis and Testing Instructed Large-language-models”.

LLMs are the key enabling technology for the AIxCC vision. But LLMs are only powerful, not miraculous. To use their power responsibly, we need to embed them in systems that ground them in reality and keep them accountable.

VERSATIL is such a hybrid system: a unique combination of LLMs and static and dynamic analysis techniques. VERSATIL is built on Proteus, GrammaTech’s advanced software testing system. Proteus is based on binary rewriting and dynamic analysis, and automatically finds and fixes vulnerabilities, with no false positives. In fact, Proteus owes its existence to a previous DARPA challenge. In 2016 GrammaTech, jointly with UVA, participated in the DARPA Cyber Grand Challenge (CGC) competition, where our system won second place. We applied the lessons learned during CGC to enhance the technology and develop it into what is now GrammaTech’s Proteus platform. Today, Proteus is used by development groups, testing organizations, and cybersecurity teams.

VERSATIL extends the existing Proteus infrastructure to analyze and patch source code, to automate harnessing, and to integrate LLM-based techniques.

The VERSATIL Approach

At a high level, VERSATIL’s vulnerability discovery works in stages, with each stage informing the next. Faster but less precise tools help narrow the search space so that we can apply more precise but slower tools only to well-justified suspected vulnerabilities.

In the first stage, LLM-based tooling uses its understanding of the application domain to identify vulnerability candidates. Focusing on those candidates, we generate customized static analysis passes, e.g., by controlling which checkers we use (some checkers are too expensive to run in the general case), or by adding new context-dependent analysis rules. The static analysis results allow us to create targets to guide dynamic analysis, for example, by biasing fuzzing towards identified vulnerable locations. This dynamic analysis is responsible for confirming vulnerabilities and creating the Proof of Vulnerability.
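The staged narrowing described above can be illustrated with a minimal Python sketch. It is not VERSATIL's implementation: the `Candidate` type, the `triage` function, the `budget` parameter, and the two callback checks are all hypothetical names chosen for illustration. The sketch only shows the general funnel idea, where a cheap score shortlists candidates, a medium-cost check filters the shortlist, and the slowest check runs only on what remains.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Candidate:
    """A suspected vulnerability (hypothetical structure, for illustration only)."""
    location: str   # e.g. a file and function name
    score: float    # suspicion score from the cheap first stage, higher is worse


def triage(candidates: Iterable[Candidate],
           static_check: Callable[[Candidate], bool],
           dynamic_confirm: Callable[[Candidate], bool],
           budget: int) -> List[Candidate]:
    """Run progressively more expensive checks on a shrinking candidate set."""
    # Stage 1: keep only the highest-scoring candidates from the cheap pass.
    shortlist = sorted(candidates, key=lambda c: c.score, reverse=True)[:budget]
    # Stage 2: a medium-cost static check filters the shortlist.
    flagged = [c for c in shortlist if static_check(c)]
    # Stage 3: the slow dynamic confirmation runs only on the survivors.
    return [c for c in flagged if dynamic_confirm(c)]
```

The point of the design is that `dynamic_confirm`, standing in here for an expensive step such as targeted fuzzing, is never invoked on a candidate that an earlier, cheaper stage has already ruled out.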

Next comes the diagnosis stage, where we determine the location of the vulnerability (which may differ from the crash location). We also identify the commit that introduced the vulnerability, and we generate the Proof of Understanding.
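One common way to find an introducing commit, once a triggering input exists, is a binary search over the history, the same idea behind `git bisect`. The sketch below is an illustration of that general technique, not a description of how VERSATIL does it. The function name `first_bad_commit` and the `triggers` callback are hypothetical, and the sketch assumes the commits are ordered oldest to newest and that the vulnerability persists in every commit after it first appears.

```python
from typing import Callable, Optional, Sequence


def first_bad_commit(commits: Sequence[str],
                     triggers: Callable[[str], bool]) -> Optional[str]:
    """Return the earliest commit for which `triggers` is True, or None.

    Assumes `commits` is ordered oldest to newest and that `triggers`
    is False for every commit before the introducing one and True
    from that commit onward.
    """
    if not commits or not triggers(commits[-1]):
        return None  # the newest commit does not reproduce the problem
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] triggers
    while lo < hi:
        mid = (lo + hi) // 2
        if triggers(commits[mid]):
            hi = mid        # the introducing commit is at mid or earlier
        else:
            lo = mid + 1    # the introducing commit is after mid
    return commits[lo]
```

In practice `triggers` would build the project at the given commit and replay the Proof of Vulnerability input against it, so the logarithmic number of calls matters: each call may involve a full build.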

Finally, we repair the vulnerability using both the Darjeeling repair framework and LLM-based program repair techniques. These integrate all the information previously generated by vulnerability localization to construct LLM queries, using both template selection and prompt engineering approaches. LLM-generated repairs are not naively accepted, but validated and ranked based on both static and dynamic properties.
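To make "validated and ranked" concrete, here is a minimal Python sketch of one possible policy. It is an assumption-laden illustration, not the criteria VERSATIL uses: the `PatchResult` fields, the `rank_patches` function, and the choice to prefer smaller patches are all hypothetical. Candidates that fail to build, fail to block the Proof of Vulnerability input, or break any functional test are rejected outright, and the remainder are ordered by how little code they change.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PatchResult:
    """Measured properties of one candidate patch (hypothetical fields)."""
    patch_id: str
    builds: bool          # the patched project compiles
    pov_blocked: bool     # the triggering input no longer fires the sanitizer
    tests_passed: int     # functional tests that still pass
    tests_total: int
    lines_changed: int    # size of the patch


def rank_patches(results: List[PatchResult]) -> List[PatchResult]:
    """Drop invalid patches, then order the rest from most to least preferred."""
    # Validation: hard requirements that every acceptable patch must meet.
    valid = [r for r in results
             if r.builds and r.pov_blocked and r.tests_passed == r.tests_total]
    # Ranking: prefer smaller changes; break ties by id for a stable order.
    return sorted(valid, key=lambda r: (r.lines_changed, r.patch_id))
```

Separating the hard validation filter from the soft ranking key keeps the two concerns independent: a different preference, such as a static analysis score, could replace `lines_changed` without weakening the acceptance criteria.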

We submit our Proof of Vulnerability, Proof of Understanding, and generated patch together to DARPA’s scoring system, and move on to the next vulnerability or the next challenge project.

Looking Ahead

VERSATIL is developing rapidly. We expect our approach to evolve significantly as we determine what works and what doesn’t. Be sure to check back here regularly for further updates on our plans and progress.
