How automated firmware security analysis, SBOM generation, and AI-assisted rehosting produce traceable evidence for vulnerability assessment
HCSS 2026 Blog Series (Part 3 of 4)
Insights from GrammaTech’s accepted talks and posters at the HCSS Conference
This post is the third in a series highlighting GrammaTech’s contributions to the HCSS Conference, where we will present two talks and two posters on emerging challenges in software security. In this series, we’ll break down key findings and explore their practical implications.
The Evidence Problem in Firmware Assurance
Firmware is one of the hardest targets in software security. There’s rarely source code. Testing depends on hardware that may not be available. The tooling ecosystem is thin compared to what exists for conventional software. And yet firmware powers the devices that underpin critical infrastructure: industrial controllers, medical devices, network equipment, vehicles. When something goes wrong, the consequences are physical.
Strong analysis tools exist for this problem. A skilled reverse engineer can run Ghidra or Binwalk (or a plethora of other one-off tools) on a firmware image and learn a great deal about what’s inside. But the outputs of that work (disassembly databases, analyst notes, one-off scripts) tend to stay locked in individual workflows. When an assessor, a program manager, or a red team needs to act on those findings, the evidence often has to be reconstructed from scratch. And the tools themselves demand deep expertise, locking out the program managers, blue teams, and decision-makers who need the answers most.
This is the gap between firmware analysis and firmware assurance. Analysis asks what a binary does. Assurance asks whether you can prove it to someone else, repeatably, with structured evidence. That’s the gap REAFFIRM is built to close.
From Image to Structured Evidence
REAFFIRM takes a firmware image as input and orchestrates a pipeline of analyses that produce structured, queryable artifacts at each stage. The system unpacks firmware packages recursively, discovers embedded binaries, identifies instruction set architectures, and feeds everything into multiple analysis backends, including Ghidra and the Datalog-based disassembler ddisasm. But the critical step is what happens next. Rather than leaving results in tool-specific formats, REAFFIRM normalizes all disassembler output into a unified knowledge base (a Datalog fact database) through a common abstraction layer.
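To make the normalization step concrete, here is a minimal Python sketch of the idea, not REAFFIRM's actual abstraction layer. The two input shapes, the `Fact` type, and the `function` relation are all hypothetical; the point is only that differently shaped tool outputs collapse into one set of uniform facts.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    """One row of a Datalog-style relation (hypothetical schema)."""
    relation: str
    args: tuple


def from_tool_a(binary, records):
    # Hypothetical tool A reports functions as dicts with hex-string addresses.
    return {Fact("function", (binary, int(r["entry"], 16), r["name"])) for r in records}


def from_tool_b(binary, rows):
    # Hypothetical tool B reports functions as (address, name) tuples.
    return {Fact("function", (binary, addr, name)) for addr, name in rows}


tool_a_output = [{"entry": "0x8000", "name": "main"},
                 {"entry": "0x8040", "name": "uart_write"}]
tool_b_output = [(0x8000, "main"), (0x8100, "aes_encrypt")]

# Both backends agree on `main`, so the union holds a single fact for it.
knowledge_base = (from_tool_a("fw.bin", tool_a_output)
                  | from_tool_b("fw.bin", tool_b_output))
```

Because every fact has the same shape regardless of which backend produced it, later queries do not need to know which disassembler found a given function.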
This knowledge base is where analysis becomes evidence. Souffle Datalog inference rules derive control flow, data flow, dominance relations, and capabilities from the base facts. The system can answer concrete questions: which functions interact with hardware peripherals, which procedures implement cryptographic operations, which components are reachable from a given entry point, and what the call graph looks like around a function of interest. These aren’t ad hoc queries against a one-off analysis. They are repeatable derivations from a formal fact base that can be re-run, extended, and audited.
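As an illustration of what a repeatable derivation looks like, the sketch below computes reachability from an entry point in plain Python. It mirrors the two-rule Datalog pattern (a function is reachable if it is the entry point, or if a reachable function calls it) but is not REAFFIRM's rule set; the `calls` facts and function names are invented for the example.

```python
def reachable_from(entry, calls):
    """Least fixed point of:
         reachable(entry).
         reachable(Y) if reachable(X) and calls(X, Y).
    """
    reachable = {entry}
    changed = True
    while changed:
        changed = False
        for caller, callee in calls:
            if caller in reachable and callee not in reachable:
                reachable.add(callee)
                changed = True
    return reachable


# Hypothetical call-graph facts.
calls = {
    ("main", "net_recv"),
    ("net_recv", "parse_packet"),
    ("parse_packet", "memcpy"),
    ("debug_shell", "exec_cmd"),  # never called from main
}

result = reachable_from("main", calls)
```

Re-running the same rules over the same facts always yields the same set, which is what makes the answer auditable rather than a one-off observation.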
The outputs follow suit. REAFFIRM produces software bills of materials in CycloneDX and SPDX formats, CVE scan results mapped against those SBOMs, cryptographic audit reports with severity classifications, and capability summaries, all in structured formats that downstream tools and processes can consume directly. When an analyst claims that a binary includes a weak cryptographic implementation, the evidence chain runs from the raw binary through the Datalog derivation to a specific finding in a machine-readable artifact.
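A rough sketch of how an SBOM and a vulnerability scan can be tied together is shown below. The top-level keys (`bomFormat`, `specVersion`, `version`, `components`) follow the CycloneDX JSON layout, but the component names, the advisory record, and the exact-version matching are simplifying assumptions made for illustration. Real scanners typically match on identifiers such as CPE or package URLs and handle version ranges.

```python
import json


def make_sbom(components):
    # Minimal CycloneDX-style document; component data is hypothetical.
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "components": [
            {"type": "library", "name": name, "version": version}
            for name, version in components
        ],
    }


def match_advisories(sbom, advisories):
    # Naive exact match on name and version (an assumption of this sketch).
    findings = []
    for comp in sbom["components"]:
        for adv in advisories:
            if adv["name"] == comp["name"] and comp["version"] in adv["affected_versions"]:
                findings.append({"advisory": adv["id"],
                                 "component": comp["name"],
                                 "version": comp["version"]})
    return findings


sbom = make_sbom([("examplessl", "1.0.2"), ("tinyhttpd", "0.9")])

# Placeholder advisory record, not a real CVE.
advisories = [{"id": "EXAMPLE-0001", "name": "examplessl",
               "affected_versions": ["1.0.1", "1.0.2"]}]

findings = match_advisories(sbom, advisories)
sbom_json = json.dumps(sbom, indent=2)  # machine-readable artifact
```

Each finding names the SBOM component it came from, so a downstream consumer can walk from the scan result back to the inventory entry.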
From Evidence to Action
Structured evidence is only useful if it supports decisions. REAFFIRM connects its analysis artifacts to three practical workflows: targeted validation, automated rehosting, and multi-audience reporting.
For validation, the platform’s component extraction subsystem identifies the most interesting and harnessable functions in a binary, weighing factors like code complexity, call graph position, argument types, and hardware access patterns. Selected components can be extracted from the original binary, rehosted as standalone executables, and prepared for fuzzing, all without requiring the original hardware. This makes it possible to move from a static capability claim (“this function handles network input”) to a dynamic validation (“and here is what happens when you fuzz it”).
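One way to picture the selection step is as a weighted score over per-function features. The sketch below is a guess at the general shape, not the platform's actual heuristic: the feature names, the weights, and the choice to penalize direct hardware access (since such functions are harder to run without the device) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    cyclomatic_complexity: int
    callers: int
    pointer_args: int
    touches_hardware: bool


def harness_score(c):
    # Hypothetical weights; each feature is capped so no single one dominates.
    score = 0.4 * min(c.cyclomatic_complexity, 50) / 50
    score += 0.2 * min(c.callers, 10) / 10
    score += 0.3 * min(c.pointer_args, 4) / 4
    if c.touches_hardware:
        score -= 0.5  # assumed penalty: needs peripherals to run
    return score


candidates = [
    Candidate("uart_isr", 8, 1, 0, True),
    Candidate("parse_packet", 30, 3, 2, False),
    Candidate("checksum", 5, 10, 1, False),
]

ranked = [c.name for c in sorted(candidates, key=harness_score, reverse=True)]
```

With these made-up numbers, the complex, buffer-handling parser ranks first and the interrupt handler ranks last, which matches the intuition that parsers are attractive fuzzing targets.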
For bare-metal and deeply embedded targets, REAFFIRM goes further with an agentic AI pipeline for automated rehosting. The system uses binary analysis to discover hardware abstraction layer functions in stripped firmware, classifies their behavior, and uses an LLM to generate handler code. A multi-tier validation loop (static checks, boot tests, and progress analysis) catches errors and feeds diagnostic information back to the LLM for regeneration. This automates what has traditionally been weeks of manual reverse engineering work. The human analyst reviews the result and steps in when the automation gets stuck, but the bulk of the rehosting setup is handled by the pipeline.
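The generate, validate, and regenerate loop can be sketched in a few lines. Everything here is a stand-in: `stub_generate` replaces the LLM call, the two checks replace the real static and boot tiers, and the retry limit is arbitrary. The sketch only shows how a failed tier's diagnostic becomes input to the next attempt.

```python
def refine(generate, tiers, max_attempts=3):
    """Run each candidate through the tiers in order; on the first failure,
    pass the diagnostic back to the generator and try again."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        handler = generate(feedback)
        for tier_name, check in tiers:
            error = check(handler)
            if error:
                feedback = f"{tier_name}: {error}"
                break
        else:
            return handler, attempt  # every tier passed
    return None, max_attempts  # escalate to a human analyst


def stub_generate(feedback):
    # Stand-in for an LLM call: improves only after seeing a diagnostic.
    return "return 0;" if feedback else "/* TODO */"


# Hypothetical checks; each returns None on success or an error string.
tiers = [
    ("static", lambda h: None if "return" in h else "no return statement"),
    ("boot", lambda h: None if h.endswith(";") else "emulator did not reach main"),
]

handler, attempts = refine(stub_generate, tiers)
```

Returning `None` after the final attempt is the point at which, in the workflow described above, the analyst would step in.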
For reporting, REAFFIRM uses large language models to translate technical findings into artifacts for different audiences. A red team, a blue team, and an executive stakeholder need different views of the same evidence. The generated reports explicitly link their claims back to specific analysis findings (SBOM entries, CVE records, capability results, cryptographic audit findings) so that every assertion in a summary can be traced to the underlying evidence. The human analyst remains in control of interpretation and meaning, but the reporting layer spares them from manually translating raw findings into stakeholder communication.
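Traceability of this kind is straightforward to check mechanically. The sketch below assumes a hypothetical report structure in which each claim carries a list of finding IDs; it flags any claim that cites nothing or cites an ID that does not exist. The ID scheme and field names are invented for the example.

```python
def untraceable_claims(claims, findings):
    """Return the text of every claim whose evidence is missing or unknown."""
    known_ids = {f["id"] for f in findings}
    return [
        c["text"]
        for c in claims
        if not c["evidence"] or not set(c["evidence"]) <= known_ids
    ]


# Hypothetical findings and generated claims.
findings = [
    {"id": "crypto-003", "kind": "crypto_audit", "detail": "weak hash in update check"},
    {"id": "sbom-017", "kind": "sbom", "detail": "examplessl 1.0.2"},
]
claims = [
    {"text": "The update check relies on a weak hash.", "evidence": ["crypto-003"]},
    {"text": "The device bundles an outdated TLS library.", "evidence": ["sbom-017", "cve-999"]},
    {"text": "The device is safe to deploy.", "evidence": []},
]

flagged = untraceable_claims(claims, findings)
```

A check like this lets a reviewer reject a generated summary before it reaches a stakeholder if any sentence lacks a resolvable link to the analysis results.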
An Evidence-Driven Approach
The firmware assurance problem is not primarily a tooling problem. Capable analysis tools exist. The problem is organizational: analysis results are produced in formats that don’t travel, by experts whose findings aren’t easily reused, in workflows that don’t produce auditable evidence. Every time an assessment has to be repeated because the original results can’t be traced or queried, the cost of assurance goes up and the incentive to skip it increases.
REAFFIRM addresses this by treating structured evidence as a first-class output of the analysis pipeline, not a report written after the fact. The approach reduces the burden on individual analysts while keeping the rigor that firmware security demands. It also demonstrates how AI can serve as a genuine enabler across the workflow: formal Datalog inference for capability extraction, agentic pipelines for automated rehosting, and grounded report generation that links claims to findings. The uncertainty is visible, the provenance is intact, and the analyst stays in the loop.
HCSS Series:
- Part 1: Comparing the Cognitive Vulnerabilities of Human and AI-Based Penetration Testers
- Part 2: Malware Detection Using Features from Static Disassembly
- Part 3: From Firmware Analysis Outputs to Assurance Artifacts: Evidence-Driven Workflows in REAFFIRM [This post]
- Part 4: AI Enabled High-Confidence Firmware Bill of Materials Extraction [Coming soon]
