GrammaTalk

AI-Enabled High-Confidence Firmware Bill of Materials Extraction


HCSS 2026 Blog Series (Part 4 of 4)

Insights from GrammaTech’s accepted talks and posters at the HCSS Conference

This post is the fourth and final in a series highlighting GrammaTech’s contributions to the HCSS Conference, where we will present two talks and two posters on emerging challenges in software security. In this series, we’ll break down key findings and explore their practical implications.

The Firmware Visibility Problem

Modern assurance of software-intensive and cyber-physical systems increasingly depends on understanding the true composition of firmware. That means reused libraries, hidden dependencies, and inherited vulnerabilities, along with defensible evidence suitable for high-confidence assurance workflows.

Firmware analysis, however, remains bottlenecked by manual reverse engineering and brittle signature-based methods that struggle with compiler variation, optimization, and architectural diversity. A library compiled for ARM looks nothing like the same library compiled for MIPS. The same source code built at -O0 and -O3 produces binaries that share almost no syntactic features. Stripped binaries shed the symbols that conventional tools depend on. The result is a persistent gap between what defenders need to know about a firmware image and what they can practically determine. Closing that gap at scale requires analysis techniques that learn, generalize, and reason across the variation that defeats conventional signatures.

FABLE: An AI-Enabled Firmware Analysis Pipeline

GrammaTech’s FABLE, the Firmware Automatic Bill of Materials (BOM) Labeling Engine, is designed to close that gap. Developed under the NSTXL and Navy CRANE Firmware Bill of Materials Extractor (FBME) program, FABLE integrates multiple complementary analysis techniques, including hashing, fuzzy and graph-based matching, emulation fingerprinting, and AI-based similarity analysis, into a unified pipeline for automated firmware decomposition and component identification.

AI is central to how FABLE operates, showing up as a set of complementary techniques, each targeting a different aspect of the firmware identification problem:

  • Neural embedding similarity is the centerpiece. Built on GrammaTech’s Discover technology, originally developed under DARPA sponsorship, it uses a self-attentive Siamese neural network to derive high-dimensional function embeddings from binary code. Assembly instructions are vectorized through a learned instruction embedding model, then passed through a bidirectional recurrent network with multi-hop self-attention. The result is a representation that captures function semantics even when compilation, optimization, and architecture vary. This is deep learning applied directly to stripped binaries, and it identifies library components that no handcrafted signature could match.
  • Behavioral fingerprinting through emulation takes a complementary route. Functions are emulated at an architecture-independent intermediate representation, and their observable state changes are hashed into a deterministic fingerprint. Functionally equivalent code produces identical fingerprints regardless of compiler or target ISA. It is the closest thing to semantic equivalence that automated analysis can offer.
  • Learned feature vector similarity extracts high-dimensional function features and performs fast approximate nearest-neighbor search against a reference database of known library functions, using techniques drawn from modern vector search infrastructure.
  • Large language models are used in narrowly scoped roles that play directly to their strengths. They identify components through explicit versioned strings and distinctive identifiers in binaries. They accelerate tasks that are otherwise labor-intensive. And they support translation of user intent and aid human analyst understanding and downstream reasoning, turning dense machine output into narrative that an engineer or decision-maker can act on.

Supporting these AI techniques are faster pattern-based signals, including YARA capability rules, string banners, and conventional SBOM fragments, that add corroborating evidence at low cost.

Trust Through Tiered Voting Across AI-Driven Analyses

AI is powerful, but no single model is reliable on its own. To ensure trustworthiness, FABLE employs a tiered voting framework that aggregates corroborating evidence from independent static, dynamic, and AI-driven analyses, producing confidence-scored results suitable for assurance workflows.

Each technique contributes evidence weighted by its reliability profile. Exact behavioral fingerprints carry the highest weight, because a matched fingerprint implies semantic equivalence. Neural embedding matches and vector similarity matches provide strong corroboration. Pattern-based signals add supplementary context but do not carry the vote on their own.

When techniques disagree, for example when the same function is attributed to different libraries, or when multiple versions of the same library appear to be present, the system flags the conflict and reduces confidence accordingly. The output is not a binary match-or-no-match result but a confidence-scored identification with full provenance, showing which techniques contributed, at what strength, and where they agreed or diverged.
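The voting scheme described above can be sketched in a few lines. The technique names, reliability weights, and output fields below are hypothetical illustrations of the general pattern — weighted evidence aggregation with conflict-aware confidence and full provenance — not FABLE's actual scoring model.

```python
# Hypothetical reliability weights per analysis technique. Exact behavioral
# fingerprints dominate; pattern-based signals only corroborate.
WEIGHTS = {
    "behavioral_fingerprint": 1.0,
    "neural_embedding": 0.7,
    "vector_similarity": 0.6,
    "yara_rule": 0.2,
    "string_banner": 0.2,
}

def vote(evidence):
    """Aggregate (technique, component, raw_score) tuples into a
    confidence-scored identification with provenance. Disagreement among
    techniques spreads weight across components and lowers confidence."""
    totals, provenance = {}, {}
    for technique, component, score in evidence:
        w = WEIGHTS.get(technique, 0.1) * score
        totals[component] = totals.get(component, 0.0) + w
        provenance.setdefault(component, []).append((technique, round(w, 3)))
    best = max(totals, key=totals.get)
    confidence = totals[best] / sum(totals.values())
    return {
        "component": best,
        "confidence": round(confidence, 3),
        "conflict": len(totals) > 1,   # flagged, not silently resolved
        "provenance": provenance,      # which techniques said what, how strongly
    }

result = vote([
    ("behavioral_fingerprint", "zlib-1.2.11", 1.0),
    ("neural_embedding", "zlib-1.2.11", 0.9),
    ("string_banner", "zlib-1.2.8", 0.8),  # conflicting version claim
])
print(result["component"], result["confidence"], result["conflict"])
```

Here the weak string-banner disagreement does not flip the identification, but it does depress the confidence score and surfaces in the provenance record, which is exactly the behavior an assurance reviewer needs to see.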

This is the core design principle: AI-enabled but explainable. Every component in the resulting bill of materials traces back to specific evidence from specific models and analyses. That matters because it is what lets machine learning output enter high-confidence workflows, where “the model said so” is not a defensible answer.

Deployment Across Operational Contexts

The system supports both air-gapped and cloud-based execution to ensure accessibility across a diverse user base. Local models, pre-cached vulnerability databases, and containerized services mean the same AI-driven pipeline runs inside classified enclaves, on premises, or in cloud environments without modification. The analytical rigor is identical across deployments.

What You Get

The result is an automated yet explainable firmware BOM generation process that outputs SPDX and CycloneDX artifacts annotated with provenance, confidence measures, and vulnerability context. Concretely, that includes:

  • Per-component confidence scores and full AI-technique provenance, so every identification is defensible and reproducible
  • Function-level capability detection, covering cryptographic usage, dangerous C functions, hardcoded credentials, known malware indicators, and insecure network services
  • Vulnerability mapping via industry tooling, linking identified components to known CVEs through CPE and PURL identifiers
  • LLM-generated analysis reports tailored to offensive, defensive, and executive audiences, grounded in the structured evidence produced upstream
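For a sense of what such an annotated artifact looks like, here is a sketch of one CycloneDX-style component entry carrying confidence and technique provenance through the format's `properties` extension point. The `fable:*` property names and field choices are illustrative assumptions, not the tool's actual output schema.

```python
import json

def bom_component(name, version, purl, confidence, techniques):
    """Assemble one CycloneDX-style component entry. Confidence and
    per-technique provenance ride along as name/value properties, the
    standard CycloneDX mechanism for tool-specific annotations."""
    return {
        "type": "library",
        "name": name,
        "version": version,
        "purl": purl,  # package URL, the key for CVE lookup downstream
        "properties": (
            [{"name": "fable:confidence", "value": str(confidence)}]
            + [{"name": "fable:evidence", "value": t} for t in techniques]
        ),
    }

component = bom_component(
    "zlib", "1.2.11", "pkg:generic/zlib@1.2.11",
    0.91, ["behavioral_fingerprint", "neural_embedding"],
)
doc = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "components": [component],
}
print(json.dumps(doc, indent=2))
```

Keeping the evidence trail inside the BOM itself means a downstream consumer can audit not just what was found but why it was believed, without going back to the analysis pipeline.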

AI as a Force Multiplier for High-Confidence Assurance

This work demonstrates how AI can act as a force multiplier for high-confidence software and systems engineering, reducing analyst burden while preserving rigor, traceability, and defensibility in the firmware supply chain. Neural networks identify components. Learned embeddings measure similarity. LLMs translate results into human-readable form. A tiered voting framework keeps the whole system honest, with confidence scoring and provenance at every step.

It is an instance of the AI as an Enabler theme in practice: carefully scoped AI techniques augmenting firmware assurance workflows while preserving rigor, explainability, and defensible evidence across the firmware supply chain.

HCSS Series:

Part 1: Comparing the Cognitive Vulnerabilities of Human and AI-Based Penetration Testers

Part 2: Malware Detection Using Features from Static Disassembly

Part 3: From Firmware Analysis Outputs to Assurance Artifacts: Evidence-Driven Workflows in REAFFIRM 

Part 4: AI Enabled High-Confidence Firmware Bill of Materials Extraction (this post)

Contact Us

Get a personally guided tour of our solution offerings. 
