HCSS 2026 Blog Series (Part 2 of 4)
Insights from GrammaTech’s accepted talks and posters at the HCSS Conference
This post is the second in a series highlighting GrammaTech’s contributions to the HCSS Conference, where we will present two talks and two posters on emerging challenges in software security. In this series, we’ll break down key findings and explore their practical implications.
Malware Detection Using Features from Static Disassembly
Why Current Malware Detection Falls Short
Machine learning has become an important tool for malware detection and classification. It can detect patterns that go beyond known signatures, which helps when dealing with new or unseen malware. This makes it useful in cases where traditional methods fail.
Many current approaches rely on simple features from binary parsing or limited static or dynamic analysis. These features often miss deeper structural and semantic properties of binaries. As malware continues to evolve, these shallow features are not enough. There is a need for richer representations that reflect how programs are structured and what behaviors they exhibit, while still working at scale.
Using Static Disassembly to Improve Detection
This work focuses on extracting features from static disassembly of PE32 Windows binaries. The goal is to capture detailed information about program structure and behavior in a way that supports large-scale analysis.
We show that combining three groups of features substantially improves malware classification accuracy and robustness: features derived from static disassembly with GrammaTech’s state-of-the-art disassembler DDisasm, features based on binary parsing, and capability labels from Mandiant’s CAPA. Extracting detailed control-flow, instruction-level, and semantic patterns from binaries lets us find traits that generalize across different datasets, which helps models stay effective against new threats.
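As a rough illustration of how feature groups like these could be merged, the sketch below builds one flat feature dictionary from three hypothetical inputs: a mnemonic sequence standing in for disassembly output, a few numeric header fields standing in for binary-parsing output, and a list of label strings standing in for capability labels. The function names, key prefixes, and input shapes are assumptions made for this example. They are not the output formats of DDisasm or CAPA, and this is not the feature set used in the work.

```python
from collections import Counter


def mnemonic_ngrams(mnemonics, n=2):
    """Count n-grams over a sequence of instruction mnemonics."""
    grams = zip(*(mnemonics[i:] for i in range(n)))
    return Counter("_".join(gram) for gram in grams)


def build_feature_vector(mnemonics, header_fields, capabilities, n=2):
    """Merge three feature groups into one flat dict of numeric features.

    All inputs are hypothetical stand-ins for real tool output:
      mnemonics      instruction mnemonics in address order (disassembly)
      header_fields  numeric fields obtained from binary parsing
      capabilities   capability label strings
    """
    features = {}

    # Instruction-level features: relative frequency of each mnemonic n-gram.
    counts = mnemonic_ngrams(mnemonics, n)
    total = sum(counts.values()) or 1
    for gram, count in counts.items():
        features[f"dis.ngram.{gram}"] = count / total

    # Binary-parsing features: copied through under their own prefix.
    for name, value in header_fields.items():
        features[f"pe.{name}"] = float(value)

    # Capability features: one binary indicator per label.
    for cap in capabilities:
        features[f"cap.{cap}"] = 1.0

    return features


# Toy inputs, invented for illustration only.
example = build_feature_vector(
    mnemonics=["push", "mov", "call", "mov", "call", "ret"],
    header_fields={"num_sections": 4, "has_tls": 0},
    capabilities=["create process"],
)
```

Prefixing each key with its source group keeps the groups distinguishable later, for instance when asking how much each group contributes to a model.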
What Features Actually Help the Model
We also study which features matter most in trained malware detection models. This helps us understand what the model is using to classify and analyze malware. For example, feature importance analysis can surface which control-flow or instruction-level patterns most influenced the model’s decisions, giving analysts a starting point for deeper investigation. The results show that features from disassembly add useful signals that basic parsing features do not capture. These features work well together and give a more complete view of the binary.
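One common way to run this kind of analysis is permutation importance: shuffle one feature column at a time and measure how much accuracy drops. The sketch below is a minimal version of that technique, shown only to make the idea concrete. It is not necessarily the method used in this work, and the feature names, toy data, and `toy_model` function are hypothetical stand-ins for a trained classifier and a real dataset.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows the model labels correctly."""
    hits = sum(model(row) == label for row, label in zip(rows, labels))
    return hits / len(rows)


def permutation_importance(model, rows, labels, names, repeats=5, seed=0):
    """Mean accuracy drop when one feature column is shuffled at a time."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importance = {}
    for col, name in enumerate(names):
        drops = []
        for _ in range(repeats):
            # Shuffle only this column, leaving the other columns intact.
            column = [row[col] for row in rows]
            rng.shuffle(column)
            shuffled = [
                row[:col] + [value] + row[col + 1:]
                for row, value in zip(rows, column)
            ]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importance[name] = sum(drops) / repeats
    return importance


# Hypothetical feature names and toy data, invented for illustration.
names = ["dis.indirect_call_ratio", "pe.num_sections"]
malicious = [[0.8, 3 + i % 4] for i in range(10)]
benign = [[0.1, 3 + i % 4] for i in range(10)]
rows = malicious + benign
labels = [1] * 10 + [0] * 10


def toy_model(row):
    """Stand-in for a trained classifier: thresholds the first feature only."""
    return 1 if row[0] > 0.5 else 0


scores = permutation_importance(toy_model, rows, labels, names)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Because `toy_model` ignores the second feature, shuffling it cannot change any prediction, so its importance is zero and the first feature ranks highest. With a real model, the ranked list is what would give an analyst a starting point for deeper investigation.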
These features also make the models more robust across datasets from different sources and time periods. They help reduce overfitting and keep detection rates high, even for malware the model has not seen before. We also find that a smaller set of important features can reach near-peak performance. This suggests that the approach can be efficient enough for practical use.
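The idea of a smaller feature set reaching near-peak performance can be sketched as a simple selection rule: train on the top k ranked features for several values of k, then keep the smallest k whose score is within a tolerance of the best score. The function name, the tolerance, and the scores below are placeholders invented for this example. They are not results from this work.

```python
def smallest_sufficient_k(scores_by_k, tolerance=0.01):
    """Return the smallest k whose score is within `tolerance` of the peak.

    scores_by_k maps a feature count k to a validation score obtained by
    training on the k highest-ranked features.
    """
    if not scores_by_k:
        raise ValueError("scores_by_k is empty")
    peak = max(scores_by_k.values())
    for k in sorted(scores_by_k):
        if scores_by_k[k] >= peak - tolerance:
            return k


# Placeholder scores, invented for illustration (not measured results).
placeholder_scores = {10: 0.90, 50: 0.955, 200: 0.962, 1000: 0.964}
chosen_k = smallest_sufficient_k(placeholder_scores, tolerance=0.01)
```

With these placeholder numbers the rule selects 50 features, because 0.955 is within 0.01 of the peak of 0.964. The tolerance is a design choice: a looser tolerance trades a little accuracy for a smaller, cheaper feature set.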
What This Means for Malware Detection
Our research shows that machine learning can be combined with careful static program analysis to improve malware detection. This approach not only helps detect threats more effectively but also provides useful insights for malware experts who need to understand how and why a binary is classified as malicious.
The results suggest that features from static disassembly can improve how reliable, understandable, and scalable these systems are. This makes them better suited for use in high-confidence software and security workflows.
HCSS Series:
- Part 1: Comparing the Cognitive Vulnerabilities of Human and AI-Based Penetration Testers
- Part 2: Malware Detection Using Features from Static Disassembly (this post)
- Part 3: AI Enabled High-Confidence Firmware Bill of Materials Extraction [Coming soon]
- Part 4: From Firmware Analysis Outputs to Assurance Artifacts: Evidence-Driven Workflows in REAFFIRM [Coming soon]
