Machine Learning and Big Code

About the area

Our Big Code technologies combine statistical/machine learning techniques with program analysis to improve bug finding, program synthesis, and code search. We mine the collective wisdom in openly available code repositories, with applications to software security, reliability, construction, and maintenance.

Benefits

  • Identify library components that are part of a binary application to obtain a Bill of Materials. The Bill of Materials is then correlated with a vulnerability database (such as the National Vulnerability Database) to detect n-day vulnerabilities that may affect the binary application.
  • Recover mathematical algorithms implemented in binary software for Subject Matter Expert inspection, to facilitate understanding and reuse.
  • Increase the coverage of source-level static analysis tools for detecting incorrect uses of library API functions. This approach reduces the false negatives in static analysis tools.
  • Improve developer productivity by auto-completing code snippets and helping developers search for relevant code.
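
As a rough illustration of the Bill of Materials correlation described in the first item above, the following Python sketch matches identified components against a small in-memory advisory list. The component names, version strings, advisory identifiers, and the `find_known_vulnerabilities` helper are all hypothetical. A real workflow would query a source such as the National Vulnerability Database and would likely need to handle version ranges rather than exact matches.

```python
from typing import NamedTuple


class Component(NamedTuple):
    name: str
    version: str


# Hypothetical Bill of Materials recovered from a binary (illustrative values only).
BILL_OF_MATERIALS = [
    Component("libexample", "1.2.0"),
    Component("libdemo", "3.4.1"),
]

# Hypothetical advisory records keyed by (name, version); the IDs are placeholders,
# not real vulnerability identifiers.
ADVISORIES = {
    ("libexample", "1.2.0"): ["EXAMPLE-ADVISORY-0001"],
    ("libother", "0.9.0"): ["EXAMPLE-ADVISORY-0002"],
}


def find_known_vulnerabilities(bom, advisories):
    """Return a mapping from each affected component to its advisory IDs."""
    findings = {}
    for component in bom:
        ids = advisories.get((component.name, component.version))
        if ids:
            findings[component] = list(ids)
    return findings


if __name__ == "__main__":
    results = find_known_vulnerabilities(BILL_OF_MATERIALS, ADVISORIES)
    for component, ids in results.items():
        print(f"{component.name} {component.version}: {', '.join(ids)}")
```

Exact-match lookup keeps the sketch short. Components with no matching record are simply omitted from the result.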

Technologies

  • Advanced binary analysis to extract features from machine code, and machine learning to discover the library components in a binary application.
  • Statistical and machine learning on the semantic information extracted by sophisticated static analysis tools to further improve bug-finding tools.
  • Statistical learning on natural language information in code to find new kinds of bugs in programs.
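
To make the first item above more concrete, here is a minimal sketch of matching features extracted from a binary against reference feature sets for known libraries. It substitutes a simple Jaccard-similarity nearest match for a trained model, so it is a stand-in for illustration and not the method used in any particular product. The feature strings, library names, threshold, and the `identify_library` helper are all assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical reference feature sets, one per known library (illustrative values only).
KNOWN_LIBRARIES = {
    "libexample": {"const:0x1f", "call:alloc", "str:example-init"},
    "libdemo": {"call:read", "call:write", "str:demo-banner"},
}


def identify_library(features, known, threshold=0.5):
    """Return (name, score) for the best-matching library.

    The name is None when the best score falls below the threshold.
    """
    best_name, best_score = None, 0.0
    for name, reference in known.items():
        score = jaccard(features, reference)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


if __name__ == "__main__":
    # Hypothetical features extracted from an unknown binary region.
    extracted = {"const:0x1f", "call:alloc", "str:unknown"}
    print(identify_library(extracted, KNOWN_LIBRARIES))
```

A learned classifier would replace the similarity function and threshold here. The surrounding flow (extract features, score against known components, report the best match) stays the same.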

