Software Engineering Institute | Carnegie Mellon University
Software Engineering Institute | Carnegie Mellon University

Digital Library

Javascript is currently disabled for your browser. For an optimal search experience, please enable javascript.

Advanced Search

Basic Search

Content Type

Topics

Publication Date

Article

Supervised Learning for Provenance-Similarity of Binaries

  • October 2014
  • By Sagar Chaki, Cory Cohen, Arie Gurfinkel
  • In this article, the authors present a notion of similarity based on provenance; two binaries are similar if they are compiled from the same source code with the same compilers.
  • Malware Analysis
  • Publisher: ACM, Inc.
  • Abstract

    Understanding, measuring, and leveraging the similarity of binaries (executable code) is a foundational challenge in software engineering. We present a notion of similarity based on provenance—two binaries are similar if they are compiled from the same (or very similar) source code with the same (or similar) compilers. Empirical evidence suggests that provenance-similarity accounts for a significant portion of variation in existing binaries, particularly in malware. We propose and evaluate the applicability of classification to detect provenance-similarity. We evaluate a variety of classifiers, and different types of attributes and similarity labeling schemes, on two benchmarks derived from open-source software and malware respectively. We present encouraging results indicating that classification is a viable approach for automated provenance-similarity detection, and as an aid for malware analysts in particular.

Read Article

Published by ACM, Inc.

Read Article