
Discovery of C++ Data Structures from Binaries

October 2014 Article
Dan Quinlan (Lawrence Livermore National Laboratory), Cory Cohen

In this article, the authors present techniques for identifying C++ data structures in binary executables.


ACM, Inc.


This extended abstract presents techniques for identifying C++ data structures in binary executables. For automated tools, this is a largely open problem; in practice, it generally requires significant manual intervention, inspection, and tracing. The techniques for manual evaluation of C++ data structures are well known but tedious. Because the work is done by hand, the results are error-prone and depend on the time available and the analyst's experience. All of our work is accomplished using the ROSE compiler infrastructure.

ROSE is an open source compiler infrastructure that handles both source code and binary executables. Uniquely, ROSE handles binary executables much like source code: it parses them to identify and represent their internal parts in an intermediate representation (IR), disassembles the segments that contain instructions, defines a number of standard forms of program analysis, and permits users to define their own specialized analyses. The work to reconstruct C++ data structures is part of a larger effort to reconstruct all of the data used in the binary.