
Recovering Meaningful Variable Names in Decompiled Code (video)

Video
Watch CMU Institute for Software Research (ISR) assistant professor Dr. Bogdan Vasilescu discuss ongoing research, conducted in collaboration with the SEI, that uses machine learning (ML) to recover meaningful variable names when reverse engineering legacy systems.
Publisher

Software Engineering Institute


Abstract

In this project, we propose the Decompiled Identifier Renaming Engine (DIRE), a novel probabilistic technique for variable name recovery that uses both lexical and structural information. We also present a technique for generating corpora suitable for training and evaluating models of decompiled code renaming, which we use to create a corpus of 164,632 unique x86-64 binaries compiled from C projects mined from GitHub. Our results show that, on this corpus, DIRE can predict variable names identical to the names in the original source code up to 74.3% of the time.
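The 74.3% figure is an exact-match rate: a prediction counts only if it is identical to the developer's name in the original source. A minimal sketch of that metric follows, assuming each decompiler-generated placeholder variable (such as `v1`) has already been aligned with its original name. The function name `exact_match_accuracy` and the sample variables are hypothetical illustrations, not part of DIRE.

```python
from typing import Mapping


def exact_match_accuracy(
    predicted: Mapping[str, str], original: Mapping[str, str]
) -> float:
    """Return the fraction of placeholder variables whose predicted name
    exactly equals the name used in the original source code.

    Assumption (hypothetical): `original` maps each decompiler placeholder
    to its aligned source name; `predicted` maps placeholders to the model's
    suggested names. Placeholders with no prediction count as misses.
    """
    if not original:
        raise ValueError("no variables to evaluate")
    hits = sum(
        1
        for placeholder, name in original.items()
        if predicted.get(placeholder) == name
    )
    return hits / len(original)


# Hypothetical example: placeholders v1..v4 with their original names.
original = {"v1": "count", "v2": "buf", "v3": "len", "v4": "i"}
predicted = {"v1": "count", "v2": "buffer", "v3": "len", "v4": "i"}
print(exact_match_accuracy(predicted, original))  # 0.75
```

Exact matching is strict: predicting `buffer` for `buf` scores zero even though a human reader would likely find it helpful, so this kind of metric tends to understate how useful the recovered names are.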