Poster - Recovering Meaningful Variable Names in Decompiled Code

November 2020 Poster
Edward J. Schwartz, Cory Cohen

This presentation describes DIRE, a novel probabilistic technique for variable name recovery that uses lexical and structural information.

Publisher: Software Engineering Institute

Abstract

Understanding executable code is a challenge because the compilation process removes much of the source code information. In particular, decompilers have been widely believed to be unable to recover meaningful variable names, which make code easier to understand. To meet this challenge, CMU SEI researchers developed the Decompiled Identifier Renaming Engine (DIRE), a novel probabilistic technique for variable name recovery that uses both lexical and structural information. They also developed a technique for generating corpora for training and evaluating models of decompiled code renaming, and used it to create a corpus of 164,632 unique x86-64 binaries generated from C projects mined from GitHub. Surprisingly, the results show that DIRE can predict variable names identical to the names in the original source code up to 74.3% of the time.
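The headline figure is an exact-match rate: a prediction counts only when the recovered name is identical to the name in the original source. The following is a minimal sketch of how such a rate could be computed, not DIRE's evaluation code. The function name `exact_match_accuracy` and the toy variable names are hypothetical and chosen only for illustration.

```python
def exact_match_accuracy(predicted, original):
    """Return the fraction of variables whose predicted name is identical
    to the original source name (hypothetical helper, not part of DIRE)."""
    if len(predicted) != len(original):
        raise ValueError("predicted and original must have the same length")
    if not original:
        return 0.0
    matches = sum(p == o for p, o in zip(predicted, original))
    return matches / len(original)


# Toy data, invented for illustration only (not taken from the corpus).
original_names = ["buf", "len", "i", "fd"]
predicted_names = ["buf", "size", "i", "fd"]

accuracy = exact_match_accuracy(predicted_names, original_names)
print(f"{accuracy:.1%}")  # prints "75.0%"
```

Exact match is a strict measure: in this sketch, a reasonable synonym such as `size` for `len` still counts as a miss.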