Estimating types in binaries using predictive modeling

Omer Katz, Ran El-Yaniv, Eran Yahav

Research output: Contribution to journal › Article › peer-review

Abstract

Reverse engineering is an important tool in mitigating vulnerabilities in binaries. As a lot of software is developed in object-oriented languages, reverse engineering of object-oriented code is of critical importance. One of the major hurdles in reverse engineering binaries compiled from object-oriented code is the use of dynamic dispatch. In the absence of debug information, any dynamic dispatch may seem to jump to many possible targets, posing a significant challenge to a reverse engineer trying to track the program flow. We present a novel technique that allows us to statically determine the likely targets of virtual function calls. Our technique uses object tracelets - statically constructed sequences of operations performed on an object - to capture potential runtime behaviors of the object. Our analysis automatically pre-labels some of the object tracelets by relying on instances where the type of an object is known. The resulting type-labeled tracelets are then used to train a statistical language model (SLM) for each type. We then use the resulting ensemble of SLMs over unlabeled tracelets to generate a ranking of their most likely types, from which we deduce the likely targets of dynamic dispatches. We have implemented our technique and evaluated it over real-world C++ binaries. Our evaluation shows that when there are multiple alternative targets, our approach can drastically reduce the number of targets that have to be considered by a reverse engineer.
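
The train-then-rank idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which kind of SLM is used, so an add-one-smoothed bigram model stands in for it here. The event strings (such as "call vt[0]"), the type names, and the helper names `BigramModel`, `train_models`, and `rank_types` are all hypothetical.

```python
import math
from collections import Counter


class BigramModel:
    """Add-one-smoothed bigram model over tracelet events.

    Assumption: a stand-in for the per-type SLM; the abstract does not
    specify the model family actually used.
    """

    def __init__(self, vocab):
        self.vocab = vocab
        self.bigrams = Counter()   # counts of (previous event, event)
        self.contexts = Counter()  # counts of previous event

    def train(self, tracelet):
        events = ["<s>"] + list(tracelet) + ["</s>"]
        for prev, cur in zip(events, events[1:]):
            self.bigrams[(prev, cur)] += 1
            self.contexts[prev] += 1

    def log_prob(self, tracelet):
        events = ["<s>"] + list(tracelet) + ["</s>"]
        size = len(self.vocab) + 1  # +1 for the end marker
        total = 0.0
        for prev, cur in zip(events, events[1:]):
            # Events unseen in training only receive the smoothing mass,
            # which is enough for comparing types against each other.
            num = self.bigrams[(prev, cur)] + 1
            den = self.contexts[prev] + size
            total += math.log(num / den)
        return total


def train_models(labeled):
    """Train one model per type from (tracelet, type_name) pairs."""
    vocab = {event for tracelet, _ in labeled for event in tracelet}
    models = {}
    for tracelet, type_name in labeled:
        models.setdefault(type_name, BigramModel(vocab)).train(tracelet)
    return models


def rank_types(models, tracelet):
    """Return type names ordered from most to least likely."""
    scores = {name: model.log_prob(tracelet) for name, model in models.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical pre-labeled tracelets (event names are made up).
labeled = [
    (["call vt[0]", "read f[4]", "call vt[1]"], "Circle"),
    (["call vt[0]", "read f[4]", "read f[4]"], "Circle"),
    (["write f[8]", "call vt[2]", "write f[8]"], "Square"),
    (["write f[8]", "call vt[2]"], "Square"),
]
models = train_models(labeled)

# Rank candidate types for an unlabeled tracelet.
ranking = rank_types(models, ["call vt[0]", "read f[4]"])
print(ranking)
```

In this toy setup the top-ranked type would narrow down which virtual function implementations a dispatch on that object is likely to reach; how the ranking is mapped to concrete call targets is left out of the sketch.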

Original language: English
Pages (from-to): 313-326
Number of pages: 14
Journal: ACM SIGPLAN Notices
Volume: 51
Issue number: 1
State: Published - 8 Apr 2016

Keywords

  • Reverse engineering
  • Static binary analysis
  • x86

All Science Journal Classification (ASJC) codes

  • General Computer Science
