Estimating types in binaries using predictive modeling

Omer Katz, Ran El-Yaniv, Eran Yahav

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Reverse engineering is an important tool in mitigating vulnerabilities in binaries. As a lot of software is developed in object-oriented languages, reverse engineering of object-oriented code is of critical importance. One of the major hurdles in reverse engineering binaries compiled from object-oriented code is the use of dynamic dispatch. In the absence of debug information, any dynamic dispatch may seem to jump to many possible targets, posing a significant challenge to a reverse engineer trying to track the program flow. We present a novel technique that allows us to statically determine the likely targets of virtual function calls. Our technique uses object tracelets - statically constructed sequences of operations performed on an object - to capture potential runtime behaviors of the object. Our analysis automatically pre-labels some of the object tracelets by relying on instances where the type of an object is known. The resulting type-labeled tracelets are then used to train a statistical language model (SLM) for each type. We then use the resulting ensemble of SLMs over unlabeled tracelets to generate a ranking of their most likely types, from which we deduce the likely targets of dynamic dispatches. We have implemented our technique and evaluated it over real-world C++ binaries. Our evaluation shows that when there are multiple alternative targets, our approach can drastically reduce the number of targets that have to be considered by a reverse engineer.
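The ranking step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes an add-one-smoothed bigram model as a stand-in for the SLM (the abstract does not specify the model family), and the class name `BigramSLM`, the function `rank_types`, the event strings, and the type names `Circle` and `Square` are all hypothetical. One model is trained per type on type-labeled tracelets, and an unlabeled tracelet is then scored against every model to rank candidate types.

```python
import math
from collections import Counter, defaultdict

START, END = "<s>", "</s>"


class BigramSLM:
    """Add-one-smoothed bigram model over tracelet events (illustrative stand-in)."""

    def __init__(self, vocab_size):
        self.vocab_size = vocab_size
        self.bigrams = defaultdict(Counter)

    def train(self, tracelets):
        # Count adjacent event pairs, including start and end markers.
        for tracelet in tracelets:
            events = [START, *tracelet, END]
            for prev, cur in zip(events, events[1:]):
                self.bigrams[prev][cur] += 1

    def log_prob(self, tracelet):
        # Sum of smoothed log-probabilities of each transition.
        events = [START, *tracelet, END]
        total = 0.0
        for prev, cur in zip(events, events[1:]):
            counts = self.bigrams.get(prev, Counter())
            numerator = counts[cur] + 1
            denominator = sum(counts.values()) + self.vocab_size
            total += math.log(numerator / denominator)
        return total


def rank_types(labeled, tracelet):
    """Return type names ordered from most to least likely for the tracelet."""
    vocab = {e for ts in labeled.values() for t in ts for e in t} | {END}
    models = {}
    for type_name, tracelets in labeled.items():
        model = BigramSLM(len(vocab) + 1)  # +1 reserves mass for unseen events
        model.train(tracelets)
        models[type_name] = model
    return sorted(models, key=lambda n: models[n].log_prob(tracelet), reverse=True)


# Hypothetical type-labeled tracelets; event strings are made up for illustration.
labeled = {
    "Circle": [
        ["call vtable[0]", "read field 8", "call vtable[2]"],
        ["call vtable[0]", "read field 8"],
    ],
    "Square": [
        ["call vtable[1]", "write field 16", "call vtable[2]"],
        ["call vtable[1]", "write field 16"],
    ],
}
query = ["call vtable[0]", "read field 8"]
ranking = rank_types(labeled, query)
print(ranking)
```

In this sketch the output is only a ranking of types. Deducing the likely targets of a virtual call from the top-ranked types would additionally require each type's virtual table, which is not modeled here.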

Original language: English
Title of host publication: POPL 2016 - Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages
Editors: Rupak Majumdar, Rastislav Bodik
Pages: 313-326
Number of pages: 14
ISBN (Electronic): 9781450335492
State: Published - 11 Jan 2016
Event: 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016 - St. Petersburg, United States
Duration: 20 Jan 2016 - 22 Jan 2016

Publication series

Name: Conference Record of the Annual ACM Symposium on Principles of Programming Languages
Volume: 20-22-January-2016

Conference

Conference: 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016
Country/Territory: United States
City: St. Petersburg
Period: 20/01/16 - 22/01/16

Keywords

  • Reverse engineering
  • Static binary analysis
  • X86

All Science Journal Classification (ASJC) codes

  • Software
