Code completion with statistical language models

Veselin Raychev, Martin Vechev, Eran Yahav

Research output: Contribution to journal › Article › peer-review

Abstract

We address the problem of synthesizing code completions for programs using APIs. Given a program with holes, we synthesize completions for holes with the most likely sequences of method calls.

Our main idea is to reduce the problem of code completion to a natural-language processing problem of predicting probabilities of sentences. We design a simple and scalable static analysis that extracts sequences of method calls from a large codebase, and index these into a statistical language model. We then employ the language model to find the highest ranked sentences, and use them to synthesize a code completion. Our approach is able to synthesize sequences of calls across multiple objects together with their arguments.

Experiments show that our approach is fast and effective. Virtually all computed completions typecheck, and the desired completion appears in the top 3 results in 90% of the cases.
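To make the "method calls as sentences" framing concrete, here is a minimal sketch, not the authors' implementation. It assumes that each object's sequence of method calls is a sentence and each call is a word. It also assumes a trigram model with add-one smoothing as a stand-in for the paper's statistical language model, whose actual models and smoothing may differ. The helper `extract_histories` is a hypothetical, deliberately crude stand-in for the static analysis: it groups calls by receiver variable and ignores aliasing and control flow. `TrigramModel`, `complete`, and the call names `open`/`read`/`write`/`close` are likewise hypothetical and chosen only for illustration. A hole is filled by enumerating candidate call sequences, scoring each completed sentence by its log-probability, and returning the top-ranked fillings.

```python
import math
from collections import Counter
from itertools import product

BOS, EOS = "<s>", "</s>"  # sentence-boundary markers


def extract_histories(events):
    """Hypothetical stand-in for the static analysis: group (receiver, call)
    events by receiver variable, giving one call sequence ("sentence") per
    object. No aliasing or control-flow reasoning is performed."""
    histories = {}
    for receiver, call in events:
        histories.setdefault(receiver, []).append(call)
    return list(histories.values())


class TrigramModel:
    """Trigram language model over method-call words, add-one smoothed
    (an assumed smoothing scheme, chosen for simplicity)."""

    def __init__(self, sentences):
        # Words that can be predicted: every observed call plus end-of-sentence.
        self.vocab = {w for s in sentences for w in s} | {EOS}
        self.tri = Counter()  # counts of (w1, w2, w3)
        self.bi = Counter()   # counts of context (w1, w2) preceding a prediction
        for s in sentences:
            padded = [BOS, BOS] + list(s) + [EOS]
            for i in range(2, len(padded)):
                ctx = (padded[i - 2], padded[i - 1])
                self.tri[ctx + (padded[i],)] += 1
                self.bi[ctx] += 1

    def logprob(self, sentence):
        """Log-probability of a complete call sequence under the model."""
        padded = [BOS, BOS] + list(sentence) + [EOS]
        v = len(self.vocab)
        total = 0.0
        for i in range(2, len(padded)):
            ctx = (padded[i - 2], padded[i - 1])
            total += math.log((self.tri[ctx + (padded[i],)] + 1) / (self.bi[ctx] + v))
        return total

    def complete(self, prefix, suffix, max_hole=1, top_k=3):
        """Rank fillings (1..max_hole calls) for a hole between prefix and
        suffix by the probability of the whole sentence. Brute-force
        enumeration, so cost grows exponentially with max_hole."""
        calls = sorted(self.vocab - {EOS})
        scored = []
        for n in range(1, max_hole + 1):
            for fill in product(calls, repeat=n):
                sentence = list(prefix) + list(fill) + list(suffix)
                scored.append((self.logprob(sentence), fill))
        # Highest probability first; ties broken alphabetically for determinism.
        scored.sort(key=lambda t: (-t[0], t[1]))
        return [list(fill) for _, fill in scored[:top_k]]


# Hypothetical training "codebase": (receiver variable, method call) events.
events = [
    ("a", "open"), ("b", "open"), ("a", "read"), ("b", "read"),
    ("a", "close"), ("b", "close"),
    ("c", "open"), ("c", "write"), ("c", "close"),
    ("d", "open"), ("d", "read"), ("d", "read"), ("d", "close"),
]
model = TrigramModel(extract_histories(events))
# Fill the hole in: open; ??? ; close
print(model.complete(["open"], ["close"]))  # [['read'], ['write'], ['close']]
```

With this toy data, `read` ranks first because `open, read, close` is the most frequent history. The sketch handles only a single object and a single hole and produces no arguments. The approach described above also synthesizes calls across multiple objects together with their arguments, which this example does not attempt.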

Original language: English
Pages (from-to): 419-428
Number of pages: 10
Journal: ACM SIGPLAN Notices
Volume: 49
Issue number: 6
State: Published - 5 Jun 2014
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • General Computer Science
