Structural language models of code

Uri Alon, Roy Sadaka, Omer Levy, Eran Yahav

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

We address the problem of any-code completion generating a missing piece of source code in a given program without any restriction on the vocabulary or structure. We introduce a new approach to any-code completion that leverages the strict syntax of programming languages to model a code snippet as a tree structural language modeling (SLM). SLM estimates the probability of the program s abstract syntax tree (AST) by decomposing it into a product of conditional probabilities over its nodes. We present a neural model that computes these conditional probabilities by considering all AST paths leading to a target node. Unlike previous techniques that have severely restricted the kinds of expressions that can be generated in this task, our approach can generate arbitrary code in any programming language. Our model significantly outperforms both seq2seq and a variety of structured approaches in generating Java and C# code. Our code, data, and trained models are available at http://github.com/tech-srl/ slm-code-generation/. An online demo is available at http://AnyCodeGen.org.

Original languageEnglish
Title of host publication37th International Conference on Machine Learning, ICML 2020
EditorsHal Daume, Aarti Singh
Pages222-233
Number of pages12
ISBN (Electronic)9781713821120
StatePublished - 2020
Event37th International Conference on Machine Learning, ICML 2020 - Virtual, Online
Duration: 13 Jul 202018 Jul 2020

Publication series

Name37th International Conference on Machine Learning, ICML 2020
VolumePartF168147-1

Conference

Conference37th International Conference on Machine Learning, ICML 2020
CityVirtual, Online
Period13/07/2018/07/20

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Human-Computer Interaction
  • Software

Fingerprint

Dive into the research topics of 'Structural language models of code'. Together they form a unique fingerprint.

Cite this