Extracting code from programming tutorial videos

Shir Yadid, Eran Yahav

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

The number of programming tutorial videos on the web increases daily. Video hosting sites such as YouTube host millions of video lectures, with many programming tutorials for various languages and platforms. These videos contain a wealth of valuable information, including code that may be of interest. However, two main challenges have so far prevented the effective indexing of programming tutorial videos: (i) code in tutorials is typically written on-The-fly, with only parts of the code visible in each frame, and (ii) optical character recognition (OCR) is not precise enough to produce quality results from videos. We present a novel approach for extracting code from videos that is based on: (i) consolidating code across frames, and (ii) statistical language models for applying corrections at different levels, allowing us to make corrections by choosing the most likely token, combination of tokens that form a likely line structure, and combination of lines that lead to a likely code fragment in a particular language. We implemented our approach in a tool called ACE, and used it to extract code from 40 Android video tutorials on YouTube. Our evaluation shows that ACE extracts code with high accuracy, enabling deep indexing of video tutorials.

Original languageEnglish
Title of host publicationOnward! 2016 - Proceedings of the 2016 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software
EditorsEelco Visser, Emerson Murphy-Hill, Crista Lopes
Pages98-111
Number of pages14
ISBN (Electronic)9781450340762
DOIs
StatePublished - 20 Oct 2016
Event2016 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software, Onward! 2016 - Amsterdam, Netherlands
Duration: 2 Nov 20164 Nov 2016

Publication series

NameOnward! 2016 - Proceedings of the 2016 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software

Conference

Conference2016 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software, Onward! 2016
Country/TerritoryNetherlands
CityAmsterdam
Period2/11/164/11/16

Keywords

  • Programmer education
  • Statistical language models
  • Video tutorials

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Computer Science Applications
  • Software

Fingerprint

Dive into the research topics of 'Extracting code from programming tutorial videos'. Together they form a unique fingerprint.

Cite this