TY - GEN
T1 - Extracting code from programming tutorial videos
AU - Yadid, Shir
AU - Yahav, Eran
N1 - Publisher Copyright: © 2016 ACM.
PY - 2016/10/20
Y1 - 2016/10/20
N2 - The number of programming tutorial videos on the web increases daily. Video hosting sites such as YouTube host millions of video lectures, with many programming tutorials for various languages and platforms. These videos contain a wealth of valuable information, including code that may be of interest. However, two main challenges have so far prevented the effective indexing of programming tutorial videos: (i) code in tutorials is typically written on-The-fly, with only parts of the code visible in each frame, and (ii) optical character recognition (OCR) is not precise enough to produce quality results from videos. We present a novel approach for extracting code from videos that is based on: (i) consolidating code across frames, and (ii) statistical language models for applying corrections at different levels, allowing us to make corrections by choosing the most likely token, combination of tokens that form a likely line structure, and combination of lines that lead to a likely code fragment in a particular language. We implemented our approach in a tool called ACE, and used it to extract code from 40 Android video tutorials on YouTube. Our evaluation shows that ACE extracts code with high accuracy, enabling deep indexing of video tutorials.
AB - The number of programming tutorial videos on the web increases daily. Video hosting sites such as YouTube host millions of video lectures, with many programming tutorials for various languages and platforms. These videos contain a wealth of valuable information, including code that may be of interest. However, two main challenges have so far prevented the effective indexing of programming tutorial videos: (i) code in tutorials is typically written on-The-fly, with only parts of the code visible in each frame, and (ii) optical character recognition (OCR) is not precise enough to produce quality results from videos. We present a novel approach for extracting code from videos that is based on: (i) consolidating code across frames, and (ii) statistical language models for applying corrections at different levels, allowing us to make corrections by choosing the most likely token, combination of tokens that form a likely line structure, and combination of lines that lead to a likely code fragment in a particular language. We implemented our approach in a tool called ACE, and used it to extract code from 40 Android video tutorials on YouTube. Our evaluation shows that ACE extracts code with high accuracy, enabling deep indexing of video tutorials.
KW - Programmer education
KW - Statistical language models
KW - Video tutorials
UR - http://www.scopus.com/inward/record.url?scp=84997402710&partnerID=8YFLogxK
U2 - https://doi.org/10.1145/2986012.2986021
DO - https://doi.org/10.1145/2986012.2986021
M3 - منشور من مؤتمر
T3 - Onward! 2016 - Proceedings of the 2016 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software
SP - 98
EP - 111
BT - Onward! 2016 - Proceedings of the 2016 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software
A2 - Visser, Eelco
A2 - Murphy-Hill, Emerson
A2 - Lopes, Crista
T2 - 2016 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software, Onward! 2016
Y2 - 2 November 2016 through 4 November 2016
ER -