TY - GEN
T1 - Combining LLM-Generated and Test-Based Feedback in a MOOC for Programming
AU - Gabbay, Hagit
AU - Cohen, Anat
N1 - Publisher Copyright: © 2024 Owner/Author.
PY - 2024/7/9
Y1 - 2024/7/9
AB - In large-scale programming courses, providing learners with immediate and effective feedback is a significant challenge. This study explores the potential of Large Language Models (LLMs) to generate feedback on code assignments and to address the gaps in Automated Test-based Feedback (ATF) tools commonly employed in programming courses. We applied dedicated metrics in a Massive Open Online Course (MOOC) on programming to assess the correctness of feedback generated by two models, GPT-3.5-turbo and GPT-4, using a reliable ATF as a benchmark. The findings point to effective error detection, yet the generated feedback is often inaccurate, with GPT-4 outperforming GPT-3.5-turbo. We used insights gained from our prompting practices to develop Gipy, an application for submitting course assignments and obtaining LLM-generated feedback. Learners participating in a field experiment perceived the feedback provided by Gipy as moderately valuable, while also recognizing its potential to complement ATF. Given the learners' critique and their awareness of the limitations of LLM-generated feedback, the studied implementation may combine the strengths of both ATF and LLMs as feedback resources. Further research is needed to assess the impact of LLM-generated feedback on learning outcomes and to explore the capabilities of more advanced models.
KW - MOOC for programming
KW - automated feedback
KW - generative AI
KW - large language models (LLMs)
KW - programming education
UR - http://www.scopus.com/inward/record.url?scp=85199921716&partnerID=8YFLogxK
U2 - 10.1145/3657604.3662040
DO - 10.1145/3657604.3662040
M3 - Conference contribution
T3 - L@S 2024 - Proceedings of the 11th ACM Conference on Learning @ Scale
SP - 177
EP - 187
BT - L@S 2024 - Proceedings of the 11th ACM Conference on Learning @ Scale
T2 - 11th ACM Conference on Learning @ Scale, L@S 2024
Y2 - 18 July 2024 through 20 July 2024
ER -