ChatGPT and Cheat Detection in CS1 Using a Program Autograding System

Read the whitepaper here. -Frank Vahid*, Lizbeth Areizaga, Ashley Pang Dept. of Computer Science, Univ. of California, Riverside
*Also with zyBooks


We summarize the ability of a program autograding system to detect programs written by ChatGPT, rather than by students, in a CS1 class using a “many small programs” approach. First, we found ChatGPT was quite good at generating correct programs from a mere copy-paste of the English programming assignment specifications. However, running ChatGPT using 10 programming assignments and acting as 20 different students, and using zyBooks’ APEX tool for academic integrity, we found: (1) ChatGPT-generated programs tend to use a programming style departing from the style taught in the textbook or by the instructor, and these “style anomalies” were automatically detected. (2) Although ChatGPT may for the same assignment generate a few different program solutions for different students, ChatGPT often generates highly-similar programs for different students, so if enough students in a class (e.g., 5 or more) use ChatGPT, their programs will likely be flagged by a similarity checker. (3) If students are required to do all programming in the autograder’s IDE, then a student using ChatGPT ends up showing very little time relative to classmates, which is automatically flagged. (4) Manually, we observed that if a student consistently uses ChatGPT to submit programs, the programming style may vary from one program to another, something normal students don’t do; automation of such style inconsistency detection is underway. In short, while there will no doubt be a nuclear arms race between AI-generated programs and the ability to automatically detect AI-generated programs, currently it is likely that students using ChatGPT in a CS1 can be detected by automated tools such as APEX.