Research Items

ChatGPT and Cheat Detection in CS1 Using a Program Autograding System

Read the whitepaper here.
Frank Vahid*, Lizbeth Areizaga, Ashley Pang
Dept. of Computer Science, Univ. of California, Riverside
*Also with zyBooks


We summarize the ability of a program autograding system to detect programs written by ChatGPT, rather than by students, in a CS1 class that uses a “many small programs” approach. First, we found that ChatGPT was quite good at generating correct programs from a mere copy-paste of the English programming-assignment specifications. However, after running ChatGPT on 10 programming assignments while acting as 20 different students, and checking the results with zyBooks’ APEX tool for academic integrity, we found: (1) ChatGPT-generated programs tend to use a programming style that departs from the style taught in the textbook or by the instructor, and these “style anomalies” were automatically detected. (2) Although ChatGPT may generate a few different solutions to the same assignment for different students, it often generates highly similar programs, so if enough students in a class (e.g., 5 or more) use ChatGPT, their programs will likely be flagged by a similarity checker. (3) If students are required to do all programming in the autograder’s IDE, then a student using ChatGPT ends up showing very little time relative to classmates, which is automatically flagged. (4) Manually, we observed that when a student consistently uses ChatGPT to submit programs, the programming style may vary from one program to the next, something typical students don’t do; automation of such style-inconsistency detection is underway. In short, while there will no doubt be a nuclear arms race between AI-generated programs and the ability to automatically detect them, it is currently likely that students using ChatGPT in CS1 can be detected by automated tools such as APEX.
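The whitepaper does not describe how APEX computes its similarity or time-based flags. Purely as an illustration, the following Python (3.9+) sketch assumes two simple, hypothetical rules: a pair of programs is flagged when the Jaccard similarity of their token sets meets a threshold, and a student is flagged when their IDE time falls below a fraction of the class median. The function names, the tokenizer, and the thresholds (0.9 and 0.25) are all assumptions, not APEX’s actual method.

```python
import re
from itertools import combinations
from statistics import median


def tokens(source: str) -> set[str]:
    """Crude tokenizer: identifiers, integer literals, and single non-space characters."""
    return set(re.findall(r"[A-Za-z_]\w*|\d+|\S", source))


def jaccard(a: set[str], b: set[str]) -> float:
    """Fraction of distinct tokens shared by two programs (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similar_pairs(submissions: dict[str, str], threshold: float = 0.9) -> list[tuple[str, str]]:
    """Hypothetical rule: flag every pair of students whose programs meet the threshold."""
    toks = {sid: tokens(src) for sid, src in submissions.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(toks), 2)
        if jaccard(toks[a], toks[b]) >= threshold
    ]


def low_time_students(minutes: dict[str, float], fraction: float = 0.25) -> list[str]:
    """Hypothetical rule: flag students whose time is below `fraction` of the class median."""
    cutoff = fraction * median(minutes.values())
    return sorted(sid for sid, m in minutes.items() if m < cutoff)


if __name__ == "__main__":
    subs = {
        "s1": "x = int(input())\nprint(x * 2)",
        "s2": "x = int(input())\nprint(x * 2)",
        "s3": "n = int(input())\nfor i in range(n):\n    print(i)",
    }
    print(similar_pairs(subs))  # [('s1', 's2')]
    print(low_time_students({"s1": 2.0, "s2": 15.0, "s3": 12.0, "s4": 18.0}))  # ['s1']
```

A set-based comparison ignores token order, so a practical checker would more likely compare normalized token sequences; sets are used here only to keep the sketch short.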

Less Is More: Students Skim Lengthy Online Textbooks

Abstract—Computer science textbooks with lengthy text explanations of concepts are often considered thorough and rigorous, so lengthy textbooks (and class notes) are commonplace. Some, however, suggest text should be concise because people tend to skim lengthy text. This paper takes advantage of modern digital textbooks that measure reading time to examine reading rates for various text passage lengths. For a widely used CS textbook written in a non-concise style, students read shorter passages (200 words or fewer) at about 200 words per minute, which is a typical reading rate. But for longer passages (600+ words), the rate increased to about 800 words per minute, suggesting skimming rather than reading. For another widely used CS textbook, from the same publisher but written in a concise style with text passages kept below 250 words, students spent more time reading the text passages (around 200 words per minute), and their time spent was well correlated with text length, suggesting students were carefully reading rather than skimming. Across three digital textbooks, the more interactive elements (e.g., integrated questions) a textbook included, the more time students spent reading the text between those activities. The conclusion is that, to best educate students, authors of CS content should take the extra time needed to explain concepts more concisely (a case of “less is more”) and incorporate many active learning opportunities.
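The reading rates above are simply words divided by minutes spent on a passage. A minimal Python sketch follows; the skimming rule (flag any passage read at more than twice a typical rate of about 200 words per minute) and the function names are hypothetical, since the paper does not state a cutoff.

```python
def words_per_minute(word_count: int, seconds: float) -> float:
    """Reading rate for one passage: words divided by minutes spent on it."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return word_count / (seconds / 60)


def likely_skimmed(word_count: int, seconds: float,
                   typical_wpm: float = 200.0, factor: float = 2.0) -> bool:
    """Hypothetical rule: flag rates above `factor` times a typical reading rate."""
    return words_per_minute(word_count, seconds) > factor * typical_wpm


if __name__ == "__main__":
    print(words_per_minute(200, 60), likely_skimmed(200, 60))  # 200.0 False
    print(words_per_minute(600, 45), likely_skimmed(600, 45))  # 800.0 True
```

The factor-of-two cutoff is arbitrary. The paper’s evidence of careful reading in the concise textbook also rests on time spent correlating with text length, which a single-passage rule like this cannot capture.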


Link to Full Paper

Impact of Several Low-Effort Cheating-Reduction Methods in a CS1 Class


Cheating in introductory programming classes (CS1) is a well-known problem. Many instructors seek methods to prevent such cheating, but some methods are time-consuming or don’t scale to large classes. We experimented with several low-effort, commonly suggested methods to reduce cheating: (1) discussing academic integrity for 20-30 minutes, several weeks into the term, (2) requiring an integrity quiz with explicit do’s and don’ts, (3) allowing students to withdraw program submissions, (4) reminding students mid-term about integrity and the consequences of getting caught, (5) showing students in class the tools an instructor has available (including a similarity checker, statistics on time spent, and access to a student’s full coding history), and (6) normalizing help and pointing students to help resources. Because counting the students actually caught cheating is not objective (it is influenced by how much effort is spent on detection and investigation, and by how an instructor subjectively decides that cheating has occurred), we developed two automated coding-behavior metrics that may suggest how much cheating is happening, and compared those metrics for terms before and after the intervention. The results show substantial improvements in student behavior when applying those low-effort methods. In our Fall 2021 comparison, time spent programming increased from 6 min 56 sec to 11 min 6 sec, a 60% increase, and the percentage of students with suspiciously similar programs dropped from 33% to 18%, a 45% relative decrease.
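Both reported percentages are relative changes. Converting the times to seconds (6 min 56 sec = 416 s; 11 min 6 sec = 666 s), the arithmetic works out as:

```latex
\begin{align*}
\frac{666\,\text{s} - 416\,\text{s}}{416\,\text{s}} &= \frac{250}{416} \approx 0.60
  \quad (\text{a } 60\%\ \text{increase}) \\
\frac{33\% - 18\%}{33\%} &= \frac{15}{33} \approx 0.45
  \quad (\text{a } 45\%\ \text{relative decrease, i.e., 15 percentage points})
\end{align*}
```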


See full paper here

The rise of the zyLab program auto-grader in introductory CS courses

In recent years, hundreds of college courses have switched how they grade programming assignments, from grading manually and/or with batch scripts to using commercial cloud-based auto-graders that give students immediate score feedback. This white paper provides data on the rise in usage of one of the most widely used program auto-graders, zyLabs, as one indicator of the strong shift in college course grading to the auto-grading paradigm. The numbers of courses, instructors, and students using zyLabs have increased dramatically since it was first introduced: from 2016 to 2021, the number of courses per year grew from 284 to 3,935, the number of students per year from 24,216 to 220,453, and the number of instructors per year from 364 to 3,724. The result is a substantial shift in classroom dynamics that lets instructors and students spend more time on quality teaching and learning.
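Expressed as growth factors between 2016 and 2021, the reported counts work out to roughly:

```latex
\begin{align*}
\text{courses:}     &\quad \frac{3{,}935}{284} \approx 13.9 \\
\text{students:}    &\quad \frac{220{,}453}{24{,}216} \approx 9.1 \\
\text{instructors:} &\quad \frac{3{,}724}{364} \approx 10.2
\end{align*}
```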

See full paper here