Research Items

ChatGPT and Cheat Detection in CS1 Using a Program Autograding System

Read the whitepaper here. Frank Vahid*, Lizbeth Areizaga, Ashley Pang, Dept. of Computer Science, Univ. of California, Riverside
*Also with zyBooks

We summarize the ability of a program autograding system to detect programs written by ChatGPT, rather than by students, in a CS1 class using a “many small programs” approach. First, we found ChatGPT was quite good at generating correct programs from a mere copy-paste of the English programming assignment specifications. However, after running ChatGPT on 10 programming assignments while acting as 20 different students, and using zyBooks’ APEX tool for academic integrity, we found: (1) ChatGPT-generated programs tend to use a programming style that departs from the style taught in the textbook or by the instructor, and these “style anomalies” were automatically detected. (2) Although ChatGPT may generate a few different solutions to the same assignment for different students, it often generates highly similar programs, so if enough students in a class (e.g., 5 or more) use ChatGPT, their programs will likely be flagged by a similarity checker. (3) If students are required to do all programming in the autograder’s IDE, then a student using ChatGPT ends up showing very little time spent relative to classmates, which is automatically flagged. (4) Manually, we observed that if a student consistently uses ChatGPT to submit programs, the programming style may vary from one program to another, which students writing their own code typically don’t do; automation of such style-inconsistency detection is underway. In short, while there will no doubt be an arms race between AI-generated programs and the ability to automatically detect them, it is currently likely that students using ChatGPT in a CS1 class can be detected by automated tools such as APEX.
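As a rough illustration of finding (2), the sketch below flags pairs of submissions whose text is nearly identical. It is not APEX's algorithm, which the abstract does not describe; the function names, the whitespace normalization, and the 0.9 threshold are assumptions chosen for the example, and the comparison uses Python's standard `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(source: str) -> str:
    """Strip blank lines and indentation so layout differences count less."""
    return "\n".join(line.strip() for line in source.splitlines() if line.strip())


def flag_similar(submissions: dict[str, str], threshold: float = 0.9):
    """Return (student_a, student_b, ratio) for each pair at or above the threshold.

    The threshold is a hypothetical cutoff, not a value from the paper.
    """
    flagged = []
    for (a, src_a), (b, src_b) in combinations(sorted(submissions.items()), 2):
        ratio = SequenceMatcher(None, normalize(src_a), normalize(src_b)).ratio()
        if ratio >= threshold:
            flagged.append((a, b, round(ratio, 2)))
    return flagged


if __name__ == "__main__":
    demo = {
        "s1": "x = int(input())\nprint(x * 2)\n",
        "s2": "x = int(input())\n\n    print(x * 2)\n",  # same code, different layout
        "s3": "import math\nfor i in range(10):\n    total = math.sqrt(i)\n",
    }
    print(flag_similar(demo))
```

A real checker would likely compare token streams or syntax trees rather than raw text, since renaming variables is enough to lower a character-level ratio.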

Less Is More: Students Skim Lengthy Online Textbooks

Abstract: Computer science textbooks with lengthy text explanations of concepts are often considered thorough and rigorous, so lengthy textbooks (and class notes) are commonplace. Some, however, suggest text should be concise because people tend to skim lengthy text. This paper takes advantage of modern digital textbooks that measure reading time to examine reading rates for various text passage lengths. For a widely-used CS textbook written in a non-concise style, students read shorter passages (200 words or fewer) at about 200 words per minute, which is a typical rate. But for longer passages (600+ words), the rate increased to about 800 words per minute, suggesting skimming rather than reading. For another widely-used CS textbook, from the same publisher but written in a concise style with text passage sizes kept below 250 words, students spent more time reading the text passages (a rate of around 200 words per minute), and their time spent was well correlated with text length, suggesting students were carefully reading rather than skimming. Across three digital textbooks, the more interactive elements (e.g., integrated questions) that were included, the more time students spent reading the text between those activities. The conclusion is that to best educate students, authors of CS content should take the extra time needed to explain concepts more concisely (a case of “less is more”) and incorporate many active learning opportunities.
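The reading rates in this abstract are passage length divided by measured reading time. A minimal sketch of that arithmetic follows; the function names and the "twice the typical rate" skimming cutoff are assumptions for illustration, not the paper's criteria.

```python
def reading_rate_wpm(words: int, seconds: float) -> float:
    """Words per minute for a passage of `words` words read in `seconds` seconds."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return words * 60 / seconds


def looks_skimmed(rate_wpm: float, typical_wpm: float = 200.0, factor: float = 2.0) -> bool:
    """Hypothetical rule: call it skimming when the rate exceeds `factor` times typical."""
    return rate_wpm > factor * typical_wpm


if __name__ == "__main__":
    short_rate = reading_rate_wpm(200, 60)  # a 200-word passage read in one minute
    long_rate = reading_rate_wpm(600, 45)   # a 600-word passage read in 45 seconds
    print(short_rate, looks_skimmed(short_rate))
    print(long_rate, looks_skimmed(long_rate))
```

The two example passages are made-up inputs chosen to land on the roughly 200 and 800 words-per-minute rates the abstract reports.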

Impact of Several Low-Effort Cheating-Reduction Methods in a CS1 Class


Cheating in introductory programming classes (CS1) is a well-known problem. Many instructors seek methods to prevent such cheating. Some methods are time-consuming or don’t scale to large classes. We experimented with several low-effort commonly-suggested methods to reduce cheating: (1) Discussing academic integrity for 20-30 minutes, several weeks into the term, (2) Requiring an integrity quiz with explicit do’s and don’ts, (3) Allowing students to withdraw program submissions, (4) Reminding students mid-term about integrity and consequences of getting caught, (5) Showing tools in class that an instructor has available (including a similarity checker, statistics on time spent, and access to a student’s full coding history), (6) Normalizing help and pointing students to help resources. Because counting students actually caught cheating is not objective (being influenced by how much effort is spent in detecting and investigating, and how an instructor subjectively decides cheating has occurred), we developed two automated coding-behavior metrics that may suggest how much cheating is happening. We compared those metrics for terms before and after the intervention. The results show substantial student behavior improvements when applying those low-effort methods. In our Fall 2021 comparison, time spent programming increased from 6 min 56 sec, to 11 min 6 sec, for a 60% increase. And, the percent of students with suspiciously similar programs dropped from 33% to 18%, for a 45% decrease.
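The two percentages quoted at the end of this abstract are relative changes, and they can be checked directly from the before and after values. The helper names below are ours, not the paper's.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a 'min sec' duration to total seconds."""
    return minutes * 60 + seconds


def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent (negative means a decrease)."""
    return (after - before) / before * 100


if __name__ == "__main__":
    time_before = to_seconds(6, 56)   # 416 seconds
    time_after = to_seconds(11, 6)    # 666 seconds
    print(round(percent_change(time_before, time_after)))  # time spent programming
    print(round(percent_change(33, 18)))                   # similar-program percentage
```

Note that the drop from 33% to 18% is 15 percentage points; the 45% figure is the decrease relative to the starting value of 33%.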

Theory to Practice: Closing the Gap in Undergraduate Math to Reduce Student Attrition

Researchers have developed numerous effective methods and theoretical models for teaching undergraduate general education mathematics (UGEM); however, many universities have struggled for decades to bridge the gap in translating that theory into course practice. This gap has contributed to the national lack of a skilled STEM workforce. Recently, University of Phoenix decided to close the gap by shifting the institution’s philosophical framework about math education from traditionalist methodology to a synthesis of seminal theories and best practices. The purpose of this paper is to describe the implementation of theory-based practices in UGEM toward reducing student attrition, including: rationale for theory identification, the construction of a philosophical framework for course implementation, collection of stakeholder input, implementation, evaluation of the impact on attrition, post-implementation maintenance and communication, and institutional socialization of the paradigm shift. These efforts reduced the rate of students withdrawing or failing from 17.5% to 4.7% in Quantitative Reasoning 1 and from 13.9% to 4.0% in Quantitative Reasoning 2. A key outcome of this work is a blueprint for an institution to similarly close the theory-to-practice gap in courses.

The rise of the zyLab program auto-grader in introductory CS courses

In recent years, hundreds of college courses have switched how they grade programming assignments, from grading manually and/or using batch scripts, to using commercial cloud-based auto-graders with immediate score feedback to students. This white paper provides data on the rise in usage of one of the most widely-used program auto-graders, zyLabs, as one indicator of the strong shift in college course grading to the auto-grading paradigm. The numbers of courses, instructors, and students using zyLabs have increased dramatically since it was first introduced: from 2016 to 2021, the number of courses per year grew from 284 to 3,935, the number of students per year from 24,216 to 220,453, and the number of instructors per year from 364 to 3,724. The result is a substantial shift in the classroom dynamic that enables instructors and students to spend more time on quality teaching and learning.
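To put the 2016 and 2021 usage figures on a common scale, the sketch below computes how many times larger each 2021 count is than its 2016 counterpart. The dictionary layout and function name are illustrative; only the six counts come from the abstract.

```python
# (2016 count, 2021 count) per category, as quoted in the abstract.
USAGE = {
    "courses": (284, 3935),
    "students": (24216, 220453),
    "instructors": (364, 3724),
}


def growth_factor(start: int, end: int) -> float:
    """How many times larger `end` is than `start`."""
    return end / start


if __name__ == "__main__":
    for name, (start, end) in USAGE.items():
        print(f"{name}: {growth_factor(start, end):.1f}x from 2016 to 2021")
```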

See full paper here