Widespread AI misuse means higher ed must rethink assessment
By Patricia Waldron
Large numbers of college students are now using artificial intelligence to complete – and cheat on – their assignments, suggesting that colleges and universities need to change how they are evaluating students, new Cornell research finds.
An analysis of survey responses from more than 95,000 students at 20 public research universities in the U.S. finds that about one-third regularly used generative AI (GenAI), such as ChatGPT or other models that produce text, video or code, when completing assignments, and that 9% had used it to cheat.
“Assessment reform is necessary and urgent,” said study co-author Rene Kizilcec, associate professor of information science at the Cornell Ann S. Bowers College of Computing and Information Science and director of the Future of Learning Lab. “The fact that students are misusing GenAI is a problem for assessment validity, and that’s a problem for the credibility of university credentials.”
The new study, “Generative AI Use and Misuse Call for Assessment Reform in Higher Education,” was published May 21 in the journal Science.
Kizilcec partnered with Igor Chirikov, a senior researcher at the Center for Studies in Higher Education and director of the Student Experience in the Research University (SERU) Consortium at the University of California, Berkeley, to investigate AI use and misuse among university students. Each year, SERU sends out surveys to undergraduates, asking students’ opinions on engagement, belonging, affordability and other topics.
The GenAI usage responses, collected during the 2023-24 academic year, came from the largest survey of its kind at the time, which enabled researchers to break down responses by discipline.
“We wanted to provide a more evidence-based approach to how students actually use AI, and, more importantly, misuse it,” Chirikov said. “Even this early stage evidence shows that we have a very serious challenge on our hands, and universities need to address that.”
Overall, 37% of students reported using AI at least monthly, with disciplines requiring large amounts of data analysis showing higher rates of adoption. Rates varied, with 62% of computer science students reporting regular usage, compared to 24% of students in the arts.
The survey also showed demographic differences in GenAI use. Researchers found that 33% of female students reported using GenAI regularly, compared to 45% of male students. People belonging to underrepresented racial minorities also had lower rates of regular use at 29%, compared to 39% of white and Asian students.
These demographic differences may reflect equity gaps in the use of AI tools, researchers said. They also warned that these gaps may widen as GenAI tools become more specialized and costly.
“Those disparities can shape both students’ learning and familiarity with the tools as they go through college and then in the labor market,” Chirikov said.
To accurately estimate rates of cheating – something students may hesitate to admit – the researchers used a technique called a list randomization experiment. They provided a short list of statements and asked students how many statements – but not which ones – applied to them. By including an additional statement about cheating on some surveys but not others, they could estimate rates of AI misuse.
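The logic behind a list randomization estimate can be sketched in a few lines of Python. This is an illustrative sketch, not the study's analysis code: the function name, the response counts and the simple difference-in-means estimator are all assumptions made for the example. Because the two groups are randomly assigned and differ only in whether the sensitive statement appears on their list, the gap between their average counts estimates the share of students to whom that statement applies.

```python
from math import sqrt
from statistics import mean, variance


def list_experiment_estimate(control_counts, treatment_counts):
    """Estimate the share of respondents endorsing the sensitive item.

    control_counts: number of statements each control respondent said applied
                    (list WITHOUT the sensitive statement).
    treatment_counts: same, for respondents whose list INCLUDED the
                      sensitive statement.
    Returns (estimate, standard_error).
    """
    # Difference in mean counts between the two randomly assigned groups.
    estimate = mean(treatment_counts) - mean(control_counts)
    # Standard error of a difference between two independent sample means.
    se = sqrt(
        variance(control_counts) / len(control_counts)
        + variance(treatment_counts) / len(treatment_counts)
    )
    return estimate, se


# Hypothetical toy responses, invented purely for illustration.
control = [1, 2, 0, 3, 2, 1, 2, 1]     # list of 3 neutral statements
treatment = [2, 2, 1, 3, 2, 1, 3, 2]   # same list plus the sensitive statement

estimate, se = list_experiment_estimate(control, treatment)
print(f"Estimated share: {estimate:.2f} (standard error {se:.2f})")
```

With these made-up counts, the control group averages 1.5 statements and the treatment group 2.0, so the estimated share is 0.5. No individual answer reveals whether a given student cheated, which is why the technique can encourage more honest reporting.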
Overall, the number of students who had used AI to cheat was lower than anecdotal reports had suggested, researchers said. Daily GenAI users had the highest rate of cheating, at 26%, compared to 7% for those who used it monthly.
“As we expect GenAI use among students to only grow, for better and worse, we also expect that GenAI misuse will grow, which is concerning,” Kizilcec said.
The study’s authors call for changes in how universities assess students, to promote academic integrity. They suggest three strategies: professors can go back to highly controlled testing environments – just pen, paper and proctors; they can set clearer guidelines for acceptable AI use; or they can adapt assessments to include AI in ways that show off professional skills.
Due to the differences between disciplines, the researchers propose that professional societies play a role in determining how best to evaluate student learning in their field in the age of AI. However, they caution that universities must also be mindful of inequities in AI literacy and access among students.
“If we’re not careful in how we implement new assessments that rely on or integrate GenAI, we may inadvertently exacerbate long-standing educational disparities,” Kizilcec said.
Ivan Smirnov of the University of Technology Sydney is a co-author on the paper.
Patricia Waldron is a writer for the Cornell Ann S. Bowers College of Computing and Information Science.