AI gives scientists a boost, but at the cost of too many mediocre papers

After ChatGPT became available to the public in late 2022, scientists began talking among themselves about how much more productive the new artificial intelligence tools made them, while scientific journal editors complained of an influx of well-written papers with little scientific value.

These anecdotal conversations represent a real shift in how scientists are writing up their work, according to a new study by Cornell researchers. They showed that using large language models (LLMs) like ChatGPT boosts paper production, especially for non-native English speakers. But the overall increase in AI-written papers is making it harder for many people – from paper reviewers to funders to policymakers – to separate the valuable contributions from the AI slop.

“It is a very widespread pattern, across different fields of science – from physical and computer sciences to biological and social sciences,” said Yian Yin, assistant professor of information science in the Cornell Ann S. Bowers College of Computing and Information Science. “There’s a big shift in our current ecosystem that warrants a very serious look, especially for those who make decisions about what science we should support and fund.”    

The new paper, “Scientific Production in the Era of Large Language Models,” was published Dec. 18 in Science.

Yin’s group investigated the impacts of LLMs on scientific publishing by collecting more than 2 million papers posted between January 2018 and June 2024 on three online preprint websites. The three sites – arXiv, bioRxiv and Social Science Research Network (SSRN) – cover the physical, life and social sciences, respectively, and post scientific papers that have yet to undergo peer review.

The researchers compared presumably human-authored papers posted before 2023 with AI-written text to develop a model that detects papers likely written by LLMs. With this AI detector, they could identify which scientists were probably using the technology for writing, count how many papers they published before and after adopting AI, and then see whether those papers were ultimately deemed worthy of publication in scientific journals.
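The article does not describe the study’s actual detector; as a loose illustration of the general approach, a text classifier can be trained on examples of human- and LLM-written abstracts. The sketch below is a minimal Python version using toy data and a simple TF-IDF plus logistic-regression pipeline, an assumption for illustration only, not the researchers’ model:

```python
# A minimal sketch of an LLM-text detector -- NOT the study's actual model.
# The example abstracts below are hypothetical toy data standing in for
# pre-2023 (presumably human-written) and LLM-generated text.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

human_abstracts = [
    "We measured neutrino oscillation rates using three years of detector data.",
    "Our field survey shows soil carbon declining steadily over two decades.",
]
llm_abstracts = [
    "This study comprehensively explores the multifaceted landscape of modern science.",
    "In this paper, we delve into a holistic framework for understanding complex systems.",
]

texts = human_abstracts + llm_abstracts
labels = [0] * len(human_abstracts) + [1] * len(llm_abstracts)  # 0 = human, 1 = LLM

# Word and bigram TF-IDF features feeding a logistic-regression classifier.
detector = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
detector.fit(texts, labels)

# Score a new abstract: estimated probability it was written by an LLM.
prob_llm = detector.predict_proba(["An abstract to screen for LLM use."])[0][1]
print(f"P(LLM-written) = {prob_llm:.2f}")
```

A detector of this kind outputs a probability that a given text is machine-written, which is how likely LLM use can be flagged paper by paper.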

Their analysis showed a big AI-powered productivity bump. On the arXiv site, scientists who appeared to use LLMs posted about one-third more papers than scientists who weren’t getting an assist from AI. The increase was more than 50% for bioRxiv and SSRN.

Not surprisingly, scientists whose first language is not English, who face the hurdle of communicating science in a foreign language, benefited the most from LLMs. Researchers from Asian institutions, for example, posted between 43.0% and 89.3% more papers, depending on the preprint site, once the detector indicated they had switched to LLMs, compared with similar scientists not using the technology. The benefit is so large that Yin predicts a global shift in scientific productivity toward regions previously disadvantaged by the language barrier.

The study uncovered another positive effect of AI in paper preparation. When scientists search for related research to cite in their papers, Bing Chat – the first widely adopted AI-powered search tool – is better at finding newer publications and relevant books than traditional search tools, which tend to identify older, more commonly cited works.

 “People using LLMs are connecting to more diverse knowledge, which might be driving more creative ideas,” said first author Keigo Kusumegi, a doctoral student in the field of information science. In future work, he hopes to explore whether AI use leads to more innovative, interdisciplinary work.

While LLMs make it easier for individuals to produce papers, they also make it harder for others to evaluate their quality. For human-written work, clear yet complex language – with big words and long sentences – is usually a reliable indicator of quality research. Across all three preprint sites, papers likely written by humans that scored high on a writing complexity test were most likely to be accepted to a scientific journal. But high-scoring papers probably written by LLMs were less likely to be accepted, suggesting that despite the convincing language, reviewers deemed many of these papers to have little scientific value. 
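The article does not specify which writing complexity test the study used; one common stand-in for “big words and long sentences” is the Flesch-Kincaid grade level, sketched here with the Python textstat package as a hypothetical proxy, not the study’s metric:

```python
# A rough proxy for writing complexity -- assumes the Flesch-Kincaid grade
# level stands in for the study's (unspecified here) complexity score.
# Requires: pip install textstat
import textstat

simple_text = "We tested the drug on mice. It worked in most cases."
complex_text = (
    "Pharmacological intervention demonstrated statistically significant "
    "efficacy across heterogeneous murine cohorts under controlled conditions."
)

# Higher grade level reflects longer sentences and longer words.
print(textstat.flesch_kincaid_grade(simple_text))   # low grade level
print(textstat.flesch_kincaid_grade(complex_text))  # high grade level
```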

This disconnect between writing quality and scientific quality could have big implications, Yin said, as editors and reviewers struggle to identify valuable paper submissions, and universities and funding agencies can no longer evaluate scientists based on their productivity.

The researchers caution that the new findings are based solely on observational data. Next, they hope to perform a causal analysis, such as a controlled experiment in which some scientists are randomly assigned to use LLMs and others are not.

Yin is also planning a symposium that will examine how generative AI is transforming research – and how scientists and policymakers can best shape these changes – to take place March 3-5, 2026, on the Ithaca campus.

As scientists increasingly rely on AI for writing, coding and even idea generation – essentially using AI as a co-scientist – Yin suspects its impacts will broaden. He urges policymakers to make new rules to regulate the rapidly evolving technological landscape.

“Already now, the question is not, have you used AI? The question is, how exactly have you used AI and whether it’s helpful or not,” Yin said.

Co-authors on the study include Xinyu Yang, a doctoral student in the field of computer science; Paul Ginsparg, professor of information science in Cornell Bowers and of physics in the College of Arts and Sciences, and founder of arXiv; and Mathijs de Vaan and Toby Stuart of the University of California, Berkeley.

This work received support from the National Science Foundation.

Patricia Waldron is a writer for the Cornell Ann S. Bowers College of Computing and Information Science.
