You might therefore reasonably believe that published papers are quite reliable and meet high standards. You might expect small mistakes that got overlooked during peer review, but no major blunders. It’s science, after all! You’d be wrong to expect this, though. Real and good science does exist, but there’s a worrying amount of bogus research out there, too. And in the last few years, it has increased in volume at lightning speed, as evidenced by the skyrocketing number of paper retractions.

A retraction is similar to a recall at the grocery store. If a previously sold product turns out to be bad or dangerous for some reason, the store might decide to recall it and ask all customers not to use it. Similarly, a journal can recall a published paper that, in hindsight, turned out to be bogus.

A number of practices currently threaten to undermine the legitimacy of scientific research. They include made-up authors, the addition of scientists as co-authors to papers they had nothing to do with, and even more nefarious practices like swamping journals with low-quality, AI-written junk submissions.

Of course, sometimes papers get retracted because the authors made an honest mistake in their research. In more than half the cases, however, it’s because of academic misconduct or fraud. Up until a decade ago, this sort of behavior was more or less limited to researchers falsifying experimental data or skewing results to favor their theory. As technology has grown more sophisticated, however, things have gotten a lot more complicated.

One simple solution would be to just ignore bogus papers. The problem, though, is that they’re often hard to identify. Also, every retraction tarnishes the journal that published the paper a bit. Let this happen often enough, and the public’s confidence in science as a whole goes down. The scientific community as a whole therefore needs to take this problem seriously.
Camille Noûs
Some of the problem is analog. Camille Noûs doesn’t have much to do with AI, but it deserves a mention nevertheless. Born in March 2020, Noûs has already co-authored more than 180 papers in fields as diverse as astrophysics, computer science and biology. I’m saying “it” because Noûs is not a real person; rather, it’s an artifact invented by the French research advocacy group RogueESR. The name combines the gender-neutral French first name Camille with a conflation of the ancient Greek word “νοῦς,” meaning reason or cognition, and the French word “nous,” meaning “us.”

Noûs was created in response to a heavily criticized new law (source in French) reorganizing academic research in France. Although the law’s objective was to make research better, its critics argue that under its requirements, scientists’ jobs will become unfairly precarious and dependent on external funding. In particular, the funding a scientist receives depends on their own previous achievements, even though research is often a community effort.

To make this concern visible, many researchers chose to add Noûs as a co-author. The journals and peer reviewers in charge of checking those papers weren’t always informed, however, that Noûs isn’t a real person. Although the research portion of all these papers so far seems legitimate, it’s cause for concern that one can so easily add a co-author who doesn’t even have an ID card. Highlighting communal efforts with authors like Noûs is an honorable goal, but the fact that scientists can be invented out of thin air in this day and age is quite alarming.
Adding authors where they don’t belong
Fake co-authorship doesn’t just appear in protests against the flaws of the peer-review system and academia, though. Especially in papers about AI, cases of fake co-authorship have been mounting. This deception includes the practice of adding a high-profile scientist as a co-author without their knowledge or consent. Another variant is adding a fictitious co-author, kind of like Camille Noûs, but with the goal of feigning international collaboration or broader scientific discourse.

In addition to giving the illusion of international collaboration, adding fake authors with respectable credentials may boost a paper’s credibility. Many scientists will Google all authors’ names before reading a paper or citing it in their work. Seeing a co-author from a prestigious institution may sway them to give a paper a closer look, especially if it hasn’t been peer-reviewed yet. The prestige of an institution can then function as a proxy for credibility until peer review, which can take many months, is completed.

It’s unclear how many fake authors have been added to date. For one thing, some scientists may choose to ignore the fact that their name is on a paper they didn’t write, especially as the content of the papers in question often isn’t terrible (though it isn’t great either) and legal action can get too expensive and time-consuming. Moreover, no standard method currently exists to verify a scientist’s identity prior to publishing a paper. This gives fake authors a free pass.

All these issues show the necessity of some type of ID-verification process. Nothing formal is currently in place, and that’s a shame. In a day and age when every bank can verify your ID online and match it with the face on your webcam, science can’t even protect its most valuable contributors from scammers.
Algorithms are producing bad articles
In 1994, physicist Alan Sokal got the itch to write a bogus paper about some subject related to the humanities and submit it to a journal. It got accepted, although no one, including the author himself, understood what it was saying. Not only is this ridiculous, but it also goes to show how lax review can get: in this case, the journal literally accepted what was essentially an article of gibberish.

Along similar lines, in 2005, a trio of computer science students developed SCIgen as a prank on the research world. This program churns out completely nonsensical papers, complete with graphs, figures and citations, peppered with lots of buzzwords from computer science (a toy sketch of the underlying technique appears at the end of this section). One of their gibberish papers was accepted for a conference at the time. What’s more, in 2014, various publishers retracted more than 120 papers after finding out that SCIgen had written them. In 2015, the site still got 600,000 page visits per year.

Unfortunately, though, fake papers aren’t only generated as pranks. Entire companies make money writing gibberish papers and submitting them to predatory journals that hardly reject anything because they charge a fee for publishing. Such companies, also dubbed paper mills, are getting more and more sophisticated in their methods. Although fraud detection is also getting better, experts have legitimate fears that these unscrupulous actors, having honed their craft targeting low-quality journals, may try to swamp real ones next. This could lead to an arms race between paper mills and journals that don’t want to publish bogus work.

Of course, there’s another question on the horizon: How much longer will humans be the only ones writing research papers? Could it be that in 10 or 20 years, AI-powered algorithms will be able to automatically sift through swaths of literature and put their conclusions in a new paper that reaches the highest standards of research? How are we going to give credit to these algorithms or their creators?

Today, though, we’re dealing with a far sillier question: How can we identify papers that have been written by relatively unsophisticated algorithms and don’t contain any sensible content? And how do we deal with them? Apart from volunteer efforts and forcing fraudulent authors to retract their papers, the scientific community has surprisingly few answers to that question.
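For the curious: SCIgen works by recursively expanding a hand-written context-free grammar until only words remain. Below is a minimal sketch of that technique in Python. The rules and vocabulary are invented here purely for illustration and bear no relation to SCIgen’s actual, much larger grammar.

```python
import random

# Toy context-free grammar in the spirit of SCIgen. These rules and
# buzzwords are made up for illustration; SCIgen's real grammar is a
# large, hand-crafted rule set.
GRAMMAR = {
    "SENTENCE": [
        ["We", "VERB", "that", "NOUNPHRASE", "can be made", "ADJECTIVE", "."],
        ["NOUNPHRASE", "must", "VERB", "NOUNPHRASE", "."],
    ],
    "NOUNPHRASE": [
        ["ADJECTIVE", "NOUN"],
        ["the", "NOUN", "of", "NOUNPHRASE"],
    ],
    "NOUN": [["neural network"], ["hash table"], ["compiler"], ["blockchain"]],
    "VERB": [["argue"], ["demonstrate"], ["synthesize"], ["refute"]],
    "ADJECTIVE": [["scalable"], ["Bayesian"], ["cache-coherent"], ["stochastic"]],
}

def expand(symbol: str) -> str:
    """Recursively expand a symbol; anything not in GRAMMAR is a plain word."""
    if symbol not in GRAMMAR:
        return symbol
    production = random.choice(GRAMMAR[symbol])
    return " ".join(expand(s) for s in production)

# Print a few grammatical but meaningless "research" sentences.
for _ in range(3):
    print(expand("SENTENCE").replace(" .", "."))
```

Run it a few times and you get sentences that parse fine but mean nothing, which is exactly why a shallow review can let such a paper through.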
Act against fake science
Most journals with a good reputation to lose have at least a basic email-verification process for researchers looking to submit a paper. Here, for example, is the verification system for the journal Science. Despite this, setting up a fake email address and going through the process with it is quite easy. This type of fraud still happens a lot, as illustrated by the sheer number of papers that get retracted even from prestigious journals each year. So, we’re in need of stronger systems.

One good approach to verifying the identity of a scientist is ORCID. Through this system, every researcher can get a unique identifier, which is then automatically linked to their track record. Using ORCID throughout a journal’s peer-review and publication processes would make it much harder to create a fake identity or use other researchers’ credentials without their knowledge or consent. Although this is a very good initiative, no major journal has yet made identifiers from ORCID or elsewhere mandatory for all authors. That’s a shame, in my opinion, and something that could be fixed pretty easily (a sketch of how little code an ORCID check requires appears at the end of this section).

Finally, AI itself might be useful in this struggle. Some journals are deploying AI models to detect fake contributions. As of now, however, journals have been unable to agree on a common standard. As a consequence, journals that lack the resources or the expertise can’t apply the same quality measures as higher-ranking publications. This widens the perceived gap between high- and low-tier journals and is, to me, clear proof that journals across the board should get together and find a way to share resources for fraud detection. Of course, high-tier journals might profit from the lack of competition in the short term. In the long term, however, having more journals with low standards might reduce confidence in scientific publishing as a whole.

It’s not that researchers and science journals are sitting on their lazy asses instead of tracking down fraudulent authors, though. Individual publications are, in fact, doing a lot to track down fake papers. But if some journals have the means and others don’t, publications aren’t operating on a level playing field. Plus, scammers will always be able to target some underfunded journals with their fake papers. Journals need to act collectively to find a way to track down paper mills and verify the identity of all their authors.
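To illustrate how lightweight an ORCID check could be: every ORCID iD ends in a check character computed with the ISO 7064 MOD 11-2 algorithm, and ORCID offers a public API for fetching a researcher’s record. The sketch below, in Python, follows ORCID’s published checksum description and its public v3.0 API endpoint; it’s a minimal illustration, not a complete verification workflow (error handling and rate limiting are omitted).

```python
import json
import urllib.request

def orcid_check_char(first_15_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character, per ORCID's published algorithm."""
    total = 0
    for ch in first_15_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid: str) -> bool:
    """Validate the format and checksum of an iD like '0000-0002-1825-0097'."""
    digits = orcid.replace("-", "").upper()
    if len(digits) != 16 or not digits[:15].isdigit():
        return False
    return orcid_check_char(digits[:15]) == digits[15]

def fetch_public_record(orcid: str) -> dict:
    """Fetch a researcher's public record from ORCID's public v3.0 API."""
    url = f"https://pub.orcid.org/v3.0/{orcid}/record"
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)

# ORCID's documentation uses this iD as its standard example.
print(is_valid_orcid("0000-0002-1825-0097"))  # True: checksum matches
```

A journal could run a check like this at submission time and compare the returned public record against the listed affiliations before a paper ever reaches a reviewer.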
Beyond science: fake news is getting faker
If you think that fake content is a problem limited to science, you’re mistaken. Only a few years back, during the height of the Trump era, “fake news” was the buzzword of the season. The methods used to generate content to sway public opinion have only gotten more sophisticated since then, and they’re jarringly similar to those behind fake science papers.

For example, fake journalists were the apparent authors of op-eds in various conservative outlets. Their headshots were generated with AI algorithms, their LinkedIn and Twitter accounts were entirely fake, and it’s still unclear who’s really behind these articles. There are also several fake-news article generators that make creating fake headlines easy. Although you might not be able to convince an experienced fact-checker with such content, you might impress the average Facebook user enough to convince them to share the article.

That’s why I myself tend to trust only news and science from established sources, or content that I can cross-check enough to determine that it’s true. I totally disregard other sources because I know that most of them range from “a little bit wrong” to “totally made up.” I didn’t have that attitude a few years back. Neither did the people around me. Trust in news has eroded dramatically, and I have no idea how we’ll be able to restore it.

Now, what’s already been happening with news is happening with science. It’s bad enough that it’s difficult to find out the truth about what’s happening in the world. But if the very foundations of human knowledge erode, that would be an even bigger disaster.

Although the debate around fake news has died down since the 2020 election, it’s far from over. Since the tools for faking content are still getting more and more sophisticated, I believe the conversation will get more fuel in the years to come. Hopefully, by then, we’ll have reached a consensus on how to fight against fake content — and fake research, too.

This article was written by Ari Joury and was originally published on Towards Data Science. You can read it here.