Are Innocent Students Paying the Price for Turnitin's AI Detection Flaws?

Turnitin, a popular internet-based plagiarism detection service, has released an "AI Detector" for teachers and educators across the globe. But how accurate is it? And what happens when it falsely flags students?

Justin Gluska

Updated October 14, 2024

a chalkboard in a school getting written on by a robot teaching a class full of schoolchildren, digital art in 4k

Artificial intelligence is increasingly making its way into our daily lives, and the world of academia isn't immune. With AI writing tools like ChatGPT on one side and Turnitin's AI detection software on the other, teachers and students alike are finding themselves navigating uncharted territory.

Educators are growing more concerned about students using advanced technologies to cheat on assignments. In response, they're turning to solutions like Turnitin's new AI writing detection tool.

But while AI detection tools might seem promising at first glance, they come with their own set of problems.

Over the next few minutes, we'll dive into the unique relationship between AI-driven cheating and AI writing detection technology, as well as their impact on both educators and innocent students who might get caught in the crossfire of false positives.

Turnitin's AI Detection Software: A Double-Edged Sword

The new software is set to be activated at more than 10,700 educational institutions in the coming weeks, offering a way to counter AI-assisted cheating. But at what cost?

The accuracy of AI detectors has been called into question, with Turnitin's software producing false positives and struggling to identify writing that mixes AI and human sources. And if you're asking, "Can Turnitin detect paraphrasing?", the answer is yes, though paraphrased text may earn a lower AI likelihood score.

High school senior Lucy Goetz was surprised when the detector flagged her essay on socialism as likely AI-generated. Turnitin's software is clearly not foolproof, and AI technology is evolving so quickly that detectors struggle to keep up. Yet professors often treat AI detection as though it were as definitive as a plagiarism checker.

But... AI detection is NOT conclusive; it works on predictions. I don't see how it's fair to accuse students of something when you don't have definitive proof.

Regardless, some experts believe the introduction of AI detectors will only fuel an arms race between cheaters and detectors... each forever chasing a tail it will never catch. But what can be done about it?

An example AI detection screen in Turnitin's article overview

Take the recent example of William Quarterman, a college senior at UC Davis, who was accused of cheating on his history exam after his professor ran his answers through an AI detection tool, GPTZero, which claimed they were AI-generated.

Quarterman denied the allegations and experienced panic attacks as he had to face the university's honor court. He was eventually cleared of the accusation after providing evidence that he didn't use AI.

As higher education institutions struggle to address the increasing use of AI by students for assignments and exams, the reliability of AI-driven detection software is being questioned.

The rate of false positives from AI detection tools is not zero, and the creators of these tools, including OpenAI, Turnitin, Content at Scale, and GPTZero, warn educators about possible inaccuracies.

As a result, education technology experts recommend that schools embrace AI and develop policies around its use, including citing AI when appropriate, making assessments more rigorous, and determining the right questions to ask when a student is suspected of cheating.

However, there is a flip side, as the potential dangers of relying too heavily on AI detection become increasingly evident:

False positives.

As seen in Goetz's case from the Washington Post article, detectors can sometimes get it wrong, with potentially disastrous consequences for innocent students. When Turnitin flagged a portion of her original essay as likely generated by ChatGPT, it raised serious concerns about false accusations based on imperfect technology.

Detectors Introduced Without Widespread Vetting

The rapid rollout of Turnitin's AI detection software across educational institutions raises questions about whether sufficient testing has been conducted to ensure its accuracy and fairness. For instance, the Washington Post columnist found several California high schoolers' papers falsely flagged as AI-generated by Turnitin's new detector.

This is huge. It's like Turnitin just hopped on the wave of AI detection and released something that could damage millions of students. Do we really know the extent of its vetting process?

Rapidly Evolving AI Technology Outpacing Detection Tools

The creators behind detection tools acknowledge their systems' fallibility, given the constant advances in AI-generated content. Newer versions of popular writing bots such as ChatGPT (e.g., GPT-4o) or Google's Bard may already produce text that current tools cannot reliably detect.

One teacher, who wishes to remain unnamed, expressed concern about the current accuracy of Turnitin's AI detection tool:

"Turnitin is claiming 98% accuracy. But in the previous 3 days of my testing, I feel it is more like 60%. One of my papers (published in 2020) was flagged as AI written. Similarly, some prompts generated content which was flagged as human written. I am just appalled that some professors keep thinking AI detection is similar to plagiarism detection even though it is not."

The Consequences of Incorrectly Flagged Students

When students are falsely flagged by AI detection tools, the consequences can ripple beyond mere academic repercussions. Innocent students caught in this crossfire may face a range of adverse effects that could impact their education and overall well-being.

The impact on student-teacher relationships

When students like Goetz are wrongly accused of cheating, the trust they have built with their teachers may be significantly damaged. Teachers play a huge role in fostering an environment conducive to learning, and mutual trust is a fundamental aspect of that relationship.

A false accusation can create a sense of doubt in the teacher's mind, potentially leading them to scrutinize future assignments from the affected student more critically than others.

Students accused of using AI may feel alienated or unfairly targeted by their instructors, which can result in disengagement from class activities or reluctance to seek help when needed.

These strained relationships could extend beyond individuals, impacting classroom dynamics as other students become aware of potential inaccuracies in AI detection tools. In some cases, such awareness may also lead to doubts about fair grading practices or increased skepticism regarding educators' reliance on AI-based solutions.

Unfair punishments and damaged academic reputations

When students like Quarterman are falsely accused of cheating due to the inaccuracies of AI detection software, they may face a range of unfair consequences that could negatively impact their educational journey. These include failing grades on assignments or exams, disciplinary actions such as academic probation, suspension, or even expulsion in severe cases.

Such punishments can tarnish a student's academic record and jeopardize future opportunities. Poor grades resulting from incorrect cheating accusations might hinder students' chances of securing scholarships or being accepted into competitive college programs.

In addition to these tangible effects, an undeserved blemish on one's academic reputation may have long-lasting repercussions on self-esteem and confidence in their abilities. Students who experience false allegations might become more hesitant to take risks in their studies out of fear that they may again be flagged erroneously.

To prevent these detrimental outcomes for innocent students caught in the crossfire of flawed AI detection tools, it is essential that educational institutions consider implementing robust review processes alongside AI detection solutions. By combining human expertise with AI advancements, educators can minimize wrongful accusations while still upholding academic integrity standards.

If an assignment comes back as 20% AI-generated, it doesn't mean 20% of it was written with AI. A 20% score means there is a 20% chance the document you're looking at was written with AI. The percentages can be deceptive, so it's very important to learn how AI detection actually works.
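
To make that distinction concrete, here's a minimal Python sketch (my own illustration with made-up numbers, not anything from Turnitin) contrasting the misreading with the reading described above:

```python
# Hypothetical numbers for illustration only; this is not Turnitin's model.
score = 0.20  # the detector's "20% AI" result for one essay

# Misreading: assume 20% of the essay's words were written by AI.
essay_words = 1500
misread_ai_words = score * essay_words  # 300 words, which is NOT what the score says

# Reading described above: a 20% chance the essay involved AI at all,
# which equally means an 80% chance it's entirely human-written.
p_fully_human = 1 - score

print(f"Misreading: {misread_ai_words:.0f} of {essay_words} words 'written by AI'")
print(f"Actual claim: {p_fully_human:.0%} chance the essay is fully human-written")
```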

Testing Turnitin's AI Detection Software: A Mixed Bag of Results

As educational institutions rely more heavily on AI detection tools like Turnitin's software, it becomes increasingly vital to evaluate their efficacy and limitations through real-world testing. This analysis can provide valuable insights into the practical performance of such systems, shedding light on potential shortcomings and areas for improvement.

In one case highlighted by the Washington Post columnist, five high school students volunteered to help test Turnitin's AI detector by creating 16 writing samples: a mix of original student work, AI-fabricated content from ChatGPT, and pieces blending human and AI writing.

Running these samples through Turnitin's system showed how the software judges a small sample of student work.

The test results revealed certain limitations in Turnitin's ability to accurately identify AI-generated writing:

  1. Accurate identification of only six out of 16 samples: Turnitin correctly identified fewer than half (37.5%) of the submissions.
  2. Partial credit for seven samples with mixed accuracy: the system was directionally correct in some instances, but it failed to fully distinguish human-written sentences from those generated by ChatGPT.
  3. Failure on three samples, including a false positive: perhaps most concerning was the instance where the system flagged Goetz's original essay as partly generated by ChatGPT when it was entirely her own work.
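
For anyone curious how such an informal audit tallies up, here's a small Python sketch (entirely my own, with hypothetical labels rather than the Post's actual data) for scoring a detector against a labeled sample set:

```python
# Hypothetical scoring of a detector against labeled samples, mirroring the
# Post's informal test. These labels and verdicts are illustrative, not real data.

samples = [
    # (true_source, detector_verdict): "human", "ai", or "mixed"
    ("human", "human"), ("ai", "ai"), ("mixed", "mixed"),  # correct calls
    ("mixed", "ai"),                                       # only partially right
    ("human", "ai"),                                       # a false positive!
]

correct = sum(1 for truth, verdict in samples if truth == verdict)
false_positives = sum(1 for truth, verdict in samples
                      if truth == "human" and verdict != "human")

print(f"Accuracy: {correct}/{len(samples)} = {correct / len(samples):.1%}")
print(f"False positives on human work: {false_positives}")
```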

Turnitin Claims 98% Detection Accuracy and Less Than 1% False Positives

Despite the notable discrepancies found during testing, Turnitin claims its detector is 98 percent accurate overall based on internal assessments. The company also states that false positives like Goetz's case occur less than 1 percent of the time.

I'm not quite sure how you can claim almost 100% detection accuracy on something that isn't actually provable.

Given the practical implications for students and educators, even a small percentage of false positives could have severe consequences. One percent of a million students is still 10,000 students falsely flagged, which is massive, especially at the university level.

Turnitin's software must be monitored to ensure it provides accurate and fair evaluations while minimizing potential harm to innocent students caught in its net. The feature still needs tweaking, and Turnitin needs to clearly educate teachers and professors on what the percentages and detection rates actually mean.

How Reliable is TurnItIn AI Detection?

The detection tool is fairly reliable. That doesn't mean it's accurate; it just means you'll get around the same score when testing very similar content.

How Accurate is TurnItIn AI Detection?

It may be reliable (you'll get the same score testing similar content), but you cannot claim it is accurate. You cannot prove something was written with ChatGPT.

You are really just looking at words on a screen.

You may get some insight, especially if sentences match patterns easily replicated by AI tools like ChatGPT. But if Turnitin accuses a student of using AI and that student actually fails a class because of it, there will undoubtedly be a ton of lawsuits popping up over the next year.

Turnitin says it might flag one out of every 100 fully human-written documents as AI-written. That is not good enough in my opinion.

A large college lecture hall holds at least 300 students. You're telling me three of them are going to get written up for using ChatGPT just because a detector told the professor so?
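
Here's a quick back-of-the-envelope sketch in Python (my own math, using Turnitin's stated sub-1% rate and assuming each student submits one fully human-written document):

```python
false_positive_rate = 0.01  # Turnitin's stated rate for fully human-written work

def expected_false_flags(students: int, rate: float = false_positive_rate) -> float:
    """Expected number of innocent students flagged, one honest submission each."""
    return students * rate

def chance_at_least_one(students: int, rate: float = false_positive_rate) -> float:
    """Probability that at least one innocent student in the group gets flagged."""
    return 1 - (1 - rate) ** students

for group in (30, 300, 1_000_000):
    print(f"{group:>9,} students -> ~{expected_false_flags(group):,.1f} false flags, "
          f"{chance_at_least_one(group):.1%} chance of at least one")
```

Even when the expected count looks small, the odds that at least one innocent student in a 300-person lecture gets flagged come out to roughly 95 percent.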

Does TurnItIn Detect AI Writing?

The short answer is yes, although not all teachers will actually use the feature. The detection tool provides reports to educators and integrates directly with popular learning management systems like Canvas, Blackboard, and Moodle.

Teachers are able to check for AI writing via Turnitin's Similarity report (the same report they use to determine whether a student has plagiarized).

The Dilemma Facing Educators

As AI-generated cheating becomes a rising concern in academic settings, educators find themselves grappling with the challenge of maintaining integrity without relying solely on potentially faulty AI detection tools.

Teachers must strike a delicate balance between discouraging dishonest practices and ensuring fairness to all students. They need to find ways to assess student work accurately while minimizing the risk of false positives that could damage innocent students' academic records or emotional well-being.

It can be helpful to draw parallels between the adoption of AI writing technologies and the widespread use of calculators in academia. Both serve as valuable aids for learning when used responsibly, but both also present opportunities for misuse or overreliance by students seeking shortcuts instead of truly understanding concepts.

Recognizing the potential pitfalls of Turnitin's technology, 2 percent of its customers have chosen not to include the "Generated by ChatGPT" score alongside other feedback elements in their report summaries.

According to UCISA, a significant majority of UK universities are cautious about embracing new technology without adequate understanding or safeguards in place, leading them to avoid displaying potentially misleading AI detection results when assessing student work.

This caution highlights that while technological advancements can benefit educators, finding a balance that ensures accuracy and fairness remains critical for protecting innocent students who might be caught up in flawed systems.

Possible Solutions and the Future of AI in Education

There is no easy solution. We're going to hit a point in the next few months (or a year at the latest) where it's pretty much impossible to distinguish between human and AI-written content.

Detection companies and schools must work together to refine these tools, minimize false positives, and develop solutions that integrate seamlessly with academic environments. I don't think it's going to be easy, but it's a necessary ethical step.

Schools could consider revising assignment formats, promoting a culture of academic integrity, or implementing honor codes as strategies for discouraging dishonest behavior without over-relying on AI detection software.

Educators should also recognize the potential benefits AI tools offer in enhancing teaching and learning experiences while also addressing the challenges they bring concerning academic integrity. Unfortunately, it seems like many are stuck in their ways.

Final Thoughts

I think we're very close to an educational firefight. If this trend continues and detection is promised but inadequately deployed, there will be tons of lawsuits across the world from parents unhappy with unfair punishments.

Of course, some students who abuse AI to submit their schoolwork will get caught. But punishing even a single student who didn't cheat is absolutely terrible.

This isn't plagiarism; we don't really have proof. Patterns and assumptions aren't going to be enough to face the wave of what's about to come. It's nearing the end of the spring term now, but when fall gets here, we're in for a wild ride...

Want to Learn Even More?

If you enjoyed this article, subscribe to our free newsletter where we share tips & tricks on how to use tech & AI to grow and optimize your business, career, and life.


Written by Justin Gluska

Justin is the founder of Gold Penguin, a business technology blog that helps people start, grow, and scale their business using AI. The world is changing and he believes it's best to make use of the new technology that is starting to change the world. If it can help you make more money or save you time, he'll write about it!
