
How Does AI Writing Detection Work? Robot vs Reality

So many tools and websites offer AI writing detection, but what exactly is it? Can you reliably and accurately predict artificially produced content?
Updated September 6, 2023

It's impossible to scroll more than a few seconds on Twitter without encountering someone mentioning how artificial intelligence is taking over the world & that everyone will probably be out of a job soon. As great a sci-fi movie as that would make, I don't think we're at the point where AI can fully replace employees. Can employees leverage AI to optimize their workflow & climb the work ladder faster? Absolutely.

From the release of ChatGPT to more nuanced writing tools like Jasper and Copy, AI is quickly becoming one of the most powerful tools in a digital employee's arsenal. That leaves users asking: can text written with AI be detected?

The honest answer is not really. AI detection is based on predicting patterns, and as large language models continue to evolve, it's getting harder to separate robot words from reality. That increasing sophistication will open up more opportunities for automated writing & text analysis.

Why Is AI Text Detection Even Important?

AI is increasingly being used to create content in fields like journalism, digital marketing, academia, and even law. It's not that AI-produced content is necessarily bad, but whatever your reason may be, it can sometimes be advantageous to know when you're dealing with a computer-generated article or a handcrafted piece of content.

What's crazy is that 65.8% of people believe AI content to be equal to or better than human-written content according to a survey done by AuthorityHacker.

Sometimes even academic researchers can't tell the difference between the two. A world where ChatGPT-produced thesis papers get published unnoticed could have some dangerous consequences. Identifying AI text with a ChatGPT detector can help a person judge whether content holds up to a human standard.

On the bright side, knowing which pieces of copy have been written with AI can even help certain businesses decide if it’s worthwhile to invest in helpful tools like Jasper. Certain tasks simply just don't require much human creativity. If you run into a good piece of marketing copy written with AI, you may benefit from using the same AI tools to recreate it yourself.

What Does AI Writing Detection Mean?

At its core, AI writing detection relies on reverse engineering language patterns to see how predictable a piece of text is. The detector breaks down the text & then uses algorithms to look for patterns within those words. The easier those patterns are to predict, the more the text resembles what an AI would write, increasing the odds it was written by AI.

The first letter in AI is actually really important: artificial. That's what separates humans from machines. AI writing detection is further based on noticing the differences in how words are arranged and used. Machines write text based on the billions of data points & patterns they were trained on, while natural human writing utilizes an aspect of creativity that can't easily be reproduced by one of these bots.

AI detection is also about context. As persuasive as their output may seem, machines don't actually understand the meaning of words, but they are very good at identifying commonly used patterns. They can also pick up on repeated phrases & words, which are often a tell-tale sign of automation or copy-pasting.

Is AI Writing Detection Accurate?

No, not really. AI writing detection tools are really just assistants giving insight into where writing came from. If you already suspect a piece of writing is AI-generated and a detector says 100% AI, it probably was written by an AI tool like ChatGPT.

Take all the results you see with a grain of salt, as they really are just predictions.

There has even been a flood of tools on the market built to help bypass AI detection. Tools like Undetectable.AI are effective at getting past these detection filters, so you really have to be careful.

How Do AI Detectors Work?

So now that we've gone over why detection is important and a little bit about what it means – we'll explain what goes on behind the scenes.

Tools that "predict" AI content are largely based on analyzing the context to the left of a given word, meaning the words that come before it, to see how predictable that word was.

Imagine the sentence "The best part of my day is when I wake up for ___." In this example, work is the most likely next word according to GPT-2, a language model with 117 million parameters.

The AI model will think back to all of its training data, then identify and analyze patterns in the context of the word set. It might know, for example, that the word "day" is often used after the words "best" and "part". The algorithm will then calculate the likelihood of each word being the next predicted word, based on these contexts.

Based on its training data, the model gave the word work a 41% chance of occurring (the highest probability of any word), so that is the word it predicted.
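To make the idea concrete, here is a minimal sketch of next-word prediction by counting. It is not how GPT-2 works internally (GPT-2 is a neural network, not a lookup table), and the tiny corpus, the `next_word_probabilities` function, and the resulting numbers are all made up for illustration. It only shows the basic principle of turning "what usually follows this context" into a probability for each candidate word.

```python
from collections import Counter

# Tiny made-up corpus standing in for a model's training data (illustrative assumption).
CORPUS = [
    "i wake up for work",
    "i wake up for work",
    "i wake up for school",
    "i wake up for coffee",
    "i wake up for work",
]


def next_word_probabilities(context, corpus):
    """Estimate P(next word | context) by counting what follows `context` in the corpus."""
    context_words = context.lower().split()
    n = len(context_words)
    counts = Counter()
    for sentence in corpus:
        words = sentence.lower().split()
        # Slide over the sentence and record the word right after each match of the context.
        for i in range(len(words) - n):
            if words[i:i + n] == context_words:
                counts[words[i + n]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}  # the context never appears, so there is nothing to predict
    return {word: count / total for word, count in counts.items()}


probs = next_word_probabilities("wake up for", CORPUS)
best_word = max(probs, key=probs.get)

for word, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{word}: {p:.0%}")
```

In this toy corpus "work" follows the context three times out of five, so it comes out as the top prediction. A real language model does the same kind of ranking, just over a vocabulary of tens of thousands of tokens and with far richer context.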

Example of AI prediction showing a 41% chance of the word work being predicted – tested with GLTR

An important thing to understand when working with AI-generated text is the concept of temperature. Temperature is a setting that controls how random a model's predictions are. If the temperature is low, the model sticks to the most likely words, so the output is safe but can be quite boring because there is little variation.

If the temperature is high, the generated text will be more diverse, but there is a higher chance of the model producing grammar mistakes or straight nonsense.
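A common way temperature is applied is by dividing the model's raw scores by the temperature before converting them into probabilities. The sketch below assumes that approach. The candidate words and their scores are hypothetical, not taken from any real model.

```python
import math


def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for three candidate next words.
candidates = ["work", "school", "coffee"]
scores = [2.0, 1.0, 0.5]

low = softmax_with_temperature(scores, 0.5)   # low temperature: top word dominates
high = softmax_with_temperature(scores, 2.0)  # high temperature: choices even out

for word, p_low, p_high in zip(candidates, low, high):
    print(f"{word}: low={p_low:.2f} high={p_high:.2f}")
```

At the low setting almost all of the probability lands on the top word, which is why low-temperature text is so predictable. At the high setting the less likely words get a real chance of being picked.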

Consumer-facing AI text generation tools like Jasper and ChatGPT seem to err on the side of caution. Although ChatGPT's responses vary more than Jasper's, both are still fairly predictable.

If you're using an online tool to help write content, you're working with pre-trained models, which generally use conservative temperature settings to reduce errors.

Low vs High temperature variables for AI (low has little variability, high has high variability)

After calculating this for a single sentence, keep going with the rest of your text. If a sampled piece of text consistently selects the most predictable word throughout its paragraphs, there's a good chance you're working with artificially generated text. Think about it from a personal perspective: the best writers often make use of complex language and explain things in unpredictable, creative ways. Artificial writing usually doesn't.
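Here is a rough sketch of that scoring loop. It uses a toy model that only looks at the single previous word, where a real detector would use a full language model such as GPT-2. The corpus and the names `train_bigram_model` and `top_choice_rate` are invented for this example.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus):
    """Count which word follows which, as a toy stand-in for a real language model."""
    followers = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            followers[prev][nxt] += 1
    return followers


def top_choice_rate(text, followers):
    """Fraction of words in `text` that match the model's single most likely prediction."""
    words = text.lower().split()
    hits = 0
    scored = 0
    for prev, actual in zip(words, words[1:]):
        if prev not in followers:
            continue  # the model has never seen this context, so skip it
        predicted = followers[prev].most_common(1)[0][0]
        scored += 1
        if predicted == actual:
            hits += 1
    return hits / scored if scored else 0.0


# Made-up training sentences (illustrative assumption).
model = train_bigram_model([
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "the cat ate the fish",
])

predictable_rate = top_choice_rate("the cat sat on the cat", model)
surprising_rate = top_choice_rate("the fish ate the sofa", model)
print(predictable_rate, surprising_rate)
```

The first sentence always picks the word the model expected, so it scores far higher than the second. A high score is only a hint, not proof, since short or formulaic human writing can also be very predictable.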

As language models become more and more complex, predicting AI based on the context of words will become a lot harder. The more data in a set, the more variability in generations. But for now, you could follow this pattern to analyze large chunks of text.

It's extremely simple in concept: To what extent can an AI model predictively regenerate a given example of text?

Best Tools To Detect AI Writing

Besides mathematics, there are grammatical and syntactical ways you can help identify if something was written with AI, but you could do that just from reading.

So how do you work out how likely each word is, given its context? For starters, you could use a few online tools. We wrote a larger article on how to detect AI content, but depending on what kind of writing you're checking you could use either GLTR or Originality.

If you're looking to check a paragraph or two of personal content, GLTR is the best to use. It's free, fast, and can give you a pretty decent understanding of AI detection. It's not trained on the largest data sets and it won't give you a direct percentage, but you can see the difference between blatant AI-generated (left) and human-written text (right) fairly easily.

If you want to check academic, industry, or professional content (especially in bulk), look into Originality. Originality lets you check for both AI and plagiarism, and gives you a percentage likelihood that a block of text was written with AI.

AI detection score example from Originality AI

The Future of AI Writing & Content Detection

Whether you like it or not, there's really no guaranteed way of determining if something was written with AI at this point in time. After ChatGPT went viral, tons of questions and concerns were raised about how this will impact the world: education, industry, and even literature.

That said, a guest researcher at OpenAI revealed that they're developing a tool for "statistically watermarking the outputs of an AI text system." Whenever a system generates text, the tool would stamp an "unnoticeable secret signal" indicating where the text came from. If other companies follow suit, we might mitigate some of the ethical dilemmas raised by this new technology.

Regardless, an unpredictable storm is on its way! Generative writing tools are only going to get more nuanced, more creative, and eventually more complex. For now, it's best to use your intuition combined with detection tools if you're skeptical. The next few years are going to be very interesting, and it will be fun to see what's in store. How long until artificial intelligence can seamlessly integrate with our society? Are we already there?

Want To Learn Even More?
If you enjoyed this article, subscribe to our free monthly newsletter
where we share tips & tricks on how to use tech & AI to grow and optimize your business, career, and life.
Written by Justin Gluska
Justin is the founder of Gold Penguin, a business technology blog providing the latest news and tools in the artificial intelligence, business, and SaaS world. If it can help you make more money or save you time, he will write about it!