The Prediction Game: Understanding How AI Writing Detection Works

So many tools and websites offer AI writing detection, but what exactly is it? Can you reliably and accurately predict artificially produced content?
Written by Justin on January 17, 2023
Updated: February 1, 2023 | Reading Time: 6 minutes

It's impossible to scroll for more than a few seconds on Twitter without encountering someone claiming that artificial intelligence is taking over the world & that everyone will be out of a job soon. As great a sci-fi movie as that would make, I don't really think we're at the point where AI can fully replace employees. Can employees leverage AI to optimize their workflow & climb the work ladder faster? Absolutely.

From ChatGPT to more specialized writing tools like Jasper and Copy, AI is quickly becoming one of the most powerful tools in a digital employee's arsenal. That raises a question: can text written with AI be detected?

The honest answer: not really. AI detection is based on predicting patterns, and as large language models continue to evolve, it's getting harder to separate robot words from reality. The increasing sophistication will open up more opportunities for automated writing & text analysis.

Why Is AI Text Detection Even Important?

AI is increasingly being used to create content in fields like journalism, digital marketing, academia, and even law. It's not that AI-produced content is necessarily bad, but whatever your reason may be, it can sometimes be advantageous to know when you're dealing with a computer-generated article or a handcrafted piece of content.

Sometimes even academic researchers can't tell the difference between the two. A world where ChatGPT-produced thesis papers get published could have some dangerous consequences. Identifying AI text can help a person evaluate the quality of content against a human standard.

On the bright side, knowing which pieces of copy have been written with AI can even help certain businesses decide if it’s worthwhile to invest in helpful tools like Jasper. Certain tasks simply just don't require much human creativity. If you run into a good piece of marketing copy written with AI, you may benefit from using the same AI tools to recreate it yourself.

What Does AI Writing Detection Mean?

At its core, AI writing detection relies on reverse engineering language patterns to determine how predictable a piece of text is. The detector breaks down the text & then uses algorithms to look for patterns within those words. The easier the patterns are to identify, the more closely the text matches what an AI would write, which increases the odds it was written by AI.

The first word in AI is actually really important: artificial. That's what separates humans from machines. AI writing detection is further based on noticing the differences in how words are arranged and used. Machines write text based on the billions of data points & patterns they were trained on, while natural human writing draws on a kind of creativity that can't easily be reproduced by one of these bots.

AI detection is also about context. As persuasive as it may seem, machines don't actually understand the meaning of words, but they can identify patterns that are commonly used really well. They can also pick up on repeated phrases & words, which is often a tell-tale sign of automation or copy-pasting.
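The repeated-phrase signal mentioned above can be sketched in a few lines of Python. This is only an illustration, not how any particular detector works: the function name `repeated_ngrams`, the three-word window, and the sample text are all assumptions made for the example.

```python
from collections import Counter
import re


def repeated_ngrams(text, n=3, min_count=2):
    """Return the word n-grams that occur at least min_count times in text."""
    # Lowercase and keep only letters and apostrophes so punctuation is ignored.
    words = re.findall(r"[a-z']+", text.lower())
    # Slide a window of n words across the text.
    ngrams = zip(*(words[i:] for i in range(n)))
    counts = Counter(" ".join(gram) for gram in ngrams)
    return {gram: count for gram, count in counts.items() if count >= min_count}


# Made-up sample text with an obviously recycled phrase.
sample = (
    "Our tool is the best tool for your business. "
    "It is the best tool for your team and the best tool for your budget."
)
repeats = repeated_ngrams(sample)
for phrase, count in sorted(repeats.items(), key=lambda kv: -kv[1]):
    print(f"{count}x  {phrase}")
```

A high number of repeats doesn't prove anything on its own, since people repeat themselves too, but it's one cheap signal to look at alongside the others.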

How AI Writing Detection Works

So now that we've gone over why detection is important and a little bit about what it means – we'll explain what goes on behind the scenes.

Tools that "predict" AI content are largely based on analyzing the context that comes before (to the left of) the word being predicted.

Imagine the sentence "The best part of my day is when I wake up for ___." In this example, work is the most commonly predicted word according to the 117-million-parameter GPT-2 language model.

The AI model will think back to all of its training data, then identify and analyze patterns in the context of the word set. It might know, for example, that the word "day" is often used after the words "best" and "part". The algorithm will then calculate the likelihood of each word being the next predicted word, based on these contexts.

Based on training data, the word work had a 41% chance of occurring (the highest probability compared to other words), so the model predicted it.

Example of AI prediction showing a 41% chance of the word work being predicted – tested with GLTR
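To make the "likelihood of each word" step concrete, here is a minimal Python sketch. A language model gives every candidate word a raw score, and a softmax turns those scores into probabilities that add up to 100%. The scores below are invented for illustration and are not GPT-2's real values, so the output won't match the 41% in the screenshot.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {word: math.exp(score - peak) for word, score in scores.items()}
    total = sum(exps.values())
    return {word: value / total for word, value in exps.items()}


# Hypothetical scores for the blank in "... when I wake up for ___".
toy_scores = {"work": 2.0, "school": 1.2, "breakfast": 0.9, "coffee": 0.5, "nothing": -1.0}

probs = softmax(toy_scores)
ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
for word, p in ranked:
    print(f"{word:10s} {p:.1%}")
```

The word at the top of the ranking is the one the model would predict; every other candidate still gets some share of the probability.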

An important thing to understand when working with AI-generated text is the concept of temperature. Temperature is a measure of the randomness of predictions. If the temperature is low, a model will usually output the most likely text, but it will be quite boring as it has a smaller degree of variation.

If the temperature is high, the generated text will be more diverse, but there's a higher chance of the model producing grammar mistakes or outright nonsense.
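Here is a rough Python sketch of what temperature does to the numbers. It assumes the common approach of dividing each candidate word's raw score by the temperature before converting to probabilities; the scores and the two temperature values are made up for the example.

```python
import math


def softmax_with_temperature(scores, temperature=1.0):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {word: score / temperature for word, score in scores.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {word: math.exp(value - peak) for word, value in scaled.items()}
    total = sum(exps.values())
    return {word: value / total for word, value in exps.items()}


# Hypothetical scores for the next word; not taken from a real model.
next_word_scores = {"work": 2.0, "school": 1.2, "breakfast": 0.9, "nothing": -1.0}

low = softmax_with_temperature(next_word_scores, temperature=0.3)
high = softmax_with_temperature(next_word_scores, temperature=2.0)

for word in next_word_scores:
    print(f"{word:10s} low temp: {low[word]:.1%}   high temp: {high[word]:.1%}")
```

At the low setting almost all of the probability piles onto the top word, so the output is safe but repetitive. At the high setting the unlikely words get a real chance of being picked, which is where both the variety and the nonsense come from.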

Consumer-facing AI text generation tools like Jasper and ChatGPT seem to err on the side of caution. Although ChatGPT's responses vary more than Jasper's, both are still fairly predictable.

If you're using an online tool to help write content, you're working with pre-trained models, which generally use conservative temperature settings to reduce errors.

Low vs High temperature variables for AI (low has little variability, high has high variability)

So after calculating this for a single sentence, keep going with the rest of your text. If a sampled piece of text consistently selects the most predictable word throughout paragraphs, you're very likely working with artificially generated text. Think about it from a personal perspective: the best writers often make use of complex language and explain things in unpredictable, creative ways. Artificial writing doesn't.

As language models become more and more complex, predicting AI based on the context of words will become a lot harder. The more data a model is trained on, the more variability in what it generates. But for now, you could follow this pattern to analyze large chunks of text.

It's extremely simple in concept: To what extent can an AI model predictively regenerate a given example of text?
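The idea can be sketched in Python as: walk through the text, ask a model for its top guess at each position, and count how often the actual word matches. A real tool would ask a large model like GPT-2; to keep this runnable anywhere, the sketch substitutes a tiny word-pair counter trained on three toy sentences. The function names, the corpus, and the sample texts are all invented for the example.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus):
    """Count which word follows which in the corpus (a toy stand-in for a real model)."""
    words = corpus.lower().split()
    model = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        model[prev][nxt] += 1
    return model


def top_k_share(model, text, k=1):
    """Share of words in text that are among the model's k most likely next words."""
    words = text.lower().split()
    hits = 0
    scored = 0
    for prev, actual in zip(words, words[1:]):
        if prev not in model:
            continue  # the model has never seen this context, so skip it
        scored += 1
        top_words = [word for word, _ in model[prev].most_common(k)]
        if actual in top_words:
            hits += 1
    return hits / scored if scored else 0.0


corpus = "the cat sat on the mat . the cat sat on the rug . the dog ate the fish ."
model = train_bigram_model(corpus)

predictable = "the cat sat on the cat sat"
surprising = "the fish ate the dog on the mat"

print(f"predictable text: {top_k_share(model, predictable):.0%} top-1 words")
print(f"surprising text:  {top_k_share(model, surprising):.0%} top-1 words")
```

The higher the share of top-ranked words, the more "regenerable" the text is by the model. Where to draw the line between human and machine is a judgment call, which is part of why no detector is fully reliable.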

Best Tools To Detect AI Writing

Besides mathematics, there are grammatical and syntactical ways you can help identify if something was written with AI, but you could do that just from reading.

So how do you determine how likely a word is given its context? Well for starters, you could use a few online tools. We wrote a larger article on how to detect AI content, but depending on what kind of writing you're checking you could use either GLTR or Originality.

If you're looking to check a paragraph or two of personal content, GLTR is the best to use. It's free, fast, and can give you a pretty decent understanding of AI detection. It's not trained on the largest data sets and it won't give you a direct percentage, but you can see the difference between blatant AI-generated (left) and human-written text (right) fairly easily.

If you want to check academic, industry, or professional content (especially in bulk), look into Originality. Originality lets you check for AI and plagiarism, and gives you a percentage likelihood that a block of text was written with AI.

AI detection score example from Originality AI

The Future of AI Writing & Content Detection

Whether you like it or not, there's really no guaranteed way of determining if something was written with AI at this point in time. After ChatGPT went viral, tons of questions and concerns were raised about how this will impact the world: education, industry, and even literature.

That said, a guest researcher at OpenAI revealed that they're developing a tool for "statistically watermarking the outputs of an AI text system." Whenever a system generates text, the tool would stamp an "unnoticeable secret signal" indicating where the text came from. If other companies follow suit, we might mitigate some of the ethical dilemmas raised by this new technology.

Regardless, an unpredictable storm is on its way! Generative writing tools are only going to get more nuanced, more creative, and eventually more complex. For now, it's best to use your intuition combined with detection tools if you're skeptical. The next few years are going to be very interesting, and it will be fun to see what's in store. How long until artificial intelligence can seamlessly integrate with our society? Are we already there?

Written by Justin
Justin is the founder of Gold Penguin, a web design and marketing agency that helps businesses increase their revenue using the internet. He writes about the latest software and tools that can help companies 10x their daily workflow & revenue.
