How To Check If Something Was Written with AI

With AI taking the world by storm, many have started to question the authenticity of what they're reading online. Here are some tips and tricks to help determine if the content you're reading was written by a human or a quirky robot.
Written by Justin on November 17, 2022
Updated: January 31, 2023 | Reading Time: 14 minutes

The internet is rapidly advancing. With the rise of AI content creation platforms and tools like ChatGPT, we're starting to see a lot more generative content – and it's getting harder and harder to tell which pieces are actually written by humans. With recent Google algorithm updates designed to counteract AI-spammy content, high-quality content is soon going to dominate search engines. But if you're not Google, how can you tell? Some articles slip through the cracks and publish robotic misinformation throughout the web. Not all AI content is bad, and not all bad content is written by AI, but there's definitely a noticeable gap between AI-produced fluff and expertly written human content. In a few years, I'm not so sure we'll be able to tell the difference, but for now – sometimes it can be pretty obvious.

I've been messing around with these text and image editors for several months now, and I've found that there are a few tell-tale signs of AI-generated content. Let's go over some technical and non-technical things to look for.

The 2023 Artificial Intelligence Boom

So much buzz about AI these days. You can't go past a few tweets on Twitter before running into an announcement about the next magical AI platform that can customize your entire wardrobe, redesign your bedroom, or generate some realistic avatars. These tools are nothing short of amazing & are only the start of something bigger.

We're starting to see artificial intelligence tools transition from being fun gimmicks to show friends to powerful & complex tools that are literally changing the way we live and work. With that comes a whole new set of challenges, one of which is the rise of AI-generated, spammy content.

If you're a writer, you might be thinking "Great, another thing I have to compete with." And you're not wrong. AI content is getting better and better and it's only going to continue to improve. As an avid copywriter, I've started to see the uptick in AI-generated content and figured I'd start investigating ways to sort through it. Although OpenAI is working on a watermarking system to help distinguish natural vs AI content, there is still no official method available.

It's not just blog posts – AI is now being used to generate everything from school research papers to e-commerce product descriptions and even chunks of code. And as AI gets better and better at imitating human writing, it's getting harder and harder to tell the difference between what's been written by a machine and what's been written by a human.

But what's the point? If an AI can write an article that looks and feels just like one that's been written by a human, why does it matter? Well for starters, it's getting harder and harder to trust the things we read. In a world where anyone can say anything, it's important to be able to spot the difference between fact and fiction. With the rise of AI-generated content, we're at risk of losing levels of authenticity previously represented with some of the most popular online websites, blogs, and scholarly produced content. So how can you ensure that the content you're reading is the real deal?

How To Tell If An Article Was Written With AI

To discourage spammy, low-quality content, Google has started penalizing sites that publish generative content. Beyond the realm of Google, academics & other professionals have seen a huge surge in AI-generated content. So, whether you've come across content in an academic, professional, or casual setting, you might want a way to validate if certain content was written by another human. But how can you tell?

After months of manually analyzing content, I still find myself getting stumped depending on the complexity of the AI used. While I don't think most AI tools can write past an undergraduate college level, I'd like to be able to sort between real and generated content. Luckily, there are a few tools & manual methods you can use to determine if a piece of text was written by an AI.

Here are my personal best tips and tools to spot AI content in 2023:

Method 1: OpenAI Classifier (made by OpenAI)

On January 31st, 2023, OpenAI released their very own language classifier to determine if something was written with AI (especially ChatGPT). Although still not definitive, the company claims you can use their tool to provide insight into whether something was written with AI. Even though the tool was made by the same company that created ChatGPT, OpenAI says it correctly identified only 26% of the AI-written samples they tested as AI.

You can try the classifier yourself. It requires a minimum of 1000 characters, and it's important to note that it does a lot better with larger chunks of text. Also, text that is very predictable cannot be reliably identified. This includes things like songs or math equations, since each answer will always be the same. Alongside the classifier, OpenAI released guidelines for educators trying to tackle & digest all the recent news about ChatGPT.

To use the classifier, simply paste an article of text into the input and hit "submit." If you click on the example buttons it will autofill the samples into the text field.

OpenAI AI content classifier input box

So... how well does the classifier work? I threw in an article that I wrote a couple weeks ago and got the result "unlikely to be AI-generated" (this is true). After this, I tested some ChatGPT writing, and the classifier returned "possibly AI-generated." Seems good so far, right?

Then I tested two more outputs from ChatGPT and got "unable to tell" and "unlikely written by AI." So it really seems like a coin toss.

I suggested this detection method first since it was released by OpenAI. It will hopefully get a lot better over the next few months. I have noticed that if a result comes back as probably or most likely AI, it generally has been produced with AI. The tool just doesn't always do a great job at catching it to begin with. I have hope, but based on how uncertain it seems so far, I'm going to stick with Originality since I've honestly just had more consistent results with it.

Method 2: Originality.ai (professional writing)

If you're looking for an industry-leading content checker that will determine if writing is plagiarized, written with AI, or both, check out Originality. This tool uses a combination of GPT-3 and other natural language models (all trained on a massive amount of data) to determine if content seems predictable. Originality seems to be the only non-official AI content detection tool accurate for both ChatGPT & GPT 3.5 (the most advanced generative language tools). With pricing starting at $0.01 per 100 words, it's pretty reasonable if you're looking for a more professional, industry-level content detection checker. I've had good luck with it and will continue to use it when checking production-level copy.

enter content in Originality.ai content AI and plagiarism checker

To use Originality, paste content into the checker and scan it. As an example, I went back into Originality about a week after I initially published this article & entered the paragraph above (written by me, without using any AI). These were the results:

AI and plagiarism detection score using originality AI

Impressively enough, it was able to find the exact blog I "copied" the content from and marked the text as having a low likelihood of being written with AI. I was honestly impressed at how quickly it was able to find this article. Combining AI detection with a plagiarism checker is a really solid way to determine the origins of some content. For anyone looking to automate and easily test writing, Originality has been my go-to tool. Unlike GLTR, Originality will produce a predictability "score" as seen above, so you don't have to become Sherlock Holmes just to figure out if some basic writing was made with a machine. Remember, nothing is truly definitive. The AI detection score represents the chance the selected writing is AI, not the percentage of the article that is AI. In our previous example, there's a 4 in 5 chance that it was written by a human. Make sense?

Acceptable Detection Scores

According to the CEO of Originality, if content is consistently scoring under 10% AI, it is almost certainly in the clear! Only when content rises close to 40 or 50% AI should you begin to get suspicious about its origins. The more content you scan by the same writer, the better idea you'll have of whether certain writing is legitimate. A longer sample also makes detection more accurate (larger sample sizes = more reliable detection). Some publishers treat a high human score as a good measure of originality even when they are already confident a human created the content. Just be careful, as some results end up as false positives or false negatives. It is far better to review a series of articles and make a call on a writer/service than to pass judgement on a single article or text snippet.
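As a rough illustration of the advice above, here is a minimal sketch of how a batch of scores from one writer might be summarized. The function name, the sample scores, and the exact way the thresholds are combined are my own assumptions for illustration; this is not how Originality itself makes a decision.

```python
from statistics import mean

# Thresholds borrowed from the guidance above; treat them as rules of thumb.
CLEAR_BELOW = 0.10       # consistently under 10% AI: almost certainly fine
SUSPICIOUS_FROM = 0.40   # around 40-50% AI: start asking questions


def interpret_scores(ai_scores):
    """Summarize AI-likelihood scores (0.0 to 1.0) for several texts by one writer."""
    if not ai_scores:
        raise ValueError("need at least one score")
    average = mean(ai_scores)
    if all(score < CLEAR_BELOW for score in ai_scores):
        verdict = "likely human"
    elif average >= SUSPICIOUS_FROM:
        verdict = "worth a closer look"
    else:
        verdict = "inconclusive"
    return {"articles": len(ai_scores), "average": round(average, 2), "verdict": verdict}


# Made-up scores for three hypothetical writers.
print(interpret_scores([0.03, 0.08, 0.05]))
print(interpret_scores([0.55, 0.35, 0.62]))
print(interpret_scores([0.02, 0.45, 0.10]))
```

The point of averaging over several articles is the same as in the text: one odd score should not decide anything on its own.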

Checking Entire Sites

If there is a pattern of consistently high or low detection scores, that should be your largest indicator of AI-written content. One single article is not enough proof to determine if an entire website (or a whole set of documents) has been written with AI assistance. It's also important to take these detection tools with a grain of salt (I can't stress this enough!). The more articles you check from one source, the larger your statistical sample, but so many factors go into detection beyond what a website can do. Some of these factors include syntax, repetition, and lack of complexity, which we'll get into below. Originality recently introduced a tool to check entire websites at once.

Originality ai showing entire website AI detection for goldpenguin.org

Method 3: Giant Language Model Test Room (casual writing)

Three researchers from the MIT-IBM Watson AI lab and Harvard NLP group created a great free tool to help detect machine-generated text content called the Giant Language Model Test Room (or GLTR). GLTR is currently the easiest way to predict if casual portions of text have been written with AI. To use GLTR, simply copy and paste a piece of text into the input box and hit "analyze."

GLTR prompt box with sample text example

The tool will give you a visual indication of how likely it is that the text was generated by an AI. If you want to learn more about the technical details behind GLTR, you can read more on their official website. Each word is scored by how highly the model ranked it as the next word, given the context to its left. If the word is within the top 10 predicted words, the background is colored green; for the top 100 it will shade yellow, the top 1000 red, otherwise violet. If you see content filled with a lot of green, it's likely generated by an AI. This tool is built with GPT-2, meaning it won't be as reliable on text generated by newer models like GPT-3.
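The color rule described above can be sketched in a few lines. This is only an illustration of the bucketing step: the function names and the example ranks are made up, and a real run would get each word's rank from GPT-2 rather than from a hard-coded list.

```python
from collections import Counter


def color_for_rank(rank):
    """Map a word's rank in the model's predictions (1 = most likely) to a GLTR-style color."""
    if rank <= 10:
        return "green"
    if rank <= 100:
        return "yellow"
    if rank <= 1000:
        return "red"
    return "violet"


def color_shares(ranks):
    """Return the fraction of words that fall into each color bucket."""
    if not ranks:
        raise ValueError("need at least one rank")
    counts = Counter(color_for_rank(rank) for rank in ranks)
    return {color: counts[color] / len(ranks) for color in ("green", "yellow", "red", "violet")}


# Hypothetical ranks for an eight-word sentence.
example_ranks = [1, 3, 2, 57, 1, 8, 1204, 4]
print(color_shares(example_ranks))
```

A text where the green share dominates is one where the model could guess almost every word, which is the pattern GLTR highlights.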

Here's a side-by-side comparison of an excerpt of an article written by an AI and one written by a human. You can see that the AI-generated text is much more green than the human-written text.

Again, not foolproof, but a very good indicator. I'd say GLTR is the best tool we have to determine AI content that is currently available to the public. The main issue is it's not declarative (take that as you wish). You won't get a percentage or number saying "yeah this is probably AI." By simply pasting a block of text, you can get a good idea of how likely it was written by an AI, but the final call should be based upon your own judgement. We also made a video comparing it against Jasper, Hyperwrite, and Lex.

Method 4: Technical Signs

The next way to tell if a piece of content has been generated by an AI is to look at the technical aspects of the writing. This isn't as concrete & may seem obvious, but if you're having trouble with the previous tools or just want to further break down writing you've come across, you should look deep at the content. Here are a few things to look for:

1. Sentence length: AI-generated content often includes very short sentences. This is because the AI is trying to mimic human writing, but it hasn't quite mastered complex sentence structure yet. This is painfully clear if you're reading a technical blog about something that requires code or step-by-step instructions. We're not at the point where AI can pass that Turing test just yet. If you've tested content using GLTR or Originality, and the content is creative & unique, I'd say it's in the clear. It's the technical content that comes off as confidently fishy that you need to look further into.

2. Repetition of words and phrases: Another way to spot AI-generated content is by looking for repetition of words and phrases. This is the result of the AI trying to fill up space with relevant keywords (aka – it doesn't really know what it's talking about). So, if you're reading an article and it feels like the same word is being used over and over again, there's a higher chance it was written by an AI. Some of the spammy AI-generation SEO tools love keyword-stuffing articles. Keyword stuffing is when you repeat a word or phrase so many times that it sounds unnatural. Some articles have their target keyword in what feels like every other sentence. Once you spot it, you won't be able to focus on the article. It's also extremely off-putting for readers.

3. Lack of analysis: A third way to tell if an article was written by an AI is if it lacks complex analysis. This is because machines are good at collecting data, but they're not so good at turning it into something meaningful. If you're reading an article and it feels like it's just a list of facts with no real insight or analysis, there's an even higher chance it was written with AI. With ChatGPT, we're nearing the point where AI is able to start to analyze writing, but I still find responses to be very "robotic." People are starting to use AI to reply to tweets but don't realize how painfully cookie-cutter their responses are! You'll notice AI-generated writing is a lot better for static writing (like history, facts, etc.) compared to creative or analytical writing. The more information a topic has, the better AI can write & manipulate it.

4. Inaccurate data: This one is more common in AI-generated product descriptions, but it can also be found in blog posts and articles. Since machines are collecting data from various sources, they sometimes make mistakes. If a machine doesn't know something but is forced to give an output, it'll predict numbers based on patterns (which aren't accurate). So, if you're reading an article and you spot several discrepancies between the facts and the numbers, you can be very confident that what you just read was written using AI. If you come across spammy content, report it to Google. Save someone else the pain of having to waste their time reading something that is clearly inaccurate!
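The first two signs in the list above (sentence length and repetition) are easy to measure roughly. Here's a small sketch using only Python's standard library. The function names, the sample text, and the four-letter cutoff are my own assumptions, and the numbers it prints are clues to weigh, not a verdict.

```python
import re
from collections import Counter


def sentence_lengths(text):
    """Split on ., ! or ? and return the word count of each sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def top_repeated_words(text, min_length=4, top_n=3):
    """Return the most frequent longer words with their share of all words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if len(w) >= min_length)
    return [(word, count / len(words)) for word, count in counts.most_common(top_n)]


# A made-up, keyword-stuffed product blurb.
sample = (
    "Our running shoes are great. Running shoes help runners. "
    "Buy running shoes today. These running shoes are the best running shoes."
)
print(sentence_lengths(sample))
print(top_repeated_words(sample))
```

Lots of very short sentences, or a single phrase taking up a large share of the words, is the kind of pattern worth a second look.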

Some studies claim current GPT-3 generated content is indistinguishable from human-written content, but I haven't seen consistent long-form content written by an AI that seems to be above a collegiate level. We'll get there for sure, but we're just not there quite yet.

Method 5: Verify Sources & Author Credibility

This one might seem a bit unnecessary for a single blog, but it's still worth mentioning. If you're reading an article and the domain seems to be randomly associated with the content posted, that's your first red flag. But more importantly, you should check the sources that are being used in the article (if any). If an author is using sources from questionable websites or simply declares things without any source, either the author isn't doing their research or they could simply be automating a bunch of AI-generated content.

Extra Method: Writer.com AI Content Detector

Although I'm not quite sure what parameters they're using for detecting AI content, Writer.com offers a free and extremely simple AI writing detection tool. You can check text by URL or paste writing directly into their tool to run scans. I've had good success with it, but I've struggled to find out how they determine flagged content.

human-written text result in Writer.com showing 100% human generated content
chatgpt-written text result in Writer.com showing 22% human generated content. This was written with AI and returned an accurate result

Another One: GPT-2 Output Detector

An additional option is the Hugging Face Output Detector. This service is based on the GPT-2 Output Dataset released by OpenAI. The reason I've included it as an extra method is because I don't find some predictions to be very accurate. Generally you can get a decent clue based on the response, but I've pasted some fully robotic AI writing and it has told me it was 99% real. I've also pasted an advanced academic essay written by a human and it claimed 99% AI. The more text in each sample, the more accurate the prediction is likely to be. But sometimes it's really a coin toss – so remember to take all of these services with a grain of salt!

a prediction of AI written text using HuggingFace Output Detector software.

Other Online Detection Methods

Beware of random websites that claim they'll check if your content is AI-generated. If you're looking for AI-content detection tools, ensure that they describe how they are checking content – because "AI detection" doesn't mean anything by itself!

Conclusion

It's not the easiest to tell if an article was written by an AI. The technology has only recently become available after what seemed like a sudden boom in the machine learning industry. An unsettling fact is that AI is just getting so much better each day. That said, if you're questioning whether or not an article was written by an AI, your best bet is to use a combination of GLTR, Originality, and your own judgement! Hopefully these new tools benefit the web by allowing skeptics to pick out trustworthy content across the internet. As AI becomes more sophisticated and the line between human and machine-generated content becomes increasingly blurry, it's only a matter of time until we reach the point where content becomes indistinguishable! But for now – don't stress. We're not there yet 😉

Written by Justin
Justin is the founder of Gold Penguin, a web design and marketing agency that helps businesses increase their revenue using the internet. He writes about the latest software and tools that can help companies 10x their daily workflow & revenue
