25 Incredible Examples of What ChatGPT's New Vision Feature Is Capable Of

Multimodal inputs are finally here with ChatGPT. We just got our hands on GPT-4V (vision access) and with each example I'm more and more blown away. Here are a ton of awesome things you could have it do.
Updated October 17, 2023
A close-up of the right side of an avatar-esque human's face and eye against a green, space-like background, made with Midjourney

When GPT-4 was released months ago, one of its new flagship features was the ability to accept multimodal prompts. However, months passed and many still didn’t have access to this incredible feature, us included.

But that all changed with the announcement of OpenAI's GPT-4V in September 2023. Many rushed to ChatGPT to give it a try, only to find themselves disappointed, as it was still on a gradual rollout.

We’ve just been given access to GPT-4V and I’ve been playing around with it. It's incredible. I'd let words describe it but I'm just going to let the examples do the talking.

Here are some of the coolest things ChatGPT’s new Vision mode can help with.

Identify Objects

Let’s start simple: identification. With its multimodal capability, ChatGPT can now easily identify objects, as long as they exist within its knowledge base.

Identify Objects with Vision

You can even identify multiple objects from an image with GPT-4V! For instance:

Identify Objects with Vision 2
Identify Objects with Vision 3

Transcribe Text

Having trouble transcribing text? ChatGPT can now help you with that. Simply upload an image of your text and wait for GPT to stop generating. You should get a transcription in no time.

I do have to mention that this isn’t perfect…yet. The results I received were mostly correct, but Vision did change some small words like “It” to “If.”

Transcribing Images to Text with Vision
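If you still have the original text on hand, small swaps like “It” to “If” are easy to catch automatically. Here's a minimal sketch using Python's standard `difflib` module; the function name `word_diffs` and the sample sentences are made up for illustration and aren't taken from my actual test.

```python
import difflib


def word_diffs(reference: str, transcription: str) -> list[tuple[str, str]]:
    """Return (expected, got) pairs for every span where the texts differ."""
    ref_words = reference.split()
    out_words = transcription.split()
    # Compare word by word rather than character by character.
    matcher = difflib.SequenceMatcher(a=ref_words, b=out_words, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((" ".join(ref_words[i1:i2]), " ".join(out_words[j1:j2])))
    return diffs


# Hypothetical example: the original text vs. what the model returned.
reference = "It is raining today so bring an umbrella"
transcription = "If is raining today so bring an umbrella"
print(word_diffs(reference, transcription))
```

Each pair shows the expected word (or words) next to what came back, so a long transcription can be spot-checked without rereading the whole thing.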

Translate Text

The GPT model is trained on more than 100 different languages. So, when you’re in a bind and you need to translate text from one language to another, try Vision. It can provide a good translation of your image, regardless of its origin and alphabet.

Translating with Vision

Get Directions

Chances are, you wouldn’t use ChatGPT for this. However, I wanted to know if ChatGPT could identify your location from an image and provide accurate directions to a specific destination. For this, I picked a landmark near me as an input and asked ChatGPT how I could get to my university using the input as my origin.

I’m really not surprised at how well GPT-4 Vision answered. It’s both amazing and scary how accurate these AI models are becoming.

Getting Directions with Vision

Extract Data From An Image

Vision can also extract relevant information and infer data from an image. Why do advanced analysis by yourself when ChatGPT can do the legwork for you? AI truly is the future of research, and we’re now seeing bits and pieces of what’s to come.

Data Extraction with Vision

Replicate a Website

ChatGPT can also take an image of a website as an input and recreate it as best as it can. In my experience, it does a good enough job, especially considering that it can’t access your files and fonts. But it still has a hard time perfectly replicating websites.

Website Replication with Vision

Create Web Apps

ChatGPT can do more than replicate — it can create. From simple apps like calculators to more complex ones like iOS dictionary applications, it can do them all. The best thing? ChatGPT with Vision can create complete apps from illustrations, even the bad ones like the one I made here:

Creating Web Apps with Vision

Gain Design Insights

Torn between several designs? Let ChatGPT make the decision for you. This highlights the next-level nuance of GPT-4. After all, it takes a machine to analyze, but it takes a human to judge creativity. However, that doesn’t seem to be the case anymore.

Design Insights with Vision

Explain Advanced Concepts

Do you ever find yourself staring at a whiteboard full of concepts you can’t understand? You can now take a picture of it and have ChatGPT explain it to you in simpler terms.

Advanced Concepts with Vision

Explain Diagrams

GPT-4 Vision can do more than interpret lessons — it can also interpret system diagrams. This can help you gain insights into a piece of software, allow you to recreate parts of a different system, and implement them into your own code.

Diagrams with Vision

Explain An Image's Context

ChatGPT can also interpret images that require a lot more nuance and real-world knowledge. Some examples of this include editorial cartoons and puzzles.

Context with Vision

Explain Medical Laboratory Results

It takes a bright mind to be a doctor, but ChatGPT can now perform some aspects of medicine accurately. Of course, you can’t replace your doctor or surgeon with an AI, but you can at least use it to interpret lab results.

Lab Results with Vision

Perform Medical Evaluations

Apart from lab results, you can also use ChatGPT to attempt a medical diagnosis. It’s not always right, but this speaks volumes about what AI can do for medicine in the future.

Medical Evaluation with Vision

Solve Complex Mathematics Problems

ChatGPT has been disrupting the education industry for a while now, and it’s bound to be a bigger problem in the future. With advanced GPT-4 Vision, students can now directly input a complex mathematics problem into ChatGPT and have it solved in mere seconds.

Solving with Vision

Answer Questions From A Non-English Language

It also doesn’t matter which language you choose. ChatGPT can translate a question from any language and answer it with precision.

Answering Questions with Vision

Detect AI Images

What better AI detector than an AI? GPT-4 Vision can use its advanced logic to determine whether an image was made by a human or by AI. For example, here’s a side-by-side comparison of two images: one from a person (left) and another from AI (right). ChatGPT successfully sussed out which one was AI-generated.

Vision-Powered AI Detection

Bypass Captcha

Captchas were made to block bot activity, but they didn’t account for the arrival of AI. GPT-4 Vision can answer them with varying levels of success. It’s not always correct, but it’s accurate enough that captchas will need more complex ways of filtering bots from humans.

Captchas with Vision

Generate a Grocery List

Having trouble keeping track of your grocery lists? You can upload last month’s grocery list to ChatGPT and let it create a new one for you.

Grocery Lists with Vision

Create Recipes

Say goodbye to secret recipes. With the power to replicate complex recipes just from a photo, ChatGPT can be the rat in your chef’s hat.

Recipes with Vision

Explain Jokes

Nobody likes that guy who explains jokes, except if it’s ChatGPT. Sure, it takes the fun out of the jokes, but it does help us evaluate how good GPT-4 is at understanding context and real-world nuances like sarcasm and humor.

Jokes with Vision

Find Waldo

The age-old question: “Where’s Waldo?” It’s really remarkable that these images have stood the test of time. Now, something that kept kids entertained for hours can be solved by ChatGPT in mere seconds.

Finding Waldo with Vision

Play GeoGuessr

GeoGuessr has been my hobby for the past month. It drops you off at a random place in Google Street View and you have to figure out where you are. If ChatGPT were playing this game, it’d get a perfect score all the time thanks to Vision.

Finding Places with Vision

Solve Brain Teasers

With GPT-4’s evolved reasoning, ChatGPT can solve complex puzzles with ease. Not only that, it can also walk you through the line of reasoning behind its answer. Let’s take this famous brain teaser for example:

Brain Teasers with Vision

Solve Sudoku Puzzles

Stuck on a sudoku puzzle you can’t solve? ChatGPT can complete it for you. Of course, you wouldn’t get the satisfaction since you cheated — but hey, at least you’re witness to Vision’s reasoning and computing skills.

Sudokus with Vision
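If you'd rather not take a completed grid on faith, checking it is simple. Here's a minimal sketch, assuming you've typed the finished grid into a 9x9 list of integers; the function name `is_valid_solution` and the sample grid are made up for illustration.

```python
def is_valid_solution(grid: list[list[int]]) -> bool:
    """Return True if grid is a correctly completed 9x9 sudoku."""
    # Reject anything that is not a 9x9 grid before indexing into it.
    if len(grid) != 9 or any(len(row) != 9 for row in grid):
        return False
    digits = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [set(col) for col in zip(*grid)]
    boxes = [
        {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    # Every row, column, and 3x3 box must contain 1 through 9 exactly once.
    return all(group == digits for group in rows + cols + boxes)


# Hypothetical sample: a valid grid built from a simple shifting pattern.
solved = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
print(is_valid_solution(solved))

# Duplicate a digit in the first row to simulate a wrong answer.
broken = [row[:] for row in solved]
broken[0][0] = broken[0][1]
print(is_valid_solution(broken))
```

This only confirms that the finished grid obeys the rules. It doesn't check that the digits printed in your original puzzle were left unchanged, so compare those by eye.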

Help The Visually Impaired

Did you know that ChatGPT isn’t the first home of GPT-4 Vision? That honor belongs to a small mobile app called “Be My Eyes.” This software helps visually impaired people to interact more with their surroundings by providing a real-time description of what their phone cameras can see. 

Helping Visually Impaired Folks with Vision

Wrapping Up

And there you have it: 25 amazing use cases of GPT-4 Vision. Every time a new version of GPT releases or new features roll out, I find myself both frightened and excited about the future.

But let’s focus on the present. The release of Vision was quieter than DALL-E 3's but, to me, it's even more significant. We’re only seeing a fraction of what it can do.

In the future, it can be used to develop innovative applications, diagnose diseases, and reverse-engineer complex products. We're in the early days. Don't forget that. This is the start...

Written by John Angelo Yap
Hi, I'm Angelo. I'm currently an undergraduate student studying Software Engineering. Now, you might be wondering, what is a computer science student doing writing for Gold Penguin? I took up studying computer science because it was practical and because I was good at it. But, if I had the chance, I'd be writing for a career. Building worlds and adjectivizing nouns for no reason other than that they sound good. And that's why I'm here.
