25 Incredible Examples of What ChatGPT's New Vision Feature Is Capable Of
Multimodal inputs are finally here with ChatGPT. We just got our hands on GPT-4V (vision access) and with each example I'm more and more blown away. Here's a ton of awesome things you could have it do.

John Angelo Yap
Updated October 17, 2023

A close-up of the right side of an avatar-esque human's face and eye against a green, space-like background, made with Midjourney
Reading Time: 7 minutes
When GPT-4 was released months ago, one of its flagship features was the ability to accept multimodal prompts. Yet months later, many people, us included, still didn't have access to this incredible feature.
But that all changed when OpenAI announced GPT-4V in September 2023. Many rushed to ChatGPT to give it a try, only to find themselves disappointed, as it is still rolling out gradually.
We've just been given access to GPT-4V and I've been playing around with it. It's incredible. I could try to describe it in words, but I'm going to let the examples do the talking.
Here are some of the coolest things ChatGPT’s new Vision mode can help with.
Identify Objects
Let’s start simple: identification. With its multimodal capabilities, ChatGPT can now easily identify objects, as long as they exist within its knowledge base.

You can even identify multiple objects from an image with GPT-4V! For instance:


Transcribe Text
Having trouble transcribing text? ChatGPT can now help you with that. Simply upload an image of your text and wait for GPT to stop generating. You should get a transcription in no time.
I do have to mention that this isn’t perfect…yet. The results I received were mostly correct, but Vision did change some small words like “It” to “If.”
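If you have a trusted copy of the text, one way to catch small slips like these is to compare it against the transcription word by word. The snippet below is a minimal sketch using Python's standard `difflib` module. The function name `find_word_changes` and the sample sentences are made up for illustration and are not taken from the screenshots in this article.

```python
import difflib


def find_word_changes(reference: str, transcription: str):
    """Return (reference_words, transcribed_words) pairs wherever the texts differ."""
    ref_words = reference.split()
    out_words = transcription.split()
    matcher = difflib.SequenceMatcher(a=ref_words, b=out_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Collect the differing words from each side of the comparison.
            changes.append((" ".join(ref_words[i1:i2]), " ".join(out_words[j1:j2])))
    return changes


# Hypothetical sample texts, standing in for the original and the model's output.
reference = "It was a bright cold day in April"
transcription = "If was a bright cold day in April"
print(find_word_changes(reference, transcription))  # [('It', 'If')]
```

This only helps when a reference text exists, so it is better suited to spot-checking how reliable the transcription is than to proofreading a document you have no other copy of.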


Translate Text
The GPT model is trained on more than 100 different languages. So, when you’re in a bind and you need to translate text from one language to another, try Vision. It can provide a good translation of the text in your image, regardless of its language of origin or alphabet.

Get Directions
Chances are, you wouldn’t use ChatGPT for this. However, I wanted to know if ChatGPT can identify your location from an image and provide accurate directions to a specific destination. For this, I picked a landmark near me as an input and asked ChatGPT how I can get to my university using the input as my origin.
I’m really not surprised at how well GPT-4 Vision answered. It’s both amazing and scary how accurate these AI models are becoming.

Extract Data From An Image
Vision can also extract relevant information and infer data from an image. Why do advanced analysis by yourself when ChatGPT can do the legwork for you? AI truly is the future of research, and we’re now seeing bits and pieces of what’s to come.


Replicate a Website
ChatGPT can also take an image of a website as an input and recreate it as best it can. In my experience, it does a good enough job, especially considering that it can’t access your files and fonts. But it still has a hard time replicating websites perfectly.


Create Web Apps
ChatGPT can do more than replicate — it can create. From simple apps like calculators to more complex ones like iOS dictionary applications, it can do them all. The best thing? ChatGPT with Vision can create complete apps from illustrations, even the bad ones like the one I made here:


Gain Design Insights
Torn between several designs? Let ChatGPT make the decision for you. This highlights the next-level nuance of GPT-4. After all, it takes a machine to analyze, but it takes a human to judge creativity. However, that doesn’t seem to be the case anymore.


Explain Advanced Concepts
Do you ever find yourself staring at a whiteboard full of concepts you can’t understand? You can now take a picture of it and have ChatGPT explain it to you in simpler terms.



Explain Diagrams
GPT-4 Vision can do more than interpret lessons — it can also interpret system diagrams. This can help you gain insights into a piece of software, allow you to recreate parts of a different system, and implement them into your own code.


Explain An Image's Context
ChatGPT can also interpret images that require a lot more nuance and real-time knowledge. Some examples of this include editorial cartoons and puzzles.


Explain Medical Laboratory Results
It takes a bright mind to be a doctor, but ChatGPT can now perform some aspects of medicine accurately. Of course, you can’t replace your doctor or surgeon with an AI, but you can at least use it to interpret lab results.


Perform Medical Evaluations
Apart from lab results, you can also use ChatGPT to perform a medical diagnosis. It’s not always right, but this speaks volumes about what AI could do for medicine in the future.

Solve Complex Mathematics Problems
ChatGPT has been disrupting the education industry for a while now, and it’s bound to be a bigger problem in the future. With advanced GPT-4 Vision, students can now directly input a complex mathematics problem into ChatGPT and have it solved in mere seconds.


Answer Questions From A Non-English Language
It also doesn’t matter which language you choose. ChatGPT can translate a question from any language and answer it with precision.


Detect AI Images
What better AI detector than an AI? GPT-4 Vision can use its advanced logic to determine whether an image was made by a human or by AI. For example, here’s a side-by-side comparison of two images: one from a person (left) and another from AI (right). ChatGPT successfully sussed out which one was AI-generated.

Bypass Captcha
Captchas were made to block bot activity, but they didn’t account for the arrival of AI. GPT-4 Vision can answer them with varying levels of success. It’s not always correct, but it’s accurate enough that captchas will need more complex ways of filtering bots from humans.

Generate a Grocery List
Having trouble keeping track of your grocery lists? You can upload an image of last month’s groceries to ChatGPT and let it create a list for you.


Create Recipes
Say goodbye to secret recipes. With the power of replicating complex recipes just from a photo, ChatGPT can be the rat in your chef’s hat.


Explain Jokes
Nobody likes that guy who explains jokes, except if it’s ChatGPT. Sure, it takes the fun out of the jokes, but it does help us evaluate how good GPT-4 is at understanding context and real-world nuances like sarcasm and humor.


Find Waldo
The age-old question: “Where’s Waldo?” It’s really remarkable that these images have stood the test of time. Now, something that kept kids entertained for hours can be solved by ChatGPT in mere seconds.

Play GeoGuessr
GeoGuessr has been my hobby for the past month. It drops you off at a random place in Google Maps and you have to figure out where you are. If ChatGPT were playing this game, it’d get a perfect score all the time thanks to Vision.

Solve Complex Puzzles
With GPT-4’s evolved reasoning, ChatGPT can solve complex puzzles with ease. Not only that, it can also explain its answer and the line of reasoning behind it. Let’s take this famous brain teaser for example:



Solve Sudoku Puzzles
Stuck on a sudoku puzzle you can’t solve? ChatGPT can complete it for you. Of course, you wouldn’t get the satisfaction since you cheated — but hey, at least you’re witness to Vision’s reasoning and computing skills.
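Since Vision isn't always right, it's worth checking a completed grid before you trust it. Here is a minimal sketch of a checker in Python. The function names `is_valid_solution` and `matches_puzzle` are hypothetical, and the sample grid is generated by a simple formula as a stand-in for whatever grid ChatGPT returns.

```python
def is_valid_solution(grid):
    """True if every row, column, and 3x3 box holds the digits 1-9 exactly once."""
    if len(grid) != 9 or any(len(row) != 9 for row in grid):
        return False
    expected = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [set(grid[r][c] for r in range(9)) for c in range(9)]
    boxes = [
        set(grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3))
        for br in range(0, 9, 3)
        for bc in range(0, 9, 3)
    ]
    return all(group == expected for group in rows + cols + boxes)


def matches_puzzle(puzzle, solution):
    """True if the solution keeps every given clue (0 marks an empty cell)."""
    return all(
        p == 0 or p == s
        for puzzle_row, solution_row in zip(puzzle, solution)
        for p, s in zip(puzzle_row, solution_row)
    )


# Stand-in for a grid typed out from ChatGPT's answer (assumed, not from the article).
sample = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
print(is_valid_solution(sample))  # True
```

Running both checks tells you whether the answer is a legal sudoku grid and whether it is actually a solution to the puzzle you uploaded, rather than a different, internally consistent grid.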


Help The Visually Impaired
Did you know that ChatGPT isn’t the first home of GPT-4 Vision? That honor belongs to a small mobile app called “Be My Eyes.” This software helps visually impaired people to interact more with their surroundings by providing a real-time description of what their phone cameras can see.

Wrapping Up
And there you have it: 25 amazing use cases of GPT-4 Vision. Every time a new version of GPT is released or new features roll out, I find myself both frightened and excited about the future.
But let’s focus on the present. The release of Vision was quieter than DALL-E 3 but, to me, is even more significant. We’re only seeing a fraction of what it can do.
In the future, it could be used to develop innovative applications, diagnose diseases, and reverse-engineer complex products. We're in the early days. Don't forget that. This is just the start.