With OpenAI’s recent announcement of the new ChatGPT interface, I knew exactly what I wanted to try first. Unlike Midjourney, DALL-E couldn’t really accept image prompts in the past. It was strictly text-to-image.
When I realized this was now possible, so many ideas ran through my head. What can you use image -> text -> image to do?
So, I opted to start simple: Creating virtual avatars of myself.
In this article, I'll guide you through the simple steps to create your own custom avatar using ChatGPT's language understanding and DALL-E’s creativity, as well as provide different examples to show you how good it actually is.
Just a quick warning: You’re going to see a lot of my face in this article. So, definitely stay tuned for that.
How To Make Custom Avatars with ChatGPT & DALL-E 3
It’s quite easy to make custom avatars using ChatGPT and, as promised, this will only take two minutes.
First, you have to enable GPT-4 by pressing the model drawer on the top left side of the screen.
Next, press the paperclip logo on the prompt bar to upload your reference image.
And, finally, simply say “Create an avatar of me.” You can add whatever else you want, as long as the word “avatar” is in the prompt.
Here’s the final product compared to my original image:
I know what you’re thinking. It doesn’t look quite like me, right? So, why does that happen?
ChatGPT does not directly use your image as a reference for DALL-E 3. Instead, there’s a middleman between the two: GPT-4V, which takes your original image, turns it into text, and then generates a prompt based on the text. In fact, you can even see what GPT-4V thinks of my original photo when I inspect the avatar:
What GPT-4V does is pick out the most defining features of the original image and use them to create the avatar. For me, those were my black hair, white shirt, and pleasant expression. And, sure enough, that resulted in an inaccurate depiction of me.
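To make the relay concrete, here’s a minimal sketch of that same image → text → image pipeline using the official OpenAI Python SDK. To be clear, this is my reconstruction, not ChatGPT’s actual internals: the function names, prompt wording, and the choice of `gpt-4o` as the vision model are my assumptions, and running it requires the `openai` package plus an `OPENAI_API_KEY` environment variable.

```python
def build_avatar_prompt(features: str) -> str:
    """Wrap the extracted features in an avatar-generation prompt.

    The wording here is a placeholder; ChatGPT's real internal prompt
    is not public.
    """
    return f"Create a friendly cartoon avatar of a person with: {features}"


def describe_then_draw(image_url: str) -> str:
    """Image -> text (vision model) -> image (DALL-E 3). Returns an image URL."""
    from openai import OpenAI  # imported lazily so the sketch reads offline

    client = OpenAI()

    # Step 1: a vision model reduces the photo to a short text description
    # of its most defining features -- this is the lossy "middleman" step.
    vision = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "List this person's most defining visual features "
                         "in one sentence."},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    )
    features = vision.choices[0].message.content

    # Step 2: DALL-E 3 only ever sees the text, never the photo itself,
    # which is why the likeness drifts.
    image = client.images.generate(
        model="dall-e-3",
        prompt=build_avatar_prompt(features),
        size="1024x1024",
    )
    return image.data[0].url
```

The key design point is in step 2: because `images.generate` takes only a `prompt` string, any detail the vision model leaves out of `features` is simply gone by the time DALL-E 3 runs.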
So, could I make this better?
I also tried to create a better avatar by providing more context to GPT-4, such as my ethnicity, a better description of my hair, and other features that weren’t prominent in the reference. Here’s what it looks like:
Another thing I did was explore DALL-E’s creativity by asking it to create variations of my avatar in different styles: a more realistic 2D illustration, a 3D render, a doodle, and more.
7 Other Examples Using Famous People
To better showcase how good DALL-E 3 is at creating avatars, here are seven avatars I made using ChatGPT of famous people from different ethnicities, each characterized by unique facial features.
Since GPT-4V extracts only the defining features of a person, this method works a lot better for people with distinctive characteristics. Even so, I’d say that — if I’m being really generous — only three of these seven avatars are recognizable.
It’s like a game of telephone: I give ChatGPT an image, GPT-4V passes it on as text, and only then does it reach DALL-E 3. It’s only natural that some elements get lost along the way.
So, if you’re looking for an accurate avatar creator, I suggest looking online for image-to-image editors instead. This can work, but it really depends on how you look.
A Quick Comparison Against Midjourney
I wanted to see if DALL-E 3 was actually better than Midjourney at creating avatars, since Midjourney can use images directly as prompts. So, I used my image from earlier, and here’s how the two compare.
And yep... I’ll stick with DALL-E.
Not only does Midjourney’s output look nothing like me, it’s also too stylized for my liking. Strange.
Would I Recommend It?
No would be too harsh — so, I’d say, not yet.
As long as ChatGPT can’t pass your image directly to DALL-E 3 as input, this won’t work consistently. If you’re looking for avatars you’ll actually use, like I said earlier, it’s better to invest in an image-to-image editor online.
That said, I do believe that this provides some good insight into how the new ChatGPT environment works. In my short experience, I found that this new interface is a lot more streamlined and allowed me to finish my tasks more efficiently.
As for performance, well, I’ll be honest: it took many retries before I got a result instead of an error — but that’s to be expected given the volume of people trying it out.