Even before its release, there were murmurs that this new model had the power to rival Midjourney in both creativity and nuance.
Finally, when it was officially announced, we got our first glimpse of the AI image generator of the future. People (myself included) immediately sprang into action and compared the new product to other AI image generators on the market.
But I had never really thought about comparing its outputs against those of its previous iteration until now. There’s no question that DALL-E 3 is a significant improvement over DALL-E 2, but how much exactly? Let’s find out.
To keep things consistent, we'll have DALL-E 2 on the left and DALL-E 3 on the right.
Prompt: a realistic close-up portrait of an elderly person that radiates wisdom and character
If we’re going by pure realism, I actually prefer DALL-E 2 over DALL-E 3. The elderly man in its image has the face of someone who’s been through hell and back. Wrinkles, laugh lines, age spots — you just know this is someone who’s full of wisdom.
It’s not that DALL-E 3 did a bad job. I thought it did pretty well. However, it has a tendency to smooth out faces, which sometimes makes them look like wax figures.
Prompt: a snow-covered alpine landscape with a cozy cabin nestled among the pine trees, smoke rising from its chimney into the crisp winter air
This is where we see DALL-E 3’s evolution over its previous version. Instead of creating something coherent, DALL-E 2 looks like it crammed the elements in the prompt into one disjointed image.
On the other hand, DALL-E 3 generated an image that’s more true to the essence of the prompt. It’s cozy and fulfills every requirement without overcompensating.
Prompt: an image of new york city that captures the feeling of nostalgia for the 1980s, featuring iconic objects from that era.
DALL-E 2 completely missed the mark here. It just created a poor depiction of a payphone. There are no elements that scream “New York” or anything specifically about the 1980s.
I’d say that DALL-E 3 also had a hard time with this prompt. I would’ve rated it higher but I didn’t for two reasons. One, why is it in black and white when color photography was commonplace in the 1980s, and two, why is the subway above ground?
Prompt: a man-fox hybrid walking in a surreal dreamscape that combines elements of a forest, a desert, and a tropical beach
The landscape itself is fine in DALL-E 2, but it didn’t even attempt to create a man-fox hybrid. DALL-E 3 understood the context completely and even added its own twist in the final output. This is one of my favorite comparisons so far because it perfectly encapsulates how far this model has come.
I’ve also noticed that, while DALL-E 3 generally smooths images, DALL-E 2 tends to create something that’s rough around the edges and looks like it was drawn with crayons.
Prompt: a modern two-story, eco-friendly and sustainable house with solar panels, rainwater harvesting systems, and a green roof garden
DALL-E 2’s house clearly suffered from some rendering issues. If you zoom in, you can see that it tried to create a balcony on the right side of the second floor but it turned into a weird window-balcony hybrid. The plants also block much of the first floor, which makes it hard to see what the house actually looks like.
As for DALL-E 3, now that’s a house I want to live in. It’s an exceptional render of a modern eco-friendly house. However, it does have a dream-like quality to it, which takes away from its realism.
Prompt: a 3D diorama of a fantasy forest with mythical creatures, ancient ruins, and enchanting bioluminescent plants
Once again, DALL-E 2 missed an important aspect of the prompt by making a flat artwork instead of a 3D diorama. DALL-E 3 was able to generate an image of a diorama containing all the elements I specified.
Prompt: an oil painting of a man busking in the middle of a bustling city street at night, with people and reflections in the rain-soaked pavement
I like these two images because they show the improvement the model has made in just a year. DALL-E 2 satisfied everything included in my prompt, but its output lacks creativity. It’s poorly blended and vague. The reflections are a little bit off too.
DALL-E 3’s painting looks like DALL-E 2’s output if it had been done by a professional artist. The streets look more detailed and alive. It’s not perfect (the subject is missing a leg) but it’s an extremely good attempt.
Prompt: a pixel art scene of a calming Japanese garden with bonsai trees, a koi pond, and a tranquil bridge
These images look like a still from an old Game Boy game next to one from its remastered version. I actually like DALL-E 2’s pixel art because of the nostalgia factor but, hands down, DALL-E 3 is far better. It’s vibrant, detailed, and consistent. The way it looks reminds me of Stardew Valley and Animal Crossing.
Prompt: a close-up portrait of a musician lost in the moment, with vibrant colors that evoke the film's rich, with the warm tones of Portra 400
If there’s one thing I really like about DALL-E 2, it’s how well it generates close-up images. There’s really nothing wrong with either of these two images, but I slightly prefer DALL-E 3 because of the contrast and lighting.
Prompt: a visual homage to Leonardo Da Vinci's anatomical studies including the intricate details of the human body with a modern twist
What I really dislike about DALL-E 2 (which led me to use Midjourney in the first place) is that it tends to create crayon-like images, like the one you see here. It’s also different from Leonardo Da Vinci’s art, which was a crucial part of the prompt.
DALL-E 3 managed to solve this and create something that looks like Da Vinci’s Vitruvian Man. It also added a modern twist, as specified in the prompt, by depicting a cyborg instead of a human.
Prompt: a visualization of morality
Creating a visual depiction of an abstract concept is always a difficult task, and that shows in these images. DALL-E 2 went for a more concrete depiction, defining morality as an inner struggle between two sides of your personality. DALL-E 3 took a more abstract route, depicting morality as a multi-faceted spectrum of different values.
Visualization of Digital Concepts
Prompt: a visualization of the Internet as a physical landscape, with websites and social media platforms as buildings
This one is not even close. DALL-E 2’s output lacks creativity and the colors blend together in a bad way. You can’t even tell the buildings apart. On the other hand, DALL-E 3 did a great job of creating a society based on today’s internet landscape. I also particularly like the double meaning of the high-speed traffic going in and out of the internet.
Prompt: a news article from the future reporting on the first contact with an extraterrestrial civilization
One of the most significant challenges for AI image generators today is text. This is because they perceive text as shapes without any meaning. Whenever I use DALL-E 2, the text always comes out looking like the Greek alphabet, which you can see here.
DALL-E 3 promises to have better text generation and, for the most part, it does. It’s still unpolished, but it’s definitely an improvement.
Prompt: an ancient civilization's undiscovered lost city buried deep within a jungle
DALL-E 2’s choice to generate an aerial view of the scene is a head-scratcher for sure. You really can’t see any details apart from the ruins, which are poorly rendered to begin with. DALL-E 3’s artwork reminds me of ancient Aztec civilization. It’s detailed, well-lit, and has a mythical quality to it.
Prompt: a flying golden retriever
It’s not a good sign when your AI image generator can’t fulfill a low-context prompt like this one. Apart from the glaring rendering issues on DALL-E 2’s dog, there’s also the fact that it’s jumping and not flying.
DALL-E 3 was able to turn a simple prompt into a cute artwork. That said, I don’t really understand the need to turn the goldie into an angel, but it adds to the charm of the image.
High Context: Diverse Elements
Prompt: an alternate history scene where ancient Egypt with advanced technology, featuring pyramids with rocket boosters, hieroglyphic-coded computers, and cyborg pharaohs, is in a war against a futuristic Roman legion with electric spears and mecha horses
Unsurprisingly, DALL-E 2 couldn’t handle a high-context prompt at all. In fact, it looks like it gave up halfway through the process. It would be fair to say that its output is a complete mess.
DALL-E 3, once again, has blown me away with how precisely it follows prompts. It didn’t miss a single detail, even when I kept specifying random objects. This one’s a clear winner.
High Context: Background Description
Prompt: a neo-noir film still of a grizzled detective drinking whiskey in his burgundy-colored chair during a thunderstorm at night, behind him is a bookshelf filled with books and an ashtray, a single lamp is illuminating one side of his face
At this point, I’m noticing a trend in DALL-E 2 where rendering issues become more apparent the longer the prompt is. For instance, the detective in its image is missing his face. Meanwhile, DALL-E 3 created the perfect film still that accurately depicts my prompt. It’s moody, atmospheric, and gritty, just like any other neo-noir film.
High Context: Subject Description
Prompt: an expressive oil painting of a young adult woman of southeast asian descent, with slight curls on her long black hair, who is trying to control her rage. her fists are clenched as her anger slowly turns into fury. emotions are shown in her face and posture. her face is slowly becoming crimson red as she's overcome with fury.
I’m actually surprised at how well DALL-E 2 completed this prompt. It didn’t miss a single descriptor and I’d say I’m satisfied with how it turned out. However, DALL-E 3 is just on a whole different level. You can really feel the emotions coming out of the woman in the picture. I also like that, if you zoom in, you can see the brush strokes that are common in oil paintings.
My Thoughts on DALL-E 3
I expected DALL-E 3 to be a better version of DALL-E 2 but, to be fair, I didn't anticipate such a stark difference between them. Looking at the outputs side by side, it’s really night and day in terms of quality alone.
It’s not just creativity. It also follows through on its promise of better nuance and text generation. DALL-E 3 rarely misses a line of the prompt, something that happens quite often even with Midjourney and Firefly.
That said, DALL-E 3 is not perfect. But considering the strides it made in just a year, it's now a viable competitor to other AI image generators.