How Do DALL-E 3 And Midjourney Interpret The Same Prompts? Here Are 50 Examples

It's no secret that DALL-E and Midjourney are two of the most popular AI image generators today. But how reliable can they be with various prompts? Today, we're testing these models with complex prompts.

John Angelo Yap

Updated January 9, 2024

A green robot (symbolizing DALL-E) against a single warship (symbolizing Midjourney) generated with V6

Reading Time: 10 minutes

I've lost count of how many comparison articles I've written about AI image generators, but to this day, I'm still excited to talk about them and actually experiment with my prompts. This gives me the opportunity to engage with these tools and see how creative they can actually be.

Without a doubt, my favorites have always been DALL-E 3 and Midjourney. In the past, I've already tested their general creativity and text-generation capability. So let's now move on to the next big issue in AI image generators: nuance.

I understand that these tools differ in how they accept prompts (and in what each one requires to produce your desired image), but the goal of this article isn't to judge those differences. Rather, it's to show what kinds of language produce what kinds of results in each tool.

What if they were given complex prompts? How creative can they be with lots of context and supporting details? Here are some examples to answer that question:

Midjourney vs. DALL-E 3 Complex Prompt Comparison

For these comparisons, I focused on populating the prompts with as much context as possible, whether it's on the subject or supporting details.

That said, length isn't the only factor in difficulty. Some of the prompts here are shorter but require more understanding to generate accurately and creatively.

Each prompt will have two images: the images on the left are DALL-E 3, while the images on the right are Midjourney V6.

Realism (People)

I've said this over and over again, but Midjourney V6 really sets the bar high in terms of realism. As seen in the images below, DALL-E outputs can't quite match V6 because they still tend to be softened and flawless to the point of being uncanny.

As for nuance, DALL-E surprisingly ignored some of my prompt details. For instance, it completely ignored my "blonde" specification in the first prompt, and in the third example it generated an artwork instead of a photo.

On the other hand, Midjourney tends to make more mistakes when bombarded with details. The ramen example below showcases a lack of understanding and accuracy. I mean, who eats ramen like that?

portrait, a beautiful blonde korean woman on her mid 20s, glamour street medium format photography, feminine, shot on cinealta, night, pastel hues, cityscape background, vintage-inspired attire, soft ambient streetlights, reflective surfaces, subtle bokeh effect

a close-up film photo of an obscured man in a dream sequence, a subtle holographic glow outlines a slot-canyon. film photo is dark has subtle film grain as if shot on low ISO film; the photo features selective focus and contrasting rainbow-holographic accents. photo is shot on soaked film

black woman standing in a full of multicolor lasers shooting around him, radial blur, album cover, he’s standing still, shades, chain, black dress with yellow stripes, poster, trippy, 3d image, dark backdrop

a young asian-american woman wearing a cream sweater, in the style of mamiya rb67, shige's visual aesthetic style, dark brown and light beige, tumblewave, oshare kei, brooding mood, capturing the soft, ethereal glow of the natural light filtering through the fabric of her cream sweater, the vintage Mamiya RB67 lens emphasizing the rich tones of her dark brown and light beige surroundings

high-quality photography of a young girl smiling, backlighting, natural pale light, film camera, by Rinko Kawauchi, HDR, radiating a timeless joy against a backdrop of ethereal, sun-kissed hues that highlight the pure and genuine emotion

a man eating a bowl of ramen, nikon d850, in the style of Asian cinema, natural lighting, evoking the cinematic atmosphere of an intimate ramen shop, warm glow of natural light enhancing the authenticity of the moment

a man scoring a point in pickleball, sports photography, freezing the dynamic motion of victory on the pickleball court, with sharp focus and vibrant colors capturing the adrenaline-fueled triumph, slight motion blur

fashion photography, a stylish Indian-American woman in a blue and gold sundress, postmodern photography, elegant figures, art nouveau fashion, presenting a captivating fusion of contemporary style and classical elegance

a young man in a plain white top, indie, retro, medium format photography, warm light, dorm room aesthetics, taken with an iphone 6, lightroom

aesthetic photography, close up portrait of beautiful blonde woman with blue eyes, calm atmosphere, warm colors, snapshot photography, tapestry of beauty

an old man in the middle of a hallway, closeup, grainy 1988 VHS screengrab captured in the middle of an unnervingly clean, vast abandoned empty train station, unsettling, VHS filter, liminal space

cinematic, high key photo, a curly long-haired man, ARRIFLEX 35 BL camera, canon k35 prime lenses, black and white, subtlety, model photography

Other Realism Examples

Both AI image generators accurately created images that follow nearly every word of my prompts.

As for realism, the issues that DALL-E has with people are less apparent in images without them. In the series of photographs below, I'm only dissatisfied with the ripples (a clear case of AI repetition) and the pizza (who eats pizza with only tomatoes, pineapples, and olives?).

That said, Midjourney is still a clear winner in this category, showcasing outstanding prompt comprehension and creativity.

a micro shot of ripples on a river, canon eos 5d mark iv, naturalistic, zooming in on the intricate patterns and textures of gentle ripples on a river's surface, capturing the mesmerizing details of nature's subtle movements in a microcosmic perspective

a hyperrealistic slice of lasagna, white background, isolated

a minimap diorama of a small library attached to a cafe. wooden beams crisscross above. books are neatly arranged on wooden bookshelves, creating a charming miniature world

macro shot of a green human eye, exploring the intricate details of human eyes up close in a captivating macro shot, delicate patterns and textures

wide shot of a snow leopard blending in with his surroundings, wildlife photography, shot in the Himalayas, national geographic award-winning photo

product photography, a cup of coffee, coffee beans in the background, chic, coffee shop aesthetics, warmth and coze, warm tones. ceramics

a visually striking and premium quality photograph of an albert einstein bobblehead figure, hyperrealism, set against a serene pastel blue background

food photography, taking a slice from the cheese pizza, macro shot, focus on the cheese pull, beautiful indulgence

commercial photography, a bottle of wine, grapes, elegance, high contrast, cinematic lighting, luxurious ambiance, high-contrast visuals and cinematic lighting, sophistication and refinement

an aerial view of a pair of white sneakers on a soft mint green background, with natural daylight casting subtle shadows, commercial photography, minimalism

Zion National Park, landscape, retro style, Fujifilm XF 10-24mm f/4, overcast weather, muted tones, soft lighting, panoramic

Cinematic film still, the view on top of the mountain, awe-inspiring, grandeur, clouds, a man is standing in the distance, alone in a wide sea of clouds

A vast expanse of grassland, two-dimensional, 16k, high resolution, sunrise, intricate play of light and shadows, serene moments captured

Landscape photography, a beach during a storm, calm waters and dark skies, 8k, high resolution, cyan, calm before the storm, Fujifilm Pro 800Z, beautiful and ominous

magazine photography, a forest, lights filtering through the trees, biophilic, peaceful and serene, atmospheric, cellulose, southeast asian flora

national geographic photograph of antarctica, vast glaciers, snowstorm, ominous beauty

Digital Art

I've always leaned towards Midjourney for artworks, and this set of examples is no exception. This AI model somehow manages to generate art that is not only creative but also precisely made. However, I do prefer some of DALL-E's creations, most notably the witch, the beach, and the RPG artworks.

DALL-E shows a surprising amount of creativity, but it still can't generate copyrighted characters. For example, when I asked it to make a Mickey Mouse portrait in the style of Dragon Ball Z, I think it tried to generate a weird image of what's supposed to be Steamboat Willie and Bugs Bunny.

a witch in a worned-out green dress releasing tremendous amounts of energy, dark fantasy illustration, lithography, 1980s illustration, gothic dark and macabre, larry elmore, lovecraftian

mickey mouse in a dvd screen grab of Dragon Ball Z, drawn by Akira Toriyama, animated by Toei animation studio, 1985 Japanese anime

vector art of a beach at twilight, with the sky painted in deep purples and blues, reflecting on the calm waters and creating a serene and peaceful scene. cinematic, wide-angle lens

a young woman watching tv on a house full of flowers, natural light coming from the window, cell shaded anime style, studio ghibli, makoto shinkai

miami beach with overcast skies, pixel art, 16-bit, calm before the storm, snes, game design, palm trees and their shadows are accurately portrayed

a surreal collage, pure ecstasy, happiness, organized chaos

a 1978 sci-fi magazine cover depicting an illustration of neil armstrong's first steps on the moon

midcentury modern artwork, soft colors, a greek goddess stepping foot on new york city, detailed oil painting

1950's optical illusion, a corridor to purgatory, glitchy and trippy, psychedelia, minimal, rené magritte, edward hopper, vivid colors

a colorful city in the middle of a quiet forest, rpg, realistic cartoon style, black line on the edge, ultra detailed, takao ogawa, toei animation

a God in complete and utter defeat, linocut print, silver hair, eyes containing the universe, distraught face, spiraling into madness, shigeo fukuda, surreal interstellar background, cosmos

a honda civic cruising at midnight, synthwave, magical realism, red and blue

Architecture and Interior Design

Word for word, both AI image generators successfully followed every instruction I gave them. However, DALL-E still has a weird, soft filter that it applies to some images, which makes realistic generations look like they're... AI-generated.

a modern interpretation of ancient greek temples, commercial photography, luxury architecture, Greek aesthetics infused with a contemporary twist, meticulous and opulent

architecture photography, a house, art nouveau style, diverse but muted colors, post-impressionism, nature and artistic expression

exterior shot, old bar, baroque architecture, biophilic, cozy, warmth, historic charm with a natural touch

interior of a rustic reading nook with exposed wooden beams, fine details, super wide angle, chic, bohemian

interior shot of a bathroom, luxury high end, beige colors, penthouse suite, architecture digest photography, sophisticated style

a living room, disco decor, 1970s interior design, bauhaus, vivid colors

Various Text Generation Examples

DALL-E 3's outputs are perfect this round. Midjourney, on the other hand, still suffers from word repetition, as seen in the red car image below. This is something that I've noticed with V6, and it shows a lack of understanding of what the words actually mean.

For a more in-depth comparison of DALL-E and Midjourney for text generation, you can read this article.

magazine photography, a teacher instructing her kindergarten class, behind her is a blackboard with the text "A is for Apple"

a logo of a bonsai tree, in the style of paul rand, the text "Biomes" must be below the logo

a 24/7 convenience store with the name "Always Open"

an old red Toyota whose license plate spells out "MCQUEEN"

Final Thoughts

It might be a little anticlimactic, but I have to give this comparison a tie. If I had used ChatGPT instead of Bing Create, this would have been a narrow victory for DALL-E 3.

We're now at a point where AI image generators are only one or two versions away from completely understanding your every instruction. In their current state, they only skip one or two words per prompt, which is already a significant leap from where they were a year ago.

For now, you'll have to settle for a tie, but that's not necessarily a bad thing. It only means you have two choices for AI art. So, choose wisely and enjoy the creative possibilities that both DALL-E 3 and Midjourney offer. Just go with whichever one fits your style best.


Written by John Angelo Yap

Hi, I'm Angelo. I'm currently an undergraduate student studying Software Engineering. Now, you might be wondering, what is a computer science student doing writing for Gold Penguin? I took up studying computer science because it was practical and because I was good at it. But, if I had the chance, I'd be writing for a career. Building worlds and adjectivizing nouns for no reason other than that they sound good. And that's why I'm here.
