Midjourney, DALL-E, Firefly, Meta, and Stable Diffusion: Same Prompts, Compared

With the release of Meta's AI image generator and Midjourney V6, the competition for the best AI image generator is becoming more fierce. So, let's see how Midjourney, Meta, DALL-E 3, Stable Diffusion XL, and Adobe Firefly 2 stack up against each other with the same prompt.
Updated January 2, 2024
A ship, fireflies, and three painters, generated with Midjourney

A few days ago, we had an early Christmas present from the Midjourney team with the sudden release of V6's base model, promising better prompt comprehension and text generation than its previous model. A week before that, Meta also dropped a new AI image generator, which I believe is the best free model right now.

So, it's that time of the year again. No, I'm not talking about the holiday season.

It's time for a major comparison between the market's most popular AI image generators: Midjourney, DALL-E, Firefly, Stable Diffusion, and Meta.

Which one will come out on top this time? Spoiler Alert: The answer may not surprise you.

The Ultimate Output Comparison

This is the biggest comparison we've ever made, so I'll use the same prompt for each image to maintain fairness. I'll also prominently display the ones I like the most, but don't worry: I'll label each picture to avoid confusion.

Realistic (Portraits)

close-up portrait of a weathered fisherman, wrinkles around his eyes,
salt-spray on his beard, hyperrealistic textures, cinematic lighting

Among the five image generators, only Midjourney and Meta managed to create images that would pass the smell test. Firefly's portrait is too waxy and the fisherman's beard looks fake. Stable Diffusion's doesn't look realistic at all; it looks more like an oil painting. DALL-E 3 could've been good, but it overemphasizes the wrinkles.

Look at the details in Midjourney's image. If you zoom in, you can see every strand of hair, the age lines, even the reflection in his eyes. It also has consistent lighting and depth of field. Meta is a close second, but it still has that "softened" effect, which is a trademark of AI image generators at this point.

Realistic (Landscape)

a rugged coastline eroded by relentless waves,
towering cliffs that's been sculpted into dramatic arches and hidden coves,
seabirds soar above, mist swirls along the horizon, realism

Once again, Midjourney wins this round. V6 really has been a game changer when it comes to realistic images. The images it outputs are still a little stylized and vivid, but they can now pass as real photos. However, if you're just looking for a landscape stock image, then Firefly might be the better option for you.

As for the other three: Stable Diffusion and Meta were actually pretty decent, but the cliffs look like a lump of smooth clay when zoomed in. DALL-E 3 opted to make digital art, which isn't what I'm looking for.

Realistic (Sports)

freeze the action as a pickleball player scores the
final point to win the world championship

Okay. There's a lot to unpack here.

Midjourney is the clear winner of this category. It perfectly captures the fast-growing sport of pickleball and the kinetic energy behind it. DALL-E 3 could've been good, but it suffers from repetition of certain elements.

Moving on to the bottom three. Adobe Firefly looks to be the best among them, but it's not an actual photograph, there's no paddle, and the player only has three fingers. As for Stable Diffusion, the player isn't using the proper equipment, he's breaking through the net, and his face is melting. Literally.

This Meta image though. It's freaking hilarious. No further comment.


Fashion Photography

a stylish man, in the style of orange and green, plants,
postmodern photography, shadow play, elegant figures,
art nouveau fashion

Midjourney looks the most like real fashion photography, so it deserves first place for me. The only problem I have with it is that the shadows obscure parts of the outfit, which should be the focus in the first place. Meta created the best top, but it would've helped if we could see the entire outfit.

DALL-E 3 is so good, but the subject's shadow bothers me too much. Stable Diffusion has good photography, but a rendering issue caused the fingers to bleed into the outfit. Adobe Firefly is so realistic, but it didn't follow my instructions for art nouveau or elegance. It would've ranked much higher if the prompt had been for casual fashion.

Architecture & Interior Design

a realistic dorm room, interior design,
golden hour, noisy, urban, atmospheric

In terms of realism, only Midjourney and Meta passed this interior design test. I actually prefer Meta here because it looks like an actual dorm room. Sure, there are still some mistakes, primarily in the computer screen on the left, but it's unnoticeable from afar. Midjourney's output is good too, but its nuance feels off since that isn't a practical dorm room design.

3D Product Renders

commercial photography, a perfume bottle,
pastel blue background, dreamy, soft lighting, centered, flowers

I'm actually impressed because all of these turned out to be good. However, Midjourney V6 continues to be in a league of its own with another beautiful entry. It's dreamy, well-shot, and has great contrast. Meta is, once again, a close second. The only letdown is the bad text generation.

Character Design

character design, a human battlemage, forest imagery, inspired by high fantasy

DALL-E 3's artwork in this round is so impressive. It's an amazing template if you're looking for a wise NPC for your next D&D session. Stable Diffusion makes more sense if you're brainstorming a character for yourself or a hero in a game.

Midjourney could've been good, but the decision not to display the subject's face makes absolutely no sense for character designs. Firefly is a little too mainstream for my taste and it looks like an NPC from one of those old Adobe Flash games. Meta also made a great design, but I'd argue that it isn't a battlemage.

Digital Art

pixel art scene, a quiet and empty supermarket at night,
atmospheric, 16-bit

This is a matter of personal preference, but upon careful assessment, I prefer DALL-E's and Stable Diffusion's versions of this prompt because they perfectly captured the "atmospheric" vibe I was looking for. This is also the first time that Midjourney comes in second for me, mostly because the "pixel art" illusion goes away when you zoom in.

Midjourney has a good entry, but the pixels are so fine that I don't think it qualifies as pixel art anymore. Firefly didn't crack the top two because it generated food market stalls inside a grocery store, which shows that it lacks nuance. Meta is, by far, the worst at pixel art, failing at both contextual understanding and pixel art imitation.

Logo Design

a logo for a barbershop, by paul rand, clean background, minimalist

This is a win for Midjourney. Everyone else went for a generic logo, but Midjourney did something new by taking a barber's pole and turning the colors into something that resembles brush strokes. It's so simple yet so effective and unique. Apart from completely fulfilling a long prompt, this is probably the best case for Midjourney's improved nuance.

DALL-E 3 also deserves a mention here because it managed to create a well-designed logo, albeit a common one. The biggest problem I have with it, though, is that it created two different logos when I asked for only one.

Text Generation

a comic panel of a distraught Tony Stark saying "Captain is dead."

It should come as no surprise that DALL-E 3 is in our Top 2 this round, but for the first time since I started comparing AI image generators, I don't find it the best for text generation. But let's start with Stable Diffusion, Meta, and Firefly, none of which could write legible text. Oh, and I don't think Firefly knows who Tony Stark is.

When Midjourney V6 came out, the team put an emphasis on its text generation improvements, and it really shows. Look at the accuracy of that text. That's not even edited. I said it earlier in my V5 vs. V6 comparison, but Midjourney really is the best at text now.

Now, let's go to DALL-E 3. It may not be as good as V6 but it's almost there. Almost. It certainly didn't help that Tony Stark is shouting "Captan's dead" while Captain America is behind him.

High Context

A middle-aged woman of Asian descent, her dark hair streaked with silver, appears fractured and splintered, intricately embedded within a sea of broken porcelain. The porcelain glistens with splatter paint patterns in a harmonious blend of glossy and matte blues, greens, oranges, and reds, capturing her dance in a surreal juxtaposition of movement and stillness. Her skin tone, a light hue like the porcelain, adds an almost mystical quality to her form.

This one's actually impressive. If we're only talking about comprehension, then all of these images passed this test. So, we have to factor in which one fulfilled it the best.

I took this prompt from DALL-E 3's announcement page so there's no question that their output is the best. From there, it's tough to rank the others 1 to 4.

Stable Diffusion and Midjourney had the best-looking outputs, but the tearing doesn't look like "broken porcelain" to me; it looks more like crumbling wallpaper. Firefly was almost perfect, but it missed the "splatter paint patterns." Meanwhile, Meta fulfilled every aspect of the prompt, but it generated a subpar image, in my opinion.

So, What Are They Good At?

Midjourney V6

Best for: Midjourney V6 is an amazing improvement over V5.2, fixing every problem that its previous generation had. In my opinion, it's now the best for both realistic and digital art, as well as text generation. It's also the best at mimicking certain art styles, which other AI image generators can't do due to policies and guidelines.

Worst for: Midjourney may be the best at text generation, but it still has trouble generating long texts. The learning curve for prompts is also much steeper with the release of V6.

DALL-E 3

Best for: DALL-E is still the best for prompt comprehension and a great alternative to Midjourney for generating text. It's also the best at creating pixel art.

Worst for: DALL-E could use some work in generating realistic images, especially ones with people.

Meta

Best for: Meta does realistic images really well, especially portraits and landscape photos. It's also the best free AI image generator on the market.

Worst for: Meta still can't do text generation reliably. In all my testing, I've also found that it struggles a lot with pixel art.

Adobe Firefly 2

Best for: Firefly is best used by digital artists who use the Adobe suite for editing.

Worst for: Like most generators, Firefly still can't generate text. It also struggles with creating artwork based on existing characters.

Stable Diffusion XL

Best for: Stable Diffusion is a good AI image generator if you're looking for one that can fulfill long prompts for free.

Worst for: Stable Diffusion can't generate realistic portraits without overemphasizing certain features.

Final Thoughts

With the release of Midjourney V6, it's getting harder and harder to make a case for other AI image generators. The base model is in a league of its own, and it's only going to get better when it's officially released, especially since the team is using user feedback to improve the model.

However, if you're just a casual user, Meta is a good alternative since it's free. If you're looking for a model with amazing comprehension, DALL-E (with ChatGPT) is still the best one on the market. Adobe Firefly should also be a great alternative if you're using it with other Adobe products and introducing streamlined inpainting to your workflow. Lastly, Stable Diffusion can also improve a lot with LoRAs and other extensions, which I'm planning to implement in future Stable Diffusion reviews.

The fact of the matter is that there's a lot to love about every AI image generator out there, but V6 is a real turning point for AI art. The only question is, where do they go from here?

Written by John Angelo Yap
Hi, I'm Angelo. I'm currently an undergraduate student studying Software Engineering. Now, you might be wondering, what is a computer science student doing writing for Gold Penguin? I took up studying computer science because it was practical and because I was good at it. But, if I had the chance, I'd be writing for a career. Building worlds and adjectivizing nouns for no reason other than that they sound good. And that's why I'm here.
