
How Midjourney Evolved Over Time (Comparing V1 to V6 Outputs)

Today, Midjourney is one of the best AI image generators anyone has access to. But it hasn't always been that way. Here's a comparison of how Midjourney evolved from roughly two years ago to now.
Updated January 8, 2024
Midjourney v1 to v6 same prompt with a different output

It's hard to believe that just two years ago, AI was mostly treated as science fiction.

It wasn't until November of 2022 that ChatGPT became publicly available. DALL-E was only accessible to a select few. DeepMind and OpenAI were the most visible of the handful of companies investing heavily in deep learning.

One of the earliest mainstream AI products was released earlier that year: Midjourney. It now has millions of daily users worldwide. With its latest model, we're witnessing how awesome and terrifying AI art can be for the future.

But it hasn't always been that way.

Midjourney had a challenging start, to say the least. Now, enough time has passed that we can look back at its improvements over the last 23 months. Here is what Midjourney looked like two years ago, compared to where it is today:

Midjourney's Evolution Through Images

People who came late to the game never experienced the rough beginnings of Midjourney. There was a time when people questioned whether AI image generation was really worth pursuing, given the poor results from both DALL-E and Midjourney. Here are some reminders of how far we've come since then:

Portraits - Day

high quality photography of a young Japanese woman smiling, backlighting, natural pale light, film camera, by Rinko Kawauchi, HDR

There's not much difference between V1, V2, and V3. The images produced by these models are a complete mess, but they're a product of their time. Back then, the only accessible AI image models were the first iteration of DALL-E (which was better received by critics) and some early attempts at generating realistic faces, such as ThisPersonDoesNotExist.

V4 was Midjourney's real turning point. It got rid of the jigsaw-like faces and replaced them with a closer approximation of how a human face should look. However, it still had issues with overemphasis. For example, when I specified that I wanted a Japanese woman as my subject, V4's first instinct was to go overboard with monolid eyes (all the variations' eyes look like the one depicted above).

V5 is ten times better than V4. My only issue with it, as I've mentioned in my previous articles, is that it tends to create flawlessly smooth faces, which is a dead giveaway that an image is AI-generated. V6 solved this issue by creating more realistic facial features and an asymmetrical structure.

Portraits - Night

portrait, a beautiful young woman, glamour street medium format photography, feminine, shot on cinealta, night, pastel hues

Everything that I've already said above applies to this set of photos as well. A lack of logical structure characterizes V1 to V3, but you can still determine what the model is trying to make. V4 is the actualization of those concepts: it creates coherent and more realistic portraits, although they're a little uncanny.

V5, again, is where it starts to get better, but the subject is still too perfect. V6's subject and background details are a lot more subtle, which makes for better realism while increasing its creativity.

Landscape

landscape, an autumn in the lake during dusk, tranquility

V1 is actually a little amusing since you can clearly see a Shutterstock logo in the bottom-left corner. It hints at where the Midjourney team initially sourced some of its training data, and its absence in later versions suggests they refined their dataset pre-processing. V2 and V3 are a lot more coherent here than their counterparts, but they still can't generate HD images. The reflections on the water are also inconsistent.

V4 is more creative, but it still has some nuance issues, as seen in the trees submerged in the lake. V5 perfected reflections but still hasn't resolved its realism issues. And then we have V6, which accurately emulates real photography by adding little details such as small waves and natural sky gradients.

Food Photography

a photorealistic cheeseburger, white clean background, commercial photography

If I were to describe V1 to V3's images in a sentence, I'd say they're what aliens must assume a cheeseburger looks like. V1 and V2's burgers, in particular, don't even have patties, only onions and a huge block of cheese.

Then V4 creates an almost perfect burger, but the proportions seem a bit off and it appears to have a texture resembling Play-Doh. If I were to nitpick V5's output, I'd say there are a few sesame seeds at the bottom when there shouldn't be.

If you're looking for a photorealistic cheeseburger, V6 won't disappoint you.

Product Photography

commercial photography, a women's necklace with a sunflower pendant, minimal background, natural light

If there's anything that the earlier versions of Midjourney lack, it's structure. In the images above, it's clear that they don't see shapes the way we do, and that issue doesn't get resolved until V4.

In this case, I'm happy with V4, V5, and V6's outputs. They're all good product mockups in their own right, even if they had different interpretations of my prompt.

Pixel Art

pixel art scene, the eiffel tower at midnight, city lights, romantic

This might be controversial, but I think V4 has the best pixel art here. The size of the "pixels" is more consistent and the art style reminds me a lot of earlier 8-bit games. That said, I still prefer V5 and V6's outputs visually. The only thing weighing them down is the inconsistency of pixel sizes, which is more apparent in the former's output if you zoom in.

Anime

anime movie still, studio ghibli, a woman going to the beach alone

It occurs to me that prompt comprehension isn't a big issue with the earlier versions of Midjourney, at least for simple prompts. Of course, they're still unpolished, but you can see that they understood "how" to create what I'm asking for. They just didn't have the tools to make it.

V4 is a huge step up, but it's still low-resolution. As for V5, there's no beach in the world where its waves physically make sense, and it doesn't resemble Studio Ghibli artwork. V6 manages to capture the hand-drawn realism of Studio Ghibli anime films while creating a pretty darn good animation still.

Text Generation

night photography, a neon sign outside a restaurant saying "Dinner is served"

Something weird that I noticed in this comparison is how close V2 and V3 are to writing "Dinner is served," which suggests that Midjourney must've pulled its focus away from text generation when it rolled out V4 and V5.

I've already said this in my other V6 articles, but Midjourney is one of the best AI image models when it comes to text, and its output above proves that point further.

Multiple Subjects [High Context]

a rabbit, a porcupine, two cats, and a wizard having a tea party:: 90s animated tv series

None of these images nailed the prompt at all, but V6 is the closest one. It has two rabbits (instead of one), a cat (who also happens to be a wizard), and some sort of cat-porcupine hybrid. Midjourney is still far from DALL-E 3's nuance, but it's getting there.

Some Observations

After going through all these images, I've come to the conclusion that every Midjourney upgrade after V3 must have focused on a few aspects. To be more specific:

  • V4: Prompt cohesion and output structure. Figuring out how to put shapes and ideas together to create a coherent image.
  • V5: Overall creativity. With coherent images figured out, the focus shifted to making the generator more creative.
  • V6: This is one of their biggest updates so far, with significant improvements on realism, text generation, and understanding.

The Bottom Line

Through these images, we can clearly see how Midjourney has improved over the last two years. It's not only better than most AI image generators, but it can also genuinely create art better than many people can.

Midjourney V6's realism, creativity, and speed of improvement are both fascinating and frightening. For us hobbyists and reviewers, it's a cool product for creating artwork. For artists and the world in general, it has the potential to erase jobs and fuel fake news through deepfakes.

But that's not for at least a couple of years. For now, let's just enjoy what Midjourney has to offer. Have fun prompting!

Written by John Angelo Yap
Hi, I'm Angelo. I'm currently an undergraduate student studying Software Engineering. Now, you might be wondering, what is a computer science student doing writing for Gold Penguin? I took up studying computer science because it was practical and because I was good at it. But, if I had the chance, I'd be writing for a career. Building worlds and adjectivizing nouns for no reason other than that they sound good. And that's why I'm here.