How Midjourney Evolved Over Time (Comparing V1 to V6.1 Outputs)
Today, Midjourney is one of the best AI image generators anyone has access to. But it hasn't always been that way. Here's a comparison of how Midjourney evolved from two years ago to now.
John Angelo Yap
Updated September 23, 2024
Different versions of the same AI, generated with Midjourney
It's hard to believe that, just two years ago, AI was mostly treated as science fiction.
It wasn't until November of 2022 that ChatGPT became publicly available. DALL-E was only accessible to a select few. DeepMind and OpenAI were among the few companies widely known for investing heavily in deep learning.
One of the earliest mainstream AI products was released early that year: Midjourney. It now has millions of daily users worldwide. With its latest model, we're witnessing how awesome and terrifying (thanks to deepfakes) AI art can be, both in 2024 and for years to come.
But it hasn't always been that way.
Midjourney had a challenging start, to say the least. Now, enough time has passed that we can look back at its improvements over the last 25 months. Here is what Midjourney looked like over two years ago, compared to where it is today:
Midjourney's Evolution Through Images
People who were late to the game never experienced the rough beginnings of Midjourney. There was a time when people questioned whether AI image generation was really worth pursuing because of poor results from both DALL-E and Midjourney. Here are some reminders of how far we've come since then:
Portraits - Day
high quality photography of a young Japanese woman smiling, backlighting, natural pale light, film camera, by Rinko Kawauchi, HDR
There's not much difference between V1, V2, and V3. The images produced by these models are a complete mess, but they're a product of their time. It was a period when the only accessible AI image models were the first iteration of DALL-E (which was better received by critics) and some early attempts at generating realistic images from a dataset, such as ThisPersonDoesNotExist.
V4 was Midjourney's real turning point. It got rid of the jigsaw-like faces and replaced them with a closer approximation of what a human face should look like. However, it still had issues with overemphasis. For example, when I specified that I wanted a Japanese woman as my subject, V4's first instinct was to go overboard with monolid eyes (the eyes in all the variations look like the ones depicted above).
V5 is ten times better than V4. My only issue with it, as I've mentioned in my previous articles, is that it tends to create flawlessly smooth faces, which are a dead giveaway that an image is AI-generated. V6 solved this issue by creating more realistic facial features and an asymmetrical structure.
And what V6 solved, V6.1 perfected. There’s nothing left in the photo to indicate that it’s AI-generated. I’m quite curious to see where Midjourney’s going after this, but one thing’s for sure: V6.1 is a game-changer when it comes to realism.
Portraits - Night
portrait, a beautiful young woman, glamour street medium format photography, feminine, shot on cinealta, night, pastel hues
Everything that I've already said above applies to this set of photos as well. A lack of logical structure characterizes V1 to V3, but you can still tell what the model is trying to make. V4 is the actualization of those concepts: it creates coherent and more realistic portraits, although they're a little uncanny.
V5, again, is where it starts to get better, but the subject is still too perfect. V6's subject and background details are a lot more subtle, which makes for better realism while increasing its creativity.
And what else can I say about V6.1 that I haven’t said earlier? It’s pretty indistinguishable from a real image.
Landscape
landscape, an autumn in the lake during dusk, tranquility
V1 is actually a little amusing, since you can clearly see a Shutterstock logo in the bottom-left corner, which hints at where some of the early training data came from and how much the team's dataset pre-processing has been refined since. V2 and V3 are a lot more coherent here than in the earlier comparisons, but they still couldn't generate HD images. The reflections on the water are also inconsistent.
V4 is more creative, but it still has some issues with nuance, as seen in the trees submerged in the lake. V5 perfected reflections but still hadn't resolved its realism issues. And then we have V6, which accurately emulates real photography by adding little details such as small waves and natural sky gradients.
V6.1 is a bit more… polished, for lack of a better term. It's better, yes, but I do think that there's a hint of an "AI" uncanny valley going on. Though, to be fair, this is me being a bit too nitpicky with Midjourney's latest model.
Food Photography
a photorealistic cheeseburger, white clean background, commercial photography
If I were to describe V1 to V3's images in a sentence, I'd say they're what aliens must assume a cheeseburger looks like. V1 and V2's burgers, in particular, don't even have patties, only onions and a huge block of cheese.
Then V4 creates an almost perfect burger, but the proportions seem a bit off and it appears to have a texture resembling Play-Doh. If I were to nitpick V5's output, I'd say there are a few sesame seeds at the bottom when there shouldn't be.
If you're looking for a photorealistic cheeseburger, V6 won't disappoint you.
V6.1 bothers me a bit. It has all the right ingredients, but in the wrong order. It's a good enough image, but who the heck puts lettuce at the bottom and then cheese?
Product Photography
commercial photography, a women's necklace with a sunflower pendant, minimal background, natural light
If there's anything that the earlier versions of Midjourney lack, it's structure. In the images above, it's clear that they don't see shapes the way we do, and that issue doesn't get resolved until V4.
In this case, I'm happy with V4, V5, and V6's outputs. They're all good product mockups in their own right, even if they had different interpretations of my prompt. But if I were to pick a standout?
V6.1 is simply too good at restraint. It's just a good, simple, and minimal product mockup of a sunflower pendant.
Pixel Art
pixel art scene, the eiffel tower at midnight, city lights, romantic
This might be controversial, but I think V4 has the best pixel art here. The size of the "pixels" is more consistent, and the art style reminds me a lot of earlier 8-bit games. That said, I still prefer V5 and V6's outputs visually. The only thing weighing them down is the inconsistency of pixel sizes, which is more apparent in V5's output if you zoom in.
So, let’s talk about V6.1’s output. This should’ve been the best one, considering the rate Midjourney’s been improving. But it’s just…not good. Pixels bleed into each other and they’re too inconsistent. This is a poor example of pixel art.
Animation
anime movie still, studio ghibli, a woman going to the beach alone
It occurs to me that prompt comprehension isn't a big issue with the earlier versions of Midjourney, at least for simple prompts. Of course, they're still unpolished, but you can see that they've managed to understand "how" to create what I'm asking for; they just didn't have the tools to make it.
V4 is a huge step up, but it's still a little too low-resolution. As for V5, there's no beach in the world where its waves physically make sense, and it doesn't resemble Studio Ghibli artwork. V6 and V6.1 manage to capture the hand-drawn realism of Studio Ghibli anime films while creating a pretty darn good animation still.
Text Generation
night photography, a neon sign outside a restaurant saying "Dinner is served"
Something weird that I noticed in this comparison is how close V2 and V3 are to writing "Dinner is served," which suggests that Midjourney must've pulled its focus away from text generation when it rolled out V4 and V5.
I've already said this in my other V6 articles, but Midjourney is one of the best AI image models when it comes to text, and V6's output above proves that point further. V6.1 isn't exactly a step up, but it's still pretty good.
Multiple Subjects [High Context]
a rabbit, a porcupine, two cats, and a wizard having a tea party:: 90s animated tv series
None of these images nailed the prompt at all, but V6 is the closest one. It has two rabbits (instead of one), a cat (who also happens to be a wizard), and some sort of cat-porcupine hybrid. Midjourney is still far from DALL-E 3's nuance, but it's getting there.
Some Observations
After going through all these images, I've come to the conclusion that the Midjourney team must have focused on a few specific aspects with each upgrade after V3. To be more specific:
- V4: Prompt cohesion and output structure. Figuring out how to put shapes and ideas together to create a coherent image.
- V5: Overall creativity. Once they had figured out how to create coherent images, they improved how creative the generator could be.
- V6: This is one of their biggest updates so far, with significant improvements in realism, text generation, and prompt understanding.
- V6.1: Just a small upgrade from V6, if any. It’s better at realism than the base model, but it somehow got worse at certain aspects. That said, this model’s just a stepping stone to V6.2, which is expected to drop in late 2024 as seen in this announcement.
The Bottom Line
Through these images, we can clearly see how Midjourney has improved over the last two years. It's not only better than most AI image generators, but it can also genuinely create art better than many people can.
Midjourney V6's realism, creativity, and speed of improvement are both fascinating and frightening. V6.1 continues this trend, but falls short in certain aspects. For us hobbyists and reviewers, it's a cool product for creating artwork. For artists and the world in general, it has the potential to erase jobs and fuel fake news through deepfakes.
But that's not for at least a couple of years. For now, let's just enjoy what Midjourney has to offer and keep our eyes peeled for V6.2. Have fun prompting!