DALL-E. Meta. Firefly. Stable Diffusion. Like it or not, the AI image generation market is oversaturated. However, there has always been one standout.
To me, it's obvious that Midjourney has been the best AI image generator in the business. However, I do recognize that it still has some flaws, particularly with realistic images, long prompts, and text inside images.
That's why I've been patiently waiting for Midjourney V6, and last night, it finally came. I quickly hopped on Discord and started generating as many images as I could. Here's a quick spoiler: it was worth the wait.
Here are some of the best images I created using Midjourney V6, alongside the output of the same prompt in Midjourney V5.2:
Midjourney V5.2 and V6 Output Comparison
It’s been a little over 24 hours since Midjourney v6 came out and let me tell you: the hype is real. This has been, by far, my favorite image generator. It somehow fixed every single one of my problems with the previous version. Here are some of my favorite examples:
a woman lying in bed with her eyes closed, golden hour, closeup
My biggest gripe with Midjourney was that it couldn't really generate realistic images on par with DALL-E or Meta. The release of V6 seems to have solved that problem. Its realism is on a whole different level now. No more waxy faces and exaggerated features. V6's output is so good that, even if you zoom in, you can see the imperfections that make us human. This is an immense improvement.
landscape, an autumn in the lake during dusk, tranquility
Don't get me wrong: V5.2's image is pretty good, but it's not exactly the look I'm going for. I'm looking for realistic lake images, something that V6 was able to give me. This upgraded version can create authentic-looking images without sacrificing artistic quality. It's way better than DALL-E 3 on this front, in my opinion.
product photography, a perfume, studio lighting, shadow play, jasmine, soft
I'll admit: I'm not too sure about this one. The key difference is that the product images I'm getting from V5.2 look processed and market-ready, whereas V6's look more raw, like they were taken straight out of a camera. It may have something to do with the phrasing of my prompts: I've gotten used to cluttered V5 prompts, a habit I need to unlearn now that V6 reads prompts with more nuance.
I will say this though: if you're a seasoned editor looking for detailed, well-shot raw images, V6 is a lot better than V5.2.
Movie Stills (Animated)
animated movie still, a young girl following a magical cat to a tree,
inspired by hayao miyazaki, whimsical, magical realism, clean lines, detailed, 8k
This is a great time to talk about nuance. In my prompt, I specifically requested a film still that looks like Hayao Miyazaki's work. V5.2's output didn't follow this at all, instead going for a generic 3D DreamWorks style of animation. On the other hand, V6 followed this instruction to a tee. It looks straight out of Howl's Moving Castle.
I also highly suggest you zoom in and look at the details in V6's output. The still is so much more vivid and full of life. It's genuinely mind-blowing how much Midjourney has improved over the last couple of months.
Movie Stills (Live Action)
film still, back shot of a man in a green jacket, symmetrical,
muted colors, directed by wes anderson
Midjourney V5 definitely had a problem with oversimplifying or overcomplicating prompts, especially ones with lots of context. Look at the example above: I kept it minimal, but V5 still wasn't able to be creative with the prompt it was given. V6 solves this problem by filling in the gaps of my prompt while retaining its original intent.
PS. Yes, I know. The guy is missing his right ear but hey, it's V6's first week!
logo for a shoe company, clean background, paul rand
I never really had any issue generating logos with V5.2 but, after seeing these images side by side, I can tell in hindsight that there was room for improvement. V6's output retains the minimalism of V5.2 while adding its own spin to the illustrations, which gives them more identity.
the planets in the galaxy as hatching eggs of lovecraftian entities,
surrealism, cosmic, lovecraftian, ethereal, celestial bodies
I've always praised surrealist images as one of Midjourney's strong points. However, it has a tendency to overpopulate its outputs with so many subjects that you sometimes can't figure out what's going on, something you can see above.
V6, with its improved nuance, manages to strike a balance between fulfilling the prompt and being creative. You can now clearly see what it's trying to portray, even with little to no information about the subject.
a restaurant in a quiet chic neighborhood with a neon sign that says "Closed",
One of Midjourney's biggest promises before V6 came out was that it would fix its text generation, which is a huge problem across all AI image generators. The only one I've tried that's decent on that end is DALL-E 3, but it looks like Midjourney's next in line.
It perfectly wrote "Closed" in the V6 image, even adding its own flair. As for V5.2, well, unless you've got a restaurant called "CORSTARB," I don't think it's cut out for text generation.
However, it's still not perfect, as you can see here:
comic panel, panicked captain america yelling "Get out of here", speech bubbles, gritty
This shows that Midjourney still doesn't fully handle lettering, since the image is missing a word from my prompt. In my opinion, it works best with one- or two-word text only. But hey, it's miles better than its competitors. Even DALL-E 3 isn't this good.
A detailed oil painting of an old sea captain, steering his ship through a storm.
Saltwater is splashing against his weathered face, determination in his eyes.
Twirling malevolent clouds are seen above and stern waves threaten to submerge
the ship while seagulls dive and twirl through the chaotic landscape.
Thunder and lights embark in the distance,
illuminating the scene with an eerie green glow
Just a heads-up: I borrowed this prompt from OpenAI's DALL-E 3 page. It's sometimes hard to think of elements to add to a prompt, and since OpenAI used this one to test DALL-E's nuance, I figured I could run it through V5.2 and V6 and compare.
V5.2 actually did a pretty good job, but it still missed a couple of elements like the eerie green glow, the seagulls, and the thunder. V6 followed everything except the flock of seagulls, but there's still one solitary seagull in the background, so this one passes the smell test.
So, Did It Improve?
It did improve, by a lot.
I can't show you every test I've done yet (I'm reserving some for my next article), but V6 is already a hundred times better than V5.2 in my book. It largely solved the text generation and nuance issues while simultaneously improving its creativity. Every image I've created so far with V6 is crisp, detailed, and accurate.
What else is there to ask for?
The Bottom Line
When V5 came out, some said that it was a backward step from V4.
Gradually, the team listened to the community and improved its creativity, even adding some features in the process. The result was Midjourney V5.2, which was already my favorite AI image generator on the market.
Midjourney V6 is a significant improvement on V5.2. It took everything that was already good about V5.2 and reworked the model to create more detailed and accurate images. Everything I've complained about with V5.2 (nuance, text, realism) has been fixed, and then some.
The best thing is that we can only expect it to get better from here on out. The Midjourney team is already crowdsourcing user feedback on images through A/B testing to improve its model.
Mark my words: Midjourney V6 is a turning point in AI image generation.