Sora vs. DALL-E 3 Prompt Comparison: Two OpenAI Products, One Winner
Sora is OpenAI's attempt to conquer a new frontier in the AI space: text-to-video. Let's compare its nuance and creativity to one of OpenAI's other models: DALL-E 3.
John Angelo Yap
Updated September 17, 2024
A green robot with a video camera, generated with Midjourney
Reading Time: 8 minutes
I've been hearing about text-to-video for a while now, and I haven't really given it a second thought because I was frankly unimpressed with what I've been seeing online. Clear rendering issues, chaotic movement, unblended motion blurring, and subjects that veer too closely to the uncanny valley.
It was never “good enough.”
I always told myself I'd give it a try once those issues were fixed. But as the months passed, I'd check in on the latest news in that space and remain uninterested.
That was until February 2024, when OpenAI shocked the world once again by revealing a project they'd kept under tight wraps for years: Sora.
Now, like most people, I couldn't try it yet. In fact, it's now September 2024 and they're still only deploying it to red teamers. So I did the next best thing: compare its showcased outputs against those of OpenAI's own AI image generator, DALL-E 3. In this article, I'll show you their differences and compare them without bias.
What is Sora?
Similar to DALL-E 3, Sora is another one of OpenAI's attempts to conquer the AI space. It's a diffusion model for text-to-video generation, whereas DALL-E 3 is text-to-image only. Unfortunately, as of September 2024, it's not yet available to the masses, but we should expect a public beta sooner or later (at the rate they're going, probably the latter).
From what I've seen online, Sora seems to be more creative and realistic than DALL-E 3. As for their similarities, Sora is also built on a transformer architecture and, like DALL-E 3, uses a GPT-powered "recaptioning" step that expands user prompts into more detailed descriptions before generation.
What's more, beyond text-to-video, it can also take pre-existing videos as input and fill in the blanks or extend them. This opens up more possibilities for the tool, as Sora could presumably be used to create B-roll footage for films in the future.
Sora vs. DALL-E 3: Output Comparison
Since I can't tweak DALL-E's aspect ratio in Bing Create, I have no choice but to compare 1:1 images to 16:9 (or wider) videos. It shouldn't change much, though: we're only comparing creativity and nuance, and it would be unfair anyway to judge an older model with a different use case against a new one like Sora on technical grounds.
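As an aside, the aspect-ratio limitation is specific to Bing Create: if you have API access, OpenAI's own Images API lets you request DALL-E 3 outputs in widescreen sizes. Here's a minimal sketch, assuming the `openai` Python package and an `OPENAI_API_KEY` environment variable; the coral reef prompt is taken from the comparison below.

```python
def dalle3_size(aspect: str) -> str:
    """Map a rough aspect ratio to a size string the DALL-E 3 API accepts."""
    sizes = {
        "1:1": "1024x1024",    # square (what Bing Create gives you)
        "16:9": "1792x1024",   # widescreen, closer to Sora's framing
        "9:16": "1024x1792",   # vertical
    }
    return sizes[aspect]

if __name__ == "__main__":
    # Requires `pip install openai` and OPENAI_API_KEY in the environment.
    from openai import OpenAI

    client = OpenAI()  # reads the API key from the environment
    result = client.images.generate(
        model="dall-e-3",
        prompt=(
            "A gorgeously rendered papercraft world of a coral reef, "
            "rife with colorful fish and sea creatures."
        ),
        size=dalle3_size("16:9"),
        n=1,  # DALL-E 3 only supports one image per request
    )
    print(result.data[0].url)
```

This doesn't change the comparison in this article, since Bing Create was the tool at hand, but it's worth knowing the square format is a front-end constraint rather than a model one.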
The Coral Reef
Prompt: A gorgeously rendered papercraft world of a coral reef, rife with colorful fish and sea creatures.
The Man on the Clouds
Prompt: A young man at his 20s is sitting on a piece of cloud in the sky, reading a book.
The Zen Garden
Prompt: A close up view of a glass sphere that has a zen garden within it. There is a small dwarf in the sphere who is raking the zen garden and creating patterns in the sand.
Bamboo in a Petri Dish
Prompt: A petri dish with a bamboo forest growing within it that has tiny red pandas running around.
The Fluffy Creature
Prompt: 3D animation of a small, round, fluffy creature with big, expressive eyes explores a vibrant, enchanted forest. The creature, a whimsical blend of a rabbit and a squirrel, has soft blue fur and a bushy, striped tail. It hops along a sparkling stream, its eyes wide with wonder. The forest is alive with magical elements: flowers that glow and change colors, trees with leaves in shades of purple and silver, and small floating lights that resemble fireflies. The creature stops to interact playfully with a group of tiny, fairy-like beings dancing around a mushroom ring. The creature looks up in awe at a large, glowing tree that seems to be the heart of the forest.
The Church
Prompt: A drone camera circles around a beautiful historic church built on a rocky outcropping along the Amalfi Coast, the view showcases historic and magnificent architectural details and tiered pathways and patios, waves are seen crashing against the rocks below as the view overlooks the horizon of the coastal waters and hilly landscapes of the Amalfi Coast Italy, several distant people are seen walking and enjoying vistas on patios of the dramatic ocean views, the warm glow of the afternoon sun creates a magical and romantic feeling to the scene, the view is stunning captured with beautiful photography.
Winter in Japan
Prompt: Beautiful, snowy Tokyo city is bustling. The camera moves through the bustling city street, following several people enjoying the beautiful snowy weather and shopping at nearby stalls. Gorgeous sakura petals are flying through the wind along with snowflakes.
The Old, Wise Man
Prompt: An extreme close-up of an gray-haired man with a beard in his 60s, he is deep in thought pondering the history of the universe as he sits at a cafe in Paris, his eyes focus on people offscreen as they walk as he sits mostly motionless, he is dressed in a wool coat suit coat with a button-down shirt , he wears a brown beret and glasses and has a very professorial appearance, and the end he offers a subtle closed-mouth smile as if he found the answer to the mystery of life, the lighting is very cinematic with the golden light and the Parisian streets and city in the background, depth of field, cinematic 35mm film.
Atlantis in New York City
Prompt: New York City submerged like Atlantis. Fish, whales, sea turtles and sharks swim through the streets of New York.
The Cloud Monster
Prompt: A giant, towering cloud in the shape of a man looms over the earth. The cloud man shoots lighting bolts down to the earth.
Unfiltered Thoughts
Let's start with nuance. First, we have to recognize that there might be a bias here, since these prompts came from OpenAI themselves, meaning they likely picked the best outputs for their showcase.
However, Sora seems to have far better prompt accuracy than DALL-E 3.
For instance, DALL-E 3 — despite consistently being the best AI image generator for nuance — missed a couple of supporting details in these prompts. The image of the old man didn't have cinematic lighting, and the fluffy creature didn't have any fairies with him. There's also the fact that DALL-E is confused by real-world physics, as demonstrated by the weird-looking petri dish images it generated.
Also, from what I've seen online so far, it appears that Sora took everything good about DALL-E and made it better, then fixed everything bad. It's far more creative and generates more realistic people. Look at the "Man on the Clouds" comparison and focus on the subject. Sora's output isn't as smooth and waxy as DALL-E's.
And it's not limited to portraits either. Scroll up and compare their "Winter in Japan" outputs. Notice how Sora is more realistic and less dreamy? It makes for a more accurate atmosphere. Truth be told, I'm not convinced that OpenAI didn't hire someone to take these videos and package them as "AI."
I kid, but to be honest, Sora is no laughing matter. The realism of these videos is both genuinely amazing and scary. I've heard this talking point over and over online, but this is the first time I believe a film could be made entirely with AI.
When Will Sora Be Released To The Public?
Like I said, Sora is not available yet for public consumption. This begs the question: if not now, then when?
It's anticlimactic, but the answer is simply "we don't know yet." Last March, Mira Murati, OpenAI's CTO and former interim CEO, said that Sora should be available sometime this year. Well, it's already September, and the last news we have is that it's still only available to red teamers for assessment.
If it will be available by this year, expect it to be at the tail-end of 2024. However, in my opinion, it’s more likely that it’s going to be public by 2025. In the meantime, more text-to-video models (Pika Labs and Runway being the most popular) have popped up since Sora’s announcement.
The Bottom Line
I haven't been this wowed by an AI model since Midjourney. And the fact that this came from out of left field from an AI company filled with controversy and uncertainty last year is just the cherry on top.
But to give credit where credit is due, Sora isn't the first model to attempt text-to-video. Off the top of my head, I can name Runway and Pika Labs as the (previous) frontrunners in this space.
Beyond name recognition, what sets Sora apart from them is its realism. It's not just the subject that's more true-to-life, but also its camera movement and motion blur.
I'm definitely excited to give Sora a go myself. Unfortunately, that might have to wait. In the meantime, you can read more about Sora in our article here.
Want to Learn Even More?
If you enjoyed this article, subscribe to our free newsletter where we share tips & tricks on how to use tech & AI to grow and optimize your business, career, and life.