
Runway vs. Sora: An Introduction to Text-to-Video AI Generation

OpenAI is in the news once again after unveiling its newest product: Sora, an AI video generator. If pitted against a veteran in this segment like Runway, how will Sora perform?
Updated March 27, 2024
A machine using a clapperboard for filming, generated with Midjourney

For the first time in a while, an AI model that's not text-to-text or text-to-image is taking the internet by storm. In February 2024, OpenAI finally unveiled a project it had kept under wraps for years: Sora, a text-to-video AI generator.

While it's probably the first of its kind to reach mainstream success, it's far from the first text-to-video generator. Long before ChatGPT, RunwayML was already a company whose primary focus was building an AI video generator that could create movies from textual descriptions alone.

As consumers, one of the most important questions we can ask is "Which is better?" And that's what we're asking today about Sora and Runway. In this article, I'll go through what each tool is, its features, its output quality, and its potential future.

What are Runway and Sora?

As mentioned earlier, Sora is OpenAI’s latest addition to its pool of AI tools. It’s a powerful AI model that can generate realistic or creative videos based on textual descriptions. In simpler words, it allows you to turn your written ideas into visual stories. As of March 2024, Sora is not yet publicly available. All we have now are the videos from OpenAI's showcase page and some outputs from people who were given early access.

Some might think this is new technology, but I’m here to dispel that rumor. Text-to-video has been around for a while now, albeit underexposed thanks to text-to-image generators like Midjourney and DALL-E. One of the earliest text-to-video generators on the market is Runway, which has been around since mid-2019.

Features

Let’s start with Runway since we have a better picture of what it offers. Beyond generating videos from text, Runway offers features as “tools,” which include the following and more:

  • Background Remover
  • Image-to-Video
  • Image Expander
  • Backdrop Remix: Changes the background of a video.
  • Erase and Replace: Creates variations of a selected region from a video.
  • Video-to-Video: Change video styles using written or visual descriptors.
  • Text-to-Speech: Generates audio from written text.
  • 3D Capture: Creates 3D models.

We don’t know the bulk of Sora’s features yet, but what we do know is that (like DALL-E 3) it generates a better version of your original prompt using GPT-4. Like RunwayML, it can also create video versions of an input image or extend videos using AI.

Runway vs. Sora: Output Comparison

Beyond text-to-video generation, the biggest reason so many people are interested in Sora is the promise of its showcase. Every single one of those videos could’ve been created by a real person, and no one would be able to tell the difference. But how exactly does it shape up against a generator like Runway, whose team has been working on its model for at least five years?

Here’s a direct comparison of their outputs using prompts from OpenAI’s Sora showcase:

The Otter

An adorable happy otter confidently stands on a surfboard wearing a yellow lifejacket, riding along turquoise tropical waters near lush tropical islands, 3D digital render art style.

Sora's Output

RunwayML's Output

The Cliffs

Drone view of waves crashing against the rugged cliffs along Big Sur’s gray point beach. The crashing blue waters create white-tipped waves, while the golden light of the setting sun illuminates the rocky shore. A small island with a lighthouse sits in the distance, and green shrubbery covers the cliff’s edge. The steep drop from the road down to the beach is a dramatic feat, with the cliff’s edges jutting out over the sea. This is a view that captures the raw beauty of the coast and the rugged landscape of the Pacific Coast Highway.

Sora's Output

RunwayML's Output

The Monster

Animated scene features a close-up of a short fluffy monster kneeling beside a melting red candle. The art style is 3D and realistic, with a focus on lighting and texture. The mood of the painting is one of wonder and curiosity, as the monster gazes at the flame with wide eyes and open mouth. Its pose and expression convey a sense of innocence and playfulness, as if it is exploring the world around it for the first time. The use of warm colors and dramatic lighting further enhances the cozy atmosphere of the image.

Sora's Output

RunwayML's Output

The Cloud Man

A young man in his 20s is sitting on a piece of cloud in the sky, reading a book.

Sora's Output

RunwayML's Output

The Televisions

The camera rotates around a large stack of vintage televisions all showing different programs — 1950s sci-fi movies, horror movies, news, static, a 1970s sitcom, etc, set inside a large New York museum gallery.

Sora's Output

RunwayML's Output

The Train Footage

Reflections in the window of a train traveling through the Tokyo suburbs.

Sora's Output

RunwayML's Output

The Wise Old Man

An extreme close-up of an gray-haired man with a beard in his 60s, he is deep in thought pondering the history of the universe as he sits at a cafe in Paris, his eyes focus on people offscreen as they walk as he sits mostly motionless, he is dressed in a wool coat suit coat with a button-down shirt , he wears a brown beret and glasses and has a very professorial appearance, and the end he offers a subtle closed-mouth smile as if he found the answer to the mystery of life, the lighting is very cinematic with the golden light and the Parisian streets and city in the background, depth of field, cinematic 35mm film.

Sora's Output

RunwayML's Output

Overall Thoughts

Let me preface this section by saying that I truly believe Runway does incredibly well, especially considering that text-to-video is a relatively new segment with a lot of potential. However, based on these outputs alone, it doesn’t hold a candle to Sora.

What bothers me most about Runway boils down to three things: photorealism, movement, and physics. When the subject of the video is human, it tends to create a waxy face, which is, ironically, my biggest complaint about OpenAI’s DALL-E 3. Runway’s man-in-the-clouds video is the worst offender, especially when you zoom in and realize it isn’t even rendered properly.

As for the movement, it’s so smooth that it looks unnatural, as if someone applied motion blur to the video and cranked it up to 1000%. But the biggest reason these videos look fake is that the physics make no sense. To be more specific:

  • The old man’s beard doesn’t sway in a uniform direction. 
  • The parallax effect on the man in the clouds video isn’t integrated properly.
  • The waves are flowing in different directions in both the cliffs and otter videos.
  • The windows of the train clip with each other.

Oh and there’s something so unsettling about Runway’s monster video too. It starts so innocently, then it suddenly rolls its eyes in such an unnatural way.

On the other hand, Sora doesn’t have any of these issues. If I were to nitpick, you could argue that the camera movement looks a bit too erratic in some instances and too smooth in others. However, those are much easier problems to patch than all of Runway’s issues.

That said, take this with a grain of salt. After all, these prompts and outputs are taken directly from Sora’s showcase. We can’t tell how good it actually is without trying it ourselves. But for now, Sora is the clear winner of this head-to-head prompt comparison.

All Said and Done

Despite coming to this comparison as the newcomer and challenger, OpenAI's Sora single-handedly wins this head-to-head. It just goes to show that, in this fast-paced era, it doesn't matter which comes first. What matters is how effective a tool is once it arrives.

Runway has been around for years, and yet its outputs still look amateurish compared to Sora's polished ones. Then again, as I mentioned earlier, we can't take the showcase videos at face value: OpenAI is likely sharing its best outputs rather than a representative sample of what the product actually produces.

But here's the truth: If Sora is capable of generating videos as good as this, then other AI video generators don't hold a candle to its creativity. That's what happens when the best AI company in the world decides to pool their resources towards a project. OpenAI wins, once again.

Want To Learn Even More?
If you enjoyed this article, subscribe to our free monthly newsletter
where we share tips & tricks on how to use tech & AI to grow and optimize your business, career, and life.
Written by John Angelo Yap
Hi, I'm Angelo. I'm currently an undergraduate student studying Software Engineering. Now, you might be wondering, what is a computer science student doing writing for Gold Penguin? I took up studying computer science because it was practical and because I was good at it. But, if I had the chance, I'd be writing for a career. Building worlds and adjectivizing nouns for no other reason other than they sound good. And that's why I'm here.