Meet Sora, OpenAI’s Impressive AI Video Generator



OpenAI announced an exciting new tool — Sora, an AI model that can generate hyper-realistic video from text.

Sora can create videos up to a minute long featuring highly detailed scenes, complex camera motion and multiple characters with vibrant emotions, OpenAI wrote in a recent tweet — like this generated historical footage of California during the gold rush:

OpenAI shares a hyper-realistic video from Sora, their new text-to-video AI model. The prompt: Historical footage of California during the gold rush. pic.twitter.com/ZRzWm5BHQH

— Michelle Hawley (@msophiahawley) February 16, 2024

Sora builds on the tech of DALL-E, OpenAI’s text-to-image generation tool. Sora not only understands what users ask for in the text prompt, but also how those things exist in the physical world.

Sora Only Available to Select Users

Sora is not currently available to the general public. It’s only available to red teamers — experts in areas like misinformation, bias and hateful content — to test critical areas for harm or risk. OpenAI also granted access to a handful of visual artists, designers and filmmakers in an attempt to gain feedback on how to improve the model for creative professionals.

OpenAI has not yet released information on when Sora will be available for general use, and there is no waitlist users can join. However, if you want to see the AI model in action, plenty of users (along with OpenAI) are sharing their experiences online.

Prompt: “A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.” pic.twitter.com/0JzpwPUGPB

— OpenAI (@OpenAI) February 15, 2024

Sam Altman, CEO of OpenAI, also requested prompts for Sora videos on Twitter, wanting to show off the AI model in action. He followed up by telling users not to “hold back on detail or difficulty.”

https://t.co/uCuhUPv51N pic.twitter.com/nej4TIwgaP

— Sam Altman (@sama) February 15, 2024

Related Article: Midjourney vs. DALL-E 2 vs. Stable Diffusion. Which AI Image Generator Is Best for Marketers?

How Sora Works

Sora is a diffusion model that builds on past research in DALL-E and GPT models. It uses the recaptioning technique from DALL-E 3, which involves generating highly descriptive captions for the visual training data. As a result, the model can follow users’ text instructions more accurately.

Similar to GPT models, Sora uses a transformer architecture, which gives it strong scaling performance. To generate a video, the model starts with one that looks like static noise, then gradually transforms it by removing the noise over many steps.
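To make the "remove noise over many steps" idea concrete, here is a minimal toy sketch in Python. It is not Sora's code or OpenAI's method: the function names, the step schedule and the short list of numbers standing in for a video frame are all assumptions made for illustration. In particular, a real diffusion model predicts the noise with a trained neural network, whereas this sketch cheats by comparing against a known clean target.

```python
import random


def toy_denoise(target, steps=50, seed=0):
    """Start from pure noise and remove a share of the estimated noise each step.

    `target` stands in for the clean frame a trained model would steer toward.
    A real diffusion model never sees it and predicts the noise with a network.
    """
    rng = random.Random(seed)
    # Step 0: a "frame" of pure Gaussian noise, the same length as the target.
    frame = [rng.gauss(0.0, 1.0) for _ in target]
    history = [frame]
    for step in range(steps):
        # Oracle noise estimate (hypothetical stand-in for the learned denoiser).
        predicted_noise = [x - t for x, t in zip(frame, target)]
        # Remove a share of the remaining noise; the share reaches 1 on the last step.
        share = 1.0 / (steps - step)
        frame = [x - share * n for x, n in zip(frame, predicted_noise)]
        history.append(frame)
    return history


def error(frame, target):
    """Largest absolute difference between a frame and the clean target."""
    return max(abs(x - t) for x, t in zip(frame, target))


if __name__ == "__main__":
    clean = [0.0, 0.5, 1.0, 0.5]
    frames = toy_denoise(clean, steps=20)
    for i in (0, 10, 20):
        print(i, round(error(frames[i], clean), 4))
```

Running the script prints the remaining error at the start, the midpoint and the end, which shrinks as the noise is removed. The real system works on video data and text conditioning, neither of which is modeled here.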

While Sora generates videos from text, users can also prompt it with other inputs, such as pre-existing images or videos. For instance, users can create an image with DALL-E, then ask Sora to animate that image.

Video-to-video editing is also an option. Users can upload videos to Sora and use the diffusion model to edit the video — like changing the video’s setting, connecting two input videos with a seamless transition or extending videos backward or forward in time to produce an infinite loop.

And while right now Sora is the talk of the internet for its impressive realistic videos, the model is also capable of generating images of up to 2048×2048 resolution.

Sora Still Has Weaknesses

According to OpenAI, Sora still has some imperfections: it may struggle to accurately simulate the physics of a scene or to understand specific instances of cause and effect.

For example, if a person takes a bite out of a cookie, the cookie may not show a bite mark afterward.

Many of the videos shared online have these tell-tale AI signs, like this video shared by Altman on Twitter where a woman giving a cooking demonstration has a magically disappearing spoon.

https://t.co/rmk9zI0oqO pic.twitter.com/WanFKOzdIw

— Sam Altman (@sama) February 15, 2024

Another video shows a pack of wolf pups that seem to merge into and split off from one another. OpenAI commented on the video, “Animals or people can spontaneously appear, especially in scenes containing many entities.”

Another Sora video showing one of the AI model’s weaknesses.

Prompt: Five gray wolf pups frolicking and chasing each other around a remote gravel road, surrounded by grass. The pups run and leap, chasing each other, and nipping at each other, playing. pic.twitter.com/Jr0FJfzWeg

— Michelle Hawley (@msophiahawley) February 16, 2024

AI Video Safety and Concerns

OpenAI is building tools to help detect misleading content, such as a detection classifier that can tell when a video was generated by Sora.

In addition to developing new tools and techniques, the company plans to use existing safety methods built for DALL-E 3. These include prompt transformations, which rewrite submitted text to comply with guidelines (for example, by removing public figures’ names), and blocklists, which can block certain images from being output.
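As a rough illustration of how a prompt transformation and a blocklist could fit together, consider the toy Python sketch below. Everything in it is hypothetical: the names, the blocked term and the function names are invented for the example, and it checks the submitted text rather than generated images. OpenAI has not published its implementation, and a production system would be far more sophisticated than simple string matching.

```python
import re

# Hypothetical data for illustration only; these are not OpenAI's actual lists.
PUBLIC_FIGURES = {
    "jane example": "a middle-aged woman",
    "john sample": "a tall man",
}
BLOCKED_TERMS = {"gore"}


def transform_prompt(prompt):
    """Rewrite a prompt so named public figures become generic descriptions."""
    for name, generic in PUBLIC_FIGURES.items():
        prompt = re.sub(re.escape(name), generic, prompt, flags=re.IGNORECASE)
    return prompt


def is_blocked(prompt):
    """Return True if any whole word in the prompt is on the blocklist."""
    words = set(re.findall(r"[a-z']+", prompt.lower()))
    return not words.isdisjoint(BLOCKED_TERMS)


def moderate(prompt):
    """Reject blocked prompts (None); otherwise return the rewritten prompt."""
    if is_blocked(prompt):
        return None
    return transform_prompt(prompt)


if __name__ == "__main__":
    print(moderate("Jane Example riding a bike"))
    print(moderate("A scene full of gore"))
    print(moderate("A calm lake at dawn"))
```

In this sketch the blocklist check runs first, so a rejected prompt is never rewritten or passed on. That ordering is a design choice for the example, not a documented detail of any real system.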

OpenAI also plans to work with global policymakers, educators and artists to understand concerns and identify positive use cases for the new technology. However, the company said, “Despite extensive research and testing, we cannot predict all of the beneficial ways people will use our technology, nor all the ways people will abuse it.”

With easy access to AI-generated content, many are concerned about the rise of misinformation, going so far as to say AI is a potential threat to democracy.

In a PBS NewsHour interview, White House correspondent Laura Barrón-López said that while AI has been used in past elections, “AI generative tools are now more widely available, and they’re much more sophisticated.” And while some companies have decided to label AI content, they aren’t banning it outright, and Twitter has not even agreed to label AI-generated content that might be fake.

Political interference is not the only concern when it comes to generative AI video, either. Many public figures have been caught in the crossfire of fake videos, with notable examples including an AI version of Tom Hanks promoting dental insurance, an AI-generated stand-up special of the late comedian George Carlin and even sexually explicit AI content of music superstar Taylor Swift.

Related Article: What Brands Need to Know About AI Image Generation Models

Sora Not the Only AI Video Generator

Sora is not the first AI model that can produce video from text prompts — but it may be the most impressive.

AI video generators first started cropping up in late 2022 with Meta’s Make-A-Video, followed by models such as Runway’s Gen-1 and Google’s Lumiere. However, most of these models produce low-quality and glitchy results that are only a few seconds long.

OpenAI’s Sora, on the other hand, can produce videos up to a minute long, with the added ability to create transitions that stitch multiple videos together seamlessly. Sora can produce high-quality animated content, but many of its videos are so rich in detail that they are easy to mistake for real-life footage, especially for viewers not used to looking for the tell-tale signs of AI.

Is Sora the Path to AGI?

Artificial general intelligence (AGI), a type of intelligence where a machine can understand, learn and think like a human, is still only hypothetical. But that’s not stopping companies like OpenAI, Microsoft, Meta and others from trying to make it a reality.

Ultimately, said OpenAI, Sora serves as the foundation for models that can understand and simulate the real world, “a capability we believe will be an important milestone for achieving AGI.”

Altman tweeted after the release of Sora that OpenAI is “extremely focused on making AGI.”
