Meet Sora, OpenAI’s Impressive AI Video Generator


OpenAI has announced Sora, a new AI model that can generate hyper-realistic video from text.

Sora can create videos up to a minute long featuring highly detailed scenes, complex camera motion and multiple characters with vibrant emotions, OpenAI wrote in a recent tweet. One example the company shared: generated historical footage of California during the gold rush.

Sora builds on the tech of DALL-E, OpenAI’s text-to-image generation tool. Sora not only understands what users ask for in the text prompt, but also how those things exist in the physical world. 

Sora Only Available to Select Users 

Sora is not currently available to the general public. It is only available to red teamers — experts in areas like misinformation, bias and hateful content — who are testing critical areas for harm or risk. OpenAI also granted access to a handful of visual artists, designers and filmmakers to gather feedback on how to improve the model for creative professionals. 

OpenAI has not yet released information on when Sora will be available for general use, and there is no waitlist users can join. However, if you want to see the AI model in action, plenty of users (along with OpenAI) are sharing their experiences online. 

Sam Altman, CEO of OpenAI, also requested prompts for Sora videos on Twitter, wanting to show off the AI model in action. He followed up by telling users not to “hold back on detail or difficulty.”


How Sora Works 

Sora is a diffusion model that builds on past research in DALL-E and GPT models. It uses the recaptioning technique from DALL-E 3, meaning it can generate highly descriptive captions for the visual training data and follow users’ text instructions more accurately. 

Similar to GPT models, Sora uses a transformer architecture, which gives it strong scaling properties. The model starts with a video that looks like static noise, then gradually transforms it by removing the noise over many steps.
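The iterative denoising idea can be illustrated with a toy sketch. This is not OpenAI's implementation — Sora's actual model is not public, and in the real system a learned transformer predicts the noise at each step. Here, `toy_denoise` and its placeholder noise estimate are purely hypothetical stand-ins to show the loop structure: start from pure static noise and remove a fraction of the estimated noise at every step.

```python
import numpy as np

def toy_denoise(shape=(8, 8), steps=10, seed=0):
    """Toy illustration of iterative denoising (NOT Sora's real model).

    A real diffusion model would use a learned network to predict the
    noise in the current frame; here the frame itself serves as a
    placeholder noise estimate, so each step simply shrinks the noise.
    """
    rng = np.random.default_rng(seed)
    frame = rng.standard_normal(shape)  # the "static noise" starting point
    for _ in range(steps):
        predicted_noise = frame  # placeholder for a learned prediction
        frame = frame - 0.3 * predicted_noise  # remove a fraction of it
    return frame
```

After ten steps the frame's noise magnitude has shrunk substantially; in a trained model, what remains after the final step is a coherent video frame rather than residual noise.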

OpenAI's Sora video generation process

While Sora generates videos from text, users can also prompt it with other inputs, such as pre-existing images or videos. For instance, users can create an image with DALL-E, then ask Sora to animate that image. 

Video-to-video editing is also an option. Users can upload videos to Sora and use the diffusion model to edit the video — like changing the video’s setting, connecting two input videos with a seamless transition or extending videos backward or forward in time to produce an infinite loop. 

And while Sora is currently the talk of the internet for its impressively realistic videos, the model can also generate still images at up to 2048×2048 resolution. 

Sora Still Has Weaknesses

According to OpenAI, Sora still has some imperfections: it may struggle to accurately simulate the physics of a scene or to understand specific instances of cause and effect. 

One example the company gave: a person might take a bite out of a cookie, but afterward the cookie may not show a bite mark. 

