What is OpenAI's Sora? A Look at Its Use Cases, Features, and Challenges
Generative artificial intelligence (AI) remains a hot topic of conversation, and companies are rushing to jump on the bandwagon. As more companies and organizations adopt AI in their strategies and operations, the top developers remain focused on improving existing AI models and introducing new ones. One of the latest projects is OpenAI's Sora, a groundbreaking text-to-video AI. Explore the new AI model's capabilities, features, and potential risks.
For months now, OpenAI's ChatGPT has remained the face of generative AI, and we can't deny its popularity and range of use cases. As a text-based AI model, ChatGPT is trained on a vast amount of data, allowing it to learn our language, make predictions, and generate answers. GPT-4, its most recent upgrade, boasts improved features like accepting image prompts, processing complex inputs, and offering more accurate responses. It's the most widely used generative AI model because it has practical use cases for diverse users, including students, employees, and ordinary internet users.
However, as we have seen, the AI industry is growing fast, with new technologies and innovations introduced regularly. One of the latest projects on the market is OpenAI's Sora. While ChatGPT and Bard generate text from prompts, and DALL-E generates images, Sora is built as a text-to-video AI model that can produce videos up to a minute long that adhere to the user's prompts and maintain visual quality.
What is OpenAI's Sora?
OpenAI's latest project is Sora, an AI model that generates realistic and creative scenes based on the user's prompts. According to the developer's website, Sora is a text-to-video model that "can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt." OpenAI says its latest AI model boasts a deep understanding of our language, allowing it to interpret prompts and develop compelling characters that show diverse emotions.
The company also adds that the model can generate multiple shots within a single generated video, expertly retaining the character designs and visual styles. OpenAI further explains that Sora can create complex scenes with several characters, with great attention to detail in both the subject and the background. It adds that Sora understands the prompts and can discern how the things they describe exist in the physical world. Aside from textual prompts, the model can generate a video from a still image, fill in missing frames in an existing video, or extend one.
Sora's research techniques
OpenAI shared the research techniques and technology used in building Sora on its website. Sora is a diffusion model that creates a video by starting with one that appears like static noise and slowly transforming it by removing noise. This model can generate an entire video or extend an existing one.
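To make the denoising idea concrete, here is a deliberately simplified sketch of a diffusion-style loop. It is not OpenAI's actual model: a real diffusion model uses a trained neural network to predict the noise at each step, whereas this toy "cheats" by computing the noise from a known target so the gradual noise-removal process is easy to follow.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "video": 4 frames of 8x8 grayscale pixels, containing a bright square.
target = np.zeros((4, 8, 8))
target[:, 2:6, 2:6] = 1.0

# Start from pure static noise, as a diffusion model does.
x = rng.normal(size=target.shape)

# A real model LEARNS to predict the noise; here we use the known target
# as a stand-in so the denoising loop itself is the focus.
steps = 50
for t in range(steps):
    predicted_noise = x - target           # stand-in for the learned estimate
    x = x - predicted_noise / (steps - t)  # remove a fraction of the noise

print(float(np.abs(x - target).mean()))  # → 0.0: the noise is fully removed
```

Each pass removes a slice of the remaining noise, so the frames sharpen gradually rather than appearing all at once, which is the core intuition behind diffusion models.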
Like the GPT models, Sora adopts the transformer architecture, making it highly scalable. Sora represents images and videos as collections of smaller data units called patches, similar to tokens in GPT. Using this approach, the developers can train the diffusion transformer on different types of visual data, creating videos in different resolutions, durations, and aspect ratios.
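The patch idea can be illustrated with a short sketch. The patch sizes below are made up for illustration and are not Sora's actual parameters; the point is simply that a video array can be cut into fixed-size spacetime chunks, each flattened into a vector that plays the role a token plays in GPT.

```python
import numpy as np

# Toy video: 8 frames of 32x32 pixels with 3 color channels.
frames, height, width, channels = 8, 32, 32, 3
video = np.zeros((frames, height, width, channels))

# Cut the video into spacetime patches of 2 frames x 8x8 pixels each
# (illustrative sizes, not Sora's real configuration).
pt, ph, pw = 2, 8, 8
patches = video.reshape(
    frames // pt, pt, height // ph, ph, width // pw, pw, channels
).transpose(0, 2, 4, 1, 3, 5, 6).reshape(-1, pt * ph * pw * channels)

print(patches.shape)  # (64, 384): 64 patch "tokens", each a flat vector
```

Because any resolution, duration, or aspect ratio just yields a different number of patches, a transformer trained on patches can handle varied visual data in one model.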
Sora builds on the research behind the GPT and DALL-E models, using the recaptioning technique from DALL-E 3, which allows it to faithfully follow users' prompts when generating videos. In addition to textual prompts, Sora can also accurately generate videos from an existing image.
The model's ability to generate videos from textual prompts or images expands the range of generative AI use cases. Based on the initial reactions of those who have seen a preview of the model, OpenAI's Sora is surprisingly good at generating quality videos. OpenAI is generating interest in its new model by hosting previews for media industry executives. While the initial reactions have been glowing, there are a few concerns regarding Sora's capabilities, particularly its security, potential risks, and threat to the creative industry.
OpenAI addresses Sora's safety concerns
OpenAI recognizes the limitations of its new generative AI model and says it's working to address safety and other areas for improvement. On its blog, the team admitted that the model may struggle to simulate complex scenes and still encounters issues when illustrating cause and effect. There are also concerns regarding safety and bias, which are now common complaints against most generative AI models.
The company acknowledges these concerns, which is why Sora is still in testing. It shared that it is working with a group of testers who are experts in areas such as bias and misinformation, and that they will test the product before its official public launch. OpenAI has also announced that it is building tools that can help detect misleading content, including a tool that can tell whether a given video was generated by Sora.
Toys 'R' Us is first
OpenAI's Sora may still be in its review and testing phase, but one company has already put its confidence in the model: Toys 'R' Us. The popular US toy retailer is the first brand to use OpenAI's Sora for its advertising. Multiple sites have reported that the revived retailer used the model to generate a roughly one-minute video about the company.
The 66-second video was previewed at Cannes Lions, the annual gathering of ad executives on the French Riviera. Toys 'R' Us gained early access to OpenAI's tool through its creative partner, Native Foreign. The video tells the origin story of the company and how its founder, Charles Lazarus, built the brand alongside its lovable mascot, Geoffrey the Giraffe.
Reactions to the AI-generated video have been mixed; some marvel at the creative use of generative AI, while others aren't impressed. Still others have raised concerns about the threat this model poses to the creative industry.
Native Foreign, the video's creator, also gave it a mixed review. According to Nik Kleverov, working with Sora was "a mixed bag in terms of ease and speed." Kleverov shared that everything in the video was generated from textual prompts, with some shots coming together quickly and others taking several iterations.
Addressing Sora's limitations, Native Foreign shared that about a dozen people worked on the video and that the team also applied "corrective VFX."
So, what are your thoughts about this new product from OpenAI?