Less than a month after Wanxing Technology unveiled its multimedia large model "Tianmu," discussion of video-generation large models reached a crescendo when OpenAI released Sora, its own video-generation model.
From just a short text prompt, Sora can generate up to a minute of high-definition video. What sets it apart is that it not only interprets text prompts accurately but also depicts coherent, lifelike scenes that adhere to the laws of physics. Whether it is a pirate ship moving through liquid, the interplay of light and shadow in a neon-lit environment, or the smooth texture of animal fur, Sora's output flows naturally, and transitions between shots feel remarkably organic.
Screenshot of the Sora website.
In the week following its release, Sora not only dominated the headlines of major tech media but also drew attention from industry figures such as Elon Musk, Runway CEO Cristóbal Valenzuela, Stability AI CEO Emad Mostaque, and Meta's Chief AI Scientist Yann LeCun. Zhou Hongyi, the chairman of 360, even stated that "the birth of Sora means that the realization of AGI may be shortened from ten years to one or two years."
Overall, the global attention Sora has drawn stems from two potentially disruptive effects: first, it significantly lowers the barrier to video production, enabling ordinary people to create complex videos using natural language; second, OpenAI has specifically described Sora as a foundation for AI to "understand and simulate the real world," with the potential to develop into a "world simulator."
Amid the flood of discussion, the industry is still debating whether Sora can become a "world simulator." As a content creation tool, however, its ability to greatly improve the efficiency of video creation has met little dispute.
Whether Sora is indeed a "world simulator" remains an open question. Either way, under global scrutiny, Sora marks a significant leap in AI video generation capabilities. As Wu Taibing, the chairman of Wanxing Technology, said at the "Tianmu" large model launch event, "Large models are accelerating from the 1.0 era of text and images into the 2.0 era of audio and video multimedia."
The "GPT Moment" for Video Generation
In fact, just as ChatGPT was not the first conversational AI program in history, Sora is not the industry's first text-to-video large model. This does not diminish Sora's "disruptive" advantages in video duration, image fidelity, scene realism, and motion coherence, which have propelled the video generation field into its "GPT moment."
The arrival of this moment signifies a qualitative change and democratization of video generation technology, directly bringing revolutionary changes to video content creation, as well as industries such as gaming, education, entertainment, and advertising.
A common view in the industry is that Sora will hit the short video industry first, both squeezing out creators of homogenized short videos and rewarding high-quality content.
As technical barriers fall, anyone will be able to create a high-quality 60-second video, so creators who rely on simple shooting and editing to gain attention will no longer be competitive. As such videos saturate the market, the value of content built on creativity, personalization, and depth will become apparent again. In terms of production costs, high-quality creators can also break through earlier limits on resources and funding, freeing their imagination and devoting more energy to conceiving high-quality content.
Similar to the short video industry, the film industry is also poised to become more vibrant due to the qualitative change in AI video generation technology.
From the first AIGC animated short film "Dog and Boy," to "Everything Everywhere All at Once," made with Runway's tools, and now to the world's first AI animated feature film "The Foolish Old Man Moves Mountains," the film industry's use of AI technology is no longer new. Just as CGI forever changed Hollywood, AI video generation technology represented by Sora will serve as a powerful efficiency tool, helping film production companies reduce production costs and shorten production cycles, and opening new frontiers for cinematic art.
Screenshot from the "Dog and Boy" video.
At the same time, the enhancement of video creation efficiency will directly impact various industries such as gaming, education, and entertainment marketing in terms of content production. For instance, AI video generation technology can save time and costs in the trial and error phase of game style exploration, character prototyping, and visualization. It can assist educators in teaching complex knowledge concepts in a more vivid manner by combining textual information with visual content. Brands and advertisers can also produce more visually appealing advertisements efficiently, enabling more imaginative creative videos to be implemented in marketing campaigns.
However, the efficiency revolution that Sora is about to unleash has also sparked a series of controversies and concerns regarding video misinformation, copyright protection, personal privacy, and data security. Shortly before Sora's release, a large-scale AI "deepfake" fraud case was reported by the media in Hong Kong. This case utilized AI technology to achieve "multi-face swapping," increasing the difficulty for ordinary individuals to discern the authenticity of videos, thereby providing opportunities for malicious actors. Therefore, when we return to reality and consider aspects such as application regulation, risk control, and ethical considerations, Sora still has a long way to go.
"How far is the Chinese version of Sora?"
After Sora's debut, the Chinese tech scene once again stirred up "AI anxiety": How significant is the gap between China and the United States in the field of AI? Why wasn't China the first to develop something like Sora? An AI algorithm engineer from a certain tech giant even expressed a widely circulated pessimistic sentiment on Zhihu: "I'm somewhat afraid that products from tech giants will pass by like roaring trains, while what I do is like roadside weeds. In this era where technological progress is like a revolving lantern, not a trace is left behind."
However, as Zhu Wei, Vice President of Wanxing Technology, pointed out, "The opportunities brought by Sora outweigh the challenges." Sora may be a passing train, but it could also be the "locomotive" that pulls the global development of video generation models forward.
Objectively speaking, there is indeed a considerable gap between Chinese tech companies (or, for that matter, all tech companies other than OpenAI) and OpenAI, but there is still a chance to catch up. At the level of underlying technology, algorithms, data, and computing power are the three major areas in which China needs to catch up on large video generation models.
Zhu Wei stated, "First, there is the development of large model algorithms. Even if others release open-source large models, we still need to research many of the detailed algorithms ourselves in order to catch up. Second, there is data. Currently, there is not much data focused on China, and it will take a long time to clean and accumulate Chinese-language data at scale. Considering that Sora still lacks an understanding of Chinese elements and the Chinese language, this may become an important opportunity for domestic video large models to overtake on the bend. Third, there is computing power. For video, training requires billions or even tens of billions of data samples. With the United States restricting China's access to computing power, especially computing power for training, localizing computing power should greatly help the rapid development of domestic large model research."
Furthermore, from an application perspective, Sora has not yet truly entered vertical application fields. Chinese companies still have the opportunity to create blockbuster applications in vertical markets based on their deep understanding of local user needs.
Take AIGC software company Wanxing Technology (300624.SZ), which focuses on applications, as an example. At the end of January this year, drawing on more than 1.5 billion accumulated user behavior records and 10 billion pieces of high-quality localized audio and video data, Wanxing Technology launched the first domestic multimedia large model, "Wanxing Tianmu." Unlike Sora, which is based on visual data, Tianmu focuses on multimedia creation in specific verticals and was built for application from the start. It targets more segmented markets such as general knowledge, marketing, and entertainment. Once launched, its capabilities were quickly put to commercial use at scale overseas.
Demonstration of the capabilities of Wanxing's "Tianmu" large model.
"Sora gives us (Wanxing Technology) more confidence in the development of our business," said Zhu Wei. "Initially, we needed some refinement in the quality and length of our Wensheng videos. Now, with Sora, we can expedite this process. Whether it's drawing inspiration from its technical solutions or its training programs, it will accelerate the development of our large models, enabling us to better address the issues faced by our niche users and deliver greater value to our customers."
In terms of AI video generation quality, Sora has indeed bridged a technological gap that seemed insurmountable in the short term. Viewed from the perspective of real-world application, however, Sora is still immature: it struggles with biases in semantic understanding, application stability, and generating content that adheres to physical laws. Considering practical factors such as computing power, data volume, and energy consumption, scaling up Sora's application may take longer than we anticipate.
Seen against the history of technological development, Sora can be viewed more rationally. Meanwhile, Chinese companies that embrace long-term AI strategies may find untapped opportunities where technology and business converge.