
Kunlun Wanwei Chairman and CEO Fang Han: Tiangong Large Model Drives New Transformation in the AI Era

Thu, May 30 2024 07:47 AM EST

On May 15, 2024, the conference "AI Genesis Era - 2024 Jiazi Gravity X New Trend of Technology Industry" was held at the Wanda Realm Hotel in Zhongguancun Dongsheng Science Park, Beijing, organized by Beijing Jiazi Light-Year Technology Services Co., Ltd. and co-organized by Zhongguancun Dongsheng Science Park. Dozens of professionals from the technology industry gathered to focus on cutting-edge topics in the current technology field and delve into the development trends and broad prospects of the technology industry in the AI genesis era.

During the opening ceremony on the morning of the 15th, Fang Han, Chairman and CEO of Kunlun Wanwei, delivered a speech titled "Tiangong Big Model Drives New Changes in AI Era Applications" to the audience.

Fang Han believes that in AIGC verticals such as novels, comics, music, and video, whoever reaches state-of-the-art (SOTA) performance can capture the dividend of that vertical and attract a large number of users.

Below is an excerpt from the speech by Fang Han, Chairman and CEO of Kunlun Wanwei, as compiled and edited by "Jiazi Light-Year":

I am very grateful to the organizers for giving me the opportunity to share the latest explorations and progress of our company in AI-driven application transformation.

Since 2020, our company has been researching and developing large-scale Chinese pre-trained models. In 2021, we started developing large music models. In December 2022, we open-sourced the first large-scale Chinese pre-trained model. In April of last year, we released Tiangong 1.0, followed by the launch of Tiangong AI Search in August. This April, we released Tiangong 3.0, an open-source MoE large model with 400 billion parameters.

The performance of Tiangong 3.0 has surpassed some mainstream products in the market, such as xAI's Grok-1. Additionally, the capabilities of Tiangong 3.0 have also been comprehensively upgraded.

What improvements does the Tiangong 3.0 large model bring to Tiangong AI Search? First, its multi-round search and integrated tool-calling capabilities have been significantly enhanced, so users get more accurate and comprehensive results and work far more efficiently. We also know that users strongly want to ask follow-up questions and understand results in more depth. In multi-round search, we therefore focus on making the search process feel like a conversation with a large model: users can deepen their understanding of a topic through continuous questions and follow-ups, and obtain more personalized and precise answers. In addition, once Tiangong AI Search has finished retrieving results, we use the large model to automatically generate research outlines, knowledge graphs, and mind maps, which makes it much easier for users to apply Tiangong AI Search to their work. Finally, we have launched an Agent Square, where we are beginning to release a large number of powerful AI agents.

1. Technological dividends and product innovation are key to user retention

Next, I would like to talk about GPT-4o. GPT-4o has just been released, showcasing a new form of interaction for super personal assistants. The interaction style of the previous generation of personal assistants, such as Siri and Google Assistant, left many users dissatisfied, which kept their penetration rates relatively low. The end-to-end, voice-to-voice interaction of GPT-4o, by contrast, is the ultimate form we expect for the next generation of super personal assistants, giving users a more natural and efficient way to interact.

I believe this form of interaction will greatly increase the penetration rate of super personal assistants. However, what really determines the upper limit of their capabilities is the "eyes" of the agent. The agent's "eyes," that is, its ability to perceive and understand the world, directly determine the functional boundaries of a super personal assistant. The AI super apps we can see today are essentially agent-based super personal assistants. By combining new forms of high-speed voice and video interaction, they give users a more natural and efficient personal assistant experience.

Next, let me introduce the Tiangong Music Large Model. It is currently China's first SOTA model in the field of music AIGC (AI-generated content). It is highly capable: even in a new scenario with no audio at all, it can automatically compose, arrange, perform, and synthesize music from text that users upload, including recipes and other arbitrary content, and ultimately generate a complete piece of music. We have also collaborated with creators such as Pang Bo. He wrote a simple poem, which we turned into music using the Tiangong Music Large Model.

In a scoring comparison against the globally leading Suno model, we are ahead on approximately three indicators and slightly behind on three others. Overall, our scores have reached the global SOTA level. We are confident that our next generation of models will significantly surpass it.

In terms of technical architecture, we have adopted a DiT architecture similar to Sora since last year. This architecture effectively addresses certain key issues in music generation models. Currently, our dataset contains approximately 20 million human songs. In the next generation version, we plan to increase the number of songs to around 100 million.

Next, I would like to talk about our exclusive advantages.

Firstly, we can "create songs with songs," using example audio as the source for generating new music. Secondly, we support multiple dialects within a single language. In China in particular, we already support dialects such as Cantonese, Sichuanese, Shanghainese, the Beijing dialect, and more, which lets our model more accurately understand and generate music that fits the characteristics of each dialect. Comparable products abroad have not yet achieved this. Lastly, the Tiangong Music Large Model is also at the global forefront in generating distinctive, natural-sounding human voices, which is another important advantage of ours.

In product development and business logic, especially in today's AI field, technology-driven innovation undeniably plays a crucial role. The state-of-the-art (SOTA) advantage matters a great deal here. Simply put, reaching a globally leading technical level in a specific field attracts a large user base and yields a substantial user dividend. This is precisely the strength that companies like OpenAI have shown in large text models: by maintaining their lead in the domain, they keep attracting large numbers of users.

However, in other areas such as 3D model generation and music sound generation, OpenAI has not shown the same dominant advantage. This provides more opportunities for other startups and teams to make technological breakthroughs and attract users in these domains. Therefore, our business logic is based on the belief that in the AI-generated content field, such as novels, comics, music, and videos, as long as we can achieve the State-of-the-Art level, we can acquire a large user base.

So, how can we acquire and retain these users effectively? I believe the first step, attracting users, depends on technological innovation: cutting-edge technology means your product can offer higher-quality and more distinctive services, which sets you apart from competitors. The second step, keeping those users, depends on product innovation: continually adding new features, optimizing the user experience, and providing personalized services to maintain interest and loyalty. Taking our Tiangong Music Large Model as an example, features such as automatic lyric and melody generation let us offer users unique, high-quality music content that competitors cannot match. We will keep innovating on our products to meet users' evolving needs, thereby stabilizing and expanding our user base.

However, how can we retain users on our platform when others start adopting similar technologies? We believe an AI User-Generated Content (UGC) platform is a crucial strategy. Through an AI UGC platform, we can encourage users to store their habits, preferences, and data on our platform, enhancing the connection between users and the product.

Relying solely on technological innovation to attract users is not enough; significant innovation in product form is also necessary. Users do not particularly care whether the content they consume is AI-generated or human-made. They care about its quality and cost. Meeting user demand therefore requires optimizing algorithms, improving content quality, and reducing production costs.

For our AI-generated content products, significant breakthroughs in product form are indeed necessary to attract users to our platform. The short drama market is one example: by breaking away from the length conventions of traditional TV shows and compressing storylines into five-minute episodes with intense conflict and a fast pace, short dramas have attracted a large user base through an innovative product form.

2. The potential of vertical fields in the AI era

Regarding the question of whether the "Scaling Law" will slow down, I would like to share my perspective. From current observations, the "Scaling Law" is closely related to the volume of data in vertical fields.

In the text domain, for example, with over two thousand years of accumulated human textual knowledge, the vast amount of data enables rapid progress in large text models. However, in other fields like music, video, and 3D model generation, data accumulation is relatively limited. For instance, the global data volume in 3D model generation may not exceed 12 million entries, while the number of popular songs in the music generation field is around 100 million. Due to these data limitations, technological advancements in these fields seem noticeably constrained. Even with high-performance hardware like GPUs, the lack of data remains a significant hindrance. In the field of image generation, the difference between open-source and proprietary products is minimal, indicating that data volume restrictions remain a critical factor limiting technological development, despite advanced hardware support.

Similarly, in the video generation field, products like Sora have garnered attention, but because of data limitations their lead is not substantial. Products of similar caliber are rapidly emerging, such as China's Vidu and the comparable products Google released recently. The narrowing gap between these products further underscores the importance of data volume in technological development.

So, is the data volume in the text domain sufficient? I can confidently say that the text domain also faces data scarcity. However, many companies and researchers in the industry are actively turning to synthetic data as a solution. For example, Microsoft's recent Phi-3 model synthesized complete textbooks across 128 academic subjects as a basis for model training. Similarly, OpenAI's GPT-4o model also drew on a significant amount of synthetic data for training. Compensating for data scarcity through simulation and synthetic data plays a crucial role in sustaining and driving technological progress.

Based on this perspective, we firmly believe that Chinese entrepreneurs still have significant opportunities in vertical fields, especially in domains where data is relatively limited. To seize these opportunities, we have built a business matrix of six areas: AI large models, AI search, AI music, AI games, AI video, and AI social. In essence, these businesses fall into two core product lines: AI search and AI UGC platforms. We firmly believe that AI search will become a super app that makes end users more efficient. Our AI UGC platform aims to enable more users worldwide to express themselves better.

To illustrate this point, consider a specific example. Through our global operations in content and social products, we have observed an interesting phenomenon: when a language has fewer than 50 million speakers, its local culture often struggles to compete with the dominant English-language culture. Take Nigeria, where the budget for producing a movie ranges from as little as $20,000 to $200,000. In China, a budget of that size may not even cover the cost of a short film. As a result, Nigerian movies often struggle to find a local audience, and viewers lean toward the dominant Hollywood and Bollywood cultures.

However, with the introduction of AIGC technology, this situation has started to change. As is well known, taking a product photo for Taobao used to cost around 200 RMB, but with AIGC technology that cost can now be reduced to almost nothing. In essence, AIGC technology lowers both the barrier to creating content and the cost of creating it. This will spark a wave of content creation globally, empowering marginalized cultures to better express themselves. This is one of the key motivations behind launching our AI UGC platform, and we have already seen promising results overseas.

As a company, Kunlun Wanwei's mission, vision, and values are dedicated to achieving artificial general intelligence (AGI) and enabling everyone to better shape and express themselves.