
Alibaba Cloud's large models are now offered at unprecedentedly low prices, setting off a frenzy that goes beyond Moore's Law.

Wed, May 22 2024 08:28 AM EST
Making cheaper large models the foundation for accelerating innovation across society.

Author: Ray

Editor: Jingyu

On May 21st, the Alibaba Cloud "AI Leader Summit" was held at the Wuhan Optics Valley Marriott.

Upon arrival, attendees noticed something unusual. Over the past month, the same summit series had already made stops in Hangzhou, Beijing, and Xi'an, but this time Liu Weiguang, Alibaba Cloud's Senior Vice President and President of the Public Cloud Business Unit, was personally present - Alibaba Cloud seemed poised for something big.

Sure enough, just over ten minutes into the event, Liu Weiguang brought up the age-old topic in the hardware field, "Moore's Law."

The ultimate tribute among geeks is "show me the code"; for business leaders, it is having their views become industry laws - and Moore's Law is one of them.

In 1965, Intel co-founder Gordon Moore observed that "the density of transistors on a unit area doubles every 18 months," an observation that has guided the semiconductor industry's development for nearly 60 years.

However, since Moore's passing in 2023, Moore's Law has gradually lost force at the microscopic level of transistor technology, as the bottlenecks of the von Neumann architecture and quantum tunneling have become obstacles.

Should the perspective shift to the macro level or become user-oriented? This has been a question Alibaba Cloud has pondered for many years.

From a user's perspective, transistor density is secondary; the fundamental significance of Moore's Law lies in users being able to buy double the computing power for the same price every 18 months.

However, achieving increasingly cheaper computing power involves more than just transistor density.

Perhaps public cloud and AI are Alibaba Cloud's attempt to surpass Moore's Law.

01

The cost of AI inference is exponentially decreasing

"I believe that the cost of AI inference needs to decrease by tenfold or even a hundredfold each year to truly drive the explosion of AI applications across various industries."

As Liu Weiguang spoke, the audience below exchanged glances: currently, 80% of tech companies in China, including half of the large model companies, are running on Alibaba Cloud. A hundredfold decrease annually means that Alibaba Cloud aims to drastically reduce the cost of using domestic large models.

Shortly after, Alibaba Cloud announced price cuts across nine core commercial and open-source models, all available immediately through the Alibaba Cloud Bailian platform. Among them, Alibaba Cloud's long-text model Qwen-Long, whose performance is comparable to GPT-4's, has been cut to 1/400th of GPT-4's price, making it the cheapest globally.

Qwen-Long is the long-text enhanced version of the Tongyi Qianwen model, aimed at the long-document scenarios that consume the most tokens. It supports a maximum context length of 10 million tokens, enough to process around 15 million words, or roughly 15,000 pages of documents.

Following this announcement, Qwen-Long's API input price dropped from 0.02 yuan per thousand tokens to 0.0005 yuan per thousand tokens, a direct 97% cut. This means that 1 yuan now buys 2 million tokens, roughly the text of 5 copies of the Xinhua Dictionary. By comparison, the input prices per thousand tokens for foreign and domestic models such as GPT-4, Gemini 1.5 Pro, Claude 3 Sonnet, and Ernie-4.0 are 0.22 yuan, 0.025 yuan, 0.022 yuan, and 0.12 yuan respectively - all far higher than Qwen-Long's. Qwen-Long's output price also dropped from 0.02 yuan per thousand tokens to 0.002 yuan per thousand tokens, a 90% cut.
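The arithmetic behind these figures is easy to check. The sketch below is illustrative only, using the per-thousand-token input prices quoted above; the helper names are ours, not anything from Alibaba Cloud's API:

```python
# Illustrative arithmetic using the per-1k-token input prices quoted above.
# Prices are expressed in yuan per MILLION tokens to keep the math simple.

PRICES_YUAN_PER_M_TOKENS = {
    "Qwen-Long (new)": 0.5,   # 0.0005 yuan / 1k tokens
    "Qwen-Long (old)": 20.0,  # 0.02 yuan / 1k tokens
    "GPT-4": 220.0,           # 0.22 yuan / 1k tokens
    "Gemini 1.5 Pro": 25.0,
    "Claude 3 Sonnet": 22.0,
    "Ernie-4.0": 120.0,
}

def tokens_per_yuan(price_per_m: float, budget_yuan: float = 1.0) -> int:
    """How many input tokens a given budget buys at this price."""
    return round(budget_yuan / price_per_m * 1_000_000)

def percent_cut(old: float, new: float) -> float:
    """Price reduction expressed as a percentage."""
    return round((1 - new / old) * 100, 1)

print(tokens_per_yuan(0.5))    # 2000000 -> 1 yuan buys 2 million Qwen-Long tokens
print(percent_cut(20.0, 0.5))  # 97.5 -> quoted as a 97% cut in the announcement
```

The exact reduction from 0.02 to 0.0005 yuan works out to 97.5%, which the announcement rounds to 97%.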

The flagship large model Qwen-Max, which matches the performance of GPT-4-Turbo on the authoritative benchmark OpenCompass, also participated in this price reduction, with the API input price reduced to 0.04 yuan per thousand tokens, a 67% decrease.

For other open-source models like Qwen1.5-72B and Qwen1.5-110B, the input prices have also been reduced by over 75%.

From the most widely used model to the best-performing one, Alibaba Cloud put its most important products on the table - a clear sign of its determination.

02

Alibaba Cloud's determination: to become the infrastructure for the explosion of large models

Why the price reduction?

The answer lies in the theme of this event: "Making AI applications easier worldwide," becoming the infrastructure for the era of large models.

According to several insiders, the positioning of AI at Alibaba Cloud has risen to an unprecedented strategic height. In multiple internal meetings, Alibaba Cloud executives have compared AI in 2024 to short videos in 2017 and mobile payments in 2012. From 2012 to 2013, during the transition from 3G to 4G, China's mobile payments grew by 800% in two years; from 2017 to 2018, the explosion of various short videos led to an 8.5-fold growth in the entire Chinese short video industry.

The speed of AI's coming explosion will far exceed everyone's imagination: today, the combined daily API call volume of all large model companies in China does not exceed 100 million, but by the end of the year this figure is expected to reach 10 billion - a hundredfold increase.

To achieve the goal of being "AI infrastructure," Alibaba Cloud has laid out four capabilities. The first is Tongyi, a provider of globally leading model services. Recently, OpenAI's Sam Altman reposted the Chatbot Arena leaderboard to confirm GPT-4o's capabilities, and on that list three Tongyi models representing China's model strength ranked among the global top 20.

Sam Altman shared the GPT-4o test results on X.

The second is the ability to run the largest inference cluster in China. With 30 public cloud regions worldwide and 89 availability zones in total, it delivers a 4x increase in inference throughput, an 8x saving in computing resources, and dynamic scaling within minutes.

The third is openness and continuous open-source contribution. Alibaba Cloud was the first cloud provider to propose the concept of Model as a Service (MaaS), aiming to serve models better; its Bailian model service platform can now access numerous third-party and vertical-domain large models. The ModelScope community it runs is China's largest model community, with over 4,500 models and more than 5 million users, and Alibaba Cloud continues to open-source its entire model family across all modalities and sizes, with 7 million cumulative downloads of its open-source models.

The fourth is confidence in meeting the explosion of AI applications: financial data shows that Alibaba Cloud's AI-related revenue has grown by triple digits year over year.

Against Alibaba Cloud's "four capabilities," the industry faces two kinds of "expensive."

On one hand, the human cost of development and fine-tuning is expensive. The scarcity of large-model talent is an industry consensus: at one consumer electronics giant, the average after-tax cost of a large-model R&D hire is 1 million RMB. Even ordinary enterprises that skip building a base model and instead fine-tune open-source large models face high costs.

On the other hand, hardware is expensive. A startup embracing large models may need to start with 50 GPU servers, or even clusters of 100, 200, or more. Training a 100B-scale LLM to a globally competitive standard - taking the Falcon series as an example - requires about 3.5 trillion tokens of training data, a cluster of roughly 4,096 A100s, and about 70 days of training. A single A100 typically costs ten thousand US dollars or more, and building a cluster involves not just GPU procurement but also software deployment, networking, electricity, maintenance, and continuous trial-and-error costs - far beyond what ordinary enterprises can afford.

Therefore, Alibaba Cloud's core goal is to use its "Four Abilities" to address the "Two Expensives" encountered in the AI explosion.

Factors determining the value of APIs: technical sophistication + inclusive capability

It is not difficult to see that in this event, Alibaba Cloud's focus is twofold: emphasizing the value of APIs and the ability to lower prices for inclusivity.

The API part is easy to understand. Just as the Internet did not require everyone to reinvent the wheel, large-model development does not require everyone to start from a base model. Combining APIs with the public cloud not only cuts the labor cost of developing large models but is also a necessary path to making them accessible.

On one hand, the inherent openness of cloud providers can offer developers a rich set of models and toolchains that private deployments lack. The Bai Lian platform on Alibaba Cloud brings together hundreds of high-quality models from China and abroad, including Tongyi, Baichuan, ChatGLM, and Llama series, with built-in tools for customizing large models and application development. Developers can easily test and compare different models, develop exclusive large models, and effortlessly build applications like RAG. From model selection, tuning, application development to external services, everything can be done in one place.

On the other hand, it is easier to call multiple models on the cloud and provide enterprise-level data security. Alibaba Cloud can provide each enterprise with a dedicated VPC environment, ensuring computing isolation, storage isolation, network isolation, data encryption, and full data security. Currently, Alibaba Cloud has led or deeply participated in the formulation of more than 10 international and domestic technical standards related to large model security.
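As a concrete sketch of what calling such a hosted model looks like, the snippet below assembles an OpenAI-style chat-completion request. The endpoint URL, header layout, and helper names here are placeholders for illustration, not Alibaba Cloud's actual Bailian interface; consult the official documentation for the real API:

```python
import json

# Placeholder endpoint -- NOT the real Bailian API URL.
API_URL = "https://example.com/v1/chat/completions"

def build_chat_request(model: str, user_prompt: str, api_key: str) -> dict:
    """Assemble headers and an OpenAI-style JSON body for a chat call."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_prompt},
        ],
    }
    return {"url": API_URL, "headers": headers, "json": body}

# Example: swap in whichever hosted model the platform exposes.
req = build_chat_request("qwen-long", "Summarize this contract.", "sk-demo")
print(json.dumps(req["json"], indent=2))
```

The point of the sketch is that switching between hosted models is a one-string change to the request body, which is what makes side-by-side testing of many models on one platform cheap.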

The logic behind the price reduction is the ability to be inclusive.

In the PC era, the industry's development was driven by Moore's Law together with the "Andy and Bill" dynamic: Andy Grove's Intel sold CPUs, while Bill Gates's Microsoft made the Windows operating system. Between the two, as software represented by the operating system grew heavier, users had to keep upgrading to new hardware.

Similarly, in the AI era, as large models develop, demand for computing power such as cloud computing grows. "The computing power generative AI requires is not delivered by CPUs and simple strategies; it comes mainly from large-scale GPU clusters providing the computing foundation, along with improvements in network and storage capabilities. Therefore generative AI, whether for inference or training, is moving ever more to the cloud, igniting the explosion of public cloud once again," said Liu Weiguang.

The confidence behind this unprecedented price reduction lies in the characteristic of cloud computing surpassing Moore's Law.

In the past, Moore's Law doubled the density of transistors on a chip every 18 months, so for the same computing power, users' costs halved every 18 months.

Today, Moore's Law for transistors has become ineffective, but the technical dividends and economies of scale of public clouds continue to optimize computing costs. An example is that over the past decade, Alibaba Cloud has reduced computing costs by 80% and storage costs by 90%.

Specifically for AI, Alibaba Cloud has built a highly elastic AI computing power scheduling system on top of self-developed heterogeneous chip interconnects, the high-performance network HPN 7.0, the high-performance storage CPFS, and the AI platform PAI. Combined with the Bailian distributed inference acceleration engine, it significantly cuts model inference costs and speeds up inference. As a result, even for the same open-source models, calling them on the public cloud is far cheaper than private deployment. Take the open-source Qwen-72B model at a monthly usage of 1 billion tokens: calling the API directly on Alibaba Cloud's Bailian platform costs only 600 RMB per month, while private deployment averages more than 10,000 RMB per month.
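Using the article's own figures (1 billion tokens a month, 600 RMB via the Bailian API versus 10,000+ RMB for private deployment), the comparison can be sketched as follows. Note that the per-million-token API rate is implied by those two numbers, not an official price list:

```python
# Rough cost comparison based on the figures quoted in the article.
# The API rate is implied by "600 RMB for 1 billion tokens"; the private
# deployment figure is the article's "over 10,000 RMB per month" estimate.

TOKENS_PER_MONTH = 1_000_000_000
API_RMB_PER_M_TOKENS = 0.6       # implied: 600 RMB / 1,000 M tokens
PRIVATE_RMB_PER_MONTH = 10_000   # article's lower-bound estimate

def api_monthly_cost(tokens: int, rate_per_m: float) -> float:
    """Monthly API bill in RMB for a given token volume and rate."""
    return round(tokens / 1_000_000 * rate_per_m, 2)

api_cost = api_monthly_cost(TOKENS_PER_MONTH, API_RMB_PER_M_TOKENS)
print(f"API:     {api_cost:.0f} RMB/month")
print(f"Private: {PRIVATE_RMB_PER_MONTH} RMB/month "
      f"({PRIVATE_RMB_PER_MONTH / api_cost:.0f}x more)")
```

At these quoted figures, private deployment costs roughly 17 times the API route, before counting the maintenance and trial-and-error costs mentioned earlier.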

The explosion of large models has only just begun, but for the long campaign ahead, Alibaba Cloud's infrastructure groundwork has quietly been laid.

Alibaba Cloud's Chain Reaction

In fact, the aggressive price reduction by Alibaba Cloud is not the end of the story.

Just today, hours after Alibaba Cloud's morning price cuts, Baidu announced in the afternoon that two lightweight flagship large models would be free. Although their capabilities do not fully match the discounted models on Alibaba Cloud, the move reads as a rapid response in momentum. This wave of large-model accessibility initiated by Alibaba Cloud will undoubtedly trigger further chain reactions across the industry.

For application innovation, the drop in large-model API costs - to the point of approaching free - is good news. In the past, China was known for its infrastructure mania in the physical economy; "to get rich, build roads first" became a household saying. The economic law underneath is a delicate seesaw between the price of infrastructure and the total amount of innovation in society:

Only when the price of infrastructure decreases, innovation will spread like mature dandelions, leveraging trends to sow seeds far and wide. This was the case in the physical economy in the past, and it is hoped that the AI era will follow suit.

*Image source: Visual China

This article is an original piece by Geek Park. For reprints, please contact Geek Park's WeChat account geekparkGO.

Last week was a crazy week in the AI industry. OpenAI and Google successively released GPT-4o, Project Astra, and other "AI toolkits"; ByteDance's "Doubao" large model family and Tencent's Hunyuan large model made their debuts. From international large-model stars to internet giants, everyone is sprinting toward AI. Yet for small and medium-sized companies, one-click AI deployment solutions - like "moving to the cloud" back in the day - are still rare.

On Wednesday, May 22nd, at 20:00, Geek Park's founder & CEO, Zhang Peng, will have a conversation with the founder & CEO of Matrix Origin, Wang Long, discussing how traditional IDCs are transitioning to AIDCs and why the path to AGI cannot avoid "data + computing power."

Feel free to reserve a spot for the live broadcast~