
Breaking News: Tongyi Qianwen's GPT-4-Level Large Model Drops 97% in Price - 2 Million Tokens for Just 1 Yuan

Thu, May 23 2024 08:26 AM EST

Tongyi Qianwen's GPT-4-level large model is now available at an unprecedented discount!

Just now, Alibaba made a surprise move, announcing price cuts on 9 of its Tongyi large models.

Among them, the flagship model Qwen-Long, which rivals GPT-4 in performance, has had its API input price cut from 0.02 yuan per thousand tokens to 0.0005 yuan per thousand tokens. That means 1 yuan now buys 2 million tokens, roughly the text volume of five copies of the Xinhua Dictionary, making it the cost-effectiveness king among large models worldwide.

For a more direct comparison:

Qwen-Long supports long-text input of up to 10 million tokens, at just 1/400th the price of GPT-4.

Newly added to the price-cut list is the recently released Qwen-max, whose API input price has been slashed by 67%, to as low as 0.04 yuan per thousand tokens.

In terms of open source, the input prices of 5 open source models including Qwen1.5-72B and Qwen1.5-110B have also been reduced by over 75%.

This move once again breaks through the lowest price on the entire web, a 618 shopping-festival frenzy reserved for large model companies and programmers.

1 yuan for 2 million tokens

Let's take a closer look at the specific price reductions.

This round of cuts covers a total of 9 models from the Tongyi Qianwen series, including both commercial and open-source models.

They are:

  • Qwen-Long, performance comparable to GPT-4, with API input prices reduced from 0.02 yuan per thousand tokens to 0.0005 yuan per thousand tokens, a decrease of 97%; API output prices reduced from 0.02 yuan per thousand tokens to 0.002 yuan per thousand tokens, a decrease of 90%.

  • Qwen-max, matching the performance of GPT-4-turbo on the authoritative benchmark OpenCompass, with API input prices reduced from 0.12 yuan per thousand tokens to 0.04 yuan per thousand tokens, a decrease of 67%.

  • Among the open-source models, the Qwen1.5 series, which has appeared on large-model arena leaderboards, saw the API input price of Qwen1.5-72B reduced from 0.02 yuan per thousand tokens to 0.005 yuan per thousand tokens, a decrease of 75%, and its API output price reduced from 0.02 yuan per thousand tokens to 0.01 yuan per thousand tokens, a decrease of 50%.

Compared with OpenAI's GPT series, the discounted Tongyi Qianwen series is practically a steal.
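To make the arithmetic concrete, here is a quick sanity check of the figures above. The GPT-4 list price and the exchange rate used in the comparison are assumptions for illustration, not numbers from the article:

```python
# Sanity check of the price figures above (all prices per 1,000 tokens).
QWEN_LONG_OLD = 0.02     # yuan, Qwen-Long input price before the cut
QWEN_LONG_NEW = 0.0005   # yuan, Qwen-Long input price after the cut
GPT4_INPUT_USD = 0.03    # dollars, assumed GPT-4 list input price
CNY_PER_USD = 7.2        # assumed exchange rate, for illustration only

print(f"input price cut: {1 - QWEN_LONG_NEW / QWEN_LONG_OLD:.1%}")   # 97.5%
print(f"tokens per yuan: {1_000 / QWEN_LONG_NEW:,.0f}")              # 2,000,000
ratio = (GPT4_INPUT_USD * CNY_PER_USD) / QWEN_LONG_NEW
print(f"GPT-4 input costs ~{ratio:.0f}x as much under these assumptions")
```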

Take Qwen-Long, the model with the biggest price drop. It costs only 1/400th as much as GPT-4, yet its performance is just as impressive. In particular, for long texts, Qwen-Long supports ultra-long context of up to 10 million tokens, which means it can comfortably handle roughly 15 million characters, or about 15,000 pages of documents. Together with the document service launched alongside it, it can also parse and converse over documents in formats such as Word, PDF, Markdown, EPUB, and MOBI.

Notably, unlike most domestic vendors, which price input and output the same, Qwen-Long cut its input price far more steeply than its output price.

Alibaba officials have provided an explanation for this:

Nowadays, querying large models with long texts (papers, documents, and so on) attached has become one of the most common demands, so a model's input volume is often far greater than its output volume.

Alibaba goes big right off the bat

Speaking of which, this isn't the first time Alibaba Cloud has broken through industry bottom prices.

Just this past February 29, Alibaba Cloud held a major event dubbed "Crazy Thursday": prices across its cloud product line were cut by 20%, with the largest reduction reaching 55%.

That is a significant cut indeed. Behind such a bold move, the confidence of Alibaba Cloud, China's largest public cloud provider, stems from long-term technical accumulation and economies of scale, which have given it comprehensive AI infrastructure and a foundational technology advantage.

Behind the generous price cuts is the arrival of the era of large model applications, in which this technology dividend has become one of the "killer features" of public cloud providers.

At the AI infrastructure level, from the chip layer up to the platform layer, Alibaba Cloud has built a highly elastic AI compute scheduling system on core technologies and products such as self-developed heterogeneous chip interconnects, the high-performance network HPN7.0, the high-performance storage system CPFS, and the PAI artificial intelligence platform.

For example, PAI supports clusters of up to 100,000 accelerator cards, with linear scaling efficiency of 96% for ultra-large-scale training. For large model training tasks, it can achieve the same results with over 50% fewer computing resources, performance that is at a globally leading level.

In terms of inference optimization, Alibaba Cloud mainly provides three capabilities:

Firstly, high-performance optimization, including system-level inference optimization as well as high-performance operators, efficient inference frameworks, and compilation optimization.

Secondly, adaptive tuning. As AI applications diversify, a single fixed setup can hardly stay optimal across all scenarios. Adaptive inference lets a model dynamically adjust which inference techniques it applies and which computing resources it selects, based on the characteristics of the input data and the constraints of the computing environment (a toy sketch of this idea follows the third point below).

Thirdly, scalable deployment. Elastic scaling of inference deployment resources can absorb the tidal, peak-and-trough pattern that inference traffic exhibits over time.
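To illustrate the adaptive-tuning idea in the second point, here is a deliberately simplified sketch. It is not Alibaba Cloud's implementation; the thresholds, configuration fields, and instance names are all made up for illustration:

```python
from dataclasses import dataclass

@dataclass
class InferenceConfig:
    batch_size: int
    quantize_kv_cache: bool   # trade a little accuracy for memory on long prompts
    instance_type: str        # hypothetical instance class names

def choose_config(prompt_tokens: int, latency_budget_ms: int) -> InferenceConfig:
    """Pick an inference configuration from request characteristics
    instead of using one fixed setup for every workload."""
    if prompt_tokens > 32_000:
        # Very long prompts: memory pressure dominates, so favor a
        # quantized KV cache on a high-memory instance.
        return InferenceConfig(1, True, "high-memory-gpu")
    if latency_budget_ms < 200:
        # Tight latency budgets: no batching, fastest instance class.
        return InferenceConfig(1, False, "low-latency-gpu")
    # Default: batch requests together for throughput.
    return InferenceConfig(8, False, "standard-gpu")

print(choose_config(prompt_tokens=50_000, latency_budget_ms=1_000))
```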

Previously, Liu Weiguang, Senior Vice President of Alibaba Cloud Intelligence Group and President of the Public Cloud Business Unit, also stated that the technological dividend and economies of scale of public clouds will bring significant cost and performance advantages.

This will drive "public cloud + API to become the mainstream way for enterprises to call large models."

Mainstream route in the era of large model applications: public cloud + API

This is also the core reason why Alibaba Cloud has once again pushed the "price war" of large models to a climax.

Especially for small and medium-sized enterprises and startup teams, public cloud + API has always been seen as a cost-effective choice for developing large model applications:

Although open-source models are gaining momentum rapidly, high costs still stand in the way of private deployment. For example, with the open-source Qwen-72B model and a monthly usage of 100 million tokens, calling the API directly on Alibaba Cloud's Bailian platform costs only 600 yuan per month, while private deployment costs on average more than 10,000 yuan per month.
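A quick back-of-the-envelope check of those figures, using only the numbers quoted above:

```python
# Monthly cost comparison from the paragraph above (figures as quoted).
monthly_tokens = 100_000_000   # 100 million tokens per month
api_cost_yuan = 600            # Bailian API, per the article
private_cost_yuan = 10_000     # stated lower bound for private deployment

print(f"API cost per 1k tokens: {api_cost_yuan / monthly_tokens * 1_000:.3f} yuan")
print(f"private deployment is at least {private_cost_yuan / api_cost_yuan:.0f}x the API cost")
```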

In addition, the public cloud + API model is also conducive to multi-model calls and can provide enterprise-level data security. For example, Alibaba Cloud can provide enterprises with exclusive VPC environments, achieving computing isolation, storage isolation, network isolation, and data encryption. Currently, Alibaba Cloud has led and deeply participated in the formulation of more than 10 international and domestic technical standards related to large model security.

The openness of cloud providers can also offer developers a wider range of model and toolchain choices. For example, in addition to Tongyi Qianwen, Alibaba Cloud's Bailian platform supports hundreds of domestic and foreign large models such as the Llama series, Baichuan, and ChatGLM, while also providing a one-stop development environment for large model applications, enabling the development of a large model application in 5 minutes and the construction of an enterprise-level RAG application in 5 to 10 lines of code.
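As a sketch of what the "public cloud + API" pattern looks like from a developer's seat, here is a minimal call to a hosted model through an OpenAI-compatible client. The base URL, model name, and environment variable are assumptions; check the Bailian/DashScope documentation for the exact values:

```python
import os
from openai import OpenAI

# Assumed OpenAI-compatible endpoint and credentials; verify against
# the official Bailian/DashScope documentation before use.
client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],  # assumed variable name
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",  # assumed endpoint
)

response = client.chat.completions.create(
    model="qwen-long",  # assumed model identifier
    messages=[
        {"role": "user", "content": "Summarize this document in three sentences."},
    ],
)
print(response.choices[0].message.content)
```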

According to the Quantumbit Think Tank's "China AIGC Application Panorama Report," products based on self-built vertical large models and API access account for nearly 70% of AIGC application products.

This figure indirectly confirms the market potential of the "public cloud + API" model: in the application market, understanding the business and accumulating data are the keys to breaking through, and building applications on top of public cloud + API is the more realistic choice in terms of cost and time to deployment.

In fact, whether it's the visible price war or the deeper contest over AI infrastructure, both reflect the same shift: as the focus of large models moves from base models to practical applications, how platform companies lower the barriers to using large models has become the key point of competition.

Liu Weiguang pointed out:

In summary: on the one hand, for platform companies, the "price war" is really a competition of infrastructure and technical capability; on the other hand, for the large model industry as a whole, whether applications can continue to flourish and spread, and how far entry barriers and operating costs can come down, have become the critical questions.

In this light, the recent trend of price reductions is undoubtedly good news for developers and enthusiasts looking forward to more applications of large models.

What do you think?

— End —