
"Turing-QA" Open Source Model Dominates with 110 Billion Parameters, Ranking First Globally in Chinese Language Capabilities

Wed, May 01 2024 08:23 AM EST

Author | Coconut    Email | [email protected]

How hot an open-source model is can be measured by how quickly products across the ecosystem move to support it.

On April 26, Tongyi Qianwen wasted no time in open-sourcing its latest 110-billion-parameter model, Qwen1.5-110B, setting a new performance benchmark for open-source models. Within 24 hours of the release, Ollama had added support for the 110B model, which means that besides trying demos on the ModelScope community and Hugging Face, you could deploy it on your own machine from day one.

Cloud deployment platforms such as SkyPilot jumped on the Qwen1.5 train just as quickly. Across the open-source large-model community, Llama used to be the only bandwagon everyone wanted to ride; after almost half a year of open-source releases, the Qwen series is steadily closing in on Llama's position in the ecosystem.

On the day of its release, Qwen1.5-110B briefly claimed the top spot on Hacker News. The last time there was this much buzz was when Tongyi Qianwen first went open source in August last year. The focus of the discussion, however, has shifted from "What is this?" to serious debate over "How powerful is this?", and the skeptical noise has faded as Qwen's capabilities have grown.

Some netizens praised Qwen1.5-110B's summarization and information-extraction abilities, judging it more effective than Llama 3. Others express their affection in rougher terms.

Qwen1.5-110B is the first model in the Qwen series to reach the 100-billion-parameter class, and it shows a significant performance improvement over the 72B model in the same series, which has long been a community favorite and a fixture at the top of the charts. The pre-training method has not changed substantially in this release, so the performance boost comes mainly from the increase in model size.
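With Ollama support landing this quickly, a local test drive takes only a few lines. Here is a minimal sketch using the official ollama Python client; the "qwen:110b" model tag is an assumption on my part, so check the Ollama library for the exact tag before running:

```python
# Minimal local chat with Qwen1.5-110B through Ollama.
# Assumes the Ollama daemon is running, `pip install ollama` is done,
# and the library tag "qwen:110b" points at this release (verify the tag).
import ollama

response = ollama.chat(
    model="qwen:110b",  # assumed tag; substitute whatever `ollama list` shows
    messages=[{"role": "user", "content": "用三句话介绍一下你自己。"}],
)
print(response["message"]["content"])
```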

Like the other Qwen1.5 models, Qwen1.5-110B uses the same Transformer decoder architecture with Grouped Query Attention (GQA). It supports a context length of 32K tokens and covers multiple languages, including English, Chinese, French, Spanish, German, Russian, Japanese, Korean, Vietnamese, and Arabic.
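For anyone who wants to work with the weights directly rather than through a hosted demo, the checkpoints load through the standard transformers workflow. A minimal sketch, assuming the Hugging Face repo id Qwen/Qwen1.5-110B-Chat, transformers 4.37 or newer, and hardware with enough memory for a 110B model (or a quantized variant):

```python
# Minimal generation with the Qwen1.5-110B chat weights via transformers.
# Assumes transformers >= 4.37 and accelerate installed for device_map="auto".
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-110B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "你好，请用中文介绍一下你自己。"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```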

In benchmark tests, Qwen1.5-110B surpasses Llama 3 70B on most metrics.

Benchmark scores aside, we are more curious about how Qwen1.5-110B performs in practice, and how it compares with Llama 3 70B. On to the real-world testing.

Qwen1.5-110B VS Llama 3-70B

Let's start with a few fresh, simple questions.


Without any personalized prompt, Qwen1.5-110B's answers are the more logical and informative of the two, and they are correct. Llama 3's responses, by contrast, verge on nonsense: needless elaboration such as pointing out that one and a half hours equals 1.5 hours, plus hallucinations such as an electric car turning into a tricycle. Perhaps to a dimwit, that would count as a correct answer?

Now, let's assess its Chinese comprehension. The correct reading of the test sentence is: I immediately grabbed the "handle" / "handlebar".

Qwen's answer is essentially correct, though it misses the "grabbing the handlebar" sense. Llama 3, meanwhile, seems to think it is a comedian.

Continuing with another round of follow-up questions: Qwen made a thoughtful attempt and nearly answered correctly, while Llama 3 kept up the comedy routine. I couldn't help laughing at its response.

Here is a serious math problem for you:

Old Lady Wang went to the market to sell eggs. The first person bought half of the eggs in the basket plus half an egg, and the second person bought half of the remaining eggs plus half an egg. At this point, there was only one egg left in the basket. How many eggs did Old Lady Wang sell in total?

Their answers: Qwen's reasoning is clear and its answer is correct. Llama 3's overall process is sound, but it slips when solving the linear equation. In terms of approach, Qwen works backwards from the final egg, which is rather clever, while Llama 3 sets up an equation the way an elementary-school student would; to be fair, most schoolchildren would solve it Llama 3's way.
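For the record, the puzzle does reward Qwen's backward reasoning: one egg left after the second sale means (x - 3) / 4 = 1, so the basket started with 7 eggs and 6 were sold. A quick brute-force check (plain arithmetic, nothing assumed beyond the problem statement):

```python
# Brute-force check of the egg puzzle: find the starting count where two
# rounds of "half the eggs plus half an egg" leave exactly one egg behind.
for start in range(1, 101):
    after_first = start - (start / 2 + 0.5)               # first buyer takes half + 0.5
    after_second = after_first - (after_first / 2 + 0.5)  # second buyer does the same
    if after_second == 1:
        print(f"started with {start}, sold {start - 1}")  # prints: started with 7, sold 6
```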

Without clearing the chat history, when we switched to Korean, Llama 3 kept its earlier habit of answering in Chinese, while Qwen switched to answering in Korean.

Qwen1.5-110B outperformed Llama 3 70B across these test questions. That is not to say Llama 3 is inadequate, but in the Chinese-language arena it is fair to call Qwen1.5-110B the strongest open-source model.
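That history-carrying behavior is easy to reproduce: a multi-turn chat simply replays the earlier turns in the messages list, so the model must decide whether to follow the conversation's old language or the new one. A minimal sketch of such a test, reusing the ollama client and the assumed "qwen:110b" tag from earlier (the conversation content here is hypothetical):

```python
# Reproduce the language-switching test: keep the Chinese history, then ask in Korean.
# Assumes the same ollama client and (assumed) "qwen:110b" tag as above.
import ollama

history = [
    {"role": "user", "content": "王婆婆一共卖了多少个鸡蛋？"},        # earlier Chinese turn
    {"role": "assistant", "content": "她一共卖了6个鸡蛋。"},         # model's earlier answer
    {"role": "user", "content": "그럼 바구니에 몇 개가 남았나요?"},   # new question, in Korean
]
response = ollama.chat(model="qwen:110b", messages=history)
print(response["message"]["content"])  # a well-behaved model should reply in Korean
```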

Carry Open Source to the End

On Hugging Face, the Qwen-series models have held top spots in popularity ever since going open source. With the arrival of version 1.5 and the 72B and 110B large-parameter models, Qwen has become, alongside Llama, one of the most eye-catching open-source model families, and in Chinese specifically it is essentially unrivaled across the entire internet.

Since August last year, Tongyi Qianwen's open-source cadence has been relentless. Following the release of the Qwen1.5 series in early February, it has shipped open-source models in 10 parameter sizes within three months, including 8 large language models plus Code-series and MoE models. Toward the end of last year, Tongyi Qianwen also open-sourced two multimodal models: the visual-understanding model Qwen-VL and the audio-understanding model Qwen-Audio.

Counting every deployed and fine-tuned variant, there are already 76 Qwen-series models on Hugging Face; Mistral and Llama, by comparison, are in the single digits. Qwen is genuinely the workhorse of the open-source community.

The hard work has paid off: Qwen-series models have been downloaded more than 7 million times in just half a year, and it is easy to stumble across Qwen-based models and applications on Hugging Face and ModelScope.

For many developers and businesses, the Qwen series, spanning 0.5B to 110B parameters, offers the most practical range of model sizes to choose from. Recently, the Tongyi models have announced a steady stream of customer partnerships, with institutions and companies such as the National Astronomical Observatories of the Chinese Academy of Sciences, New Oriental, Tongcheng Travel, and Changan Automobile signing on. The National Astronomical Observatories have built a new-generation astronomical large model, "Xingyu 3.0", on the Tongyi Qianwen open-source models, the first time a large model in China has been applied to astronomical observation.

As model capabilities converge, the debate between open source and closed source has become more meaningful. Compared with closed-source models that aim for self-contained commercialization, the open-source track opens up a different kind of imagination, where anything is possible.

Open source only makes sense when people use it and discuss it.

From this perspective, the Qwen series has become one of the most successful open-source products in China.