
Llama 3: The Unrivaled Open Source Superpower

Ke Lei Xi / Yu Yang Sat, Apr 27 2024 08:24 PM EST

Introducing Llama 3!

Today, an announcement on the Meta website heralded the arrival of Llama 3 in its 8 billion and 70 billion parameter versions. And the release is the open-source SOTA:

According to Meta's official data, both the 8B and 70B versions of Llama 3 surpass their competitors at the respective parameter scales.

The 8B model outperforms Gemma 7B and Mistral 7B Instruct on benchmarks such as MMLU, GPQA, and HumanEval.

Meanwhile, the 70B model surpasses the closed-source sensation Claude 3 Sonnet and trades blows with Google's Gemini Pro 1.5.

The release of the Hugging Face link once again sparked excitement in the open-source community.

Sharp-eyed enthusiasts quickly spotted something intriguing:

Meta has also quietly revealed a Llama 3 version with more than 400 billion parameters, still in training, that rivals the colossal Claude 3 Opus!

HyperWriteAI CEO Matt Shumer, whose startup specializes in AI writing assistants, couldn't help but remark:

"We are entering a new era, a world where a GPT-4 level model is open-source and freely accessible." S547ad8a8-ee7f-45bb-bcf4-a50b464c1c4b.png NVIDIA scientist Jim Fan believes that the still-in-training Llama 3 400B will serve as a watershed for open-source large models, altering the trajectory of academic research and startup development in significant ways. S22253d44-e261-485e-b071-e2d94bee364b.png Major cloud and chip vendors have promptly responded. Companies like Baidu, Amazon, Intel, and Wuwen Chip Universe have all rolled out support for Llama 3.

Achieving state-of-the-art performance, but with an 8K window

Meta provides more technical details in a blog post.

Architecturally, Llama 3 opts for the classic decoder-only Transformer and employs a tokenizer with a 128K-token vocabulary.
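
As a quick sanity check of that vocabulary figure, the tokenizer can be pulled from Hugging Face and inspected directly. The following is a minimal sketch, assuming the transformers library is installed and access has been granted to the gated meta-llama/Meta-Llama-3-8B repository:

    # Minimal sketch: inspect the Llama 3 tokenizer via Hugging Face transformers.
    # Assumes `transformers` is installed and access to the gated
    # "meta-llama/Meta-Llama-3-8B" repository has been approved.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

    # The vocabulary should come out to roughly 128K entries.
    print(len(tokenizer))

    # Tokenize a short prompt to see the pieces the model actually consumes.
    ids = tokenizer("Llama 3 uses a 128K-token vocabulary.")["input_ids"]
    print(ids)
    print(tokenizer.convert_ids_to_tokens(ids))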

For training, Meta used a cluster of 24,000 GPUs. The Llama 3 training dataset comprises 15 trillion tokens, all sourced from publicly available data, with about 5% being non-English data covering more than 30 languages.

Llama 3's training data is seven times larger than Llama 2's and contains four times as much code.

Furthermore, to improve inference efficiency, Meta AI also adopted grouped query attention (GQA), trained the model on sequences of 8,192 tokens, and used masking to ensure self-attention does not cross document boundaries.

As a result, in both the 8B and 70B versions, Llama 3 makes a significant leap over the similarly sized previous-generation Llama 2.
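
To illustrate the grouped query attention idea mentioned above, here is a minimal, self-contained PyTorch sketch in which several query heads share each key/value head, which is what shrinks the KV cache and speeds up inference. The head counts and tensor sizes are illustrative only, not Llama 3's actual configuration:

    # Minimal GQA sketch: 8 query heads share 2 key/value heads (4 per group).
    # Sizes here are illustrative, not Llama 3's real hyperparameters.
    import torch
    import torch.nn.functional as F

    batch, seq_len, d_model = 1, 8, 64
    n_q_heads, n_kv_heads = 8, 2
    head_dim = d_model // n_q_heads

    x = torch.randn(batch, seq_len, d_model)
    q_proj = torch.nn.Linear(d_model, n_q_heads * head_dim, bias=False)
    k_proj = torch.nn.Linear(d_model, n_kv_heads * head_dim, bias=False)
    v_proj = torch.nn.Linear(d_model, n_kv_heads * head_dim, bias=False)

    q = q_proj(x).view(batch, seq_len, n_q_heads, head_dim).transpose(1, 2)
    k = k_proj(x).view(batch, seq_len, n_kv_heads, head_dim).transpose(1, 2)
    v = v_proj(x).view(batch, seq_len, n_kv_heads, head_dim).transpose(1, 2)

    # Repeat each KV head so every group of query heads attends to shared K/V;
    # only the small K/V tensors ever need to be cached.
    group = n_q_heads // n_kv_heads
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)

    # Causal self-attention over the expanded heads.
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    print(out.shape)  # (batch, n_q_heads, seq_len, head_dim)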

At both parameter scales released so far, 8B and 70B, Llama 3 has emerged as the new state-of-the-art (SOTA) model.

Across capabilities such as language (MMLU), knowledge (GPQA), programming (HumanEval), and mathematics (GSM-8K, MATH), Llama 3 is nearly unparalleled among models of similar scale.

In addition to these standard benchmarks, Meta AI also evaluated Llama 3's performance in real-world scenarios and developed a high-quality test set for that purpose.

This test set comprises 1,800 prompts covering 12 key use cases, including coding, reasoning, writing, and summarization, and it is kept confidential even from Meta's own modeling teams to prevent overfitting.

As a result, Llama 3 not only significantly outperformed Llama 2 but also surpassed well-known models such as Claude 3 Sonnet, Mistral Medium, and GPT-3.5.

On more advanced and challenging benchmarks such as AGIEval, BIG-Bench, and ARC-Challenge, Llama 3's performance is equally impressive.

The 8B version outperforms Mistral and Gemma on these tasks, while the 70B version surpasses Gemini Pro and the MoE-based Mixtral, securing SOTA at the corresponding scale.

The downside, however, is that Llama 3's context window is only 8K tokens, which feels a generation behind today's large models, whose context windows routinely reach hundreds of thousands or even millions of tokens.

But there is no need to worry excessively. Matt Shumer remains optimistic, believing that with the efforts of the open-source community, the window length will soon grow.

Llama 3 Gets an Official Web Version

Currently, both the base and Instruct versions of Llama 3 are available for download on Hugging Face.
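
For readers who want to try the weights locally, the Instruct checkpoint can be run through the Hugging Face transformers pipeline. The sketch below is one possible setup, assuming access to the gated meta-llama/Meta-Llama-3-8B-Instruct repository, a recent transformers release that accepts chat-style message lists, and a GPU with enough memory for the 8B model:

    # Minimal sketch: chat with Llama 3 8B Instruct via the transformers pipeline.
    # Assumes gated-repo access, a recent transformers version, and enough GPU memory.
    import torch
    from transformers import pipeline

    chat = pipeline(
        "text-generation",
        model="meta-llama/Meta-Llama-3-8B-Instruct",
        model_kwargs={"torch_dtype": torch.bfloat16},
        device_map="auto",
    )

    messages = [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize what makes Llama 3 notable."},
    ]

    # Recent pipeline versions apply the chat template automatically and return
    # the full conversation, with the assistant's reply as the last message.
    outputs = chat(messages, max_new_tokens=128)
    print(outputs[0]["generated_text"][-1]["content"])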

Additionally, cloud service platforms such as Microsoft Azure, Google Cloud, Amazon AWS, NVIDIA NIM, and others will gradually roll out Llama 3.

Meta also indicates that Llama 3 will receive hardware platform support from multiple manufacturers, including Intel, NVIDIA, AMD, and Qualcomm.

It's worth mentioning that alongside the base models, an official web version built on Llama 3 was released, called Meta AI.

The platform currently offers two main functions: conversation and drawing. The conversation feature can be used right away without registering or logging in, while the drawing function requires logging in to an account first.

For now, however, the platform does not support Chinese and has not yet launched features such as text uploading.

As for code, the platform can also run some simple Python programs, but it appears to output text only; tasks involving graphics, such as drawing plots, are not supported.

Overall, this web version still seems rather rudimentary, but there is room for updates down the line.

One More Thing

A little anecdote: a few hours before Meta's official announcement, Microsoft's Azure marketplace had already leaked the Llama 3 8B Instruct version.

Netizens also quickly dug up a Llama 3 price list on Replicate, the online machine-learning platform for running open-source models.

But soon, those "rumors" were all 404'd.

Luckily, the mystery didn't last long, and the official release didn't drag its feet. For those interested in open-source large models, it's time to get busy (doge).