
April Sees a Trifecta of AI Chip Releases! NVIDIA Unfazed by Competition

By Xiao Lan Mao | Thu, Apr 18 2024 08:31 AM EST

April witnessed a triple release of AI chips!

Intel kicked things off on April 9th at its Vision 2024 event with the unveiling of the new-generation Gaudi 3 AI chip. On the same day, at the Cloud Next 2024 conference, Google Cloud unveiled its first Arm-architecture CPU designed specifically for data centers, the Google Axion. The following day, April 10th, Meta officially announced the next generation of its in-house AI chip, MTIA.

Among the three, Intel's Gaudi 3 competes directly with NVIDIA's H100. According to Intel, Gaudi 3 trains AI models about 40% faster and runs inference about 50% faster than the H100; Intel also cites a roughly 50% average performance gain overall and a 40% improvement in energy efficiency. Crucially, Gaudi 3 is also cheaper than the H100, offering stronger performance at a more competitive price.

Intel's official data suggests Gaudi 3 holds up even against NVIDIA's H200 GPU: in several Llama-7B and Llama-70B scenarios its performance is essentially on par with the H200, with differences typically within 10%.

Intel also discussed the chip's manufacturing process at Vision 2024 and plans to ship Gaudi 3 to customers in the third quarter of this year. OEMs such as Lenovo, HP, Dell, and Supermicro will all use the new product to build systems.

Even so, Gaudi 3 will have a hard time shaking NVIDIA's grip on the AI market, a challenge it shares with AMD's Instinct MI300.

According to a research note from Bank of America analyst Vivek Arya, NVIDIA's share of the AI accelerator market will exceed 75% in 2024, while custom chips (such as Google's TPU, Amazon's Trainium/Inferentia accelerators, and Microsoft's Maia) will take 10-15%. The remaining 10-15% goes to AMD, Intel, and other vendors.

Although custom chips' current market share is modest, nearly every major service provider is accelerating its own silicon efforts, and Google is no exception.

At the Cloud Next 2024 conference on April 9th, Google Cloud unveiled its first Arm-architecture CPU designed specifically for data centers, the Google Axion. Compared with the latest generation of comparable x86 processors, Axion delivers up to 50% higher performance and up to 60% better energy efficiency.

The Axion CPU already powers Google services such as YouTube ads and Google Earth Engine. Google Cloud says Axion is built on an open architecture, so customers already using Arm technology can adopt it without refactoring their applications.

Google Cloud customers will be able to use the Axion CPU in Compute Engine, Kubernetes Engine, Dataproc, Dataflow, Cloud Batch, and other cloud services, with availability planned for later this year.

In addition, Google Cloud has launched its next-generation AI accelerator, TPU v5p. A single TPU v5p pod contains 8,960 chips, more than twice the 4,096 chips of the previous TPU v4 pod.

TPU v5p is primarily used for training the largest and most demanding generative AI models. Google Cloud does not directly sell Axion CPU and TPU v5p chips to the public, but provides them to enterprise customers as cloud services. This approach not only reduces dependence on external suppliers like Intel and NVIDIA but also allows better hardware optimization to meet specific business needs, providing customers with more competitive cloud computing and AI services.

Meta commands fewer AI computing resources than Google, but its investment in AI is substantial. Earlier reports indicated that Meta has purchased 350,000 NVIDIA H100 GPUs, each costing tens of thousands of dollars, greatly boosting its AI computing power and lending strong support to its research toward Artificial General Intelligence (AGI).

Meta plans to grow its computing infrastructure to the equivalent of "nearly 600,000 H100s" in computational power. Besides buying GPUs, the other route is in-house development. Alexis Bjorlin, Meta's Vice President of Infrastructure, has said that in-house hardware lets the company control the entire technology stack, from data center design to training frameworks, a vertical integration crucial for AI research breakthroughs.

Last May, Meta announced its first-generation AI inference accelerator, MTIA v1, and it has now released the next-generation product. The new MTIA chip is built on a 5nm process, packs more processing cores, raises power consumption from 25W to 90W, and lifts the clock frequency from 800MHz to 1.35GHz.

Meta says the new MTIA chips are already deployed in 16 data centers and deliver three times the overall performance of MTIA v1, a figure Meta notes comes from testing four key models on both chip generations.

According to Meta, the new MTIA's design philosophy is to strike an ideal balance between compute, memory bandwidth, and memory capacity, an improvement that not only raises the chip's performance but also makes inference workloads run more smoothly.

Tech giants develop custom chips in-house to match their specific needs while weighing security and cost-effectiveness. NVIDIA's H100, for instance, is not only expensive but also supply-constrained, underscoring how heavily AI development depends on computing power.

Hence the growing trend of in-house chip development: Meta now joins Amazon (AWS), Microsoft, and Google parent Alphabet in trying to reduce reliance on expensive third-party solutions.

However, this hasn't significantly dented the industry's massive demand for NVIDIA's AI accelerators. Riding the AI boom, NVIDIA has become the world's third-largest tech company by market value, trailing only Microsoft and Apple.

In the fiscal year 2024, its sales to data center operators totaled $47.5 billion, up from $15 billion the previous year. Analysts predict this figure will double in fiscal year 2025, with the company's position likely to further solidify in the coming years.

AI Is Reshaping the Focus of PC Processor Upgrades

The flourishing development of AI technology is not only changing the direction of server-side chips but also profoundly influencing the evolution of personal computer (PC) processors. Since Apple's M-series chips first integrated Neural Processing Units (NPUs), other manufacturers have followed suit.

AMD has started incorporating NPUs from the Ryzen 7000 series laptop processors onwards. Intel, on the other hand, explicitly includes NPU as one of the key indicators when introducing the concept of an "AI PC."

NPUs are processors optimized specifically for artificial intelligence and machine learning workloads. Unlike general-purpose CPUs and GPUs, their hardware is structured for the efficient execution of AI-related computing tasks, such as neural network inference.

For decades, PC processor development revolved mainly around raising CPU performance. In the AI era, however, the explosive growth of AI technologies has forced chip makers to pour serious effort into AI capability, and features like Microsoft's Windows Copilot are raising the bar for on-chip AI performance.

To meet this demand, AMD plans to significantly boost NPU performance in its upcoming Strix Point APU, even at the cost of some CPU and GPU cache space. Intel, meanwhile, is investing heavily in next-generation chips such as Arrow Lake, Lunar Lake, and Panther Lake, aiming to lift NPU computing power to around 35 TOPS, 105 TOPS, and 140 TOPS respectively.

AI processing is clearly becoming the new focal point for chip makers. NPUs are shifting from auxiliary to core silicon and may replace CPU and GPU performance as the main axis of PC processor upgrades, a trend that reflects how fundamentally AI is reshaping the technical architecture of the PC ecosystem.
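To give those TOPS figures some intuition, here is a rough back-of-envelope sketch (our illustration, not from the article or any vendor). It assumes a transformer needs roughly two operations per parameter per generated token and that the NPU sustains its full rated throughput; real chips are usually memory-bandwidth bound, so actual numbers would be far lower.

```python
# Back-of-envelope: what a given TOPS rating could mean for local LLM
# inference. Purely illustrative assumptions, not vendor data:
#   - a transformer needs ~2 operations per parameter per generated token
#   - perfect utilization (real chips are memory-bandwidth bound, so
#     actual throughput is far lower)

def peak_tokens_per_second(npu_tops: float, params_billion: float) -> float:
    ops_per_token = 2 * params_billion * 1e9   # ~2 ops/parameter/token
    return npu_tops * 1e12 / ops_per_token

# NPU ratings cited in the article for Intel's upcoming chips
for name, tops in [("Arrow Lake", 35), ("Lunar Lake", 105), ("Panther Lake", 140)]:
    print(f"{name}: ~{peak_tokens_per_second(tops, 7):,.0f} tokens/s "
          f"theoretical ceiling for a 7B-parameter model")
```

Even as a crude upper bound, the exercise shows why NPU throughput, not CPU clock speed, is becoming the headline number for on-device AI.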

The Rise of the SoC

The inclusion of NPUs is also reshaping chip design and manufacturing. SoCs (systems-on-chip) have long been prevalent in the mobile industry, and this integrated design approach is now beginning to permeate PC chip design.

The advantage of SoC design lies in integrating functional units such as the CPU, GPU, and NPU on a single chip, allowing memory and processors to be coupled more tightly to improve data transfer speeds and overall system performance. Apple's M-series chips exemplify this: tightly integrated memory dramatically raises memory bandwidth.

Intel's Core Ultra and AMD's Ryzen 8000 series processors both adopt SoC designs, underlining the trend. These next-generation processors integrate functional units like the CPU, GPU, and NPU, most of them connected directly to onboard memory, further enhancing system performance.
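As a rough illustration of why that integration matters (a toy model with made-up numbers, not vendor specifications), consider the staging copy a discrete accelerator must make over PCIe before it can start computing, a step a unified-memory SoC skips entirely:

```python
# Toy latency model: discrete accelerator vs. unified-memory SoC.
# All figures are illustrative assumptions, not measured values:
#   PCIE_GBPS    - assumed host-to-device link speed (GB/s)
#   workload_gb  - data the accelerator must see before computing
#   compute_s    - time the accelerator spends computing

PCIE_GBPS = 32.0  # roughly PCIe 4.0 x16-class bandwidth

def total_time(workload_gb: float, compute_s: float, needs_copy: bool) -> float:
    # A unified-memory design shares one pool, so no staging copy is needed.
    copy_s = workload_gb / PCIE_GBPS if needs_copy else 0.0
    return copy_s + compute_s

workload_gb, compute_s = 8.0, 0.5
print(f"discrete GPU : {total_time(workload_gb, compute_s, True):.2f} s")
print(f"unified SoC  : {total_time(workload_gb, compute_s, False):.2f} s")
```

The model is deliberately simplistic, but it captures the design trade: SoCs spend transistors on integration rather than raw peak speed, and win back time on data movement.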

However, SoC design still has limitations on the desktop. Desktop processors typically demand upgradability, and an SoC architecture makes later hardware upgrades harder. Intel and AMD therefore pursue a dual-track strategy, using SoC designs for laptop chips while retaining traditional processor-plus-chipset setups elsewhere, to serve different market needs.