
Tripling Energy Efficiency in Three Years! AMD Unveils Bold Chip Plan to Challenge NVIDIA

Mon, May 27 2024 08:12 AM EST

Image Source: AI Revolution

Recently, at the ITF World 2024 conference held in Belgium, AMD Chair and CEO Lisa Su was awarded the IMEC Innovation Award in recognition of her achievements in industry innovation and leadership. Past recipients of this prestigious award include Gordon Moore (who proposed the famous Moore's Law) and Bill Gates.

Image Source: tomshardware

During her acceptance speech, Lisa Su revealed AMD's ambitious plan for the next three years: AMD is striving to deliver a 30-fold increase in computing energy efficiency over 2020 levels by 2025, and, building on that, a 100-fold increase over 2020 levels by 2027.

Image Source: AMD
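To put these targets in perspective, the average yearly gain they imply can be sketched as follows. This is a rough illustration assuming a 2020 baseline and simple compounding; AMD's exact metric and workloads are not specified in the talk.

```python
# Implied average annual efficiency gain behind AMD's stated goals
# (back-of-the-envelope only; assumes a 2020 baseline and steady compounding).

def annual_gain(total_factor: float, years: int) -> float:
    """Average yearly multiplier needed to reach `total_factor` over `years`."""
    return total_factor ** (1 / years)

goal_2025 = annual_gain(30, 2025 - 2020)   # 30x over 5 years
goal_2027 = annual_gain(100, 2027 - 2020)  # 100x over 7 years

print(f"30x by 2025 implies ~{goal_2025:.2f}x per year")   # ~1.97x per year
print(f"100x by 2027 implies ~{goal_2027:.2f}x per year")  # ~1.93x per year
```

In other words, both goals amount to roughly doubling efficiency every year, sustained for the better part of a decade.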

Computational efficiency, in simple terms, measures how effectively a computer converts energy into useful computational work. While it may seem less prominent than headline parameters such as raw compute and core count, it is in fact a composite reflection of core performance, power management, process technology, and other engineering factors.

Higher computational efficiency lets a system do more work within the same power budget. As early as 2014, AMD set out a plan called "25x20," aiming to increase the energy efficiency of its processors, graphics cards, and other products by 25 times over a six-year period.

The outcome of that plan was the Zen and RDNA architectures we know today. Thanks to the outstanding performance of these two architectures, AMD not only met its goal in 2020 but exceeded it, achieving a 31.77-fold increase in energy efficiency.

Why does AMD keep making computational efficiency one of its core goals? To answer that, consider today's AI computing demands and what improved efficiency actually buys.

Racing Towards Supercomputing Centers

It is widely recognized that AI has become the largest and most important source of demand propelling the semiconductor industry forward. Recently, NVIDIA, the leading semiconductor company of the AI era, briefly reached a market value of $2.62 trillion, surpassing the combined market value of all listed companies in Germany.

The reason for NVIDIA's skyrocketing market value is its dominant strength in AI computing hardware: the world's top professional compute cards all come from NVIDIA. Beyond mainstream chips like the H100 and H200, NVIDIA recently introduced the Blackwell-based B200 and GB200, a single one of which delivers computing power equivalent to an entire supercomputer of the past.

Image Source: NVIDIA

Of course, great computing power comes at a cost. The H100's TDP reaches 700W, while the latest GB200 goes as high as 2,700W. In NVIDIA's reference design, a single GB200 NVL72 rack houses up to 36 GB200 superchips, for a combined maximum draw of 97,200W from the chips alone, not counting the power consumption of the accompanying hardware.
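The rack-level figure follows directly from the per-chip numbers. A minimal sketch, assuming the article's figures of 36 GB200 superchips per NVL72 rack at 2,700W each:

```python
# Rack-level power from per-chip TDP (chips only; excludes networking,
# cooling, and other supporting hardware in the rack).

CHIPS_PER_RACK = 36      # GB200 superchips in one NVL72 rack
WATTS_PER_CHIP = 2_700   # maximum per-superchip power draw

total_watts = CHIPS_PER_RACK * WATTS_PER_CHIP
print(f"Chip power per rack: {total_watts:,} W (~{total_watts / 1000:.1f} kW)")
# Chip power per rack: 97,200 W (~97.2 kW)
```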

And this is just the beginning. A supercomputing center is typically composed of many such server racks. Amazon has announced plans to purchase 20,000 GB200s to build a new server cluster, while Microsoft and OpenAI, at the forefront of AI research, recently unveiled an ambitious project called "Stargate."

Reportedly, the project is divided into five phases, with the goal of building the largest supercomputing center in human history. Total investment is estimated at $115 billion, and the finished facility would require billions of watts of electricity. Once "Stargate" is complete, its power consumption alone would rank among the top 20 of major cities worldwide, and it is just one of many computing centers.

In fact, as early as last year, multiple reports pointed out that data-center power consumption is skyrocketing, causing power shortages in some US cities. On the supply side, siting, building, and commissioning a power plant often takes years, and protests from environmental groups can delay the process further.

When energy constraints cannot be relieved quickly, improving computational efficiency is the only way out: extracting more useful computation from every watt-hour in order to sustain larger-scale AI model training. In fact, some believe the slow progress of OpenAI's GPT-5 is largely due to the inability to significantly scale up computing power.

In her speech, Lisa Su also noted that improving computational efficiency helps resolve the conflict between energy supply and computing power, allowing supercomputing centers to be deployed in more locations. Some AI companies envision every city in the future having its own AI supercomputing center to handle needs such as autonomous driving and urban security.

To get there without significantly increasing a city's energy burden, GPUs with higher computational efficiency are the only answer. Moreover, computational efficiency feeds directly into the cost of AI computing, and only by driving that cost down can widespread AI adoption become a reality.

AMD's Bold Plan

Spurred on by NVIDIA, AMD, the only company in the GPU field able to compete with it, has been accelerating the development and market launch of its AI chips, releasing multiple professional compute cards such as the MI300 series and the Alveo V80.

Reportedly, to expedite progress on AI chips, Lisa Su reorganized the GPU team, reassigning large numbers of engineers to AI chip development. This has significantly affected the release schedule of AMD's next generation of consumer graphics cards: the planned flagship was reportedly canceled, leaving only the mid-range models on the roadmap.

With research efforts concentrated, AMD's progress has been rapid. The latest MI300X already surpasses NVIDIA's H100 in performance, boasting 42 petaFLOPs of compute and up to 192GB of memory while drawing roughly the same power as the H100, at just 750W.

Image Source: AMD

With its excellent computing efficiency, the MI300X has attracted market attention, with tech giants like Microsoft, OpenAI, and Amazon submitting purchase orders and AMD's compute-chip shipments surging. Industry analysts forecast that AMD's AI chip shipments in 2024 may reach 10% of NVIDIA's, rising to 30% next year.

According to Lisa Su, AMD has developed several new technologies to enhance chip energy efficiency, such as 2.5D/3D hybrid packaging. With this technology, AMD can pack more transistors and memory into a chip without increasing the package size, reducing the cost of data movement between compute and memory and effectively boosting performance per watt.

Image Source: AMD

In addition, AMD will continue improving its chip architecture, introducing a new, more energy-efficient generation expected as early as 2025 to meet the "30x25" goal (a 30-fold increase in computing efficiency by 2025). Reaching a 100-fold increase by 2027, however, will require progress on many fronts; process upgrades and architectural enhancements alone may not be sufficient.
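As a rough illustration of why the second milestone is the harder one, the implied math (a back-of-the-envelope sketch, not AMD's own accounting) looks like this:

```python
# Additional gain needed between the 30x (2025) and 100x (2027) milestones,
# assuming both are measured against the same 2020 baseline with simple
# compounding.

gap = 100 / 30              # improvement still required after the 2025 goal
per_year = gap ** (1 / 2)   # averaged over the two years from 2025 to 2027

print(f"2025 -> 2027: {gap:.2f}x more efficiency, ~{per_year:.2f}x per year")
```

That is, AMD would need to keep improving efficiency by roughly 1.8x every year after 2025, well beyond what a single process-node step typically delivers.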

It must be said that AMD's plan is quite ambitious. If successful, AMD may once again stand shoulder to shoulder with NVIDIA.

So what is NVIDIA's response? In fact, NVIDIA responded early with the GB200, an unprecedented compute monster that also stands out in efficiency. According to NVIDIA, the GB200's inference performance is 30 times that of the H100, and its computing efficiency is 25 times that of the H100 (factoring in both compute and power consumption).

Clearly, NVIDIA is not lagging behind. In the next three years, regardless of whether AMD can achieve its ambitious hundredfold plan, the AI chip market will witness a revolution.