A year after ChatGPT ignited the craze around large AI models, the fervor shows no signs of waning, and the industry continues to produce disruptive applications. In early 2024, AI PCs, AI phones, and AI edge devices hit the market, and Sora sparked widespread discussion over the recent holiday season.
One could argue that the AI landscape is perpetually evolving. However, with the rapid growth in computing power required by large models, current chip production struggles to keep pace with industry demand.
Amid the AI frenzy, accelerators like GPUs and ASICs take center stage. In reality, CPUs remain indispensable to any data center, as essential to it as water is to fish. In December last year, Intel officially launched its 5th Gen Xeon Scalable processors (codenamed Emerald Rapids), unveiling a trove of AI capabilities.
Is GPU the Only Option for Running AI?
As large models gain momentum, tech giants worldwide have turned their attention to AI chips, particularly GPUs. GPU output, however, is gated by HBM supply, or more precisely by 2.5D advanced-packaging capacity. This bottleneck compounds the already strained GPU supply, producing a severe demand-supply imbalance.
At the same time, the prevailing path to more capable large models is scaling up parameters, brute-forcing more powerful emergent intelligence. Even at inflated prices, companies will keep buying AI chips rather than risk losing their competitive edge.
In a large data center, every chip already runs at full tilt. If the AI performance of the CPUs already deployed there could be unlocked, would there still be a need to keep adding GPUs?
Fundamentally, we have been trapped in the mindset that GPUs are the only choice for running AI; in fact, CPUs have also achieved remarkable AI capabilities.
Asin Technology adopted CPUs as the hardware platform for its OCR-AIRPA solution, quantizing models from FP32 to INT8/BF16. This raised throughput and accelerated inference with acceptable accuracy loss, cutting labor costs to between one-fifth and one-ninth of the original while improving efficiency by a factor of 5 to 10.
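As a flavor of what such a conversion involves (a minimal sketch using PyTorch's post-training dynamic quantization; this is illustrative, not Asin's actual OCR pipeline, and the model here is a stand-in):

```python
import torch
import torch.nn as nn

# Stand-in for an OCR recognition head; any FP32 nn.Module works.
fp32_model = nn.Sequential(
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, 128),
)
fp32_model.eval()

# Post-training dynamic quantization: weights are stored as INT8,
# activations are quantized on the fly at inference time.
int8_model = torch.quantization.quantize_dynamic(
    fp32_model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
with torch.no_grad():
    y = int8_model(x)  # runs with INT8 weight matmuls on the CPU
```

Shrinking weights from 4 bytes to 1 also cuts the memory traffic per inference, which is where much of the CPU speedup comes from.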
The transformation extends beyond the internet and communications sectors. AI-powered drug discovery is hailed as a potential antidote to the "decade-long drought" in pharmaceutical R&D, and there, models like AlphaFold2 are considered the most critical algorithms. Since last year, optimizations on the Xeon Scalable platform have boosted AlphaFold2's end-to-end throughput by up to 23.11x, and the 4th Gen Xeon Scalable processors lifted that figure by a further 3.02x.
It is evident that CPU-based AI inference is proving feasible. The 5th Gen Xeon Scalable processors can now serve models of up to 20 billion parameters without a discrete accelerator, at latencies below 100 milliseconds. A general-purpose processor tailored for AI acceleration and enhanced performance has emerged.
Unlocking AI Capabilities with CPUs
Many may wonder how the 5th Gen Xeon, a general-purpose processor, handles AI workloads. Beyond the AI capability built into its cores, the key lies in its integrated suite of accelerators.
This mirrors a current trend in MCUs (microcontrollers), which use built-in DSPs and NPUs to offload part of the AI workload, executing it more efficiently at lower power. The 5th Gen Xeon applies the same principle.
Although this design was present in earlier Xeon Scalable Processors, it did not garner significant attention at the time, as there were fewer AI tasks to execute.
For the 5th Gen Xeon, the crucial built-in features are Intel AVX-512 and Intel® AMX (Intel® Advanced Matrix Extensions). AMX debuted in the 4th Gen Xeon; the 5th Gen's AMX adds support for the FP16 data type, delivering a 2-3x performance improvement on mixed AI workloads.
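As a quick illustration of how software reaches these units (a minimal sketch, assuming a Linux system and a PyTorch build whose oneDNN backend lowers BF16 matrix math to AMX tile operations when the CPU reports them; elsewhere the same code simply falls back to AVX-512 or scalar kernels):

```python
import torch

# Quick check (Linux-specific) for the relevant ISA extensions.
flags = open("/proc/cpuinfo").read()
print("avx512f:", "avx512f" in flags, "| amx_bf16:", "amx_bf16" in flags)

# A BF16 matmul that oneDNN can dispatch to AMX TMUL kernels
# on processors that support amx_bf16.
a = torch.randn(256, 256, dtype=torch.bfloat16)
b = torch.randn(256, 256, dtype=torch.bfloat16)
c = a @ b
print(c.dtype)  # torch.bfloat16
```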
Coupled with the 5th Gen Xeon's general performance gains, these features let it handle AI workloads with greater ease: the core count has risen to 64, single-core performance is up, and every core carries AI acceleration. New I/O technologies (CXL, PCIe 5.0) and faster UPI links add to its prowess.
According to industry experts, the primary bottleneck for CPU-based large model inference is not compute but memory bandwidth. The 5th Gen Xeon tackles this by raising memory speed from 4800 MT/s to 5600 MT/s and tripling the L3 cache. Moreover, its platform scales from one to eight sockets, providing a solid foundation for serving large models.
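A back-of-the-envelope model shows why bandwidth, not FLOPs, sets the ceiling (illustrative figures only; it assumes a decode phase that streams every weight once per generated token):

```python
# Rough decode-throughput ceiling for a weight-streaming workload:
# tokens/s <= memory bandwidth / bytes of weights read per token.
params = 20e9            # hypothetical 20B-parameter model
bytes_per_param = 1      # INT8 weights
weight_bytes = params * bytes_per_param

bandwidth = 350e9        # ~350 GB/s per socket (illustrative figure)
tokens_per_s = bandwidth / weight_bytes
latency_ms = 1000 / tokens_per_s

print(f"~{tokens_per_s:.1f} tokens/s, ~{latency_ms:.0f} ms/token")
# Faster DIMMs (4800 -> 5600 MT/s) raise this ceiling proportionally;
# a larger L3 helps by keeping hot activations off DRAM entirely.
```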
Performance Enhancements

Compared with the previous generation, the 5th Gen Xeon delivers a 21% average performance improvement at the same thermal design power, and an 87% average boost over the 3rd Gen. Beyond the general gains, it brings up to a 42% improvement in AI inference performance.
Enhanced Security and Data Management
Intel® Trust Domain Extensions (Intel® TDX) provide VM-level isolation and confidentiality, strengthening privacy and data governance; on the 5th Gen Xeon this capability arrives alongside the accelerator suite.
Energy Efficiency Improvements
As the greenest Xeon yet, the 5th Gen lets users manage power consumption and shrink their carbon footprint. Its new technologies and features, working in concert with software, improve efficiency and ultimately cut power draw.
Future Trends in CPU Development
The future of CPU development hinges on reducing power consumption, which requires a multi-pronged approach. First, process technology: each step through Intel 3, Intel 20A, and Intel 18A is expected to cut power by double-digit percentages. Second, advanced packaging: chiplet architectures combine dies built on different process nodes, so only the regions a workload needs are active, reducing power. Finally, workload-specific optimizations can further improve efficiency.
Application Architecture Adjustments
Optimizing the application architecture can also save significant power. Suppose a team plans to train 20 large models, each run taking three months on 1,000 machines that draw 10 kW apiece. If better upfront evaluation shows only five of those models are worth training, skipping the other 15 saves 75% of the electricity. The arithmetic is spelled out below.
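A trivial sketch using the hypothetical figures above:

```python
machines = 1000          # machines per training run
power_kw = 10            # draw per machine, in kW
hours = 3 * 30 * 24      # three months of wall-clock training

kwh_per_model = machines * power_kw * hours   # 21,600,000 kWh per model
planned, actually_trained = 20, 5

saved = (planned - actually_trained) * kwh_per_model
total = planned * kwh_per_model
print(f"saved {saved:.3e} kWh = {saved / total:.0%} of the budget")  # 75%
```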
Driving AI Development
Software ecosystems, such as CUDA in the GPU landscape, are crucial for AI development. Intel, with its strong software foundation, continues to invest heavily in the software stack, providing a significant advantage for the 5th Gen Xeon in AI.
Unified and User-Friendly Development
Intel emphasizes unification and ease of use in AI development. OpenVINO lets developers "write once, deploy anywhere," and Intel's foundational software and libraries support its own CPUs, GPUs, IPUs, and AI accelerators through popular frameworks such as PyTorch and ONNX Runtime.
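In practice, "write once, deploy anywhere" amounts to changing a device string (a minimal sketch against the OpenVINO Runtime Python API; the model path is a placeholder):

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")  # placeholder IR file

# The same script retargets by changing only the device string:
# "CPU", "GPU", or "AUTO" to let the runtime pick available hardware.
compiled = core.compile_model(model, "CPU")

infer_request = compiled.create_infer_request()
# result = infer_request.infer({0: input_tensor})
```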
Library extensions for PyTorch and TensorFlow also let developers pick up the latest software acceleration on top of default framework installations, so users can keep working in PyTorch or TensorFlow while tapping OpenVINO, enabling cross-framework development on a single platform.
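On the PyTorch side, the extension route typically wraps an existing model in a single call (a sketch assuming the intel_extension_for_pytorch package is installed; the model is a stand-in):

```python
import torch
import intel_extension_for_pytorch as ipex

model = torch.nn.Linear(768, 768).eval()

# ipex.optimize applies CPU-oriented graph and kernel optimizations;
# dtype=torch.bfloat16 converts weights to BF16 so matmuls can use AMX.
model = ipex.optimize(model, dtype=torch.bfloat16)

x = torch.randn(8, 768)
with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = model(x)
```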
Additionally, OpenVINO 2023.1 advances Intel's goal of "any hardware, any model, anywhere," extending OpenVINO into a comprehensive environment for running AI inference and deployment across clients and edge devices.

At MWC 2024 in late February, Intel showcased the Sierra Forest processor with up to 288 efficiency cores (E-cores), while the Granite Rapids processor built on performance cores (P-cores) is also on the horizon. It's safe to say that Xeon will continue to excel in the field of AI inference.