Intel Unveils Gaudi 3 Chip: Outperforms NVIDIA H100, Set for Wide Release in Q3

Wed, Apr 17 2024 07:56 PM EST

At the Vision 2024 event tonight, Intel unveiled its next-generation Gaudi 3 AI chip, slated for a widespread release through OEM systems in the third quarter of 2024.

According to reports, Gaudi 3 delivers up to 1.7 times the training performance of the NVIDIA H100 and 50% better inference performance. Efficiency is 40% better, all at a significantly lower cost. (Note: The H100 is NVIDIA's previous-generation product, and Intel did not compare Gaudi 3 with the latest Blackwell series.)

In addition, Intel has introduced a new brand name for its data center CPU portfolio: the chips formerly codenamed Granite Rapids and Sierra Forest will now be known as the "Xeon 6" series. These chips are scheduled to launch later this year and will support the standardized MXFP4 data format for higher performance.

Intel has also announced the development of AI NIC ASICs and AI NIC chiplets for Ethernet networks. The chiplets will be used in Intel's future XPU and Gaudi 3 processors and will be offered to external customers through Intel Foundry. However, Intel has not disclosed further details about these networking products.

Intel claims that Gaudi 3 delivers twice the FP8 performance and four times the BF16 performance of the previous-generation product. Network bandwidth is also twice that of the previous generation, and memory bandwidth is 1.5 times that of the previous generation.

Gaudi 3 is offered in two form factors. One is the HL-325L, an OAM (OCP Accelerator Module), a design common in systems built around high-performance GPUs.

The accelerator integrates 128GB of HBM2e with 3.7 TB/s of memory bandwidth. It also carries 24 Ethernet RDMA NICs, each running at 200 Gbps.

The HL-325L OAM module has a 900W TDP (potentially higher with liquid cooling) and a rated FP8 performance of 1,835 TFLOPS. OAMs are deployed in groups of eight per server node, and deployments can scale up to 1,024 nodes.
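To put those figures in perspective, the per-accelerator numbers quoted above can be multiplied out to node and cluster totals. The short Python sketch below is only back-of-the-envelope arithmetic on the article's figures; the function name `cluster_totals` and the dictionary keys are illustrative, and the results are theoretical peaks that assume every one of the 1,024 nodes holds eight accelerators, not measured performance.

```python
# Back-of-the-envelope totals from the figures quoted in the article.
# Inputs are rated per-accelerator numbers; real workloads will see less.

HBM_GB = 128          # HBM2e capacity per accelerator
NIC_COUNT = 24        # Ethernet RDMA NICs per accelerator
NIC_GBPS = 200        # line rate per NIC, in Gbit/s
FP8_TFLOPS = 1835     # rated FP8 throughput per accelerator
OAMS_PER_NODE = 8     # OAM modules per server node
MAX_NODES = 1024      # maximum number of nodes quoted


def cluster_totals(nodes: int = MAX_NODES) -> dict:
    """Multiply the per-accelerator figures out to node and cluster level."""
    accelerators = nodes * OAMS_PER_NODE
    return {
        "accelerators": accelerators,
        "ethernet_gbps_per_accelerator": NIC_COUNT * NIC_GBPS,
        "hbm_gb_per_node": OAMS_PER_NODE * HBM_GB,
        "hbm_gb_total": accelerators * HBM_GB,
        "fp8_pflops_per_node": OAMS_PER_NODE * FP8_TFLOPS / 1_000,
        "fp8_eflops_total": accelerators * FP8_TFLOPS / 1_000_000,
    }


if __name__ == "__main__":
    for name, value in cluster_totals().items():
        print(f"{name}: {value}")
```

On these assumptions, a full-size deployment comes to 8,192 accelerators, each with 4,800 Gbps of aggregate Ethernet bandwidth, and each eight-module node carries 1,024GB of HBM2e.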