Meta Launches New Version of In-House AI Chip: Performance Improves by 3x

Hu Hanyan | Sun, Apr 14, 2024, 11:03 AM EST

Amid a scarcity of AI (Artificial Intelligence) chips, more and more tech giants are opting for in-house development.

On April 10, local time, social media giant Meta unveiled the latest version of its in-house chip, MTIA. MTIA is a custom chip family that Meta designed specifically for AI training and inference workloads. Compared with MTIA v1, the first-generation AI inference accelerator Meta announced in May last year, the new chip delivers a significant performance improvement and is tailored to the ranking and recommendation systems behind Meta's social apps. Analysts suggest Meta's aim is to reduce its reliance on chipmakers such as Nvidia.

On the 10th, Meta (Nasdaq: META) closed at $519.83 per share, up 0.57%, with a total market value of $1.33 trillion. Wind data shows that Meta's stock price has risen by over 47% since the beginning of this year.

MTIA stands for "Meta Training and Inference Accelerator." Despite the word "training" in its name, the chip is not optimized for AI training; it focuses on inference, that is, running trained AI models in production.

Meta wrote in a blog post that MTIA is a "critical part of the company's long-term plans" to build AI infrastructure for Meta's services: "To achieve our ambitions for custom chips, this means investing not only in compute chips but also in memory bandwidth, networking and capacity, and other next-generation hardware systems."

[Image: The new MTIA chip. Source: Meta's official website]

According to Meta, the new chip "fundamentally focuses on providing the right balance of compute, memory bandwidth, and memory capacity." The first-generation MTIA v1 was built on TSMC's 7nm process, while the new MTIA is fabricated on TSMC's 5nm process and has more processing cores. The new chip features 256MB of on-chip memory running at 1.3GHz, up from MTIA v1's 128MB at 800MHz. In Meta's early tests across "four key models," the new chip performed three times better than the first-generation version.
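For a rough sense of the generation-over-generation gains, the headline memory specs can be compared directly. This is a back-of-envelope sketch, not Meta's methodology; the 800MHz clock for MTIA v1 is the widely reported figure and is taken as an assumption here.

```python
# Rough comparison of the reported MTIA on-chip memory specs.
v1 = {"sram_mb": 128, "clock_ghz": 0.8}   # MTIA v1 (7nm), 800 MHz assumed
v2 = {"sram_mb": 256, "clock_ghz": 1.3}   # next-gen MTIA (5nm)

sram_ratio = v2["sram_mb"] / v1["sram_mb"]       # on-chip memory capacity ratio
clock_ratio = v2["clock_ghz"] / v1["clock_ghz"]  # memory clock ratio

print(f"On-chip SRAM: {sram_ratio:.3g}x, clock: {clock_ratio:.3g}x")
# -> On-chip SRAM: 2x, clock: 1.62x
```

These raw ratios fall short of 3x, which is consistent with Meta's claim that the tripled model performance also comes from more processing cores and a rebalanced architecture, not memory alone.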

On the hardware front, to support the next-generation chip, Meta has developed a large rack-mounted system that accommodates up to 72 accelerators: three chassis, each containing 12 boards, with two accelerators per board. The system allows the chip's clock to be raised from the initial 800MHz to 1.35GHz while running at 90 watts, versus 25 watts for the initial design.

[Image: Meta's large rack-mounted MTIA system. Source: Meta's official website]

On the software side, Meta emphasizes that the stack running on the new system is very similar to MTIA v1's, which speeds up the team's deployment. The new MTIA is also compatible with code developed for MTIA v1, and because Meta has integrated the complete software stack with the chip, developers were able to bring up Meta's traffic on the new chip within days. This enabled Meta to deploy the chip to 16 regions and run production models within nine months.

According to Meta's summary, test results so far indicate that the MTIA chip can handle the low-complexity (LC) and high-complexity (HC) ranking and recommendation models that are components of Meta's products. "Because we control the entire stack, we can achieve higher efficiency compared to commercial GPUs."

Currently, the new MTIA chip has been deployed in Meta's data centers and has shown promising results: "The company is able to invest more computing power for more intensive AI workloads. It turns out that the chip is highly complementary to commercial GPUs in providing the best combination of performance and efficiency for workloads specific to Meta."

In February of this year, media reports revealed information about the second-generation MTIA chip, stating that Meta plans to mass-produce an AI chip internally referred to as "Artemis" this year to further accelerate the company's expansion in the AI field. At the time, a Meta spokesperson confirmed the plan, stating that the chip would work in conjunction with the hundreds of thousands of GPUs Meta has procured.

As the AI competition intensifies, high-performance AI chips are becoming increasingly scarce. On January 18 of this year, Meta CEO Mark Zuckerberg announced that Meta plans to build its own AGI (Artificial General Intelligence) and aims to have approximately 350,000 Nvidia H100 GPUs by the end of this year. Even at the H100's lowest reported price of $25,000 per chip, 350,000 units would cost Meta about $8.75 billion.
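The $8.75 billion figure follows directly from the article's own numbers, as a quick check shows. This is only the article's back-of-envelope floor estimate; actual pricing is not public.

```python
# Back-of-envelope cost of the reported H100 order, using the
# article's quoted floor price of $25,000 per GPU.
gpus = 350_000
price_per_gpu = 25_000               # USD, low-end estimate cited above
total = gpus * price_per_gpu

print(f"${total / 1e9:.2f} billion")  # -> $8.75 billion
```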

Of course, Meta is not the only tech giant turning to self-developed chips. Just days earlier, Google announced that it is building custom CPUs based on the Arm architecture, called "Axion," which are planned to support Google Cloud services such as YouTube ads and will launch later in 2024. Microsoft and Amazon have likewise begun developing custom chips capable of handling AI tasks.

Analysts at market research firm CFRA say these large tech companies face cost pressures and are turning to self-developed chips to relieve them. While such chips are "necessary" for the companies, they may not match the performance of Nvidia's latest Blackwell-platform products.