
Moore Threads and Wuwen Xinqiong Achieve Major Milestone in Domestic GPU Development! Thousand-Card Cluster Completes Training of 3-Billion-Parameter Large Model

Shang Fang Wen Q | Mon, May 27, 2024, 08:46 AM EST

On May 27, Moore Threads and Wuwen Xinqiong jointly announced the successful completion of training for MT-infini-3B, a 3-billion-parameter large model. The training ran on a thousand-card cluster built from Moore Threads' domestically produced full-function MTT S4000 GPUs, orchestrated through Wuwen Xinqiong's AIStudio PaaS platform.

This training run fully validates the reliability of Moore Threads' KUAE thousand-card intelligent computing cluster in large-model training scenarios, and pioneers an industry-first paradigm of deep cooperation between a domestically produced large language model and a domestically produced thousand-card GPU computing cluster.

According to the announcement, training MT-infini-3B took 13.2 days in total and ran without interruption from start to finish. Cluster training stability reached 100%, and linear scaling efficiency relative to single-machine training exceeded 90%.
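For context, linear scaling efficiency is conventionally the ratio of actual cluster throughput to the ideal of one machine's throughput multiplied by the machine count. A minimal sketch of that calculation follows; all numbers in it are hypothetical placeholders, not figures from the announcement.

```python
# Minimal sketch of how linear scaling efficiency is typically computed.
# All figures below are hypothetical placeholders, not data from the article.

def scaling_efficiency(cluster_throughput: float,
                       single_node_throughput: float,
                       num_nodes: int) -> float:
    """Ratio of measured cluster throughput to ideal linear scaling."""
    ideal = single_node_throughput * num_nodes
    return cluster_throughput / ideal

# Example: a hypothetical 125-node cluster (1,000 GPUs at 8 per node)
# sustaining 113 samples/s, where a single node sustains 1.0 samples/s.
eff = scaling_efficiency(cluster_throughput=113.0,
                         single_node_throughput=1.0,
                         num_nodes=125)
print(f"linear scaling efficiency: {eff:.1%}")  # -> 90.4%
```

Under this definition, a figure above 90% means the thousand-card cluster delivers more than nine tenths of the throughput that perfect linear scaling from a single machine would predict.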

Currently, the trained MT-infini-3B ranks among the top performers at its parameter scale: compared with models of the same size trained on mainstream international hardware (in particular NVIDIA's), it scores higher on three benchmark test sets, C-Eval, MMLU, and CMMLU.

The collaboration between Wuwen Xinqiong and Moore Threads aims to build a middleware layer between "M kinds of models" and "N kinds of chips," so that a wide range of large-model algorithms can be deployed efficiently and uniformly across diverse chips. The two companies have formed a deep strategic partnership, and Moore Threads is the first domestic GPU company connected to Wuwen Xinqiong's platform for thousand-card-scale model training. The KUAE thousand-card cluster has completed system-level adaptation with Wuwen Xinqiong's Infini-AI platform, including training and testing of the 70-billion-parameter Llama2 model.
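The "M models × N chips" idea is, at heart, an abstraction layer that hides chip-specific backends behind a single deployment interface. The sketch below illustrates that general pattern only; every class and function name in it is hypothetical, not Wuwen Xinqiong's actual API.

```python
# Hypothetical sketch of an "M models x N chips" middleware pattern:
# one deployment interface, multiple chip-specific backends.
# All names are illustrative, not Wuwen Xinqiong's real API.

from abc import ABC, abstractmethod

class ChipBackend(ABC):
    """Chip-specific details hidden behind a common interface."""
    @abstractmethod
    def load(self, model_name: str) -> None: ...
    @abstractmethod
    def run(self, prompt: str) -> str: ...

class MTTS4000Backend(ChipBackend):
    def load(self, model_name: str) -> None:
        print(f"loading {model_name} on MTT S4000")
    def run(self, prompt: str) -> str:
        return f"[S4000 output for: {prompt}]"

class CUDABackend(ChipBackend):
    def load(self, model_name: str) -> None:
        print(f"loading {model_name} on a CUDA GPU")
    def run(self, prompt: str) -> str:
        return f"[CUDA output for: {prompt}]"

# Registry mapping chip identifiers to backends: adding a new chip
# means adding one entry, without touching any model code.
BACKENDS = {"mtt-s4000": MTTS4000Backend, "cuda": CUDABackend}

def deploy(model_name: str, chip: str) -> ChipBackend:
    """Any of M models deploys onto any of N chips the same way."""
    backend = BACKENDS[chip]()
    backend.load(model_name)
    return backend

runtime = deploy("MT-infini-3B", "mtt-s4000")
print(runtime.run("hello"))
```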

The training of MT-infini-3B marks the industry's first case of end-to-end large-model training carried out from scratch on domestic GPU chips. Recently, the KUAE thousand-card cluster based on Moore Threads' GPUs, working with the Hangu Group, also completed distributed training of large models at the 7B, 34B, and 70B parameter scales, and the two parties have likewise entered into a strategic partnership.

Rigorous testing by both parties showed high compatibility and adaptation, training efficiency that met expectations, and precision that met requirements, with the training process remaining stable throughout.