Multi-Perspective 4D Facial Scanning System Gives AI "Insightful Eyes"

DiaoWenHui Thu, May 02 2024 11:29 AM EST

Image recognition, virtual assistants, digital humans, lifelike videos... The continuous development and innovation of general artificial intelligence technologies are propelling us further into the era of intelligence. However, most AI-generated visual content remains two-dimensional, leaving considerable room for improvement in spatial depth, temporal continuity, and detail.

How can we equip AI with "insightful eyes" to enhance visual clarity, detail, and naturalness?

On April 29, it was reported that researchers at the Machine Vision Research Center of the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, have made progress in dynamic 3D facial imaging. A team led by researcher Song Zhan has independently developed a multi-perspective 4D high-precision facial 3D imaging system. Compared with conventional 3D facial scanning technologies, the system offers significantly higher accuracy, resolution, and speed. It can be applied in fields including, but not limited to, facial recognition, medical diagnosis, and film special effects.

Faster, Clearer, Finer Facial Scanning

The multi-perspective 4D facial scanning system developed by the team consists of three structured light cameras, each operating in a different near-infrared band. Its underlying algorithm is the high-frequency stripe displacement encoding 3D reconstruction method proposed by Song Zhan's team. The system scans at more than 100 frames per second at 1080P resolution (1920 × 1080 pixels). Its highly parallel, GPU-based 3D reconstruction algorithm reaches real-time reconstruction rates of up to 300 Hz, with a depth imaging error of less than 0.05 millimeters.
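To put the quoted figures in perspective, the arithmetic below converts them into depth-point rates and per-frame time budgets. It is a rough sketch under two assumptions the article does not state: every camera pixel yields one depth sample per frame, and the 100 fps scan rate applies to each of the three cameras. The function names are hypothetical.

```python
# Illustrative arithmetic for the figures quoted in the article.
# Assumptions (not stated in the article): one depth sample per pixel per frame,
# and the 100 fps scan rate applies to each of the three cameras.
WIDTH, HEIGHT = 1920, 1080  # 1080P resolution
SCAN_FPS = 100              # "over 100 frames per second" (a lower bound)
RECON_HZ = 300              # "up to 300 Hz" GPU reconstruction
NUM_CAMERAS = 3             # three structured light cameras


def points_per_frame(width: int, height: int) -> int:
    """Depth samples in one full-resolution frame."""
    return width * height


def points_per_second(width: int, height: int, fps: int, cameras: int = 1) -> int:
    """Depth samples per second across the given number of cameras."""
    return points_per_frame(width, height) * fps * cameras


def frame_budget_ms(rate_hz: float) -> float:
    """Time available per frame, in milliseconds, at a given rate."""
    return 1000.0 / rate_hz


if __name__ == "__main__":
    print(points_per_frame(WIDTH, HEIGHT))                           # 2073600
    print(points_per_second(WIDTH, HEIGHT, SCAN_FPS))                # 207360000
    print(points_per_second(WIDTH, HEIGHT, SCAN_FPS, NUM_CAMERAS))   # 622080000
    print(f"{frame_budget_ms(RECON_HZ):.2f} ms per reconstruction")  # 3.33 ms per reconstruction
```

At 300 Hz the GPU has roughly 3.3 ms to turn each set of captured patterns into a depth map, which is why a highly parallel implementation matters.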

"Each depth camera consists of a near-infrared structured light projector and an industrial camera. For each reconstruction, the projector casts a set of pre-designed high-frequency stripe patterns onto the object being measured, and the camera captures how the object's surface deforms these patterns. Depth information, that is, 3D information, is then obtained by analyzing the captured deformations. Because the depth camera scans continuously, adding a time axis yields 4D information," explained Wu Di, a master's student at the Shenzhen Institutes of Advanced Technology.

Song Zhan and the team debugging the multi-angle 4D facial scanning system. Photo by Lin Yicheng.
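The capture-and-analyze loop Wu Di describes is commonly implemented with phase-shifting stripe analysis. The sketch below shows the textbook N-step phase-shifting decoder for a single pixel; it is a generic technique swapped in for illustration, not the team's stripe displacement encoding method, which the article does not detail. It assumes each captured intensity follows I_n = A + B·cos(φ − 2πn/N), where A is the background, B is the stripe modulation, and φ is the phase that encodes where on the pattern the pixel falls. The helpers `wrapped_phase` and `simulate_pixel` are hypothetical.

```python
import math


def wrapped_phase(intensities):
    """Decode one pixel from N equally phase-shifted stripe images.

    Model (assumed): I_n = A + B * cos(phi - 2*pi*n/N), n = 0..N-1, N >= 3.
    Returns (phi, B, A) with phi wrapped to the range [-pi, pi].
    """
    n_steps = len(intensities)
    if n_steps < 3:
        raise ValueError("N-step phase shifting needs at least 3 images")
    # Correlate the samples with the sine and cosine of the known phase shifts.
    s = sum(i * math.sin(2 * math.pi * k / n_steps) for k, i in enumerate(intensities))
    c = sum(i * math.cos(2 * math.pi * k / n_steps) for k, i in enumerate(intensities))
    phi = math.atan2(s, c)                          # wrapped phase
    modulation = 2.0 / n_steps * math.hypot(s, c)  # stripe contrast B
    background = sum(intensities) / n_steps        # ambient + DC term A
    return phi, modulation, background


def simulate_pixel(phi, a=120.0, b=80.0, n_steps=4):
    """Hypothetical helper: synthesize the N intensities one pixel would record."""
    return [a + b * math.cos(phi - 2 * math.pi * k / n_steps) for k in range(n_steps)]


if __name__ == "__main__":
    samples = simulate_pixel(1.0, n_steps=4)
    phi, b, a = wrapped_phase(samples)
    print(f"phi={phi:.3f} B={b:.1f} A={a:.1f}")  # phi=1.000 B=80.0 A=120.0
```

In a full system the recovered phase is only known modulo 2π, so it must be unwrapped and then converted to depth through a calibrated projector-camera triangulation model; repeating this for every frame adds the time axis that makes the data 4D.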

In addition, to achieve more complete 3D dynamic imaging of the face from multiple angles, the system uses three different bands of near-infrared light (invisible to the human eye) as light sources. This not only avoids dazzling the subject but also prevents the patterns from the three projectors from interfering with one another, greatly improving imaging completeness.
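The isolation argument can be made concrete as a pairwise check: if each camera sees only its own projector's band through a narrow filter, two channels interfere only when their passbands overlap. The sketch below assumes idealized rectangular filters, and the band names and wavelengths in the hypothetical `BANDS_NM` table are placeholders, since the article does not say which bands are used.

```python
from itertools import combinations

# Hypothetical passbands (centre, full width) in nanometres. The article does not
# state which near-infrared bands are used; these values are placeholders.
BANDS_NM = {
    "camera_1": (780, 30),
    "camera_2": (850, 30),
    "camera_3": (940, 30),
}


def passband(centre, width):
    """Return the (low, high) edges of an idealized rectangular passband."""
    half = width / 2
    return centre - half, centre + half


def overlapping_pairs(bands):
    """List channel pairs whose idealized passbands overlap (potential crosstalk)."""
    clashes = []
    for (name_a, band_a), (name_b, band_b) in combinations(bands.items(), 2):
        lo_a, hi_a = passband(*band_a)
        lo_b, hi_b = passband(*band_b)
        if lo_a < hi_b and lo_b < hi_a:
            clashes.append((name_a, name_b))
    return clashes


if __name__ == "__main__":
    print(overlapping_pairs(BANDS_NM))  # []
```

Real filters have sloped edges and some out-of-band leakage, so a practical design would also leave a guard gap between bands; this is only a first-order check.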

"Near-infrared light is gentle on the human eye, but it penetrates the skin to some extent, which blurs the projected high-frequency stripe patterns and reduces the accuracy of 3D reconstruction," said Song Zhan. To address this, the research team combined innovative image enhancement algorithms with breakthroughs in robust stripe encoding and decoding, raising the accuracy of the phase computed when decoding the projected patterns and thereby improving 3D reconstruction accuracy. The algorithms must also run in real time and in parallel, so that they can provide high-precision data for research on 3D dynamic model acquisition, head pose estimation, and facial expression transfer.
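Why skin penetration hurts accuracy can be estimated with a simple first-order model. Assuming the subsurface scattering acts like a Gaussian blur of width σ_b on the projected stripes (an approximation, not the team's model), a sinusoidal stripe pattern of period p keeps a fraction exp(−2π²σ_b²/p²) of its contrast. For standard N-step phase shifting, the phase noise then grows as √(2/N)·σ_n/B, where σ_n is the camera's intensity noise and B is the remaining stripe modulation. The functions and numbers below are hypothetical; the team's image enhancement and encoding algorithms are not described in the article.

```python
import math


def blurred_modulation(b0, period_px, sigma_px):
    """Stripe modulation after a Gaussian blur of standard deviation sigma_px.

    Convolving cos(2*pi*x/p) with a unit-area Gaussian scales its amplitude by
    exp(-2*pi^2*sigma^2/p^2), so short-period (high-frequency) stripes fade fastest.
    """
    return b0 * math.exp(-2 * math.pi ** 2 * sigma_px ** 2 / period_px ** 2)


def phase_noise_std(noise_std, modulation, n_steps):
    """Standard deviation (radians) of N-step phase-shifting phase error under
    additive Gaussian intensity noise: sqrt(2/N) * sigma_noise / B."""
    return math.sqrt(2.0 / n_steps) * noise_std / modulation


if __name__ == "__main__":
    # Hypothetical numbers: 80 grey-level stripe amplitude, 2-pixel blur,
    # 2 grey levels of camera noise, 4-step phase shifting.
    for period in (32, 16, 8):
        b = blurred_modulation(80.0, period, 2.0)
        print(f"period={period:2d}px  B={b:5.1f}  phase_std={phase_noise_std(2.0, b, 4):.4f} rad")
```

The trend is what matters: shortening the stripe period under the same blur sharply reduces contrast and raises phase noise, which is the trade-off that enhancement and robust decoding algorithms aim to counter.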

Supporting AI in Generating Higher-Quality 3D Data

According to Song Zhan, the system has broad application prospects across multiple fields. In new display technologies, it is expected to serve as a 3D data acquisition device for holographic projection, mid-air imaging, and other emerging displays, as well as for AR display terminals. In film and television, it can capture actors' facial expressions in real time and with high precision, and, combined with expression transfer technology, map real human expressions onto cartoon characters. In gaming, the system can capture users' facial information and combine it with gaze tracking to enable human-machine interaction. In medicine, the system can capture patients' facial expressions to assist diagnosis. In humanoid robotics, the technology can give robots more precise and sensitive 4D visual perception, enabling them to perform refined tasks beyond simple activities.

The development of artificial intelligence technology largely depends on data-driven approaches.

"Images or videos generated on a two-dimensional plane often struggle to present the three-dimensional structure of the real world. In the future, videos generated by AI technology will gradually evolve from two-dimensional to three-dimensional. To generate higher quality 3D videos, the support of 3D data is indispensable," Song Zhan stated.

The system can supply real, detailed, high-quality 3D data for "3D+AI" research, helping to fill the field's current shortage of high-precision 3D data and providing real-time, high-precision, high-resolution data to help AI models generate higher-quality video.

The research team has reportedly already applied this technology to film special effects, specialized processing, 3D facial diagnosis and treatment, and dynamic 3D vision-guided assembly, with good results.

Going forward, the research team plans to strengthen fundamental algorithm research, improve encoding efficiency and imaging speed, reduce hardware costs, develop modular 4D imaging devices with high spatiotemporal resolution, and apply them in more industrial and information and communications fields, providing sharp visual imaging support for the development of "new quality productive forces."