Real-Time Full-Body Motion Capture with Just 6 Sensors: A New Breakthrough from Tsinghua University

Sun, May 26 2024 08:06 AM EST

With only 6 coin-sized sensors, real-time and precise full-body motion capture is now possible.

The latest research from a team at Tsinghua University has been accepted to SIGGRAPH 2024, the prestigious computer graphics conference.

With this technology, embodying a small black figure on screen becomes far more accessible.

Vigorous physical activities such as boxing are no problem either.

Simplicity is a core advantage of this type of technology: six inertial measurement units (IMUs) are worn on the limbs, head, and back.

IMU sensors are actually quite common in everyday life, found in smartphones, fitness trackers, watches, and headphones. The IMUs used in the video are very small, about the size of a coin, making them almost imperceptible when worn on the body.

△ A one-yen coin (left) and the inertial sensor (right) used in the technology.

The setup is not only lightweight and easy to wear. Compared with traditional motion capture equipment, which can easily cost hundreds of thousands, its cost has also dropped to a level the average user can afford.

For example, Sony in Japan released its Mocopi product last year, offering users a motion capture solution based on 6 IMUs, priced at $449.99.

  • Sony's Mocopi product utilizes 6 IMUs for motion capture.

Building on prior work in the field, researchers from Tsinghua University have introduced a new technique called PNP, which significantly surpasses current academic and industrial solutions in motion capture accuracy.

Compared to Sony's Mocopi, the new approach reproduces most human movements more accurately and naturally.

Comparison of real-time motion capture results between Sony Mocopi (left, black) and our technology PNP (right, orange).

Not only does PNP show significantly higher accuracy than industry products such as Mocopi, it also demonstrates clear advantages over the most advanced solutions in academia.

Comparison of real-time motion capture results between the state-of-the-art academic solution PIP (left, blue) and our technology PNP (right, orange).

This technology will be presented at SIGGRAPH 2024, and the code has been open-sourced.

Modeling Non-Inertial Forces with "Virtual Acceleration"

This technique addresses a problem that previous work overlooked: non-inertial forces were ignored when estimating human motion from inertial measurement data.

Specifically, human motion capture tasks are typically decomposed into two subtasks: human pose estimation and human motion estimation.

In human pose estimation, previous methods have often simplified network training by working in the root node coordinate system of the human body. IMU measurements (acceleration and rotation) are expressed in the root node coordinate system and then used to estimate the human pose (joint rotations).
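
The root-frame step described above can be sketched as follows. This is a minimal illustration under assumptions that are not stated in the article: each IMU reports a world-frame rotation matrix and a gravity-free world-frame acceleration, and the root acceleration is subtracted before rotating. The helper name `to_root_frame` is hypothetical and does not come from the released code.

```python
import math

def rot_z(theta):
    """3x3 rotation matrix for a rotation of theta radians about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def to_root_frame(R_root, a_root, R_imu, a_imu):
    """Naively express one IMU's rotation and acceleration in the root frame.

    Hypothetical helper: it only rotates the measurements. No non-inertial
    (virtual) acceleration terms are added here.
    """
    R_root_T = transpose(R_root)
    R_local = matmul(R_root_T, R_imu)                   # rotation relative to the root
    a_rel = [ai - ar for ai, ar in zip(a_imu, a_root)]  # acceleration relative to the root
    a_local = matvec(R_root_T, a_rel)                   # same vector, root-frame axes
    return R_local, a_local

# Root and IMU both turned 90 degrees about z; the IMU accelerates along world +y.
R_root = rot_z(math.pi / 2)
R_imu = rot_z(math.pi / 2)
R_local, a_local = to_root_frame(R_root, [0.0, 0.0, 0.0], R_imu, [0.0, 1.0, 0.0])
print(a_local)  # approximately [1, 0, 0]: world +y is the root frame's +x axis
```

The relative rotation comes out as (approximately) the identity because the IMU and the root are oriented the same way. The acceleration is only re-expressed in new axes, which is exactly the simplification the article goes on to question.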

However, due to the acceleration and rotation of the human body, the root node coordinate system is typically a non-inertial frame. When transforming acceleration to a non-inertial frame, the influence of non-inertial forces must be considered.

For example, if a subject is standing on a rotating turntable, an observer at rest would find the IMU acceleration measurements consistent with the person's motion (the readings reflect the centripetal acceleration due to rotation). However, an observer rotating with the turntable would find that the IMU acceleration readings do not match the motion they see (to them, the person appears stationary).

The reason is that an observer in a non-inertial reference frame must account for non-inertial forces (such as the centrifugal force and the Coriolis force) when reading IMU data in order to obtain results consistent with what they observe.

This technology models the "virtual acceleration" caused by non-inertial forces, ensuring that the neural network perceives acceleration consistent with human motion. By making fuller use of the acceleration measurements, the precision of motion capture can be improved.
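
For reference, the textbook relation between acceleration in an inertial frame and in a rotating, accelerating frame makes the "virtual acceleration" explicit. The notation below is assumed for illustration and is not taken from the article; the paper's exact formulation may differ.

```latex
% Assumed notation (not from the article):
%   R        rotation of the root frame relative to the world frame
%   \omega   angular velocity of the root frame (in root coordinates), \alpha = \dot{\omega}
%   a_0      world-frame acceleration of the root origin
%   r, v     position and velocity of the IMU as seen in the root frame
\begin{aligned}
a_{\text{root}} &= R^{\top}\,(a_{\text{world}} - a_0) + a_{\text{virtual}}, \\
a_{\text{virtual}} &=
  \underbrace{-\,\omega \times (\omega \times r)}_{\text{centrifugal}}
  \;\underbrace{-\,2\,\omega \times v}_{\text{Coriolis}}
  \;\underbrace{-\,\alpha \times r}_{\text{Euler}}
\end{aligned}
```

In the turntable example, with the root on the rotation axis and the turntable spinning at a constant rate, $v = 0$, $\alpha = 0$, and $a_0 = 0$. The rotated measurement $R^{\top} a_{\text{world}}$ is the centripetal acceleration of magnitude $\lVert\omega\rVert^{2}\lVert r\rVert$ pointing inward, and the centrifugal term has the same magnitude pointing outward, so $a_{\text{root}} = 0$, matching what the rotating observer sees.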

To illustrate the practical impact with an example, compare two movements: rotating the body in a circle and contracting the arm (left panel of the accompanying figure). In both cases, the IMU on the arm measures an inward acceleration. Simply transforming this to the root node coordinate system makes the two movements indistinguishable (middle panel). With this technology, however, the "centrifugal force" generated by body rotation cancels the centripetal acceleration measured by the IMU, so the two movements can be effectively distinguished (right panel).

The videos on the project homepage give a clear and intuitive explanation of the core ideas of this technology and the underlying physics.
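
The two movements compared above can be reproduced with a few lines of arithmetic. This is a simplified sketch, not the authors' implementation. It assumes the root frame is aligned with the world frame at the instant considered, the root origin is not accelerating, the rotation rate is constant, and gravity has already been removed from the IMU reading. The numbers (2 rad/s, 0.5 m) and the function name `virtual_acceleration` are illustrative assumptions.

```python
def cross(a, b):
    """Cross product of two 3D vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def virtual_acceleration(omega, r_local, v_local):
    """Centrifugal plus Coriolis terms for a frame rotating at constant omega."""
    centrifugal = [-c for c in cross(omega, cross(omega, r_local))]
    coriolis = [-2.0 * c for c in cross(omega, v_local)]
    return [a + b for a, b in zip(centrifugal, coriolis)]

def corrected(a_measured, omega, r_local, v_local):
    """Acceleration as seen in the root frame after adding the virtual terms."""
    a_virtual = virtual_acceleration(omega, r_local, v_local)
    return [a + b for a, b in zip(a_measured, a_virtual)]

arm = [0.5, 0.0, 0.0]   # IMU 0.5 m from the root along the root's x-axis
rest = [0.0, 0.0, 0.0]  # arm starts at rest relative to the body

# Movement 1: the body spins at 2 rad/s with the arm held out.
# The IMU measures centripetal acceleration omega^2 * r = 2 m/s^2, pointing inward.
spin_measured = [-2.0, 0.0, 0.0]
spin_corrected = corrected(spin_measured, [0.0, 0.0, 2.0], arm, rest)

# Movement 2: the body is still and the arm starts pulling inward at 2 m/s^2.
pull_measured = [-2.0, 0.0, 0.0]
pull_corrected = corrected(pull_measured, [0.0, 0.0, 0.0], arm, rest)

print(spin_measured == pull_measured)  # True: the naive inputs are identical
print(spin_corrected)                  # zero vector: the arm is not moving relative to the body
print(pull_corrected)                  # still 2 m/s^2 inward: the arm really is contracting
```

With only the rotated measurement, the network would see the same input for both movements. After adding the virtual terms, the spinning case reduces to zero relative acceleration while the contracting case keeps its inward acceleration.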

Accurate and in line with the laws of physics

Thanks to its more comprehensive use of acceleration, this technology can handle movements such as raising a hand or throwing a punch, which were difficult to capture in previous work: during these movements the IMU rotations remain almost constant, so the action can only be reconstructed from acceleration.

Compared to the previous method PIP (left, blue), the new technology PNP (right, orange) can more accurately reconstruct hand-raising and punching movements.

The capture of complex movements is also more precise.

The new PNP technology (right, orange) can more accurately capture complex movements compared to the previous PIP method (left, blue).

Unlike industrial solutions such as Sony Mocopi, this technology applies physics-based optimization to the human body, ensuring that the reconstructed results obey physical laws (e.g., avoiding artifacts such as feet sliding on the ground).

  • Compared to Sony Mocopi (left, black), the PNP technology (right, orange) captures movements that adhere to the laws of physics (feet do not slide).

This technology also handles complex movements such as squat walking better.

Compared to Sony Mocopi (left, black), our PNP technology (right, orange) captures complex movements more accurately.

For more comparative results, please refer to the video on the project homepage.

Paper link: https://arxiv.org/abs/2404.19619
Project homepage: https://xinyu-yi.github.io/PNP/
Open-source code: https://github.com/Xinyu-Yi/PNP