
"Fudan·MouSi" Empowers Visually Impaired to "See" the World

Jiang Qing Ling Thu, Mar 07 2024 12:14 AM EST

Recently, the "Hear the World" app, tailored for visually impaired individuals, has been launched based on the multimodal large model "Fudan·MouSi" (MouSi) developed by the Natural Language Processing Laboratory (FudanNLP) at Fudan University. This app is set to become a lifeline and intelligent assistant for the visually impaired community. 65e2cb98e4b03b5da6d0a986.png "Mousi," developed by Fudan Natural Language Processing Laboratory, shares a homophonic resemblance to the text-based MOSS, but unlike the text-based MOSS, it can understand and identify image content, aiming to serve as a visual aid for the visually impaired.

The team has shifted from reproducing a GPT-3.5-style text model to building a multimodal large model benchmarked against GPT-4V. They are researching the model's core technical points, aiming to improve accuracy on individual tasks and the reinforcement learning of large models. Building on the MouSi model, which was trained on hundreds of millions of images, the team carried out targeted training with tens of thousands of additional samples so that MouSi can adapt to a wider range of scenarios, particularly those relevant to the visually impaired.

According to reports, the "Hear the World" app based on MouSi has been designed around the daily needs of the visually impaired. It offers three modes. Among them are a street-walking mode, in which MouSi carefully scans road conditions, warns of potential hazards, and accompanies visually impaired users safely along their route, and a free question-and-answer mode, in which, whether in a museum, an art gallery, or a park, MouSi captures the details of the surroundings and uses sound to build a rich picture of the scene.

It is expected that by March of this year, the "Hear the World" app will complete its first round of testing and begin synchronized trial operation in first- and second-tier cities and regions across China, with rollout paced by the deployment of computing power. More modes are also under development, such as a reading mode to help blind users with tasks like ordering food and reading books, and a commentary mode in which MouSi serves as an audio describer for accessible movie screenings.

(Images courtesy of the School of Computer Science and Technology.)

In the first half of this year, the team plans to use AR technology to raise the app's positioning accuracy to the sub-meter level. In the second half of the year, the team aims to upgrade MouSi to video-based analysis. Professor Qi Zhang of the Natural Language Processing Laboratory at Fudan University said, "Artificial intelligence is advancing rapidly, and technology should change more people's lives. We hope MouSi can help visually impaired individuals step out of their homes, explore more job opportunities, and create more possibilities in life."