Google Upgrades Gemini 1.5 Pro, Opens AI to Audio Listening

Thu, Apr 11 2024 06:48 AM EST

On April 10th, Google announced an upgrade to its large language model, Gemini 1.5 Pro, giving it "ears": the model can now listen to and analyze uploaded audio files, extracting key information from earnings calls or the audio track of a video without needing a written transcript.

At the Google Cloud Next conference, held on Tuesday in the United States, Google announced that Gemini 1.5 Pro would be available outside the company for the first time through Vertex AI, its artificial intelligence development platform. The model was initially unveiled in February of this year.

Although it is considered the "middle-weight" model within the Gemini family, Gemini 1.5 Pro already outperforms Gemini Ultra, the largest and most powerful model in the lineup. Google states that Gemini 1.5 Pro can understand complex instructions without requiring the model to be fine-tuned.

It's worth noting that users who do not use Vertex AI won't have access to all features of Gemini 1.5 Pro. Currently, most of the general public encounters the Gemini language models through the Gemini chatbot. Gemini Ultra powers the Gemini Advanced chatbot and can understand longer commands, but it responds more slowly than Gemini 1.5 Pro.

Apart from the updates to Gemini 1.5 Pro, Google also upgraded its other large AI models. In particular, Imagen 2, a text-to-image generation model, enhances Gemini's image generation capabilities. With the newly introduced outpainting and inpainting features, users can add elements to images or remove them more flexibly.

To make the copyright status and origin of images generated by the Imagen model traceable, Google has incorporated its SynthID digital watermarking technology into all generated images. The technology marks each image's source with a nearly invisible watermark that can be detected using specialized tools.

Many of the Imagen model's new features, such as outpainting and inpainting, are already available in other text-to-image models, including Stability AI's Stable Cascade and Getty's Generative AI by iStock. These techniques are also widely used in consumer electronics products such as Samsung Galaxy phones.

In addition to the image generation updates, Google showcased a method that combines AI-generated answers with Google Search results, aiming to provide users with more up-to-date and accurate information. However, answers generated by large language models are not always accurate and may sometimes mislead users. Google has therefore imposed certain restrictions on Gemini models, such as prohibiting answers to questions related to the 2024 US elections.

Previously, the Gemini model drew criticism for inaccuracies in its depictions of historical figures.
