
The Basics of AI: Unveiling the Role and Value of Foundation Models

cici Fri, Apr 19 2024 07:47 PM EST

These neural networks, trained on massive datasets, provide support for the applications driving the AI revolution.


Editor's Note: This article is part of the "Decoding AI" series, which aims to make the technology more accessible by demystifying AI while showcasing new hardware, software, tools, and acceleration features for RTX PC users.

Great achievements start from humble beginnings, with each brick laid as a foundation. The same goes for AI-driven applications.

Foundation models are AI neural networks trained on vast amounts of raw data, typically through unsupervised learning.

Once trained, these models can comprehend and generate human language. Imagine placing a computer in a vast library stocked with books to read and learn from; before long, it would grasp context and the meaning behind words and sentences, much like a human.

Thanks to their vast knowledge base and ability to communicate in natural language, foundation models have a wide range of applications, including generating and summarizing text, assisting with code generation and analysis, creating images and videos, and transcribing audio and synthesizing speech.

ChatGPT, a well-known example of generative AI, is a conversational agent built on the GPT foundation model. Now in its fourth version, GPT-4 can process both text and images when generating responses.

Traditionally, online applications built on foundation models accessed them through data centers. Today, many of these models, and the AI-powered applications built on them, can run locally on PCs and workstations equipped with NVIDIA GeForce and NVIDIA RTX GPUs.

Applications of Foundation Models

Foundation models serve various functions, including:

  • Language Processing: Understanding and generating text.

  • Code Generation: Analyzing and debugging computer code (supporting multiple programming languages).

  • Visual Processing: Analyzing and generating images.

  • Speech: Generating speech based on text and transcribing speech into text.

Users can deploy foundation models as-is or fine-tune them further. Because training a brand-new AI model for every generative AI application is costly and time-consuming, users often fine-tune a foundation model to fit a specific use case.
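To make the fine-tuning idea concrete, here is a deliberately tiny sketch in plain Python. Everything in it — the feature extractor, the dataset, the numbers — is invented purely for illustration: a frozen "pretrained" base stands in for a large model, and only a small task-specific head is trained on the new data.

```python
# Toy illustration of fine-tuning: a "pretrained" feature extractor is
# kept frozen, and only a small task-specific head is trained.
# All names and numbers here are made up for illustration.

def pretrained_features(x):
    # Frozen base model: in practice, billions of parameters learned
    # from raw data; here, just a fixed nonlinear transform.
    return [x, x * x]

# Tiny labeled dataset for the downstream task: y = 2*x + 3*x^2.
data = [(x, 2 * x + 3 * x * x) for x in [-2, -1, 0, 1, 2]]

# Trainable head: one weight per feature.
w = [0.0, 0.0]
lr = 0.01
for _ in range(2000):
    for x, y in data:
        f = pretrained_features(x)
        pred = sum(wi * fi for wi, fi in zip(w, f))
        err = pred - y
        # Gradient step updates only the head; the base stays frozen.
        w = [wi - lr * err * fi for wi, fi in zip(w, f)]

print([round(wi, 2) for wi in w])  # converges toward [2.0, 3.0]
```

Real fine-tuning operates on networks with billions of parameters using frameworks like PyTorch, but the principle is the same: reuse what the base model has already learned and train only what the new task requires.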

Thanks to techniques like prompt engineering and retrieval-augmented generation (RAG), pre-trained foundation models deliver strong performance out of the box. Foundation models also excel at transfer learning, which allows a model trained for one purpose to be adapted to a second, related task.
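To illustrate the retrieval step at the heart of RAG, the following self-contained Python sketch ranks a small set of documents against a user's question and prepends the best match to the prompt. The bag-of-words "embedding" and all function names are simplified stand-ins for illustration; production systems use trained embedding models and vector databases.

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding: bag-of-words term counts. A real RAG pipeline
    # would use a trained embedding model instead.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, documents):
    """Augment the user's question with retrieved context before
    sending it to the language model."""
    context = "\n".join(retrieve(query, documents, k=1))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "Returns are accepted within 30 days of purchase.",
    "Our support line is open Monday through Friday.",
]
print(build_prompt("How many days do I have to return an item?", docs))
```

Because the model answers from the retrieved context rather than from memory alone, RAG lets a general-purpose model ground its responses in documents it was never trained on.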

For example, a general-purpose large language model (LLM) designed for human conversation can be further trained to serve as a customer service chatbot that draws on an enterprise knowledge base to assist customers.

Today, companies across industries fine-tune foundation models to get the best performance from their AI applications.

Types of Foundation Models

More than 100 foundation models are in use today, and that number is expected to keep growing. LLMs and image generators are two of the most popular categories. Anyone can try out a variety of models for free through the NVIDIA API catalog, with no special hardware required.

LLMs can understand natural language and respond to queries. Google's Gemma, for instance, excels at text comprehension and transformation, as well as code generation. Ask it about the astronomer Cornelius Gemma, and it will respond that his "contributions to astronomical navigation and astronomy have had a significant impact on scientific progress," along with information on his major achievements and legacy.

Google's CodeGemma, accelerated on RTX GPUs by NVIDIA TensorRT-LLM, brings powerful yet lightweight coding capabilities to the developer community, extending the collaboration between NVIDIA and Google on the Gemma models. CodeGemma offers 7B and 2B pre-trained variants specialized for code completion and code generation tasks.

Mistral AI's Mistral LLM can follow user instructions, fulfill requests, and generate creative text. In fact, when crafting the title of this article, we asked Mistral to suggest synonyms for "decoding AI," and it supplied the current title along with its own definition of foundation models.

[Image: A genuine "Hello, world"]

Meta's Llama 2 is an advanced LLM that can generate text and code in response to prompts.

Users can try Mistral and Llama 2 on RTX PCs and workstations through the NVIDIA ChatRTX tech demo. ChatRTX lets users personalize these models by connecting them to their own content, such as documents, notes, and other data, through RAG. Accelerated by TensorRT-LLM, ChatRTX delivers contextually relevant responses quickly. And because it runs locally, results are both fast and secure.

Users can generate all kinds of images, including stunningly realistic visuals, with Stability AI's Stable Diffusion XL and SDXL Turbo image generators. Stability AI's video generator, Stable Video Diffusion, takes a single image as a conditioning frame, uses a generative diffusion model to produce multiple frames from it, and synthesizes them into a video sequence.

Multimodal foundation models can process multiple types of data, such as text and images, at the same time to produce more sophisticated outputs.

With a model that supports both text and images, users can upload an image and ask questions about it. Such models are quickly making their way into practical applications like customer service, where they can answer questions faster than traditional manuals and are easier to use.

Kosmos-2 is Microsoft's groundbreaking multimodal model, built to understand and reason about visual elements in images the way humans do.

Think Globally, Run AI Models Locally

GeForce RTX and NVIDIA RTX GPUs can run foundation models locally.

Doing so keeps data secure and responses fast, since users don't need to rely on cloud-based services. With applications like ChatRTX, they can process sensitive data on a local PC without connecting to the internet or sharing data with third parties.

Users can choose from a growing list of open foundation models, download them, and run them on their own hardware. Compared with using cloud-based applications and APIs, this approach lowers costs and avoids issues with latency and network connectivity.

Subscribe to the "Decoding AI" newsletter to get the latest updates delivered directly to your inbox.