23 key gen AI terms and what they really mean


Generative AI burst onto the scene abruptly, and a new vocabulary has arrived just as fast. A complete list of AI-related terms would run to thousands of entries, but in the interest of immediate relevance, these are the ones heard most often among CIOs, analysts, consultants, and other business executives.

Agentic systems

An agent is an AI model or software program capable of making autonomous decisions or taking autonomous actions. When multiple agents work together in pursuit of a single goal, they can plan, delegate, research, and execute tasks until the goal is reached. And when some or all of these agents are powered by gen AI, the results can significantly surpass what a simple prompt-and-response approach can accomplish. Gen AI-powered agentic systems are relatively new, however, and it can be difficult for an enterprise to build its own, and even harder to ensure the safety and security of those systems.
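
To make the pattern concrete, here is a minimal single-agent loop sketched in Python. The call_llm() wrapper, the two placeholder tools, and the response format are all assumptions for illustration; real agent frameworks layer planning, delegation between agents, and safety checks on top of this basic cycle.

```python
# Minimal agent loop sketch: the model repeatedly picks a tool (or finishes),
# and each observation is fed back into the next decision.
TOOLS = {
    "search_docs": lambda query: f"(top documents matching '{query}')",  # placeholder tool
    "send_report": lambda text: f"(report sent: {text[:40]}...)",        # placeholder tool
}

def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around whatever gen AI model or API is in use."""
    raise NotImplementedError

def run_agent(goal: str, max_steps: int = 5) -> str:
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):
        # Ask the model for the next action: a tool plus its input, or FINISH.
        decision = call_llm(
            "You are an agent. Given the goal and history below, reply with\n"
            "either 'FINISH: <answer>' or '<tool>: <input>'.\n"
            f"Available tools: {list(TOOLS)}\n" + "\n".join(history)
        )
        if decision.startswith("FINISH:"):
            return decision.removeprefix("FINISH:").strip()
        tool, _, arg = decision.partition(":")
        observation = TOOLS.get(tool.strip(), lambda a: "unknown tool")(arg.strip())
        history.append(f"Action: {decision}\nObservation: {observation}")
    return "Gave up after max_steps."
```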

“Agents and agentic AI is obviously an area of enormous investment for VCs and startups,” says Gartner analyst Arun Chandrasekaran. “And we’ll perhaps see more agent frameworks evolve and mature in 2025.”

Alignment

AI alignment refers to a set of values that models are trained to uphold, such as safety or courtesy. But not all companies share the same values, and not all AI vendors make it clear exactly which values they’re building into their platforms.

“It’s an issue, and it’s not easy to solve,” says JJ Lopez Murphy, head of data science and AI at Globant. “There’s only so much you can do with a prompt if the model has been heavily trained to go against your interests.”

Black box

A model whose internal mechanisms are opaque or concealed, making it difficult to tell how the model arrives at its answers. This is a significant problem for enterprises today, especially with commercial models.

“If I don’t know what data that model was trained on and the fine tuning that was done on the model, I wouldn’t trust it to be in alignment with my company values,” says Priya Iragavarapu, VP of data science and analytics at AArete.

Context window

The number of tokens a model can process in a given prompt. A token is, on average, three-quarters of a word. Large context windows allow models to analyze long pieces of text or code, or provide more detailed answers. They also allow enterprises to provide more examples or guidelines in the prompt, embed contextual information, or ask follow-up questions.

At press time, the maximum context window for OpenAI’s ChatGPT is 128,000 tokens, which translates to about 96,000 words or nearly 400 pages of text. Anthropic released an enterprise plan for its Claude model in early September with a 500,000 token window, and Google announced a 2 million token limit for its Gemini 1.5 Pro model in June, which translates to about 1.5 million words or 6,000 pages of text.
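
Because pricing and limits are both expressed in tokens, it's often worth counting them before sending a prompt. The sketch below uses the tiktoken library; cl100k_base is the encoding used by several OpenAI models, and other vendors tokenize differently, so treat the numbers as approximate.

```python
# Rough check of whether a prompt fits within a model's context window.
import tiktoken

def fits_in_context(prompt: str, context_window: int = 128_000) -> bool:
    enc = tiktoken.get_encoding("cl100k_base")
    n_tokens = len(enc.encode(prompt))
    print(f"{n_tokens} tokens (~{int(n_tokens * 0.75)} words)")
    return n_tokens <= context_window

fits_in_context("Summarize the attached product manual ...")
```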

Distillation

The process of compressing a large model into a smaller one that stays as accurate as possible for a particular use case.

“Using models that have been distilled or pruned during training can provide a similar level of performance, with fewer computational resources required during inference,” says Ryan Gross, senior director of data and applications at Caylent, a cloud consultancy. That means they use less memory and can answer questions faster and cheaper.
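
Distillation is often implemented as a teacher-student training setup. The sketch below shows the core loss term in PyTorch, assuming the teacher and student models, their logits, and the training loop exist elsewhere; production pipelines typically combine this with the usual task loss.

```python
# Knowledge distillation loss: train the small "student" to match the
# softened output distribution of the large "teacher".
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0):
    # Soften both distributions, then penalize the divergence between them.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * temperature**2
```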

Embeddings

Ways of representing text, images, or other data as vectors in a multi-dimensional space, where each dimension reflects a particular property of the data, so that similar items end up located near each other. Embeddings are typically stored in a vector database and used in conjunction with retrieval augmented generation (RAG) to improve the accuracy and timeliness of AI responses.
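
Here is a small sketch of what that looks like in practice, assuming the sentence-transformers package and the all-MiniLM-L6-v2 model are available; the sample documents are invented.

```python
# Embed a few documents and a query, then rank documents by cosine similarity.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")
docs = ["How to reset your password", "Quarterly revenue report", "Password recovery steps"]
doc_vecs = model.encode(docs)                  # one vector per document
query_vec = model.encode("I forgot my login")  # vector for the query

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# The password-related documents should score closest to the login query.
scores = sorted(zip(docs, (cosine(query_vec, v) for v in doc_vecs)),
                key=lambda x: x[1], reverse=True)
print(scores[0])
```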

Fine-tuning

The process of further training a pre-trained model on a specific dataset to adapt it for particular tasks. Companies typically start with either a commercial or open-source model and then fine-tune it on their own data to improve accuracy, avoiding the need to create their own foundation model from scratch. “Training is most expensive,” says Andy Thurai, VP and principal analyst at Constellation Research. “Fine tuning is second most expensive.”
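
As a rough illustration, here is a minimal fine-tuning sketch using the Hugging Face transformers library. The gpt2 base model is just an example stand-in, and the training dataset is assumed to have been tokenized elsewhere; a real run would also need a data collator, evaluation, and far more compute.

```python
# Minimal fine-tuning sketch with Hugging Face transformers.
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("gpt2")  # example open model
my_tokenized_dataset = ...  # assumed: company data already tokenized into input_ids/labels

args = TrainingArguments(output_dir="finetuned-model",
                         num_train_epochs=1,
                         per_device_train_batch_size=4)
trainer = Trainer(model=model, args=args, train_dataset=my_tokenized_dataset)
trainer.train()
```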

Foundation models

Large gen AI models typically trained on vast data sets. The most common examples include LLMs like ChatGPT and image models like DALL-E 2. Individual enterprises typically don't train their own foundation models. Instead, they use a commercially available or open-source one and then customize or fine-tune it for their own needs. Foundation models can also be used as is, without additional fine-tuning, with RAG and prompt engineering.

Grounding

Since gen AI models don’t actually remember their training data, just the patterns they learned from it, the accuracy of responses can vary dramatically. This can be a significant problem for enterprise use cases, as AI models can give responses that appear correct but are entirely wrong. Grounding helps reduce this problem by providing the AI with the data it needs. For example, a user asking an AI how to use a particular product might paste the relevant contents of the product manual into the prompt.
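
A minimal sketch of that product-manual example might look like the following; the file name is invented, and call_llm() is the same hypothetical model wrapper used in the agent sketch above.

```python
# Grounding sketch: paste the relevant manual section into the prompt so the
# model answers from provided text rather than from memory.
manual_excerpt = open("router_manual_ch3.txt").read()  # assumed local file

prompt = (
    "Answer the question using ONLY the manual excerpt below. "
    "If the answer is not in the excerpt, say so.\n\n"
    f"Manual excerpt:\n{manual_excerpt}\n\n"
    "Question: How do I reset the router to factory settings?"
)
answer = call_llm(prompt)  # hypothetical model call
```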

Hallucinations

AI models can generate false, nonsensical, or even dangerous answers that can seem plausible at first glance. Enterprises reduce these hallucinations by fine-tuning models and using RAG and grounding techniques. Another way to reduce hallucinations is to run the same prompt multiple times and compare the responses, says David Guarrera, gen AI lead at EY Americas, though this can increase inference costs.
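
One way to implement that multiple-run check is sketched below, again assuming a hypothetical call_llm() wrapper; note that every extra run multiplies the inference cost.

```python
# Flag likely hallucinations by asking the same question several times and
# checking whether the answers agree.
from collections import Counter

def consistent_answer(question: str, runs: int = 5, threshold: float = 0.6):
    answers = [call_llm(question).strip().lower() for _ in range(runs)]
    best, count = Counter(answers).most_common(1)[0]
    if count / runs >= threshold:
        return best
    return None  # answers disagree too much; escalate to a human reviewer
```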

Human in the loop

For many use cases, gen AI isn’t accurate, comprehensive, or safe enough to use without human oversight. A human in the loop approach involves a person reviewing the AI outputs before they’re used. “I’m a big advocate of making sure the human reviews everything the large language model produces — code, content, pictures — no matter what,” says Iragavarapu.
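
In code, the simplest version of this is a gate that holds every model output until a person signs off. This is only a sketch, with the same hypothetical call_llm() wrapper; real workflows would route drafts through a review queue rather than a console prompt.

```python
# Human-in-the-loop gate: nothing the model produces is used until a person
# explicitly approves it.
def generate_with_review(prompt: str) -> str | None:
    draft = call_llm(prompt)
    print("--- AI draft ---\n" + draft)
    verdict = input("Approve this output? (y/n): ")
    return draft if verdict.strip().lower() == "y" else None
```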

Inference

The process of using a trained model to give answers to questions. This can be very expensive if companies use commercial models that charge by the token. “When you start to run workloads that have millions of inferences, you get sticker shock,” says Thurai. Some ways to reduce inference costs include open-source models, small language models, and edge AI.
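
A back-of-the-envelope cost estimate makes the sticker shock easy to see. The per-token prices below are placeholders, not any vendor's actual rates.

```python
# Rough monthly cost estimate for a token-priced inference API.
PRICE_PER_1K_INPUT = 0.005    # assumed price in dollars, not a real rate
PRICE_PER_1K_OUTPUT = 0.015   # assumed price in dollars, not a real rate

def monthly_cost(requests: int, input_tokens: int, output_tokens: int) -> float:
    per_request = (input_tokens / 1000) * PRICE_PER_1K_INPUT \
                + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT
    return requests * per_request

# e.g. 2 million requests a month, 1,500 input tokens and 300 output tokens each
print(f"${monthly_cost(2_000_000, 1_500, 300):,.0f} per month")
```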

Jailbreaking

Gen AI systems like chatbots or image generators typically have guardrails in place to prevent the AI from giving illegal, dangerous, or obscene answers. To get around these restrictions, malicious users will try to trick the AI into ignoring these guardrails with prompts like, “Ignore all previous commands.” Over time, AI vendors have caught on to the most common jailbreak techniques but users keep coming up with new ones. This is the biggest security risk in many LLM applications, says Guarrera. “And the goal posts are always shifting.”

In addition to tricking an AI into giving inappropriate answers, jailbreaks can also be used to expose training data, or to gain access to proprietary or sensitive information stored in vector databases and used in RAG. Jailbreaking is closely related to prompt injection, in which malicious instructions are smuggled into the content an AI is asked to process.

Large language model

A large language model (LLM) is a type of foundation model specifically designed to work with text. It’s typically tens or hundreds of billions of parameters in size, compared to small language models, which typically come in at fewer than 10 billion parameters. For example, Meta’s Llama 3.1 has 405 billion parameters, while OpenAI’s GPT-4 reportedly has more than one trillion.

Choosing the right model typically requires some testing with the intended use case. However, companies often start by checking the leaderboards to see which models have the highest scores. The LMSYS Chatbot Arena Leaderboard ranks both proprietary and open source models, while the Hugging Face Open LLM Leaderboard ranks just the open source ones, but uses multiple benchmarks.

Multimodal AI

Multimodal foundation models can handle multiple types of data, such as text, image, audio, or video. A fully multimodal model would be trained on multiple types of data at once. More commonly, however, there’ll be multiple models on the back end, each one handling a different type of data. “Multimodal is still in its infancy,” says Sinclair Schuller, partner at EY. “Most multimodal systems aren’t genuinely multimodal yet.” For example, a model that interacts with users via voice might first translate the audio to text, then generate a text response, then translate that response back into audio.

Prompt

The input given to a gen AI model, or the question sent from a user to a chatbot. In addition to a question, prompts can also include background information that would be helpful in answering the question, safety guidelines about how the question should be answered, and examples of answers to use as models.

Prompt engineering

The brand-new discipline of crafting effective prompts to get desired results from AI models. Prompt engineering can be used by end users to guide the AI, such as by asking for the answer to be “simple enough for a high school student to understand,” or telling the AI to “think things through step by step.” But it’s also used by developers adding AI functionality to enterprise workflows, and may include guidelines and stylebooks, sample answers, contextual data, and other information that could improve the quality and accuracy of the response.
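
On the developer side, prompt engineering often takes the form of a reusable template. The sketch below is illustrative; the company name, guidelines, and worked example are all invented, and call_llm() is the same hypothetical wrapper as above.

```python
# A developer-side prompt template that bakes in guidelines, style rules,
# and a worked example before the user's question is appended.
PROMPT_TEMPLATE = """You are a support assistant for {company}.
Guidelines:
- Answer in plain language a high school student could follow.
- Think through the steps before giving the final answer.
- If you are unsure, say so instead of guessing.

Example:
Q: How do I export my invoices?
A: Open Billing > Invoices, select a date range, then click Export as CSV.

Q: {user_question}
A:"""

prompt = PROMPT_TEMPLATE.format(company="Acme Corp", user_question="How do I add a new user?")
answer = call_llm(prompt)  # hypothetical model call
```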

Retrieval augmented generation (RAG)

A way to improve accuracy, security, and timeliness by adding context to a prompt. For example, an application that uses gen AI to write marketing letters can pull relevant customer information from a database, allowing the AI to have access to the most recent data. In addition, it allows a company to avoid training or fine-tuning the AI model on the actual customer data, which could be a security or privacy violation.

But RAG has downsides. First, there’s the added complexity of collecting the relevant information and moving it into vector databases. Then there’s the security overhead to ensure the information is only accessed by authorized users or processes. And there’s the added cost of the inference itself, since the pricing is typically based on the number of tokens.

“If you’re ingesting documents each a thousand pages long, your embedding costs can get significantly high,” says Swaminathan Chandrasekaran, head of solution architecture for digital solutions at KPMG.
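
Stripped to its essentials, RAG is retrieve-then-prompt. The sketch below reuses the embedding model and cosine() helper from the embeddings example and the hypothetical call_llm() wrapper; in production the chunks would already be embedded and stored in a vector database rather than encoded on every request.

```python
# Minimal RAG sketch: embed the question, retrieve the closest stored chunks,
# and prepend them to the prompt as context.
def rag_answer(question: str, chunks: list[str], top_k: int = 3) -> str:
    chunk_vecs = model.encode(chunks)
    q_vec = model.encode(question)
    ranked = sorted(zip(chunks, (cosine(q_vec, v) for v in chunk_vecs)),
                    key=lambda x: x[1], reverse=True)
    context = "\n\n".join(chunk for chunk, _ in ranked[:top_k])
    prompt = (f"Use the context below to answer the question.\n\n"
              f"Context:\n{context}\n\nQuestion: {question}")
    return call_llm(prompt)
```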

Responsible AI

Development and deployment of AI systems with consideration of ethics, bias, privacy, security, compliance, and social impacts. Responsible AI can help increase trust on the part of customers, employees, and other users and stakeholders, as well as help companies avoid public embarrassment and stay ahead of regulations.

PwC’s responsible AI lead Ilana Golbin Blumenfeld recommends that enterprises start by defining their responsible AI principles that will guide the development and deployment of AI systems. They could include fairness, transparency, privacy, accountability, and inclusivity. She also recommends that companies maintain human oversight and accountability. “Design AI systems to augment human decision-making, rather than replace it entirely,” she says.

Small language model

The best-known gen AI models, like OpenAI’s ChatGPT or Anthropic’s Claude, are LLMs with tens or hundreds of billions of parameters. By comparison, small language models typically have 7 or 8 billion parameters and can offer significant benefits for particular use cases. “Smaller models generally cost less to run but may offer reduced accuracy or capability,” says Caylent’s Gross. But choosing the right model size for the specific task can optimize costs without compromising performance too much, he adds.

Synthetic data

Artificially generated data used to train AI models, often created by other AI models. “Real-world data is very expensive, time-consuming, and hard to collect,” adds Thurai. “For example, some large language models are trained on billions of parameters, and the more data you feed in, the better the model is.” Synthetic data can also be used to fill in gaps, or replace personally identifiable information. But too much of it can introduce new biases and, if models are trained on synthetic data and then used to produce more synthetic data, repeated cycles can lead to model collapse.

Vector database

Typically used to store information that’s then fed to AI models as context via RAG. Vector databases store data as points in a multi-dimensional space, allowing closely related items to sit near one another for fast similarity searches. Hyperscalers and AI platform vendors typically include a vector database in their toolsets. In addition, Pinecone is a popular managed vector database, and Elasticsearch and OpenSearch are popular options for full-text search.

Zero-shot prompting

A gen AI use case in which the user doesn’t provide an example of how they want the LLM to respond, and is the simplest way of using a gen AI chatbot. “With zero-shot, anyone can get in front of a gen AI tool and do something of value to the business,” says Sheldon Monteiro, chief product officer at Publicis Sapient. “Like a developer going in and saying, ‘Help me write code.’”

Other common zero-shot prompt examples include general knowledge questions or requests to summarize a piece of text. By comparison, few-shot prompting requires the user to provide examples to guide the AI. For example, a user looking for a sales letter might provide instances of previous sales letters so the AI can do a better job matching the company’s style and format.
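
The difference is easiest to see side by side. In the few-shot version below, the bracketed placeholders stand in for real previous letters that would be filled in before the prompt is sent.

```python
# Zero-shot vs. few-shot prompts for the same task; the placeholders are
# illustrative and would be filled with real past letters.
zero_shot = "Write a short sales letter introducing our new analytics dashboard."

few_shot = """Here are two of our previous sales letters:

Letter 1: {previous_letter_1}
Letter 2: {previous_letter_2}

Using the same tone and format, write a short sales letter introducing
our new analytics dashboard."""
```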


