An AI system that autonomously performs tasks using reasoning, tools, and memory.
Standard tests that evaluate agent performance across tasks.
Memory systems allowing agents to store and recall information across steps or sessions.
A multi-step process executed autonomously by AI agents.
Agents designed to autonomously complete tasks from start to finish with minimal user input.
Policies, processes, and controls that ensure AI systems are safe, compliant, and aligned.
Mechanisms allowing one agent to pass tasks or context to another agent.
AI systems that can autonomously plan, act, and use tools.
Ensuring AI systems act according to human values, safety goals, and organizational rules.
Fully automated end-to-end workflows where agents coordinate tasks without human intervention.
Dynamically adjusting sampling parameters to improve accuracy and reduce hallucinations.
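A minimal sketch of the idea, assuming a hypothetical `call_llm` chat-completion client and illustrative parameter values: conservative sampling for factual queries, more exploratory sampling for open-ended ones.

```python
# Sketch of adaptive sampling: pick conservative parameters for factual
# queries and more exploratory ones for creative tasks. `call_llm` is a
# hypothetical stand-in for any chat-completion client.

def sampling_params(task_type: str) -> dict:
    if task_type == "factual":
        # Low temperature and top_p keep output close to the model's
        # highest-probability tokens, reducing fabricated details.
        return {"temperature": 0.1, "top_p": 0.5}
    if task_type == "creative":
        return {"temperature": 0.9, "top_p": 0.95}
    return {"temperature": 0.7, "top_p": 0.9}

def answer(prompt: str, task_type: str, call_llm) -> str:
    return call_llm(prompt, **sampling_params(task_type))
```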
Persistent and contextual information storage for AI agents.
AI systems that autonomously plan, decide, and act toward goals.
Systems where agents evaluate and improve their own outputs without human intervention.
Systems where small models handle easy tasks and escalate difficult ones to larger models.
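A minimal cascade sketch, assuming both model callables and the confidence threshold are illustrative: the small model answers first, and the request escalates only when its self-reported confidence is low.

```python
# Model cascade: try the small model first, escalate to a larger one only
# when confidence is low. Model callables and threshold are assumptions.

def cascade(prompt: str, small_model, large_model, threshold: float = 0.8):
    answer, confidence = small_model(prompt)  # expected to return (text, score)
    if confidence >= threshold:
        return answer                         # easy case: small model suffices
    return large_model(prompt)                # hard case: escalate
```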
The process of breaking documents into smaller segments for retrieval.
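A simple fixed-size chunker with overlap illustrates the core mechanic; sizes here are in characters, though token-based splitting is the more common production choice.

```python
# Fixed-size chunking with overlap, so text cut at a chunk boundary still
# appears intact in the neighboring chunk.

def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # step back by `overlap` to create the window
    return chunks

print(len(chunk_text("word " * 1000)))  # a few overlapping chunks
```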
Automatically inserting retrieved or generated context into LLM prompts.
Techniques that ensure individual data points cannot be reverse-engineered from model outputs.
Automatically generating or modifying prompts based on real-time context.
Models that identify cause–effect relationships instead of correlations.
Distributing complex reasoning tasks across multiple agents or models.
The maximum number of tokens an LLM can process at once.
A governance system where policies proactively shape and direct agent behavior.
Splitting documents into smaller pieces for retrieval and context injection.
A reasoning technique where LLMs generate step-by-step logic before answering.
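Chain-of-thought is largely a prompting pattern; a minimal template (the wording is illustrative, not canonical) looks like this:

```python
# Chain-of-thought prompting: ask the model to show intermediate steps
# before committing to a final answer.

COT_TEMPLATE = """Question: {question}

Think through this step by step, then state the final answer
on a line beginning with "Answer:".
"""

prompt = COT_TEMPLATE.format(
    question="A train travels 120 km in 1.5 hours. What is its average speed?"
)
# Expected reasoning: 120 km / 1.5 h = 80 km/h, then "Answer: 80 km/h".
```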
Systems that estimate how confident an LLM is in its own answer.
Using multiple models to vote or agree on an answer.
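A self-consistency sketch, assuming a hypothetical `sample_answer` model call: sample several answers at nonzero temperature and keep the majority result.

```python
# Self-consistency voting: sample n answers and take the most common one.
from collections import Counter

def majority_vote(prompt: str, sample_answer, n: int = 5) -> str:
    answers = [sample_answer(prompt) for _ in range(n)]
    winner, _ = Counter(answers).most_common(1)[0]
    return winner
```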
Adapting a general LLM to perform well in a specific industry or task domain.
Vector representations of text that capture semantic meaning.
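Similarity between embeddings is typically measured with cosine similarity: near 1.0 for semantically close text, near 0 for unrelated text.

```python
# Cosine similarity between two embedding vectors.
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

print(cosine_similarity([0.1, 0.9, 0.0], [0.2, 0.8, 0.1]))  # high: similar meaning
```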
Assessing correctness, safety, robustness, and task success of LLMs and agent systems.
Agents that trigger actions in response to system or data events.
Comparison of modifying model weights (fine-tuning) versus injecting external knowledge (RAG).
Numeric vector representations that capture semantic meaning.
Controls that constrain AI behavior within safe and compliant boundaries.
A structured blueprint for deploying agentic AI safely and at scale within enterprises.
Standardized tests to measure the performance, safety, and reasoning of LLMs and agents.
Models designed to evaluate factual correctness of LLM outputs.
Models designed to ground LLM outputs in verifiable data sources.
A scalable, secure architecture for deploying agentic AI across enterprises.
Detecting uncertain or unstable LLM outputs using entropy measures.
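The usual measure is Shannon entropy over a token's probability distribution: a peaked distribution (the model is sure of the next token) yields low entropy, a flat one yields high entropy, and entropy spikes across a generation are a common hallucination signal.

```python
# Shannon entropy of a next-token probability distribution, in bits.
import math

def token_entropy(probs: list[float]) -> float:
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(token_entropy([0.97, 0.01, 0.01, 0.01]))  # confident: ~0.24 bits
print(token_entropy([0.25, 0.25, 0.25, 0.25]))  # uncertain: 2.0 bits
```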
Detailed logs showing every reasoning step, tool call, and decision made by an agent.
Training models across distributed data sources without moving the data.
Controls ensuring AI systems behave safely, remain compliant, and avoid harmful actions.
Measuring performance, reliability, and safety of LLMs and agents.
Methods used to reduce incorrect or fabricated LLM outputs.
Training LLMs on curated instruction–response datasets.
Situations where the LLM produces incorrect or fabricated information.
Detecting what the user wants so the system can route to the correct model or agent.
Combining semantic search and keyword search for optimal retrieval accuracy.
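One common fusion method is reciprocal rank fusion (RRF): each document is scored by its rank in every result list, so items ranked well by either retriever surface near the top of the merged list.

```python
# Reciprocal rank fusion over keyword and vector result lists.

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc3", "doc1", "doc7"]
vector_hits = ["doc1", "doc5", "doc3"]
print(rrf([keyword_hits, vector_hits]))  # doc1 and doc3 rise to the top
```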
A retrieval method where the model predicts what information it needs, then retrieves it.
Combining multiple sources of information into coherent, stable long-term memory.
Merging multiple knowledge sources into a single cohesive representation.
Neural models trained on massive datasets to understand and generate language.
Models capable of processing extremely long sequences of tokens.
Model architecture where only specialized subsets of parameters activate per task.
Rules and constraints that restrict what an LLM or agent is allowed to do.
Multiple specialized agents collaborating to solve complex tasks.
Teaching smaller agents or models to mimic more advanced ones.
Graph structures encoding entities and their relationships.
Identifying which model generated a piece of text using statistical or embedding patterns.
Removing outdated or irrelevant information from agent memory.
Methods to shrink LLMs while preserving accuracy.
Managing models from deployment to updates, monitoring, retraining, and retirement.
Solving problems that require several reasoning jumps or intermediate steps.
Identifying when the knowledge used by models or RAG systems becomes outdated.
The speed and efficiency of LLM and agent workflow execution.
Boundary layers that block unsafe inputs, outputs, or tool actions.
Systems that decide what should be stored, recalled, or forgotten in agent memory.
Compressing a large model into a smaller one while preserving performance.
Choosing the best LLM for a specific task based on capability, cost, and performance.
A large-scale neural network trained to understand and generate human language.
Search engines powered by neural embeddings instead of keywords.
The coordination layer managing agent tasks, tools, memory, and workflows.
AI governance mechanisms that enforce safety, compliance, and access policies.
Monitoring, tracing, and understanding LLM and agent behavior in production.
Monitoring and tracing the internal behavior of LLMs and agent systems.
Reusable workflow templates for building AI and agent systems.
The craft of designing prompts to optimize LLM outputs.
Coordinating agents, tools, and models into controlled workflows.
Models running directly on edge devices like laptops, phones, or IoT hardware.
The process of breaking down goals into steps and executing them using reasoning, retrieval, and tools.
Designing inputs to guide LLM behavior and outputs.
Transforming user queries to improve retrieval accuracy.
Reordering retrieved results to optimize relevance before sending to an LLM.
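A re-ranking sketch, where `score_pair` is a hypothetical cross-encoder or LLM scoring call: retrieve a broad candidate set cheaply, score each (query, document) pair with a stronger model, and keep the top few.

```python
# Score-and-sort re-ranking over a retrieved candidate set.

def rerank(query: str, candidates: list[str], score_pair, top_k: int = 5):
    scored = [(score_pair(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]
```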
Next-generation RAG systems using re-ranking, multi-hop retrieval, and verification.
Reinforcement-learning methods used to align LLMs with human values and safety rules.
Training models on unlabeled data by deriving supervision signals from the data itself.
Search based on meaning rather than keywords.
Using AI-generated datasets to test models or agents.
Artificially generated data used to train, test, or evaluate AI systems.
An LLM enhanced with externally retrieved knowledge for accuracy and grounding.
Situations where RAG returns irrelevant, missing, or incomplete results.
Systems that route tasks to the most appropriate LLM.
Running agent actions and tool calls in isolated, controlled environments.
Organizing documents using embeddings to support semantic search.
Agents that persist knowledge across interactions instead of starting from scratch each time.
AI-generated data used to train or evaluate LLMs.
Stress-testing AI systems by simulating adversarial or harmful user inputs.
End-to-end workflow that transforms queries into embeddings, retrieves documents, re-ranks them, and injects them into the LLM's prompt.
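A sketch of the full pipeline, where `embed`, `vector_search`, `rerank_fn`, and `call_llm` are hypothetical stand-ins for the embedding, retrieval, re-ranking, and generation components:

```python
# End-to-end RAG: embed the query, retrieve candidates, keep the best few,
# and inject them into the prompt before generation.

def rag_answer(query: str, embed, vector_search, rerank_fn, call_llm) -> str:
    query_vec = embed(query)                          # 1. query -> embedding
    candidates = vector_search(query_vec, k=20)       # 2. retrieve candidates
    top_docs = rerank_fn(query, candidates)[:4]       # 3. keep the best few
    context = "\n\n".join(top_docs)                   # 4. inject into the prompt
    prompt = f"Use only this context to answer.\n\n{context}\n\nQuestion: {query}"
    return call_llm(prompt)
```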
Models that detect harmful, unsafe, or non-compliant content before it reaches the user.
Mechanisms for LLMs or agents to review and correct their own outputs.
Routing tasks to models or agents based on semantic features of the input.
Systems where agents collaborate using a shared memory and communication fabric.
Combining LLMs with external knowledge retrieval to improve accuracy.
Breaking complex tasks into smaller steps or subtasks.
Assessing how likely it is that a tool call is needed or correct.
Combining LLM reasoning with external tools such as search, databases, and code interpreters.
Systems that verify the correctness of LLM or agent outputs before final delivery.
The neural network architecture behind modern large language models.
Small modular parameters that allow models to specialize without fully retraining.
Reducing delays caused by tool calls in agent workflows.
Models trained specifically to call tools and APIs autonomously.
The process of converting text into tokens that LLMs can process.
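A short demo, assuming the open-source tiktoken package is installed (any tokenizer library works the same way in spirit): text maps to integer token IDs, and token count, not character count, is what fills the context window.

```python
# Round-tripping text through a tokenizer.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("Agents plan, act, and use tools.")
print(tokens)              # a short list of integer token IDs
print(enc.decode(tokens))  # round-trips back to the original text
```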
A database optimized for storing and searching vector embeddings.
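A toy in-memory version shows the core contract; production vector databases add persistence, approximate-nearest-neighbor indexes, and metadata filtering on top of the same add/search interface.

```python
# Minimal vector store: add (id, vector) pairs, query by cosine similarity.
import math

class TinyVectorStore:
    def __init__(self):
        self.items: list[tuple[str, list[float]]] = []

    def add(self, doc_id: str, vector: list[float]) -> None:
        self.items.append((doc_id, vector))

    def search(self, query: list[float], k: int = 3) -> list[str]:
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(y * y for y in b))
            return dot / (na * nb) if na and nb else 0.0
        ranked = sorted(self.items, key=lambda item: cos(query, item[1]), reverse=True)
        return [doc_id for doc_id, _ in ranked[:k]]
```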
The ability of models to understand sequences, timelines, and time-dependent logic.
An LLM or agent invoking external tools, APIs, functions, or code to perform real actions.
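A dispatch sketch, where the tool registry and the weather function are illustrative assumptions: the model emits a structured call (name plus JSON arguments) and the runtime routes it to a registered function.

```python
# Tool-call dispatch: parse the model's structured call and run the function.
import json

def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # placeholder for a real API call

TOOLS = {"get_weather": get_weather}

def dispatch(tool_call_json: str) -> str:
    call = json.loads(tool_call_json)  # e.g. the model's emitted tool call
    func = TOOLS[call["name"]]
    return func(**call["arguments"])   # run the real action

print(dispatch('{"name": "get_weather", "arguments": {"city": "Oslo"}}'))
```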
A data store that holds embeddings for semantic search and similarity retrieval.
Allowing LLMs or agents to call external tools and APIs.
Embedding hidden signals in AI-generated text to trace its origin.
Training models with noisy, incomplete, or programmatically generated labels.