Agentic AI Knowledge Base
Self-contained, citable units on applied and agentic AI — each built as a reusable knowledge unit for people, search engines and AI agents alike. Every entry carries a definition, key takeaways, FAQs, references, a date and a version.
Concepts
10What is Agentic AI?
Agentic AI refers to systems that pursue goals over multiple steps — planning, calling tools, acting on an environment and reacting to feedback — instead of producing a single response. It turns a language model from a text generator into an actor that can complete tasks. The shift it represents is from do-it-yourself software, where the human drives every step, to do-it-for-me software, where the system carries out the work and reports back.
What is Agentic AI Evaluation?
Agentic AI evaluation is the practice of measuring how well an agent completes multi-step, tool-using tasks in an environment — not just the quality of a single answer. As models saturate static knowledge benchmarks, evaluation is shifting from measuring capability (what a model knows) to measuring agency (what a system can actually get done). Good evals are the feedback loop that makes harness engineering possible.
What is an AI Agent?
An AI agent is a system that combines a model with tools, memory and a control loop to take actions toward a goal, rather than just answering a single prompt. It perceives a situation, decides what to do, acts through tools, observes the result and repeats until done. Autonomy ranges from a single tool call to long-horizon execution of complex tasks.
What are Embeddings & Vector Search?
An embedding is a numeric vector that represents the meaning of text (or images, audio, code) so that semantically similar items sit close together in vector space. Vector search finds the nearest embeddings to a query, enabling search by meaning rather than keywords. Embeddings are the backbone of retrieval-augmented generation, semantic search, clustering and recommendation.
What is Fine-tuning?
Fine-tuning continues training a pretrained model on a smaller, targeted dataset to specialize its behavior, style or domain knowledge. It is far cheaper than pretraining and changes the model's weights — unlike prompting or retrieval, which leave the model unchanged. Use it to lock in a consistent format, tone or skill; use retrieval instead when you need fresh or private facts.
What are Foundation Models?
A foundation model is a large model pretrained on broad data at scale that can be adapted to a wide range of downstream tasks. Large language models (LLMs) and large multimodal models are the canonical examples. The term, coined at Stanford in 2021, captures a shift: instead of training a bespoke model per task, organizations build on a shared, general-purpose base — then specialize it through prompting, retrieval or fine-tuning.
What is the Model Context Protocol (MCP)?
The Model Context Protocol (MCP) is an open standard for connecting AI models and agents to external tools, data sources and systems through a single, uniform interface. Introduced by Anthropic in late 2024, it standardizes how an application exposes context and capabilities to a model — acting like a universal adapter so any compliant client can talk to any compliant server.
What is Prompt Engineering?
Prompt engineering is the practice of designing the inputs given to a language model so it produces the desired output reliably. A good prompt specifies the role, the task, the constraints, the output format and, when useful, examples. It is the most accessible lever for steering model behavior — and one layer of the broader harness around a model — but on its own it does not make a system reliable at scale.
What are Reasoning Models?
Reasoning models are language models trained to spend extra computation 'thinking' before they answer — generating internal reasoning steps to solve harder problems in math, code and logic. They trade latency and cost for accuracy on complex, multi-step tasks. The key idea is test-time compute: letting a model reason longer at inference, rather than only making the model bigger, can substantially improve results.
What is Tool Use (Function Calling)?
Tool use, also called function calling, lets a language model invoke external functions, APIs or code to fetch information or take actions in the real world. The model decides which tool to call and with what arguments; the application runs the tool and returns the result, which the model uses to continue. Tool use is the bridge that turns a text generator into an agent that can actually do things.
Harness Engineering
4What are Agent Memory Systems?
Agent memory is how an AI agent retains and recalls information beyond a single context window — across steps, sessions and tasks. It typically separates short-term working memory (the current context) from long-term memory (durable stores the agent reads from and writes to). Memory is what lets an agent carry state through a long task, remember a user over time, and avoid repeating work. It is a core layer of harness engineering.
What is AI Agent Observability?
AI observability is the practice of instrumenting AI systems — especially agents — so you can see what they did and why. It captures traces of each step: prompts, tool calls, retrieved context, model outputs, tokens, latency and cost. Because agents are non-deterministic and multi-step, observability is what makes failures diagnosable and improvement systematic. It is the layer that feeds evaluation and closes the harness-engineering loop.
What is Context Engineering?
Context engineering is the discipline of deciding what information enters a model's limited context window at each step — and what stays out. As agents run over many steps, naively stuffing everything into context degrades quality and cost. Context engineering curates the right instructions, retrieved knowledge, tool results and memory so the model has exactly what it needs, when it needs it. It is a core part of harness engineering.
What is Harness Engineering?
Harness engineering is the discipline of designing and optimizing the scaffolding around an AI model — the prompts, tools, memory, environment, control loop and guardrails — so the model performs reliably on real tasks. Its core premise: as base models converge in raw capability, competitive advantage shifts from the model itself to the harness built around it. The same model can pass or fail a task depending almost entirely on its harness.
Patterns
2What is Enterprise RAG?
Enterprise RAG (retrieval-augmented generation) is the pattern of grounding a model's answers in an organization's own documents, retrieved at query time, instead of relying on the model's parametric memory. It lets a company use private, current and governed knowledge — policies, manuals, tickets, contracts — without retraining a model, while keeping access control, citations and auditability that enterprises require.
What is the Human-in-the-Loop Pattern?
Human-in-the-loop (HITL) is a design pattern where a person reviews, approves or corrects an AI system's output before it takes effect — especially for high-impact actions. Instead of full autonomy, the agent proposes and a human disposes. It is a primary control for managing risk in agentic systems and a recurring requirement in AI governance frameworks like the EU AI Act and NIST AI RMF.
Governance
3What is AI Governance?
AI governance is the set of policies, processes, roles and controls that ensure AI is built and used responsibly, legally and safely. It spans risk management, accountability, transparency, security and compliance across the AI lifecycle. In practice it operationalizes recognized frameworks — the EU AI Act, the NIST AI Risk Management Framework and ISO/IEC 42001 — into concrete controls an organization can implement, evidence and audit.
What are AI Guardrails?
Guardrails are runtime controls that constrain what goes into and comes out of an AI system, keeping its behavior safe, on-policy and compliant. They check and filter inputs and outputs, validate tool actions, block disallowed content and enforce limits — sitting around the model as a safety layer. Guardrails are a primary, operational control in AI governance and a key defense against misuse and prompt injection.
What is Prompt Injection?
Prompt injection is an attack in which malicious instructions hidden in the input to a language model hijack its behavior — making it ignore its rules, leak data or misuse tools. It tops the OWASP Top 10 for LLM applications. The root cause is that models cannot reliably separate trusted instructions from untrusted content, so any text an agent reads — a web page, a document, a tool result — can carry an attack.