Enterprise AI Patterns Library

Reusable design patterns for building AI and agentic systems — each a self-contained, citable unit with the problem it solves, when to use it, how it works, benefits, risks and when not to use it. Built for people and AI agents.

Orchestration

Goal Decomposition

Goal decomposition has an agent break a high-level goal into an ordered set of smaller, tractable sub-tasks — a plan — before acting, then execute and monitor that plan, re-planning when steps fail. The explicit plan becomes an inspectable artifact you can review, gate, and debug. Use it when a goal needs several dependent steps and reactive, step-at-a-time agents drift or stall; skip it for simple, single-shot tasks.
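
The plan, execute, monitor, re-plan loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: `run_plan`, `plan_fn`, `execute_fn`, and `max_replans` are hypothetical names, and the planner and executor are plain callables standing in for LLM or tool calls.

```python
from typing import Callable, List


def run_plan(goal: str,
             plan_fn: Callable[[str, List[str]], List[str]],
             execute_fn: Callable[[str], bool],
             max_replans: int = 2) -> List[str]:
    """Plan first, execute step by step, and re-plan when a step fails."""
    done: List[str] = []
    plan = plan_fn(goal, done)            # explicit plan: an inspectable list of steps
    replans = 0
    while plan:
        step = plan.pop(0)
        if execute_fn(step):              # hypothetical executor: True on success
            done.append(step)
            continue
        if replans >= max_replans:        # bound re-planning so a bad step cannot loop forever
            raise RuntimeError(f"step still failing after {replans} re-plans: {step}")
        replans += 1
        plan = plan_fn(goal, done)        # re-plan from what has already been completed
    return done
```

Because the plan is an ordinary list, it can be logged or shown to a reviewer before execution starts.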

Orchestration

Orchestrator-Workers

An orchestrator LLM dynamically breaks a task into subtasks, delegates each to a worker LLM, and synthesizes the results. Unlike fixed parallelization, the orchestrator decides the subtasks at runtime — making it suited to complex tasks whose decomposition is not known in advance.
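
A minimal sketch of the decompose, delegate, synthesize flow is shown here. The function name `orchestrate` and its three callable parameters are assumptions made for illustration; in a real system each would wrap an LLM call.

```python
from typing import Callable, List


def orchestrate(task: str,
                decompose: Callable[[str], List[str]],
                worker: Callable[[str], str],
                synthesize: Callable[[str, List[str]], str]) -> str:
    # The orchestrator chooses the subtasks at runtime; nothing is hard-coded.
    subtasks = decompose(task)
    # Each subtask is delegated to a worker (run sequentially here for clarity).
    results = [worker(sub) for sub in subtasks]
    # The orchestrator combines the worker outputs into one answer.
    return synthesize(task, results)
```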

Orchestration

Parallelization

Parallelization runs multiple LLM calls at the same time and aggregates the results. It has two flavors: sectioning, which splits a task into independent subtasks that run in parallel, and voting, which runs the same task several times to improve reliability or coverage. Sectioning cuts latency; voting can raise quality at the cost of extra calls.
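
Both flavors can be sketched with the standard library's thread pool. The helper names `run_sections` and `vote` are hypothetical, `call` stands in for an LLM call, and majority voting is only one possible way to aggregate.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_sections(call: Callable[[str], str], subtasks: List[str]) -> List[str]:
    """Sectioning: run independent subtasks concurrently, keeping input order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(call, subtasks))


def vote(call: Callable[[str], str], prompt: str, n: int = 5) -> str:
    """Voting: run the same prompt n times and return the most common answer."""
    with ThreadPoolExecutor() as pool:
        answers = list(pool.map(call, [prompt] * n))
    return Counter(answers).most_common(1)[0][0]
```

Threads suit this case because LLM calls are typically network-bound; an async client would be an equally reasonable choice.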

Orchestration

Prompt Chaining

Prompt chaining decomposes a task into a fixed sequence of LLM calls, where each step works on the output of the previous one. It trades some latency for higher accuracy and control, because each call handles a narrower, easier task. It is the simplest workflow pattern: use it whenever a task splits cleanly into ordered subtasks.
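
As a rough sketch, a chain is a loop over a fixed list of steps. The name `run_chain` and the optional `gate` check between steps are assumptions added for illustration; each step would normally wrap one LLM call.

```python
from typing import Callable, Optional, Sequence


def run_chain(steps: Sequence[Callable[[str], str]],
              initial: str,
              gate: Optional[Callable[[str], bool]] = None) -> str:
    """Feed each step the previous step's output, in a fixed order."""
    value = initial
    for index, step in enumerate(steps):
        value = step(value)
        # Optional programmatic check between steps; stop early on bad output.
        if gate is not None and not gate(value):
            raise ValueError(f"step {index} produced output that failed the gate")
    return value
```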

Orchestration

Routing

Routing classifies an input and directs it to the most appropriate specialized handler, prompt or model. It improves quality by letting each path be optimized for its case, and controls cost by sending easy requests to cheap models and hard ones to capable models.
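
A minimal routing sketch, assuming a classifier that returns a string label and a table of handlers keyed by label. `route`, `classify`, `handlers`, and `default` are hypothetical names; the classifier could be an LLM call, a small model, or a rule.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


def route(text: str,
          classify: Callable[[str], str],
          handlers: Dict[str, Handler],
          default: Handler) -> str:
    label = classify(text)                   # decide which path the input belongs to
    handler = handlers.get(label, default)   # unknown labels go to the default handler
    return handler(text)
```

The default handler matters in practice: a classifier can emit a label nobody registered, and the request still needs somewhere to go.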

Orchestration

Supervisor Agent

A supervisor agent is a persistent coordinator that manages a team of specialized sub-agents. It reads the conversation state, decides which specialist should act next, routes messages to it, and integrates returned results toward the goal. Unlike a one-shot decomposer, the supervisor stays in the loop across many turns, delegating by capability and re-planning until the task is done or handed back to the user.
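
The supervisor's loop might look like the sketch below. All names here (`supervise`, `choose_next`, `specialists`, `max_turns`, and the shape of the `state` dictionary) are assumptions for illustration; `choose_next` stands in for the supervisor model's decision and returns `None` when the task is finished.

```python
from typing import Any, Callable, Dict, Optional

State = Dict[str, Any]


def supervise(goal: str,
              choose_next: Callable[[State], Optional[str]],
              specialists: Dict[str, Callable[[State], str]],
              max_turns: int = 10) -> State:
    state: State = {"goal": goal, "messages": [], "status": "running"}
    for _ in range(max_turns):
        name = choose_next(state)            # read state, pick the next specialist
        if name is None:                     # supervisor judges the goal reached
            state["status"] = "done"
            return state
        result = specialists[name](state)    # route to the chosen specialist
        state["messages"].append((name, result))  # integrate the returned result
    state["status"] = "handed_back"          # turn budget spent: return control to the user
    return state
```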

Orchestration

Task Prioritization

Order an agent's candidate tasks by value, urgency, dependencies, and cost instead of processing them first-in-first-out. A scoring function and a priority queue decide what runs next, so limited compute, budget, and time go to the work that matters most. Re-score as state changes, and bound the queue so it cannot grow without limit.
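
One way to sketch the scoring function and bounded priority queue uses the standard `heapq` module. The scoring formula (value times urgency divided by cost) is an illustrative assumption, not a recommended weighting, and dependency handling is left out for brevity. `Task`, `score`, and `BoundedPriorityQueue` are hypothetical names.

```python
import heapq
from dataclasses import dataclass
from itertools import count


@dataclass
class Task:
    name: str
    value: float
    urgency: float
    cost: float


def score(task: Task) -> float:
    # Assumed formula for illustration only: higher value and urgency, lower cost.
    return task.value * task.urgency / max(task.cost, 1e-9)


class BoundedPriorityQueue:
    def __init__(self, max_size: int) -> None:
        self.max_size = max_size
        self._heap = []        # entries are (-score, sequence, task); heapq is a min-heap
        self._seq = count()    # unique tie-breaker so Task objects are never compared

    def push(self, task: Task) -> None:
        heapq.heappush(self._heap, (-score(task), next(self._seq), task))
        if len(self._heap) > self.max_size:
            # Over the bound: drop the lowest-scoring entry.
            self._heap.remove(max(self._heap))
            heapq.heapify(self._heap)

    def rescore(self) -> None:
        # Call after task attributes change so the ordering reflects current state.
        self._heap = [(-score(t), seq, t) for _, seq, t in self._heap]
        heapq.heapify(self._heap)

    def pop(self) -> Task:
        return heapq.heappop(self._heap)[2]
```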

Reliability

Evaluator-Optimizer

One LLM generates a response while a second LLM evaluates it against criteria and returns feedback; the generator revises and the loop repeats until the evaluation passes or an iteration limit is reached. It raises quality on tasks with clear evaluation criteria, at the cost of extra calls.
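
The generate, evaluate, revise loop can be sketched as below. `evaluate_optimize`, `generate`, `evaluate`, and `max_rounds` are hypothetical names; the two callables stand in for the generator and evaluator models, and the round limit is an added safeguard against a loop that never passes.

```python
from typing import Callable, Optional, Tuple


def evaluate_optimize(task: str,
                      generate: Callable[[str, Optional[str]], str],
                      evaluate: Callable[[str, str], Tuple[bool, str]],
                      max_rounds: int = 3) -> Tuple[str, bool]:
    """Return (final draft, whether it passed evaluation)."""
    feedback: Optional[str] = None
    draft = ""
    for _ in range(max_rounds):
        draft = generate(task, feedback)           # first round has no feedback
        passed, feedback = evaluate(task, draft)   # evaluator returns verdict plus feedback
        if passed:
            return draft, True
    return draft, False                            # budget spent without a passing draft
```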

Reliability

Recovery Strategy

Give the agent an explicit plan for when things break. Detect failures by validating outputs and catching tool errors; then retry with adjustment, fall back to an alternative path, roll back partial actions, or escalate. Bound retries to avoid runaway loops and cost, make actions idempotent, and distinguish transient from permanent failures. The goal is graceful degradation instead of crashes or silently wrong results.
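
A minimal sketch of bounded retries, fallback, and escalation follows. The exception classes, `with_recovery`, and its parameters are hypothetical; exponential backoff is used here as the retry adjustment, and rollback and idempotency are not shown.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Failure that may succeed on retry, such as a timeout or rate limit."""


class PermanentError(Exception):
    """Failure that retrying cannot fix, such as an invalid request."""


def with_recovery(primary: Callable[[], T],
                  fallback: Callable[[], T],
                  validate: Callable[[T], bool],
                  max_retries: int = 3,
                  base_delay: float = 0.5,
                  sleep: Callable[[float], None] = time.sleep) -> T:
    for attempt in range(max_retries):            # bounded: no runaway retry loop
        try:
            result = primary()
            if validate(result):                  # detect silently wrong output
                return result
        except TransientError:
            pass                                  # worth retrying
        except PermanentError:
            break                                 # retrying will not help
        if attempt < max_retries - 1:
            sleep(base_delay * 2 ** attempt)      # back off before the next attempt
    result = fallback()                           # alternative path
    if validate(result):
        return result
    raise RuntimeError("primary and fallback both failed; escalate to a human")
```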

Reliability

Reflection

Reflection has a model critique its own output and then revise it, using the critique as feedback. It is a lightweight, single-model way to catch mistakes and improve quality on reasoning, coding and writing tasks — at the cost of extra calls.
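
In its simplest form, reflection is three calls to the same model: draft, critique, revise. The sketch below assumes a single `llm` callable that maps a prompt string to a response string; the function name `reflect` and the prompt wording are illustrative only.

```python
from typing import Callable


def reflect(task: str, llm: Callable[[str], str], rounds: int = 1) -> str:
    draft = llm(f"Task: {task}")
    for _ in range(rounds):
        # The same model critiques its own draft...
        critique = llm(f"Critique this answer to the task '{task}':\n{draft}")
        # ...and then revises using that critique as feedback.
        draft = llm(
            f"Task: {task}\nDraft: {draft}\nCritique: {critique}\n"
            "Rewrite the draft to address the critique."
        )
    return draft
```

Each round adds two calls, so one round costs three calls in total.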

Safety & oversight

Human Approval Gate

A human approval gate pauses an automated workflow at a defined checkpoint so a person can review, edit or reject a proposed action before it executes — especially for high-impact, irreversible or regulated operations. It is the operational form of human-in-the-loop oversight.
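
The checkpoint can be sketched as a wrapper around the execution step. `Decision`, `gated_execute`, `needs_approval`, and `ask_human` are hypothetical names; in practice `ask_human` would block on a review queue or UI rather than return immediately.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

Action = Dict[str, Any]


@dataclass
class Decision:
    verdict: str                           # "approve", "edit", or "reject"
    edited_action: Optional[Action] = None


def gated_execute(action: Action,
                  needs_approval: Callable[[Action], bool],
                  ask_human: Callable[[Action], Decision],
                  execute: Callable[[Action], Any]) -> Any:
    if needs_approval(action):             # only high-impact actions pause for review
        decision = ask_human(action)
        if decision.verdict == "reject":
            return None                    # nothing is executed
        if decision.verdict == "edit":
            if decision.edited_action is None:
                raise ValueError("an 'edit' decision must include the edited action")
            action = decision.edited_action
        elif decision.verdict != "approve":
            raise ValueError(f"unknown verdict: {decision.verdict}")
    return execute(action)
```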

Safety & oversight

Human Escalation

Hand the whole task to a human when the agent detects it is out of its depth — low confidence, repeated failure, ambiguity, or sensitive situations — and pass full context so the person can take over without re-investigating. Unlike an approval gate, which pauses one action for sign-off, escalation transfers ownership so the agent stops driving. The hard part is calibrating triggers to avoid both over- and under-escalation.
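
A rough sketch of trigger checks and a handoff packet is given below. The thresholds, field names, and the `AgentState`, `escalation_reason`, and `build_handoff` names are all assumptions for illustration; ambiguity detection is omitted, and real triggers need calibration against observed outcomes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class AgentState:
    task: str
    confidence: float
    failures: int = 0
    sensitive: bool = False
    history: List[str] = field(default_factory=list)


def escalation_reason(state: AgentState,
                      min_confidence: float = 0.6,
                      max_failures: int = 3) -> Optional[str]:
    """Return why the agent should escalate, or None to keep going."""
    if state.sensitive:
        return "sensitive situation"
    if state.failures >= max_failures:
        return "repeated failure"
    if state.confidence < min_confidence:
        return "low confidence"
    return None


def build_handoff(state: AgentState, reason: str) -> Dict[str, Any]:
    # Pass full context so the person does not have to re-investigate.
    return {
        "task": state.task,
        "reason": reason,
        "history": list(state.history),
        "owner": "human",   # ownership transfers; the agent stops driving
    }
```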

Retrieval & knowledge

Long-Term Memory

Give an agent persistent memory across sessions so it remembers facts, user preferences, and prior outcomes beyond a single context window. A write path decides what to store, summarizes it, and deduplicates it; a read path retrieves only the relevant memories into context when needed. Unlike semantic caching, which caches whole answers to skip recomputation, long-term memory stores durable facts and state and recomposes them into fresh reasoning each time.
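
The write and read paths can be illustrated with a toy store. This sketch makes several simplifying assumptions: it keeps facts in memory rather than on disk, skips summarization, deduplicates by normalized text, and ranks by word overlap where a real system would more likely use embeddings. `MemoryStore` and its methods are hypothetical names.

```python
from typing import List


def _normalize(text: str) -> str:
    return " ".join(text.lower().split())


class MemoryStore:
    def __init__(self) -> None:
        self._facts: List[str] = []

    def write(self, fact: str) -> bool:
        """Write path: store a fact unless an equivalent one already exists."""
        if any(_normalize(fact) == _normalize(f) for f in self._facts):
            return False
        self._facts.append(fact)
        return True

    def read(self, query: str, k: int = 3) -> List[str]:
        """Read path: return up to k facts that share words with the query."""
        query_words = set(_normalize(query).split())
        scored = [(len(query_words & set(_normalize(f).split())), f) for f in self._facts]
        relevant = [(s, f) for s, f in scored if s > 0]   # skip unrelated memories
        relevant.sort(key=lambda pair: pair[0], reverse=True)
        return [f for _, f in relevant[:k]]
```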

Cost & performance

Context Compression

Context compression reduces the tokens fed to a model on each call while preserving the information it actually needs to act. Use it on long-running agents and long conversations to cut cost and latency and to stay inside the context window. The three levers are summarizing history, pruning irrelevant context, and compressing prompts. The central risk is that compression is lossy: it can drop the one detail that mattered. Measure information retained, not just tokens saved.
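
The history-summarizing lever, together with a crude retention check, can be sketched as follows. `compress_history`, `retained_fraction`, and `keep_last` are hypothetical names, `summarize` stands in for an LLM summarization call, and substring matching is only a rough proxy for "information retained".

```python
from typing import Callable, List


def compress_history(messages: List[str],
                     summarize: Callable[[List[str]], str],
                     keep_last: int = 4) -> List[str]:
    """Summarize older messages and keep the most recent ones verbatim."""
    if len(messages) <= keep_last:
        return list(messages)
    split = len(messages) - keep_last
    older, recent = messages[:split], messages[split:]
    return [f"Summary of earlier conversation: {summarize(older)}"] + recent


def retained_fraction(required_facts: List[str], context: List[str]) -> float:
    """Fraction of must-keep facts that still appear in the compressed context."""
    if not required_facts:
        return 1.0
    text = " ".join(context).lower()
    kept = [fact for fact in required_facts if fact.lower() in text]
    return len(kept) / len(required_facts)
```

Tracking the retained fraction alongside token counts makes it visible when a more aggressive setting starts dropping facts the agent needs.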

Cost & performance

Semantic Caching

Semantic caching stores past model responses and reuses them when a new request is semantically similar to a previous one, matching by meaning via embeddings rather than by exact text. It cuts cost and latency for the repetitive or near-duplicate queries that are common in production.
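
A minimal sketch of embedding-based lookup is shown below. `SemanticCache`, `cosine`, and the `threshold` default of 0.9 are assumptions for illustration; `embed` stands in for whatever embedding model is in use, and the linear scan would typically be replaced by a vector index at scale.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, embed: Callable[[str], Sequence[float]], threshold: float = 0.9) -> None:
        self.embed = embed
        self.threshold = threshold
        self._entries: List[Tuple[List[float], str]] = []

    def put(self, query: str, response: str) -> None:
        self._entries.append((list(self.embed(query)), response))

    def get(self, query: str) -> Optional[str]:
        """Return the cached response for the most similar past query, if close enough."""
        vector = self.embed(query)
        best_score, best_response = 0.0, None
        for cached_vector, response in self._entries:
            similarity = cosine(vector, cached_vector)
            if similarity > best_score:
                best_score, best_response = similarity, response
        return best_response if best_score >= self.threshold else None
```

The threshold is the main tuning knob: set it too low and users get answers to questions they did not ask; set it too high and the cache rarely hits.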