Enterprise AI Patterns Library
Reusable design patterns for building AI and agentic systems — each a self-contained, citable unit with the problem it solves, when to use it, how it works, benefits, risks and when not to use it. Built for people and AI agents.
Orchestration
Goal Decomposition
Goal decomposition has an agent break a high-level goal into an ordered set of smaller, tractable sub-tasks — a plan — before acting, then execute and monitor that plan, re-planning when steps fail. The explicit plan becomes an inspectable artifact you can review, gate, and debug. Use it when a goal needs several dependent steps and reactive, step-at-a-time agents drift or stall; skip it for simple, single-shot tasks.
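The plan, execute, monitor, re-plan loop can be sketched roughly as below. This is a minimal illustration, not a reference implementation: `run_plan`, `toy_planner`, `flaky_executor`, and the `max_replans` bound are all hypothetical names, and the planner and executor are plain functions standing in for LLM or tool calls.

```python
from typing import Callable, List

def run_plan(goal: str,
             plan_fn: Callable[[str, List[str]], List[str]],
             execute_fn: Callable[[str], bool],
             max_replans: int = 2) -> List[str]:
    """Plan first, then execute step by step; re-plan when a step fails."""
    done: List[str] = []
    plan = plan_fn(goal, done)        # explicit plan: can be logged, reviewed, gated
    replans = 0
    while plan:
        step = plan.pop(0)
        if execute_fn(step):
            done.append(step)
            continue
        if replans >= max_replans:    # bound re-planning so the loop cannot run forever
            raise RuntimeError(f"step still failing after {replans} re-plans: {step}")
        replans += 1
        plan = plan_fn(goal, done)    # re-plan from the current state
    return done

# Hypothetical stand-ins for an LLM planner and a tool-calling executor.
STEPS = ["gather data", "analyze", "write report"]

def toy_planner(goal: str, done: List[str]) -> List[str]:
    return [s for s in STEPS if s not in done]

failures_left = {"analyze": 1}        # "analyze" fails once, then succeeds

def flaky_executor(step: str) -> bool:
    if failures_left.get(step, 0) > 0:
        failures_left[step] -= 1
        return False
    return True

completed = run_plan("quarterly report", toy_planner, flaky_executor)
```

Here the single failure of "analyze" triggers one re-plan, and the run still finishes all three steps in order.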
Orchestrator-Workers
An orchestrator LLM dynamically breaks a task into subtasks, delegates each to a worker LLM, and synthesizes the results. Unlike fixed parallelization, the orchestrator decides the subtasks at runtime — making it suited to complex tasks whose decomposition is not known in advance.
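A rough sketch of the shape, assuming three pluggable callables (`plan`, `worker`, `synthesize`) that would each be an LLM call in practice. All names here are hypothetical, and the toy planner simply splits the task text to show that the subtask list is produced at runtime.

```python
from typing import Callable, List

def orchestrate(task: str,
                plan: Callable[[str], List[str]],
                worker: Callable[[str], str],
                synthesize: Callable[[str, List[str]], str]) -> str:
    subtasks = plan(task)                      # decided at runtime, not hard-coded
    results = [worker(s) for s in subtasks]    # workers could also run concurrently
    return synthesize(task, results)

# Hypothetical stand-ins for the orchestrator and worker LLM calls.
def toy_plan(task: str) -> List[str]:
    return [part.strip() for part in task.split(" and ")]

def toy_worker(subtask: str) -> str:
    return f"done: {subtask}"

def toy_synthesize(task: str, results: List[str]) -> str:
    return "; ".join(results)

answer = orchestrate("update the parser and fix the docs",
                     toy_plan, toy_worker, toy_synthesize)
```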
Parallelization
Parallelization runs multiple LLM calls at the same time and aggregates the results. Two flavors: sectioning (split a task into independent subtasks run in parallel) and voting (run the same task several times to improve reliability or coverage). It cuts latency and can raise quality.
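Both flavors can be sketched with the standard library's thread pool. The helper names (`run_sections`, `run_vote`) and the toy model functions are assumptions for illustration; a real version would call an LLM client inside `call`.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

def run_sections(call: Callable[[str], str], subtasks: List[str]) -> List[str]:
    """Sectioning: independent subtasks run concurrently; results keep input order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(call, subtasks))

def run_vote(call: Callable[[str, int], str], task: str, n: int = 5) -> str:
    """Voting: run the same task n times and return the most common answer."""
    with ThreadPoolExecutor() as pool:
        answers = list(pool.map(lambda i: call(task, i), range(n)))
    return Counter(answers).most_common(1)[0][0]

# Hypothetical stand-ins for LLM calls.
def toy_summarize(section: str) -> str:
    return section.upper()

def toy_classify(task: str, attempt: int) -> str:
    return "unsafe" if attempt == 3 else "safe"   # one dissenting run out of five

summaries = run_sections(toy_summarize, ["intro", "methods"])
verdict = run_vote(toy_classify, "is this snippet safe?")
```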
Prompt Chaining
Prompt chaining decomposes a task into a fixed sequence of LLM calls, where each step works on the output of the previous one. It trades a little latency for much higher accuracy and control, and is the simplest workflow pattern: use it whenever a task cleanly splits into ordered subtasks.
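A minimal sketch of a fixed chain, assuming each step is a function from text to text. The optional `gate` check between steps is an added assumption (one way to get the control the pattern promises), and `outline`, `draft`, and `polish` are hypothetical placeholders for LLM calls.

```python
from typing import Callable, List, Optional

def run_chain(text: str,
              steps: List[Callable[[str], str]],
              gate: Optional[Callable[[str], bool]] = None) -> str:
    for step in steps:
        text = step(text)                       # each step consumes the previous output
        if gate is not None and not gate(text):
            raise ValueError(f"gate rejected output of {step.__name__}")
    return text

# Hypothetical stand-ins for three LLM calls.
def outline(topic: str) -> str:
    return f"outline({topic})"

def draft(outline_text: str) -> str:
    return f"draft({outline_text})"

def polish(draft_text: str) -> str:
    return f"polish({draft_text})"

result = run_chain("caching", [outline, draft, polish], gate=lambda s: len(s) > 0)
```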
Routing
Routing classifies an input and directs it to the most appropriate specialized handler, prompt or model. It improves quality by letting each path be optimized for its case, and controls cost by sending easy requests to cheap models and hard ones to capable models.
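As a sketch, routing is a classifier followed by a dispatch table. The keyword classifier below is a deliberately crude stand-in for an LLM or trained classifier, and the handler labels and return strings are hypothetical.

```python
from typing import Callable, Dict, Tuple

def route(query: str,
          classify: Callable[[str], str],
          handlers: Dict[str, Callable[[str], str]],
          default: str = "general") -> Tuple[str, str]:
    label = classify(query)
    handler = handlers.get(label, handlers[default])   # unknown labels use the default path
    return label, handler(query)

# Hypothetical classifier: keyword matching instead of a model call.
def toy_classify(query: str) -> str:
    q = query.lower()
    if "invoice" in q or "refund" in q:
        return "billing"
    if "traceback" in q or "error" in q:
        return "technical"
    return "general"

handlers = {
    "billing": lambda q: "billing prompt, small model",
    "technical": lambda q: "debugging prompt, capable model",
    "general": lambda q: "generic prompt, small model",
}

label, reply = route("I got an error after upgrading", toy_classify, handlers)
```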
Supervisor Agent
A supervisor agent is a persistent coordinator that manages a team of specialized sub-agents. It reads the conversation state, decides which specialist should act next, routes messages to it, and integrates returned results toward the goal. Unlike a one-shot decomposer, the supervisor stays in the loop across many turns, delegating by capability and re-planning until the task is done or handed back to the user.
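The supervisor loop might look something like the following sketch. It assumes shared state is a plain dict and that `choose` (normally an LLM decision) returns the name of the next specialist, or `None` when the task is done; `supervise`, `toy_choose`, and the two specialists are hypothetical.

```python
from typing import Callable, Dict, Optional

State = Dict[str, str]

def supervise(state: State,
              choose: Callable[[State], Optional[str]],
              specialists: Dict[str, Callable[[State], State]],
              max_turns: int = 10) -> State:
    for _ in range(max_turns):              # bounded so the loop always terminates
        name = choose(state)                # read the state, pick the next specialist
        if name is None:                    # done, or ready to hand back to the user
            break
        state = specialists[name](state)    # delegate and integrate the result
    return state

# Hypothetical stand-ins for the supervisor's decision and two sub-agents.
def toy_choose(state: State) -> Optional[str]:
    if "notes" not in state:
        return "researcher"
    if "draft" not in state:
        return "writer"
    return None

specialists = {
    "researcher": lambda s: {**s, "notes": f"notes on {s['goal']}"},
    "writer": lambda s: {**s, "draft": f"draft from {s['notes']}"},
}

final_state = supervise({"goal": "cache design"}, toy_choose, specialists)
```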
Task Prioritization
Order an agent's candidate tasks by value, urgency, dependencies, and cost instead of processing them first-in-first-out. A scoring function and a priority queue decide what runs next, so limited compute, budget, and time go to the work that matters most. Re-score as state changes, and bound the queue so it cannot grow without limit.
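One possible scoring scheme, shown as a sketch: the formula `value * urgency / cost` and the field names are assumptions chosen for illustration, not a prescribed scoring function. Dependencies are handled by only considering tasks whose prerequisites are done, and `limit` bounds how many tasks are selected.

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Task:
    name: str
    value: float
    urgency: float
    cost: float
    deps: Set[str] = field(default_factory=set)

def score(t: Task) -> float:
    # Hypothetical scoring: favor valuable, urgent, cheap work.
    return t.value * t.urgency / max(t.cost, 1e-9)

def next_tasks(tasks: List[Task], done: Set[str], limit: int = 10) -> List[Task]:
    """Return up to `limit` ready tasks, highest score first."""
    ready = [t for t in tasks if t.name not in done and t.deps <= done]
    return heapq.nlargest(limit, ready, key=score)

tasks = [
    Task("A", value=5, urgency=1, cost=1),
    Task("B", value=10, urgency=2, cost=1, deps={"A"}),
    Task("C", value=3, urgency=1, cost=3),
]

first = [t.name for t in next_tasks(tasks, done=set())]
after_a = [t.name for t in next_tasks(tasks, done={"A"})]
```

Calling `next_tasks` again after each completion is the re-scoring step: B scores highest but only becomes eligible once A is done.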
Reliability
Evaluator-Optimizer
One LLM generates a response while a second LLM evaluates it against criteria and returns feedback; the generator revises and the loop repeats until the evaluation passes. It raises quality on tasks with clear evaluation criteria, at the cost of extra calls.
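The generate, evaluate, revise loop can be sketched as follows. The `max_rounds` cap is an added assumption to keep the extra calls bounded, and `toy_generate` and `toy_evaluate` are hypothetical stand-ins for the two LLM roles.

```python
from typing import Callable, Optional, Tuple

def evaluate_optimize(task: str,
                      generate: Callable[[str, Optional[str]], str],
                      evaluate: Callable[[str], Tuple[bool, Optional[str]]],
                      max_rounds: int = 3) -> Tuple[str, int]:
    """Return the final draft and the number of rounds used."""
    feedback: Optional[str] = None
    draft = ""
    for round_no in range(1, max_rounds + 1):
        draft = generate(task, feedback)     # generator sees the previous feedback
        passed, feedback = evaluate(draft)   # evaluator checks against the criteria
        if passed:
            return draft, round_no
    return draft, max_rounds                 # best effort after the cap

# Hypothetical stand-ins for the generator and evaluator LLMs.
def toy_generate(task: str, feedback: Optional[str]) -> str:
    base = "Summary: caching cuts cost"
    return base if feedback is None else base + "."

def toy_evaluate(draft: str) -> Tuple[bool, Optional[str]]:
    ok = draft.endswith(".")
    return ok, (None if ok else "End with a period.")

final_draft, rounds = evaluate_optimize("summarize", toy_generate, toy_evaluate)
```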
Recovery Strategy
Give the agent an explicit plan for when things break. Detect failures by validating outputs and catching tool errors; then retry with adjustment, fall back to an alternative path, roll back partial actions, or escalate. Bound retries to avoid runaway loops and cost, make actions idempotent, and distinguish transient from permanent failures. The goal is graceful degradation instead of crashes or silently wrong results.
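A small sketch of bounded retries with a fallback path. The two exception classes are hypothetical markers for the transient versus permanent distinction, and the exponential backoff is one common form of "retry with adjustment" rather than the only one. Rollback and escalation are left out to keep the example short.

```python
import time
from typing import Callable

class TransientError(Exception):
    """Hypothetical marker: worth retrying (timeouts, rate limits)."""

class PermanentError(Exception):
    """Hypothetical marker: retrying will not help (bad input, auth failure)."""

def with_recovery(primary: Callable[[], str],
                  fallback: Callable[[], str],
                  max_retries: int = 3,
                  base_delay: float = 0.0) -> str:
    # `primary` should be idempotent, since it may run more than once.
    for attempt in range(max_retries):            # bounded: no runaway loops
        try:
            return primary()
        except TransientError:
            time.sleep(base_delay * 2 ** attempt)  # back off, then retry
        except PermanentError:
            break                                  # do not waste retries
    return fallback()                              # degrade gracefully

calls = {"n": 0}

def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("timeout")
    return "primary ok"

def broken() -> str:
    raise PermanentError("bad credentials")

recovered = with_recovery(flaky, lambda: "fallback")
degraded = with_recovery(broken, lambda: "fallback")
```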
Reflection
Reflection has a model critique its own output and then revise it, using the critique as feedback. It is a lightweight, single-model way to catch mistakes and improve quality on reasoning, coding and writing tasks — at the cost of extra calls.
Safety & oversight
Human Approval Gate
A human approval gate pauses an automated workflow at a defined checkpoint so a person can review, edit or reject a proposed action before it executes — especially for high-impact, irreversible or regulated operations. It is the operational form of human-in-the-loop oversight.
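In code, the gate is a checkpoint that sits between proposing an action and executing it. The sketch below models the reviewer as a callable for brevity; in a real workflow the pause would typically be a persisted task awaiting a person. `Decision`, `gated_execute`, and the refund example are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Decision:
    verdict: str                          # "approve", "edit", or "reject"
    edited_action: Optional[dict] = None

def gated_execute(action: dict,
                  reviewer: Callable[[dict], Decision],
                  execute: Callable[[dict], str]) -> Optional[str]:
    decision = reviewer(action)           # workflow pauses here for the human
    if decision.verdict == "approve":
        return execute(action)
    if decision.verdict == "edit" and decision.edited_action is not None:
        return execute(decision.edited_action)
    return None                           # rejected: nothing executes

# Hypothetical reviewer policy standing in for a person's judgment.
def toy_reviewer(action: dict) -> Decision:
    if action["amount"] > 100:
        return Decision("edit", {**action, "amount": 100})
    return Decision("approve")

def execute(action: dict) -> str:
    return f"refunded {action['amount']}"

outcome = gated_execute({"op": "refund", "amount": 500}, toy_reviewer, execute)
rejected = gated_execute({"op": "refund", "amount": 500},
                         lambda a: Decision("reject"), execute)
```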
Human Escalation
Hand the whole task to a human when the agent detects it is out of its depth — low confidence, repeated failure, ambiguity, or sensitive situations — and pass full context so the person can take over without re-investigating. Unlike an approval gate, which pauses one action for sign-off, escalation transfers ownership so the agent stops driving. The hard part is calibrating triggers to avoid both over- and under-escalation.
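The trigger check and the context handoff could be sketched like this. The thresholds (`min_confidence=0.6`, `max_failures=3`) and all field names are illustrative assumptions; calibrating them is exactly the hard part noted above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class AgentState:
    task: str
    confidence: float
    failures: int = 0
    sensitive: bool = False
    transcript: List[str] = field(default_factory=list)

def escalation_reason(s: AgentState,
                      min_confidence: float = 0.6,
                      max_failures: int = 3) -> Optional[str]:
    """Return why the agent should hand off, or None to keep going."""
    if s.sensitive:
        return "sensitive situation"
    if s.failures >= max_failures:
        return "repeated failure"
    if s.confidence < min_confidence:
        return "low confidence"
    return None

def handoff(s: AgentState, reason: str) -> dict:
    # Pass full context so the person does not have to re-investigate.
    return {"task": s.task, "reason": reason,
            "transcript": list(s.transcript), "owner": "human"}

state = AgentState("dispute a charge", confidence=0.4,
                   transcript=["user: I was charged twice"])
reason = escalation_reason(state)
ticket = handoff(state, reason) if reason else None
```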
Cost & performance
Context Compression
Context compression reduces the tokens fed to a model on each call while preserving the information it actually needs to act. Use it on long-running agents and long conversations to cut cost and latency and to stay inside the context window. The three levers are summarizing history, pruning irrelevant context, and compressing prompts. The central risk is that compression is lossy: it can drop the one detail that mattered. Measure information retained, not just tokens saved.
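The first lever, summarizing history, can be sketched as below: older turns are collapsed into one summary message and the most recent turns are kept verbatim. `compress_history`, `keep_recent`, and the truncating `toy_summarize` are hypothetical; a real summarizer would be a model call, which is where the lossy risk enters.

```python
from typing import Callable, List

def compress_history(messages: List[str],
                     summarize: Callable[[List[str]], str],
                     keep_recent: int = 4) -> List[str]:
    split = max(len(messages) - keep_recent, 0)
    old, recent = messages[:split], messages[split:]
    if not old:
        return list(recent)                     # nothing old enough to compress
    return [f"[summary] {summarize(old)}"] + recent

# Hypothetical summarizer: keeps the first three words of each old message.
def toy_summarize(old: List[str]) -> str:
    return " | ".join(" ".join(m.split()[:3]) for m in old)

history = [f"turn {i}: some long message body here" for i in range(1, 8)]
compressed = compress_history(history, toy_summarize, keep_recent=2)
```

Seven turns become three messages: one summary plus the last two turns unchanged.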
Semantic Caching
Semantic caching stores past model responses and reuses them when a new request is semantically similar to a previous one — matching by meaning via embeddings, not exact text. It cuts cost and latency for repetitive or near-duplicate queries common in production.
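A toy sketch of the lookup logic. The bag-of-words `embed` function is only a stand-in so the example runs without a model; a real cache would use an embedding model and usually a vector index. The `threshold=0.8` cutoff and the class and method names are assumptions.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple

def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: word counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold
        self.entries: List[Tuple[Counter, str]] = []

    def put(self, query: str, response: str) -> None:
        self.entries.append((embed(query), response))

    def get(self, query: str) -> Optional[str]:
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best is not None and cosine(q, best[0]) >= self.threshold:
            return best[1]          # similar enough: reuse the stored response
        return None                 # miss: caller should query the model

cache = SemanticCache()
cache.put("how do i reset my password", "Use the reset link on the login page.")
hit = cache.get("how do i reset my password please")
miss = cache.get("what is the weather today")
```

The threshold is the key tuning knob: set it too low and the cache returns answers to questions that only look alike.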