{
  "id": "HRN-004",
  "slug": "harness-engineering-principles",
  "title": "Harness Engineering Principles",
  "category": "Foundations",
  "status": "Draft",
  "summary": "The core engineering principles of the harness — reliability over capability, determinism boundaries, observability-first, evidence-first, defense in depth, least authority, graceful degradation, and idempotent actuation — that hold across every component.",
  "updated": "2026-06-21T00:00:00Z",
  "url": "https://santismm.com/en/handbook/harness-engineering-principles",
  "evidence": {
    "evidenceLevel": "theoretical",
    "confidenceLevel": "medium",
    "sourceType": [
      "industry_observation",
      "personal_experience"
    ]
  },
  "related": [
    "HRN-001",
    "HRN-003"
  ],
  "tags": [
    "principles",
    "harness-engineering",
    "foundations",
    "reliability",
    "design"
  ],
  "headings": [
    "Executive Summary",
    "Key Concepts",
    "Definition",
    "Architecture Diagram",
    "Detailed Explanation",
    "Observed Failure Modes",
    "Cost Metrics",
    "Scaling Characteristics",
    "Related Content",
    "References",
    "FAQs"
  ],
  "markdown": "# Harness Engineering Principles\n\n## Executive Summary\nComponents answer *what* a harness contains; principles answer *how* to build each one well. This chapter states the cross-cutting engineering principles of Harness Engineering — the rules that hold whether you are designing memory, orchestration, or a tool contract. They are opinionated by design: a principle that bends to every situation is not a principle.\n\n## Key Concepts\n- **Principle:** A durable design rule that guides decisions across components.\n- **Determinism boundary:** The explicit line between model-decided and code-decided behavior.\n- **Evidence-first:** No claim of quality without measurement.\n- **Defense in depth:** Multiple independent layers so no single failure is catastrophic.\n- **Least authority:** Each component gets the minimum permission needed.\n- **Graceful degradation:** The system fails into a safe, reduced mode rather than collapsing.\n\n## Definition\nThe **Harness Engineering Principles** are a set of cross-cutting design rules that govern how the components of a harness are built and composed so that the resulting agentic system is reliable, observable, governable, and secure. They are the discipline's equivalent of the SOLID principles or the twelve-factor app — not a framework, but a stance.\n\n## Architecture Diagram\n```mermaid\nflowchart LR\n  subgraph Principles\n    P1[Reliability over Capability]\n    P2[Determinism Boundaries]\n    P3[Observability-First]\n    P4[Evidence-First]\n    P5[Defense in Depth]\n    P6[Least Authority]\n    P7[Graceful Degradation]\n    P8[Idempotent Actuation]\n  end\n  P1 --> SYS[(Dependable Agentic System)]\n  P2 --> SYS\n  P3 --> SYS\n  P4 --> SYS\n  P5 --> SYS\n  P6 --> SYS\n  P7 --> SYS\n  P8 --> SYS\n```\n\n## Detailed Explanation\n\n### 1. Reliability over capability\nThe harness optimizes for the *floor* of behavior, not the ceiling. 
A system that is brilliant 95% of the time and catastrophic 5% of the time is, in an enterprise, a liability — the 5% is what makes the news and the audit. Prefer a narrower scope executed dependably to a broad scope executed erratically. Capability is the model's contribution; reliability is the harness's, and it is the one the enterprise is paying for.\n\n### 2. Determinism boundaries\nDecide explicitly what the model is allowed to decide. Everything that *can* be deterministic *should* be: schema validation, routing, permission checks, retries, and post-conditions belong in code, not in a prompt. The model is reserved for the genuinely open-ended reasoning that only it can do. Drawing this boundary tightly is the single highest-leverage move in harness design — it shrinks the surface over which non-determinism can cause harm.\n\n### 3. Observability-first\nInstrument before you optimize. You cannot debug, evaluate, or trust a non-deterministic multi-step system you cannot see. Every model call, tool invocation, and decision should be a structured, traceable, replayable span *before* the feature is considered complete (HRN-006). Observability is not a phase-two add-on; it is a precondition for every other principle, because each of them depends on measurement.\n\n### 4. Evidence-first\nNo quality claim ships without measurement. \"It seems better\" is not an engineering statement. Changes are gated by evaluation against golden sets and regression suites (HRN-007), and every consequential claim carries its provenance (the evidence model this very knowledge base uses). Evidence-first is what converts agent development from craft to engineering.\n\n### 5. Defense in depth\nAssume any single layer will fail — the model will hallucinate, a tool will return garbage, a user will inject a malicious prompt — and ensure no single failure is catastrophic. Layer independent controls: input validation *and* output validation *and* permission gates *and* monitoring. 
The model is an untrusted component; treat its output as you would treat unvalidated user input (HRN-011).\n\n### 6. Least authority\nEvery component and tool receives the minimum authority required for its job and no more. Read-only by default; write access scoped and gated; destructive actions behind human approval (PAT-001-class controls). The blast radius of a compromised or confused agent is bounded by the authority you granted it — so grant little.\n\n### 7. Graceful degradation\nWhen something fails, fail *into* a safe, reduced mode — escalate to a human, return a conservative answer, or decline — rather than crashing or, worse, taking a confident wrong action. The harness must have well-defined behavior for impasse, budget exhaustion, tool outage, and low confidence. A system that does not know how to give up safely is not production-ready.\n\n### 8. Idempotent and reversible actuation\nBecause the loop is stochastic and may retry, actions on the world should be idempotent where possible and reversible where not. A retried tool call must not double-charge a customer; a write should be safe to repeat; high-impact actions should be staged, confirmable, and rollback-capable. This principle is what makes retries — essential for reliability — safe.\n\n### Tensions between principles\nThe principles are not always aligned. Reliability-over-capability constrains what the model is allowed to attempt; observability-first adds latency and cost; least-authority slows development. Good harness engineering is the art of resolving these tensions *deliberately* and documenting the trade-off, rather than letting one principle silently win. 
The meta-principle: **make the trade-off explicit and measurable.**\n\n| Principle | Primary risk it mitigates | Main cost it imposes |\n|-----------|---------------------------|----------------------|\n| Reliability over capability | Catastrophic tail behavior | Reduced scope |\n| Determinism boundaries | Unbounded non-determinism | Up-front design effort |\n| Observability-first | Undebuggable runs | Storage, latency |\n| Evidence-first | Silent regressions | Eval infrastructure |\n| Defense in depth | Single-point catastrophe | Redundant controls |\n| Least authority | Large blast radius | Slower iteration |\n| Graceful degradation | Confident wrong actions | Extra fallback paths |\n| Idempotent actuation | Harmful retries | Action design complexity |\n\n## Observed Failure Modes\n- **Principle theater:** Citing the principles in a design doc but not enforcing them in code or CI.\n- **Capability chasing:** Letting an impressive model capability widen scope past what the harness can reliably control.\n- **Optimizing the unseen:** Tuning prompts and chains before observability exists, so \"improvements\" are unmeasured.\n- **All-or-nothing failure:** No degraded mode, so any single component outage takes the whole system down or produces a confident error.\n\n## Cost Metrics\nThe principles trade marginal per-request cost (instrumentation, validation, redundant checks) for large reductions in the cost of failure (incidents, rework, audit findings, reputational damage). The economically correct framing is *expected cost including tail events*, where the principles consistently pay for themselves.\n\n## Scaling Characteristics\nPrinciples compound at scale. Determinism boundaries and least authority bound the failure surface as step count and concurrency grow; observability- and evidence-first keep a growing system debuggable and regression-safe. 
Systems built without the principles tend to degrade super-linearly as they scale, because every new capability adds unbounded, unmeasured, over-privileged surface.\n\n## Related Content\n- HRN-001 — Harness Engineering: Definition and Overview\n- HRN-003 — The Harness Taxonomy\n\n## References\n- Analogy to established software principles (SOLID, twelve-factor, defense in depth) adapted to agentic systems.\n- Industry observation on agentic system reliability practices, 2023–2026.\n- Santa María, S. — Working notes on harness design principles.\n\n## FAQs\n**Q:** Which principle matters most?\n**A:** Observability-first is the practical entry point because every other principle depends on measurement. Determinism boundaries is the highest-leverage design decision. They reinforce each other.\n\n**Q:** Aren't these just general software engineering principles?\n**A:** Several are adapted from classic engineering, which is intentional — agentic systems are still software. But the determinism boundary, evidence-first measurement of a stochastic system, and treating the model as untrusted input are specific to the harness.\n\n**Q:** How do I enforce principles, not just state them?\n**A:** Encode them in CI and runtime: schema validation as code, eval gates on merge, permission checks at the tool boundary, and required tracing. A principle that is not enforced is a wish."
}