{
  "id": "HRN-014",
  "slug": "bibliography",
  "title": "Bibliography",
  "category": "Reference",
  "status": "Draft",
  "summary": "A curated, themed reading list for Harness Engineering (agents and orchestration, memory and retrieval, evaluation, security, governance and standards, protocols, and authorization), covering foundational papers, industry writeups, and regulatory frameworks.",
  "updated": "Sun Jun 21 2026 00:00:00 GMT+0000 (Coordinated Universal Time)",
  "url": "https://santismm.com/en/handbook/bibliography",
  "evidence": {
    "evidenceLevel": "theoretical",
    "confidenceLevel": "high",
    "sourceType": [
      "industry_observation",
      "paper"
    ]
  },
  "related": [
    "HRN-001",
    "GOV-001"
  ],
  "tags": [
    "bibliography",
    "reference",
    "reading-list",
    "standards"
  ],
  "headings": [
    "Executive Summary",
    "Definition",
    "FAQs"
  ],
  "markdown": "# Bibliography\n\n## Executive Summary\n\nThis bibliography is the curated reference list underpinning the Harness Engineering handbook. It is organized by theme so a reader can go deep on any single layer. All entries are published works, standards, or official documentation. Exact identifiers such as DOIs and page numbers are deliberately omitted rather than risk fabricating them; each entry gives its title and venue or source instead. Readers should confirm the current version of any evolving standard.\n\n## Definition\n\nThe following references are grouped by theme. They are the primary sources the handbook draws on and the recommended starting points for further study.\n\n### 1. Foundations of agents and reasoning\n\n- Yao, S. et al. **\"ReAct: Synergizing Reasoning and Acting in Language Models.\"** ICLR. Interleaves reasoning traces with actions, the pattern that underpins tool-using agents.\n- Wei, J. et al. **\"Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.\"** NeurIPS. Foundational work on eliciting step-by-step reasoning.\n- Schick, T. et al. **\"Toolformer: Language Models Can Teach Themselves to Use Tools.\"** NeurIPS. Self-supervised learning of when and how to call external APIs.\n- Shinn, N. et al. **\"Reflexion: Language Agents with Verbal Reinforcement Learning.\"** NeurIPS. The reflection/self-critique loop (cf. PAT-003).\n- Wang, L. et al. **\"A Survey on Large Language Model based Autonomous Agents.\"** Frontiers of Computer Science. A broad survey of the agent design space.\n- Wang, L. et al. **\"Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models.\"** ACL. Plan-then-execute decomposition.\n\n### 2. Orchestration, multi-agent, and durable execution\n\n- Anthropic. **\"Building Effective Agents.\"** Engineering guidance on workflows versus agents and on starting with the simplest (single-agent) design.\n- Anthropic. **\"How we built our multi-agent research system.\"** A practical writeup of supervisor/worker orchestration.\n- LangChain. **LangGraph documentation.** State-machine orchestration for agents.\n- Temporal and other durable-execution engines. **Documentation on workflow durability and the Saga pattern.**\n- Wu, Q. et al. (Microsoft). **\"AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation.\"** Multi-agent conversation framework.\n- Hong, S. et al. **\"MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework.\"** ICLR. Role-based multi-agent collaboration structured by standardized operating procedures.\n\n### 3. Memory and retrieval (RAG)\n\n- Lewis, P. et al. **\"Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.\"** NeurIPS. The canonical RAG paper.\n- Gao, Y. et al. **\"Retrieval-Augmented Generation for Large Language Models: A Survey.\"** Covers naive, advanced, and modular RAG.\n- Packer, C. et al. **\"MemGPT: Towards LLMs as Operating Systems.\"** Memory hierarchy and paging for agents.\n- Asai, A. et al. **\"Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection.\"** ICLR. On-demand retrieval with self-critique via reflection tokens.\n\n### 4. Evaluation\n\n- Liang, P. et al. **\"Holistic Evaluation of Language Models (HELM).\"** Stanford CRFM. Multi-metric evaluation across many scenarios.\n- Zheng, L. et al. **\"Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena.\"** NeurIPS. The LLM-as-judge methodology and its biases (position, verbosity, self-enhancement).\n- Es, S. et al. **\"RAGAS: Automated Evaluation of Retrieval Augmented Generation.\"** Reference-free metrics for faithfulness, answer relevance, and context relevance.\n- Liu, Y. et al. **\"G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment.\"** EMNLP. LLM-based scoring with chain-of-thought evaluation steps.\n- **SWE-bench** and **GAIA**: agentic capability benchmarks for software engineering and general-assistant tasks, respectively.\n\n### 5. Security for agentic systems\n\n- OWASP. **\"OWASP Top 10 for Large Language Model Applications.\"** Including Prompt Injection (LLM01) and Sensitive Information Disclosure (LLM06 in the 2023 list, LLM02 in the 2025 list).\n- Willison, S. **\"Prompt injection\"** and **\"The lethal trifecta for AI agents\"** (blog essays). The clearest articulation of the architectural injection/exfiltration problem.\n- Greshake, K. et al. **\"Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection.\"** ACM AISec. Demonstrates injection delivered through retrieved content rather than the user's prompt.\n- MITRE. **ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems).** A knowledge base of adversary tactics and techniques against AI-enabled systems.\n- NIST. **SP 800-53** (security and privacy controls, including least privilege and identity management), applied here to AI systems.\n\n### 6. Governance, risk, and regulatory standards\n\n- NIST. **AI Risk Management Framework (AI RMF 1.0, NIST AI 100-1)** and its **Generative AI Profile (NIST AI 600-1).**\n- ISO/IEC. **42001:2023 — Artificial intelligence — Management system.**\n- ISO/IEC. **23894:2023 — AI — Guidance on risk management.**\n- European Union. **Regulation (EU) 2024/1689, the EU AI Act.** Risk-tiered obligations for AI systems.\n- OECD. **OECD AI Principles.**\n- US White House / OMB. **Executive orders and OMB memoranda on trustworthy AI** (context for public-sector expectations).\n- See also GOV-001 (Enterprise AI Governance Framework) and GOV-005 (Agent Governance Controls Checklist) in this corpus.\n\n### 7. Protocols, interoperability, and the discovery layer\n\n- Anthropic. **Model Context Protocol (MCP) specification.** A standardized interface between models and external tools and data.\n- Howard, J. (Answer.AI). **llms.txt proposal.** A site-level convention (a Markdown file at `/llms.txt`) that gives LLMs and agents a curated index of a site's content.\n- **JSON-LD / schema.org.** Structured data for machine discovery.\n- **OpenAPI Specification.** Typed contracts for tools exposed as APIs.\n\n### 8. Authorization and policy enforcement (adapted from systems engineering)\n\n- OASIS. **eXtensible Access Control Markup Language (XACML).** The PEP/PDP (policy enforcement point / policy decision point) authorization model.\n- **Open Policy Agent (OPA) / Rego.** A policy-as-code engine and its policy language.\n- Saltzer, J. & Schroeder, M. **\"The Protection of Information in Computer Systems.\"** Proceedings of the IEEE. The classic statement of the least-privilege principle.\n\n## FAQs\n\n**Q: Why are some entries missing DOIs or exact dates?**\nA: To avoid fabricating identifiers. Each entry gives its title and venue or source so the work can be located unambiguously; confirm the current version, especially for evolving standards (EU AI Act, NIST AI RMF, MCP, OWASP).\n\n**Q: How does this relate to GOV-001?**\nA: GOV-001 operationalizes the security, governance, and regulatory sources in sections 5 and 6 into an enterprise framework; this bibliography is the underlying reading list.\n\n**Q: Where should a newcomer start?**\nA: Section 1 (ReAct, Reflexion) for how agents work, section 2 (Building Effective Agents) for how to engineer them reliably, and sections 5–6 for security and governance."
}