Where Prompt Engineering Fits

Prompt is one input to the Harness, not architecture. It is 1/10 of agent engineering, not the center. This is the judgment that sets ADPS apart from the prompt-centric school.

1 · The current state · prompt engineering has been inflated into the whole of agent engineering

Across 2023 and 2024, the industry underwent a collective shift in perception—treating prompt engineering as the center of agent engineering.

The evidence is scattered everywhere. The moment OpenAI released its prompt engineering guide, hundreds of "prompt template collection" GitHub repos sprang up, the top few of them gathering tens of thousands of stars. Prompt engineering was briefly listed among the high-frequency skills on LinkedIn job postings; by mid-2023 some companies were even advertising "prompt engineer" roles paying upwards of 300,000 US dollars—a title that then quietly disappeared from most job listings in 2025. Prompt Engineering for Developers, The Art of Prompt Engineering, ChatGPT Prompting Mastery lined the bookshelves. The prompt engineering short courses put out by Geekbang, Coursera, and DeepLearning.AI add up to more than fifty. This was a cognitive flood.

By the second half of 2024, the industry reflected systematically for the first time—this road had been taken too far. Anthropic's Claude Code team publicly rejected the "giant system prompt" route in early 2025, turning instead to a tool registry + permission pipeline + hook system. A line from Karan Prasad's April 2026 reverse-engineering analysis of Claude Code's 512K lines of code has been quoted repeatedly—"Claude Code's moat is not the prompt, it is the harness." Around the same time, LangChain founder Harrison Chase said at LangChain Interrupt 2025 that "prompt engineering is a small piece of the puzzle." But the industry's engineering education has not caught up. Open up a mainstream agent tutorial and more than half of it is still about how to write prompts.

This is the moment for ADPS to take a position—prompt engineering is 1/10 of agent engineering, not the center, and certainly not architecture.

2 · The ADPS position

Prompt is one input to the Harness, not architecture.

Unpack that sentence—

The prompt determines how the model responds in a single given call; the Harness determines under what conditions the model is called, what context it is handed, what tools it is permitted to call, when it stops, what gets left as a trace, and how to roll back when something goes wrong. The prompt is the shape of a single call; the Harness is the boundary of system behavior.

These two things are not on the same order of magnitude.

The fourth ADPS principle states that "the Harness is the engineering substrate of the agent"—and within this framework prompt engineering occupies a clear position: it is the craft of the system prompt layer inside the Harness, the local triggering mechanism for Reasoning and Action among the seven cognitive functions in certain scenarios. It has its own legitimate territory, but stepping beyond that territory is misuse.

Let us lay out the legitimate and the illegitimate, each in turn.

3 · The legitimate territory of prompt engineering

The following five scenarios are where prompt engineering genuinely delivers, and where no better engineering substitute exists.

Role definition in the system prompt. Giving the model a clear role—"you are a financial compliance review assistant, and your output must conform to the section format of MAS 670"—is one of the core jobs of prompt engineering, with no architectural substitute. Claude Code's system prompt contains long passages of role and behavioral constraints, DeerFlow's Researcher Agent has its role prompt, and the first paragraph of OpenCode's AGENTS.md is likewise a role definition. Role is the work of the prompt, not the work of the Harness.

Few-shot example design. Giving the model two to five input-output example pairs so it can learn implicit rules from the examples—this is extremely effective for structured output, domain specialization, and style transfer. Anthropic's own published prompt engineering documentation lists few-shot among the most effective techniques. The selection, ordering, and similarity of examples are prompt craft, with no architectural substitute.

Output format constraint. Getting the model to output JSON, XML, or a specific markdown format—this is another genuine battleground for prompt engineering. Even with a structured output API (OpenAI function calling, Anthropic tool use), the underlying layer still depends on prompt design. The precision of output format is 80% from the prompt and 20% from schema validation.

Chain-of-Thought triggering. The phrase "let's think step by step" is a classic piece of prompt engineering craft—it triggers the model into a longer chain of intermediate reasoning. The variants of CoT (self-consistency CoT, Tree-of-Thoughts, Plan-and-Solve) are all designs at the prompt layer as well. The engineering layer cannot readily substitute for this—the triggering mechanism of reasoning is, in essence, linguistic.

Failure mode mitigation. When a model is known to hallucinate, overstep, or repeat itself on a certain class of input—a negative instruction in the prompt ("do not fabricate numbers," "do not give medical advice," "if you are unsure, say you don't know") is one of the techniques for reducing these failure modes. This is not the only line of defense, but it is the first line of defense.

These five scenarios together amount to roughly the entire legitimate territory of prompt engineering. They matter, they require dedicated craft training, and they deserve to be taken seriously—but together they are only 1/10 of agent engineering.

4 · The illegitimate overreach of prompt engineering

The following six misuses are the main reason agent projects died over the past two years.

Using a prompt in place of architecture. "I implemented a multi-agent collaboration system with a single 5,000-word prompt"—this kind of demo was popular in the 2024 ChatGPT circles. The problem is that a demo is always a demo; architectural problems cannot be solved with a prompt. The real problems of multi-agent collaboration are message passing, context isolation, conflict resolution, and handoff fidelity—these are engineering problems at the Harness layer, not language problems.

Using a prompt in place of evaluation. "I wrote 'please self-assess the output quality and give a score from 0 to 100' into the prompt"—this is another typical misuse. A model's self-assessment carries serious, well-known bias (it tends to give itself high marks), and such "evaluation" has no engineering meaning whatsoever—no ground truth, no latency and cost measurement, no regression test. The third ADPS principle states that "evaluation is design"; the place for evaluation is an independent evaluation harness inside the Harness, not a single sentence in the prompt.

Using a prompt in place of governance. "I wrote 'calling any API involving money is not allowed' into the system prompt"—this is the standard form of governance misuse. An LLM's instruction following has well-known circumventability, and new variants of prompt injection (DAN, the grandma exploit, unicode tag smuggling) are published every month, dozens of them. Governance must be implemented at the Harness layer with a permission pipeline, tool registry, and approval gate—the prompt is only an auxiliary hint, not enforcement. Claude Code's PreToolUse Hook is the engineering location of governance.

Using a prompt in place of memory architecture. "I stuffed 50K tokens of conversation history into the prompt"—this practice was popular in 2024 when long-context models had just appeared. The problems surfaced immediately: token costs exploded, key information drowned in the noise, and nothing persisted across sessions. Memory is an architectural problem—short-term / long-term tiering, retrieval strategy, relevance scoring, and the persistence layer are all engineering problems at the Harness layer. The prompt is a consumer of memory, not the implementation layer of memory.

Using a prompt in place of tool dispatch. "I wrote 'when the user asks about the weather, call the weather API' into the prompt"—putting tool routing into the prompt as a natural-language description. This road had an extremely high failure rate in early agent projects in 2024. Tool dispatch is an engineering problem, not a language problem—the schema design of the tool registry, permission boundary, failure handling, and retry strategy all require code implementation at the Harness layer. Anthropic's tool use API launched in 2024, OpenAI's function calling, and the MCP protocol are all the engineering direction for this.

Trying to implement a state machine with a prompt. "I wrote 'you are in the planning phase; after planning is complete you enter the execution phase; after execution is complete you enter the reflection phase' into the prompt"—stuffing a finite-state-machine into the prompt. This is one of the most hidden and most common anti-patterns of the past two years. An LLM does not maintain state—state must be implemented at the Harness layer with a real state machine or graph framework (LangGraph, Pydantic Graph, an in-house orchestration layer). The "state" described in the prompt is essentially a form of wishful thinking—the model may follow the state you describe this time and skip a phase the next. State is an engineering object, not a language object.

These six misuses share one form—downgrading an architectural problem into a language problem. The legitimate territory of prompt engineering is "language problems"; overreaching to do "architectural problems" is bound to fail.

5 · The true position of prompt engineering in the two-axis matrix

The ADPS two-axis framework—the vertical axis of the seven cognitive functions (Perception / Memory / Reasoning / Action / Reflection / Collaboration / Governance) × the horizontal axis of the six topologies (Chain / Route / Parallel / Orchestrate / Loop / Hierarchy)—gives prompt engineering a precise coordinate.

Prompt engineering mainly affects the Reasoning and Action functions on the vertical axis—through the system prompt it shapes the model's reasoning style (CoT, ToT, ReAct), and through tool description it shapes tool-selection behavior.

Prompt engineering has almost no effect on the horizontal axis—the six execution topologies of Chain / Route / Parallel / Orchestrate / Loop / Hierarchy are architectural decisions at the Harness layer and have nothing to do with the prompt. An Orchestrator-Worker topology will not become a Hierarchy because you write a good prompt; a Chain topology will not become a Loop because you write a poor one. Topology is engineering structure, independent of language.

The range prompt engineering affects across the 28 cells: it affects roughly Reasoning × 6 + Action × 6 = 12 cells, specifically "how the model responds within this cell," but it does not affect "how this cell is composed into the system."

To quantify this—prompt engineering is the local craft of 12 cells within the 28-cell matrix, not the design language of the matrix.

This is where the judgment of 1/10 comes from—the prompt affects the local response quality of the model within certain cells, its range of effect is limited, and each cell still has other, stronger engineering levers (tool schema, permission, context strategy, reflection loop).

6 · A comparison of two kinds of production teams

Having observed more than thirty agent deployment projects over the past eighteen months, teams can be sorted into two kinds—

Prompt-heavy + Harness-thin (fail mode). The traits of this kind of team are clear—a system prompt of more than 5,000 words, no independent tool registry, no permission pipeline, no evaluation harness, no audit log. Every engineering problem is solved by "adding another passage to the prompt." The typical way such projects die: dazzling at the demo stage, then a corner-case avalanche once they enter production, more than 30% of engineering time spent within three months on repeatedly tuning the prompt, and the project quietly taken offline within a year.

There were a great many such projects in 2024-2025—a certain automaker's after-sales customer service Agent, a certain bank's wealth advisory Agent, a certain e-commerce platform's recommendation Agent—all sharing the trait of being prompt-heavy. None of them survived the first year.

Harness-heavy + prompt-clean (success mode). The traits of this kind of team are equally clear—the system prompt is usually kept within 800-2,000 words, focused on role and high-level behavior, while all the remaining engineering effort goes into the tool registry, permission pipeline, context management, observability, and recovery. The prompt is the final step in the engineering output, not the center.

Claude Code is the most complete sample of this road—its system prompt is under 3,000 words, but the Harness layer runs to 512K lines of code. DeerFlow 2.0 is another sample—the prompt of each sub-agent is short, but each agent has a Docker sandbox, a persistent filesystem, and a long-term memory store. Aider is a third sample—the prompt is minimal, with all the engineering effort going into the Git boundary. What these three projects have in common is Harness-heavy + prompt-clean; all have survived more than three years, and all are reference samples for current production-grade agent engineering.

What this comparison shows is simple—the prompt is not the lever; the Harness is the lever.

7 · Closing · the upgrade path

This position is not written to belittle the prompt engineer—quite the opposite, it is written to point the prompt engineer toward a clear upgrade path.

Engineers who studied prompt engineering over the past two years hold something real—they understand the craft of the model's language layer, they understand few-shot design, they understand CoT triggering, they understand format constraints. These skills are not obsolete; they are 1/10 of agent engineering, and they remain a necessary 1/10.

But staying only within that 1/10 has no future. The direction of the upgrade is the Harness engineer—learning the schema design of the tool registry, learning to implement the permission pipeline, learning the strategies of context management, learning to build the evaluation harness, learning the engineering of observability and audit, learning the topology selection of multi-agent orchestration.

Along this upgrade path, the craft of prompt engineering is not lost—it gets absorbed into the Harness engineer's toolbox as a sub-skill of "the system prompt layer." The 1/10 is still there; it is simply no longer the whole.

ADPS welcomes this upgrade. This society's engineering education, the Pattern Selection Card, the six-step selection method, the Agent Incident Taxonomy—all of these artifacts are prepared for "those who want to upgrade from prompt engineer to Harness engineer."

Prompt is one input to the Harness, not architecture. Acknowledging this is the first step toward doing agent engineering seriously, beginning in 2026.

ADPS · Agent Design Patterns Society · adpsagent.com · 2026-05-30 · v0.1

← Back to all positions