Spec-Driven Agentic Development: A Practical Guide

Most teams that struggle with coding agents share one root problem: they prompt first and specify never. They hand the agent a vague task description, get back plausible-looking code, then spend the next hour finding out why it doesn't actually do what they needed. The code isn't wrong by accident. It's wrong because the agent was never told precisely what "right" meant.

Spec-driven agentic development flips that sequence. You write a concrete, machine-readable specification before the agent touches a single file. The spec defines inputs, outputs, constraints, and the acceptance conditions the finished code must satisfy. The agent then treats that document as the contract it's coding against, not a suggestion it interprets freely.

The shift is opinionated, and it's worth defending: agents are good at filling in implementation detail when the boundary conditions are clear. They're bad at inferring what you actually wanted from an ambiguous description. Specs don't constrain the agent. They give it the ground truth it needs to be useful.

What you'll learn

What a coding spec looks like in practice
The spec-first workflow step by step
Writing acceptance conditions the agent can verify
Spec formats and where each works best
Connecting specs to tests
Common spec mistakes that produce bad agent output
When specs are overkill
Frequently Asked Questions

What a coding spec looks like in practice

A coding spec is a structured document that defines what a piece of software must do: its interface, its behavior under normal and error conditions, and the observable properties a reviewer or test suite can check. It's not a design doc and it's not a requirements list. It's closer to a test plan written before any code exists.

A minimal spec for a function has four parts:

Purpose: one sentence describing what it does and why.
Signature: the exact types of inputs and outputs.
Behavior: what the function does in each meaningful case, including edge cases.
Acceptance conditions: the specific assertions a test suite would make to confirm correctness.

Here's a concrete example for a token-budget guard in an agentic development workflow:

/**
 * SPEC: truncateToTokenBudget
 *
 * Purpose: Trim a message list to fit within a token budget,
 * removing oldest non-system messages first, preserving
 * the system prompt and the last message in the list.
 *
 * Signature:
 *   truncateToTokenBudget(
 *     messages: Message[],
 *     budgetTokens: number,
 *     countTokens: (text: string) => number
 *   ): Message[]
 *
 * Behavior:
 *   - If total tokens <= budgetTokens, return messages unchanged.
 *   - Remove messages from index 1 onward (preserve index 0,
 *     the system prompt) until total tokens <= budgetTokens.
 *   - Always preserve the last message in the array.
 *   - If the system prompt alone exceeds budgetTokens, throw
 *     BudgetExceededError with the overage amount.
 *
 * Acceptance conditions:
 *   - Given 10 messages totalling 800 tokens and budget 500:
 *     returned array fits within 500 tokens.
 *   - System prompt (index 0) always present in output.
 *   - Last message always present in output.
 *   - BudgetExceededError thrown when system prompt alone > budget.
 */

Notice what this spec does not contain: any implementation. There's no while loop, no array slice, no choice of tokenizer. The agent reads this block, writes a function that satisfies every acceptance condition, and you verify the output against the spec rather than against your intuition about what the code should look like.
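For reference, here is a minimal sketch of one implementation an agent might produce from that spec. It is not the only valid answer. The Message shape, the error class body, and the handling of an empty list are assumptions, since the spec leaves them open. The spec also doesn't say what should happen when the system prompt and the last message together exceed the budget; this sketch throws in that case, and a real spec should state the behavior explicitly.

```typescript
// Assumed message shape; the spec itself does not define Message.
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

class BudgetExceededError extends Error {
  readonly overage: number;
  constructor(overage: number) {
    super(`Token budget exceeded by ${overage}`);
    this.name = "BudgetExceededError";
    this.overage = overage;
    // Keeps instanceof working when compiled to older JavaScript targets.
    Object.setPrototypeOf(this, BudgetExceededError.prototype);
  }
}

function truncateToTokenBudget(
  messages: Message[],
  budgetTokens: number,
  countTokens: (text: string) => number
): Message[] {
  // Assumption: an empty list has nothing to trim.
  if (messages.length === 0) return messages;

  const cost = (m: Message): number => countTokens(m.content);
  const total = (list: Message[]): number =>
    list.reduce((sum, m) => sum + cost(m), 0);

  const systemCost = cost(messages[0]);
  if (systemCost > budgetTokens) {
    throw new BudgetExceededError(systemCost - budgetTokens);
  }
  if (total(messages) <= budgetTokens) return messages;

  // Drop the oldest removable message (index 1) until the list fits.
  // Index 0 (system prompt) and the final message are never removed.
  const kept = [...messages];
  while (kept.length > 2 && total(kept) > budgetTokens) {
    kept.splice(1, 1);
  }

  // Assumption: the spec is silent on the case where the two protected
  // messages still exceed the budget; this sketch throws.
  const remaining = total(kept);
  if (remaining > budgetTokens) {
    throw new BudgetExceededError(remaining - budgetTokens);
  }
  return kept;
}
```

Recomputing the total on every pass keeps the sketch short at the cost of extra tokenizer calls. Whether that matters is a quality question for review, not a correctness question for the spec.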

The spec-first workflow step by step

Agentic development workflows built around specs follow a consistent sequence regardless of which coding agent you use.

Step 1: Draft the spec. Before opening a code file, write the spec in a comment block, a markdown section, or a separate .spec.md file co-located with the module. Use the four-part structure above. If you can't write the acceptance conditions yet, that's a signal the problem isn't well-understood enough to delegate to an agent.

Step 2: Feed the spec as context, not a prompt. Rather than pasting the spec into a chat message as a task description, include it as a file the agent reads. In our experience, agents follow a spec more closely when it lives in the codebase than when it's buried in the conversation history. That placement signals the spec is authoritative, not conversational.

Step 3: Ask the agent to implement against the spec. Your prompt becomes short and precise: "Implement the function described in the spec block above. The implementation must satisfy every acceptance condition. Do not modify the spec."

Step 4: Run acceptance conditions before review. The acceptance conditions in the spec translate directly to test cases. Run them before you read a single line of generated code. If the tests pass, the code satisfies every condition the spec states, which is only as strong a guarantee as the conditions themselves. If tests fail, the agent didn't fully implement the contract, and you have a precise failure description to send back.

Step 5: Review for quality, not correctness. Once the acceptance conditions pass, your code review shifts from "is this correct?" to "is this well-structured, readable, and safe?" Those are separate questions, and separating them makes both tasks faster.
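The "precise failure description" from Step 4 can be assembled mechanically. The sketch below is one hypothetical way to do it: the TestFailure shape and the buildFollowUpPrompt name are illustrative assumptions, not the output format of any particular test runner or agent.

```typescript
// Hypothetical shape for one failed acceptance test; adapt it to
// whatever your test runner actually reports.
interface TestFailure {
  condition: string; // the acceptance condition text from the spec
  expected: string;
  actual: string;
}

// Builds the follow-up message sent back to the agent after a failed run.
function buildFollowUpPrompt(specPath: string, failures: TestFailure[]): string {
  if (failures.length === 0) {
    return `All acceptance conditions in ${specPath} pass. No changes needed.`;
  }
  const details = failures.map(
    (f, i) =>
      `${i + 1}. Condition: ${f.condition}\n   Expected: ${f.expected}\n   Actual: ${f.actual}`
  );
  return [
    `The implementation fails ${failures.length} acceptance condition(s) in ${specPath}:`,
    ...details,
    "Fix the implementation so every condition passes. Do not modify the spec.",
  ].join("\n");
}
```

The follow-up quotes the spec's own wording for each failed condition, so the agent is corrected against the contract and not against a fresh paraphrase of it.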

This sequence is what the Laxaar team applies across AI-powered software development projects. The workflow isn't slower than prompt-first development. It front-loads the thinking that would otherwise happen during debugging.

Writing acceptance conditions the agent can verify

Acceptance conditions are the linchpin of spec-driven agentic development. Vague conditions produce untestable specs; untestable specs produce unverifiable code.

Good acceptance conditions are:

Concrete: "Returns an array of length 3" beats "returns the correct items."
Observable: they describe outputs or side effects, never internal implementation details.
Falsifiable: a test can pass or fail against them without interpretation.
Exhaustive at the boundary: cover the happy path, the empty/zero case, and at least one error case.

Here's the same function spec rewritten with weak versus strong acceptance conditions:

## Weak (hard to test, easy to misinterpret)
- Returns the trimmed messages
- Handles edge cases correctly
- Throws an error if budget is too small

## Strong (directly translatable to test assertions)
- Given messages = [sys, a, b, c] at 200 tokens total and budget 150:
  output.length < 4, output fits within 150 tokens, output[0] === sys,
  and the last element of output is c.
- Given messages = [sys] at 600 tokens and budget 500:
  throws BudgetExceededError({ overage: 100 }).
- Given messages = [sys, a] at 100 tokens and budget 200:
  returns messages unchanged (no truncation when under budget).

The strong versions look verbose. They are. That verbosity is doing real work: each line is a test case waiting to be written. The agent reading strong conditions produces code that specifically handles each case. The agent reading weak conditions guesses.
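To make "a test case waiting to be written" concrete, the three strong conditions can be written down as data and checked against any candidate implementation. This is a sketch under stated assumptions: the Message shape, the one-token-per-character tokenizer, and the names AcceptanceCase and checkCase are all hypothetical. For brevity the "throws" case only checks that something is thrown; a real suite would also assert the error type and the overage value.

```typescript
// Assumed shapes; the article's spec does not define Message.
type Message = { role: string; content: string };
type Truncate = (
  messages: Message[],
  budget: number,
  countTokens: (text: string) => number
) => Message[];

interface AcceptanceCase {
  name: string;
  tokenCounts: number[]; // tokens per message; index 0 is the system prompt
  budget: number;
  expect: "truncated" | "unchanged" | "throws";
}

// The three strong conditions, one row each.
const cases: AcceptanceCase[] = [
  { name: "over budget", tokenCounts: [50, 50, 50, 50], budget: 150, expect: "truncated" },
  { name: "system prompt too large", tokenCounts: [600], budget: 500, expect: "throws" },
  { name: "under budget", tokenCounts: [50, 50], budget: 200, expect: "unchanged" },
];

// Stand-in tokenizer: one token per character, so content length sets the count.
const countTokens = (text: string): number => text.length;

// Returns true when the implementation satisfies the given case.
function checkCase(impl: Truncate, c: AcceptanceCase): boolean {
  const messages: Message[] = c.tokenCounts.map((n, i) => ({
    role: i === 0 ? "system" : "user",
    content: "x".repeat(n),
  }));
  try {
    const out = impl(messages, c.budget, countTokens);
    if (c.expect === "throws") return false;
    if (c.expect === "unchanged") {
      return out.length === messages.length && out.every((m, i) => m === messages[i]);
    }
    const total = out.reduce((sum, m) => sum + countTokens(m.content), 0);
    return (
      out.length < messages.length &&
      total <= c.budget &&
      out[0] === messages[0] &&
      out[out.length - 1] === messages[messages.length - 1]
    );
  } catch {
    return c.expect === "throws";
  }
}
```

None of the weak conditions could be written as a row in that table, which is a quick way to tell the two kinds apart.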

Spec formats and where each works best

There's no single canonical spec format for agentic coding. The right choice depends on the kind of work and the team's existing tooling.

Format	Best for	Limitations
Inline JSDoc/TSDoc spec block	Single functions and methods	Doesn't scale to multi-module features
Co-located `.spec.md` file	Feature-level or API-level specs	Requires discipline to keep in sync
OpenAPI / JSON Schema	HTTP APIs and data contracts	Verbose; overkill for internal functions
Gherkin (Given/When/Then)	User-facing features needing BDD alignment	Requires a BDD runner; unfamiliar to some engineers
Plain markdown checklist	Quick exploration and prototyping	Informal; acceptance conditions need manual translation

At Laxaar we lean toward inline spec blocks for function-level work and co-located .spec.md files for feature-level agentic tasks. The key is that whatever format you choose, it lives in the repository alongside the code. Specs kept in a separate wiki or ticket tracker tend to go out of date, because nothing ties them to the commits that change the code.
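If a team opts into co-located spec files for a directory, the "lives alongside the code" rule can be checked mechanically. The sketch below assumes a hypothetical naming convention (src/foo/bar.ts is covered by src/foo/bar.spec.md) and a hypothetical function name. It works on a plain list of paths, so how you collect that list is up to you.

```typescript
// Hypothetical convention: a module at path/name.ts is covered by a
// sibling file at path/name.spec.md. Test files and type declarations
// are not treated as modules that need a spec.
function findModulesMissingSpec(repoFiles: string[]): string[] {
  const present = new Set(repoFiles);
  return repoFiles
    .filter((f) => f.endsWith(".ts") && !f.endsWith(".test.ts") && !f.endsWith(".d.ts"))
    .filter((f) => !present.has(f.replace(/\.ts$/, ".spec.md")))
    .sort();
}
```

Run as a pre-merge check, a function like this catches a missing spec file. It does not catch a spec whose content has drifted from the code, which still takes review.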

Connecting specs to tests

Specs and tests aren't the same thing, but they should be siblings. A good spec is written so that its acceptance conditions can be mechanically translated into test cases, ideally by the agent itself.

After the agent implements a function from a spec, the Laxaar team often runs a second agent pass with this prompt:

Read the spec block in [file]. For each acceptance condition, write
a Jest test case that asserts it. Do not invent test cases that
aren't in the spec. Use the exact scenarios described.

This produces a test file that covers every condition in the spec, no more and no less. It's not a substitute for exploratory testing, but it gives you a regression suite that's directly traceable back to the original contract.

The connection in the other direction is also valuable: when a test fails after a change, you can find the acceptance condition it came from and immediately know whether the change violated the spec or whether the spec needs to evolve. That traceability is one of the underrated advantages of agentic coding done with explicit specs.

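// Agent-generated test from the spec above.
// Assumptions for this example: the module path, the { role, content } message
// shape, and a mock tokenizer that counts one token per character.
import { truncateToTokenBudget, BudgetExceededError } from './truncateToTokenBudget';

const makeMsg = (role: string, content: string) => ({ role, content });
const mockCountTokens = (text: string): number => text.length;
const getTotalTokens = (messages: { content: string }[]): number =>
  messages.reduce((sum, m) => sum + mockCountTokens(m.content), 0);

describe('truncateToTokenBudget', () => {
  it('removes middle messages when over budget', () => {
    // Four 50-token messages: 200 tokens total against a budget of 150.
    const sys = makeMsg('system', 's'.repeat(50));
    const messages = [
      sys,
      makeMsg('user', 'a'.repeat(50)),
      makeMsg('assistant', 'b'.repeat(50)),
      makeMsg('user', 'c'.repeat(50)),
    ];
    const result = truncateToTokenBudget(messages, 150, mockCountTokens);
    expect(result.length).toBeLessThan(4);
    expect(getTotalTokens(result)).toBeLessThanOrEqual(150);
    expect(result[0]).toBe(sys);
    expect(result[result.length - 1]).toBe(messages[messages.length - 1]);
  });

  it('throws BudgetExceededError when system prompt alone exceeds budget', () => {
    const bigSys = makeMsg('system', 'x'.repeat(600)); // 600 tokens with the mock tokenizer
    expect(() => truncateToTokenBudget([bigSys], 500, mockCountTokens))
      .toThrow(BudgetExceededError);
  });

  it('returns messages unchanged when total is under budget', () => {
    // 50 + 50 = 100 tokens against a budget of 200.
    const messages = [makeMsg('system', 's'.repeat(50)), makeMsg('user', 'u'.repeat(50))];
    const result = truncateToTokenBudget(messages, 200, mockCountTokens);
    expect(result).toEqual(messages);
  });
});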
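The traceability from a failing test back to its acceptance condition is easier to rely on when it's encoded and not just remembered. One hypothetical convention: give each condition an ID such as AC-1 and start each test title with that ID in square brackets. Neither the ID scheme nor the traceFailures helper below comes from Jest or from the spec above; both are illustrative assumptions.

```typescript
// Hypothetical convention: conditions are numbered AC-1, AC-2, ... in the
// spec, and each test title begins with "[AC-n]".
interface AcceptanceCondition {
  id: string;
  text: string;
}

const UNTRACED = "no matching acceptance condition (test is not traceable to the spec)";

// Maps each failed test title to the spec condition it was written from.
function traceFailures(
  failedTestTitles: string[],
  conditions: AcceptanceCondition[]
): Map<string, string> {
  const byId = new Map<string, string>();
  for (const c of conditions) byId.set(c.id, c.text);

  const traced = new Map<string, string>();
  for (const title of failedTestTitles) {
    const id = /^\[(AC-\d+)\]/.exec(title)?.[1];
    const text = id === undefined ? undefined : byId.get(id);
    traced.set(title, text ?? UNTRACED);
  }
  return traced;
}
```

A failure that maps to a condition raises the question of whether the change violated the spec or the spec needs to evolve. A failure that maps to nothing points to a test that was invented outside the spec.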

Common spec mistakes that produce bad agent output

Writing specs is a skill. These are the mistakes we see most often on projects that bring their agentic workflows to the Laxaar team for improvement.

Specifying implementation rather than behavior. A spec that says "use a binary search to find the cutoff index" is a design doc, not a spec. The agent will follow the instruction, but you've removed its ability to choose a simpler or more correct approach. Specs describe what; implementation describes how.

Missing the error path. Most spec drafts cover the happy path and stop. Error cases are where agent-generated code breaks most often in production. For every acceptance condition that describes normal output, ask what happens when an input is missing, malformed, out of range, or of the wrong type.

Overly wide acceptance conditions. "Returns a valid user object" is an acceptance condition that accepts almost anything. A "valid user object" in your system has specific required fields, type constraints, and possibly invariants about their relationships. Write those down.
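As an illustration of "write those down", here is a sketch of what a narrowed definition of "valid user object" might look like once it's explicit. The User fields and every rule below are hypothetical, chosen only to show required fields, type constraints, and one invariant relating two fields. Your system's rules will differ.

```typescript
// Hypothetical user shape; the fields are illustrative, not from a real system.
interface User {
  id: string;
  email: string;
  createdAt: number; // Unix epoch milliseconds
  lastLoginAt: number | null; // null until the first login
}

// Returns one message per violated rule; an empty array means the value is valid.
function validateUser(value: unknown): string[] {
  if (typeof value !== "object" || value === null) return ["value must be an object"];
  const { id, email, createdAt, lastLoginAt } = value as Record<string, unknown>;
  const errors: string[] = [];

  if (typeof id !== "string" || id.length === 0) {
    errors.push("id must be a non-empty string");
  }
  if (typeof email !== "string" || !email.includes("@")) {
    errors.push("email must be a string containing '@'");
  }
  if (typeof createdAt !== "number" || !Number.isFinite(createdAt)) {
    errors.push("createdAt must be a finite number");
  }
  if (lastLoginAt !== null && (typeof lastLoginAt !== "number" || !Number.isFinite(lastLoginAt))) {
    errors.push("lastLoginAt must be a finite number or null");
  }
  // Invariant relating two fields: a login cannot precede account creation.
  if (typeof createdAt === "number" && typeof lastLoginAt === "number" && lastLoginAt < createdAt) {
    errors.push("lastLoginAt must not be earlier than createdAt");
  }
  return errors;
}

function isValidUser(value: unknown): value is User {
  return validateUser(value).length === 0;
}
```

Each error message corresponds to one acceptance condition, so "returns a valid user object" becomes five assertions in place of one that accepts almost anything.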

Specs that drift from code. The most dangerous spec is one that's wrong. If the code has diverged from the spec and neither the team nor the agent notices, you're testing against a stale contract. Build a habit of updating the spec as part of any change to the public behavior of a module.

No stop condition. Agents don't naturally know when to stop. A spec without an explicit termination condition for iterative tasks ("stop when all items in the queue are processed or when 1000 iterations are reached, whichever comes first") can produce code with subtle infinite loops.
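The termination condition quoted above maps onto code directly. The sketch below is a minimal illustration with hypothetical names (drainQueue, DrainResult). It assumes handlers may enqueue follow-up items, which is exactly the situation where an empty-queue check alone does not guarantee termination.

```typescript
interface DrainResult {
  processed: number;
  stoppedBy: "queue-empty" | "iteration-limit";
}

// Processes items until the queue is empty or maxIterations is reached,
// whichever comes first. Note that this mutates the queue passed in.
function drainQueue<T>(
  queue: T[],
  handle: (item: T) => T[],
  maxIterations: number = 1000
): DrainResult {
  let processed = 0;
  while (queue.length > 0 && processed < maxIterations) {
    const item = queue.shift() as T;
    // A handler may return follow-up work, so the queue can grow while
    // it is being drained. The iteration cap bounds that growth.
    queue.push(...handle(item));
    processed += 1;
  }
  return {
    processed,
    stoppedBy: queue.length === 0 ? "queue-empty" : "iteration-limit",
  };
}
```

Returning which condition ended the loop makes the stop condition observable, so an acceptance condition can assert that hitting the limit is reported and not silently swallowed.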

When specs are overkill

Spec-driven development doesn't pay off uniformly. There's a class of tasks where writing a full spec costs more than the verification benefit is worth.

Throwaway scripts and one-off data migrations are poor candidates. The code will run once, you'll check the output manually, and the effort of writing acceptance conditions you'll never re-run is wasted.

Highly exploratory prototyping is also a weak fit. If you don't know what the output should look like (because you're still discovering the problem space), you can't write useful acceptance conditions yet. Use the agent for exploration, then write the spec once you understand what you're building.

Small single-purpose utility functions that are obviously correct from their name and signature (a capitalize(s: string): string function, for instance) don't need the full spec treatment. Reserve the discipline for anything with non-trivial behavior, error handling, or integration with external systems.

The honest trade-off: spec-first adds 15-30 minutes of upfront work to a feature that might take two hours to implement. It eliminates roughly the same amount of time spent disambiguating with the agent during implementation, reviewing incorrect code, and writing tests from scratch afterward. On net it's usually a wash for small features and a clear win for anything complex. For teams doing custom software development at scale with agents, that win compounds quickly across dozens of features per sprint.

Frequently Asked Questions

Does spec-driven development work with all coding agents?

Yes, with minor adjustments to how you present the spec. Agents that support file context (Claude Code, Cursor, Copilot Workspace) work best when the spec is a file the agent reads directly. Chat-based agents work when the spec is pasted into the conversation as the first message, clearly labeled as the authoritative contract. The key in either case is that the spec precedes the implementation prompt and is marked as authoritative.

How detailed does a spec need to be for an agent to use it effectively?

Detailed enough to write tests from it. A practical test: hand the spec to a teammate who wasn't involved in writing it and ask them to write three test cases. If they can do it without asking questions, the spec is detailed enough. If they ask what "valid" means, what the error looks like, or what happens in a specific edge case, those gaps need filling before the agent sees it.

Can the agent help write the spec itself?

It can draft one, and that's a legitimate use of agentic workflows. Ask the agent to generate a spec from a description of the problem, then review and tighten the acceptance conditions before handing the spec back for implementation. The agent's draft will often miss error cases or leave acceptance conditions vague. Your review pass catches those gaps. Don't skip it: an agent that specs and implements its own work without human checkpoints isn't being governed, it's being trusted blindly.

What's the difference between a spec and a unit test?

A spec describes intent before implementation exists. A unit test verifies behavior once it does. In practice the two are close enough that a well-written spec's acceptance conditions should translate directly into test cases, which is why spec-first and test-driven development work well together. The spec is the contract; the tests are what enforce it every time the suite runs. When a test and the spec disagree, you resolve that disagreement on purpose rather than discovering it through a production regression.

How does this approach scale to multi-file or multi-module features?

At feature level, use a co-located .spec.md file that describes the public interface and integration behavior of the entire feature, not just individual functions. Break the spec into sections by module boundary. Each section gets its own acceptance conditions. The agent (or separate agents for each module) implements against its section, and integration tests verify the interactions between sections. This is how the Laxaar team structures larger AI-powered software development engagements: one feature spec, decomposed into module specs, each implemented and tested independently before integration.

Want to adopt spec-driven agentic development on your team without figuring out the process from scratch? Talk to Laxaar. We run agentic coding workshops and can embed the spec-first discipline directly into your existing engineering workflow.