Creating Agentic Coding Workflows
Design agentic coding workflows that ship: plan the task graph, pick the right tools, handle failures, and integrate AI agents into your existing dev pipeline.

Agentic coding workflows are software pipelines where an AI agent, not a human, drives most of the execution: reading code, running tests, making changes, and iterating on results. The agent doesn't just suggest edits; it acts on them. That shift changes how you design the workflow, because now you're engineering for an autonomous system rather than a human reading suggestions.
Getting this right means thinking about task decomposition before model selection, error recovery before happy paths, and auditability before speed. The Laxaar team has built these workflows for client projects across web, mobile, and cloud products, and this tutorial covers the patterns that actually work at production scale — not just in demos.
Prerequisites: Familiarity with at least one AI coding tool (Claude Code, Cursor, or similar), Node.js 20+ or Python 3.11+, and a Git repository to work with.
What you'll build
- Decompose a coding task into an agent-friendly graph
- Select and configure the right tools for each step
- Write a workflow orchestrator that handles failures
- Integrate the workflow into your CI pipeline
- Add observability so you can debug agent runs
Step 1: Decompose the task
The single biggest mistake in agentic workflow design is giving the agent one giant task and hoping it figures out the steps. It won't — not reliably. Break the work into a directed graph of discrete steps, where each step has a clear input, a clear output, and a verification condition.
Here's a task decomposition for an automated code review workflow:
Task: Review a pull request and post findings
Step 1: Fetch PR diff
  Input: PR number
  Output: unified diff string
  Verify: diff is non-empty
Step 2: Analyze each changed file
  Input: list of changed files + their diffs
  Output: findings per file (array of {file, line, severity, message})
  Verify: findings array is valid JSON, severity in ['error','warning','info']
Step 3: Check for failing tests
  Input: list of changed source files
  Output: test results for affected test files
  Verify: test runner exit code, not just stdout
Step 4: Post review comment
  Input: findings + test results
  Output: GitHub comment ID
  Verify: HTTP 201 from GitHub API
Each step is independently testable. You can run step 2 in isolation with a canned diff, without touching GitHub. That's the property you want — it makes the whole workflow dramatically easier to debug.
In code, represent this as a typed step registry:
type StepResult<T> = { ok: true; value: T } | { ok: false; error: string };
interface WorkflowStep<TIn, TOut> {
  name: string;
  run: (input: TIn) => Promise<StepResult<TOut>>;
  verify: (output: TOut) => boolean;
}
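To make the interface concrete, here is a minimal sketch of one step and a runner that applies its verify function. The `Finding` type, the `analyzeFiles` step, and the `runStep` helper are illustrative names, not part of any library, and the `run` body is a stub standing in for the real agent call.

```typescript
// Types repeated from above so the sketch is self-contained.
type StepResult<T> = { ok: true; value: T } | { ok: false; error: string };

interface WorkflowStep<TIn, TOut> {
  name: string;
  run: (input: TIn) => Promise<StepResult<TOut>>;
  verify: (output: TOut) => boolean;
}

// Hypothetical output shape for step 2 of the PR review graph.
type Severity = "error" | "warning" | "info";
interface Finding {
  file: string;
  line: number;
  severity: Severity;
  message: string;
}

const SEVERITIES: readonly string[] = ["error", "warning", "info"];

// Hypothetical step: the run body is a stub, not a real model call.
const analyzeFiles: WorkflowStep<string, Finding[]> = {
  name: "analyze_files",
  run: async (diff) => {
    if (diff.trim() === "") return { ok: false, error: "empty diff" };
    return {
      ok: true,
      value: [{ file: "src/app.ts", line: 1, severity: "info", message: "stub finding" }],
    };
  },
  // Mirrors the "Verify" line from the task graph: well-formed findings only.
  verify: (findings) =>
    Array.isArray(findings) &&
    findings.every(
      (f) =>
        typeof f.file === "string" &&
        Number.isInteger(f.line) &&
        SEVERITIES.includes(f.severity) &&
        typeof f.message === "string",
    ),
};

// Runs a step and treats a failed verification as a failed step.
async function runStep<TIn, TOut>(
  step: WorkflowStep<TIn, TOut>,
  input: TIn,
): Promise<StepResult<TOut>> {
  const result = await step.run(input);
  if (!result.ok) return result;
  return step.verify(result.value)
    ? result
    : { ok: false, error: `${step.name}: verification failed` };
}
```

Because `runStep` only depends on the interface, you can exercise any step with a canned input and never touch GitHub or the model.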
Step 2: Select tools
Tool selection is a matching problem: what can the agent call, and does that cover every step in the graph? For coding workflows, the standard set is:
| Tool category | Example | When to use |
|---|---|---|
| File I/O | readFile, writeFile | Reading source, writing patches |
| Shell execution | execSync, child_process | Running tests, builds, linters |
| Git operations | git diff, git log | Fetching diffs, history |
| HTTP client | fetch, axios | Calling GitHub, Jira, Slack APIs |
| Code search | grep, ripgrep, AST tools | Finding patterns across a codebase |
Resist the urge to give the agent every tool at once. More tools mean more possible wrong choices and a longer tool selection step in each model call. Give the agent the minimum set that covers the task graph.
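One way to enforce the minimum set is to filter the registry per step before each model call. The sketch below assumes a hypothetical `allowedTools` map keyed by step name; the step and tool names are illustrative, not a fixed convention.

```typescript
interface ToolDef {
  name: string;
  description: string;
}

// Hypothetical mapping from step name to the tools that step may call.
const allowedTools: Record<string, string[]> = {
  fetch_diff: ["git_diff"],
  analyze_files: ["read_file", "code_search"],
  run_tests: ["run_tests"],
  post_review: ["http_request"],
};

// Returns only the tools the given step is allowed to see.
// Unknown steps get no tools rather than the full registry.
function toolsForStep<T extends ToolDef>(step: string, registry: T[]): T[] {
  const allowed = new Set(allowedTools[step] ?? []);
  return registry.filter((tool) => allowed.has(tool.name));
}
```

Defaulting to an empty list for unknown steps is a deliberate choice here: a typo in a step name fails loudly instead of silently exposing every tool.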
Here's a minimal tool registry for the PR review workflow:
import { execFileSync } from "child_process";
export const tools = [
  {
    name: "git_diff",
    description: "Get the unified diff for a pull request. Returns raw diff text.",
    input_schema: {
      type: "object" as const,
      properties: {
        base: { type: "string", description: "Base branch or commit SHA." },
        head: { type: "string", description: "Head branch or commit SHA." },
      },
      required: ["base", "head"],
    },
    handler: ({ base, head }: { base: string; head: string }) => {
      // execFileSync runs git without a shell, so model-supplied refs
      // cannot inject extra shell commands.
      return execFileSync("git", ["diff", `${base}...${head}`], { encoding: "utf-8" });
    },
  },
  {
    name: "run_tests",
    description: "Run the test suite for a list of test files. Returns the test runner output, prefixed with FAILED on a non-zero exit.",
    input_schema: {
      type: "object" as const,
      properties: {
        files: {
          type: "array",
          items: { type: "string" },
          description: "Paths to test files to run.",
        },
      },
      required: ["files"],
    },
    handler: ({ files }: { files: string[] }) => {
      try {
        // Each file path is passed as its own argument, not through a shell.
        return execFileSync("npx", ["vitest", "run", ...files], {
          encoding: "utf-8",
          timeout: 120_000,
        });
      } catch (err: unknown) {
        // A non-zero exit code or a timeout lands here.
        const e = err as { stdout?: string; stderr?: string; message?: string };
        return `FAILED:\n${e.stdout ?? ""}\n${e.stderr ?? e.message ?? ""}`;
      }
    },
  },
];
Keep handlers thin. Business logic belongs in your application code, not in tool handlers. The handler's job is to execute and return a string — the agent's job is to decide what to execute.
Step 3: Write the orchestrator
The orchestrator runs the agent through each step in the task graph, checks verification conditions, and handles failures. It's not the same as the agentic loop from a simple agent — this is a higher-level controller that decides when to move forward, retry, or abort.
import Anthropic from "@anthropic-ai/sdk";
import { tools } from "./tools.js";
const client = new Anthropic();
interface StepSpec {
  name: string;
  prompt: string;
  maxRetries?: number;
  verify?: (result: string) => boolean;
}
async function runWorkflow(steps: StepSpec[], context: string): Promise<void> {
  const workflowLog: string[] = [];
  for (const step of steps) {
    const maxRetries = step.maxRetries ?? 2;
    let attempts = 0;
    let success = false;
    while (attempts <= maxRetries && !success) {
      attempts++;
      console.log(`\n=== ${step.name} (attempt ${attempts}) ===`);
      const messages: Anthropic.MessageParam[] = [
        {
          role: "user",
          content: `${context}\n\nWorkflow log so far:\n${workflowLog.join("\n")}\n\nCurrent step: ${step.prompt}`,
        },
      ];
      let result: string;
      try {
        result = await runAgentStep(messages);
      } catch (err) {
        // A failed agent run counts as a failed attempt and is retried.
        console.warn(`Step ${step.name} threw: ${String(err)}`);
        workflowLog.push(`STEP ${step.name}: ERROR (attempt ${attempts})`);
        continue;
      }
      if (step.verify && !step.verify(result)) {
        console.warn(`Verification failed for step: ${step.name}`);
        workflowLog.push(`STEP ${step.name}: FAILED VERIFICATION (attempt ${attempts})`);
        continue;
      }
      // Only the first 300 characters of each result are carried forward.
      workflowLog.push(`STEP ${step.name}: ${result.slice(0, 300)}`);
      success = true;
    }
    if (!success) {
      throw new Error(`Step "${step.name}" failed after ${maxRetries + 1} attempts. Aborting workflow.`);
    }
  }
  console.log("\n=== Workflow complete ===");
  console.log(workflowLog.join("\n\n"));
}
async function runAgentStep(messages: Anthropic.MessageParam[]): Promise<string> {
  const localMessages = [...messages];
  for (let i = 0; i < 10; i++) {
    const response = await client.messages.create({
      model: "claude-sonnet-4-5",
      max_tokens: 4096,
      tools: tools.map(({ name, description, input_schema }) => ({
        name,
        description,
        input_schema,
      })),
      messages: localMessages,
    });
    if (response.stop_reason === "end_turn") {
      return response.content
        .filter((b) => b.type === "text")
        .map((b) => (b as Anthropic.TextBlock).text)
        .join("\n");
    }
    if (response.stop_reason !== "tool_use") {
      // For example "max_tokens": re-sending the same messages would not help.
      throw new Error(`Unexpected stop_reason: ${response.stop_reason}`);
    }
    localMessages.push({ role: "assistant", content: response.content });
    const results: Anthropic.ToolResultBlockParam[] = [];
    for (const block of response.content) {
      if (block.type !== "tool_use") continue;
      const tool = tools.find((t) => t.name === block.name);
      try {
        const output = tool
          ? String(tool.handler(block.input as never))
          : `Unknown tool: ${block.name}`;
        results.push({ type: "tool_result", tool_use_id: block.id, content: output });
      } catch (err) {
        // Report handler failures to the model instead of crashing the run.
        results.push({ type: "tool_result", tool_use_id: block.id, content: String(err), is_error: true });
      }
    }
    localMessages.push({ role: "user", content: results });
  }
  // Throwing (rather than returning a string) lets the orchestrator retry.
  throw new Error("Step exceeded maximum iterations.");
}
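The key design decision here: the orchestrator carries a workflowLog across steps and injects it into every step's prompt. This gives the agent running step 4 a summary of what steps 1-3 produced (the first 300 characters of each result, plus any failed attempts) without relying on the model's conversation history, which does not persist between runAgentStep calls. If a later step needs a complete earlier output, pass it explicitly rather than through the log.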
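As a rough illustration of how the orchestrator might be fed, the sketch below defines three StepSpec entries for the PR review graph. The prompts, the `isFindingsJson` helper, and the `TESTS: PASS` / `TESTS: FAIL` reply convention are assumptions made for this example; adapt them to whatever output format you ask the agent for.

```typescript
// Repeated from the orchestrator so the sketch is self-contained.
interface StepSpec {
  name: string;
  prompt: string;
  maxRetries?: number;
  verify?: (result: string) => boolean;
}

const SEVERITIES = ["error", "warning", "info"];

// Hypothetical verifier: the reply must be a JSON array whose items
// each carry a known severity.
function isFindingsJson(result: string): boolean {
  try {
    const parsed: unknown = JSON.parse(result);
    if (!Array.isArray(parsed)) return false;
    return parsed.every((item) => {
      if (typeof item !== "object" || item === null) return false;
      const severity = (item as { severity?: unknown }).severity;
      return typeof severity === "string" && SEVERITIES.includes(severity);
    });
  } catch {
    return false; // not valid JSON
  }
}

const prReviewSteps: StepSpec[] = [
  {
    name: "fetch_diff",
    prompt: "Call git_diff for the base and head given in the context and reply with the raw diff.",
    verify: (result) => result.includes("diff --git"),
  },
  {
    name: "analyze_files",
    prompt: "Review the diff. Reply with only a JSON array of {file, line, severity, message}.",
    verify: isFindingsJson,
  },
  {
    name: "run_tests",
    prompt: "Run the tests for the changed files. End your reply with TESTS: PASS or TESTS: FAIL.",
    maxRetries: 1,
    verify: (result) => /TESTS: (PASS|FAIL)\s*$/.test(result),
  },
];
```

You would then pass `prReviewSteps` to runWorkflow along with a context string containing the base and head commits.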
Step 4: CI integration
An agentic coding workflow that only runs locally isn't that useful. The real value comes when it runs automatically — on every PR, on a schedule, or triggered by a webhook.
Here's a GitHub Actions workflow that runs the PR review agent:
# .github/workflows/agent-review.yml
name: AI Code Review
on:
  pull_request:
    types: [opened, synchronize]
jobs:
  review:
    runs-on: ubuntu-latest
    timeout-minutes: 10 # hard cap on agent run time
    permissions:
      pull-requests: write
      contents: read
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # needed for git diff
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - name: Install dependencies
        run: npm ci
      - name: Run agent review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          PR_BASE: ${{ github.event.pull_request.base.sha }}
          PR_HEAD: ${{ github.event.pull_request.head.sha }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
        run: npx tsx src/workflows/pr-review.ts
Two things to get right in CI: the fetch-depth: 0 flag (without it, git diff has no history to work with), and using GITHUB_TOKEN with the pull-requests: write permission so the agent can post its review comments.
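For the posting step itself, a minimal sketch might look like the following. It assumes the comment goes through GitHub's issue comments endpoint (pull requests share it) and that the repository slug comes from the GITHUB_REPOSITORY variable that Actions sets. The `postReviewComment` name and the injectable `fetchImpl` parameter are illustrative choices that make the function testable without network access.

```typescript
// Minimal structural type so a fake can be injected in tests.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ status: number; json(): Promise<unknown> }>;

interface CommentRequest {
  repo: string; // "owner/name", e.g. from GITHUB_REPOSITORY
  prNumber: number; // e.g. Number(process.env.PR_NUMBER)
  token: string; // e.g. process.env.GITHUB_TOKEN
  body: string;
}

// Posts a PR comment and returns its ID. Matches the step 4
// verification condition: anything other than HTTP 201 is a failure.
async function postReviewComment(
  req: CommentRequest,
  fetchImpl: FetchLike = (url, init) => fetch(url, init),
): Promise<number> {
  const res = await fetchImpl(
    `https://api.github.com/repos/${req.repo}/issues/${req.prNumber}/comments`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${req.token}`,
        Accept: "application/vnd.github+json",
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ body: req.body }),
    },
  );
  if (res.status !== 201) {
    throw new Error(`GitHub returned HTTP ${res.status}, expected 201`);
  }
  const data = (await res.json()) as { id: number };
  return data.id;
}
```

Checking the status code explicitly, rather than trusting the agent's claim that it posted, keeps this step's verification outside the model.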
Set a hard timeout on the job — 10 minutes is generous for a PR review. Agents that run over budget in CI will consume your GitHub Actions minutes fast.
Step 5: Observability
You can't improve what you can't see. Every agentic workflow needs structured logging from day one. At minimum, log the step name, model, token counts, tool calls, and wall-clock time per step.
interface StepTrace {
  step: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  toolCalls: { name: string; durationMs: number }[];
  totalDurationMs: number;
  success: boolean;
}
const traces: StepTrace[] = [];
Write traces to a JSON file per run and commit a summary to your PR as a comment. It sounds like overhead, but the first time you debug a workflow that failed on step 3 of a 5-step chain, you'll appreciate having the full trace.
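One possible way to populate the traces is a small wrapper around each step. The `traced` and `writeTraces` helpers below are hypothetical names for this sketch; the callback receives the trace object so the step can fill in token counts and tool calls as it goes.

```typescript
import { writeFileSync } from "node:fs";

interface StepTrace {
  step: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  toolCalls: { name: string; durationMs: number }[];
  totalDurationMs: number;
  success: boolean;
}

const traces: StepTrace[] = [];

// Runs fn, records wall-clock time and success, and stores the trace
// even when fn throws.
async function traced<T>(
  step: string,
  model: string,
  fn: (trace: StepTrace) => Promise<T>,
): Promise<T> {
  const trace: StepTrace = {
    step,
    model,
    inputTokens: 0,
    outputTokens: 0,
    toolCalls: [],
    totalDurationMs: 0,
    success: false,
  };
  const start = Date.now();
  try {
    const value = await fn(trace);
    trace.success = true;
    return value;
  } finally {
    trace.totalDurationMs = Date.now() - start;
    traces.push(trace);
  }
}

// Writes all traces collected so far as one JSON file per run.
function writeTraces(path: string): void {
  writeFileSync(path, JSON.stringify(traces, null, 2));
}
```

Inside the callback, the token counts can be accumulated from each Messages API response's `usage.input_tokens` and `usage.output_tokens` fields.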
For teams running many agent workflows, a logging aggregator like Datadog or a simple Postgres table beats flat files quickly. The Laxaar team uses a Postgres table with a fixed schema per project — step name, run ID, timestamp, token counts, and success flag. That covers 90% of debugging needs without adding infrastructure complexity.
Common pitfalls
Skipping verification conditions. It's tempting to trust that if the agent says a step succeeded, it did. Don't. Add verify functions to every step that has a checkable output, especially steps that touch external APIs or run tests.
One massive system prompt for all steps. If you inject the entire workflow spec into every step's system prompt, the agent spends token budget processing context it doesn't need. Pass only the current step's instructions and the workflow log.
No retry budget. Network calls fail. Tests have flakes. Models occasionally call the wrong tool. Build in at least one retry per step, with exponential backoff on HTTP errors. A workflow that fails on a transient network blip isn't useful.
Agent-edited files not committed. If the workflow modifies source files, you need an explicit step to commit and push those changes. The agent doesn't know your Git workflow. Add a final git commit && git push step, and run it with a token that has push access (the contents: read permission in the CI example above is not enough for pushing).
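For the retry budget pitfall, a minimal sketch of retry with exponential backoff could look like this. The `withRetry` helper and its options are illustrative, not from any library; the `sleep` option exists only so the delay can be replaced in tests.

```typescript
interface RetryOptions {
  retries: number; // extra attempts after the first call
  baseDelayMs: number; // delay before the first retry
  isRetryable?: (err: unknown) => boolean; // defaults to retrying everything
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

// Calls fn until it succeeds, the retry budget runs out, or the error
// is classified as non-retryable. Delays double on each retry.
async function withRetry<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const retryable = opts.isRetryable ? opts.isRetryable(err) : true;
      if (attempt >= opts.retries || !retryable) throw err;
      await sleep(opts.baseDelayMs * 2 ** attempt);
    }
  }
}
```

With `baseDelayMs: 500` and `retries: 3`, the waits are 500 ms, 1 s, and 2 s. Use `isRetryable` to skip retries for errors that will not go away, such as an HTTP 401.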
Frequently Asked Questions
How do I decide which steps to automate vs. keep human-in-the-loop?
Automate steps that are: deterministic, reversible, and well-defined. Keep humans in the loop for: irreversible actions (deploying to production, sending emails), ambiguous requirements, and decisions that require business context the agent doesn't have. When in doubt, add a human gate — you can always remove it once you've validated the agent's judgment.
What happens if an agent modifies a file incorrectly and commits it?
This is why you want the agent to work on a branch, not main. Always configure your CI workflow to run the agent on a feature branch and open a PR for human review before merging. Treat agent-generated code the same as intern-generated code: review it before it ships.
How do I handle secrets in agentic workflows?
Never inject secrets as tool arguments or into prompts. Use environment variables and access them in tool handlers — the same pattern you'd use for any Node.js application. Make sure your agent's system prompt doesn't instruct it to print or log secrets.
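A rough sketch of that pattern: the handler reads the secret from the environment itself, and tool output is scrubbed before it goes back to the model. Both `requireEnv` and `redactSecrets` are hypothetical helpers written for this example, and the redaction step is an extra precaution beyond what the answer above strictly requires.

```typescript
// Reads a secret inside the handler, never from tool input or the prompt.
function requireEnv(
  name: string,
  env: Record<string, string | undefined> = process.env,
): string {
  const value = env[name];
  if (!value) throw new Error(`Missing environment variable: ${name}`);
  return value;
}

// Replaces every occurrence of each secret in text before the text is
// returned to the model or written to a log.
function redactSecrets(text: string, secrets: string[]): string {
  return secrets
    .filter((secret) => secret.length > 0)
    .reduce((out, secret) => out.split(secret).join("[REDACTED]"), text);
}
```

A tool handler would call `requireEnv("GITHUB_TOKEN")` when it builds the request, then pass its output through `redactSecrets` with the same value before returning.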
Can I run multiple agents in parallel for different steps?
Yes, for steps with no data dependency between them. Steps 2 and 3 in the PR review example (analyze files, run tests) can run in parallel since they both read from the diff but don't depend on each other's output. Use Promise.all to fan out parallel steps and collect results before the next sequential step.
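A minimal sketch of that fan-out, with the two steps passed in as functions so the helper stays independent of how each step is implemented. The `analyzeAndTest` name and the stub signatures are assumptions for this example.

```typescript
// Runs the analysis and test steps concurrently on the same diff and
// waits for both before returning.
async function analyzeAndTest<A, B>(
  diff: string,
  analyze: (diff: string) => Promise<A>,
  runTests: (diff: string) => Promise<B>,
): Promise<{ findings: A; tests: B }> {
  const [findings, tests] = await Promise.all([analyze(diff), runTests(diff)]);
  return { findings, tests };
}
```

Promise.all rejects as soon as either step rejects. If you would rather collect a partial result and let the posting step report which half failed, Promise.allSettled is the alternative to consider.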
How much does an agentic workflow cost per run?
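It depends heavily on model, task complexity, and number of tool-call rounds. A PR review workflow using Claude Sonnet 4.5 typically uses 5,000–20,000 tokens per run, costing $0.02–$0.08. Multiply by your PR volume to get monthly estimates. With Anthropic's prompt caching, cached input tokens are billed at roughly 10% of the normal input price, so the repeated portion of each prompt (the system prompt and tool definitions) costs about 90% less once it has been written to the cache. The first, cache-writing request costs slightly more than an uncached one.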
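To turn token counts into a per-run figure, a back-of-the-envelope calculation is enough. The prices in the sketch below ($3 per million input tokens, $15 per million output tokens, cache reads at 10% of the input price) are assumptions for illustration; check the current pricing page before relying on them.

```typescript
interface Prices {
  inputPerMTok: number; // USD per million uncached input tokens
  cacheReadPerMTok: number; // USD per million cached input tokens
  outputPerMTok: number; // USD per million output tokens
}

// Assumed prices for illustration only.
const ASSUMED_PRICES: Prices = {
  inputPerMTok: 3,
  cacheReadPerMTok: 0.3,
  outputPerMTok: 15,
};

interface Usage {
  inputTokens: number; // total input tokens, cached or not
  cachedInputTokens: number; // portion of inputTokens served from cache
  outputTokens: number;
}

// Estimates the cost of one run in USD.
function estimateCostUsd(usage: Usage, prices: Prices = ASSUMED_PRICES): number {
  const uncached = usage.inputTokens - usage.cachedInputTokens;
  return (
    (uncached * prices.inputPerMTok +
      usage.cachedInputTokens * prices.cacheReadPerMTok +
      usage.outputTokens * prices.outputPerMTok) /
    1_000_000
  );
}
```

Under these assumptions, a run with 15,000 input tokens and 2,000 output tokens comes to about $0.075 uncached, and about $0.048 if 10,000 of the input tokens are read from cache.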


