Building AI Agents with Claude Code

Learn to build production AI agents using the Claude Code CLI — from tool definitions and agentic loops to running real commands and shipping reliable agents.

May 31, 2026 10 min read

Building AI agents with Claude Code is one of the most direct paths from idea to working autonomous software. Claude Code is Anthropic's terminal-based AI coding assistant (invoked as claude on the command line), and it's designed for agentic workflows: it can read files, run shell commands, edit code, and iterate on results, and depending on its permission settings it can do so without a human approving every step. That combination makes it a natural fit for building agents that do real work.

This tutorial walks through the full process: defining tools, wiring up the agentic loop, handling errors, and checking whether your agent is actually reliable before you ship it. We've built customer-support automation, internal data pipelines, and agentic coding tools at Laxaar, and the patterns here reflect what actually holds up in production.

Prerequisites: Node.js 20+, an Anthropic API key, and the claude CLI, installed via npm install -g @anthropic-ai/claude-code. The CLI is only needed for the optional last part of Step 5.

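What you'll build

A small TypeScript coding agent: three tools (run_command, read_file, write_file), an agentic loop with a step limit and a token budget, and a confirmation gate for irreversible actions.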

Step 1: Set up the project

Create a fresh Node.js project and install the Anthropic SDK. We'll use TypeScript throughout because type errors in tool schemas are painful to debug at runtime.

mkdir claude-agent && cd claude-agent
npm init -y
npm install @anthropic-ai/sdk
npm install -D typescript tsx @types/node
npx tsc --init --target ES2022 --module NodeNext --moduleResolution NodeNext

Set your API key:

export ANTHROPIC_API_KEY=sk-ant-...

Create src/agent.ts as the entry point. The file structure we use at Laxaar keeps tools in a separate module from the loop, which makes testing individual tools much easier.

claude-agent/
  src/
    agent.ts        # agentic loop
    tools.ts        # tool definitions and handlers
  tsconfig.json
  package.json

Step 2: Define tools

Tools are the actions your agent can take. Each tool has a name, a description the model uses to decide when to call it, and a JSON Schema defining its inputs. Vague descriptions produce incorrect tool calls. Be specific about what the tool returns, not just what it takes.

// src/tools.ts
import { execSync } from "child_process";
import { readFileSync, writeFileSync } from "fs";

export const toolDefinitions = [
  {
    name: "run_command",
    description:
      "Run a shell command and return stdout. Use for git operations, npm commands, file listing, and build steps. Avoid commands that require interactive input.",
    input_schema: {
      type: "object" as const,
      properties: {
        command: { type: "string", description: "The shell command to run." },
        cwd: {
          type: "string",
          description: "Working directory. Defaults to process.cwd().",
        },
      },
      required: ["command"],
    },
  },
  {
    name: "read_file",
    description:
      "Read a UTF-8 text file and return its full contents as a string. Returns a string starting with 'ERROR:' if the file is missing or unreadable.",
    input_schema: {
      type: "object" as const,
      properties: {
        path: { type: "string", description: "Absolute or relative file path." },
      },
      required: ["path"],
    },
  },
  {
    name: "write_file",
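    description:
      "Write UTF-8 text to a file, creating it if it doesn't exist and overwriting it if it does. Returns 'Written: <path>' on success or a string starting with 'ERROR:' on failure.",
    input_schema: {
      type: "object" as const,
      properties: {
        path: { type: "string", description: "Absolute or relative file path." },
        content: { type: "string", description: "Full text content to write." },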
      },
      required: ["path", "content"],
    },
  },
];

export function executeTool(
  name: string,
  input: Record<string, string>
): string {
  try {
    if (name === "run_command") {
      const result = execSync(input.command, {
        cwd: input.cwd ?? process.cwd(),
        timeout: 30_000,
        encoding: "utf-8",
      });
      return result.trim();
    }
    if (name === "read_file") {
      return readFileSync(input.path, "utf-8");
    }
    if (name === "write_file") {
      writeFileSync(input.path, input.content, "utf-8");
      return `Written: ${input.path}`;
    }
    return `Unknown tool: ${name}`;
  } catch (err: unknown) {
    // Return structured errors so the agent can react, not hallucinate
    const message = err instanceof Error ? err.message : String(err);
    return `ERROR: ${message}`;
  }
}

Notice the error handling in executeTool. Every failure inside a handler is caught and returned as a plain string with an ERROR: prefix, so the model can see what went wrong and react. If you throw instead, the loop crashes. If you return nothing, the model has no outcome to work from and may hallucinate one. Neither is acceptable in production.
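The catch-and-return pattern can also be factored into a small wrapper so that an individual handler can't forget it. This is a minimal sketch, not part of the SDK: wrapHandler and the ToolHandler type are hypothetical names introduced here for illustration.

```typescript
// Hypothetical helper: wraps any tool handler so failures come back as strings.
type ToolHandler = (input: Record<string, string>) => string;

function wrapHandler(handler: ToolHandler): ToolHandler {
  return (input) => {
    try {
      const result = handler(input);
      // An empty result gives the model nothing to react to, so make it explicit.
      return result === "" ? "(no output)" : result;
    } catch (err: unknown) {
      const message = err instanceof Error ? err.message : String(err);
      return `ERROR: ${message}`;
    }
  };
}

// Example handlers: one that throws, one that returns an empty string.
const failing = wrapHandler(() => {
  throw new Error("disk full");
});
const silent = wrapHandler(() => "");

console.log(failing({})); // ERROR: disk full
console.log(silent({})); // (no output)
```

With a wrapper like this, each handler can be written as if it were allowed to throw, and the string contract is enforced in one place.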

Step 3: Write the agentic loop

The loop is the heart of the agent. It calls the model, checks the stop reason, dispatches tool calls, and appends results until the agent signals it's done.

// src/agent.ts
import Anthropic from "@anthropic-ai/sdk";
import { toolDefinitions, executeTool } from "./tools.js";

const client = new Anthropic();

const SYSTEM_PROMPT = `You are a coding agent. You have access to tools that let you
run shell commands, read files, and write files. Complete the user's task step by step.
When you're done, summarize what you did and what the result is.`;

async function runAgent(task: string, maxSteps = 20): Promise<void> {
  const messages: Anthropic.MessageParam[] = [
    { role: "user", content: task },
  ];

  let steps = 0;

  while (steps < maxSteps) {
    steps++;

    const response = await client.messages.create({
      model: "claude-sonnet-4-5",
      max_tokens: 4096,
      system: SYSTEM_PROMPT,
      tools: toolDefinitions,
      messages,
    });

    console.log(`\n--- Step ${steps} (stop_reason: ${response.stop_reason}) ---`);

    if (response.stop_reason === "end_turn") {
      const text = response.content
        .filter((b) => b.type === "text")
        .map((b) => (b as Anthropic.TextBlock).text)
        .join("\n");
      console.log("Agent finished:\n", text);
      return;
    }

    if (response.stop_reason === "tool_use") {
      // Append the assistant turn first
      messages.push({ role: "assistant", content: response.content });

      // Collect all tool results for this turn
      const toolResults: Anthropic.ToolResultBlockParam[] = [];

      for (const block of response.content) {
        if (block.type !== "tool_use") continue;
        console.log(`  Tool call: ${block.name}`, block.input);
        const result = executeTool(block.name, block.input as Record<string, string>);
        console.log(`  Result: ${result.slice(0, 200)}`);
        toolResults.push({
          type: "tool_result",
          tool_use_id: block.id,
          content: result,
        });
      }

      messages.push({ role: "user", content: toolResults });
      continue;
    }

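    // max_tokens or another unexpected stop: treat as terminal. Return (not break)
    // so the max-steps warning below is only printed when the step budget ran out.
    console.warn("Unexpected stop_reason:", response.stop_reason);
    return;
  }

  console.warn(`Agent reached max steps (${maxSteps}) without finishing.`);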
}

const task = process.argv[2] ?? "List the files in the current directory and summarize the project structure.";
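// Surface API or network failures and exit non-zero instead of leaving an unhandled rejection
runAgent(task).catch((err) => {
  console.error("Agent failed:", err);
  process.exit(1);
});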

The key structural point: tool results go back as a user turn containing an array of tool_result blocks, one per tool call in the previous assistant turn. Missing one result causes an API error. We learned this the hard way on an early project; the SDK error message isn't always obvious about which block is missing.
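One way to catch a missing result before the API does is to compare ids just before pushing the user turn. The sketch below uses simplified local stand-ins for the SDK's block types, and findMissingResults and the toolu_a / toolu_b ids are made up for illustration.

```typescript
// Local stand-ins for the SDK's content block shapes (assumed, simplified).
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };
type ToolResult = { type: "tool_result"; tool_use_id: string; content: string };

// Returns the ids of tool_use blocks that have no matching tool_result.
function findMissingResults(
  assistantContent: ContentBlock[],
  results: ToolResult[]
): string[] {
  const answered = new Set(results.map((r) => r.tool_use_id));
  const missing: string[] = [];
  for (const block of assistantContent) {
    if (block.type === "tool_use" && !answered.has(block.id)) {
      missing.push(block.id);
    }
  }
  return missing;
}

// An assistant turn with two tool calls, but only one result collected.
const assistantTurn: ContentBlock[] = [
  { type: "text", text: "Checking two files." },
  { type: "tool_use", id: "toolu_a", name: "read_file", input: { path: "a.txt" } },
  { type: "tool_use", id: "toolu_b", name: "read_file", input: { path: "b.txt" } },
];
const partialResults: ToolResult[] = [
  { type: "tool_result", tool_use_id: "toolu_a", content: "contents of a" },
];

console.log(findMissingResults(assistantTurn, partialResults)); // [ 'toolu_b' ]
```

Logging the missing ids yourself gives a clearer failure than waiting for the API to reject the next request.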

Step 4: Handle errors and stopping

Two classes of problems kill agents in production: runaway loops and unhandled tool failures. The maxSteps guard covers the first. Structured error returns from tools cover the second. Add one more layer: a token budget check.

// Add inside the while loop, after creating the response
const inputTokens = response.usage.input_tokens;
const outputTokens = response.usage.output_tokens;
const totalTokens = inputTokens + outputTokens;

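// Hard stop if context is getting large. Return rather than break so the
// max-steps warning after the loop is not printed for this case.
if (inputTokens > 150_000) {
  console.warn(`Context too large (${inputTokens} input tokens). Stopping.`);
  return;
}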

console.log(`  Tokens this step: ${totalTokens}`);

Claude Sonnet 4.5 has a 200k-token context window, but you want to stop well before the limit: each step resends the whole conversation, so cost grows with every turn, and model quality can degrade as the context fills. We use 150k as a conservative threshold; adjust it for your task profile.
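If you prefer to keep both guards in one place, they can be expressed as a single pure function that is easy to unit test. This is a sketch under assumptions: checkBudget and the Budget interface are hypothetical names, and the limits simply mirror the values used in this tutorial.

```typescript
// Hypothetical budget check combining the step guard and the token guard.
interface Budget {
  maxSteps: number;
  maxInputTokens: number;
}

// Returns a human-readable reason to stop, or null if the agent may continue.
function checkBudget(
  step: number,
  inputTokens: number,
  budget: Budget
): string | null {
  if (inputTokens > budget.maxInputTokens) {
    return `context too large (${inputTokens} input tokens)`;
  }
  if (step >= budget.maxSteps) {
    return `step budget exhausted (${step}/${budget.maxSteps})`;
  }
  return null;
}

const budget: Budget = { maxSteps: 20, maxInputTokens: 150_000 };

console.log(checkBudget(3, 42_000, budget)); // null
console.log(checkBudget(3, 160_000, budget)); // context too large (160000 input tokens)
console.log(checkBudget(20, 42_000, budget)); // step budget exhausted (20/20)
```

Inside the loop, the call would take the current step count and response.usage.input_tokens, and the returned reason can go straight into the log.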

For irreversible tools (sending email, writing to a database, calling external APIs), add a confirmation gate:

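const IRREVERSIBLE_TOOLS = ["send_email", "delete_record", "charge_card"];

// Inside the for loop over response.content, before calling executeTool
if (IRREVERSIBLE_TOOLS.includes(block.name)) {
  // node:readline/promises exports createInterface; it has no `confirm` export
  const { createInterface } = await import("node:readline/promises");
  const rl = createInterface({ input: process.stdin, output: process.stdout });
  console.warn(`HUMAN APPROVAL REQUIRED for ${block.name}:`, block.input);
  const answer = await rl.question("Approve? (y/N) ");
  rl.close();
  if (answer.trim().toLowerCase() !== "y") {
    // Report the denial as a tool result so every tool_use still gets an answer
    toolResults.push({ type: "tool_result", tool_use_id: block.id, content: "ERROR: denied by human reviewer", is_error: true });
    continue;
  }
}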

At Laxaar, we classify every tool before an agent goes to production. Read-only tools get automatic execution. Write tools get logged. Irreversible tools get a human gate. It's not glamorous, but it's what prevents incidents.
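The three-tier classification can be written down as a lookup table plus a default. The sketch below is illustrative only: the ToolClass and Action names, the actionFor function, and the choice to class run_command as a write tool are assumptions, not a description of Laxaar's actual implementation.

```typescript
// Hypothetical policy table for the three tiers described above.
type ToolClass = "read_only" | "write" | "irreversible";
type Action = "execute" | "log_and_execute" | "require_approval";

const TOOL_CLASSES = new Map<string, ToolClass>([
  ["read_file", "read_only"],
  ["write_file", "write"],
  ["run_command", "write"], // assumption: depends on which commands you allow
  ["send_email", "irreversible"],
]);

function actionFor(toolName: string): Action {
  // Unclassified tools get the strictest treatment by default.
  const cls = TOOL_CLASSES.get(toolName) ?? "irreversible";
  switch (cls) {
    case "read_only":
      return "execute";
    case "write":
      return "log_and_execute";
    case "irreversible":
      return "require_approval";
  }
}

console.log(actionFor("read_file")); // execute
console.log(actionFor("write_file")); // log_and_execute
console.log(actionFor("send_email")); // require_approval
console.log(actionFor("mystery_tool")); // require_approval
```

Defaulting unknown tools to the human gate means a newly added tool cannot run unattended just because someone forgot to classify it.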

Step 5: Run and verify

Run the agent with a concrete task:

npx tsx src/agent.ts "Read package.json, check if typescript is installed, and if not, install it as a dev dependency."

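Example output from a project where typescript is not yet installed (if you followed Step 1 it already is, so the agent should stop after the check; exact tool calls and wording will vary between runs):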

--- Step 1 (stop_reason: tool_use) ---
  Tool call: read_file { path: 'package.json' }
  Result: { "name": "claude-agent", ...}
--- Step 2 (stop_reason: tool_use) ---
  Tool call: run_command { command: 'npm list typescript --depth=0' }
  Result: ERROR: npm list ...
--- Step 3 (stop_reason: tool_use) ---
  Tool call: run_command { command: 'npm install -D typescript' }
  Result: added 1 package ...
--- Step 4 (stop_reason: end_turn) ---
Agent finished:
 typescript was not installed. I've added it as a dev dependency ...

The agent reads the file, checks for the package, installs it, and reports back. That's the loop working. Note step 2: npm list exits non-zero when the package is missing, so the tool returns an ERROR string, and the agent reads that as "not installed" and tries the next step rather than hallucinating success.

You can also run the same kind of task interactively through Claude Code's own CLI, which uses its built-in tools rather than the handlers in tools.ts:

claude --model claude-sonnet-4-5 "audit this project for missing dev dependencies"

Claude Code runs the same tool-use protocol under the hood; it's a production-grade shell for exactly this kind of agentic work.

Common pitfalls

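Mutable tool schemas at runtime. Each request carries its own tools array, so it is possible to change definitions between steps (e.g., adding tools based on context), but the conversation history can then reference tools that are no longer defined, which can lead to rejected requests or unpredictable results. Keep tool definitions static per run.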

Not handling parallel tool calls. The model can call multiple tools in a single turn. If you only process the first block and ignore the rest, you'll get an API error on the next call because you're missing tool result IDs. Always loop over all blocks.

Forgetting the system prompt carries across turns. If your system prompt says "always respond in JSON," every turn (including tool-use turns) will try to format as JSON. Make system prompts tool-agnostic unless you specifically want that behavior.

Over-trusting agent output as ground truth. Always verify side effects. If your agent runs npm install, check that node_modules actually changed. Don't trust the agent's summary alone.
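For file writes, verification can be as simple as reading the file back and comparing. A minimal sketch, assuming you know what content to expect; verifyFileWritten is a hypothetical helper, and the demo runs in a throwaway temp directory.

```typescript
import { existsSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical check: confirm a file the agent claims to have written
// really exists and contains exactly what was expected.
function verifyFileWritten(path: string, expected: string): boolean {
  if (!existsSync(path)) return false;
  return readFileSync(path, "utf-8") === expected;
}

// Demo in a temp directory so nothing in the project is touched.
const dir = mkdtempSync(join(tmpdir(), "agent-verify-"));
const target = join(dir, "out.txt");

console.log(verifyFileWritten(target, "hello")); // false (nothing written yet)
writeFileSync(target, "hello", "utf-8");
console.log(verifyFileWritten(target, "hello")); // true
```

The same idea applies to other side effects: check the state the tool was supposed to change, not the text the agent produced about it.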

Frequently Asked Questions

What's the difference between the claude CLI and using the Anthropic SDK directly?

claude is a pre-built agentic shell that Anthropic ships with file, bash, and editor tools already wired up. The SDK gives you raw API access to build your own tool registry and loop logic. For custom agents with domain-specific tools, you'll want the SDK. For general coding tasks and interactive sessions, claude is faster to get started with.

How do I prevent my agent from running dangerous commands?

The safest approach is an allowlist. Instead of trying to block dangerous patterns (which attackers can bypass), define exactly which commands are allowed and reject everything else. For bash tools, something like ['npm', 'git', 'tsc', 'node'] as allowed prefixes covers most coding tasks without opening the door to arbitrary execution.
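A prefix check on its own is not enough if the command still goes through a shell, because a string like "git status; rm -rf /" starts with an allowed prefix. The sketch below pairs the allowlist with execFileSync, which does not spawn a shell by default. The function names are hypothetical, and the whitespace-only argument parsing is a deliberate simplification.

```typescript
import { execFileSync } from "node:child_process";

const ALLOWED_BINARIES = new Set(["npm", "git", "tsc", "node"]);

// Splits on whitespace only; quoted arguments are not supported in this sketch.
function parseCommand(command: string): { bin: string; args: string[] } {
  const [bin = "", ...args] = command.trim().split(/\s+/);
  return { bin, args };
}

function isAllowed(command: string): boolean {
  return ALLOWED_BINARIES.has(parseCommand(command).bin);
}

function runAllowed(command: string): string {
  const { bin, args } = parseCommand(command);
  if (!ALLOWED_BINARIES.has(bin)) {
    return `ERROR: command not allowed: ${bin}`;
  }
  try {
    // No shell is involved, so `;`, `&&`, and `|` arrive as literal arguments
    // instead of being interpreted as command separators.
    return execFileSync(bin, args, { encoding: "utf-8", timeout: 30_000 }).trim();
  } catch (err: unknown) {
    return `ERROR: ${err instanceof Error ? err.message : String(err)}`;
  }
}

console.log(isAllowed("git status")); // true
console.log(isAllowed("rm -rf /")); // false
console.log(runAllowed("curl example.com")); // ERROR: command not allowed: curl
```

Two caveats: node and npm can themselves run arbitrary code (node -e, npm scripts), so a binary-level allowlist is still coarse, and on Windows npm is a .cmd shim that may not launch without a shell.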

Can I run this agent headlessly in CI?

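Yes. Set ANTHROPIC_API_KEY in your CI environment, make sure there are no interactive prompts in your tool handlers, and pass the task string as a command-line argument. The process exits once the model returns end_turn or the step budget runs out, so set a step budget that fits your CI timeout. As written, the script exits with code 0 even when the agent gives up at the step limit; if the job should fail in that case, set a non-zero exit code on that path.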

How do I test tools in isolation before connecting them to the agent?

Write unit tests that call executeTool directly with known inputs and assert on the returned strings. Don't spin up the full agentic loop for tool testing; it's slow, costs tokens, and makes failures hard to isolate.
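A test file along these lines needs no test framework, only Node's built-in assert module. So that the sketch runs on its own, it includes a trimmed stand-in for executeTool with the same string contract; in a real project you would import executeTool from src/tools.ts instead.

```typescript
import { match, strictEqual } from "node:assert";
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Stand-in with the same contract as executeTool in src/tools.ts, trimmed to
// two tools. Replace with: import { executeTool } from "./tools.js";
function executeTool(name: string, input: Record<string, string>): string {
  try {
    if (name === "read_file") return readFileSync(input.path, "utf-8");
    if (name === "write_file") {
      writeFileSync(input.path, input.content, "utf-8");
      return `Written: ${input.path}`;
    }
    return `Unknown tool: ${name}`;
  } catch (err: unknown) {
    return `ERROR: ${err instanceof Error ? err.message : String(err)}`;
  }
}

const dir = mkdtempSync(join(tmpdir(), "tool-test-"));
const file = join(dir, "note.txt");

// Happy path: write, then read back.
strictEqual(executeTool("write_file", { path: file, content: "hi" }), `Written: ${file}`);
strictEqual(executeTool("read_file", { path: file }), "hi");

// Failure path: a missing file must come back as an ERROR string, not a throw.
match(executeTool("read_file", { path: join(dir, "missing.txt") }), /^ERROR: /);

// Unknown tools are reported, not silently ignored.
strictEqual(executeTool("nope", {}), "Unknown tool: nope");

console.log("all tool tests passed");
```

Run it with npx tsx against a temp directory, as here, so the tests never touch real project files.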

What model should I use for coding agents?

Claude Sonnet 4.5 is our default for coding tasks; it handles multi-step reasoning well and is fast enough for interactive use. Opus 4 is worth the cost for agents that make architectural decisions or work with unfamiliar codebases. Haiku is the right pick for high-volume, structured subtasks where the reasoning is simple.

How do I add memory so the agent remembers past runs?

Write a summary of each completed run to a file or database, and inject the last N summaries into the system prompt at startup. For longer-lived agents, a vector database lets the agent query past episodes semantically. Start with a flat file; it's enough for most projects.
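The flat-file version can be sketched in a few functions. Everything here is an assumption made for illustration: the JSONL layout (one JSON object per line), the RunSummary fields, and the helper names.

```typescript
import { appendFileSync, existsSync, mkdtempSync, readFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical flat-file memory: one JSON object per line (JSONL).
interface RunSummary {
  task: string;
  outcome: string;
  finishedAt: string; // ISO timestamp
}

function appendSummary(file: string, summary: RunSummary): void {
  appendFileSync(file, JSON.stringify(summary) + "\n", "utf-8");
}

// Returns the last n summaries, oldest first; an absent file means no memory yet.
function loadRecentSummaries(file: string, n: number): RunSummary[] {
  if (n <= 0 || !existsSync(file)) return [];
  const lines = readFileSync(file, "utf-8")
    .split("\n")
    .filter((line) => line.trim() !== "");
  return lines.slice(-n).map((line) => JSON.parse(line) as RunSummary);
}

// Appends past-run notes to the base system prompt.
function withMemory(basePrompt: string, summaries: RunSummary[]): string {
  if (summaries.length === 0) return basePrompt;
  const notes = summaries.map((s) => `- ${s.task}: ${s.outcome}`).join("\n");
  return `${basePrompt}\n\nSummaries of previous runs:\n${notes}`;
}

// Demo against a temp file.
const memoryFile = join(mkdtempSync(join(tmpdir(), "agent-memory-")), "runs.jsonl");
const now = new Date().toISOString();
appendSummary(memoryFile, { task: "audit deps", outcome: "added typescript", finishedAt: now });
appendSummary(memoryFile, { task: "run tests", outcome: "2 failures", finishedAt: now });

console.log(withMemory("You are a coding agent.", loadRecentSummaries(memoryFile, 1)));
```

Keeping N small matters, since every injected summary is paid for again on each step of the run.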

The Laxaar team builds production agents for clients across industries. If you want expert help designing an agent architecture that actually ships, reach out to us or explore our AI agent services.
