Agentic Coding: A New Way to Ship Software
Agentic coding uses AI agents to write, test, and iterate on code autonomously. Learn how to structure your workflow, tools, and review process to ship faster.

Software development has always had a feedback loop problem. You write code, run tests, read the failure, fix the issue, repeat. The loop works, but it's slow because every iteration requires a developer to context-switch, read output, form a hypothesis, and write a fix. Agentic coding shortens that loop by letting an AI agent run it autonomously: writing code, executing tests, reading failures, and iterating without waiting for a human between each step.
Agentic coding is the practice of using AI agents to autonomously generate, test, and refine code within a defined scope, with a human setting the goal and reviewing the result rather than writing each line. The agent has access to a code editor, a shell, test runners, and optionally external tools. It runs the loop until the tests pass or it hits a defined stopping condition.
The Laxaar team has been using agentic coding workflows for over a year across client projects in fintech, SaaS, and enterprise tooling. It's not magic, it's not replacing engineers, and it doesn't work well without the right setup. But when the workflow is right, it genuinely changes how fast you can move.
What you'll learn
- What agentic coding actually means in practice
- The tools that make it work
- Structuring tasks for agent success
- Test-driven agentic loops
- Review and quality control
- Where agentic coding breaks down
- Frequently Asked Questions
What agentic coding actually means in practice
There's a spectrum. At one end: a developer writes a detailed spec, an agent generates code, and a developer reviews and merges it. At the other end: an agent reads a GitHub issue, writes a branch, runs CI, fixes failures, and opens a pull request, with a human only at the review step.
Most teams operate in the middle. The agent handles implementation details while the developer owns the spec and the review. This isn't about removing engineers from the process. It's about shifting what engineers spend their time on: from typing code to defining what correct looks like and verifying that the agent found it.
The practical workflow looks like this: a developer writes a clear task description with acceptance criteria, assigns it to an agent session (Claude Code, Cursor in agent mode, or a custom harness), and the agent iterates until the criteria are met or it gets stuck and asks for clarification. The developer reviews the diff, not the keystrokes.
That shift, from reviewing a process to reviewing an outcome, is the actual change in how work gets done. It requires different skills from developers: precise spec writing, critical diff review, and knowing when to take the keyboard back.
The tools that make it work
Agentic coding requires a specific set of tool access for the agent to operate effectively. Without the right tools, the agent is writing code blind.
Code editor access. The agent needs to read, write, and navigate files, not just generate text. Tools like Claude Code (via the CLI), Cursor's agent mode, or Aider give the agent actual file manipulation capabilities, not just code suggestions.
Shell access. Running tests, compiling, linting, and checking output is how the agent closes its feedback loop. An agent that can't run pytest or npm test can't iterate on failures. Shell access is non-negotiable for effective agentic coding.
Version control awareness. The agent should be able to read git diffs, understand what changed, and in more autonomous setups create branches and commits. This context keeps the agent from making changes that conflict with recent work.
Documentation access. Agents that can read framework documentation (via MCP servers, fetched docs, or indexed references) produce code that matches current API conventions rather than training-data patterns that may be outdated.
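To make the shell-access point concrete, here is a minimal sketch of the harness side of the feedback loop: run a command, then capture the exit status and output that the agent reads next. The names `run_check` and `CheckResult` are hypothetical and not part of any agent tool. A real harness would also add sandboxing and output truncation.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CheckResult:
    passed: bool  # True when the command exited with status 0
    output: str   # stdout followed by stderr, which the agent reads next


def run_check(command: list[str], timeout_s: float = 300.0) -> CheckResult:
    """Run a test or lint command and capture what an agent would read."""
    try:
        proc = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return CheckResult(passed=False, output=f"timed out after {timeout_s}s")
    return CheckResult(passed=proc.returncode == 0, output=proc.stdout + proc.stderr)


if __name__ == "__main__":
    # Stand-in for `pytest` or `npm test`: any command with an exit code works.
    result = run_check([sys.executable, "-c", "print('2 passed')"])
    print(result.passed, result.output.strip())
```

The exit code drives the loop and the captured text drives the next edit, which is why an agent without shell access has nothing to iterate on.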
# Example: running Claude Code non-interactively with tool permissions
# --print runs a single prompt and exits; --allowedTools lists the tools
# the agent may use without asking for permission
claude --allowedTools "Bash,Read,Edit,Write,Glob,Grep" \
  --print \
  "Implement the UserPreferences service in src/services/user-preferences.ts.
Requirements are in docs/specs/user-preferences.md.
Tests are in tests/services/user-preferences.test.ts. Make them pass.
Do not modify any files outside src/services/ and src/types/."
The scope constraint in that last line matters a lot. An agent with unlimited file access on a large codebase will make changes you didn't ask for. Scoping to specific directories is a simple guardrail that prevents a significant category of review surprises.
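An instruction in the prompt is a request, not an enforcement, so it helps to verify scope after the run. The sketch below assumes you already have the list of changed paths (for example from `git diff --name-only`) and reports any that fall outside the allowed directories. The function name `out_of_scope` and the sample paths are illustrative only.

```python
from pathlib import PurePosixPath


def out_of_scope(changed_paths: list[str], allowed_dirs: list[str]) -> list[str]:
    """Return the changed paths that fall outside every allowed directory."""
    allowed = [PurePosixPath(d) for d in allowed_dirs]
    violations = []
    for raw in changed_paths:
        path = PurePosixPath(raw)
        # is_relative_to compares whole path components, so "src/services-old"
        # does not count as being inside "src/services".
        if not any(path.is_relative_to(d) for d in allowed):
            violations.append(raw)
    return violations


if __name__ == "__main__":
    # Illustrative paths; in practice these come from the agent's diff.
    changed = [
        "src/services/user-preferences.ts",
        "src/types/preferences.ts",
        "src/lib/redis.ts",
    ]
    print(out_of_scope(changed, ["src/services", "src/types"]))
    # prints ['src/lib/redis.ts']
```

A check like this can run in CI or as a pre-review step, so a stray edit is flagged before anyone reads the diff.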
Structuring tasks for agent success
The quality of agentic coding output correlates directly with the quality of the task definition. Vague tasks produce vague code. A task like "add user preferences" will produce something, but probably not what you wanted. A well-structured task produces a reviewable diff.
A good agentic task has four parts:
- What to build. Specific, scoped to a function, module, or feature. Not "improve the API" but "add a GET /users/:id/preferences endpoint that returns the user's notification settings."
- Where to build it. File paths, module boundaries, naming conventions to follow. The agent should not have to guess where the code goes.
- How to know it's done. Existing tests to make pass, new tests to write, or a specific output format to produce. "Make the tests pass" is concrete. "Make it work" is not.
- What not to touch. Explicit exclusion of files or modules the agent should not modify. This prevents scope creep and makes diffs reviewable.
## Task: Add rate limiting to the search API endpoint
**File:** `src/api/routes/search.ts`
**Related types:** `src/types/rate-limit.ts` (read only, do not modify)
**Tests to pass:** `tests/api/search-rate-limit.test.ts`
### Requirements
- Limit authenticated users to 60 requests per minute per user ID
- Limit unauthenticated requests to 10 per minute per IP
- Return HTTP 429 with `Retry-After` header when limit is exceeded
- Use the existing Redis client at `src/lib/redis.ts`
### Do not modify
- `src/api/routes/auth.ts`
- `src/middleware/` (use existing middleware, don't add new files here)
- Any test files other than `tests/api/search-rate-limit.test.ts`
This level of specificity feels like more work upfront. It is. But it saves more time in review than it costs in spec-writing, and it produces agents that don't wander.
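If you write many of these briefs, the four-part structure can be captured in a small data type that refuses to stay silent about gaps. This is a sketch under assumptions: `AgentTask`, `missing_parts`, and `render` are hypothetical names, and the rendered layout simply mirrors the example brief above.

```python
from dataclasses import dataclass, field


@dataclass
class AgentTask:
    title: str                # what to build
    target_file: str          # where to build it
    tests_to_pass: list[str]  # how to know it's done
    requirements: list[str]
    do_not_modify: list[str] = field(default_factory=list)  # what not to touch

    def missing_parts(self) -> list[str]:
        """Name any of the four parts that were left empty."""
        checks = {
            "what to build": self.title and self.requirements,
            "where to build it": self.target_file,
            "how to know it's done": self.tests_to_pass,
            "what not to touch": self.do_not_modify,
        }
        return [name for name, value in checks.items() if not value]

    def render(self) -> str:
        """Render the task as a Markdown brief for the agent."""
        lines = [f"## Task: {self.title}", f"**File:** `{self.target_file}`"]
        lines += [f"**Tests to pass:** `{t}`" for t in self.tests_to_pass]
        lines += ["### Requirements"] + [f"- {r}" for r in self.requirements]
        lines += ["### Do not modify"] + [f"- `{p}`" for p in self.do_not_modify]
        return "\n".join(lines)


if __name__ == "__main__":
    vague = AgentTask("add user preferences", "", [], [])
    print(vague.missing_parts())
```

Running `missing_parts()` on a task like "add user preferences" lists all four parts as missing, which is a cheap signal that the brief is not ready to hand to an agent.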
Test-driven agentic loops
Test-driven development pairs particularly well with agentic coding. Write the tests first, hand them to the agent, and let it iterate until they pass. The tests define correctness; the agent finds an implementation that satisfies them.
# Example: pre-written tests the agent iterates against
# tests/services/test_invoice_parser.py
# (pytest collects files named test_*.py or *_test.py by default)
import pytest

from src.services.invoice_parser import InvoiceParser


@pytest.fixture
def parser():
    return InvoiceParser()


def test_extracts_total_amount(parser):
    pdf_text = "Invoice #1042\nSubtotal: $450.00\nTax (8%): $36.00\nTotal: $486.00"
    result = parser.parse(pdf_text)
    assert result.total == 486.00


def test_extracts_line_items(parser):
    pdf_text = "Invoice #1042\nConsulting services: $300.00\nDesign review: $150.00\nTotal: $450.00"
    result = parser.parse(pdf_text)
    assert len(result.line_items) == 2
    assert result.line_items[0].description == "Consulting services"
    assert result.line_items[0].amount == 300.00


def test_handles_missing_tax_line(parser):
    pdf_text = "Invoice #1042\nService fee: $200.00\nTotal: $200.00"
    result = parser.parse(pdf_text)
    assert result.tax == 0.0
    assert result.total == 200.00


def test_raises_on_unparseable_content(parser):
    with pytest.raises(ValueError, match="Unable to extract total"):
        parser.parse("This is not an invoice")
The agent runs pytest, reads the failures, edits the implementation, runs again, and repeats. When all tests pass, it's done. You review the implementation, not the process.
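The loop itself is simple enough to sketch in a few lines. In the version below, `run_tests` and `propose_fix` are hypothetical callables standing in for the test runner and the model's edit step, and `max_iterations` is the stopping condition. Real agent harnesses are more involved, so treat this as an illustration of the control flow only.

```python
from typing import Callable


def agent_loop(
    run_tests: Callable[[], tuple[bool, str]],
    propose_fix: Callable[[str], None],
    max_iterations: int = 10,
) -> tuple[bool, int]:
    """Iterate until tests pass or the iteration budget runs out.

    Returns (passed, number_of_test_runs).
    """
    for run in range(1, max_iterations + 1):
        passed, output = run_tests()
        if passed:
            return True, run
        if run < max_iterations:
            # Hand the failure output back to the model so it can edit the code.
            propose_fix(output)
    return False, max_iterations


if __name__ == "__main__":
    # Fake environment: the "implementation" passes after two fixes.
    state = {"fixes": 0}

    def fake_run_tests() -> tuple[bool, str]:
        ok = state["fixes"] >= 2
        return ok, "4 passed" if ok else "1 failed"

    def fake_propose_fix(failure_output: str) -> None:
        state["fixes"] += 1

    print(agent_loop(fake_run_tests, fake_propose_fix))  # prints (True, 3)
```

The iteration budget matters as much as the success path: without it, an agent that cannot satisfy a test keeps spending time and tokens instead of handing the problem back.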
This approach works best when your test suite is honest about edge cases. Tests that only cover the happy path will produce an agent that only handles the happy path. The tests are the spec; write them accordingly.
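To show what "the tests are the spec" means in practice, here is one implementation that would satisfy the four tests above. It is a sketch, not the output of any particular agent. The class and attribute names come from the tests, while the regular expressions and the assumption of one "label: $amount" pair per line are illustrative choices.

```python
import re
from dataclasses import dataclass, field

_MONEY = r"\$(?P<amount>[\d,]+(?:\.\d{1,2})?)"
_TOTAL = re.compile(rf"^Total:[ \t]*{_MONEY}[ \t]*$", re.MULTILINE)
_TAX = re.compile(rf"^Tax\b[^:\n]*:[ \t]*{_MONEY}[ \t]*$", re.MULTILINE)
_ITEM = re.compile(rf"^(?P<description>[^:\n]+):[ \t]*{_MONEY}[ \t]*$", re.MULTILINE)
# Summary rows share the "label: $amount" shape but are not line items.
_SUMMARY = re.compile(r"(?:sub)?total\b|tax\b", re.IGNORECASE)


@dataclass
class LineItem:
    description: str
    amount: float


@dataclass
class Invoice:
    total: float
    tax: float = 0.0
    line_items: list[LineItem] = field(default_factory=list)


def _to_float(amount: str) -> float:
    return float(amount.replace(",", ""))


class InvoiceParser:
    def parse(self, text: str) -> Invoice:
        total_match = _TOTAL.search(text)
        if total_match is None:
            raise ValueError("Unable to extract total")
        tax_match = _TAX.search(text)
        items = [
            LineItem(m["description"].strip(), _to_float(m["amount"]))
            for m in _ITEM.finditer(text)
            if not _SUMMARY.match(m["description"].strip())
        ]
        return Invoice(
            total=_to_float(total_match["amount"]),
            tax=_to_float(tax_match["amount"]) if tax_match else 0.0,
            line_items=items,
        )
```

Nothing in the four tests covers other currencies, negative amounts, or a total that disagrees with the line items, so this code's behavior in those cases is accidental. If those cases matter, they need tests before the agent starts.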
At Laxaar we've found that TDD-style agentic loops cut implementation time on well-scoped modules by 40–60% compared to a developer writing the same code from scratch. The agent is faster at the typing. The developer's time goes into writing good tests and reviewing the result.
Review and quality control
The review step is where engineering judgment still lives, and it's where teams that rush tend to create technical debt. A passing test suite doesn't mean correct code. It means the code passes the tests you wrote.
What to check in an agentic coding review:
Correctness at the boundaries. The agent often handles the main cases well and is sloppy at edges: null handling, empty collections, concurrent access. Read those paths specifically.
Code style and consistency. Agents trained on diverse codebases will sometimes choose patterns that work but don't match your conventions. A consistent codebase is easier to maintain; enforce your style even when the logic is correct.
Unnecessary complexity. Agents occasionally over-engineer. If the agent wrote a three-class hierarchy where a function would do, simplify it. Don't let generated complexity accumulate.
What the agent didn't touch. Check whether the agent correctly scoped its changes. A stray edit to a shared utility or a config file it wasn't supposed to touch is easier to catch in review than in production.
The review is faster than writing from scratch, but it's not free. Budget real time for it. A 200-line diff that "just adds rate limiting" can hide subtle issues that take 30 minutes of careful reading to find.
Our agentic development workflows guide goes deeper on structuring multi-task agent pipelines and team workflows. The agentic coding expertise page covers how Laxaar integrates these practices into client engagements.
Where agentic coding breaks down
Agentic coding isn't the right tool for every task. Knowing where it fails saves more time than knowing where it works.
Poorly defined requirements. If you can't write a clear spec, the agent can't implement a clear solution. Agentic coding amplifies your requirements clarity; it doesn't substitute for it.
Cross-cutting architectural changes. Refactoring how authentication works across 40 files, or changing a core data model, requires understanding how parts connect in ways that are hard to capture in a task description. These tasks need a developer, not an agent.
Novel algorithms. When the correct approach isn't well-represented in training data (a custom optimization, a domain-specific heuristic), the agent will default to standard patterns that may not be right. You need a developer who understands the problem deeply.
Security-sensitive code. Agents can write code that passes tests and introduces subtle security vulnerabilities. Authentication flows, input sanitization, cryptographic operations: these need human review that goes beyond "do the tests pass." We treat security-sensitive code as mandatory manual implementation at Laxaar.
| Task type | Agentic coding fit | Notes |
|---|---|---|
| CRUD endpoints with clear spec | Excellent | Test-driven loop works well |
| Unit-testable utility functions | Excellent | High iteration speed |
| Data transformation / parsing | Good | Write edge-case tests explicitly |
| UI components from design spec | Good | Visual review still required |
| Complex state management | Fair | Architect the design first |
| Architectural refactors | Poor | Needs deep system understanding |
| Security-critical code | Poor | Manual implementation preferred |
Frequently Asked Questions
Does agentic coding work without a test suite?
It works worse. Without tests, the agent has no automated feedback loop and can't tell whether its changes are correct. You end up reviewing more code with less confidence. If you don't have a test suite, start with well-scoped tasks that have clear, verifiable outputs ("generate a function that parses this CSV format and returns this data structure") and manually verify the output rather than running tests.
How do you prevent the agent from making changes outside its assigned scope?
Two mechanisms: explicit "do not modify" lists in the task description, and file-level tool permissions that physically prevent the agent from writing outside specified paths. Claude Code's --allowedTools and file path scoping let you restrict write access to specific directories. Don't rely on the agent respecting a polite instruction when you can enforce it with a guardrail.
How long should an agent run before a human checks in?
For well-scoped tasks (one module, clear tests), let it run to completion, usually 5–20 minutes. For larger tasks, check in after the agent's first meaningful commit or after 30 minutes, whichever comes first. If the agent is still iterating on the same test failure after 10 attempts, it's stuck. Take over rather than waiting for it to escape on its own.
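The "same failure after 10 attempts" rule can be checked mechanically. The sketch below assumes pytest-style summary lines of the form `FAILED path::test_name - message` and treats the agent as stuck when the last N runs all failed on the same set of tests. The helper names `failure_signature` and `is_stuck` are hypothetical.

```python
import re


def failure_signature(test_output: str) -> frozenset[str]:
    """Reduce pytest-style output to the set of failing test IDs.

    Assumes summary lines of the form "FAILED path::test_name - message".
    """
    return frozenset(re.findall(r"^FAILED (\S+)", test_output, re.MULTILINE))


def is_stuck(outputs: list[str], max_repeats: int = 10) -> bool:
    """True when the last `max_repeats` runs failed on the same set of tests."""
    if len(outputs) < max_repeats:
        return False
    recent = [failure_signature(o) for o in outputs[-max_repeats:]]
    # An empty signature means no recognizable failures, so that is not "stuck".
    return bool(recent[0]) and all(sig == recent[0] for sig in recent)


if __name__ == "__main__":
    run = "FAILED tests/test_parser.py::test_total - AssertionError"
    print(is_stuck([run] * 10))  # prints True
```

Comparing test IDs rather than full output avoids false negatives when the error message changes slightly between attempts while the failing test stays the same.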
Can multiple agents work on the same codebase simultaneously?
Yes, with isolation. Assign each agent to a separate git branch and separate file scope. Merging the branches is the same as merging any parallel work: you need to resolve conflicts and verify integration. We sometimes run two or three agents on independent modules in parallel and merge the branches at the end of a session. It works well when the modules are genuinely independent; it breaks down when they share types or utilities that both agents try to modify.
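Before merging parallel branches, it is worth checking whether the agents really stayed on independent files. The sketch below takes the set of changed paths per branch (gathered however you like, for instance from each branch's diff against main) and reports files touched by more than one branch. The function name, branch names, and paths are illustrative.

```python
from collections import defaultdict


def shared_paths(changes_by_branch: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each file touched by more than one branch to those branches."""
    owners: dict[str, list[str]] = defaultdict(list)
    for branch, paths in changes_by_branch.items():
        for path in paths:
            owners[path].append(branch)
    return {p: sorted(b) for p, b in sorted(owners.items()) if len(b) > 1}


if __name__ == "__main__":
    # Hypothetical branches and files for two agents working in parallel.
    changes = {
        "agent/search": {"src/api/search.ts", "src/types/user.ts"},
        "agent/billing": {"src/api/billing.ts", "src/types/user.ts"},
    }
    print(shared_paths(changes))
    # prints {'src/types/user.ts': ['agent/billing', 'agent/search']}
```

An empty result does not guarantee a clean integration, but a non-empty one points at exactly the shared types or utilities that need a human decision before the merge.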
What's the right ratio of agentic to manual coding on a project?
It varies by project type. On greenfield API development with good test coverage, the Laxaar team currently runs 50–70% of implementation through agentic loops. On complex frontend state, legacy system integration, and anything security-critical, we write manually. The ratio shifts toward more manual work as the system matures and the problems become more about understanding subtle interactions than generating new code.
Want to integrate agentic coding into your team's workflow? Talk to Laxaar. We can assess your codebase, help you set up the right tooling, and train your engineers on the review practices that keep quality high.


