This is the practitioner walkthrough of the AI-augmented software development lifecycle. Each stage has an AI role, a human role, the artefacts that change, and the tooling decisions that matter. The summary headline: discovery and build compress; specification expands; test and review keep the same shape but produce better evidence; deploy and operate become more automated; user research and architectural judgement remain human.
This guide is a spoke of the AI-augmented development pillar guide. It is written for engineering leads, tech leads, and principal engineers who need to introduce AI to a team without losing the gates that keep production safe.
Why does the SDLC change when AI is integrated end-to-end?
The shape of the lifecycle is the same. The time distribution across the stages is different.
In traditional delivery, the bulk of effort sits in build and test. Discovery is short. Specification is often informal. Review is variable. Operate is reactive.
In AI-augmented delivery, the bulk of effort shifts forward. Discovery is more thorough because AI compresses the cost of investigating an existing system. Specification expands because spec quality directly controls AI output quality. Build compresses because most scaffolding and routine work is AI-authored. Test and review keep their shape but produce richer evidence. Deploy and operate become more automated.
The net effect is shorter delivery time. Talk Think Do measures 40 to 50% faster delivery across active projects, with the gain weighted toward the middle of the lifecycle. The front end stays roughly the same; the back end gets more reliable.
Discovery: what changes when an agent reads the codebase?
The risk of discovery is incompleteness. Stakeholders forget. Documentation is wrong. Codebases hide constraints that nobody talks about.
AI helps by reading what is actually there. An agent can:
- Map the call graph and the data model.
- Identify integration points and external dependencies.
- Surface coupling, complexity hotspots, and risk areas.
- Compare the codebase against architectural intent (where ADRs exist).
- Draft a discovery report with citations to specific files and commits.
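As a rough illustration of the call-graph and hotspot bullets above, the sketch below uses Python's standard `ast` module to build a crude call map for one source file and count how many functions call each name. It is a toy under stated assumptions, not a description of how any particular agent works: it only sees direct calls by plain name inside top-level functions, and `call_map` and `fan_in` are hypothetical helper names.

```python
import ast


def call_map(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the plain names it calls directly."""
    calls: dict[str, set[str]] = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            callees = calls.setdefault(node.name, set())
            for inner in ast.walk(node):
                # Only `name(...)` calls are recorded; `obj.method(...)` is skipped.
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    callees.add(inner.func.id)
    return calls


def fan_in(calls: dict[str, set[str]]) -> dict[str, int]:
    """Count how many functions call each name (a crude coupling signal)."""
    counts: dict[str, int] = {}
    for callees in calls.values():
        for callee in callees:
            counts[callee] = counts.get(callee, 0) + 1
    return counts


SAMPLE = """
def load(path):
    return parse(read(path))

def parse(text):
    return text.split()

def report(path):
    return len(load(path))
"""

if __name__ == "__main__":
    graph = call_map(SAMPLE)
    print(graph)
    print(fan_in(graph))
```

A name with high fan-in is a candidate hotspot worth a human look; the number says nothing on its own about whether the coupling is a problem.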
Humans interview the stakeholders, set the scope, and decide what good looks like. The artefacts of discovery (the discovery report, the scoping document, the proposal) are AI-drafted and human-approved.
The compression is real. A discovery phase that would have taken three weeks on a 100k-line codebase often closes in one to two weeks with current tooling. The senior architect is still needed; they spend more time deciding and less time gathering.
Specification: why spec-first delivery raises AI accuracy
The risk in AI-augmented build is that AI generates plausible code that does the wrong thing. The control is a strong specification that doubles as the acceptance criteria.
Spec-first delivery means writing the spec before implementation begins. Tools like OpenSpec make this a structured artefact rather than a Word document. The spec covers user intent, acceptance criteria, error cases, integration contracts, and the data model.
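To make "structured artefact" concrete, here is a minimal sketch of a spec as a typed record with a completeness check over the five sections listed above. This is not OpenSpec's format; the `Spec` class, its field names, and `missing_sections` are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Spec:
    """Hypothetical spec record with the five sections named in the text."""
    user_intent: str = ""
    acceptance_criteria: list[str] = field(default_factory=list)
    error_cases: list[str] = field(default_factory=list)
    integration_contracts: list[str] = field(default_factory=list)
    data_model: str = ""


def missing_sections(spec: Spec) -> list[str]:
    """Return the names of sections that are still empty, in field order."""
    return [name for name, value in vars(spec).items() if not value]


if __name__ == "__main__":
    draft = Spec(
        user_intent="A customer can reset their password by email.",
        acceptance_criteria=["Reset link expires after one use."],
    )
    # Lists the sections that still need writing before build starts.
    print(missing_sections(draft))
```

A check like this only proves the sections exist. Whether they are any good is still a human judgement.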
Two practical effects:
- AI output improves. A clear spec gives the agent context that prompt engineering cannot match. The first cut of implementation is much closer to what was wanted.
- Review becomes easier. Reviewers check against the spec, not against an internal model of what the engineer might have intended.
The price is real. Writing a strong spec takes longer than writing an informal one. The gain is in build and review, where the saved time is multiples of the spec investment.
The Q1 2026 AI Velocity Report cites OpenSpec maturity as one of the four drivers behind the move from 51% to 84% AI-authored code.
Build: how IDE agents and cloud agents share the work
The risk of agentic build is local-optimal code that is globally wrong: a sensible change in isolation that conflicts with the architectural intent of the system.
Two patterns work well together.
IDE agents (Cursor, Claude Code) handle changes the engineer is driving. The engineer reviews each step, refactors where AI is locally optimal but globally wrong, and accepts what is right. Agent rules in the repository keep the AI’s working context aligned with project standards. Reusable skills package proven delivery patterns.
Cloud agents (Cursor background agents, GitHub Copilot coding agent, Claude Code cloud agents) handle longer-running tasks: migrations, large refactors, repetitive issue lists. They run unattended, open pull requests, and wait for human review. The review gate does not relax because the agent ran in the cloud.
The engineer’s role shifts. Less typing. More specification, more review, more architectural judgement. The team shape rewards seniority. Junior engineers still write code, often without AI assistance, to maintain the learning curve. The risks of AI-augmented development guide covers the skills decay risk in more depth.
For the practical configuration of an IDE agent, our Claude Code for .NET developers guide and Claude Code configuration guide cover settings, hooks, MCP, and skills in detail.
Test: AI test authoring with ISTQB-qualified review
The risk is that AI-authored tests test the implementation rather than the intent. They pass because the code does what it does, not because the code does what it should.
The discipline:
- Acceptance-criteria tests come from the spec. Not from the code. The spec is the source of truth.
- AI authors unit, integration, and contract tests. Then runs them, interprets failures, and proposes fixes.
- ISTQB-qualified QA designs the test strategy. Validates coverage against user intent. Runs exploratory testing. Approves releases.
- The QA gate sits in CI. Fail to pass it, fail to merge.
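One way to keep tests tied to intent is a traceability check between the spec's acceptance criteria and the test suite. The sketch below assumes each criterion has an ID and each test declares which IDs it verifies; the ID scheme, the test names, and `traceability_gaps` are all hypothetical.

```python
def traceability_gaps(
    criteria_ids: list[str], tests: dict[str, list[str]]
) -> dict[str, list[str]]:
    """Compare spec criteria with the criteria each test claims to verify.

    `tests` maps a test name to the criterion IDs it declares.
    """
    known = set(criteria_ids)
    covered = {cid for ids in tests.values() for cid in ids}
    return {
        # Criteria in the spec that no test claims to verify.
        "uncovered_criteria": sorted(known - covered),
        # Tests that trace to no known criterion: possible implementation-only tests.
        "untraced_tests": sorted(
            name for name, ids in tests.items() if not known & set(ids)
        ),
    }


if __name__ == "__main__":
    gaps = traceability_gaps(
        ["AC-1", "AC-2", "AC-3"],
        {
            "test_login_ok": ["AC-1"],
            "test_lockout": ["AC-2"],
            "test_internal_cache": [],
        },
    )
    print(gaps)
    # A CI step could exit non-zero when any criterion is uncovered.
    raise SystemExit(1 if gaps["uncovered_criteria"] else 0)
```

Untraced tests are not necessarily wrong, but they are the ones most likely to be asserting what the code does rather than what the spec asks for.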
The Q1 2026 data shows that AI-generated code reviewed by senior engineers produces fewer defects in production than manually written code in the same systems. The gain is not because AI is more careful. It is because the test coverage is more thorough and the spec is stronger.
Review: senior engineer review, gated pull requests, security review
The risk is that AI-authored changes are rubber-stamped because reviewers cannot keep up with the volume.
The control is structural, not behavioural. Reviewers do not have to read every line if the gates work.
- CI status checks for tests, security scans, secret scans, licence checks, accessibility checks, and attribution metadata. Required to merge.
- Architecture review trigger. Changes touching specific paths in the repository require an architecture sign-off, automatically. Not by reviewer discretion.
- Senior engineer review. Not a sample. Every change. The reviewer’s job is to check architectural intent, not to play diff-spotter against the lines the CI gates already cover.
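The architecture review trigger above can be sketched as a path match over the changed files. The patterns in `PROTECTED_PATTERNS` are invented examples, and `needs_architecture_review` is a hypothetical helper; a real setup would more likely use the hosting platform's own path-ownership rules.

```python
from fnmatch import fnmatchcase

# Hypothetical protected paths; each team would choose its own.
PROTECTED_PATTERNS = ["src/domain/*", "infra/*", "*.csproj"]


def needs_architecture_review(
    changed_files: list[str], patterns: list[str] = PROTECTED_PATTERNS
) -> list[str]:
    """Return the changed files that match any protected pattern.

    Note that `*` in fnmatch-style patterns also matches `/`, so
    `src/domain/*` covers nested directories too.
    """
    return sorted(
        path
        for path in changed_files
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    )


if __name__ == "__main__":
    hits = needs_architecture_review(
        ["src/domain/orders/order.py", "docs/readme.md", "api/Api.csproj"]
    )
    # A non-empty list means the sign-off is required, with no reviewer discretion.
    print(hits)
```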
Pull request size matters. AI tends toward large diffs because it can. A pull request of more than 300 changed lines is usually a sign that the work should have been split. CI can enforce that as a gate.
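A size gate of this kind can be a few lines of script. The sketch below parses the output of `git diff --numstat` (tab-separated added count, deleted count, and path, with `-` for binary files) and compares the total against a cap. Counting added plus deleted lines is an assumption about what "300 lines" means here, and `changed_lines` and `within_cap` are hypothetical names.

```python
def changed_lines(numstat: str) -> int:
    """Sum added and deleted lines from `git diff --numstat` output."""
    total = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        added, deleted = parts[0], parts[1]
        if added == "-" or deleted == "-":
            continue  # binary files report "-" instead of counts
        total += int(added) + int(deleted)
    return total


def within_cap(numstat: str, cap: int = 300) -> bool:
    """True when the diff is at or under the cap."""
    return changed_lines(numstat) <= cap


if __name__ == "__main__":
    sample = "10\t5\tsrc/a.py\n-\t-\tdocs/logo.png\n200\t90\tsrc/b.py\n"
    print(changed_lines(sample), within_cap(sample))
```

In CI, the input would come from something like `git diff --numstat origin/main...HEAD`, where `origin/main` stands in for whatever the target branch is. Generated files and lockfiles usually need excluding, or the gate will fail for the wrong reasons.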
Deploy: CI/CD with agent-checked acceptance criteria
The risk is that a change passes all the CI gates and still fails the user story. The gates check that the code works; they do not always check that the code does the right work.
MCP-connected agents close this gap. The agent reads the acceptance criteria from the work item, runs the relevant checks against the deployed code, and reports the result on the pull request. The reviewer sees a green check for “acceptance criteria met” or an explicit list of what failed.
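The reporting half of that loop is simple to sketch. The example below assumes the agent has already evaluated each criterion and only shows how results could be turned into the pull request summary; it does not call any MCP or work-item API, and `CriterionResult` and `summarise` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CriterionResult:
    """Outcome of checking one acceptance criterion (hypothetical shape)."""
    criterion: str
    passed: bool
    detail: str = ""


def summarise(results: list[CriterionResult]) -> str:
    """Render the text an agent might post on the pull request."""
    if not results:
        # An empty list must not read as a pass.
        return "No acceptance criteria found on the work item"
    failed = [r for r in results if not r.passed]
    if not failed:
        return "Acceptance criteria met"
    lines = ["Acceptance criteria not met:"]
    lines += [f"- {r.criterion}: {r.detail or 'no detail recorded'}" for r in failed]
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        summarise(
            [
                CriterionResult("Reset link expires after one use", True),
                CriterionResult("Email sent within 60 seconds", False, "took 95 seconds"),
            ]
        )
    )
```

Treating "no criteria found" as its own outcome is a deliberate choice: a work item with nothing to check should prompt a question, not a green tick.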
The deploy decision is still a human one. A senior engineer or release manager approves the change. The change goes through staged rollout, with feature flags where appropriate, and Application Insights or equivalent tracking the live behaviour.
For the engineering practice underlying this, our DevOps maturity assessment guide covers the gates and metrics that matter.
Operate: observability, incident response, regression triage
The risk in operation is that signal volume outstrips human attention. Logs, alerts, metrics, and traces all generate more data than a team can read.
AI compresses the triage cycle. An agent reads the logs, identifies the regression, summarises the timeline, and proposes a mitigation. The human runs the incident, communicates with affected users, and approves the mitigation.
The artefacts of operation become richer:
- Incident timelines drafted from logs, not reconstructed from memory.
- Post-incident reviews that surface the actual change, the test that failed to catch it, and the gate that should have caught it.
- Regression triage that points to a specific commit, not a vague “something broke yesterday”.
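Pointing triage at a specific commit is, at its core, a binary search over history, the same idea as `git bisect`. The sketch below assumes the commits are ordered oldest to newest, the newest one shows the regression, and once the regression appears it stays. `first_bad_commit` and the `is_bad` callback are hypothetical; in practice `is_bad` would check out the commit and run the failing test.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which `is_bad` is true.

    Assumes `commits` runs oldest to newest and that every commit after
    the first bad one is also bad.
    """
    if not commits or not is_bad(commits[-1]):
        raise ValueError("the newest commit must show the regression")
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid + 1  # regression was introduced after mid
    return commits[lo]


if __name__ == "__main__":
    history = ["a1", "b2", "c3", "d4", "e5"]
    broken = {"c3", "d4", "e5"}  # stand-in for running the failing test
    print(first_bad_commit(history, lambda sha: sha in broken))
```

The search needs about log2(n) test runs rather than n, which is what makes commit-level triage cheap enough to do during an incident.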
Operations rotas do not shrink. The number of incidents does not change. The time per incident drops.
Where does the human keep the lead?
Five parts of the lifecycle stay human-led. AI accelerates their artefacts; it does not substitute for the practice.
- User research and stakeholder interviews. Trust, judgement, reading what people do not say.
- Architectural judgement. When to rewrite, when to migrate, when to absorb the cost.
- Service assessment and governance panels. Public-sector assessments and equivalent private-sector boards.
- Incident communication. Talking to users when something is wrong.
- Senior engineer review. Every AI-generated change. Not a sample.
A delivery plan that underweights these is the most common adoption failure. Teams that treat AI as a substitute for senior judgement produce a brittle codebase and a thin team. Teams that treat AI as leverage on senior judgement produce the opposite.
Where to start
If you are introducing AI-augmented practices to a team that currently does AI-assisted work:
- Pick one project. Greenfield is easier; brownfield is more representative.
- Adopt spec-first delivery on it. OpenSpec or equivalent. The spec is the prompt.
- Stand up one MCP server. Work items first. The integration that closes the loop between intent and implementation.
- Wire one new CI gate. Acceptance criteria check, attribution metadata, or PR size cap.
- Measure for one quarter. Delivery time, defect rate, review time. Compare against the previous quarter.
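The quarter-on-quarter comparison in the last step is plain percentage change. A minimal sketch, with invented metric names and figures used purely to show the arithmetic:

```python
def percent_change(previous: float, current: float) -> float:
    """Percentage change from the previous quarter to the current one."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return (current - previous) * 100 / previous


def quarter_over_quarter(
    previous: dict[str, float], current: dict[str, float]
) -> dict[str, float]:
    """Percentage change for every metric present in both quarters."""
    return {
        name: percent_change(previous[name], current[name])
        for name in previous
        if name in current
    }


if __name__ == "__main__":
    # Hypothetical figures, not measurements from any real project.
    last_quarter = {"delivery_days": 20, "defects": 10, "review_hours": 40}
    this_quarter = {"delivery_days": 16, "defects": 9, "review_hours": 44}
    print(quarter_over_quarter(last_quarter, this_quarter))
```

A negative number is an improvement for all three of these metrics. A rise in review time alongside a fall in delivery time is not necessarily bad news; it may simply reflect effort moving from build into review.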
The compounding starts when the practice generalises across projects. The 84% AI-authored code figure reported for Q1 2026 took eight to ten quarters of structured work to reach. The first quarter usually produces a 10 to 20% gain.
For the wider context, the AI-augmented development pillar guide sets the definition. For the risk picture, the risks of AI-augmented development guide covers the controls. For the configuration detail, our harness templates guide covers rules, skills, and the harness setup.
Book a consultation to discuss your team’s specific position, or explore our AI Development and Implementation service for the underlying engagement model.