Why did Talk Think Do stop using OpenSpec?

OpenSpec kept specifications as files in the code repository, which coupled planning to an engineering tool and hid how scope and cost had moved against estimate. Requirements work involves business analysts, delivery managers, and clients, so the repository was the wrong workspace for it.

What replaced OpenSpec at Talk Think Do?

Discovery and refinement now happen in Claude Team, governance lives in Azure DevOps work items, and a Talk Think Do Delivery plugin and MCP server enforce house standards and traceability. Engineering tools like Cursor and Claude Code still author the code.

How does Talk Think Do track AI cost per project?

Agents update each work item as they go, capturing the model used, tokens consumed, and AI cost on every user story. Story-level actuals roll up into epic and feature figures that are compared against the stage 1 estimate.

Did Talk Think Do keep anything from OpenSpec?

Yes. The planning and change discipline it encouraged was good and has been kept. The plans and change records now live in Azure DevOps as tasks linked to work items, rather than as files in the repository.

Why We Moved Away from OpenSpec | AI Velocity Report

Is spec-driven development the same as waterfall?

No. Waterfall is a commercial posture where scope is fixed up front and the contract is the source of truth. Spec-driven development is a delivery practice: requirements are written before code and acceptance criteria are explicit. Talk Think Do runs spec-driven delivery without freezing scope contractually.

In April, we listed OpenSpec maturity as one of the top four drivers of our 84% AI-authored code figure. Two months later, we are walking away from it. This is the kind of reversal the 3-month review cycle exists to surface, and it is worth being direct about why.

OpenSpec, for readers who have not come across it, is an open-source framework for spec-driven AI development. Engineers write specifications as files in the repository, AI generates implementations that follow them, and the specifications act as both the prompt and the acceptance criteria. For an engineer working on a feature, it worked.

84% AI-authored code in Q1 2026, with OpenSpec maturity cited as a top-four driver. Two months later, we have moved away from it.

The problem is that requirements work is not just an engineering activity. Discovery, refinement, scope conversations, and stakeholder alignment all sit upstream of the repository. They involve business analysts, delivery managers, QA leads, and the client.

This is not a tooling or access question. Every member of our team has GitHub Enterprise and Claude Code at their fingertips. Business analysts routinely use Claude and Cursor for prototyping, and they are good at it. The point is more fundamental: the repository is the wrong workspace for the work that happens before code, even when everyone can technically use it. Specs in the repository coupled the planning process to a tool optimised for engineering, and that does not work for a multi-disciplinary team.

How do we actually deliver?

A note on terminology, because it matters here. Spec-driven development is often equated with waterfall. It is not the same thing. Waterfall is a commercial and contractual posture: scope is fixed up front, change requests handle deviation, and the contract is the source of truth. Spec-driven is a delivery practice: requirements are written before code, specifications drive implementation, and acceptance criteria are explicit. You can do one without the other, and we do.

We can run fixed-price waterfall when a client needs it. It requires a more expensive discovery activity that specifies every detail to the point where it can be contractually understood, and it tends to be more expensive again on delivery because, as we know from experience, scope still drifts. Thinking evolves during a build. The original scope is rarely the final story. Fixed-price absorbs that drift through change requests, which create their own overhead and an adversarial dynamic.

So we usually do something different, which we find works far better and is more cost-effective. We cost products against high-level stage 1 estimates at the user story level, which roll up into budgets at epic and feature level. Although the estimates are high level, we have high confidence in them, because they draw on the discovery process data we track and on our experience of how thinking evolves as clients iterate and understand more about what they want. Both our team and the client team are motivated to deliver to those budgets and work in partnership to achieve shared goals. The work is still spec-driven. Every user story has acceptance criteria. Every test ties back to them. What we are not doing is contractually freezing the spec at month one.

It is a deliberately semi-agile model. We avoid the cost, rigidity, and risk of specifying everything up front. We keep room for the inevitable learning that happens during a build. But the trade is real: this model only works if traceability and reporting are bulletproof. Without that, you cannot tell whether you are on budget at feature level, whether scope has moved, or whether AI cost is landing where you estimated it.

OpenSpec was working against that model rather than supporting it. Specs in the repository could not tell us how scope had moved since estimate, what it had cost us, or what was driving variance at epic level. We were losing visibility precisely where we needed it most.
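To make "how scope had moved since estimate" concrete, here is a minimal sketch of the kind of comparison we mean. It is an illustration under stated assumptions, not our production tooling: the `Story` shape, the `scope_movement` function, the ids, and the estimate unit are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Story:
    story_id: int    # hypothetical work item id
    feature: str     # parent feature
    estimate: float  # stage 1 estimate, in whatever unit the team uses


def scope_movement(baseline, current):
    """Compare the stage 1 baseline with the current backlog, per feature."""
    base_ids = {s.story_id for s in baseline}
    current_ids = {s.story_id for s in current}
    features = sorted({s.feature for s in baseline} | {s.feature for s in current})
    report = {}
    for feature in features:
        base_total = sum(s.estimate for s in baseline if s.feature == feature)
        current_total = sum(s.estimate for s in current if s.feature == feature)
        report[feature] = {
            "baseline": base_total,
            "current": current_total,
            "delta": current_total - base_total,
            "added": sorted(s.story_id for s in current
                            if s.feature == feature and s.story_id not in base_ids),
            "removed": sorted(s.story_id for s in baseline
                              if s.feature == feature and s.story_id not in current_ids),
        }
    return report


# Invented example data: one story re-estimated, one story added.
baseline = [Story(101, "Onboarding", 3.0), Story(102, "Onboarding", 2.0),
            Story(201, "Reporting", 5.0)]
current = [Story(101, "Onboarding", 3.0), Story(102, "Onboarding", 4.0),
           Story(103, "Onboarding", 1.5), Story(201, "Reporting", 5.0)]

report = scope_movement(baseline, current)
for feature, row in report.items():
    print(feature, row)
```

A comparison like this needs the baseline and the current backlog to live in one queryable place, which specification files spread across a repository do not provide.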

What did we replace OpenSpec with?

The replacement is not another spec framework. It is a planning surface that everyone can work on, backed by delivery infrastructure we have built and own.

Claude Team has become the working environment for discovery and refinement. Business analysts and delivery managers can take meeting transcripts, stakeholder notes, and early thinking and beat epics, features, and requirements into shape collaboratively before anything is committed. The engineering tools have not gone anywhere. They sit alongside, used for what they are best at.

To make that work consistently across the firm, we have developed a Talk Think Do Delivery Claude plugin. It gives our team org-wide skills (reusable building blocks that package proven delivery patterns) for creating high-quality user stories, epics, features, and bugs, in our house style and to our standards. The plugin is the reason a business analyst in one engagement and a delivery manager in another are producing work items that look and behave the same way.

The plugin pairs with our Talk Think Do Delivery MCP, a Model Context Protocol server that fronts Azure DevOps and Jira Service Desk and is opinionated about the fields and process we use. Skills and rules in the repository invoke it to ensure planning and task breakdown are represented in Azure DevOps consistently, that test case management follows our standards, and that the work item structure is something we can report against rather than something each project shapes ad hoc.

In practice, most of our interaction with Azure DevOps happens through Claude chat. A delivery manager describes what they need in natural language. The skills shape the output to our standards. The MCP enforces validity against Azure DevOps fields and process, and writes the result. The Azure DevOps interface is where work items are rendered and reviewed, not where they are authored. That sounds like a small change. It is not. It is the difference between every team member fluently producing well-formed, traceable work items and only the people who know Azure DevOps well being able to contribute.
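The "enforces validity" step can be pictured as a check that runs before anything is written. The sketch below is hypothetical throughout: the field names, the required-field table, and `validate_work_item` are invented for illustration and are not the real Azure DevOps schema or the interface of our MCP server.

```python
# Hypothetical house rules: which fields each work item type must carry.
REQUIRED_FIELDS = {
    "User Story": {"title", "acceptance_criteria", "parent_feature", "estimate"},
    "Bug": {"title", "repro_steps", "severity"},
}


def validate_work_item(item: dict) -> list[str]:
    """Return a list of problems; an empty list means the draft can be written."""
    item_type = item.get("type")
    required = REQUIRED_FIELDS.get(item_type)
    if required is None:
        return [f"unknown work item type: {item_type!r}"]
    errors = []
    for field in sorted(required):
        value = item.get(field)
        # Treat absent values and blank strings as missing.
        if value is None or (isinstance(value, str) and not value.strip()):
            errors.append(f"missing required field: {field}")
    return errors


# A draft as a skill might produce it from a natural-language request.
draft = {
    "type": "User Story",
    "title": "Export report as CSV",
    "parent_feature": "Reporting",
    "estimate": 2,
    "acceptance_criteria": "  ",  # blank, so the draft is rejected
}

errors = validate_work_item(draft)
print(errors)
```

The design point is that the check sits between the conversation and the work item system, so a malformed item is bounced back to the author rather than landing in the backlog.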

The traceability layer is where this earns its place against our delivery model:

End-to-end and unit tests are linked to acceptance criteria and tagged with Azure DevOps work item numbers. Coverage is a property of the work item, not a separate report someone has to assemble.
Agents update the work item as they go. The model used, tokens consumed, and AI cost are captured on each user story. Story-level actuals roll up into epic and feature numbers we can compare against the stage 1 estimate.
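The first point above, tests tagged with work item numbers, is what lets coverage be read straight off the work item. A minimal sketch, assuming a made-up `WI-<number>` tag embedded in test names; the tag format, the test names, and `group_by_work_item` are illustrative assumptions rather than our actual convention.

```python
import re
from collections import defaultdict

# Hypothetical tag convention: "WI-<number>" somewhere in the test name.
TAG = re.compile(r"\bWI-(\d+)\b")


def group_by_work_item(results):
    """Group (test_name, passed) pairs by the work item ids they are tagged with."""
    grouped = defaultdict(lambda: {"passed": 0, "failed": 0})
    untagged = []
    for name, passed in results:
        ids = TAG.findall(name)
        if not ids:
            untagged.append(name)  # tests with no traceability link
            continue
        for work_item_id in ids:
            grouped[int(work_item_id)]["passed" if passed else "failed"] += 1
    return dict(grouped), untagged


# Invented test results for illustration.
results = [
    ("test_export_csv_has_header [WI-201]", True),
    ("test_export_csv_empty_dataset [WI-201]", False),
    ("test_signup_sends_email [WI-101]", True),
    ("test_helper_formats_date", True),
]

coverage, untagged = group_by_work_item(results)
print(coverage)
print(untagged)
```

Surfacing the untagged list matters as much as the grouping: a test with no work item is a gap in traceability, not just a gap in a report.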

That is what gives both us and the client the confidence to work to budgets without specifying every detail up front. The visibility is there throughout, not assembled after the fact.
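The roll-up of story-level AI actuals against the stage 1 estimate is simple arithmetic once the data sits on the work item. The sketch below assumes a hypothetical `StoryUsage` record and invented figures; the model names are placeholders and the cost unit is whatever currency the estimate uses.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class StoryUsage:
    story_id: int   # hypothetical work item id
    feature: str
    epic: str
    model: str      # placeholder model name
    tokens: int
    ai_cost: float


def roll_up(usages, level):
    """Sum tokens and AI cost by 'feature' or 'epic'."""
    totals = defaultdict(lambda: {"tokens": 0, "ai_cost": 0.0})
    for usage in usages:
        bucket = totals[getattr(usage, level)]
        bucket["tokens"] += usage.tokens
        bucket["ai_cost"] += usage.ai_cost
    return dict(totals)


def variance(actuals, estimates):
    """Actual minus estimated AI cost; positive means over the estimate."""
    return {
        name: round(actuals.get(name, {"ai_cost": 0.0})["ai_cost"] - estimate, 2)
        for name, estimate in estimates.items()
    }


# Invented actuals captured on three user stories.
usages = [
    StoryUsage(101, "Onboarding", "Customer portal", "model-a", 120_000, 1.25),
    StoryUsage(102, "Onboarding", "Customer portal", "model-b", 80_000, 0.75),
    StoryUsage(201, "Reporting", "Customer portal", "model-a", 250_000, 2.50),
]
stage1_feature_estimates = {"Onboarding": 1.50, "Reporting": 3.00}

by_feature = roll_up(usages, "feature")
by_epic = roll_up(usages, "epic")
feature_variance = variance(by_feature, stage1_feature_estimates)
print(by_feature)
print(by_epic)
print(feature_variance)
```

Because the same records feed both levels, the epic figure is always the sum of its features, and a variance at epic level can be traced down to the stories that caused it.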

Why does this matter more than the tooling choice?

The deeper point is that requirements traceability and test coverage are not a documentation exercise. They are the mechanism that lets a budget-led, spec-driven model actually function. Without them, you are either back to fixed-price waterfall or you are absorbing variance you cannot explain.

Keeping specs in the repository conflated authoring with governance, and engineering with planning. Pulling them apart, with planning in Claude Team, governance in the work item system, and authoring close to the engineer, has restored a view of the engagement we had partially lost. It has also brought more of the firm into the AI-assisted way of working, which is a story we will come back to.

A note on the review cycle

We drop things regularly. Three months is a long time in the current climate, and our review cycle exists precisely so that we can act on what changes within it. We are selective about what we adopt in the first place, picking only the tools and practices we think are genuinely important to evaluate, but once something is in, we hold it to the same evidence bar every quarter.

What is notable about OpenSpec is the speed of the turnaround. It is the shortest time in which we have evaluated something, adopted it, and then walked away from it. That is partly the pace of the broader market, and partly that the failure mode only became visible at scale, on the kind of larger engagements where traceability is non-negotiable.

We also learned a lot from OpenSpec. The planning and change approach it encouraged is genuinely good, and we have kept it. What has changed is where the artefacts live. The plans and change records now sit in Azure DevOps as tasks, linked to the work items they relate to, rather than as files in the repository. Same discipline, different storage, and the storage is the part that determines whether the rest of the team can see and act on the work.

The commitment is not to any single tool. It is to running the review, holding the estimates honestly, learning what we can from what we drop, and being willing to walk away from something we publicly endorsed when the evidence stops supporting it.

If you can see scope drift in real time, you can also see when your original estimate was wrong, and you can see it early enough to do something about it. That is the subject of next month’s piece.

