Why does Talk Think Do run a 3-month review cycle on its AI tools?

AI tooling is powerful but not yet stable infrastructure. Models get retuned, pricing tiers are restructured, and vendors are acquired. A 3-month cycle is short enough to catch material changes before they compound and long enough to deliver projects between reviews.

How does the proposed Cursor acquisition affect Talk Think Do clients?

Claude models accessed through Cursor do a meaningful share of Talk Think Do's delivery work, so a change of ownership is a live variable. Talk Think Do is watching pricing, terms of service, data and IP handling, and EU regulatory exposure, and will raise relevant changes with clients proactively.

How does Talk Think Do estimate AI costs?

Token cost is an explicit budget category alongside hosting, licensing, and third-party APIs. Estimates are built from the assumed model, volume, and price per million tokens, and the estimation models output token cost in addition to human effort.

Is Talk Think Do moving off Cursor?

Not today. The combination of Cursor and Claude is currently its best answer for the work it does. The review cycle exists so that any move is evidence-led rather than reactive, and workflows are kept portable across tools.

What did Talk Think Do change after recent AI tooling instability?

It made token cost an explicit line item, decoupled delivery from any single tool by keeping rules, skills, and MCP servers portable, and began testing tools for reliability on real workloads, not just benchmark capability.

The Tooling Beneath the Productivity | AI Velocity Report

This week, SpaceX announced an option to acquire Cursor for $60 billion. Last week, Anthropic published a post-mortem admitting that three separate engineering missteps had quietly degraded Claude Code’s quality for over a month. One of them, a system prompt change that capped model responses at 25 words between tool calls, measurably hurt coding output before it was reverted. Cursor has had stability issues of its own, and fixing them is an area of focus for the company.

Token pricing keeps moving, especially with new model releases. Subscription tiers are renamed, repackaged, and repriced on a near-quarterly cadence.

At Talk Think Do, we have gone all-in on AI-assisted delivery. The Q1 2026 numbers we published in this newsletter are real: 84% AI-authored code, 40-50% faster delivery on representative engagements. Those results were delivered mainly on Cursor and Claude.

That is why we are watching the Cursor situation closely. Cursor is a meaningful part of how our team ships software and is our engineers’ tool of choice (paired with Opus and Sonnet models). This $60 billion acquisition may force us to pivot, and it is exactly the kind of variable our review cycle is built to absorb. We are not making any moves yet, but we are paying attention, and clients should expect us to.

$60bn: the proposed acquisition of Cursor, the IDE doing a meaningful share of the work behind our Q1 numbers. Exactly the kind of variable our review cycle exists to absorb.

What is the difference between the cutting edge and the bleeding edge?

Our original plan was to operate on the cutting edge, not the bleeding edge. There is a meaningful difference. The cutting edge is where you adopt tools and techniques that are proven enough to bet client work on, ahead of the broader market but behind the people taking the real risks. The bleeding edge is where you are discovering the failure modes yourself, in production, on someone else’s deadline.

That distinction is getting harder to maintain. Things are moving too fast for the gap between the cutting edge and the bleeding edge to hold the way it used to. Models that were market-leading in February are middle-of-the-pack by April. A coding integrated development environment (IDE) we built workflows around is now potentially part of an entity that combines a rocket company, an AI lab, and a social network.

The tools producing our results are owned and operated by companies still figuring out their business models, their alignment, their pricing, and in some cases, even their ownership structure.

We would rather recognise that and manage it than pretend otherwise. It makes project estimation genuinely hard, and pretending it does not is, increasingly, the riskier path for clients.

What have we changed in how we estimate and deliver?

Token costs are a line item in our estimates. Not a footnote, not bundled into an opaque ‘AI overhead’, but an explicit budget category alongside hosting, licensing, and third-party application programming interfaces (APIs). Token cost estimates are built from the assumed model, the assumed volume, and the assumed price per million tokens; our estimation models have been updated to output this in addition to actual human effort.
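A minimal sketch of how such a line item might be computed from the three stated inputs. The class name, field names, model name, and figures below are illustrative assumptions, not our actual estimation model or any vendor's real pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenCostLine:
    """One token cost line item. Every value here is an assumption."""
    model: str                 # assumed model (placeholder name)
    volume_tokens: int         # assumed total tokens over the engagement
    price_per_million: float   # assumed price per million tokens


def token_cost(line: TokenCostLine) -> float:
    """Estimated cost = (volume / 1,000,000) * price per million tokens."""
    return round(line.volume_tokens / 1_000_000 * line.price_per_million, 2)


# Illustrative figures only: 250 million tokens at 6.00 per million.
line = TokenCostLine("example-model", volume_tokens=250_000_000, price_per_million=6.0)
estimate = token_cost(line)
print(f"{line.model}: {estimate:.2f}")  # example-model: 1500.00
```

Keeping the price as its own input means a vendor repricing changes one assumption and the estimate can simply be re-run, rather than being buried inside a blended overhead figure.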

We do not couple our delivery to any single tool. Cursor is excellent right now. So is Claude Code. GitHub Copilot is catching back up, depending on the workflow.

We use what works for the job in front of us, and we deliberately keep our thinking in source code repositories and our workflows portable. The artefacts we own and control are Cursor Rules, Skills, OpenSpec definitions, and custom Model Context Protocol (MCP) servers. The IDE underneath them is replaceable. If Cursor’s roadmap shifts post-acquisition, switching (or, more likely, adding compatibility) is a small task, not a project.

We test for reliability, not just capability. A new model that scores higher on a benchmark but stalls during use, or produces sub-standard output for real workloads, is worse than a stable, less capable one. We track real reliability across our delivery teams on real projects and feed that back into which tools sit at the centre of which workflows.
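One way this kind of tracking could be sketched, assuming each AI-assisted task on a real project is logged with a tool name and an outcome. The outcome labels, tool names, and the "share of accepted tasks" metric are hypothetical choices for illustration, not a description of our internal tooling.

```python
from collections import Counter, defaultdict

# Hypothetical outcome labels for one AI-assisted task on a real project.
OUTCOMES = {"accepted", "stalled", "rejected"}


def reliability_by_tool(records):
    """Return {tool: share of logged tasks whose output was accepted}.

    `records` is an iterable of (tool, outcome) pairs.
    """
    counts = defaultdict(Counter)
    for tool, outcome in records:
        if outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome!r}")
        counts[tool][outcome] += 1
    return {
        tool: c["accepted"] / sum(c.values())
        for tool, c in counts.items()
    }


# Illustrative log: tool-a stalls once in four tasks, tool-b is rejected once in two.
records = [
    ("tool-a", "accepted"), ("tool-a", "accepted"),
    ("tool-a", "stalled"), ("tool-a", "accepted"),
    ("tool-b", "accepted"), ("tool-b", "rejected"),
]
rates = reliability_by_tool(records)
print(rates)  # {'tool-a': 0.75, 'tool-b': 0.5}
```

Counting stalls and rejections separately keeps the two failure modes named above visible, even though this sketch collapses them into a single rate.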

What are we specifically watching with the Cursor acquisition?

It is worth being direct about this, because it is not an abstract industry story for us: Claude models accessed through Cursor are our current sweet spot. That specific combination is doing a meaningful share of the work behind the 84% AI-authored code figure we published for Q1. The Cursor acquisition is therefore not a curiosity we are tracking out of professional interest. It is a live variable sitting underneath active client engagements.

The technical risks are obvious and well-discussed elsewhere:

Model lock-in
API access changes
Performance regressions
Roadmap shifts toward Grok and Composer

We are watching all of those. The risks we think are under-discussed, and arguably more consequential for a consultancy, are the non-technical ones.

Data, intellectual property (IP), and contractual posture. Our clients’ code, architectural decisions, prompts, and in some cases sensitive business logic flow through Cursor. The current terms of service protect that material in ways we have reviewed and signed off on. A change of ownership can rewrite those terms.

The new parent’s approach to training data, retention, and sub-processing will differ from Cursor’s approach as a standalone company. Under our client contracts, we are obligated to know what that posture is, not to assume it. ‘We read the terms of service when we onboarded the tool’ stops being a sufficient answer the moment ownership changes hands. We will be reviewing the terms afresh and, where appropriate, raising changes with clients proactively rather than waiting to be asked.

EU AI Act and regulatory exposure. This one matters specifically because we are a UK-based firm serving EU clients, which already places us in the more complicated end of the AI Act compliance picture.

The EU’s posture toward Musk-owned entities is, to put it diplomatically, more adversarial than its posture toward Anthropic or OpenAI. X is currently subject to multiple Digital Services Act investigations, and the EU has demonstrated willingness to act on those. If Cursor inherits any of that regulatory weather under new ownership, the compliance story we tell EU clients about our tooling stack gets meaningfully harder to write.

We do not yet know how this will land, but we know enough to flag it as a watchpoint rather than a settled question.

Pricing pressure after the IPO. A $60 billion valuation needs revenue to justify it. The initial public offering (IPO) expected to fund the acquisition will create sharp pressure to demonstrate that revenue trajectory. We expect Cursor pricing to be on an upward trend.

The most likely shapes are bundling with X Premium or Grok subscriptions, or new enterprise tiers carved out from features that are currently included. For us, this is not just a line-item cost increase. It is a potential re-evaluation of every active estimate where Cursor sits as an assumed input. The line-item approach we have already adopted helps absorb this cleanly, but it does not make the underlying volatility go away.

None of this means we are moving off Cursor today. The combination of Cursor and Claude is, as of right now, still our best answer for the work we do. But ‘best answer right now’ and ‘answer we can rely on for the next 6 months’ are no longer the same statement, and we would rather be honest about that gap than paper over it.

Why run a 3-month review cycle?

The most important change is the one that runs underneath all of the above: every quarter, we formally reassess our entire AI tooling stack.

The review covers everything:

Models
IDEs
Agents
MCP servers
Subscription tiers
Cost per delivery unit
Vendor risk

We benchmark candidates, run them on representative engagement work, and score them on capability, reliability, and total cost. From there we decide what stays, what is swapped, and what goes on the watchlist.
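A minimal sketch of how the three criteria might be combined into a single comparable score. The weights, the 0-1 scales, and the candidate figures are all hypothetical: the review names the criteria, not a formula, so this is one possible shape rather than our actual scoring method.

```python
from dataclasses import dataclass

# Hypothetical weights; how the criteria are really balanced is a judgment call.
WEIGHTS = {"capability": 0.4, "reliability": 0.4, "cost": 0.2}


@dataclass(frozen=True)
class Candidate:
    name: str
    capability: float   # 0-1, higher is better
    reliability: float  # 0-1, higher is better
    cost: float         # 0-1, higher means lower total cost


def score(c: Candidate) -> float:
    """Weighted sum of the three criteria, rounded for readability."""
    return round(
        WEIGHTS["capability"] * c.capability
        + WEIGHTS["reliability"] * c.reliability
        + WEIGHTS["cost"] * c.cost,
        3,
    )


def rank(candidates):
    """Return candidates ordered from highest to lowest score."""
    return sorted(candidates, key=score, reverse=True)


# Illustrative figures: the challenger is more capable but less reliable.
candidates = [
    Candidate("challenger", capability=0.9, reliability=0.6, cost=0.7),
    Candidate("incumbent", capability=0.8, reliability=0.9, cost=0.6),
]
for c in rank(candidates):
    print(c.name, score(c))
```

With these assumed weights the more capable challenger still ranks below the more reliable incumbent, which is the trade-off the reliability testing is meant to surface.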

We started doing this because the pace of change demanded it. We will keep doing it for the foreseeable future, and here is why.

Even when we land on a stack that is working well, we cannot assume the status quo will hold for long. Recent quarters have shown all of the following:

A model gets quietly tuned for cost and degrades on iterative coding
A pricing tier we are standardised on gets restructured
A tool we depend on is acquired by an entity with different priorities
A new entrant ships something materially better and changes the economics of what we estimate

The honest position is that AI tooling is not yet stable infrastructure in the way that, say, .NET or Kubernetes are stable infrastructure. It is powerful, it is fast-moving, and it is occasionally wobbly. Treating it like a solved problem is how consultancies end up holding the cost of someone else’s pricing change, or shipping a project with degraded output and not noticing until the client does.

A 3-month cycle is short enough to catch material changes before they compound, and long enough to actually deliver projects in between reviews. We expect to be running this cadence for a long time, possibly for as long as the frontier keeps moving at this pace. That probably means the lifetime of this newsletter.

What does this mean for clients?

The productivity gains from AI-assisted development are not hype. We are shipping the evidence every quarter, and the trajectory is real.

Anyone telling clients that these tools are mature, stable, predictable infrastructure is selling them a story. They are powerful, fast-moving, and occasionally wobbly. Pricing those three things honestly is, increasingly, the actual job of a consultancy operating at the AI frontier.

Our commitment is not to any specific model, vendor, or IDE. It is to running the review cycle, holding the estimates honestly, and absorbing the volatility on our side of the engagement so that what reaches the client is the productivity, not the chaos underneath it.

While this does keep life very interesting, we would happily welcome more stability. Until then, we will keep evaluating, and we expect to be doing so for some time.

A small preview of what is coming in the next quarterly report: despite continuing to push hard on AI-assisted delivery, our AI-authored code percentage has dropped very slightly, to 83%. It has held there for a couple of months. We think that is a more interesting data point than continued growth would have been, and we will unpack it in the full Q2 report.
