Governing the AI Agent Supply Chain

The AI agent surface is everything a coding agent can call: skills, plugins, Model Context Protocol (MCP) servers, and subagents. Each one is executable third-party code running with the agent’s access, so it should be governed like a software supply chain. That means an inventory, least-privilege scoping, versioning, change review, and per-call logging. For UK teams in regulated sectors, this is not optional hygiene. It is the evidence ISO 27001 and the EU AI Act expect.

A green build proves the code works. It says nothing about what the agent was allowed to touch.
Treat skills, plugins, and MCP servers as dependencies: catalogue, version, scope, and review them.
The same controls that satisfy security reviews also satisfy procurement.

Our Claude Code development service sets up governed MCP servers and a scoped agent surface as standard.

Most coding-agent guidance focuses on what the agent produces. This guide is about what the agent can reach. As agents gain skills, plugins, and MCP connections, that surface becomes a supply chain, and an ungoverned supply chain is where the real risk sits. The framing draws on The Generative Programmer’s 29 May 2026 article on the missing quality layer for AI coding agents, which lists an agent surface inventory as one of five controls teams most often lack.

What is the agent surface, and why treat it as a supply chain?

The agent surface is the set of installed capabilities a coding agent can invoke, and it is a supply chain because every item is third-party code running with the agent’s access. When the agent can call a skill, load a plugin, or reach a system through an MCP server, it can do whatever that capability allows: read your source, run commands, or touch external data.

That is the same risk profile as a package dependency, with one difference. A dependency runs inside your build. An agent capability runs at the direction of a non-deterministic model that can be steered by the content it reads. So the controls are the same as for any dependency, applied with a little more care:

An inventory of what is installed.
Versioning, so you know what changed and when.
Least-privilege scoping, so each capability holds only the access it needs.
Change review, so nothing joins the surface unreviewed.

Why is a green build not enough here?

A green build is a narrow signal. It confirms the code compiled and the tests passed. It says nothing about whether the agent reached a system it should not have, used a skill no one vetted, or ran with broader credentials than the task required.

This is the gap the quality-layer argument identifies. Capability has outrun control: agents can do more each month, but the mechanisms to verify what they were permitted to do have not kept pace. The build status answers “did it work?” Agent surface governance answers a different and increasingly important question: “what was it allowed to do, and who approved that?”

What lives on the agent surface?

Four categories make up the surface, and each needs a named owner:

Skills. Reusable, invocable units of agent behaviour, often markdown plus scripts. They can run commands and edit files.
Plugins and extensions. Vendor or third-party add-ons that extend what the agent or IDE can do.
MCP servers. Connectors that expose external systems (work trackers, databases, cloud APIs, CI/CD) to the agent through the Model Context Protocol.
Subagents. Agents the main agent can spawn, each inheriting some slice of capability.

The common thread is that all four grant capability beyond the model itself. They are the outer harness made executable, which is precisely why they need governing rather than just listing.

What can go wrong on an ungoverned surface?

The failure modes are concrete, and most come from too much standing access rather than exotic attacks:

Over-permissioned MCP servers. A server with broad write access lets a single bad instruction take actions no one reviewed.
Unvetted skills. A skill pulled from a public source can carry commands that exfiltrate secrets or weaken a check.
Tool-output prompt injection. Content an agent reads through a connector can carry instructions that redirect it. The more access that connector holds, the more damage a redirected agent can do.
Stale or unpinned versions. A capability that updates silently can change behaviour, or be compromised upstream, with no review.

MITRE ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems), a knowledge base of adversary tactics and techniques against AI systems, is a useful way to structure this threat model. The point is not to fear the tools. It is to size their access to the job.

How do you build an agent surface inventory?

Start with a catalogue, because you cannot govern what you have not listed. The inventory is the agent equivalent of a software bill of materials, and it answers the first question any reviewer asks: who can do what?

For each item on the surface, record:

Name and type (skill, plugin, MCP server, subagent).
Owner, the person or team accountable for it.
Source and version, with the version pinned, not floating.
Permissions, the specific access it holds.
Data and systems reached, especially anything carrying personal or client data.

Keep the inventory in the repository, next to the harness it describes, so it is versioned with the code and visible in review. Our guide on AI code attribution for enterprise procurement covers the related discipline of recording authorship and provenance, which pairs naturally with a surface inventory.
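
As a minimal sketch of what one inventory record and a basic check might look like, assuming Python and a hypothetical schema: the field names, the allowed type values, and the rule for what counts as a pinned version (an exact three-part version number or a full commit hash) are illustrative assumptions, not a standard.

```python
import re
from dataclasses import dataclass

# Assumed type labels, taken from the four surface categories in this guide.
ALLOWED_TYPES = {"skill", "plugin", "mcp_server", "subagent"}

# Assumed pinning rule: an exact x.y.z version or a full 40-character commit hash.
# Floating references such as "latest" or "^1.2" fail this check.
PINNED = re.compile(r"^(\d+\.\d+\.\d+|[0-9a-f]{40})$")


@dataclass(frozen=True)
class SurfaceItem:
    """One entry in a hypothetical agent surface inventory."""
    name: str
    type: str
    owner: str
    source: str
    version: str
    permissions: tuple[str, ...]
    reaches: tuple[str, ...]            # data and systems the item can reach
    handles_personal_data: bool = False


def validate(item: SurfaceItem) -> list[str]:
    """Return a list of human-readable problems; an empty list means the record passes."""
    problems = []
    if item.type not in ALLOWED_TYPES:
        problems.append(f"{item.name}: unknown type {item.type!r}")
    if not item.owner.strip():
        problems.append(f"{item.name}: no owner recorded")
    if not PINNED.match(item.version):
        problems.append(f"{item.name}: version {item.version!r} is not pinned")
    if not item.permissions:
        problems.append(f"{item.name}: permissions not recorded")
    return problems
```

A check like this could run in review alongside the inventory file, so that a record with a missing owner or a floating version is flagged before it merges.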

How do you apply least privilege to MCP servers?

Scope every MCP server to the narrowest access the task needs, and prefer read-only by default. This single decision removes much of the risk, because an agent cannot misuse access it was never granted.

The patterns we use:

Separate read-only servers from action servers. A server that reads logs or work items should never also be able to deploy.
Gate state-changing actions behind approval. Anything that writes, deletes, or deploys requires an explicit confirmation step.
Issue scoped, rotatable credentials per server, not one broad shared token.
Log every tool call, so the audit trail shows what the agent actually did, not just what it could do.

We expand on read-only, action, and gated MCP patterns in a separate guide on MCP patterns for production agents.
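
The gating and logging patterns in this section can be sketched roughly as follows. This is an illustration in plain Python, not the API of any MCP SDK: `Tool`, `GatedServer`, `ApprovalRequired`, and the `approve` and `log` callbacks are hypothetical names, and a real deployment would also need to keep secrets out of the logged arguments.

```python
import json
import time
from dataclasses import dataclass
from typing import Any, Callable


class ApprovalRequired(Exception):
    """Raised when a state-changing call was not approved."""


@dataclass(frozen=True)
class Tool:
    name: str
    changes_state: bool        # True for anything that writes, deletes, or deploys
    run: Callable[..., Any]


class GatedServer:
    """Hypothetical wrapper that exposes a fixed tool set, gates actions, and logs every call."""

    def __init__(self, name, tools, approve, log):
        self.name = name
        self._tools = {t.name: t for t in tools}
        self._approve = approve    # callable(tool_name, args) -> bool
        self._log = log            # callable(str) -> None

    def call(self, tool_name: str, **args: Any) -> Any:
        tool = self._tools.get(tool_name)
        if tool is None:
            outcome = "not_exposed"
        elif tool.changes_state and not self._approve(tool_name, args):
            outcome = "denied"
        else:
            outcome = "allowed"
        # Log before acting, so refused calls appear in the audit trail too.
        self._log(json.dumps(
            {"ts": time.time(), "server": self.name, "tool": tool_name,
             "args": args, "outcome": outcome},
            default=str,
        ))
        if outcome == "not_exposed":
            raise KeyError(tool_name)
        if outcome == "denied":
            raise ApprovalRequired(tool_name)
        return tool.run(**args)
```

In this sketch a read-only server would simply be one whose tools all have `changes_state=False`, while an action server passes an `approve` callback that asks a person to confirm.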

How do you version and review changes to the surface?

Treat a change to the agent surface like a change to a dependency: it goes through review and it is versioned. Adding an MCP server or installing a skill is a change to what your agents can do, and it deserves the same scrutiny as adding a package to the build.

In practice:

Pin versions and avoid floating references that update silently.
Require a pull request to add or change anything on the surface, with the inventory updated in the same change.
Run software composition analysis, for example with Snyk, over the third-party code behind skills and plugins.
Re-review on update, because an upstream change can alter behaviour or introduce a compromise.
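
One way to make re-review concrete is to compare the inventory before and after a pull request and list what changed. The sketch below assumes the inventory has been loaded as a mapping from item name to a dictionary of fields. That shape and the function name `surface_changes` are assumptions for illustration.

```python
def surface_changes(before: dict[str, dict], after: dict[str, dict]) -> dict:
    """Summarise additions, removals, and per-field changes between two inventories."""
    report = {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": {},
    }
    for name in sorted(before.keys() & after.keys()):
        old, new = before[name], after[name]
        # Record which fields differ, e.g. a version bump or a widened permission.
        fields = sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))
        if fields:
            report["changed"][name] = fields
    return report


def needs_review(report: dict) -> bool:
    """Any addition, removal, or field change means the surface changed."""
    return bool(report["added"] or report["removed"] or report["changed"])
```

A reviewer could use the `changed` entries to see at a glance whether an update touched only the version or also widened permissions.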

How does agent surface governance map to UK regulation?

Governing the surface produces exactly the evidence UK regulated work expects, which is why we frame it as an assurance activity, not developer hygiene.

ISO 27001 expects asset inventories, access control, and change management. The surface inventory, least-privilege scoping, and pull-request review supply all three directly.
The EU AI Act expects providers and deployers to document and control their AI systems. A versioned, permission-scoped, logged surface is that documentation. Our guide on the EU AI Act and custom software covers the provider and deployer split in detail.
Cyber Essentials and procurement reviews ask who can access what. A current inventory answers in minutes rather than weeks.

The pattern is that the controls which make agents safe to use are the same controls a buyer or auditor wants to see. Doing the work once satisfies both. Our blog post on using ISO 27001 to govern AI development tools develops this point.

How does Talk Think Do govern its agent surface?

We run a small, explicitly scoped surface and keep it under review, as recorded in the Q1 2026 AI Velocity Report:

Six live custom MCP servers cover work items, test execution, logging, Azure, CI/CD, and GitHub, each scoped to its job.
Read access and action access are separated, and state-changing operations are gated.
The surface is versioned in the repository and changed through review, inside our ISO 27001-certified framework.
Every AI-authored change still passes senior engineer review and ISTQB-qualified QA validation.

The discipline is deliberately unglamorous: a catalogue, scoped permissions, versioning, review, and logging. That is what lets us run agents with high autonomy on regulated client work without widening the attack surface as we go.

To review the agent surface on your own delivery, or to set one up safely from the start, see our Claude Code development service and AI integration service, or book a free consultation.

Frequently asked questions

What is the AI agent surface?

The agent surface is the set of installed capabilities a coding agent can call: skills, plugins or extensions, Model Context Protocol (MCP) servers, and the subagents it can spawn. Each one can read code, run commands, or reach external systems on the agent's behalf. Because every item adds capability and risk, the surface should be treated as a software supply chain that needs review, versioning, and permission scoping.

Why treat coding agent skills and MCP servers as supply chain dependencies?

Because they are executable third-party code that runs with the agent's access. An unvetted skill, an over-permissioned MCP server, or a compromised plugin can read source, exfiltrate secrets, or take actions no one reviewed. The same controls you apply to package dependencies (inventory, versioning, least privilege, and change review) apply to the agent surface, and for regulated work they are an audit requirement.

What is an agent surface inventory?

An agent surface inventory is a catalogue of every skill, plugin, MCP server, and subagent an agent can use, with its owner, version, source, the permissions it holds, and the data or systems it can reach. It is the agent equivalent of a software bill of materials. Without it you cannot answer who can do what, which is the first question any security or procurement review will ask.

How do you apply least privilege to an MCP server?

Scope each MCP server to the narrowest access it needs and prefer read-only where possible. Separate read-only data servers from servers that can take actions, and gate any action that changes state behind explicit approval. Issue scoped, rotatable credentials per server rather than broad shared tokens, and log every tool call so the audit trail shows what the agent actually did.

How does agent surface governance map to ISO 27001 and the EU AI Act?

Agent surface governance produces exactly the evidence these frameworks expect. ISO 27001 asks for asset inventories, access control, and change management, which the surface inventory, least-privilege scoping, and pull-request review provide. The EU AI Act expects providers and deployers to document and control their systems. A versioned, permission-scoped, logged agent surface is the documentation, not an afterthought to it.

What tools help govern the agent surface?

Software composition analysis tools such as Snyk help track and flag the third-party code behind skills, plugins, and MCP servers. MITRE ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems), a knowledge base of adversary tactics and techniques against AI systems, helps structure the threat model. The mechanics matter less than the discipline: a catalogue, scoped permissions, versioning, change review, and per-call logging, applied consistently.
