
Using AI to Meet the GDS Service Standard

Matt Hammond · 16 min read

AI can help a delivery team meet the GDS Service Standard more consistently across the whole lifecycle: user research, design, build, and operation. Agent rules, skills, and a shared AGENTS.md — used by tools like Claude Code and Cursor — are one mechanism for encoding the Standard into every AI-assisted change. Other tools apply at other stages: AI transcription for interviews, large-language-model synthesis of user stories and acceptance criteria, automated accessibility checks, and scaffolding against the GOV.UK Design System. Some Service Standard practices are not helped by AI at all. That is exactly where human effort should concentrate.

Why does encoding the GDS Service Standard matter for AI-augmented teams?

The Government Digital Service (GDS) Service Standard is a set of 14 principles the UK Government uses to assess digital services. It covers user research, multidisciplinary teams, iterative delivery, accessibility, security, open standards, and reliable operation. It is supported by the GDS Service Manual, the Technology Code of Practice, and the GOV.UK Design System.

The Standard is principles-based. That works well in a team where every researcher has run user interviews under the Service Manual’s guidance, every engineer has read the Technology Code of Practice, and every designer can recite the GOV.UK Design System from memory. It works less well in two situations that are increasingly common in 2026.

The first is AI-augmented delivery. AI coding agents apply whatever context you give them, fast. If the Service Standard is not in that context, the agent will happily ship code that ignores it. Accessibility gets bolted on at the end. Dependencies get pulled in without architectural review. User-facing language drifts from plain English. The same failure mode applies upstream: AI can draft user stories from interview notes, but if nobody has told it what good looks like, the stories will be competent and wrong.

The second is team scale. Principles-based standards depend on tribal knowledge, and tribal knowledge moves when people move. A rule written down once, in a file the whole team shares, is more durable than a memo circulated in a Slack channel six months ago.

Encoding the Service Standard into the delivery workflow solves both problems at once. The principles travel with the project. AI output inherits them. Human review becomes a confirmation step rather than a first-pass discovery.

The Standard itself is worth reading on its own terms, whether you are building a government service or a commercial one. See our companion guide The GDS Service Standard for Private-Sector Delivery Teams for a point-by-point map of what transfers and what does not.

Where does AI help across the Service Standard lifecycle?

AI is not a single tool applied once. It is a family of tools applied across the full delivery lifecycle. Each phase has different AI assists, different encoding mechanisms, and different human work that does not go away.

Discover. AI transcription tools produce usable transcripts of user interviews within minutes. Large-language-model tools synthesise those transcripts into themes, unmet needs, and candidate user stories. An AGENTS.md file can carry the team’s shared research conventions so that synthesis follows a consistent structure. The human work that remains is the interview itself: building trust, reading the unspoken cues, and deciding which needs are real.

Design. AI models scaffold wireframes, draft content, and generate first-pass information architecture. A rule scoped to user-facing code can require references to the GOV.UK Design System components rather than new inventions. The human work that remains is the service design judgement: which flow, which trade-offs, which edge cases matter.

Build. AI coding agents generate code against the specification the team has given them. Agent rules (in a Cursor .cursor/rules/ directory, or referenced from a CLAUDE.md for Claude Code) apply on every agent turn to encode accessibility, security, and Service Standard-adjacent standards. Agent skills (.cursor/skills/ for Cursor, Claude skills for Claude Code) handle recurring tasks such as generating Architecture Decision Records or scaffolding Design System components. The human work that remains is senior engineering review, architectural judgement, and the discipline to say no.

Iterate. AI helps draft rollback plans, feature-flag scaffolding, and pull-request descriptions. A rule can cap pull-request size so changes stay reversible. The human work that remains is the release decision and the stakeholder communication around it.

Operate. AI drafts runbooks, log-analysis queries, and post-incident reviews. A rule can require structured logging on every new endpoint. The human work that remains is the on-call judgement during an incident and the communication with affected users after it.

What are agent rules, skills, and project-level context files?

Before we get to worked examples, a brief glossary. A procurement team reading this should recognise the mechanism a mature supplier will reference.

Agent rules are versioned guardrails stored alongside the code. In Cursor they live in .cursor/rules/ as Markdown files with frontmatter; Claude Code reads the equivalent content from a CLAUDE.md at the repository root, and from any referenced rule or policy files. Each rule has a scope. Some rules are always applied on every agent turn. Others are path-scoped, activated only when the agent edits files matching a glob pattern. Typical rules on a Service Standard-aligned project include:

  • An accessibility rule applied to every front-end change
  • A secrets-and-licence rule scoped to dependency manifests
  • A content-style rule scoped to user-facing copy and Markdown
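As a concrete illustration, the second of these might look like the sketch below in Cursor's rule format. The file contents, frontmatter keys, and wording are illustrative; exact conventions vary by tool and version.

```markdown
---
description: Dependency hygiene for manifest changes
globs: ["package.json", "requirements.txt", "go.mod"]
alwaysApply: false
---

When adding or upgrading a dependency:

- Never commit secrets, tokens, or anything resembling a credential.
- Confirm the dependency passes the project's licence and
  vulnerability scans before proposing the change.
- Note the reason for the new dependency in the pull-request
  description, linking an Architecture Decision Record if one exists.
```

Because the rule is path-scoped through `globs`, it costs nothing on unrelated changes and activates exactly when a manifest is touched.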

Agent skills are named, discoverable capabilities the agent reads when triggered by keywords in the user request. In Cursor, skills live in .cursor/skills/, each with a SKILL.md that tells the agent when and how to use it; Claude Code uses the same SKILL.md convention for its own skills. Typical skills on a Service Standard-aligned project include:

  • A skill that generates an Architecture Decision Record from a chat-history discussion
  • A skill that scaffolds a GOV.UK Design System-compliant form
  • A skill that drafts a runbook template from a service’s architecture
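A skill of the third kind might be described roughly as follows. This SKILL.md is a sketch: the structure and wording here are illustrative, not a tool-mandated format.

```markdown
# Skill: Draft a runbook template

Use this skill when the user asks for a runbook for a service or
component.

Steps:
1. Read the service's architecture notes and list its dependencies.
2. Produce sections for: health checks, common failure modes,
   escalation contacts, and rollback steps.
3. Leave any detail you cannot verify as a clearly marked TODO for
   the on-call engineer to fill in.
```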

Skills differ from rules because they are opt-in to the task rather than opt-out. A rule says “always do X”. A skill says “when asked to do Y, do it like this”.

AGENTS.md and CLAUDE.md are project-level context files that live at the root of a repository. They provide a standing brief to any agent working in the repo: what the project is, what the conventions are, which rules apply, and how to run the build. A single AGENTS.md is usually enough to tie the always-applied rules, the path-scoped rules, the available skills, and the build and review expectations together.
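A minimal AGENTS.md along these lines might read as follows. Every file name, rule name, and convention below is illustrative.

```markdown
# AGENTS.md

## Project
Citizen-facing service built to the GDS Service Standard.

## Always-applied rules
- .cursor/rules/security.md (secrets, dependency scans)

## Path-scoped rules
- .cursor/rules/accessibility.md (front-end code only)
- .cursor/rules/content-style.md (user-facing copy and Markdown)

## Skills
See .cursor/skills/, notably "scaffold a Design System form".

## Build and review
Run the test suite before proposing changes. Keep pull requests
small; every change needs one human review.
```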

What all three have in common is that they are durable, versioned, and reviewable. They live in the repository, travel with the code, and can be read by any agent, any human reviewer, or any procurement auditor who asks to see them. That is what makes them useful for compliance work.

How can AI and encoded rules support each Service Standard point?

This section walks through eleven Service Standard points, grouped under nine headings. For each, it names three things: where AI helps directly; where a rule, skill, or Design System reference encodes the principle; and what remains human judgement. The examples are deliberately practical, not exhaustive.

Point 1: Understand users and their needs

Where AI helps. Transcription tools such as Otter, Rev, and Microsoft Copilot meeting notes turn interview recordings into accurate transcripts in minutes. Large-language-model tools such as Claude or ChatGPT synthesise those transcripts into draft themes, user stories, and acceptance criteria.

Where rules and skills encode the principle. A rule can require that any change touching user-facing code references a user story, a research finding, or a documented need. The rule is a forcing function: the agent refuses to proceed unless the calling engineer supplies the context. A skill can carry the team’s preferred story format so drafts come out in a consistent shape.

What remains human. The interview itself. Reading the unspoken cues. Deciding which needs are real and which are artefacts of the conversation. Deciding which needs the service will actually meet. None of this is transcription work.

Point 2: Solve a whole problem for users

Where AI helps. Large-language-model tools draft service maps and journey diagrams from research notes. They surface dependencies between user tasks that a busy team might otherwise miss.

Where rules and skills encode the principle. An AGENTS.md file encodes team norms about what “a whole problem” means for this service, so every agent-produced artefact lands in the right frame. A project-level skill can prompt the agent to check whether the current change closes or opens a loop in the user journey.

What remains human. Hiring, team composition, and the senior decision about where the service’s responsibility ends and another team’s begins.

Points 3 and 4: Joined-up experience and simplicity

Where AI helps. AI models draft consistent microcopy across channels, flag tone drift between pages, and generate first-pass information architecture. They are surprisingly good at spotting jargon an author has become blind to.

Where rules and skills encode the principle. A path-scoped rule on front-end code can require components from the GOV.UK Design System or the team’s own design system, with documented exceptions. A skill can scaffold a Design System-compliant form, including fieldsets, error summaries, and validation patterns. The Design System section below treats this in depth.

What remains human. Service design judgement. User-flow decisions. Deciding where a bespoke pattern is justified because the user research supports it, versus where “we need a custom thing” is a red flag.

Point 5: Make sure everyone can use the service

Where AI helps. AI drafts alt text for images, generates descriptive link text, and suggests semantic HTML replacements for divs-and-styles layouts. Automated tools such as axe and pa11y run in continuous integration and surface failures within seconds.

Where rules and skills encode the principle. An accessibility rule applied to every front-end change can require that any user-interface pull request either ships with passing automated checks or documents the exception. A skill can run accessibility tests against the changed pages and surface failures before human review. Extending the rule to require Web Content Accessibility Guidelines (WCAG) 2.2 AA evidence, in line with the Public Sector Bodies Accessibility Regulations 2018, is a small addition.
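One way to wire the automated half of this into continuous integration is a pa11y-ci configuration along these lines. The URLs are placeholders, and note the limitation: pa11y's WCAG2AA standard tests against WCAG 2.1 AA criteria, so WCAG 2.2-specific checks still need to be covered separately.

```json
{
  "defaults": {
    "standard": "WCAG2AA",
    "timeout": 10000
  },
  "urls": [
    "http://localhost:3000/",
    "http://localhost:3000/apply",
    "http://localhost:3000/apply/check-answers"
  ]
}
```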

What remains human. Testing with real assistive-technology users. Automated tools catch perhaps 30 to 40 per cent of accessibility issues; real testing finds the rest. No Service Standard service should ship without it.

Point 8: Iterate and improve frequently

Where AI helps. AI drafts pull-request descriptions, feature-flag scaffolding, and rollback plans. It is particularly good at spotting the “one more change” pattern that inflates a pull request beyond what a reviewer can reason about.

Where rules and skills encode the principle. A rule can cap pull-request size and require that each change ship behind a feature flag where appropriate. A skill can draft feature-flag scaffolding and rollback runbooks from a natural-language description.
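The size cap only bites if continuous integration can measure it. A minimal sketch of such a check, assuming a unified diff as input and an illustrative 400-line cap:

```python
# Sketch: fail CI when a change exceeds the team's agreed size cap.
# The 400-line cap and the diff-as-string input are assumptions.

def change_size(diff: str) -> int:
    """Count added and removed lines in a unified diff."""
    return sum(
        1
        for line in diff.splitlines()
        if (line.startswith("+") or line.startswith("-"))
        and not line.startswith(("+++", "---"))  # skip file headers
    )

def within_cap(diff: str, cap: int = 400) -> bool:
    return change_size(diff) <= cap

small = "+added line\n-removed line\n context"
print(change_size(small))   # 2
print(within_cap(small))    # True
```

In practice the diff would come from the version-control system, and a failure would block the merge rather than print a boolean.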

What remains human. The release decision. The stakeholder communication. The judgement call about whether this change is safe to roll to production on a Friday afternoon.

Point 9: Create a secure service

Where AI helps. AI drafts first-pass threat models from architecture descriptions. It summarises security advisories and suggests mitigation steps. It is a useful rubber duck for “what could go wrong with this endpoint?”

Where rules and skills encode the principle. Several concrete rules help here. A rule can block committed secrets or anything that looks like a credential. A rule can require that any new dependency pass a licence scan and a vulnerability scan before merge. A rule can require an ISO 27001 review marker on pull requests that touch authentication, authorisation, or personal-data handling. For AI-specific supply-chain questions, see our AI code attribution guide.
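These rules are enforceable only with matching continuous-integration jobs. A sketch using two commonly used GitHub Actions follows; the action versions and severity threshold are illustrative, and gitleaks-action may require a licence key on organisation accounts.

```yaml
name: security-checks
on: [pull_request]

jobs:
  secret-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # full history, so the scan covers every commit
      - uses: gitleaks/gitleaks-action@v2

  dependency-review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/dependency-review-action@v4
        with:
          fail-on-severity: moderate   # example threshold, not a recommendation
```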

What remains human. Threat modelling the system, not just the endpoint. Senior security review. The decision to say no to a feature whose security trade-offs are not yet understood.

Points 11 and 12: Choose the right tools, make source open

Where AI helps. AI drafts Architecture Decision Records (ADRs) from chat-history reasoning, so the decision becomes a visible artefact rather than an implicit one. AI can also generate attribution metadata and attach it to pull requests, consistent with our AI code attribution guide.

Where rules and skills encode the principle. A rule can require an ADR for any new dependency, service, or infrastructure component. A skill can draft the ADR and present it for human sign-off. Another rule can require AI attribution metadata on every change, so code intended to be open is accompanied by a clear record of how it was produced.

What remains human. Architectural judgement. The intellectual-property decisions around open source. The commercial and policy trade-offs around which components to open and which to keep closed. For a wider discussion, see Who Owns AI-Written Code?.

Point 13: Use and contribute to open standards, common components and patterns

Where AI helps. AI surfaces when a proposed component duplicates an existing open-source or in-house pattern, which is exactly the duplication Point 13 is designed to prevent. It drafts contribution-back proposals when a fix applied locally would also improve an upstream library, and it writes the accompanying pull-request narrative for submission. Given an API design, it will flag where a proprietary format has been proposed in place of an open standard such as OpenAPI, JSON Schema, iCal, or a recognised domain-specific standard.

Where rules and skills encode the principle. A rule on user-interface code can require a reference to a matching GOV.UK Design System component or an equivalent in-house pattern before a new component is accepted, with a documented exception where none exists. A rule on data-exchange code can require an open standard for new endpoints, with any proprietary format flagged for justification. A skill can scaffold a contribution back to a dependency, including the changelog entry, licence compatibility check, and governance statement the upstream maintainer will need.

What remains human. Choosing which standard to adopt when two overlap. Deciding when a genuinely new pattern is justified and proposing it back to the community. Maintaining relationships with the upstream projects a service depends on. These are small, high-leverage calls a team makes once and lives with for years.

Point 14: Operate a reliable service

Where AI helps. AI drafts runbooks from a service’s architecture, generates post-incident-review first drafts from incident notes, and writes log-analysis queries from natural-language prompts. During an incident, it summarises noisy logs into candidate hypotheses faster than a human can read them.

Where rules and skills encode the principle. A rule can require that new endpoints ship with structured logging and a health check. A skill can generate a runbook template from the service’s architecture, and another can produce a post-incident review draft from incident notes.
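To make "structured logging" concrete, such a rule might require each endpoint to emit machine-readable JSON log lines of roughly this shape. The field names are illustrative.

```python
import json
import logging
import sys

# Minimal logger setup; a real service would configure this once.
logger = logging.getLogger("service")
logger.addHandler(logging.StreamHandler(sys.stdout))
logger.setLevel(logging.INFO)

def log_event(event: str, **fields) -> str:
    """Emit one JSON log line; return it so checks can assert on it."""
    line = json.dumps({"event": event, **fields}, sort_keys=True)
    logger.info(line)
    return line

line = log_event("health_check", status="ok", latency_ms=12)
# Emits: {"event": "health_check", "latency_ms": 12, "status": "ok"}
```

Structured lines like this are what make the AI-drafted log-analysis queries above reliable: a query can filter on fields rather than scrape free text.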

What remains human. The on-call judgement during an incident. The decision to declare, escalate, or stand down. The communication with affected users. These are high-stakes judgement calls under pressure; AI produces drafts, humans make calls.

How does the GOV.UK Design System pair with rules and skills?

The GOV.UK Design System is the clearest available target for encoding into a Service Standard-aligned rulepack. It is a mature, public, living set of components, patterns, and styles published by the Government Digital Service. It operationalises several Service Standard points at once: Point 3 (joined-up experience), Point 4 (simplicity), and Point 5 (accessibility). Every component in the Design System has already been user-tested and accessibility-audited, which reduces the judgement load on both the AI agent and the reviewer.

The Design System also sits inside a clear documentation hierarchy: the Service Manual describes the practice, the Service Standard describes the bar, and the Design System provides the building blocks. Teams building outside central government can adopt the Design System directly or draw on its patterns; it is published under the Open Government Licence.

Concrete rule and skill examples that encode the Design System into AI-augmented delivery:

  • A path-scoped rule on user-facing code (for example .astro, .tsx, .njk) that requires a reference to a Design System component or a documented exception in an Architecture Decision Record. The rule prevents the agent from inventing new form patterns when a Design System equivalent already exists.
  • A skill named “scaffold a Design System form” that generates GOV.UK-compliant fieldsets, error summaries, and validation patterns from a natural-language description. The skill encodes the Design System’s published guidance into a reliable agent workflow.
  • A skill named “review user-interface pull requests against the Design System” that flags custom components, inconsistent type scale, or inaccessible patterns before the human reviewer opens the pull request. This moves the easiest half of a design review earlier in the process.
  • A rule that forbids inventing custom visual elements when a Design System equivalent exists, requiring a link to the relevant Design System component page in the pull-request description.
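The first and fourth of these could be written as a single path-scoped rule along the following lines. This is a sketch; the globs and wording are illustrative.

```markdown
---
description: Prefer GOV.UK Design System components in user-facing code
globs: ["**/*.astro", "**/*.tsx", "**/*.njk"]
alwaysApply: false
---

Before creating any new user-interface component:

1. Check the GOV.UK Design System for an existing component or pattern.
2. If one exists, use it, and link the relevant Design System component
   page in the pull-request description.
3. If none exists, stop: a documented exception in an Architecture
   Decision Record is required before proceeding.
```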

An honest caveat. The agent still needs a human on the team who has delivered against the Design System before and can tell when defaults are being stretched beyond their intent. Rules narrow the search space; they do not replace that judgement. The Design System is well enough documented that this judgement concentrates on a few edge cases, but those edge cases are where good services live or die.

What does a starter GDS-aligned rulepack look like?

A rulepack is not a single file. It is a repository layout. The starter layout has four elements.

The rules directory (.cursor/rules/ for Cursor, or rule files referenced from CLAUDE.md for Claude Code) holds rule files, each one covering a specific concern. Rules are scoped through frontmatter: an alwaysApply: true rule applies on every agent turn, while a globs or paths frontmatter key limits a rule to files matching a pattern. This is how an accessibility rule can apply only to front-end code, while a content-style rule applies to all written content.

The skills directory (.cursor/skills/ for Cursor, Claude skills for Claude Code) holds skills the agent uses on demand. Each skill has a SKILL.md with a short description the agent reads to decide whether the skill applies. Skills are good for things a rule would make too heavy. Examples include:

  • Producing a specific diagram
  • Generating a runbook
  • Scaffolding a Design System form
  • Driving a known workflow end-to-end

An AGENTS.md or CLAUDE.md at the repository root ties it together. It tells any agent working in the repo which rules are always applied, where the skills live, and what the project conventions are. A single context file is usually enough to make the rest of the rulepack discoverable and applied.
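Assembled, the first three elements might sit in a repository like this. Every file name below is illustrative.

```
repo-root/
├── AGENTS.md                        # standing brief for any agent
├── .cursor/
│   ├── rules/
│   │   ├── security.md              # alwaysApply: true
│   │   ├── accessibility.md         # globs: front-end code only
│   │   └── content-style.md         # globs: copy and Markdown
│   └── skills/
│       ├── scaffold-design-system-form/
│       │   └── SKILL.md
│       └── generate-runbook/
│           └── SKILL.md
└── .github/
    └── workflows/
        └── ci.yml                   # checks mirroring the rules
```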

A continuous-integration pipeline that references the rules. Rules in a repository are only useful if they are applied. Continuous integration (CI) should check that pull requests satisfy the same constraints the rules encode:

  • Licence scans
  • Accessibility tests
  • Secret scans
  • Required review approvals

A rule without a corresponding CI check is a recommendation. A rule with a CI check is enforceable.
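One way to make a rule enforceable is to mirror it as a continuous-integration job and mark that job as a required status check in branch protection, so a pull request cannot merge without it. A sketch, with illustrative job names, commands, and local URL:

```yaml
name: rulepack-checks
on: [pull_request]

jobs:
  accessibility:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm start & npx wait-on http://localhost:3000
      - run: npx pa11y-ci   # fails the job, and therefore the merge, on violations
```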

The flow from Service Standard principles, through this layout, into a shipped change is summarised in the accompanying diagram; the prose above and below it carries the same information.

Maintenance of the rulepack is itself a practice. Rules drift when the Service Manual updates, when dependencies change, or when the team discovers a new failure mode. A quarterly review, timed to an existing cadence such as an architecture review board or a platform engineering retrospective, keeps the rulepack honest and connected to the rest of the team’s delivery tooling.

Where does AI not help, and where should human effort concentrate?

This is the honest section, and it is important. The items in this list are not apologies for what AI cannot do. They are the positive case for where senior team time should concentrate, precisely because AI does not lighten the load there.

User research. Point 1 requires teams to understand users. AI handles the transcripts and the first-pass synthesis. Research itself (building trust in an interview, reading the unspoken cues, deciding which needs are real) remains human work. A delivery plan that underweights user research because AI “writes the stories” is a delivery plan that ships a fluent misunderstanding.

Service assessments. Assessments are structured conversations with experienced assessors who probe the team’s understanding. A rulepack is evidence that a team takes the Standard seriously; it is not evidence that the team has passed an assessment.

Senior engineering judgement. A rule can block committed secrets. A rule cannot tell you that your data model is wrong, that your architecture will not scale, or that the problem you are solving is the wrong problem. These remain human calls.

Leadership, sponsorship, and cross-organisational negotiation. The Service Standard assumes a senior sponsor who owns the outcome, a multidisciplinary team that can make trade-offs, and stakeholders who can be brought along. None of this work is transcription, drafting, or scaffolding. It is human diplomacy at speed.

Incident communications with affected users. During an incident, AI drafts the status update. The human decides whether to send it, to whom, and in what tone. For a public-facing service under pressure, this is one of the highest-stakes pieces of writing a team will do. Nobody wants to read an AI-drafted apology.

Service design calls that require trade-offs between competing needs. Every interesting service contains conflicts: accessibility against flow, simplicity against capability, speed against reliability. AI models are good at drafting both sides of the argument. They do not make the call.

Rules drift. A rule written two years ago may not reflect current practice. A rule referencing WCAG 2.1 AA needs updating to 2.2. A rule referencing a retired dependency scanner needs updating to a current one. Without a maintenance practice, a rulepack ages into noise. Rule maintenance itself is a Service Standard-adjacent activity and needs a named owner.

A delivery plan that recognises all of the above and invests accordingly is a plan that ships a service to the Standard. A plan that treats AI as a substitute for any of these items is the risk pattern.

What UK public-sector AI governance applies to this work?

A Service Standard-aligned service using AI in delivery sits inside a wider UK public-sector governance stack. A supplier who cannot cite the relevant pieces is not ready. The main references a 2026 delivery plan should acknowledge are:

  • The Generative AI Framework for HM Government
  • The National Cyber Security Centre (NCSC) guidelines for secure AI system development
  • The Algorithmic Transparency Recording Standard (ATRS)
  • UK GDPR, including Data Protection Impact Assessments (DPIAs) where AI processes personal data

Where AI is applied upstream in user research, two practical consequences follow. First, US-based transcription or synthesis services may not be appropriate without a data-transfer assessment and an explicit lawful basis. UK or EU data-residency equivalents should be considered. Second, participant information sheets and consent forms should make AI use explicit, and the DPIA should describe it. These are cheap to get right on day one and expensive to retrofit.

A rule and skill layer helps here too. An AGENTS.md file can carry the project’s position on which AI tools are approved for which categories of data, so the agent does not suggest an unapproved one mid-task. A skill can generate a DPIA draft for an AI-assisted research plan, ready for the Data Protection Officer to review. A rule can require a DPIA reference on any pull request that introduces a new AI processing step against personal data.

What should procurement teams ask suppliers about this?

If you are procuring an AI-augmented supplier for a Service Standard-aligned service, the following questions surface whether the supplier’s approach is real or performative.

  • “Can we see your rules and skills repository?” A mature supplier should be able to share or demonstrate this within a working day.
  • “Which of your rules map to which Service Standard points?” A good answer identifies specific rules tied to specific points. A vague answer is a flag.
  • “Who reviews and updates your rulepack, and how often?” Quarterly is a reasonable cadence. Never is a red flag.
  • “How do you prevent AI-generated code from bypassing your rules?” A good answer cites required status checks in continuous integration, branch protection rules, and documented review processes.
  • “How do you handle user research, service design, and service assessments, given AI does not substitute for these?” A mature supplier will name the senior people accountable for each.
  • “How do you apply the Generative AI Framework for HMG, NCSC secure AI guidelines, and the Algorithmic Transparency Recording Standard to this engagement?” A mature supplier can cite each by name and show how it maps to their delivery practice.
  • “Where AI processes personal data in user research or operation, can you show the DPIA, the lawful basis, the data-residency position, and the participant consent wording?” This is the compliance floor, not an advanced question.
  • “Can you provide a sample evidence pack from a recent delivery, showing the rulepack, a completed pull request, and the human review record?”
  • “Who owns the maintenance of the rulepack when key people leave the project?”

A supplier who is vague on these questions in 2026 is an AI risk, not an AI advantage. The questions are also a fair self-check for in-house teams evaluating their own AI-augmented delivery practice.

For related procurement questions on AI-generated code specifically (intellectual property, licence compliance, attribution evidence), see our AI code attribution guide for enterprise procurement and the companion article Who Owns AI-Written Code?.

How do we use this approach at Talk Think Do?

Talk Think Do maintains a shared rules and skills layer across delivery. It covers:

  • Accessibility
  • Content and copywriting
  • Performance
  • Security posture
  • Engineering conventions

Several of these overlap directly with Service Standard principles: accessibility (Point 5), content clarity (Point 4), performance and reliable operation (Point 14), and open standards and common components (Point 13). On government engagements we layer on additional rules and references to the GOV.UK Design System, the Service Manual, and the Technology Code of Practice.

Upstream of the codebase, we use AI transcription and synthesis to turn interview notes into draft stories and acceptance criteria in hours rather than days, while keeping interview work itself firmly human. Downstream, continuous integration enforces the same constraints the rules encode, so AI-generated code cannot bypass the standards the team has agreed.

For the deeper detail of how this fits into our broader AI delivery practice, see our AI approach, The AI Velocity Report, Shipping AI in the Real World, and Why We Don’t Let AI Ship Code Unsupervised.

This approach underpins how we deliver mission-critical public-sector systems, including our work on the UK Livestock Unique Identification System (LUIS) and Department for Education modernisation. It is how we keep senior engineering judgement compounding across every change, rather than leaving Service Standard alignment to the memory of whoever picks up the next pull request.

If you are building or maintaining a service to the GDS Service Standard, or your procurement window requires evidence of AI-augmented delivery discipline, book a consultation or see our Government and Public Sector practice.

Frequently asked questions

What is the GDS Service Standard?
The Government Digital Service (GDS) Service Standard is a 14-point set of principles the UK Government uses to assess digital services built or funded by central government. Points cover user research, multidisciplinary teams, iterative delivery, accessibility, security, open standards, and reliable operation. The Standard is principles-based rather than a tick-box checklist. It is supported by the GDS Service Manual, the Technology Code of Practice, and the GOV.UK Design System.
How does AI help with user research under the Service Standard?
AI helps with the scaffolding around user research, not the research itself. Transcription tools (Otter, Rev, Microsoft Copilot meeting summaries) produce usable transcripts of interviews in minutes rather than hours. Large-language-model tools such as Claude or ChatGPT synthesise research notes into draft user stories and acceptance criteria in minutes rather than days. Human researchers still hold the interview, build trust, spot the unspoken cues, and decide which needs are real and which are artefacts of the conversation. AI accelerates the paperwork; it does not substitute for the practice.
What are agent rules and skills?
Agent rules are versioned guardrails stored in a repository that an AI coding agent applies automatically, either on every turn (always-applied) or when editing files matching a glob pattern. Agent skills are named, discoverable capabilities that the agent reads when triggered by keywords in the user request, such as generating a specific diagram or running a particular workflow. In Cursor, rules live in .cursor/rules/ and skills in .cursor/skills/; Claude Code reads similar content from a CLAUDE.md at the repo root plus Claude skills. Together with a shared AGENTS.md at the root, they form a durable, versioned encoding of a project's delivery standards.
How does the GOV.UK Design System fit with agent rules and skills?
The GOV.UK Design System is the clearest target for encoding into rules and skills. A rule scoped to user-facing code can require a Design System component reference on every new user-interface change, or a documented exception. A skill can scaffold a Design System-compliant form, including fieldsets, error summaries, and validation patterns. Another skill can review a user-interface pull request against the Design System and flag custom components, inconsistent type scale, or inaccessible patterns. Because the Design System is public, mature, and already documents the accessibility of each component, it reduces the judgement load on both the agent and the reviewer.
Which Service Standard points are best suited to AI and rule-based encoding?
The points with concrete engineering or artefact consequences encode well. Understanding users (Point 1) is supported by transcription, story drafting, and a rule that requires a story reference on every change. Accessibility (Point 5) is supported by automated accessibility checks and an accessibility rule. Iteration (Point 8) is supported by pull-request-size rules and rollback runbook skills. Security (Point 9) is supported by secret-scan and licence-scan rules. Choice of tools and open source (Points 11 and 12) are supported by Architecture Decision Record rules and AI attribution metadata. Open standards and common components (Point 13) are supported by rules that require a Design System or open-standard reference before new components or formats are accepted. Reliable operation (Point 14) is supported by runbook-generation skills and structured-log rules. Points about user research, team composition, and success definition benefit from AI drafting but remain human-led.
Where does AI not help with the Service Standard, and where should human effort concentrate?
AI does not sit on a service assessment panel, conduct user research, make senior engineering judgement calls, lead stakeholder negotiation, communicate with affected users during an incident, or write the tone-critical bits of policy. These are the practices that generate Service Standard compliance rather than the artefacts that document it. A delivery plan that underweights these items is the risk pattern. Because AI does not lighten the load here, this is where senior team time should concentrate.
What should procurement teams ask an AI-augmented supplier about Service Standard readiness?
Ask to see the supplier's rules and skills repository. Ask which of their rules map to which Service Standard points. Ask who reviews and updates the rulepack, and how often. Ask how they prevent AI-generated code from bypassing the rules through required continuous-integration status checks. Ask how they handle user research and service assessments (practices AI does not substitute for). Ask for a sample evidence pack from a recent delivery showing the rules, a completed pull request, and the human review record. A supplier who cannot answer these clearly is an AI risk, not an AI advantage.
