AI code attribution is the practice of recording which code was AI-assisted, which model produced it, and what human review occurred. This guide recommends a repo-level model log (an ai-tools.md or .ai-manifest.json in the repo root) as the durable record, supplemented by PR-level notes for reviewer context. It covers CI/CD enforcement, SBOM integration, and what procurement teams should require from suppliers.
What is AI code attribution and why does it matter?
AI-augmented development is now standard practice across enterprise software teams. The question is no longer whether AI writes code, but whether you can prove what it wrote, under which terms, and with what human oversight.
Attribution matters for three reasons.
Legal defensibility. In the UK, the CDPA 1988 s.9(3) deems the author of a computer-generated work to be “the person by whom the arrangements necessary for the creation of the work are undertaken.” To invoke that provision, you need evidence of human direction, review, and editorial decisions. Attribution metadata creates that evidence. For a deeper look at UK copyright law and AI code, see our UK legal guide for CTOs.
Supply chain transparency. Enterprise buyers increasingly require visibility into how software was produced. Government frameworks (G-Cloud, DOS), ISO 27001 audits, and regulated industries (financial services, education) expect suppliers to disclose AI tool usage and demonstrate governance. Attribution provides the audit trail.
Open-source licence compliance. AI models can reproduce training data, including code under GPL, MIT, Apache, or other licences. Without attribution and scanning, you risk shipping code with undisclosed licence obligations. For more on this risk, see Who Owns AI-Written Code?.
A practical attribution framework
Attribution does not require new tools or major process changes. It requires consistent metadata at a small number of touchpoints in your development workflow. The challenge is choosing an approach that is realistic to maintain, durable across platforms, and useful when it matters (audits, disputes, procurement reviews).
Our recommendation: repo-level model log plus PR hints
Based on our work across multiple enterprise projects, we recommend an approach with two layers.
Layer 1: A repo-level model log (the permanent record). Keep an ai-tools.md or .ai-manifest.json file in the repository root. This file records which AI tools and model versions are approved and in use on the project, along with the licence terms that govern them. Because it lives in the repo, it is versioned, portable, and survives platform migrations.
```json
{
  "tools": [
    {
      "name": "Claude Code",
      "model": "Sonnet 4.6",
      "provider": "Anthropic",
      "termsUrl": "https://www.anthropic.com/terms",
      "termsReviewedDate": "2026-03-15"
    }
  ],
  "licenceScanTool": "FOSSA",
  "lastPolicyReview": "2026-04-01"
}
```
Review and update this file quarterly or when tools change. The file answers the question every auditor and procurement team will ask: “What AI tools touched this codebase, and under what terms?”
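One way to keep the manifest honest is a small CI check that fails when the review date goes stale. The sketch below assumes the field name shown in the example above (lastPolicyReview) and a quarterly cadence; adapt both to your own manifest.

```python
import json
from datetime import date, timedelta

MAX_AGE = timedelta(days=90)  # quarterly review cadence

def manifest_is_current(manifest: dict, today: date) -> bool:
    """Check that the manifest's lastPolicyReview date is within the review window."""
    reviewed = date.fromisoformat(manifest["lastPolicyReview"])
    return today - reviewed <= MAX_AGE

# In a CI step, load the file and fail the job when the check returns False, e.g.:
#   manifest = json.load(open(".ai-manifest.json"))
#   assert manifest_is_current(manifest, date.today()), "AI policy review overdue"
```

Wiring this into the pipeline turns the quarterly review from a good intention into an enforced deadline.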
Layer 2: PR-level notes (the reviewer hint). Add a required field to your PR template so that every pull request records whether AI tools were used and, if so, which ones:
```markdown
## AI usage
- [ ] No AI tools used in this PR
- [ ] AI-assisted (tool: ___, model: ___, human review: yes/no)
```
This gives reviewers immediate context when assessing a change. It also creates a lightweight, searchable record of AI involvement per change.
Why this combination works. The repo-level file is the durable record. It travels with the code. The PR notes add change-level context where it is most useful: during review. Together, they cover both “what tools are used on this project?” and “was AI involved in this specific change?” without requiring heavy per-commit ceremony.
Why not commit-level attribution?
Some frameworks recommend tagging every commit with AI tool metadata, either as a message prefix ([AI-assisted]) or as a Git trailer (AI-Tool: Claude Code). In theory, this creates the most granular audit trail. In practice, it is unreliable.
- You cannot guarantee compliance. Developers switch between AI and manual coding within a single session. Expecting every commit to carry accurate metadata is aspirational, not enforceable.
- It is hard to apply retrospectively. If a commit is missed, there is no clean way to add the metadata after the fact without rewriting history.
- Tooling does not support it consistently. Not all AI coding tools add commit trailers automatically. Relying on a convention that depends on manual discipline undermines the point of automation.
- It creates noise. In a codebase where AI assists on most changes, an [AI-assisted] prefix on 80% of commits adds metadata without adding signal.
Commit conventions are not harmful. If your team already uses them, keep them. But they should not be the foundation of your attribution strategy. The repo-level model log is more reliable, and the PR template is more useful to reviewers.
A note on PR portability
Pull requests are a platform feature, not a Git feature. PR descriptions, comments, and review threads are stored by GitHub, Azure DevOps, or GitLab, not in the repository itself. If you migrate your repository to a different platform, fork it, or detach a fork from its network, PR metadata does not come with it.
This is why we recommend the repo-level model log as the primary record. It is part of the Git history and goes wherever the code goes. PR notes are valuable during active development and review, but they should not be your only attribution record.
Licence scanning gate
Run an automated open-source licence scan on every pull request. This is the most important technical control. It catches training data contamination before it reaches production.
Tools that work well in enterprise CI/CD:
- FOSSA for licence compliance and attribution reports
- Snyk Open Source for licence and vulnerability scanning
- GitHub Advanced Security (GHAS) for secret scanning and code scanning
- Mend (formerly WhiteSource) for licence policy enforcement
Configure the scan as a required status check that blocks merge. This makes compliance structural rather than relying on individual discipline.
Human review records
Every AI-generated code change should be reviewed by a qualified engineer before merge. Record this review in the standard PR approval process. The key is that the approval is documented, timestamped, and linked to the specific code changes.
For regulated environments, consider a dedicated “AI review” approval alongside the standard code review. This creates a clear audit trail showing that AI output was explicitly assessed for correctness, security, and licence compliance.
Integrating attribution into your CI/CD pipeline
The controls above need enforcement. Without automation, conventions decay. Here is how to wire them into GitHub Actions (the same principles apply to Azure DevOps or GitLab CI).
Licence scan as required check
Configure your licence scanning tool as a required status check on your main branch. In GitHub:
- Add the scanner’s GitHub App or Action to the repository
- Under Settings > Branches > Branch protection rules, add the scan as a required status check
- Set the policy to block merge on copyleft or unknown licences
The scan should flag, at minimum, GPL-family licences that impose copyleft obligations on surrounding code.
PR template enforcement
Use a GitHub Actions workflow or Azure DevOps policy to check that the AI usage section of the PR template is completed. A simple regex check on the PR body can validate that the checkbox section is present and at least one option is selected.
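As a sketch of that check, the function below looks for the AI usage heading and verifies that at least one checkbox is ticked. It assumes the exact template wording suggested earlier in this guide; in GitHub Actions you would pass it the PR body from the event payload (github.event.pull_request.body).

```python
import re

# Matches a checked markdown checkbox ("- [x]") at the start of a line.
CHECKED = re.compile(r"^\s*-\s*\[[xX]\]", re.MULTILINE)

def ai_section_completed(pr_body: str) -> bool:
    """Return True if the AI usage section is present and one option is ticked.

    A deliberately simple check: it scans everything after the heading, so
    adjust the heading text and pattern if your template differs.
    """
    _, heading, section = pr_body.partition("## AI usage")
    return bool(heading) and bool(CHECKED.search(section))
```

A workflow step that calls this and exits non-zero when it returns False can then be configured as a required (or advisory) status check.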
Pipeline summary
The full pipeline adds two checks to your existing CI:
- Licence scan (blocking)
- PR template validation (advisory or blocking)
Combined with the repo-level model log, standard code review, and test suites, these checks create a layered attribution system that operates at the speed of normal development.
What should procurement teams require from suppliers?
If you are buying software that may include AI-generated code, you need evidence of governance, not just a policy document. Here is what to ask for.
Contract clauses
- AI disclosure: The supplier must disclose which AI tools are used in development, including model names and versions.
- Licence warranty: The supplier warrants that all AI-generated code has been scanned for open-source licence compliance and that no undisclosed licence obligations exist.
- IP indemnity: The supplier indemnifies the buyer against claims arising from AI-generated code, including copyright infringement and licence violations.
- Human review: The supplier warrants that all AI-generated code has been reviewed by a qualified engineer before delivery.
Evidence pack
Request a sample evidence pack from a recent delivery. A mature supplier should be able to provide:
- The repo-level model version log (ai-tools.md or .ai-manifest.json) for the project
- A completed PR with AI usage notes visible in the description
- A licence scan report showing clean results
- The AI usage policy that governs their development team
Vendor questionnaire additions
Add these questions to your standard vendor assessment:
- Do you use AI coding tools in development? If yes, which ones?
- How do you track which code is AI-assisted?
- Do you run licence scans on every PR? What tool do you use?
- Can you provide a sample attribution report from a recent project?
- Do you maintain a model version log? How often is it updated?
- What is your policy on human review of AI-generated code?
Suppliers who cannot answer these questions clearly may not have the governance controls you need.
Attribution and software bill of materials
A Software Bill of Materials (SBOM) is an inventory of components in a software product. SBOMs are already standard for dependency tracking and vulnerability management. AI provenance is the next layer.
What standards support AI metadata?
CycloneDX introduced a machine-learning-model component type in version 1.5. This allows you to record AI models as components in the SBOM, including the model name, version, provider, and any relevant licence information.
SPDX 3.0 added dedicated AI and Dataset profiles to the specification. These support recording training data, model provenance, and usage context.
Both formats are still maturing their AI-specific capabilities. Tooling support is emerging but not yet universal.
Practical SBOM integration today
Even before tooling fully catches up, you can extend your existing SBOM process:
- Add AI tools as components. Record each AI coding tool as a development-time dependency in your SBOM. Include the model version and provider.
- Flag AI-assisted components. Where possible, annotate components or modules that were substantially AI-generated. This is manual today but will become automated as tooling improves.
- Include licence scan results. Attach the licence scan output as a supplementary document alongside the SBOM.
The goal is not perfection. It is a clear, honest record that demonstrates governance and improves over time.
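The first step above can be sketched by mapping entries from the repo-level manifest to CycloneDX-style components. The field names follow the manifest example earlier in this guide, and the output is illustrative rather than schema-validated; check it against the CycloneDX version your tooling supports.

```python
def tool_to_component(tool: dict) -> dict:
    """Map one entry from the repo's AI manifest to a CycloneDX-style component."""
    return {
        "type": "machine-learning-model",  # ML component type (CycloneDX 1.5+)
        "name": tool["name"],
        "version": tool["model"],
        "supplier": {"name": tool["provider"]},
        "externalReferences": [
            {"type": "license", "url": tool["termsUrl"]},
        ],
    }

def build_components(manifest: dict) -> list[dict]:
    """Emit one component per AI tool recorded in the manifest."""
    return [tool_to_component(t) for t in manifest["tools"]]
```

Because the manifest is already the single source of truth for which tools are in use, generating these components from it keeps the SBOM and the repo record consistent.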
What regulated industries need to know
Different sectors have different expectations. Here is how AI code attribution maps to common compliance frameworks.
Financial services
The FCA and PRA expect firms to understand and manage risks from AI and automated systems. AI code attribution supports:
- Model risk management: Documenting which AI models contributed to code in trading, risk, or customer-facing systems
- Operational resilience: Demonstrating that AI-generated components have been reviewed and tested
- Third-party risk: Requiring attribution evidence from software suppliers
Government and public sector
G-Cloud and Digital Outcomes and Specialists (DOS) frameworks increasingly ask about AI usage. The Government Digital Service (GDS) Service Standard requires teams to understand their technology choices. AI code attribution provides:
- Transparency: A clear record of how AI was used in service delivery
- Assurance: Evidence that AI-generated code meets the same quality and security standards as human-written code
- Supplier governance: A framework for evaluating supplier AI practices during procurement
Education
Education technology handles sensitive pupil data and is subject to DfE data protection requirements. AI code attribution matters because:
- Data handling code: Code that processes pupil data must be traceable and auditable
- Procurement: Multi-academy trusts and local authorities need assurance that suppliers govern AI usage
- GDPR compliance: AI involvement in code that processes personal data should be documented as part of the data processing record
Where to start
If you have no AI attribution in place today, start with these three steps:
- Create a model version log in each repository. Add an ai-tools.md or .ai-manifest.json to the repo root listing every AI tool, model version, and licence terms link. This is the durable record that travels with the code.
- Enable a licence scanning gate on your main branch. This is the highest-value technical control.
- Add a PR template field for AI usage disclosure. This gives reviewers immediate context and creates a searchable per-change record.
These three steps cover the minimum viable attribution framework. Add SBOM integration and commit conventions as your process matures.
For a deeper dive into the legal context, read our guides on AI code ownership in the UK and who owns AI-written code.
To see how we apply these practices in our own delivery, visit our AI approach page or explore our Claude Code development and GitHub Actions CI/CD services.