AI code attribution is the practice of recording which code was AI-assisted, which model produced it, and what human review occurred. This guide recommends a repo-level model log (an ai-tools.md or .ai-manifest.json in the repo root) as the durable record, supplemented by PR-level notes for reviewer context. It covers CI/CD enforcement, SBOM integration, and what procurement teams should require from suppliers.
What is AI code attribution and why does it matter?
AI-augmented development is now standard practice across enterprise software teams. The question is no longer whether AI writes code, but whether you can prove what it wrote, under which terms, and with what human oversight.
Attribution matters for three reasons.
Legal defensibility. In the UK, the CDPA 1988 s.9(3) deems the author of a computer-generated work to be “the person by whom the arrangements necessary for the creation of the work are undertaken.” To invoke that provision, you need evidence of human direction, review, and editorial decisions. Attribution metadata creates that evidence. For a deeper look at UK copyright law and AI code, see our UK legal guide for CTOs.
Supply chain transparency. Enterprise buyers increasingly require visibility into how software was produced. Government frameworks (G-Cloud, DOS), ISO 27001 audits, and regulated industries (financial services, education) expect suppliers to disclose AI tool usage and demonstrate governance. Attribution provides the audit trail.
Open-source licence compliance. AI models can reproduce training data, including code under GPL, MIT, Apache, or other licences. Without attribution and scanning, you risk shipping code with undisclosed licence obligations. For more on this risk, see Who Owns AI-Written Code?.
A practical attribution framework
Attribution does not require new tools or major process changes. It requires consistent metadata at a small number of touchpoints in your development workflow. The challenge is choosing an approach that is realistic to maintain, durable across platforms, and useful when it matters (audits, disputes, procurement reviews).
Our recommendation: repo-level model log plus PR hints
Based on our work across multiple enterprise projects, we recommend an approach with two layers.
Layer 1: A repo-level model log (the permanent record). Keep an ai-tools.md or .ai-manifest.json file in the repository root. This file records which AI tools and model versions are approved and in use on the project, along with the licence terms that govern them. Because it lives in the repo, it is versioned, portable, and survives platform migrations.
```json
{
  "tools": [
    {
      "name": "Claude Code",
      "model": "Sonnet 4.6",
      "provider": "Anthropic",
      "termsUrl": "https://www.anthropic.com/terms",
      "termsReviewedDate": "2026-03-15"
    }
  ],
  "licenceScanTool": "FOSSA",
  "lastPolicyReview": "2026-04-01"
}
```
Review and update this file quarterly or when tools change. The file answers the question every auditor and procurement team will ask: “What AI tools touched this codebase, and under what terms?”
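One way to keep the manifest honest is a small CI check that fails when the review date goes stale. The sketch below assumes the field name shown in the example above (lastPolicyReview) and a quarterly cadence; adapt both to your own manifest.

```python
import json
from datetime import date, timedelta

MAX_AGE = timedelta(days=90)  # quarterly review cadence

def manifest_is_current(manifest: dict, today: date) -> bool:
    """Check that the manifest's lastPolicyReview date is within the review window."""
    reviewed = date.fromisoformat(manifest["lastPolicyReview"])
    return today - reviewed <= MAX_AGE

# In a CI step, load the file and fail the job when the check returns False, e.g.:
#   manifest = json.load(open(".ai-manifest.json"))
#   assert manifest_is_current(manifest, date.today()), "AI policy review overdue"
```

Wiring this into the pipeline turns the quarterly review from a good intention into an enforced deadline.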
Layer 2: PR-level notes (the reviewer hint). Add a required field to your PR template so that every pull request records whether AI tools were used and, if so, which ones:
```markdown
## AI usage
- [ ] No AI tools used in this PR
- [ ] AI-assisted (tool: ___, model: ___, human review: yes/no)
```
This gives reviewers immediate context when assessing a change. It also creates a lightweight, searchable record of AI involvement per change.
Why this combination works. The repo-level file is the durable record. It travels with the code. The PR notes add change-level context where it is most useful: during review. Together, they cover both “what tools are used on this project?” and “was AI involved in this specific change?” without requiring heavy per-commit ceremony.
Why not commit-level attribution?
Some frameworks recommend tagging every commit with AI tool metadata, either as a message prefix ([AI-assisted]) or as a Git trailer (AI-Tool: Claude Code). In theory, this creates the most granular audit trail. In practice, it is unreliable.
- You cannot guarantee compliance. Developers switch between AI and manual coding within a single session. Expecting every commit to carry accurate metadata is aspirational, not enforceable.
- It is hard to apply retrospectively. If a commit is missed, there is no clean way to add the metadata after the fact without rewriting history.
- Tooling does not support it consistently. Not all AI coding tools add commit trailers automatically. Relying on a convention that depends on manual discipline undermines the point of automation.
- It creates noise. In a codebase where AI assists on most changes, an [AI-assisted] prefix on 80% of commits adds metadata without adding signal.
Commit conventions are not harmful. If your team already uses them, keep them. But they should not be the foundation of your attribution strategy. The repo-level model log is more reliable, and the PR template is more useful to reviewers.
A note on PR portability
Pull requests are a platform feature, not a Git feature. PR descriptions, comments, and review threads are stored by GitHub, Azure DevOps, or GitLab, not in the repository itself. If you migrate your repository to a different platform, fork it, or detach a fork from its network, PR metadata does not come with it.
This is why we recommend the repo-level model log as the primary record. It is part of the Git history and goes wherever the code goes. PR notes are valuable during active development and review, but they should not be your only attribution record.
Licence scanning gate
Run an automated open-source licence scan on every pull request. This is the most important technical control. It catches training data contamination before it reaches production.
Tools that work well in enterprise CI/CD:
- FOSSA for licence compliance and attribution reports
- Snyk Open Source for licence and vulnerability scanning
- GitHub Advanced Security (GHAS) for secret scanning and code scanning
- Mend (formerly WhiteSource) for licence policy enforcement
Configure the scan as a required status check that blocks merge. This makes compliance structural rather than relying on individual discipline.
Human review records
Every AI-generated code change should be reviewed by a qualified engineer before merge. Record this review in the standard PR approval process. The key is that the approval is documented, timestamped, and linked to the specific code changes.
For regulated environments, consider a dedicated “AI review” approval alongside the standard code review. This creates a clear audit trail showing that AI output was explicitly assessed for correctness, security, and licence compliance.
Integrating attribution into your CI/CD pipeline
The controls above need enforcement. Without automation, conventions decay. Here is how to wire them into GitHub Actions (the same principles apply to Azure DevOps or GitLab CI).
Licence scan as required check
Configure your licence scanning tool as a required status check on your main branch. In GitHub:
- Add the scanner’s GitHub App or Action to the repository
- Under Settings > Branches > Branch protection rules, add the scan as a required status check
- Set the policy to block merge on copyleft or unknown licences
The scan should flag, at minimum, GPL-family licences that impose copyleft obligations on surrounding code.
PR template enforcement
Use a GitHub Actions workflow or Azure DevOps policy to check that the AI usage section of the PR template is completed. A simple regex check on the PR body can validate that the checkbox section is present and at least one option is selected.
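As a sketch of that check, the function below looks for the AI usage heading and verifies that at least one checkbox is ticked. It assumes the exact template wording suggested earlier in this guide; in GitHub Actions you would pass it the PR body from the event payload (github.event.pull_request.body).

```python
import re

# Matches a checked markdown checkbox ("- [x]") at the start of a line.
CHECKED = re.compile(r"^\s*-\s*\[[xX]\]", re.MULTILINE)

def ai_section_completed(pr_body: str) -> bool:
    """Return True if the AI usage section is present and one option is ticked.

    A deliberately simple check: it scans everything after the heading, so
    adjust the heading text and pattern if your template differs.
    """
    _, heading, section = pr_body.partition("## AI usage")
    return bool(heading) and bool(CHECKED.search(section))
```

A workflow step that calls this and exits non-zero when it returns False can then be configured as a required (or advisory) status check.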
Pipeline summary
The full pipeline adds two checks to your existing CI:
- Licence scan (blocking)
- PR template validation (advisory or blocking)
Combined with the repo-level model log, standard code review, and test suites, these checks create a layered attribution system that operates at the speed of normal development.
What should procurement teams require from suppliers?
If you are buying software that may include AI-generated code, you need evidence of governance, not just a policy document. Here is what to ask for.
Contract clauses
- AI disclosure: The supplier must disclose which AI tools are used in development, including model names and versions.
- Licence warranty: The supplier warrants that all AI-generated code has been scanned for open-source licence compliance and that no undisclosed licence obligations exist.
- IP indemnity: The supplier indemnifies the buyer against claims arising from AI-generated code, including copyright infringement and licence violations.
- Human review: The supplier warrants that all AI-generated code has been reviewed by a qualified engineer before delivery.
Evidence pack
Request a sample evidence pack from a recent delivery. A mature supplier should be able to provide:
- The repo-level model version log (ai-tools.md or .ai-manifest.json) for the project
- A completed PR with AI usage notes visible in the description
- A licence scan report showing clean results
- The AI usage policy that governs their development team
Vendor questionnaire additions
Add these questions to your standard vendor assessment:
- Do you use AI coding tools in development? If yes, which ones?
- How do you track which code is AI-assisted?
- Do you run licence scans on every PR? What tool do you use?
- Can you provide a sample attribution report from a recent project?
- Do you maintain a model version log? How often is it updated?
- What is your policy on human review of AI-generated code?
Suppliers who cannot answer these questions clearly may not have the governance controls you need.
Attribution and software bill of materials
A Software Bill of Materials (SBOM) is an inventory of components in a software product. SBOMs are already standard for dependency tracking and vulnerability management. AI provenance is the next layer.
What standards support AI metadata?
CycloneDX introduced a machine-learning-model component type in version 1.5. This allows you to record AI models as components in the SBOM, including the model name, version, provider, and any relevant licence information.
SPDX 3.0 added dedicated AI and Dataset profiles to the specification. These support recording training data, model provenance, and usage context.
Both formats are still maturing their AI-specific capabilities. Tooling support is emerging but not yet universal.
Practical SBOM integration today
Even before tooling fully catches up, you can extend your existing SBOM process:
- Add AI tools as components. Record each AI coding tool as a development-time dependency in your SBOM. Include the model version and provider.
- Flag AI-assisted components. Where possible, annotate components or modules that were substantially AI-generated. This is manual today but will become automated as tooling improves.
- Include licence scan results. Attach the licence scan output as a supplementary document alongside the SBOM.
The goal is not perfection. It is a clear, honest record that demonstrates governance and improves over time.
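The first step above can be sketched by mapping entries from the repo-level manifest to CycloneDX-style components. The field names follow the manifest example earlier in this guide, and the output is illustrative rather than schema-validated; check it against the CycloneDX version your tooling supports.

```python
def tool_to_component(tool: dict) -> dict:
    """Map one entry from the repo's AI manifest to a CycloneDX-style component."""
    return {
        "type": "machine-learning-model",  # ML component type (CycloneDX 1.5+)
        "name": tool["name"],
        "version": tool["model"],
        "supplier": {"name": tool["provider"]},
        "externalReferences": [
            {"type": "license", "url": tool["termsUrl"]},
        ],
    }

def build_components(manifest: dict) -> list[dict]:
    """Emit one component per AI tool recorded in the manifest."""
    return [tool_to_component(t) for t in manifest["tools"]]
```

Because the manifest is already the single source of truth for which tools are in use, generating these components from it keeps the SBOM and the repo record consistent.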
What regulated industries need to know
Different sectors have different expectations. Here is how AI code attribution maps to common compliance frameworks.
Financial services
The FCA and PRA expect firms to understand and manage risks from AI and automated systems. AI code attribution supports:
- Model risk management: Documenting which AI models contributed to code in trading, risk, or customer-facing systems
- Operational resilience: Demonstrating that AI-generated components have been reviewed and tested
- Third-party risk: Requiring attribution evidence from software suppliers
Government and public sector
G-Cloud and Digital Outcomes and Specialists (DOS) frameworks increasingly ask about AI usage. The Government Digital Service (GDS) Service Standard requires teams to understand their technology choices. AI code attribution provides:
- Transparency: A clear record of how AI was used in service delivery
- Assurance: Evidence that AI-generated code meets the same quality and security standards as human-written code
- Supplier governance: A framework for evaluating supplier AI practices during procurement
Education
Education technology handles sensitive pupil data and is subject to DfE data protection requirements. AI code attribution matters because:
- Data handling code: Code that processes pupil data must be traceable and auditable
- Procurement: Multi-academy trusts and local authorities need assurance that suppliers govern AI usage
- GDPR compliance: AI involvement in code that processes personal data should be documented as part of the data processing record
Where to start
If you have no AI attribution in place today, start with these three steps:
- Create a model version log in each repository. Add an ai-tools.md or .ai-manifest.json to the repo root listing every AI tool, model version, and licence terms link. This is the durable record that travels with the code.
- Enable a licence scanning gate on your main branch. This is the highest-value technical control.
- Add a PR template field for AI usage disclosure. This gives reviewers immediate context and creates a searchable per-change record.
These three steps cover the minimum viable attribution framework. Add SBOM integration and commit conventions as your process matures.
For a deeper dive into the legal context, read our guides on AI code ownership in the UK and who owns AI-written code.
To see how we apply these practices in our own delivery, visit our AI approach page or explore our Claude Code development and GitHub Actions CI/CD services.