4.3 — AI Code Attribution & Licensing

Configuration Management · 90 min · DevOps & Developers

Learning Objectives

  • Describe the current legal landscape surrounding AI-generated code and copyright
  • Explain the US Copyright Office position on AI authorship and its practical implications
  • Assess license contamination risks from AI code generation tools
  • Evaluate the implications of active litigation (GitHub Copilot, Bartz v. Anthropic) on organizational policy
  • Implement organizational policies for AI code attribution, provenance tracking, and license compliance
  • Design AI BOM (Bill of Materials) processes that extend traditional SBOM practices
  • Apply the EU AI Act's requirements for general-purpose AI transparency to development workflows

1. The Legal Landscape

The legal framework for AI-generated code is in active flux. There is no settled law in any major jurisdiction. Organizations making decisions about AI code adoption today are operating in legal uncertainty, and that uncertainty must be managed as a risk — not ignored as someone else's problem.

1.1 The Scale of Litigation

As of early 2026, over 70 copyright infringement lawsuits have been filed against AI companies across the United States and European Union. These cases span text, image, music, and code generation, but the underlying legal questions are the same: Can AI models trained on copyrighted material produce output that infringes on the rights of the training data creators? And who is liable — the AI company, the user, or both?

The litigation is not theoretical or marginal. It involves major technology companies, major law firms, and billions of dollars in claimed damages. The outcomes will define the legal framework for AI-generated content for the next decade.

1.2 The Bartz v. Anthropic Settlement

The $1.5 billion Bartz v. Anthropic settlement stands as the largest AI copyright settlement to date. While the specific terms are not fully public, the magnitude of the settlement signals several things to organizations using AI code generation:

  • Training on copyrighted material without authorization carries substantial financial risk
  • AI companies are willing to settle rather than establish unfavorable precedent
  • The cost of AI copyright disputes is not hypothetical — it is measured in billions
  • Organizations that use AI-generated code inherit downstream risk if that code is later found to infringe

1.3 Thaler v. Perlmutter — The Authorship Question

In March 2025, the D.C. Circuit Court of Appeals affirmed the lower court ruling in Thaler v. Perlmutter: copyright requires human authorship. Stephen Thaler sought to register a copyright for an image generated entirely by his AI system, DABUS. The court held that the Copyright Act requires a human author, and an AI system — regardless of how sophisticated — cannot be an author under US law.

Implications for AI-generated code:

  • Code generated entirely by AI, without meaningful human creative contribution, is not eligible for copyright protection in the United States
  • This means AI-generated code may enter the public domain by default — anyone can use it, and the organization that generated it has no exclusive rights
  • This also means competitors can freely use AI-generated code that is publicly available (e.g., in open-source repositories)
  • The organizational consequence: AI-generated code should not be your competitive differentiator, because you may not be able to protect it

2. The US Copyright Office Framework

The US Copyright Office has issued multiple guidance documents on AI and copyright, establishing a framework that is more nuanced than "AI output has no copyright."

2.1 The Threshold: Meaningful Human Authorship

The Copyright Office’s position, articulated through registration guidance and formal reports, centers on a threshold concept: works predominantly generated by AI without meaningful human authorship are NOT eligible for copyright protection.

This is not a binary test. The Copyright Office evaluates on a spectrum:

No copyright protection:

  • Prompting an AI to generate code, even with detailed and specific prompts
  • Selecting from multiple AI-generated options without modification
  • Curating AI output (choosing which generated code to keep)
  • Using AI to generate code and then making only trivial modifications (formatting, variable renaming)

Copyright protection available when:

  • A human-authored work is perceptible in the output — the human’s creative expression is identifiable
  • A human makes creative modifications to AI output that go beyond the mechanical or trivial
  • The human provides substantial creative contribution that shapes the expressive content of the work, not just the functional direction
  • The AI is used as a tool in a process where the human retains creative control over the expressive elements

2.2 Practical Implications

The Copyright Office framework creates a practical test: Can you point to specific creative decisions made by a human that are reflected in the final code?

  • A developer who uses AI to generate a function and then substantially rewrites the logic, adds error handling, restructures the algorithm, and integrates it with the existing codebase has likely contributed sufficient human authorship
  • A developer who prompts “write a REST API endpoint for user authentication” and commits the output with minimal changes has likely not contributed sufficient human authorship

For organizations, this means:

  • Track AI contribution levels in your codebase (see Section 6)
  • Code that is predominantly AI-generated may not be protectable
  • Code where humans made substantial creative contributions retains copyright eligibility
  • The burden of proving human authorship falls on the party claiming copyright

3. GitHub Copilot Litigation (Doe v. GitHub)

The Doe v. GitHub case is the most directly relevant litigation for organizations using AI code generation tools. Filed in November 2022 in the Northern District of California, it targets the training and output of GitHub Copilot.

3.1 Core Claims

The plaintiffs — open-source developers — allege that GitHub Copilot (powered by OpenAI Codex, trained on public GitHub repositories) was trained on their copyrighted code and produces output that:

  • Reproduces copyrighted code without attribution
  • Strips copyright notices and license terms from training data
  • Violates the terms of open-source licenses (which require attribution, license inclusion, or copyleft compliance)

3.2 Procedural History

May 2023 ruling (Judge Tigar):

  • Most claims dismissed, including federal DMCA claims (the court found the AI output was not “similar enough” to specific copyrighted works to sustain a DMCA claim)
  • Surviving claims: breach of contract (violation of open-source license terms), open-source license violation
  • The court noted that the dismissals were largely procedural (insufficient pleading) rather than substantive (the claims are invalid)

Post-2023 developments:

  • Discovery phase ongoing — plaintiffs seeking access to training data details and model documentation
  • Ninth Circuit appeal filed (case number 24-6136) on certain dismissed claims
  • As of February 2026, the case remains in active litigation with no trial date set

3.3 Implications Regardless of Outcome

Even if the plaintiffs ultimately lose, the litigation has established that:

  • The question of whether AI-generated code violates open-source licenses is a serious legal question, not a fringe theory
  • Organizations using AI code generation tools face potential claims from open-source authors
  • The training data composition of AI models is relevant to legal liability
  • “I didn’t know the AI used copyrighted code” is not a defense — users of AI tools have a duty to assess the provenance of AI output

4. License Contamination Risks

License contamination is the most immediate and practical risk for organizations using AI code generation tools. Unlike the theoretical questions of copyright authorship, license contamination can trigger concrete legal obligations today.

4.1 How Contamination Occurs

AI code generation models are trained on massive datasets of code scraped from public repositories. These repositories contain code under every conceivable license: MIT, Apache 2.0, BSD, GPL, AGPL, LGPL, MPL, proprietary, and unlicensed.

The training process converts this code into statistical patterns (model weights). When the model generates code, it produces output based on these patterns. The output has no license, no attribution, and no indication of which training data influenced it.

The contamination path:

  1. Training data includes GPL-licensed code
  2. Model learns patterns from that code
  3. Model generates output that may reproduce or closely mirror GPL-licensed patterns
  4. Developer commits the output to a proprietary codebase
  5. If the output is sufficiently similar to GPL source, the developer may have unknowingly introduced a GPL obligation into their proprietary codebase

4.2 The German Court Ruling (November 2025)

A German court ruled in November 2025 that when an AI model memorizes and can reproducibly output specific copyrighted content, that reproduction constitutes copyright infringement. This is significant because it establishes (at least in German jurisdiction) that the statistical learning defense (“the model learned patterns, not specific code”) has limits.

If the model can reproduce a specific code snippet verbatim when given appropriate prompts, and that snippet is copyrighted, the reproduction may infringe — regardless of the mechanism by which the model arrived at it.

4.3 The GPL Propagation Theory

The most contentious question in AI code licensing is whether the GPL’s copyleft provisions propagate through AI model training.

The strong propagation theory: if a model is trained on GPL code, the model weights are a derivative work of that code, and all output is therefore subject to the GPL. This would mean any code generated by such a model carries GPL obligations.

The prevailing view: model weights are not derivative works of the training data. The training process is transformative — the weights encode statistical patterns, not copies of the training data. This view is held by most major AI companies and many legal scholars.

The practical reality: legal uncertainty remains. No court has definitively ruled on whether model weights are derivative works. The prevailing view may be correct, but “probably correct” is not “legally settled.”

The risk that matters: regardless of the theoretical question about model weights, the practical risk is real. AI models can and do reproduce copyleft snippets. If those snippets end up in a proprietary codebase and are identified, the organization faces a license compliance problem. The theoretical question of whether weights are derivative works is irrelevant if the output is a literal reproduction.

4.4 Quantifying the Risk

The risk of license contamination is not uniform across all AI code generation:

  • High risk: asking the AI to implement well-known algorithms, standard patterns, or code similar to popular open-source libraries (the model has seen thousands of implementations and may reproduce one closely)
  • Medium risk: asking the AI to implement domain-specific logic that is less commonly found in public repositories
  • Low risk: using AI for code that is highly specific to your application’s internal architecture (the model has no training data to memorize)

The risk is also tool-dependent. Some AI code generation tools have implemented output filtering to detect potential license violations. Others have not.


5. EU AI Act Implications

The EU AI Act, which began phased enforcement in August 2025, introduces specific requirements for providers of general-purpose AI (GPAI) models, which includes the models powering AI code generation tools.

5.1 Key Requirements for GPAI Providers

Training data transparency (effective August 2025):

  • GPAI providers must make available a sufficiently detailed summary of the training data
  • This summary must be detailed enough for rightholders to exercise their rights (opt-out, compensation claims)
  • The European AI Office has published a template for training data summaries

EU copyright compliance:

  • GPAI providers must demonstrate compliance with EU copyright law in their training data
  • This includes respect for the text and data mining (TDM) exceptions and opt-out provisions under the EU Copyright Directive (2019/790)
  • Rightholders who have expressly reserved their rights (via robots.txt, metadata, or other machine-readable means) must have their works excluded from training — or the provider must obtain a license

Full compliance deadline: August 2026 — providers have until this date to fully comply with all GPAI obligations, including maintaining and publishing training data documentation.

5.2 Implications for Organizations Using AI Code Tools

Organizations using AI code generation tools in the EU (or producing code deployed in the EU) should:

  • Verify that their AI tool providers comply with EU AI Act GPAI obligations
  • Request training data summaries from providers
  • Assess whether the provider’s training data practices create downstream legal risk
  • Document their due diligence in assessing AI tool compliance (this documentation may be relevant in future legal proceedings)

6. Organizational Policies and Practical Guidance

Legal uncertainty is not an excuse for inaction. Organizations must establish policies that manage the known risks while the legal landscape evolves.

6.1 Tag AI-Generated Code in VCS

Every piece of AI-generated or AI-assisted code must be identifiable in the version control system. This is the foundational requirement from which all other policies flow.

Implementation:

# Commit message convention
git commit -m "feat: implement rate limiting

AI-Assisted: yes
AI-Tool: GitHub Copilot
AI-Contribution-Level: substantial  # minimal | moderate | substantial
Human-Modifications: logic restructured, error handling added, tests written manually"

# Git trailers (machine-parseable)
git commit --trailer "AI-Assisted=yes" --trailer "AI-Tool=Claude Code"

# PR labels
# Apply labels: ai-generated, ai-assisted, copilot, claude

AI contribution levels:

  • Minimal: AI provided autocomplete suggestions, developer wrote most code
  • Moderate: AI generated initial implementation, developer made significant modifications
  • Substantial: AI generated most of the code, developer reviewed and made minor modifications

This classification feeds into the copyright analysis (Section 2): substantial AI contribution with minimal human modification means the code likely has no copyright protection.
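Because the trailer convention above is machine-parseable, commits can be audited after the fact. As a minimal sketch, the AI-* trailers can be pulled back out of a commit message (the key names follow the convention shown above; they are an internal convention, not part of git itself):

```python
def parse_ai_trailers(commit_message: str) -> dict:
    """Extract AI-* trailers (and Human-Modifications) from a commit message.

    Intended to run over the output of `git log --format=%B`. Key names
    follow this module's convention, not a git standard.
    """
    trailers = {}
    for line in commit_message.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        # Only collect the governance trailers; skip ordinary message lines.
        if key.startswith("AI-") or key == "Human-Modifications":
            trailers[key] = value.strip()
    return trailers


msg = (
    "feat: implement rate limiting\n\n"
    "AI-Assisted: yes\n"
    "AI-Tool: GitHub Copilot\n"
    "AI-Contribution-Level: substantial"
)
print(parse_ai_trailers(msg))
```

A CI job could run this over every commit in a PR and fail the build when an AI-Assisted commit lacks a contribution level, which is what makes the convention enforceable rather than advisory.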

6.2 Track Provenance

Maintain a record of which AI tools were used, which prompts produced which code, and what review process was applied. This provenance data supports:

  • License compliance auditing
  • Incident investigation (if AI-generated code causes a security issue)
  • Regulatory inquiries about AI use in development
  • Internal metrics on AI tool effectiveness and risk

Minimum provenance record:

  • Date of generation
  • AI tool and model version
  • Nature of the prompt (not necessarily the exact prompt, but the intent)
  • Review process applied (automated scan, peer review, security review)
  • Modifications made by humans
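A provenance record is only useful if it is complete, so it is worth validating at write time. A minimal sketch, with illustrative field names (the exact schema is an organizational choice):

```python
# Required fields mirror the minimum provenance record above; the
# specific key names here are illustrative, not a standard.
REQUIRED_PROVENANCE_FIELDS = {
    "date",
    "ai_tool",
    "model_version",
    "prompt_intent",
    "review_process",
    "human_modifications",
}


def validate_provenance(record: dict) -> list:
    """Return the sorted list of required fields missing from a record.

    An empty return value means the record meets the minimum standard.
    """
    return sorted(REQUIRED_PROVENANCE_FIELDS - set(record))


record = {
    "date": "2026-03-15",
    "ai_tool": "Claude Code",
    "model_version": "claude-opus-4-20250514",
    "prompt_intent": "rate limiter for API gateway",
    "review_process": "peer review + SCA scan",
    "human_modifications": "restructured logic, wrote tests",
}
print(validate_provenance(record))  # -> []
```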

6.3 SCA Scanning on AI-Generated Code

Software Composition Analysis (SCA) tools should be configured to scan AI-generated code specifically for license compliance issues. Standard SCA tools detect known open-source components by matching code signatures. Newer tools are designed specifically for AI output:

  • Codacy Guardrails: scans AI-generated code for potential license violations and security issues
  • FOSSA: license compliance analysis that can be integrated into CI/CD pipelines with specific policies for AI-generated code
  • Snyk: dependency and license scanning with AI code analysis capabilities
  • FossID: snippet-level license detection that can identify partial matches against open-source databases

Integration approach:

  1. AI-tagged PRs trigger enhanced SCA scanning (in addition to standard SCA)
  2. SCA results are annotated on the PR
  3. Any license match above the confidence threshold blocks merge until reviewed
  4. License review results are documented in the PR

6.4 Assume No Copyright Protection

For risk management purposes, assume that AI-generated code (especially code with substantial AI contribution) has no copyright protection unless your legal counsel specifically advises otherwise.

Practical implications:

  • Do not rely on AI-generated code as trade secret or proprietary advantage without substantial human modification
  • AI-generated code committed to public repositories is effectively public domain
  • Competitors can use the same AI tools to generate equivalent code
  • Focus proprietary protection on the human-authored architecture, design decisions, and domain-specific logic

6.5 GPL Scanner Integration

Implement real-time scanning before commit to detect potential copyleft contamination:

# Pre-commit hook configuration
repos:
  - repo: local
    hooks:
      - id: license-scan
        name: License contamination check
        entry: fossid-scan --mode snippet --threshold 0.85
        language: system
        types: [python, javascript, typescript, go, rust, java]
        stages: [commit]

Threshold guidance:

  • 95%+ match: almost certainly a direct reproduction — block and review
  • 85-95% match: likely derived from a specific source — flag for legal review
  • 70-85% match: possible similarity — flag for awareness, do not block
  • Below 70%: likely coincidental similarity — no action
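The threshold guidance above maps directly to a policy function that a CI gate or pre-commit hook could call. A sketch, with the thresholds as policy defaults rather than tool-mandated values:

```python
def license_match_action(similarity: float) -> str:
    """Map a snippet-similarity score (0.0 to 1.0) to a policy action.

    Thresholds follow this module's guidance; tune them to your
    organization's risk appetite and scanner's scoring behavior.
    """
    if similarity >= 0.95:
        return "block"          # almost certainly a direct reproduction
    if similarity >= 0.85:
        return "legal-review"   # likely derived from a specific source
    if similarity >= 0.70:
        return "flag"           # possible similarity; do not block
    return "none"               # likely coincidental


print(license_match_action(0.97))  # -> block
print(license_match_action(0.88))  # -> legal-review
```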

6.6 Maintain Removal Traceability

If AI-generated code must be removed later (due to a court ruling, license claim, or cease-and-desist), the organization must be able to:

  • Identify all AI-generated code in the repository
  • Trace which AI tool and session generated the code
  • Remove the code without breaking the application
  • Verify that the removal is complete
  • Replace the code with clean-room human-written alternatives

This requires the tagging (Section 6.1) and provenance (Section 6.2) practices to be in place from the start. Retroactively identifying AI-generated code in a large codebase is extremely difficult.
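When tagging is in place, the removal inventory reduces to a git query plus parsing. A sketch that groups touched files by commit from the output of `git log --format=%H --name-only --grep="AI-Assisted: yes"` (this assumes commits were tagged with the AI-Assisted trailer from Section 6.1; the grep pattern is that convention, not a git feature):

```python
def ai_commit_files(log_output: str) -> dict:
    """Group file paths by commit hash.

    Parses `git log --format=%H --name-only` output: a 40-character
    hash line, then the paths touched by that commit, blank-line
    separated. Returns {hash: [paths]}.
    """
    result, current = {}, None
    for line in log_output.splitlines():
        line = line.strip()
        if not line:
            continue
        # A full SHA-1 hash line starts a new commit's file list.
        if len(line) == 40 and all(c in "0123456789abcdef" for c in line):
            current = line
            result[current] = []
        elif current is not None:
            result[current].append(line)
    return result
```

Running the union of all returned paths against the current tree yields the candidate set for clean-room replacement.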

6.7 Open Source Compliance Includes AI Outputs

Traditional open source compliance processes focus on third-party libraries and components included in the software. With AI code generation, the definition of “third-party code” must expand to include AI-generated code that may be derived from third-party training data.

Updated compliance process:

  1. Inventory third-party libraries (traditional SCA)
  2. Inventory AI-generated code (AI attribution tracking)
  3. Scan both for license compliance
  4. Generate SBOM that includes AI-generated components (see Section 7)
  5. Review and approve licenses for both categories

7. License Classification Table

Organizations must maintain a license classification policy that determines which licenses are approved, which require review, and which are blocked.

  • Permissive (Risk: Low). Examples: MIT, Apache 2.0, BSD 2-Clause, BSD 3-Clause, ISC, Unlicense. Policy: Approved for use. Include attribution as required by the specific license.
  • Weak Copyleft (Risk: Medium). Examples: LGPL 2.1, LGPL 3.0, MPL 2.0, EPL 2.0. Policy: Requires legal review. Copyleft applies to modifications of the licensed component but does not propagate to the larger work (in most interpretations). Dynamic linking is typically safe; static linking may trigger copyleft.
  • Strong Copyleft (Risk: High). Examples: GPL 2.0, GPL 3.0, AGPL 3.0. Policy: Blocked by default for proprietary/commercial software. GPL requires derivative works to be licensed under GPL; AGPL extends this to network interaction (SaaS). Use requires explicit legal approval and architectural isolation.
  • Proprietary (Risk: Variable). Examples: commercial licenses, EULAs. Policy: Requires contract review. Terms vary widely; ensure the license permits intended use, modification, and distribution.
  • Unknown / None (Risk: High). No license declared. Policy: Do not use. Code without a license is copyrighted by default (all rights reserved). The absence of a license does not mean the code is free to use — it means the opposite.
  • AI-Generated, unattributed (Risk: High). Output from AI tools without provenance. Policy: Treat as Unknown. Apply enhanced SCA scanning; require human review and substantial modification before use in production.

License policy enforcement:

  • SCA tools configured with the organization’s approved license list
  • Pipeline blocks builds that include unapproved licenses
  • Exceptions require legal review and documented approval
  • License policy reviewed annually (or when new license types emerge)
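The enforcement flow above amounts to a strictest-verdict check over the licenses a scan detects. A sketch; the SPDX identifiers and category mapping here are a small illustrative subset of the classification table, not the full policy:

```python
# Illustrative policy map keyed by SPDX identifier; a real deployment
# would load the organization's full approved-license list.
POLICY = {
    "MIT": "approved",
    "Apache-2.0": "approved",
    "BSD-3-Clause": "approved",
    "LGPL-3.0-only": "review",
    "MPL-2.0": "review",
    "GPL-3.0-only": "blocked",
    "AGPL-3.0-only": "blocked",
}

_SEVERITY = {"approved": 0, "review": 1, "blocked": 2}


def evaluate_licenses(detected: list) -> str:
    """Return the strictest verdict across detected license identifiers.

    Unknown identifiers (including unattributed AI output with no
    declared license) default to "blocked", matching the table's
    treatment of Unknown/None.
    """
    verdict = "approved"
    for lic in detected:
        candidate = POLICY.get(lic, "blocked")
        if _SEVERITY[candidate] > _SEVERITY[verdict]:
            verdict = candidate
    return verdict


print(evaluate_licenses(["MIT", "MPL-2.0"]))  # -> review
```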

8. The Adoption Gap

Despite the legal and security risks outlined in this module, organizational adoption of comprehensive AI code governance remains low. Industry surveys as of 2025-2026 indicate that only 24% of organizations apply comprehensive IP, licensing, security, and quality evaluations to AI-generated code.

This means 76% of organizations using AI code generation tools are doing so without adequate controls for:

  • License compliance
  • Copyright risk
  • Security quality
  • Provenance tracking
  • Regulatory compliance

This gap represents both a risk and an opportunity. Organizations that implement the controls described in this module are not just managing risk — they are establishing competitive advantage in an environment where regulators, auditors, and customers will increasingly demand evidence of responsible AI use.


9. AI BOM — Extending SBOM for AI Components

The Software Bill of Materials (SBOM) has become a standard artifact for supply chain security (required by Executive Order 14028 for US government software). AI code generation introduces new components that traditional SBOM formats (SPDX, CycloneDX) were not designed to capture.

9.1 What an AI BOM Adds

An AI BOM extends the traditional SBOM with:

AI Tool Information:

  • Tool name and version (e.g., GitHub Copilot v1.x, Claude Code)
  • Model identifier and version (e.g., GPT-4, Claude Opus)
  • Provider and provider’s AI Act compliance status

AI-Generated Component Inventory:

  • Which files/functions/modules contain AI-generated code
  • AI contribution level for each component
  • Date of generation
  • Provenance chain (prompts, review, modifications)

License Risk Assessment:

  • SCA scan results for AI-generated components
  • Identified potential license matches
  • Risk classification per component
  • Legal review status

Human Modification Record:

  • Description of human modifications to AI-generated code
  • Percentage of human vs. AI contribution (estimated)
  • Reviewer identity and review date

9.2 AI BOM in Practice

No standardized AI BOM format exists as of early 2026, but the following approaches are emerging:

CycloneDX extension: CycloneDX 1.6 includes a Machine Learning BOM (MLBOM) specification that can be adapted for AI code provenance. The modelCard component type captures model information, and custom properties can capture generation metadata.

SPDX AI Profile: SPDX 3.0 includes an AI profile for documenting AI-related artifacts, including training data, model information, and generated output.

Custom metadata approach:

{
  "ai_bom_version": "1.0",
  "generated": "2026-03-15T10:30:00Z",
  "components": [
    {
      "file": "src/auth/oauth2.py",
      "ai_tool": "Claude Code",
      "ai_model": "claude-opus-4-20250514",
      "contribution_level": "substantial",
      "human_reviewer": "dev@example.com",
      "review_date": "2026-03-15",
      "license_scan": {
        "tool": "FOSSA",
        "matches": [],
        "risk": "low"
      },
      "human_modifications": "Restructured token refresh logic, added rate limiting, wrote tests"
    }
  ]
}
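A schema like the custom-metadata example above is easy to validate in the pipeline before archiving. A minimal sketch against that structure (the required field names mirror the example; extend them as your schema grows):

```python
def validate_ai_bom(bom: dict) -> list:
    """Check an AI BOM document for required fields; return problems found.

    Validates the custom-metadata schema sketched in this section:
    top-level version/timestamp/components, plus per-component
    tool, model, contribution level, and license scan results.
    """
    problems = []
    for key in ("ai_bom_version", "generated", "components"):
        if key not in bom:
            problems.append("missing top-level field: " + key)

    required = {"file", "ai_tool", "ai_model", "contribution_level", "license_scan"}
    for i, comp in enumerate(bom.get("components", [])):
        for key in sorted(required - set(comp)):
            problems.append("component %d: missing %s" % (i, key))
    return problems


bom = {
    "ai_bom_version": "1.0",
    "generated": "2026-03-15T10:30:00Z",
    "components": [
        {
            "file": "src/auth/oauth2.py",
            "ai_tool": "Claude Code",
            "ai_model": "claude-opus-4-20250514",
            "contribution_level": "substantial",
            "license_scan": {"tool": "FOSSA", "matches": [], "risk": "low"},
        }
    ],
}
print(validate_ai_bom(bom))  # -> []
```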

9.3 Organizational Integration

The AI BOM should be:

  • Generated as part of the CI/CD pipeline (alongside the traditional SBOM)
  • Archived with each release
  • Available for audit and regulatory inquiry
  • Updated when AI-generated components are modified
  • Included in vendor security assessments (when providing software to customers)

10. Building an Organizational AI Code Policy

Drawing together all elements of this module, an organizational AI code policy should address:

Approved tools: which AI code generation tools are approved for use, and under what conditions. Not all tools have equivalent license risk profiles.

Attribution requirements: mandatory tagging of AI-generated code in VCS (Section 6.1), with defined contribution level classifications.

Review requirements: enhanced review process for AI-generated code, including mandatory SCA scanning and security review for sensitive components.

License compliance: integration of AI-aware SCA scanning into the CI/CD pipeline, with blocking rules for potential copyleft contamination.

Copyright posture: organizational position on AI-generated code copyright (recommend: assume no protection for substantially AI-generated code).

Provenance tracking: systems and processes for maintaining AI code provenance data (Section 6.2).

Removal readiness: capability to identify and remove AI-generated code if required by legal action or policy change (Section 6.6).

Regulatory compliance: monitoring of evolving regulations (EU AI Act, US state-level AI laws) and adjustment of policy accordingly.

Training: all developers using AI code generation tools must complete training on the organization’s AI code policy, license risks, and attribution requirements.

Annual review: the AI code policy must be reviewed at least annually given the rapidly evolving legal landscape. What is legally uncertain in 2026 may be settled law by 2027.


11. Key Takeaways

  1. The legal landscape for AI-generated code is unsettled and evolving rapidly. Over 70 lawsuits, a $1.5 billion settlement, and active circuit court appeals define the current state. This is not theoretical risk.
  2. Copyright requires human authorship. Code generated predominantly by AI without substantial human creative contribution is likely not copyrightable in the United States. This means AI-generated code may not be protectable as proprietary.
  3. License contamination is the most immediate practical risk. AI models trained on copyleft code can produce output that triggers license obligations. SCA scanning and pre-commit license detection are essential controls.
  4. The EU AI Act imposes transparency requirements on AI tool providers and creates compliance obligations for organizations using those tools in EU markets.
  5. Tag everything. Track everything. Scan everything. The ability to identify, audit, and if necessary remove AI-generated code depends on attribution and provenance practices being in place from day one.
  6. Only 24% of organizations have comprehensive AI code governance. The remaining 76% are accumulating risk. Implementing the controls in this module is both risk management and competitive advantage.
  7. AI BOM extends traditional SBOM to capture AI-specific provenance, licensing, and contribution data. Standardization is emerging but not yet settled — start with custom metadata and migrate to standards as they mature.

Review Questions

  1. A developer uses Claude Code to generate an entire microservice (approximately 2,000 lines of code), makes minor formatting changes, and submits a PR. Assess the copyright status of this code under current US Copyright Office guidance. What organizational policy would you apply?

  2. Your SCA tool detects an 87% code similarity match between AI-generated code in your proprietary product and a GPL-3.0 licensed library. What are the potential legal implications, and what steps should you take?

  3. Your organization operates in both the US and EU. How do the EU AI Act’s GPAI requirements affect your choice of AI code generation tools and your internal compliance processes?

  4. Design an AI BOM schema for your organization. What fields would you include beyond traditional SBOM components? How would you integrate AI BOM generation into your existing CI/CD pipeline?

  5. The CEO reads a news article about AI-generated code and asks: “Are we at risk?” Prepare a five-minute briefing that covers the key risks, your current controls, and recommended next steps. Use language appropriate for a non-technical executive.


Module 4.3 of the SSDLC + CIS Controls v8 CG16 + AI-Augmented Development Training Program
Track 4: Version Control & Change Management (Dev + DevOps)

Study Guide

Key Takeaways

  1. Copyright requires human authorship — Thaler v. Perlmutter (D.C. Circuit, March 2025) confirmed AI cannot be an author under US law; substantially AI-generated code may not be copyrightable.
  2. License contamination is the most immediate practical risk — AI models trained on GPL code can produce output triggering copyleft obligations in proprietary codebases.
  3. Only 24% of organizations have comprehensive AI code governance — 76% are using AI code generation without adequate controls for licensing, copyright, security, or provenance.
  4. Tag everything, track everything, scan everything — AI attribution in VCS (git trailers), provenance records, and enhanced SCA scanning are foundational controls.
  5. Assume no copyright protection for substantially AI-generated code — The organization may not be able to protect it as proprietary; competitors could freely use equivalent AI output.
  6. The $1.5B Bartz v. Anthropic settlement signals material risk — Training on copyrighted material carries substantial financial risk; users inherit downstream liability.
  7. AI BOM extends traditional SBOM — Captures AI tool information, contribution levels, license risk assessment, and human modification records alongside standard components.

Important Definitions

  • Thaler v. Perlmutter: D.C. Circuit ruling that copyright requires human authorship; AI cannot be an author under US law
  • License Contamination: AI-generated code reproducing or closely mirroring copyleft (GPL) patterns, introducing license obligations
  • AI Contribution Level: Classification of AI involvement: Minimal (autocomplete), Moderate (AI initial, human modified), Substantial (AI generated most code)
  • AI BOM: Extension of SBOM capturing AI tool information, contribution levels, provenance, and license risk per component
  • CycloneDX MLBOM: Machine Learning BOM specification in CycloneDX 1.6 adaptable for AI code provenance
  • Doe v. GitHub: Active class-action lawsuit alleging Copilot reproduces licensed code without attribution
  • EU AI Act GPAI: General-purpose AI obligations including training data transparency and EU copyright compliance
  • Code Provenance: Record of origin, tools used, prompts, review process, and modifications for all code

Quick Reference

  • Framework/Process: US Copyright Office meaningful human authorship test; three AI contribution levels; license classification table (Permissive/Weak Copyleft/Strong Copyleft/Proprietary/Unknown); AI BOM schema
  • Key Numbers: 70+ active copyright lawsuits; $1.5B Bartz settlement; 24% of organizations have comprehensive governance; 95%+ match blocks commit; 85-95% match flags for legal review; August 2026 full EU AI Act GPAI compliance deadline
  • Common Pitfalls: Assuming AI-generated code is free to use because it was generated (not copied); committing AI code without tags (makes removal impossible later); treating “no license declared” as free to use (it is copyrighted by default); relying on AI-generated code as proprietary competitive advantage

Review Questions

  1. Under current US Copyright Office guidance, what distinguishes AI-assisted code that qualifies for copyright protection from code that does not?
  2. How does the GPL propagation theory create risk for proprietary codebases using AI code generation?
  3. What SCA scanning threshold should block a commit versus flag for legal review, and why?
  4. How would you design an AI BOM schema that integrates with your existing SBOM and CI/CD pipeline?
  5. If you needed to remove all AI-generated code from your repository due to a legal action, what systems must already be in place to make this feasible?
Knowledge Check

Q1. What was the court's ruling in Thaler v. Perlmutter regarding AI authorship?

Q2. According to the US Copyright Office, which of the following would likely qualify for copyright protection?

Q3. What is the significance of the $1.5 billion Bartz v. Anthropic settlement for organizations using AI code generation?

Q4. What does the German court ruling of November 2025 establish about AI and copyright?

Q5. What percentage of organizations apply comprehensive IP, licensing, security, and quality evaluations to AI-generated code?

Q6. How does the module classify code with no license declared in the license classification table?

Q7. What threshold of code similarity in a license scan should block a commit and require review?

Q8. What AI contribution level classification means the AI generated most of the code and the developer reviewed and made minor modifications?

Q9. Which SBOM format includes a Machine Learning BOM specification that can be adapted for AI code provenance?

Q10. Under the EU AI Act, what is the full compliance deadline for GPAI providers to meet all obligations including training data documentation?
