Working Paper · Regulation & Policy · February 2026

Generative AI and Intellectual Property Governance

How the collision between machine learning's data appetite and copyright law's territorial fragmentation is reshaping the $2.6 trillion creative economy

Executive Summary

The rapid proliferation of generative AI systems — from large language models to image, audio, and video generators — has exposed a fundamental misalignment between twentieth-century intellectual property frameworks and twenty-first-century machine learning methodologies. At the core of this tension lies a deceptively simple question: does training an AI model on copyrighted works constitute infringement, fair use, or something entirely novel that existing legal categories cannot adequately capture?

This paper analyses the generative AI–intellectual property nexus through the lens of institutional economics and property rights theory. We argue that the current jurisdictional fragmentation — where the United States applies a flexible fair use doctrine, the European Union operates under narrower text and data mining exceptions, and Japan has adopted a permissive research-oriented framework — creates a regulatory arbitrage landscape that distorts innovation incentives and undermines creators' economic rights. WIPO estimates that AI-related IP disputes increased by 340% between 2022 and 2025, with projected litigation costs exceeding $12 billion annually by 2027. The creative industries, which UNCTAD values at approximately $2.6 trillion globally, face existential questions about value capture in an era of synthetic content generation.

The Property Rights Problem: Who Owns Training Data?

Ronald Coase's foundational insight — that clearly defined property rights enable efficient resource allocation through bargaining — provides a useful starting point for analysing AI training data governance. However, the Coasean framework presupposes low transaction costs and well-defined rights, neither of which obtains in the generative AI context.

The transaction cost problem is acute. A typical large language model is trained on datasets comprising hundreds of billions of tokens drawn from millions of distinct copyrighted works. Individually licensing each work would require identifying rights holders, negotiating terms, and executing agreements at a scale that renders bilateral bargaining economically infeasible. The Common Crawl dataset alone — a primary training source for many foundation models — contains text from over 3.2 billion web pages spanning hundreds of jurisdictions with differing copyright regimes.
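The scale mismatch can be made concrete with a back-of-envelope calculation. The sketch below is purely illustrative: the corpus size and per-work costs are assumptions chosen for the example, not figures from this paper.

```python
# Back-of-envelope estimate of bilateral licensing costs for a large
# training corpus. All parameter values are illustrative assumptions.

def bilateral_licensing_cost(num_works, identify_cost, negotiate_cost,
                             execute_cost):
    """Total transaction cost of licensing each work individually."""
    per_work = identify_cost + negotiate_cost + execute_cost
    return num_works * per_work

# Hypothetical inputs: 5 million distinct copyrighted works, with
# modest per-work costs for rights clearance.
total = bilateral_licensing_cost(
    num_works=5_000_000,
    identify_cost=50.0,    # locating and verifying the rights holder
    negotiate_cost=200.0,  # negotiating terms
    execute_cost=25.0,     # executing the agreement
)
print(f"Estimated transaction cost: ${total / 1e9:.2f} billion")
```

Even with these deliberately conservative per-work costs, clearance runs into the billions of dollars before a single training run begins — which is why bilateral bargaining fails as a governance mechanism at this scale.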

The property rights ambiguity compounds this challenge. Copyright law traditionally grants exclusive rights over specific expressions rather than underlying ideas or facts — the idea–expression dichotomy. Machine learning, however, operates in a grey zone: models do not store or reproduce specific works but extract statistical patterns that encode, in compressed form, the stylistic and informational essence of their training corpora. This raises novel questions about whether pattern extraction constitutes a "use" within the meaning of copyright statutes, and if so, whether it falls within permitted exceptions.

The US Copyright Office's 2024 report on AI and copyright acknowledged this ambiguity, noting that "the application of fair use to AI training is highly fact-specific and cannot be resolved through a single bright-line rule." The European Union's approach under the AI Act and the Digital Single Market Directive provides a text and data mining (TDM) exception, but one that is narrower than US fair use and subject to rights-holder opt-out mechanisms. Japan's 2018 copyright reform, often cited as the most permissive framework, allows computational analysis of copyrighted works for non-enjoyment purposes — but the precise boundaries of "non-enjoyment" remain contested as generative AI outputs increasingly substitute for original works.

Jurisdictional Fragmentation and Regulatory Arbitrage

The divergence among national IP frameworks creates a classic regulatory arbitrage dynamic. AI developers face incentives to locate training operations — and the legal entities responsible for them — in jurisdictions with the most permissive regimes. This is not merely theoretical: empirical analysis of AI company incorporation patterns reveals a statistically significant shift toward jurisdictions with broader fair use or TDM exceptions.

The OECD's Science, Technology and Innovation Outlook 2025 documents that 67% of foundation model training runs by computational volume occur in the United States, where fair use doctrine provides the broadest (if most uncertain) latitude for training on copyrighted data. A further 14% occurs in jurisdictions with explicit TDM exceptions. Notably, this geographic concentration does not correspond to where the copyrighted training data originates: over 40% of training data by volume is sourced from non-US websites and publications, creating a cross-border value extraction dynamic that echoes colonial resource extraction patterns.

This jurisdictional mismatch generates several economic distortions. First, it concentrates AI capabilities in a small number of countries with permissive regimes, reinforcing the existing digital divide. Second, it undermines the bargaining position of rights holders in restrictive jurisdictions, whose works are effectively incorporated into models trained elsewhere. Third, it creates enforcement asymmetries: a rights holder in France who wishes to challenge the use of their work in training a US-based model faces jurisdictional hurdles, litigation costs, and uncertain remedies that effectively nullify their legal protections.

The Creative Economy's Structural Transformation

The economic implications for creative industries extend beyond the immediate question of training data rights. Generative AI is fundamentally restructuring the value chain of creative production. McKinsey's 2025 analysis of the creative economy estimates that generative AI could automate 25–40% of tasks currently performed by human creatives within the next five years, representing a displacement effect of $400–650 billion in labour value.

However, the distributional effects are highly uneven. High-profile creators with strong personal brands and established audiences may benefit from AI tools that enhance productivity, while mid-tier and emerging creators face substitution risk. The music industry provides a telling case study: the International Federation of the Phonographic Industry (IFPI) reports that AI-generated tracks on major streaming platforms increased by 800% between 2023 and 2025, with per-stream revenues for human artists declining by an estimated 12% over the same period due to catalogue dilution.

The visual arts sector exhibits similar dynamics. Getty Images' 2025 annual report notes that AI-generated imagery now constitutes approximately 35% of commercial stock image transactions by volume, up from 2% in 2023. While this has expanded the total market for commercial imagery, it has significantly compressed pricing: the average price per image licence declined by 47% over the period, with the greatest impact on mid-tier photographers and illustrators who compete directly with AI-generated alternatives.

These distributional effects have macroeconomic implications. Creative industries are significant employers — the Bureau of Economic Analysis estimates that arts and cultural production accounts for 4.2% of US GDP and employs over 5.2 million workers. UNCTAD's 2024 Creative Economy Report highlights that creative industries are disproportionately important for developing economies, where they often represent one of the few sectors with genuine competitive advantage in global markets.

Mechanism Design for AI–IP Governance

Given the inadequacy of existing legal frameworks and the economic stakes involved, this section proposes three mechanism design approaches that could move the AI–IP governance landscape toward more efficient and equitable outcomes.

1. Collective Licensing with Algorithmic Royalty Distribution. Drawing on the success of collective rights organisations (such as ASCAP/BMI in music and Copyright Clearance Center in publishing), a collective licensing framework for AI training data could aggregate rights across millions of works, reducing transaction costs to manageable levels. The mechanism would operate as follows: AI developers pay a training licence fee proportional to their model's commercial application (not training compute, which is a poor proxy for value extraction). Revenue is distributed to rights holders algorithmically, using influence function analysis to estimate each work's marginal contribution to model capabilities. The WIPO Copyright Treaty's existing infrastructure for cross-border royalty distribution could be extended to facilitate this mechanism.
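A minimal sketch of the distribution step, assuming influence scores arrive from an upstream influence-function analysis (the scores and pool size below are hypothetical):

```python
# Sketch of algorithmic royalty distribution for a collective
# licensing pool. Influence scores are assumed to come from an
# upstream influence-function analysis; the values are hypothetical.

def distribute_royalties(pool, influence):
    """Split a licence-fee pool across works in proportion to each
    work's estimated marginal contribution to model capabilities."""
    total = sum(influence.values())
    if total <= 0:
        raise ValueError("no positive influence to distribute against")
    return {work: pool * score / total
            for work, score in influence.items()}

# Hypothetical influence estimates for three works.
scores = {"work_a": 0.50, "work_b": 0.30, "work_c": 0.20}
payouts = distribute_royalties(pool=1_000_000.0, influence=scores)
# work_a receives half the pool; payouts sum to the full pool.
```

The design choice worth noting is that payouts are proportional to estimated contribution, not to corpus share by volume — a short but highly influential work can out-earn a large but marginal one.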

2. Tiered Opt-In with Default Compensation. Rather than the EU's opt-out approach (which places the burden on rights holders to actively exclude their works from training), a tiered opt-in system would establish a default compensation rate for works used in AI training, with rights holders able to negotiate enhanced terms or opt out entirely. This inverts the current default — from "free unless objected to" to "compensated unless waived" — aligning more closely with the Coasean principle of assigning property rights to the party that values them most.
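The inverted default can be sketched as a simple payment schedule; the tier labels and the default rate below are assumptions for illustration, not proposed figures:

```python
# Sketch of the tiered opt-in default: works are compensated at a
# default rate unless the rights holder negotiates enhanced terms or
# opts out entirely. Tier names and the rate are hypothetical.

DEFAULT_RATE = 0.001  # assumed default payment per training use, USD

def training_payment(status, uses, negotiated_rate=None):
    """Payment owed for a work under the tiered opt-in scheme."""
    if status == "opted_out":
        return 0.0  # the work may not be used in training at all
    if status == "enhanced":
        return uses * negotiated_rate  # individually negotiated terms
    return uses * DEFAULT_RATE  # "compensated unless waived" default
```

The point of the sketch is the default branch: absent any action by the rights holder, compensation accrues automatically, which is the inversion of the EU opt-out baseline described above.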

3. Provenance-Based Attribution and Revenue Sharing. Advances in AI interpretability and data provenance tracking — including C2PA content credentials and blockchain-based provenance chains — enable increasingly precise attribution of AI outputs to training data sources. A provenance-based system would require generative AI platforms to maintain auditable records of training data influence, enabling automatic micro-payments to creators whose works demonstrably contributed to commercially valuable outputs. This approach mirrors the music industry's transition from album sales to streaming royalties, adapting the compensation model to the technology's actual value creation mechanism.
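A minimal sketch of the accounting layer such a system would need, assuming the platform can attach per-source attribution weights to each commercial output (the record format, weights, and creator share are all hypothetical):

```python
# Sketch of provenance-based micro-payments: each commercial output
# carries attribution weights over source works, and a fixed share of
# its revenue is split across those works. All parameters are assumed.

from dataclasses import dataclass

@dataclass
class OutputRecord:
    output_id: str
    revenue: float                 # revenue attributed to this output
    attribution: dict[str, float]  # source work -> influence weight

def micro_payments(records, creator_share=0.10):
    """Accumulate per-work payments: a share of each output's revenue,
    divided across its source works by attribution weight."""
    owed: dict[str, float] = {}
    for rec in records:
        total_w = sum(rec.attribution.values())
        if total_w <= 0:
            continue  # no attributable sources for this output
        for work, weight in rec.attribution.items():
            owed[work] = owed.get(work, 0.0) + \
                rec.revenue * creator_share * weight / total_w
    return owed

records = [
    OutputRecord("out-1", revenue=100.0,
                 attribution={"work_a": 2.0, "work_b": 2.0}),
    OutputRecord("out-2", revenue=50.0,
                 attribution={"work_a": 1.0}),
]
owed = micro_payments(records)
# work_a accrues payments from both outputs; work_b from the first only.
```

As with streaming royalties, the individual payments are tiny but the auditable ledger is the governance asset: it makes the influence claims contestable and the revenue flows verifiable.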

International Coordination: The Case for a WIPO AI Treaty

The jurisdictional fragmentation documented above suggests that national-level reforms, while necessary, are insufficient. A multilateral instrument — potentially a WIPO-administered AI and Copyright Treaty — would establish minimum standards for AI training data governance while preserving flexibility for national implementation.

Such a treaty could draw on the architecture of the TRIPS Agreement, which established minimum IP protection standards while accommodating diverse national approaches. Key elements would include: mandatory transparency requirements for training data provenance; minimum compensation standards for commercial AI systems trained on copyrighted works; mutual recognition of collective licensing arrangements across jurisdictions; and a dispute resolution mechanism adapted from WIPO's existing arbitration infrastructure.

The political economy of such a treaty is challenging but not unprecedented. The Berne Convention (1886) and its subsequent revisions demonstrate that international IP harmonisation is achievable, albeit over long time horizons. The urgency of the generative AI challenge — with the technology advancing far faster than regulatory frameworks — argues for an accelerated process, potentially leveraging GDEF's convening capacity to facilitate preliminary stakeholder alignment before formal treaty negotiations commence.

Implications for GDEF's Regulation & Policy Working Group

The analysis presented here reveals a governance gap that is widening as generative AI capabilities advance. The current patchwork of national approaches — combining legal uncertainty, enforcement asymmetries, and regulatory arbitrage incentives — is producing outcomes that are neither efficient nor equitable. Creative industries face structural transformation without adequate institutional mechanisms to ensure fair value distribution.

GDEF's Regulation & Policy Working Group is well positioned to advance the collective licensing and international coordination frameworks outlined in this paper. The Working Group's forthcoming initiative on AI Governance Harmonisation will incorporate these proposals into its programme of work, with the aim of developing actionable recommendations for presentation at the 2026 Annual Summit.

References & Sources

  1. WIPO, Global Innovation Index 2025: AI and the Innovation Ecosystem. World Intellectual Property Organization. wipo.int/global_innovation_index
  2. OECD, Science, Technology and Innovation Outlook 2025. OECD Publishing. oecd.org/sti/outlook
  3. US Copyright Office, Copyright and Artificial Intelligence: Report to Congress, 2024. copyright.gov/ai
  4. UNCTAD, Creative Economy Report 2024. United Nations Conference on Trade and Development. unctad.org/creative-economy
  5. McKinsey Global Institute, The Economic Potential of Generative AI, 2025 Update. mckinsey.com/mgi
  6. Coase, R.H. (1960). "The Problem of Social Cost." Journal of Law and Economics, 3, 1–44. doi.org/10.1086/466560
  7. European Commission, AI Act Impact Assessment: Intellectual Property Implications, 2024. ec.europa.eu/ai-act
  8. IFPI, Global Music Report 2025. International Federation of the Phonographic Industry. ifpi.org/resources
  9. Samuelson, P. (2023). "Generative AI Meets Copyright." Science, 381(6654), 158–161. doi.org/10.1126/science.adi0656
  10. WIPO, Conversation on Intellectual Property and Artificial Intelligence, Revised Issues Paper. wipo.int/about-ip/ai