
U.S. Copyright Office Issues Guidance on Generative AI Training
In Short
The Situation: To address the legal issues presented by artificial intelligence ("AI"), the U.S. Copyright Office ("Office") launched a multi-part Copyright and Artificial Intelligence Report ("Report") (see our Commentaries on Part One and Part Two). The Office released a prepublication version of Part Three in the series on generative AI ("GenAI") in May 2025, anticipating a final version "will be published in the near future, without any substantive changes expected in the analysis or conclusions."
The Development: Part Three focuses on a hotly contested copyright question: Do GenAI developers need to seek permission or provide compensation to use third-party copyrighted works, or can training be justified as fair use? The Report describes the technical process of AI training, concludes that using copyrighted works in training may implicate copyright owners' exclusive rights, and analyzes such use under the fair use doctrine.
Looking Forward: Part Three asserts that fair use is likely to permit certain training uses but also recognizes the risk of market harm to creators. Rather than supporting new laws, the Office encourages continued development of voluntary licensing solutions and careful, fact-specific application of fair use.
The Office found that training and deploying a GenAI system using copyright-protected material involves multiple acts that could establish prima facie infringement. For example, Part Three states that "[t]he steps required to produce a training dataset containing copyrighted works clearly implicate the right of reproduction[]" and describes actions such as downloading, transferring, converting, and copying of the works. The Report further explains that the training process itself implicates the right of reproduction by downloading the dataset and copying it to high-performance storage prior to training and also when works are "shown" to the model (and thus temporarily copied) in batches.
Additionally, model weights that demonstrate "memorization" of protected expression in works used during training could themselves be considered copies or derivative works. GenAI outputs may infringe the reproduction right, the right to prepare derivative works, and, depending on the content type, the public display and public performance rights.
The Office then addressed whether training GenAI models on copyrighted material without permission may be considered fair use. In summary, the Office offered the following guideposts:
- Purpose and character: Uses that are highly transformative weigh in favor of fair use. Models that perform a general-purpose function will often be transformative, but the degree of transformativeness will depend on the model's functionality and deployment. By contrast, training AI to generate expressive content that competes with the originals, especially for commercial ends, is less likely to be deemed fair. The commerciality inquiry depends on whether the user stands to profit without paying the customary price, not on the user's motive. In the Office's view, knowingly using pirated or unlawfully accessed works should weigh against fair use.
- Nature of the work: Using highly creative or unpublished works weighs against fair use, while using factual or functional material may favor fair use.
- Amount and substantiality: GenAI training typically involves copying entire works, which often weighs against fair use. The Office acknowledges that copying whole works can be necessary to train effective models, but it distinguishes AI training from cases like Authors Guild v. Google, Inc., 804 F.3d 202 (2d Cir. 2015), where full copying was essential for a transformative search tool. Copying extensive material is not automatically excused by scale or necessity; the analysis looks at whether the scope of copying is reasonable in light of the transformative purpose.
- Market effect: The potential for AI-generated content to substitute for original works, dilute their market value, or undermine licensing opportunities is a significant factor weighing against fair use, but the public benefits of GenAI provide compelling arguments in its favor. The analysis of just how much a particular GenAI tool disrupts the market for the works used in its training requires a fact-intensive inquiry.
The Office concludes that the ultimate determination of fair use requires a case-by-case assessment that "balance[es] multiple statutory factors in light of all relevant circumstances." Although the Office notes the weighing of the four factors is left to the courts, the purpose-and-character and market-effect factors "can be expected to assume considerable weight in the analysis."
Looking beyond fair use, the Office analyzes some of the challenges faced by voluntary licensing markets for AI training in sectors like music, news, and stock imagery. The Office discusses the role of collective licensing (e.g., through collecting societies or industry consortia) to streamline permissions, but notes potential hurdles such as antitrust concerns and administrative overhead. The Office found little support for imposing compulsory licensing or extended collective licensing schemes for AI at this time, noting that stakeholders generally prefer an "opt-in" approach where they can choose when, how, and to whom they license their works. Thus, the Office recommends allowing the licensing market for AI training to develop naturally, reserving any statutory interventions to address clear market failures.
Overall, Part Three reinforces a case-by-case, market-driven strategy and advises all stakeholders to remain engaged as the intersection of AI and copyright law evolves. The Office acknowledged that its analysis is limited to "current circumstances and publicly available information" in a rapidly evolving field and that it is committed to "monitor developments to determine whether any conclusions should be revisited." Interested parties should watch for publication of the final version of Part Three, as well as Part Four of the Report, which will address infringement liability and transparency requirements.
Also of note, the Office is currently in a state of flux, with the Trump administration's recent dismissal of the Register of Copyrights and Librarian of Congress and its appointment of new individuals to hold these positions.
Three Key Takeaways
- Fair use is paramount: The legality of training AI models on unlicensed copyrighted material hinges on fair use. Part Three did not propose new copyright exceptions for AI, signaling that each use must be judged, on a case-by-case basis, under the totality of the circumstances, including the existing four-factor fair use analysis. As in other contexts, the purpose-and-character and market-effect factors will often be determinative.
- Fair use is a highly fact-specific inquiry: The process of training an AI model with copyrighted works may implicate the reproduction or derivative-work rights. Courts will likely scrutinize why a model was trained, how it is deployed, and whether any guardrails prevent the disclosure of expressive content. If the intended use of the AI model is not meaningfully transformative, the works were procured unlawfully, or the AI's outputs readily compete in the same market as the originals, such use may be more likely to fall outside fair use.
- Licensing over legislation: The Office favors voluntary licensing and industry-led solutions over broad legislation. Industries like music, news, and art are addressing GenAI training needs within their specific fields, and the Office has recommended neither new compulsory licensing nor broad statutory changes at this time. Collaboration and court rulings will shape these developments.