
Private-order solutions to navigate the intricate legal challenges of Generative AI

This paper concludes a series of five articles in which we have examined the complexities of copyright protection for AI-generated content, potential copyright infringements involving training datasets, the legal status of GenAI models, and the evolving legal responses. In this article, we propose private-order solutions for GenAI, offering actionable insights and recommendations to promote a legally compliant and trustworthy framework for GenAI.
To foster trust and confidence in the development and deployment of GenAI, it has become imperative to establish robust legal frameworks ensuring transparent and accountable practices in the acquisition and use of training datasets, as well as in the training and exploitation of GenAI models. These frameworks should set out clear guidelines for obtaining consent, securing necessary permissions, and ensuring compliance with intellectual property laws and data protection regulations. Given the current legal gray area surrounding GenAI, we outline below possible solutions and recommendations for the private sector to address the legal intricacies and challenges inherent in GenAI. Ultimately, the adoption of ethical and industry standards for assembling training datasets and developing GenAI systems represents an effective strategy for addressing current legal uncertainties.
Amidst the growing debate surrounding the training of GenAI models, private-order initiatives are emerging to tackle some of the legal issues that the law has yet to resolve. For instance, through the Spawning initiative, artists Mat Dryhurst and Holly Herndon are building a set of tools and solutions to meet the needs of rights holders and AI developers. After launching HaveIBeenTrained.com, a platform that allows artists to search whether their works appear in the dataset used to train Stable Diffusion, the duo contributed to the creation of the “Do Not Train” registry, along with a system of machine-readable opt-out methods that makes it simple for rights holders to declare data preferences and for AI trainers to respect them. This initiative allows artists to selectively opt their works out of future AI model training, thereby preserving their creative integrity and addressing concerns about the unauthorized use of copyrighted material. Companies such as Stability AI and Hugging Face have already partnered with Spawning, committing not to train any of their GenAI models on data that rights holders have explicitly opted out of. The decision reflects a proactive effort to foster transparency and collaboration between artists and tech companies in the development of trustworthy and legally compliant GenAI models.
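In practical terms, honoring a machine-readable opt-out is a simple filtering step on the trainer's side. The sketch below illustrates the idea only; the registry format, function names, and matching-by-URL approach are hypothetical and do not reflect Spawning's actual specification or API:

```python
# Hypothetical sketch: an AI trainer consults an opt-out registry before
# assembling a training set. Registry format assumed here: one opted-out
# work URL per line, with '#' starting a comment.

def load_opt_out_registry(lines):
    """Parse the hypothetical registry into a set of opted-out URLs."""
    registry = set()
    for line in lines:
        entry = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if entry:
            registry.add(entry)
    return registry

def filter_training_candidates(candidates, registry):
    """Keep only candidate works that have not been opted out."""
    return [url for url in candidates if url not in registry]
```

A trainer would run every candidate dataset through `filter_training_candidates` before training, so that a rights holder's declared preference is respected mechanically rather than by manual review.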
Another private-order solution emerged in response to the writers' and actors' strikes in the US, among the longest labor disputes in Hollywood history, each lasting well over 100 days. At the end of the writers' strike, the Writers Guild of America (WGA) approved an agreement with the Alliance of Motion Picture and Television Producers. The agreement requires producers to obtain explicit consent before using an individual's image or work, and to ensure fair and equitable compensation for all relevant rights holders. Although the agreement does not outlaw the use of AI tools in the writing process, it sets up guardrails to ensure that GenAI remains under the control of workers rather than being used to replace them. For instance, the agreement guarantees that AI cannot be used to write or edit scripts already written by a screenwriter, and that any substantial human adaptation of an AI-generated script is to be recognized as an original screenplay.
More generally, while fair use has often been invoked as a defense for training GenAI models on publicly available data, in jurisdictions where fair use is not readily applicable or not sufficiently reliable, consent becomes the most effective protection against potential liability, ensuring compliance with copyright law while protecting privacy and personality rights throughout a GenAI system's life cycle. Accordingly, it may be advisable to establish robust data agreements not only for the use of copyrighted material but also for the processing of personal or sensitive information during training. Mechanisms for data anonymization could also be leveraged to mitigate the privacy risks associated with including personal data in training datasets.
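As a rough illustration of the anonymization step mentioned above, a pre-training pass might redact obvious personal identifiers from raw text before it enters a corpus. The patterns and placeholder tokens below are illustrative only; production pipelines typically combine such rules with named-entity recognition and human review, and no rule-based pass is a complete privacy solution:

```python
import re

# Illustrative pre-training anonymization pass: redact email addresses
# and phone-like numbers from raw text. Patterns are deliberately simple
# and are not exhaustive.

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact_identifiers(text):
    """Replace matched identifiers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text
```

Running every document through such a pass before ingestion reduces (though does not eliminate) the risk that personal data ends up memorized by the trained model.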
In the coming months, the negotiation of clear licensing agreements with explicit opt-in solutions for the use of personal and copyrighted data will be important in ensuring a secure and trustworthy environment for AI training. Additional mechanisms could also be put in place to recognize the creative efforts of all participants in the AI creation process and to fairly compensate original content creators, e.g., by exploring revenue-sharing or royalty-payment schemes to safeguard the interests of creators whose copyrighted materials are used in AI systems. Alternatively, a few initiatives are emerging with a view to establishing public domain datasets (e.g., the Common Corpus dataset released on Hugging Face) specifically designed for training GenAI models in a way that does not violate copyright law.
The emergent AI landscape underscores the need for updated legal definitions and frameworks that accurately reflect the creative capabilities of Generative AI, along with effective safeguards for the intellectual property rights of individuals whose works are used in training these systems and generating outputs.
In addressing these new and intricate legal questions, policymakers, legal experts, AI practitioners, and content creators must find a way to leverage the new opportunities of Generative AI while respecting intellectual property rights and other established legal principles. In examining the possible legal regimes that might apply to the three core components of GenAI (the training data, the AI models, and the generated output), we have shown that considerable legal uncertainty remains regarding both the extent of protection that can be granted to these digital assets and the potential liabilities associated with them, particularly with regard to copyright infringement and violations of personality or privacy rights. Whether traditional copyright laws can be extended to cover Generative AI, or whether sui generis rights specifically tailored to GenAI must be developed, remains an ongoing subject of legal and policy deliberation.
As we conclude our analysis through these five short papers, the law is still struggling to keep up with the rapid pace of innovation. In response to growing legal uncertainties, new private-ordering solutions focused on bringing more trust and confidence to the GenAI sector are emerging, navigating the regulatory gray zone by private means and narrowing the gap between existing regulatory frameworks and innovative practices in the AI field.
___
- ChatGPT: A Case Study on Copyright Challenges for Generative Artificial Intelligence Systems, Cambridge University Press, 29 August 2023, available at: https://www.cambridge.org/core/journals/european-journal-of-risk-regulation/article/chatgpt-a-case-study-on-copyright-challenges-for-generative-artificial-intelligence-systems/CEDCE34DED599CC4EB201289BB161965 (accessed on 03.27.2024)
- Spawning is an independent third party that created the Do Not Train registry, intended to provide machine-readable opt-outs to AI model trainers. Spawning has partnered with two organizations (Stability AI and Hugging Face) that have agreed to honor the registry, and is actively working to partner with more organizations to ensure the widest possible respect for rights holders’ wishes.
- See https://www.theguardian.com/culture/2023/oct/01/hollywood-writers-strike-artificial-intelligence (accessed on 04.05.2024)
- In the United States, the writers’ union (WGA) went on strike for 148 days to protest the effects of artificial intelligence on audiovisual production. Just a couple of months later, actors joined for the same reason: the risk of AI replacing their jobs. That strike took 118 days to end. The agreement delineates legal distinctions among “employment-based digital replica”, “independently created digital replica”, and “digital alterations”, thereby establishing foundational legal precedents for the utilization of AI in creative processes. See “How Hollywood writers triumphed over AI — and why it matters”, The Guardian, 1 October 2023, available at: https://www.theguardian.com/culture/2023/oct/01/hollywood-writers-strike-artificial-intelligence (accessed on 03.25.2024)
- Leistner, Matthias and Antoine, Lucie, IPR and the use of open data and data sharing initiatives by public and private actors, European Parliament, May 2022, available at: https://www.europarl.europa.eu/RegData/etudes/STUD/2022/732266/IPOL_STU(2022)732266_EN.pdf
- Kop, Mauritz, “The Right to Process Data for Machine Learning Purposes in the EU”, Harvard Journal of Law & Technology (JOLT), Volume 34, Digest Spring 2021, 22 June 2020, available at: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3653537 (accessed on 03.28.2024)
- Common Corpus is the largest public domain dataset released for training LLMs. Released on Hugging Face, Common Corpus includes 500 billion words from a wide diversity of cultural heritage initiatives, showing that it is possible to train fully open LLMs on sources free of copyright concerns. See https://huggingface.co/blog/Pclanglais/common-corpus


