The Copyright Crisis in GenAI: Challenges, Expert Perspectives and Predictions

Within the AI domain, the legal framework is still evolving, shaped gradually by each legal case, which offers insights toward clearer guidelines. This ongoing development prompts active involvement from both content authors and AI companies in shaping the legal landscape. We therefore take a closer look at industry legal practices to understand how authors and companies try to tilt the scales in their favor, and we discuss predictions about the potential future trajectory of the copyright crisis in the GenAI field.

This article was made possible with the invaluable assistance of DMLA’s President, Leslie Hughes. The Digital Media Licensing Association (DMLA), representing the united voice of the media production, distribution, and licensing industry for over 70 years, has played a major role in shaping the content of this piece.

Recent Accusations Against AI Companies

Here is just a glimpse into the multifaceted debates surrounding copyright in AI. Even within these few legal cases, we see diverse perspectives that underscore the complexity of navigating the ongoing copyright crisis:

Stability AI

Stability AI is at the center of a legal dispute with Getty Images, which accuses it of copyright infringement, database right infringement, and trademark infringement. Getty Images claims that Stability AI, an open-source GenAI company, unlawfully copied and processed millions of images from Getty Images’ database to train its deep learning model, Stable Diffusion, all without a license. Getty Images, recognizing AI’s potential to stimulate creative endeavors, highlights its practice of providing licenses to technology innovators and emphasizes respect for personal and intellectual property rights, something Stability AI is accused of neglecting in pursuit of its stand-alone commercial interests.

An example of an AI-generated image with a Getty Images watermark. Source: Forbes

In addition to the trademark issues faced by Stability AI, many have noticed the inclusion of signatures, often mere squiggles, within images generated by the model. The very presence of such signatures serves as evidence that the model was trained on images with them, leading to their reproduction in the output.

On a related note. Check out the signature on this one. It's almost readable. In the latest version of Stable Diffusion signatures are a lot more readable. But here's the kicker. This signature almost says "Karla Conway." That would be the subject (sort of), not any artist #aiart pic.twitter.com/lYakUkg6vB
— Lynn Cole 🏳️‍⚧️ (@PriestessOfDada) December 12, 2022

Alongside this, artists are raising concerns about the replication of their distinctive artistic styles in generated images, which adds complexity to the legal dispute. Discussions of artistic style infringement are gaining prominence, with several authors emphasizing the potential violation of their rights. The next case is particularly noteworthy for the light it sheds on this aspect.

Midjourney, Stability AI, DeviantArt

Midjourney, Stability AI, and DeviantArt, prominent players in the AI and creative industries, are confronted with a significant legal challenge. Artists Sarah Andersen, Kelly McKernan, and Karla Ortiz have collectively filed a lawsuit against these companies, asserting infringement of the rights of millions of artists. The legal action alleges that these companies trained their AI tools on a dataset of five billion images scraped from the web without securing consent from the original artists. The lawsuit highlights the risk of flooding the market with an unlimited number of infringing images and stresses the need to ensure fairness and ethics in the realm of AI.

A more recent controversy also involves Midjourney: the alpha test of Midjourney V6 has sparked worries that the model generates images closely mirroring copyrighted originals and iconic figures.

So, yeah, one of those #MidjourneyV6 “screencaps” was reminiscent of HBO’s “Game of Thrones,” so why not add insult to injury and blend it with a still from the show? pic.twitter.com/19QlVhS7fl
— The Calavera Dadaist (@M4RC3L_DCHMP) January 25, 2024

Users have shared examples in which Midjourney’s outputs bear a striking resemblance to well-known movies. Persistent observations of Midjourney’s cinematic tendencies, such as images with cinematic lighting and composition, further contribute to the discourse.

A look at midjourney's journey
by u/Click_Obvious in midjourney

Interestingly, Midjourney also updated its Terms of Service shortly after the release of V6, emphasizing the responsibility of users who create images that may infringe copyright. This trend of shifting responsibility to end users is common among AI companies.

In a conversation with us, Leslie Hughes shared an example of stock agencies attempting to shift responsibility to clients for their usage and infringement. Over the years, it became evident that responsibility can lie on all sides.

In the case of generative AI, I have a hard time believing that a court would hold users responsible for infringement of created images. This could be an issue from a trademark perspective, for example. Let’s take an AI system that had ingested the illustrations and video from “The Simpsons,” a very popular animated TV show. The characters are protected under trademark law. If a generated image comes out looking like Bart Simpson, who should be held accountable? The person who generated the image or the company that ingested protected works and did not make the client aware that this content could be protected? I think that the user might have some responsibility because ignorance is not usually a defense but the platform might very well have greater responsibility for making protected content available.

So the difference may be willful misuse or infringement versus unintentional misuse or infringement. If the images are used to create deepfakes, that is intentional misuse.

However, in the case above, the user has no way of knowing if the platform is using protected works. While Bart Simpson would be well known in the U.S., it might not be the case in Europe for example.

Leslie Hughes, DMLA’s President

OpenAI

In the latest legal showdown, The New York Times has filed a copyright infringement lawsuit against OpenAI, asserting that millions of its articles were used without permission to train AI models, including ChatGPT. The complaint accuses OpenAI of capitalizing on The Times’s substantial investment in journalism, creating products that substitute for The Times and divert audiences away from it.

An example of ChatGPT's outputs.

OpenAI, in response, challenges the allegations of intentional misuse and asserts that the lawsuit lacks merit. The company maintains that training AI models is fair use, points to the opt-out option it offers publishers, and underscores its efforts to tackle concerns such as unintentional content duplication, which it refers to as regurgitation:

Interestingly, the regurgitations The New York Times induced appear to be from years-old articles that have proliferated on multiple third-party websites. It seems they intentionally manipulated prompts, often including lengthy excerpts of articles, in order to get our model to regurgitate. Even when using such prompts, our models don’t typically behave the way The New York Times insinuates, which suggests they either instructed the model to regurgitate or cherry-picked their examples from many attempts.

OpenAI

More broadly, OpenAI expresses optimism about a constructive partnership with news organizations to leverage AI’s transformative potential in journalism.

AI Companies’ Efforts in Shaping Copyright Discourse

One of the prominent efforts undertaken by AI companies in this copyright crisis is to argue publicly that training on copyrighted data is fair use.

To bring clarity to the topic, it is important to break down the key term that frequently takes the spotlight: fair use. Fair use is a legal doctrine that promotes freedom of expression by permitting the unlicensed use of copyright-protected works in certain circumstances. Fair use allows the limited use of copyrighted material without permission for purposes like criticism, commentary, news reporting, teaching, and research.

The ongoing debate surrounding the fair use of copyrighted works in training AI programs has garnered diverse perspectives from key industry players:

Some companies compare training to an act of “knowledge harvesting” that aligns with the purpose of copyright law, arguing that the need to make copies during the technological extraction of ideas and facts should not affect the fair use outcome.
Others emphasize the broadly beneficial purpose of using works in training and note that copies are used only for program development and are not made public. They argue that training GenAI models on copyrighted material is non-consumptive, describing it as a transformative process that avoids storing copyrighted data during training and so respects copyright holders’ rights.
Some draw a parallel with influential cases where intermediate copying was considered essential for reverse engineering, which contributed to a rise in independently designed video games and promoted creative expression, a fundamental goal of U.S. copyright law. Take, for example, the Sega v. Accolade case, where the court deemed intermediate copying of Sega’s software for reverse engineering to be fair use.
Others point to countries that have reformed their copyright laws to create a safe space for AI training, fostering innovation in the industry.

A slightly different perspective holds that code produced with GenAI tools is copyrightable, emphasizing the importance of human authorship in the final output. For instance, when a human developer controls GenAI tools, reviews proposed code, and makes decisions on its form and use, including conversions, the resulting code is believed to have sufficient human authorship to be protected by copyright.

It is worth noting that these statements may also be prompted by the context in which companies operated before. They collectively underscore dissatisfaction with current norms, indicating a need for compromise as old standards fall short, and new ones are yet to emerge.

The U.S. legal framework has long been one of the most effective for venture capital investment, thanks to a well-functioning system of checks and balances that creates an environment conducive to business and innovation.

The GenAI cases illustrate a scenario where these strengths can inadvertently stifle innovation. The very nature of machine learning technology requires access to data, yet the existing legal landscape, once beneficial to all, now constrains AI companies by creating barriers to training neural networks on publicly available data. Ignoring these norms predictably invites copyright challenges, with creators seizing the opportunity to defend their rights. And, predictably, AI companies are scrambling to establish legal precedents for data access.

GenAI is undoubtedly the new oil, however, we can’t close our eyes to the fact that AI companies risk pushing the content industry into a deadlock.

Dmitry Shironosov, CEO of Everypixel

Faced with legal dynamics shifting against them, companies appear to be taking the initiative to influence lawmaking and advocate for their perspectives. Take, for instance, Stability AI’s statement to the U.S. Senate AI Insight Forum, which addresses copyright concerns among other topics and emphasizes fair use and the transformative nature of AI development.

Authors’ Perspectives

Authors, in defense of their rights, also take various actions. They sign letters to major AI CEOs, underscoring the injustice of using their works without consent, credit, or compensation. Expressing concerns about AI systems replicating content without acknowledgment, they urge AI leaders to address potential harm to authors’ livelihoods.

Authors also contribute to Congressional testimonies on AI and copyright, debunking the false equivalence of AI models to human learning, and clarifying that AI operates on mathematical algorithms, lacking true creativity. They argue against inefficient opt-out options, advocating for an explicit opt-in approach to ensure the ethical use of creators’ data by AI companies.

It is our position that training with copyrighted material is not fair use. In the U.S., protection under copyright law is actually part of the U.S. Constitution. In effect, it gives creators the right to say if, how and when their copyrighted works are used. “Public availability” is not the measurement. Otherwise, one could make the argument that anything on Google is publicly available and therefore not protected by copyright. Copyrighted works can be made available for education and informational use but that does not mean permission has been granted for a commercial use. In fact, this is a core issue in some of the lawsuits now underway. […]

While fair use does address transformation, there are actually four factors that are examined under Fair Use –

1) Purpose and character of the use, including whether the use is of a commercial nature or is for nonprofit educational purposes.

2) Nature of the copyrighted work

3) Amount and substantiality of the portion used in relation to the copyrighted work as a whole

4) Effect of the use upon the potential market for or value of the copyrighted work

Also, the goal of the Fair Use Doctrine is to protect freedom of expression not to circumvent artists’ right to make money from their works. Licensing training sets, for example, offers real market potential for collections and distributors. Additionally, the companies that have scraped content are generating income from the use of this content.

Leslie Hughes, DMLA’s President

Despite reservations about the effectiveness of opt-out, major players in AI, such as OpenAI, Google, and Stability AI, are implementing measures that allow content creators to opt out of having their work used in AI training.

Data poisoning is another approach to copyright protection. Nightshade, for example, is a tool designed to counter unauthorized AI data scraping and protect the intellectual property of visual artists by “poisoning” the data within images. The changes it makes are invisible to the human eye but render the images ineffective for AI training.

Even this small number of cases against AI companies highlights a shared concern among authors. Dmitry Shironosov, CEO of Everypixel, stresses the importance of paying attention to it and of recognizing authors as an integral part of the market.

The value authors bring to the ecosystem is often overlooked. AI wouldn’t exist without the intellectual contributions of authors, who can’t simply be excluded from the creative process. Including creators in the AI ecosystem isn’t just a matter of fairness or ethical practice; it’s a strategic move to unlock the full potential of AI. It pushes AI development in a direction that is not only technologically advanced but also creatively boundless, thanks to the unique human touch that only artists and creators can provide.

Dmitry Shironosov, CEO of Everypixel


Everypixel Journal