Ralph Ellison took seven years to write Invisible Man. J.D. Salinger took about 10 to write The Catcher in the Rye. J.K. Rowling spent at least five years on her first Harry Potter book. Writing with the hope of getting published is always a leap of faith: Will you finish the project? Will it find an audience?
Whether authors realize it or not, this gamble is justified to a significant extent by copyright. Who would spend all that time and emotional energy writing a book if anyone could steal it without repercussions? That sentiment lies behind at least nine recent copyright-infringement lawsuits against companies that have used tens of thousands of copyrighted books to train generative-AI systems. One of the suits alleges “coordinated theft on a grand scale,” and the AI companies could be liable for hundreds of millions of dollars, if not more.
By contrast, companies such as OpenAI and Meta claim that their language models “learn” from books, much as humans do, and produce “transformative” original work. Therefore, they argue, no copies are made and the training is legal. “The use of text to statistically model language and train LLaMA to generate original representations is transformative in nature and is exemplary fair use,” Meta wrote last fall in a court filing for one of the cases, referring to its generative-AI model.
But as the artist Karla Ortiz told a Senate subcommittee last year, AI companies are using other people’s work “without consent, credit, or compensation” to build products worth billions of dollars. For many writers and artists, the threat is existential: Machines could replace them with cheaper synthetic output, supplying prose and illustrations on command.
In filing these lawsuits, the authors argue that copyright should stop AI companies from continuing down this path. The cases cut to the heart of generative AI’s role in society: Namely, does the technology give back enough, on the whole, to justify what it takes? Since 1790, copyright law has fostered a thriving creative culture. Will that culture survive?
Contrary to popular belief, copyright does not exist for the benefit of creators. According to its founding documents and recent interpretations, its purpose is to foster a culture that produces great works of science, art, literature, and music. It happens to do this by giving the people who produce those works significant control over their reproduction and distribution, and a financial incentive to create them in the first place.
This concern for the public interest is why current law also allows certain “fair uses” of copyrighted works. Printing a short quotation from a book or displaying a thumbnail of a photo in search results is considered fair use, for example, as is a parody that borrows a story’s plot or characters. (Remember Spaceballs?) AI companies claim that training large language models on copyrighted books is also fair use, because an LLM does not duplicate the full text of the books it trains on but transforms them into a new kind of product.
These claims are now being tested. “Is it in the public interest to allow AI to be trained on copyrighted material?” asked Judge Stephanos Bibas in an opinion a few months ago in Thomson Reuters v. Ross Intelligence, a case about the use of legal documents to train an AI research tool. Bibas noted that each side has its own ideas about what benefits the public. Tech companies argue that AI products make knowledge more accessible, while plaintiffs argue that because AI output is typically stripped of authorship and presented as the AI’s own creation, the incentive to share knowledge in the first place diminishes. Some writers have already stopped sharing their work online, and courts will need to take seriously the idea that current AI-training practices could have a chilling effect on human creativity.
A fundamental question raised by copyright law is whether a generative-AI product provides a net public benefit. A product that “substantially impair[s]” copyright’s incentives may not qualify as fair use, the legal scholar Matthew Sag said in Senate testimony last year. When people habitually ask ChatGPT questions instead of reading books and articles, the readership of those books and articles (and the incentive to write them) shrinks. Current AI tools that present human-generated knowledge without citation already prevent readers (and other authors) from connecting with people who share their interests, jeopardize the health of research communities, and undermine the incentive to cultivate and share expertise. All of this could lead to a culture in which the circulation of knowledge is hindered rather than facilitated, and in which future Salingers decide that The Catcher in the Rye isn’t worth writing. (In response to such concerns, OpenAI argued this week, in a motion to dismiss The New York Times’ lawsuit against the company, that ChatGPT is an “efficiency” tool and “not in any way a substitute” for a newspaper subscription.)
Technology companies and AI enthusiasts argue that if humans don’t need a special license to read books in a library, then neither should AI. But as the legal scholar James Grimmelmann points out, just because it’s fair for an individual to do something for self-study doesn’t necessarily mean it’s fair for a company to do it at scale, for profit.
As for the argument that AI training is fair use because it “transforms” authors’ original work, the case typically cited as precedent is Authors Guild v. Google, in which the Authors Guild sued Google for scanning millions of books to create a research product known as Google Books. There, the judge ruled that the scanning was fair use, in part because Google Books functioned primarily as a research tool, with strict limits on how much copyrighted text it displayed, and because its purpose (to provide insight into an entire collection of books, say, through the Ngram Viewer) was very different from the purpose of the books used to build it (which are meant to be read).
Generative-AI products such as ChatGPT and DALL-E, however, don’t always serve a purpose distinct from the books and artwork they are trained on. An AI-generated image or text can substitute for buying a book or commissioning an illustration. And although an LLM’s output usually differs from the text it was trained on, that is not always the case.
The recent suits filed by The Times and by Universal Music Group show that LLMs sometimes copy their training text. According to UMG, Claude, an LLM created by Anthropic, can reproduce the lyrics of an entire song nearly verbatim and present them to users as original work. The Times showed that ChatGPT can reproduce large chunks of Times articles. This behavior, called “memorization,” is at this point difficult if not impossible to eliminate. It can be suppressed to some extent, but the complexity and unpredictability of generative AI (often described as a “black box”) make it hard for its creators to control, or give guarantees about, how often and in what circumstances an LLM will quote its training data. Imagine a student or a journalist who won’t promise not to plagiarize; that is the ethically dubious position these products occupy.
Courts sometimes struggle with new technology. Consider the player piano, which takes a roll of paper as input: a sheet of music on which the notes are punched holes rather than written symbols. A music publisher once sued a piano-roll manufacturer for making and selling unauthorized copies. The case went all the way to the Supreme Court, which ruled in 1908 that the rolls were not copies but merely part of the player piano’s “mechanism.” In retrospect, the decision makes little sense; it is like saying that a DVD is not a copy of a film because the sound and images are encoded digitally rather than in analog form.
That decision was overridden by the Copyright Act of 1909, which obligated piano-roll manufacturers to pay royalties. But as Grimmelmann told me, esoteric techniques of reproduction can be deceptive: They can seem magical, incomprehensible, or disconnected from the intellectual property that powers them.
Some wonder whether copyright law, essentially unchanged since the late 1700s, can accommodate generative AI. Its basic unit is the “copy,” a concept that has felt less and less relevant since music and video streaming began in the 1990s. Could generative AI finally bend copyright past the breaking point? I spoke about this with William Patry, a former senior official at the U.S. Copyright Office. His copyright treatise is among the most frequently cited in federal court, and he was Google’s senior copyright attorney in the Authors Guild case. “I wrote laws for a living for seven years,” he told me. “It’s not easy.” Laws can’t constantly be rewritten, he said, to accommodate the new technologies that regularly emerge and threaten to upend legal systems and social norms.
Copyright’s language may seem frustratingly antiquated, but that is arguably what good law looks like. Law needs to be stable, so you know what to expect, but also dynamic, “in the sense that there’s play in the joints,” said Patry, the author of How to Fix Copyright. Although he is critical of certain aspects of the law, he doubts that AI will be the technology that finally breaks it.
Instead, he said, judges are likely to rule narrowly. A sweeping decision on AI training is unlikely: Rather than declaring that “AI training is fair use,” a judge might decide that training was fair for certain AI products but not for others, depending on a product’s features and how often it quotes its training data. Different rules might also apply to commercial and noncommercial AI systems. Grimmelmann said judges may weigh outside factors, too, such as whether a defendant has developed its AI product responsibly or recklessly. Either way, judges face a difficult decision. As Bibas acknowledged, determining “whether it is in the public interest to protect the creator or the copier” is a perilous and uncomfortable position for a court.
Generative AI could never compensate for the loss of fiction, investigative journalism, and deeply researched nonfiction. Because it is a technology that makes statistical predictions from the data it encounters, it can produce only imitations of what came before. As a dominant mode of creativity, it would stop culture in its tracks. If human authors lack the will to write and publish works that move us, help us empathize, and transport us to imagined places that shift our perspective and help us see the world more clearly, we will simply have a culture without those things. Generative AI may serve as a synthetic reminder of what came before, but can it help us build toward the future?