AI developers are going to find it almost impossible to avoid plagiarism, theft, and illegal access to private data. The problem is that AI cannot be trained without massive amounts of information, and no one has permission to scour the internet and collect everything they find.
It is becoming increasingly difficult to keep up with the copyright infringement lawsuits against generative AI, and last week a new class action was filed.
This time, a group of authors is suing Nvidia over its AI platform NeMo, which allows companies to create and train their own chatbots, as reported by Ars Technica. They claim that the company trained it on a controversial dataset that illegally included their books without their consent.
Nvidia used stolen books to train its AI
Authors Abdi Nazemian, Brian Keene, and Stewart O’Nan demanded a jury trial and asked that Nvidia pay damages and destroy all copies of the Books3 dataset used to train the NeMo large language models (LLMs).
They claim that this dataset was copied from a shadow library called Bibliotik and consists of 196,640 pirated books.
“In summary, NVIDIA has admitted to training its NeMo Megatron models with a copy of The Pile dataset,” the lawsuit claims. “Therefore, NVIDIA necessarily also trained its NeMo Megatron models on a copy of Books3, because Books3 is part of The Pile.”
Certain books written by the plaintiffs are part of Books3 – including the infringed works – and, therefore, Nvidia necessarily trained its NeMo Megatron models on one or more copies of the infringed works, thus directly infringing the plaintiffs’ copyright, they explain.
In response, Nvidia told The Wall Street Journal that “we respect the rights of all content creators and believe that we created NeMo in full compliance with copyright law.”
Last year, OpenAI and Microsoft faced a copyright lawsuit filed by authors of non-fiction works, who claimed that the companies were making money from their works but refusing to pay them. Earlier this year, a similar lawsuit was filed.
These cases come on top of lawsuits filed by news organizations such as The Intercept and Raw Story and, of course, the legal action by The New York Times that started it all.