OpenAI confirms that it is impossible to train ChatGPT… without stealing data
Generative tools have a plagiarism problem that is difficult to solve.

- January 9, 2024
- Updated: July 2, 2025 at 12:15 AM

Training an artificial intelligence is not a simple task. Large generative models like ChatGPT or DALL-E need gigantic datasets to improve their capabilities and results. However, sometimes these datasets may include copyrighted material. According to OpenAI, the company behind ChatGPT, it is a necessary evil. The company claims that it would be “impossible” to create such high-level neural networks without using copyrighted material.
Generative models became publicly available and hugely popular so quickly that legislation has lagged behind, and lawmakers are still unsure how to handle these cases. In evidence submitted to an inquiry by the UK House of Lords Communications and Digital Committee into the risks and opportunities these tools present, OpenAI admitted that its models require copyrighted material to function.
With this statement, the company has confirmed what was already an open secret. Anyone who tries these tools will find that it is not very difficult to recreate scenes from famous movies or passages from existing writings. But are these practices legal? To this day, the question continues to generate a lot of controversy.
A report published in IEEE Spectrum states that Midjourney and DALL-E 3, two of the most popular image generation models, can recreate existing movie and video game scenes almost exactly. The report's two authors, Gary Marcus (AI expert) and Reid Southen (digital illustrator), conclude with near certainty that both Midjourney and OpenAI trained their generative models on protected works.

For OpenAI, the explanation is simple: “As copyright covers virtually all types of human expression today […] it would be impossible to train the leading AI models without using copyrighted material.” On the other hand, OpenAI has offered to indemnify customers who face copyright claims, as long as they have not knowingly generated infringing works. So if I ask DALL-E to recreate a scene from a protected movie exactly, I would not be entitled to that protection.
María López: Artist by vocation and technology lover. I have liked to tinker with all kinds of gadgets for as long as I can remember.