OpenAI confirms that it is impossible to train ChatGPT… without stealing data
Generative tools have a plagiarism problem that is difficult to solve.

- January 9, 2024
- Updated: July 2, 2025 at 12:15 AM

Training an artificial intelligence is not a simple task. Large generative models like ChatGPT or DALL-E need gigantic datasets to improve their capabilities and results. However, these datasets sometimes include copyrighted material. According to OpenAI, the company behind ChatGPT, this is a necessary evil. The company claims that it would be “impossible” to create such high-level neural networks without using copyrighted material.
The public availability of generative models and their enormous popularity have left legislation lagging behind, with lawmakers unsure how to handle these cases. In its submission to an inquiry by the UK House of Lords Communications and Digital Committee into the risks and opportunities presented by these tools, OpenAI admitted that its models cannot be trained without copyrighted material.
In doing so, the company confirmed what was already an open secret: anyone who tries these tools will find that it is not very difficult to recreate scenes from famous movies or passages from existing texts. But are these practices legal? To this day, the question remains highly controversial.
A report published in IEEE Spectrum states that Midjourney and DALL-E 3, two of the most popular image generation models, can recreate scenes from existing movies and video games almost exactly. Gary Marcus (an AI expert) and Reid Southern (a digital illustrator), co-authors of the report, conclude with near certainty that both Midjourney and OpenAI trained their generative models on protected works.

For OpenAI, the explanation is simple: “As copyright covers virtually all types of human expression today […] it would be impossible to train the leading AI models without using copyrighted material.” At the same time, OpenAI has offered to indemnify customers who face copyright claims, as long as they did not knowingly generate infringing works. In other words, if I ask DALL-E to recreate a scene from a protected movie exactly, I would not be covered.
María López: Artist by vocation and technology lover. I have enjoyed tinkering with all kinds of gadgets for as long as I can remember.