OpenAI confirms that it is impossible to train ChatGPT… without stealing data
Generative tools have a plagiarism problem that is difficult to solve.

- January 9, 2024
- Updated: July 2, 2025 at 12:15 AM

Training an artificial intelligence is not a simple task. Large generative models like ChatGPT or DALL-E need gigantic datasets to improve their capabilities and results. However, sometimes these datasets may include copyrighted material. According to OpenAI, the company behind ChatGPT, it is a necessary evil. The company claims that it would be “impossible” to create such high-level neural networks without using copyrighted material.
Generative models became publicly available and hugely popular so quickly that legislation has lagged behind, and lawmakers are still unsure how to proceed in these cases. In an inquiry into the risks and opportunities presented by these tools, carried out by the UK House of Lords Communications and Digital Committee, OpenAI admitted that its models require copyrighted material to function.
With this, the company has confirmed what was an open secret. Anyone who uses these tools will see that it is not very difficult to recreate scenes from famous movies or passages from existing writings. But are these practices legal? To this day, the question continues to generate a lot of controversy.
A report published in IEEE Spectrum states that Midjourney and DALL-E 3, two of the most popular image generation models, can recreate existing movie and video game scenes almost exactly. The report's co-authors, Gary Marcus (AI expert) and Reid Southen (digital illustrator), conclude with near certainty that both Midjourney and OpenAI trained their generative models on protected works.

For OpenAI, the explanation is simple: “As copyright covers virtually all types of human expression today […] it would be impossible to train the leading AI models without using copyrighted material.” On the other hand, OpenAI has offered to indemnify customers who face copyright claims, as long as they have not knowingly generated infringing works. So if I ask DALL-E to recreate a scene from a protected movie exactly, I would not be entitled to that coverage.
Artist by vocation and technology lover. I have liked to tinker with all kinds of gadgets for as long as I can remember.
María López