Artificial intelligence systems are advancing at an overwhelming pace. They offer more functions, handle more tasks, and to a certain extent could be called more ‘intelligent.’ However, as many people know, this is due to the immense amounts of data that companies use to train their AI. Data that often comes from rather… controversial sources.
A few months ago, we discussed how many art-generating AI systems were trained on works from various artists without their consent. This week, it was revealed that Meta has done something similar to train its new AI-powered virtual assistant, Meta AI.
As acknowledged by Meta’s President of Global Affairs, Nick Clegg, during the annual Meta Connect conference, the company used posts from Facebook and Instagram users to train its new artificial intelligence. The company asserts, however, that private posts and users’ chat messages were not used.
“We have tried to exclude datasets where personal information predominates,” Clegg said in remarks reported by Reuters, making clear that the “vast majority” of the data Meta used for training was public.
Moreover, the executive said Meta took steps during the training process to filter private user details out of the public datasets it used. However, it remains unclear exactly what the company considers ‘public’ versus ‘private,’ and whether the AI has been trained on sensitive data.
Meta’s statements come at a time when many major tech companies, such as OpenAI and Google, have been criticized by both users and regulators for training their AI systems on information scraped from the internet without permission.