A new report by Human Rights Watch alleges that more than 170 images of Brazilian children were used without consent in AI training datasets. The images, sourced from personal blogs and YouTube videos and dating back to the mid-1990s, have raised concerns about privacy and data misuse.
According to the report, more than 170 images and personal details of Brazilian children were scraped from the internet without consent and used to train artificial intelligence (AI) models. The images, which span the mid-1990s to 2023, came from mommy blogs, other personal blogs, and YouTube videos, most of which were originally posted for an audience of family and friends.
Human Rights Watch researcher Hye Jung Han found these images in LAION-5B, an open-source dataset used to train various AI models, including Stability AI's Stable Diffusion. Created by the German nonprofit LAION, the dataset contains more than 5.85 billion image-caption pairs.
LAION spokesperson Nate Tyler said the dataset had already been taken down after a Stanford report identified illegal content within it. LAION is working with the Internet Watch Foundation, the Canadian Centre for Child Protection, Stanford University, and Human Rights Watch to remove all known references to illegal content.
YouTube, the source of some of the scraped content, stated that unauthorized scraping violates its Terms of Service. The case adds to growing concerns that AI training data can be misused, for example to generate explicit content or to reveal sensitive personal information.