A Human Rights Watch report reveals that over 170 images and personal details of children in Brazil were scraped from the internet for AI training without consent, raising concerns about privacy breaches and potential misuse. The images were found in LAION-5B, a dataset of billions of image-caption pairs sourced from Common Crawl. Efforts are underway to clean the dataset and prevent further violations, amid broader concerns over deepfake technology and child protection.
A new report by Human Rights Watch, released on Monday, details how over 170 images and personal details of children from Brazil have been scraped from the internet and used to train artificial intelligence (AI) models without consent. Some of the images were posted as far back as the mid-1990s and were included in the LAION-5B dataset, which is widely used by AI startups for training. The dataset, created by the German nonprofit LAION, contains over 5.85 billion image-caption pairs derived from Common Crawl, a massive repository of web data.
Hye Jung Han, a researcher at Human Rights Watch, warned that the inclusion of children’s images in AI training datasets breaches the children’s privacy and exposes them to potential misuse by malicious actors.
The dataset reportedly contained images of children sourced from personal blogs and YouTube videos, content originally intended for limited sharing. In response to a Stanford University report that identified illegal content in the dataset, a LAION spokesperson confirmed that the organization has removed the flagged entries and is collaborating with organizations including the Internet Watch Foundation and the Canadian Centre for Child Protection to clean the dataset.
YouTube has emphasized that unauthorized scraping of its content violates its Terms of Service and that it is taking measures to combat such abuse. The issue has implications beyond privacy violations: explicit deepfakes are increasingly being used to bully students in U.S. schools, heightening concerns about the future misuse of AI-generated content.