A recent Human Rights Watch report reveals that over 170 images of Brazilian children were included in an AI training dataset, raising concerns over privacy violations and deepfake generation.

A recent report by Human Rights Watch revealed the unauthorized use of over 170 images and personal details of Brazilian children in AI training datasets. These images, scraped from the web, were included in LAION-5B, a dataset used to train various AI models, including Stability AI's Stable Diffusion. The dataset draws its content from Common Crawl, a repository of scraped web data.

The images were collected from sources such as mommy blogs and YouTube videos, often posted with an expectation of privacy. LAION-5B, created by the German nonprofit LAION, contains more than 5.85 billion image-caption pairs.

Hye Jung Han, a children’s rights and technology researcher at Human Rights Watch, asserts that this practice violates children’s privacy and exposes them to significant risks. Deepfakes generated from such data further exacerbate these threats, potentially facilitating malicious use. In response, LAION has begun removing illegal content from the dataset, working with organizations like the Internet Watch Foundation.

YouTube and other platforms say such scraping violates their terms of service and have committed to taking action against it. The case underscores concerns over AI's ability to generate realistic deepfakes and the broader risks to data privacy, especially for children.
