A New York Times investigation reports that major tech companies, including OpenAI, Google, and Meta, allegedly bent rules to gather data for AI development, raising legal and ethical concerns across the industry.
Major Tech Companies Accused of Rule-Bending in AI Data Collection
A recent investigation by The New York Times has revealed that leading tech companies, including OpenAI, Google, and Meta, have allegedly bent and broken rules to obtain data essential for developing their artificial intelligence (AI) systems. The investigation, involving technology reporter Cade Metz, highlights the measures these companies took to secure data, such as using copyrighted content.
In late 2021, OpenAI reportedly exhausted nearly all the available English-language text on the internet for its ChatGPT system. To overcome this limitation, the company allegedly turned to audio and video content from various online sources, including YouTube, despite knowing that doing so violated YouTube’s terms of service. A Google spokesperson stated that the company was unaware of OpenAI’s actions, although some Google employees might have turned a blind eye because Google had engaged in similar practices.
Meta, also seeking to enhance its AI capabilities, considered several ways of acquiring data, including potentially purchasing the publisher Simon & Schuster. It ultimately abandoned these plans in favor of gathering data from the internet, despite the legal risks.
The report also highlights ongoing lawsuits from authors, publishers, and news organizations, including The New York Times, accusing these tech giants of using their copyrighted content without permission. Legal experts suggest that these lawsuits could significantly affect how AI companies operate, potentially requiring them to license data, a requirement that might be economically infeasible given the volume of data needed.
If courts rule in favor of the plaintiffs, AI companies might have to develop new methods for data collection, potentially including the use of synthetic data generated by AI systems themselves.
This investigation underscores the complex legal and ethical issues surrounding the use of vast amounts of data in AI development, posing significant challenges for the tech industry moving forward.