Tech Giants Face Backlash for Unauthorized Use of YouTube Subtitles in AI Training
July 18, 2024
Tech giants Apple, Nvidia, and Anthropic have been criticized for using subtitles from more than 170,000 YouTube videos to train AI models without permission.
The subtitles came from The Pile, a dataset compiled by EleutherAI, and their use has drawn scrutiny from Google over potential violations of YouTube's terms of service.
The Pile includes data from sources such as the European Parliament, Wikipedia, Enron emails, and YouTube subtitles, the last of which feature content from popular YouTubers as well as conspiracy theory channels.
Despite Google's measures against unauthorized scraping, companies including Salesforce, Bloomberg, and Databricks have also used The Pile for AI training.
OpenAI reportedly downloaded over 1 million hours of YouTube videos for AI training.
The ethical concerns have not hindered Nvidia's stock performance, which surged 86% over two months and 155% year-to-date. Still, the controversy highlights the importance of verifying data sources to avoid legal issues.
Nvidia's stock could reach a target of $150 if the upward trend continues.
Summary based on 34 sources
Sources
Mashable • Jul 16, 2024 • Apple and Salesforce AI training datasets co-opt MrBeast, Marques Brownlee videos
Inc. • Jul 17, 2024 • AI Data 'Scraping' Allegations Dog Apple, Anthropic, and Nvidia
PCMag • Jul 16, 2024 • Nvidia, Apple AI Scraped Dataset With 173K YouTube Videos, Taylor Swift Lyrics