AI Giants' Data Hunt Raises Ethical Concerns Amid YouTube Transcription Scandal
April 8, 2024
In late 2021, leading tech companies OpenAI, Google, and Meta were grappling with a shortage of the data needed to train advanced AI systems.
OpenAI used Whisper, its own speech recognition tool, to transcribe more than a million hours of YouTube videos, a move that may have violated YouTube's policies.
Google and Meta also pursued questionable ways of acquiring AI training data, including transcribing YouTube videos and considering the purchase of publishing rights.
Meta, in particular, struggled to gather enough English-language data to train a model that could compete with OpenAI's ChatGPT, and the workarounds it considered raised ethical and legal questions.
The scramble for data has underscored the need for clear regulations and ethical standards governing the use of copyrighted material in AI development.
Summary based on 3 sources
Sources
Yahoo News • Apr 6, 2024
How Tech Giants Cut Corners to Harvest Data for AI
PCMag • Apr 6, 2024
Google Let OpenAI Scrape YouTube Data Because Google Was Doing It Too
Business Standard • Apr 7, 2024
How tech giants cut corners to harvest data for artificial intelligence