Companies Involved: Apple, Anthropic, Salesforce, Nvidia, and other AI firms.
Data Source: The companies used a publicly available dataset called the Pile, which includes the plain text of YouTube video subtitles but no video imagery.
Creators Affected: The data included subtitles from popular YouTube creators such as MrBeast, Marques Brownlee, and PewDiePie, and from Indian creators such as CarryMinati, BB ki Vines, and Ashish Chanchlani.
Models Trained: Apple’s OpenELM and models from Salesforce, Nvidia, and Anthropic reportedly used the Pile dataset, according to the companies’ research papers.
Anthropic’s Statement: Jennifer Martinez, an Anthropic spokesperson, said that the Pile includes a very small subset of YouTube subtitles and distinguished between using YouTube’s platform directly and using the Pile dataset. She referred questions about potential violations of YouTube’s terms of service to the Pile’s authors.