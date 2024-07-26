This is AI generated summarization, which may have errors. For context, always refer to the full article.

Runway's Gen-3 is said to have been trained on scraped YouTube videos from media and entertainment companies, alongside videos from content creators, according to an internal spreadsheet of its training data.

Runway’s AI (artificial intelligence) video generation tool, Gen-3, was trained on scraped YouTube videos and pirated media, a new report from 404 Media revealed on Thursday, July 25.

The 404 Media report cited an internal spreadsheet of Gen-3's training data, and the outlet also tested the model itself. The report further cited a former Runway employee, who spoke anonymously to avoid professional retribution.

404 Media said that in its testing of the model “part of its training data is popular content from the YouTube channels of thousands of media and entertainment companies, including The New Yorker, VICE News, Pixar, Disney, Netflix, Sony, and many others.”

The spreadsheet also holds links to channels and individual videos from popular content creators, such as Marques Brownlee and Casey Neistat, among others.

An unnamed former Runway employee told 404 Media, “The channels in that spreadsheet were a company-wide effort to find good quality videos to build the model with.”

The employee added, “This was then used as input to a massive web crawler which downloaded all the videos from all those channels, using proxies to avoid getting blocked by Google.”

In a June interview with TechCrunch, Runway co-founder Anastasis Germanidis said the company has “an in-house research team that oversees all of our training and we use curated, internal datasets to train our models,” but did not elaborate.

Reports of AI models being trained on other people's data are not new. Numerous authors and news organizations have filed lawsuits alleging that companies used their work to train artificial intelligence models. In its report on the Runway spreadsheet, The Verge noted that OpenAI CTO Mira Murati had said she “wasn’t sure” whether the company's Sora text-to-video tool was trained on YouTube videos.

404 Media cautioned that it could not confirm every video in the spreadsheet was used to train Gen-3, since some content may have been filtered out later and not all of the links may have been scraped. Even so, the report said, “the training data reveals specifics about the generative AI industry, which has been repeatedly accused of training models on copyrighted material.”

Runway did not respond to media requests for a comment on the spreadsheets.

Google, a Runway investor, meanwhile referred requests for comment to a Bloomberg story from April.

In that story, YouTube CEO Neal Mohan said that if OpenAI used YouTube videos to train its AI video generator Sora, it would violate YouTube’s rules. “Our previous comments on this still stand,” a Google spokesperson said, referring to the Mohan interview, when asked about Runway’s actions. – Rappler.com