Meta Used Copyrighted Books For AI Training Despite Legal Warnings


(CTN News) – According to a recent filing in a copyright infringement lawsuit, Meta Platforms was warned by its lawyers about the legal risks of using pirated books to train its AI models.

However, the company proceeded despite the warning. The filing, which consolidates two lawsuits brought against Meta by notable authors such as Sarah Silverman and Michael Chabon, alleges that Meta used their works without permission to train its AI language model, Llama.

The Silverman lawsuit had a portion dismissed by a California judge, who indicated that the authors would be allowed to amend their claims.

Meta has not yet responded to the allegations. The new complaint, filed on Monday, includes chat logs of a Meta-affiliated researcher discussing the acquisition of the dataset in a Discord server.

This evidence suggests that Meta was aware that its use of the books may not be protected by US copyright law.

In the chat logs, researcher Tim Dettmers discusses his correspondence with Meta’s legal department regarding the use of book files as training data.

Dettmers states that using The Pile at Facebook is currently not feasible due to legal constraints. Meta has acknowledged using The Pile to train its initial version of Llama, but Dettmers notes that Meta’s lawyers informed him that the data cannot be used, or that models cannot be published if they are trained on that data.

The concerns likely stem from books with active copyrights. Dettmers, when approached by Reuters, was unable to comment on the allegations.

Tech companies this year have faced lawsuits for using copyrighted works without permission to develop generative AI models, which have gained attention and investment.

If successful, these cases could reduce enthusiasm for generative AI by increasing expenses for AI companies, who may have to compensate artists and authors for using their works.

Additionally, new regulations in Europe may require companies to disclose the data they use to train their models, exposing them to legal risks.

Meta released the initial version of its Llama large language model in February, disclosing the datasets used for training. However, it did not disclose the training data for its latest model, Llama 2, which was released this summer.

Llama 2 can be used for free by companies with fewer than 700 million monthly active users, posing a threat to dominant players like OpenAI and Google, which charge for model usage.

Alishba Waris is an independent journalist working for CTN News. She brings a wealth of experience and a keen eye for detail to her reporting. With a knack for uncovering the truth, Waris isn't afraid to ask tough questions and hold those in power accountable. Her writing is clear, concise, and cuts through the noise, delivering the facts readers need to stay informed. Waris's dedication to ethical journalism shines through in her hard-hitting yet fair coverage of important issues.
