The latest complaint comes as Meta and Anthropic both receive legal relief in similar copyright lawsuits.
A group of authors have filed a lawsuit against Microsoft, accusing the tech giant of using copyrighted works to train its large language model (LLM).
The class action complaint, filed by 10 authors and professors including Pulitzer Prize winner Kai Bird and Whiting Award winner Victor LaValle, claims that Microsoft ignored the law by downloading around 200,000 copyrighted works and feeding them to the company’s Megatron-Turing Natural Language Generation model.
The end result, the plaintiffs claim, is an AI model able to generate expressions that mimic the authors’ manner of writing and the themes in their work.
“Microsoft’s commercial gain has come at the expense of creators and rightsholders,” the lawsuit states. The complaint seeks to represent not just the plaintiffs but also similar copyright holders under the US Copyright Act.
The plaintiffs seek damages of up to $150,000 per infringed work, as well as an injunction prohibiting Microsoft from using any of their works.
This latest lawsuit is yet another that seeks to challenge how AI models are trained. Visual artists, news publishers and authors are just some of the classes of creators who claim that AI models infringe upon their rights.
However, yesterday (25 June), a US court ruled that Meta’s training of AI models on copyrighted books fell under the “fair use” doctrine of copyright law.
The lawsuit was brought by authors Richard Kadrey, Christopher Golden and Sarah Silverman back in 2023.
Earlier this year, the trio’s counsel claimed that Meta committed copyright infringement by training Llama, its LLM, on pirated data and uploading that data for commercial gain.
In the decision yesterday, the judge said that the plaintiffs “made the wrong arguments,” ultimately failing to prove their case.
However, he added that the ruling does not mean that Meta’s use of copyrighted materials to train its LLM is lawful. The judge ruled that, in this case, Meta’s use of copyrighted works was “transformative”.
In another blow to authors, a different US court ruled earlier this week that Anthropic’s use of books to train Claude AI also qualifies as “fair use”.
The case was brought in 2024 by another trio of authors, Andrea Bartz, Charles Graeber and Kirk Wallace Johnson, who claimed that Anthropic used pirated versions of various copyrighted works to train Claude, its flagship AI model.
However, “Claude created no exact copy, nor any substantial knock-off. Nothing traceable to [the plaintiffs’] works,” the judge wrote in his summary judgement.
However, it appears that Big Tech companies do, at times, acknowledge the role copyright holders play in creating the primary data from which their AI models extrapolate.
Last year, Bloomberg reported that Microsoft and publishing giant HarperCollins signed a content licensing deal where the tech giant could use some of HarperCollins’ books for AI training.
AI search engine Perplexity, which has repeatedly come under fire for allegedly scraping content from news publishers, has also launched a revenue-sharing platform with publishers after receiving backlash.
Meanwhile, OpenAI has a content-sharing deal for ChatGPT with more than 160 outlets in several languages.
Earlier this year, Thomson Reuters CPO David Wong told SiliconRepublic.com that not only is it possible to create AI systems that respect copyright, but that respecting copyright will further those systems and improve accessibility to information.
Recent rulings seem to position Big Tech as the emerging winner in the AI fair-use battle. Still, companies such as OpenAI and Microsoft continue to face similar lawsuits.