Copyright Group Takes Down Dutch Language AI Dataset (aol.com)
Dutch-based copyright enforcement group BREIN has taken down a large language dataset that was being offered for use in training AI models, the organization said on Tuesday. From a report: The dataset included information collected without permission from tens of thousands of books, news sites, and Dutch language subtitles harvested from "countless" films and TV series, BREIN said in a statement. Director Bastiaan van Ramshorst told Reuters it was not clear whether or how widely the dataset may already have been used by AI companies. "It's very difficult to know, but we are trying to be on time" to avoid future lawsuits, he said. He said the European Union's AI Act will require AI firms to disclose what datasets they have used to train their models.
Re: They shot themselves in the foot (Score:3)
Re: (Score:2)
because relying on a [..] based AI might not be wise
.. Humans do not have AI yet so nothing of value was lost.
AOL has a new site? (Score:2)
Re: (Score:2)
Before Verizon bought them in 2015, AOL was a multi-billion-dollar company, more than a decade after classic AOL had stopped being an important product. They are now owned by Yahoo, which is also still a multi-billion-dollar company with over 10,000 employees. Oh, and apparently AOL Desktop is still a thing, and it costs $7/mo.
Follow the trail (Score:1)
Re: (Score:2)
In courts they will argue that LLMs are transformative, not derivative. But I guess we have to wait to see how this plays out.
Re: (Score:2)
The dataset may violate copyright, but the LLM may not, even if it was trained on that dataset.
Re: Follow the trail (Score:2)
Well, they will go to Dutch courts, and I don't know if that distinction matters there!
Re: (Score:3)
This would mean that illegally-trained LLMs can use a larger dataset and deliver much better results than legal ones, opening the door to all sorts of abuse. People would illegally mod their own vocal assistants, search engines, and image generators to get better results. Startups would be tempted to violate the law and secretly use an illegal model on their server to gain customer
Re: (Score:2)
Re: Follow the trail (Score:1)
Re: Follow the trail (Score:1)
European copyright law does not make such distinctions, especially not where AI is concerned. Copyright violation there is based on intent and profit motive. Courts will consider damages and guilt primarily based on the size, mission, and origin of the company.