Copyright Group Takes Down Dutch Language AI Dataset (aol.com) 14

Posted by msmash on Tuesday August 13, 2024 @12:50PM from the tussle-continues dept.

Dutch-based copyright enforcement group BREIN has taken down a large language dataset that was being offered for use in training AI models, the organization said on Tuesday. From a report: The dataset included information collected without permission from tens of thousands of books, news sites, and Dutch language subtitles harvested from "countless" films and TV series, BREIN said in a statement. Director Bastiaan van Ramshorst told Reuters it was not clear whether or how widely the dataset may already have been used by AI companies. "It's very difficult to know, but we are trying to be on time" to avoid future lawsuits, he said. He said the European Union's AI Act will require AI firms to disclose what datasets they have used to train their models.

This discussion has been archived. No new comments can be posted.


- Re: They shot themselves in the foot (Score:3)
  
  by Fons_de_spons ( 1311177 ) writes:
  
  That has its advantages. I often get phishing messages. As everybody speaks Dutch down here, they are easy to spot: if they communicate in English, you know it is fake. If I get a phone call from Microsoft and it is in English, I know it is a scam. They try automatic translation, but it is obviously bad. Having bad language models for Dutch may not be that bad. We're used to googling in English anyway. We'll live.
- Re: (Score:2)
  
  by NettiWelho ( 1147351 ) writes:
  
  because relying on a [..] based AI might not be wise
  .. Humans do not have AI yet so nothing of value was lost.
AOL has a new site? (Score:2)

by nugatory78 ( 971318 ) writes:

Off topic, but the link is to AOL.com, and I'm more shocked by this than by the article itself.
- Re: (Score:2)
  
  by ebunga ( 95613 ) writes:
  
  Before they were bought by Verizon in 2015, AOL was a multi-billion-dollar company, more than a decade past the point where classic AOL was an important product. They are now owned by Yahoo, which is also still a multi-billion-dollar company with over 10,000 employees. Oh, and apparently AOL Desktop is still a thing, and it costs $7/mo.
Follow the trail (Score:1)

by EldoranDark ( 10182303 ) writes:

Now they need to track down any LLMs that may have been contaminated with the compromised training material, take those down, and tell any users of derivative works to put up appropriate disclaimers.
- Re: (Score:2)
  
  by cowdung ( 702933 ) writes:
  
  In court, they will argue that LLMs are transformative, not derivative. But I guess we'll have to wait and see how this plays out.
  - Re: (Score:2)
    
    by cowdung ( 702933 ) writes:
    
    The dataset may violate copyright, but the LLM may not, even if it was trained on that dataset.
  - Re: Follow the trail (Score:2)
    
    by godrik ( 1287354 ) writes:
    
    Well, they will go to Dutch courts, and I don't know if that distinction matters there!
  - Re: (Score:3)
    
    by fph il quozientatore ( 971015 ) writes:
    
    It's interesting to imagine a world in which LLMs are ruled to be derivative, and yet the copyright law we know stays unchanged.
    
    This would mean that illegally-trained LLMs can use a larger dataset and deliver much better results than legal ones, opening the door to all sorts of abuse. People would illegally mod their own vocal assistants, search engines, and image generators to get better results. Startups would be tempted to violate the law and secretly use an illegal model on their server to gain customer
    - Re: (Score:2)
      
      by martin-boundary ( 547041 ) writes:
      
      That's an interesting world. Also, police SWAT teams would break down doors and shoot or arrest people running illegal LLMs, electricity companies would monitor potentially illegal power surges, and the mob would offer protection rackets for LLM businesses.
    - Re: Follow the trail (Score:1)
      
      by EldoranDark ( 10182303 ) writes:
      
      Well, we have a similar situation with food. We track where it was grown, restrict certain practices, monitor storage and transportation conditions, and crack down on problematic products. We'll have to make sure any use of LLMs is documented, declared, and can be verified by independent sources, even if they have to sign NDAs for access. It's all a combination of model, settings, seeds, and prompts. If you can't reproduce a result, you're doing something fishy.
