Why is it so hard to request these AI companies to provide the sources used for their dataset?

Then if most are illegally acquired, seized the models. The AI company could retrain a model from scratch but with legally acquired data.

  • Pisck@lemmy.ml
    link
    fedilink
    English
    arrow-up
    6
    ·
    1 year ago

    You’ve answered your own question - transparency is potentially self-incriminating.

  • Rentlar@lemmy.ca
    link
    fedilink
    English
    arrow-up
    4
    ·
    1 year ago

    Most will not be criminal liability but more likely civil liability. By admitting to using datasets containing intellectual property or other information that infringes on others’ rights, those parties can sue them. But by keeping it all under a mysterious black box and blending everything together, any specific rights holder would have a difficult time putting together a case.