• krayj@lemmy.world
    1 year ago

    The difference is that when the robot reads that book, it maintains a verbatim copy of that book as part of its training material indefinitely and can reference and re-reference that material infinitely. That is not how it works when a human reads a book.

    The ‘copy’ that the AI retains indefinitely is a verbatim copy of the original work, and the entire point of “copyright” is to control how and where copies are used.

    Yes, there are ‘fair use’ exceptions to copyright. I don’t think you realize it, but your argument is less about whether this violates copyright (it absolutely does under the textbook definition) and more about whether there should be a fair-use exemption for AIs. You seem to think yes; I would disagree.

    I’d also argue the AI example qualifies as a ‘derivative work’ based on the original, which would STILL require honoring copyright law and compensating the creators of the original works. Basically, before reading the book it was just “AI”. After reading the book it has become “AI + book1”, a derivative work, and on and on and on.

    • fubo@lemmy.world
      1 year ago

      > The difference is that when the robot reads that book, it maintains a verbatim copy of that book as part of its training material indefinitely and can reference and re-reference that material infinitely. That is not how it works when a human reads a book.

      However, that is how it works when a human memorizes a copyrighted work. If I memorize a poem, I may then reference it from my memory without further need for the original text before me. If I am an actor and learn my lines for a play, I commit them to my memory.

      Which is not an infringement.

      The infringement happens if the human performs or publishes that work without a copyright license for that performance or republication: for example, reciting the copyrighted poem or play from memory before an audience, or writing the work down from memory and publishing it.

      I suggest merely applying the same standard: infringement doesn’t happen when a work is read, indexed, scanned, etc.; it does happen if that work is then recited.

      For instance, ChatGPT currently knows the text of the Harry Potter novels, but it does not recite them when asked to do so. (Try it! It will answer questions about the text, but it will freeze up if asked to recite it, evidently because it has a filter against reciting copyrighted material.)

      • FaceDeer@kbin.social
        1 year ago

        No, the reason ChatGPT can’t recite the text of Harry Potter verbatim is that it doesn’t actually “contain” it. It learned from it, but it doesn’t “remember” it word-for-word. There is no filter against reciting copyrighted material. Try asking it to recite a scene from a Shakespearean play, for example: that’s out of copyright, and ChatGPT was almost certainly trained on it. It may be able to quote some famous lines because it’s been overfit like crazy on them (“To be or not to be” is probably everywhere on the Internet), but that’s a far cry from reproducing a verbatim chunk of the play.

        I’ve actually experimented with this myself on my local machine: I took one of the smaller open-source models and gave it additional training on 20 megabytes of My Little Pony fanfiction. Afterward the AI knew a lot about the fanfic, but it was clearly picking up tidbits of general knowledge rather than “remembering” the whole thing.

      • TitanLaGrange@lemmy.world
        1 year ago

        > ChatGPT currently knows the text of the Harry Potter novels, but it does not recite them when asked to do so.

        I tried that several weeks ago while discussing some details of the Harry Potter world with ChatGPT, and it was able to quote several passages directly to support its points (we were talking about house-elf magic, and I asked it to quote a paragraph). I checked it against a dead-tree copy of the book, and it had reproduced the paragraph exactly as published.

        This may have changed with their updates since then, and it may not be able to quote passages reliably, but it was able to do so on at least a couple of occasions.

    • FaceDeer@kbin.social
      1 year ago

      That’s not how these AIs work. They don’t contain verbatim copies of their training data. They get trained on terabytes of text; they couldn’t possibly remember it all.