Authors using a new tool to search a list of 183,000 books used to train AI are furious to find their works on the list.

      • just another dev@lemmy.my-box.dev · 9 months ago

        > Fair use is any copying of copyrighted material done for a limited and “transformative” purpose, such as to comment upon, criticize, or parody a copyrighted work.

        I don’t see why it should.

        • FaceDeer@kbin.social · 9 months ago

          The creation of the AI model is transformative. The model itself does not contain a literal copy of the copyrighted work.

          • just another dev@lemmy.my-box.dev · 9 months ago

            No, but the training data does contain a copy. And making a model is not criticising, commenting upon, or creating a parody of it.

            • FaceDeer@kbin.social · 9 months ago

              That list is not exclusive; it’s just a list of examples of fair use.

              The training data is not distributed with the AI model.

              • just another dev@lemmy.my-box.dev · 9 months ago

                > it’s just a list of examples of fair use.

                Yes, a list of quite similar ways of commenting upon a work. Please explain how training an LLM is like any of those things, and thus how fair use would apply.

                • FaceDeer@kbin.social · 9 months ago

                  I’m not saying that training an LLM is like any of those things. I’m saying it doesn’t have to be like those things in order for it to still be fair use.

                • FontMasterFlex@lemmy.world · 9 months ago

                  It’s not. The humans who trained it (presumably) purchased the material used to train it. What’s the problem?

                  • BURN@lemmy.world · 9 months ago

                    The use of the material to create a commercial product, and the fact that the humans training it never buy the data on an individual level.

    • lloram239@feddit.de · 9 months ago

      Authors Guild, Inc. v. Google, Inc. decided that it is fair use to scan books and make portions of them available verbatim on the net. What AI does is far more transformative than that: very little of a book can be reproduced verbatim by an AI (e.g. popular quotes); you really just get “knowledge” from the books. The sources, however, are lost in the process, unlike with Google. That by itself also makes it difficult to argue copyright violation, since you can’t point at what was actually copied.

    • kromem@lemmy.world · 9 months ago

      The training argument is probably going to come up dry by the time the court works its way through expert testimony, as the underlying argument for training as infringement is insane.

      But where OpenAI is probably in hot water is that torrenting 100k books in the first place runs afoul of existing copyright legislation.

      Everyone is debating the training in these suits, but the real meat and potatoes is going to be the initial infringement of obtaining the books, not how they were subsequently used.