• fubo@lemmy.world
    English
    arrow-up
    84
    arrow-down
    4
    ·
    edit-2
    1 year ago

    The argument regarding the specific case of AI-generated images of real actors makes sense, but the headline overgeneralizes hugely.

    If you write a book about carpentry, and someone checks that book out from the library, reads it, learns how to do carpentry from it, and goes into the carpentry business, they do not owe you a share of their profits.

    It’s nice if they give you credit. But they do not owe you a revenue stream.

    If they are a robot, the same remains true.

    • The Snark Urge@lemmy.world
      English
      arrow-up
      43
      arrow-down
      2
      ·
      edit-2
      1 year ago

      Corollary: if a corporation scrapes the talk of the whole internet, which itself was shaped by the aggregate culture and knowledge of ten thousand years of human history, and their resultant product is an AI that can replace workers, it is morally valid to eminent domain that shit and divert its profits to a fledgling UBI program.

      Edit to add: Not a statement about how UBI should really work, just a throwaway comment about seizing means.

      • d3Xt3r@lemmy.world
        English
        arrow-up
        15
        arrow-down
        1
        ·
        edit-2
        1 year ago

        UBI should be a government initiative, and funding for it should be collected in the form of a tax, irrespective of AI, because more and more humans are getting replaced by automation and technology in general, and a lot of this is being done so gradually that you don’t notice it or think of it as a problem. Every time you saw headlines like “xx corporation has laid off hundreds/thousands of employees” in the past, it had very little to do with AI, but could have had to do with technology and progress in general, plus a lot of other factors. Every little new development could have a butterfly effect that’s hard to calculate.

        Neither AI, nor the loss of jobs in general, should be a factor for UBI funding. AI is just another new technological development, maybe even a disruptive one, but it’s nothing so new that we need to pick up our pitchforks against.

        As for compensating creative owners, that’s a bigger discussion on IP protection and ownership in general, and the responsibility falls upon the IP owners (and maybe appropriate laws). For instance, we’ve seen news sites, science publishers etc paywall their work, and that’s because they want to protect their work and get compensation for viewership - and this has nothing to do with AI. If people want compensation for their work, then they should take appropriate measures to protect their work, and/or come up with alternate revenue streams, if it’s impossible to paywall their work (for instance, how some youtubers choose to seek sponsorship or patreon donations). If people want to prevent their work from being stolen and redistributed, appropriate action should be taken against the persons/sites stealing their work (eg via DMCA etc). It’s not the AI’s fault for eating up copyrighted content on public sites like pastebin.com or Scribd, it’s the fault of the people uploading it.

      • FaceDeer@kbin.social
        arrow-up
        7
        arrow-down
        1
        ·
        1 year ago

        UBI should not be dependent on its specific sources and specific destinations. It’s universal, it’s right in the name. It should be funded by a tax on the wealthy - regardless of how that wealth is obtained - and be issued to everyone.

        The goal is not to “level the playing field” so that human employees can continue to labor and companies can’t afford to hire robots to replace them. The goal is to make it so that if companies replace all their employees with robots those employees don’t have to find some other job to continue living.

    • phillaholic@lemm.ee
      English
      arrow-up
      3
      arrow-down
      6
      ·
      1 year ago

      I’m not sure that’s a fair comparison. You wouldn’t instantly ingest that information and know it. It’s more like photocopying a book and including it in another book that you sell. It’s a paradigm shift, and I’m not sure what the answer is.

        • phillaholic@lemm.ee
          English
          arrow-up
          1
          arrow-down
          2
          ·
          1 year ago

          I don’t think it’s that simple. Like I said, it’s a paradigm shift. It doesn’t fit into existing laws well. My point is that what we consider fair use now, such as a human summarizing a book or movie, is based on the limited abilities of humans. When you have AI with limitless abilities, that will change things. The same rules and considerations may have to be rethought.

    • scarabic@lemmy.world
      English
      arrow-up
      1
      arrow-down
      5
      ·
      1 year ago

      Analogies to humans are not relevant, and yours is a bad one anyway. LLMs don’t read a carpentry book and then go build houses. They chew up carpentry books and spit out carpentry books.

      Your final line remains to be established in court.

      • fubo@lemmy.world
        English
        arrow-up
        18
        arrow-down
        2
        ·
        1 year ago

        Oh sure, if a copyright holder can demonstrate that a specific work is reproduced. Not just “I think your AI read my book and that’s why it’s so good at carpentry.”

        • silence7@slrpnk.netOP
          English
          arrow-up
          5
          arrow-down
          17
          ·
          1 year ago

          The thing is that they’re all reproduced, at least in part. That’s how these models work.

          • fubo@lemmy.world
            English
            arrow-up
            22
            arrow-down
            1
            ·
            edit-2
            1 year ago

            Reproducing a work is a specific thing. Using an idea from that work, or a transformation of that idea, is not reproducing that work.

            Again: If a copyright holder can show that an AI system has reproduced the text (or images, etc.) of a specific work, they should absolutely have a copyright claim.

            But “you read my book, therefore everything you do is a derivative work of my book” is an incorrect legal argument. And when it escalates to “… and therefore I should get to shut you down,” it’s a threat of censorship.

            • Cylusthevirus@kbin.social
              arrow-up
              2
              arrow-down
              8
              ·
              1 year ago

              A person reading and internalizing concepts is considerably different than an algo slurping in every recorded work of fiction and occasionally shitting out a bit of mostly Shakespeare. One of these has agency and personhood, the other is a tool.

            • silence7@slrpnk.netOP
              English
              arrow-up
              3
              arrow-down
              12
              ·
              1 year ago

              The problem is that the LLMs (and image AIs) effectively store pieces of works as correlations inside them, occasionally spitting some of them back out. You can’t just say “it saw it” but can say “it’s like a scrapbook with fragments of all these different works”

              • fubo@lemmy.world
                English
                arrow-up
                17
                ·
                1 year ago

                I’ve memorized some copyrighted works too.

                If I perform them publicly, the copyright holder would have a case against me.

                But the mere fact that I could recite those works doesn’t make everything that I say into a copyright violation.

                The copyright holder has to show that I’ve actually reproduced their work, not just that I’ve memorized it inside my brain.

                • silence7@slrpnk.netOP
                  English
                  arrow-up
                  3
                  arrow-down
                  11
                  ·
                  edit-2
                  1 year ago

                  The difference is that your brain isn’t a piece of media which gets copied. The AI is. So when it memorizes, it commits a copyright violation

          • FaceDeer@kbin.social
            arrow-up
            5
            ·
            1 year ago

            No, that’s not how these models work. You’re repeating the old saw about these being “collage machines”, which is a gross mischaracterization.

      • FaceDeer@kbin.social
        arrow-up
        5
        ·
        1 year ago

        That article doesn’t show what you think it shows. There was a lot of discussion of it when it first came out and the examples of overfitting they managed to dig up were extreme edge cases of edge cases that took them a huge amount of effort to find. So that people don’t have to follow a Reddit link, from the top comment:

        They identified images that were likely to be overtrained, then generated 175 million images to find cases where overtraining ended up duplicating an image.

        We find 94 images are extracted. […] [We] find that a further 13 (for a total of 109 images) are near-copies of training examples

        They’re purposefully trying to generate copies of training images using sophisticated techniques to do so, and even then fewer than one in a million of their generated images is a near copy.

        And that’s on an older version of Stable Diffusion trained on only 160 million images. They actually generated more images than were used to train the model.

        Overfitting is an error state. Nobody wants to overfit on any of the input data, and so the input data is sanitized as much as possible to remove duplicates to prevent it. They had to do this research on an early Stable Diffusion model that was already obsolete when they did the work because modern Stable Diffusion models have been refined enough to avoid that problem.
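
As a rough sanity check of the "fewer than one in a million" figure quoted in the comment above, the arithmetic can be sketched as follows. This is only an illustration: the three constants are the numbers as stated in the thread (not independently verified here), and the function name `near_copy_rate` is made up for this example.

```python
# Figures as quoted in the comment above (taken from the thread, not re-verified).
GENERATED_IMAGES = 175_000_000  # images generated during the extraction attempt
NEAR_COPIES = 109               # near-copies of training images reported
TRAINING_IMAGES = 160_000_000   # training-set size stated in the comment


def near_copy_rate(near_copies: int, generated: int) -> float:
    """Fraction of generated images that were near-copies of training images."""
    return near_copies / generated


rate = near_copy_rate(NEAR_COPIES, GENERATED_IMAGES)
per_million = rate * 1_000_000

# Prints "0.62 near-copies per million generated images"
print(f"{per_million:.2f} near-copies per million generated images")
# Prints "generated / training = 1.09"
print(f"generated / training = {GENERATED_IMAGES / TRAINING_IMAGES:.2f}")
```

With the quoted numbers, the rate comes out below one near-copy per million generated images, and the number of generated images exceeds the stated training-set size, which matches the two claims made in the comment.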

      • BrianTheeBiscuiteer@lemmy.world
        English
        arrow-up
        3
        ·
        1 year ago

        If I was to read a carpentry book and then publish my own, “regurgitating” most of the original text, then I plagiarized and should be sued. Furthermore, if I was to write a song and use the same melody as another copyrighted song I’d get sued and lose, even if I could somehow prove that I never heard the original.

        I think the same rules should apply to AI generated content. One rule I would like to see, and I don’t know if this has precedent, is that AI generated content cannot be copyrighted. Otherwise AI could truly replace humans from a creative perspective and it would be a race to generate as much content as possible.

    • Taleya@aussie.zone
      English
      arrow-up
      2
      arrow-down
      7
      ·
      1 year ago

      AI isn’t learning how to do carpentry though. It’s simply including my work in an aggregate pool that it now claims as its own.

      • FaceDeer@kbin.social
        arrow-up
        9
        arrow-down
        2
        ·
        1 year ago

        It is not. The AI’s model does not contain a copy of your work, there is no “aggregate pool.” AI is not some sort of magical compression algorithm that’s able to somehow crush whole images down to less than a byte of data. The only thing that it’s “including” in itself are the concepts that it learned from your work. Those are ideas, which are not copyrightable.

    • krayj@lemmy.world
      English
      arrow-up
      3
      arrow-down
      11
      ·
      edit-2
      1 year ago

      The difference is that when the robot reads that book, it maintains a verbatim copy of that book as part of its training material indefinitely and can reference and re-reference that material infinitely. That is not how it works when a human reads a book.

      The ‘copy’ that the AI retains indefinitely is a verbatim copy of the original work, and the entire point of “copyright” is to control how and where copies are used.

      Yes, there are ‘fair use’ exceptions to copyright. I don’t think you realize it, but your argument is less about whether this violates copyright (it absolutely does under the textbook definition) and more about whether there should be a fair-use exemption for AIs; you seem to think yes, I would disagree.

      I’d also argue the AI example qualifies as a ‘derivative work’ based on the original, which STILL would require honoring copyright laws and compensating the creators of the original works. Basically, before reading the book it was just “AI”. After reading the book it has become “AI + book1”, a derivative work, and on and on and on.

      • fubo@lemmy.world
        English
        arrow-up
        7
        ·
        edit-2
        1 year ago

        The difference is that when the robot reads that book, it maintains a verbatim copy of that book as part of its training material indefinitely and can reference and re-reference that material infinitely. That is not how it works when a human reads a book.

        However, that is how it works when a human memorizes a copyrighted work. If I memorize a poem, I may then reference it from my memory without further need for the original text before me. If I am an actor and learn my lines for a play, I commit them to my memory.

        Which is not an infringement.

        The infringement happens if the human performs or publishes that work; e.g. reciting that copyrighted poem or play from memory before an audience; writing that work down from memory and publishing it; etc., without a copyright license for that performance or republication.

        I suggest merely applying the same standard: infringement doesn’t happen when a work is read, indexed, scanned, etc.; it does happen if that work is then recited.

        For instance, ChatGPT currently knows the text of the Harry Potter novels, but it does not recite them when asked to do so. (Try it! It will answer questions about the text, but it will freeze up if asked to recite it; evidently because it has a filter against reciting copyrighted material.)

        • FaceDeer@kbin.social
          arrow-up
          11
          ·
          edit-2
          1 year ago

          No, the reason ChatGPT can’t recite the text of Harry Potter verbatim is because it doesn’t actually “contain” it. It learned from it, but it doesn’t “remember” it word-for-word. There is no filter against reciting copyrighted material. Try asking it to recite a scene from a Shakespearean play, for example - that’s out of copyright and ChatGPT was almost certainly trained on it. It may be able to quote some famous lines because it’s been overfit like crazy on them (“To be or not to be” is probably everywhere on the Internet) but that’s not a verbatim chunk.

          I’ve actually experimented with this myself on my local machine, I took one of the smaller open-source models and I gave it additional training using 20 megabytes of My Little Pony fanfiction. The AI knew a lot about the fanfic afterward but it was clearly just picking up tidbits of general knowledge rather than “remembering” the whole thing.

        • TitanLaGrange@lemmy.world
          English
          arrow-up
          2
          ·
          edit-2
          1 year ago

          ChatGPT currently knows the text of the Harry Potter novels, but it does not recite them when asked to do so.

          I tried that several weeks ago while discussing some details of the Harry Potter world with ChatGPT, and it was able to directly quote several passages to me to support its points (we were talking about house elf magic and I asked it to quote a paragraph). I checked against a dead-tree copy of the book and it had exactly reproduced the paragraph as published.

          This may have changed with their updates since then, and it may not be able to quote passages reliably, but it is (or was) able to do so on a couple of occasions.

      • FaceDeer@kbin.social
        arrow-up
        7
        arrow-down
        1
        ·
        1 year ago

        That’s not how these AIs work. They don’t contain verbatim copies of their training data. They get trained on terabytes of text, they couldn’t possibly remember it all.

  • Ocelot@lemmies.world
    English
    arrow-up
    22
    arrow-down
    3
    ·
    1 year ago

    If I post some work publicly on the internet (like open source code) so that an AI is able to scrape it why in the hell should I expect to get paid for it?

      • fedev@lemmy.world
        English
        arrow-up
        1
        ·
        1 year ago

        Maybe they have to go the Twitter way and showcase their work behind a registration page, or even better, there could be an implementation of the robots.txt file but for AI crawlers.

        Still, there are countless ways in which a reproduction could be leaked. I could buy a painting, which I then own, take a picture of it and upload it to a public location. Same for a book.

        But I tend to agree that if the model generates an image or text that only has traces of the original work, then no compensation should be needed.
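
The "robots.txt but for AI crawlers" idea in the comment above could look something like the sketch below. It uses Python's standard `urllib.robotparser` to check a rule set that blocks one crawler and allows everyone else. The crawler name `ExampleAIBot`, the `example.com` URL, the rule set, and the helper `crawler_allowed` are all hypothetical, and a rule like this only helps if the crawler chooses to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one (made-up) AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text, no network access
    return parser.can_fetch(user_agent, url)


# The made-up AI crawler is refused; any other agent falls through to the "*" rule.
print(crawler_allowed(ROBOTS_TXT, "ExampleAIBot", "https://example.com/art/1"))
print(crawler_allowed(ROBOTS_TXT, "SomeBrowser", "https://example.com/art/1"))
```

Under these assumptions the first check is refused and the second is allowed. Compliance is voluntary on the crawler's side, so this is an opt-out signal rather than an access control like the registration page mentioned in the comment.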

  • tal@kbin.social
    arrow-up
    18
    arrow-down
    1
    ·
    1 year ago

    I don’t see the argument for it. The same bar doesn’t apply to humans who train their minds on other human works.

    • AgentCorgi@lemmy.world
      English
      arrow-up
      7
      arrow-down
      1
      ·
      1 year ago

      Is your work worth paying for? Put your work behind a paywall if it’s that valuable. They will reach out to you.

  • asudox@lemmy.world
    English
    arrow-up
    1
    ·
    1 year ago

    Imagine if AI humans became real and they had to work for a lifetime to pay their debt for all the people’s effort, property, and such that their training data was made from.

  • 👍Maximum Derek👍@discuss.tchncs.de
    English
    arrow-up
    4
    arrow-down
    6
    ·
    edit-2
    1 year ago

    It should also be taxed as labor.

    Edit: To clarify, I mean the company in control of the AI should pay some equivalent to income tax for using AI instead of a person.

      • scarabic@lemmy.world
        English
        arrow-up
        1
        ·
        1 year ago

        Yep.

        “dO yOu PaY fOR boOkS?”

        It’s like… tell me you didn’t go to college without telling me you didn’t go to college.

        • l0v9ZU5Z@feddit.de
          English
          arrow-up
          1
          ·
          edit-2
          1 year ago

          I went to university but never had to buy a book. My university library offered all the books I needed, plus online access to research articles from Springer, Elsevier, and so on. You can get access as a regular person without being an enrolled student.