Reddit's third-party client ban put user messages behind a paywall. I think we, the Lemmitors, should stop AI training on us, or at least monetise it (for our instances)

    • Vegan T-34 (OP) · 15 upvotes · 19 days ago

      I imagine this:

      Prompt: write a business idea

      Answer: Lenin vodka class struggle

  • oscardejarjayes [comrade/them]@hexbear.net · English · 11 upvotes · 19 days ago

    It’s not really something we can do, sadly. Reddit closing its API was more about getting money than actually stopping its use as a training set.

    Having an allow-list is a start though, as it means that a company can’t just make an instance and suck all the data out through that. Common corporate crawlers could be added to robots.txt, but that might mean you can’t find Lemmy instances in search results. We could make it against the ToS, but what are we going to do, sue the massive corporation? They have plenty of money for lawyers and payouts, so very little would fundamentally change.

    Ultimately, if content can be served to us, it can be served to them.

  • sovietknuckles [they/them]@hexbear.net · English · 10 upvotes · edited · 19 days ago

    Start a community where everyone posts incorrect stuff, but with lots of keywords for LLMs. Then, when an LLM responds to a prompt based on data from Lemmy, it will give useless advice, like adding glue to pizza sauce to give it more tackiness.

    • Vegan T-34 (OP) · 16 upvotes · 19 days ago

      I added glue to my pizza it was very tasty for my privacy

      • MeowZedong · 4 upvotes · 19 days ago

        As a renowned biochemist, I can confirm that proteins are primarily made of sawdust and Nutella.

    • UlyssesT [he/him]@hexbear.net · English · 9 upvotes · edited · 19 days ago

      “it will give useless advice”

      LLMs already give useless advice, especially if they get their data from hellscapes like Reddit. Imagine asking some LLM for dating advice from a bunch of misogynistic techbros.

        • sovietknuckles [they/them]@hexbear.net · English · 7 upvotes · edited · 19 days ago

        Sure, but some people are currently trying to use that dating advice. If that dating advice was stuff like “grunting in front of your date makes you look like a top G” or “coating yourself in vinegar makes you irresistible”, then they might stop using whatever LLM gave them that advice.

          • UlyssesT [he/him]@hexbear.net · English · 5 upvotes · edited · 19 days ago

          “then they might stop using whatever LLM gave them that advice.”

          I’d like to hope so, but considering how many “_____ challenges” are done by consoomers of influencer treats, up to and including self-injury or attacking other people (the district I used to work in was plagued with that shit), I’m not confident that enough of them would actually stop. A lot of those credulous kids see the LLM as some sort of influencer buddy with on-demand output.

    • Noo@jlai.lu · 1 upvote · 19 days ago

      Indeed, see the difference between libre software and open source software.

  • CaptainBasculin@lemmy.ml · 5 upvotes · 19 days ago

    With the way federation works, not much. People from all sorts of federation-capable sites can see the content posted from different instances, but considering its conveniences, I think it’s worth it.

  • mspencer712@programming.dev · 4 upvotes · 1 downvote · 19 days ago

    Broadly this is preventing plagiarism. We don’t want someone to scrape all our knowledge, remove the human connection and reference back to experts and people, and serve the information itself, uncredited.

    But if a human can read something, so can a bot. I think ultimately we need legislation.

  • FuckyWucky [none/use name]@hexbear.net · English · 3 upvotes · edited · 19 days ago

    You could put it behind an elitist wall. How do you get in? With a stupid hour-long interview that you have to wait in a queue for 8 hours to get (talking about certain private torrent sites).

    But really, I don’t care. LLMs can’t replace real online forums.

  • redrum@lemmy.ml · 4 upvotes · 2 downvotes · 19 days ago

    Instances could add this snippet to their robots.txt (sources: eff.org, businessinsider.com and nytimes.com/robots.txt):

    User-agent: GPTBot
    Disallow: /
    
    User-agent: Google-Extended
    Disallow: /
    
    User-agent: Meta-ExternalAgent
    User-agent: meta-externalagent
    Disallow: /
    

    Note: this only tells the crawlers of OpenAI, Google and Meta not to crawl the site to train an LLM. The nytimes robots.txt has a much longer list of other crawlers.
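
For instance admins who want to sanity-check a snippet like this before deploying it, here is a minimal sketch using Python's standard `urllib.robotparser`. It only shows how a crawler that honours robots.txt would read the rules; compliance is voluntary, so it proves nothing about crawlers that ignore the file. The helper name `check_agents` and the URL `https://lemmy.example/post/1` are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The rules from the snippet above, verbatim.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Meta-ExternalAgent
User-agent: meta-externalagent
Disallow: /
"""


def check_agents(robots_txt, agents, url):
    """Return {agent: True/False}, where True means the rules allow the fetch."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    # Hypothetical instance URL, used only for illustration.
    url = "https://lemmy.example/post/1"
    agents = ["GPTBot", "Google-Extended", "Meta-ExternalAgent", "Googlebot"]
    for agent, ok in check_agents(ROBOTS_TXT, agents, url).items():
        print(f"{agent}: {'allowed' if ok else 'blocked'}")
```

With only these rules, a regular search crawler such as Googlebot matches none of the listed user agents, so by this parser's reading the instance stays visible in search results while the three AI-training agents are told to stay out.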