Just putting that out there. While we might have struggle sessions over bullshit, the larger internet zeitgeist is putrid and rancid.

  • m532
    4 months ago

    In diffusion, this has already been done. Most models made after SD1.5 have a “handpicked” input dataset. I guess it’s because most of SD1.5’s input was garbage quality, which transferred over to the output.
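    The “handpicked” curation described above can be sketched roughly like this, assuming each candidate image has already been assigned a quality score (e.g. by an aesthetic predictor — the scores and paths below are hypothetical, and the predictor itself is out of scope):

    ```python
    # Minimal sketch of dataset curation by quality score.
    # Assumes scores already exist; only the filtering step is shown.

    def curate(samples, min_score=5.0):
        """Keep only samples whose quality score clears the threshold."""
        return [path for path, score in samples if score >= min_score]

    # Hypothetical scored samples: (path, aesthetic score on a 0-10 scale).
    candidates = [
        ("img_001.png", 6.8),
        ("img_002.png", 2.1),  # low quality -> dropped
        ("img_003.png", 5.4),
    ]

    kept = curate(candidates)
    print(kept)  # -> ['img_001.png', 'img_003.png']
    ```

    In practice the threshold and the scorer matter far more than the filter itself, but the principle is the same: drop the low-quality tail before training.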

    • Carl [he/him]@hexbear.net
      4 months ago

      I have to check that out at some point. Models like Gemini and GPT take up all the space in the room, and it’s easy to forget there are others.

      • piccolo [any]@hexbear.net
        4 months ago

        The other person was talking about image generation models, not LLMs. I think the only LLMs with super curated input sets are tiny and less useful. Unfortunately, LLMs take a lot of data to train, so it’s hard to find enough good-quality data if you’re curating it.