Just putting that out there. While we might have struggle sessions over bullshit, the larger internet zeitgeist is putrid and rancid.

  • m532
    4 months ago

    In diffusion, this has already been done. Most models made after SD1.5 have a “handpicked” input dataset. I guess it’s because most of SD1.5’s input was garbage quality, which transferred over to the output.
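    The “handpicked” curation described above can be sketched roughly like this, assuming each candidate image has already been assigned a quality score (e.g. by an aesthetic predictor — the scores and paths below are hypothetical, and the predictor itself is out of scope):

    ```python
    # Minimal sketch of dataset curation by quality score.
    # Assumes scores already exist; only the filtering step is shown.

    def curate(samples, min_score=5.0):
        """Keep only samples whose quality score clears the threshold."""
        return [path for path, score in samples if score >= min_score]

    # Hypothetical scored samples: (path, aesthetic score on a 0-10 scale).
    candidates = [
        ("img_001.png", 6.8),
        ("img_002.png", 2.1),  # low quality -> dropped
        ("img_003.png", 5.4),
    ]

    kept = curate(candidates)
    print(kept)  # -> ['img_001.png', 'img_003.png']
    ```

    In practice the threshold and the scorer matter far more than the filter itself, but the principle is the same: drop the low-quality tail before training.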

    • Carl [he/him]@hexbear.net
      4 months ago

      I have to check that out at some point. Models like Gemini and GPT take up all the space in the room, and it’s easy to forget there are others.

      • piccolo [any]@hexbear.net
        4 months ago

        The other person was talking about image generation models, not LLMs. I think the only LLMs with super curated input sets are tiny and less useful. Unfortunately, LLMs take a lot of data to train, so it’s hard to find enough good-quality data if you’re curating it.