• dan@upvote.au
      1 year ago

      You should be able to fit a model like Llama 2 in 64GB of RAM, but output will be pretty slow if it’s CPU-only. GPUs are a lot faster, but you’d need at least 48GB of VRAM, for example two 3090s (24GB each).
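
For a rough feel of where figures like 64GB and 48GB come from, here is a back-of-the-envelope sketch. It assumes the largest Llama 2 variant (70B parameters, which the comment does not actually specify) and counts only the weights, ignoring the KV cache, activations, and runtime overhead, so real usage will be higher. The function name `weight_memory_gb` is made up for illustration.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    # Each weight takes bits_per_weight / 8 bytes.
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    n_params = 70e9  # assumption: the 70B variant of Llama 2
    for bits in (16, 8, 4):
        gb = weight_memory_gb(n_params, bits)
        print(f"{bits:>2}-bit weights: ~{gb:.0f} GB")
```

Under these assumptions, only the 4-bit case leaves headroom inside 48GB of VRAM or 64GB of system RAM, which is roughly in line with the numbers quoted above.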

      • PolarisFx@lemmy.dbzer0.com
        1 year ago

        Amazon had a promotion in the summer with a cheap 3060, so I grabbed that, and for Stable Diffusion it was more than enough. So I thought, oh… I’ll try out Llama as well. After 2 days of dicking around trying to load a whack of models, I spent a couple of bucks and spun up a RunPod instance. It was more affordable than I thought, definitely cheaper than buying another video card.

        • dan@upvote.au
          1 year ago

          As far as I know, Stable Diffusion is a far smaller model than Llama. The fact that a model as large as Llama can even run on consumer hardware is a big achievement.

          • PolarisFx@lemmy.dbzer0.com
            1 year ago

            I had a couple of 13B models loaded, and it was OK. But I really wanted a 30B, so I got a RunPod instance. I’m using it as an API; I went with spot pricing and it’s around $0.70/hour.

            I didn’t know what to do with it at first, but when I found SillyTavern I kinda got hooked.
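
To put the $0.70/hour spot price next to the cost of another video card, a quick break-even sketch follows. The card price used here is purely a hypothetical placeholder (no price is given in the thread), and the calculation ignores electricity, storage fees, and spot-price changes. The function name `break_even_hours` is made up for illustration.

```python
def break_even_hours(card_price_usd: float, hourly_rate_usd: float) -> float:
    """Hours of rental that cost as much as buying the card outright."""
    if hourly_rate_usd <= 0:
        raise ValueError("hourly rate must be positive")
    return card_price_usd / hourly_rate_usd


if __name__ == "__main__":
    hourly_rate = 0.70           # spot price quoted in the comment above
    hypothetical_price = 700.00  # assumption: placeholder card price, not from the thread
    hours = break_even_hours(hypothetical_price, hourly_rate)
    print(f"Break-even after about {hours:.0f} hours of rental")
```

For occasional use, renting tends to stay ahead for a long time; swap in a real card price and your own usage to see where the line falls for you.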

    • j4k3@lemmy.world
      1 year ago

      I need it just for the initial load of transformers-based models, so I can then run them in 8-bit. It is ideal for that situation.
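
For anyone who has not done this before, here is a minimal sketch of an 8-bit load with Hugging Face `transformers` and `bitsandbytes`. It is not necessarily the setup described above. It assumes `transformers`, `accelerate`, and `bitsandbytes` are installed and a CUDA GPU is available; the helper name `load_8bit` is made up, and `model_id` is whatever checkpoint you have access to.

```python
def load_8bit(model_id: str):
    """Load a causal LM with its weights quantized to 8-bit at load time."""
    # Imported lazily so this file can be read or imported without the
    # heavy dependencies installed.
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        # Ask bitsandbytes to store the linear-layer weights in 8-bit.
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
        # Let accelerate decide how to place layers on the available devices.
        device_map="auto",
    )
    return tokenizer, model


# Usage (placeholder ID, replace with a real checkpoint):
# tokenizer, model = load_8bit("your-org/your-model")
```

As far as I understand it, the checkpoint still has to be read from disk and passed through system memory before the quantized weights end up on the GPU, which is why spare RAM helps during that initial load.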