A note that this setup runs a 671B model in Q4 quantization at 3-4 TPS, running a Q8 would need something beefier. To run a 671B model in the original Q8 at 6-8 TPS you’d need a dual socket EPYC server motherboard with 768GB of RAM.

  • 小莱卡
    link
    fedilink
    English
    arrow-up
    10
    ·
    5 months ago

    Brb gotta start my FundMe campaign for one of these servers lol

    • FuckBigTech347
      link
      fedilink
      arrow-up
      3
      ·
      5 months ago

      DW. in like two years from now, companies will start throwing out similar machines. Just keep an eye on second-hand markets and dumpsters.