• ☆ Yσɠƚԋσʂ ☆OP
      link
      fedilink
      arrow-up
      12
      ·
      1 year ago

      It’s a way to run models on your local machine and provide an API that’s compatible with OpenAI that can be used by apps that normally rely on that.

      • Kultronx
        link
        fedilink
        arrow-up
        2
        ·
        1 year ago

        what is the advantage of doing something like this? i am a layperson

        • ☆ Yσɠƚԋσʂ ☆OP
          link
          fedilink
          arrow-up
          6
          ·
          1 year ago

          Privacy and ability to generate content you want. Using commercial services like OpenAI means your data is sent to their servers, so anything you query is known to the company, and their models are often restricted in terms of content they will allow you to generate. For example, Google’s Gemini will refuse to deal with many political subjects.

      • JucheStalin
        link
        fedilink
        arrow-up
        2
        ·
        1 year ago

        Hm so it downloads fixed models and works without an internet connection? Interesting.

        • ☆ Yσɠƚԋσʂ ☆OP
          link
          fedilink
          arrow-up
          2
          ·
          1 year ago

          Right, you can download any publicly available model and run it without using the internet. Caveat is that you do need a relatively fast machine to make it performant.

          • FuckBigTech347
            link
            fedilink
            arrow-up
            3
            ·
            1 year ago

            For reference the oldest card I have that Vulkan supports is an RX 560 that I bought in 2017 (I’m on GNU/Linux w/ amdgpu and the RADV mesa driver aka. “The Default”). Most medium models on it run at around 6 - 10 Tokens/s. Some crawl to below 6 Tokens/s though and become slower the longer the answer they output is, probably because parts of the model is in RAM since that card has “only” 4GB of VRAM. Models that fully fit in VRAM are a lot faster.

            • KrasnaiaZvezda
              link
              fedilink
              arrow-up
              1
              ·
              1 year ago

              I can run Qwen 2.5 Coder 14B Q4_k_m on CPU at only a little above 1 t/s but it’s worth it when I just want to have it look at whatever code I have without disclosing it with corporations that don’t have my best interests in mind.