• Chozo@fedia.io · 2 days ago

    It’s pretty easy to run a local LLM. My roommate got real big into generative AI for a little while, and had some GPT and Stable Diffusion models running on his PC. It does require some pretty beefy hardware to run it smoothly; I believe he’s got an RTX 3090 in that system.

    • lemming741@lemmy.world · 2 days ago

      I got my 3090 for $600 when the 40 series came out. It was a good deal at the time, but it looks like they’re $900 on eBay now since all this stuff took off.

    • PlzGivHugs@sh.itjust.works · 2 days ago

      For most of the good LLMs, it’s going to take a high-end computer. For image generation, a more mid-range gaming computer works just fine.
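
      For reference, a minimal Hugging Face diffusers sketch of that kind of setup; the checkpoint ID and prompt are just examples, and at fp16 an SD 1.5-class model needs roughly 4–6 GB of VRAM:

      ```python
      import torch
      from diffusers import StableDiffusionPipeline

      # Load an SD 1.5-class checkpoint in half precision; at fp16 it fits in
      # roughly 4-6 GB of VRAM, i.e. a mid-range gaming card.
      pipe = StableDiffusionPipeline.from_pretrained(
          "stable-diffusion-v1-5/stable-diffusion-v1-5",  # example checkpoint
          torch_dtype=torch.float16,
      )
      pipe = pipe.to("cuda")

      # Trades a bit of speed for lower peak VRAM on smaller cards.
      pipe.enable_attention_slicing()

      image = pipe("a watercolor fox in the snow").images[0]
      image.save("fox.png")
      ```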

      • KoalaUnknown@lemmy.world · 2 days ago (edited)

        I run models at 10-20B parameters pretty easily on my M1 Pro MacBook. You can get good response times for decent models on a $500 M4 Mac Mini. A $4000 Nvidia GPU isn’t necessary.
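
        If anyone wants to try it, here’s a rough llama-cpp-python sketch of that setup; the GGUF file name is a placeholder, and a ~13B model at 4-bit quantisation is about 8 GB on disk, so it fits in 16 GB of unified memory:

        ```python
        from llama_cpp import Llama

        # Placeholder file name; any quantized GGUF in the 10-20B range works.
        # A 13B model at Q4_K_M is roughly 8 GB, so it sits comfortably in the
        # unified memory of an M1 Pro or M4.
        llm = Llama(
            model_path="./models/13b-instruct.Q4_K_M.gguf",
            n_gpu_layers=-1,  # offload every layer to Metal on Apple Silicon
            n_ctx=4096,
        )

        out = llm("Q: Why is the sky blue? A:", max_tokens=128)
        print(out["choices"][0]["text"])
        ```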

        • Septimaeus@infosec.pub · 2 days ago

          This is correct. The popular misconception may arise from the marked difference between model use and model development: inference is far less demanding than training in terms of both time and energy.

          And you can still train on most consumer GPUs, but for really deep networks like LLMs, well, get ready to wait.
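
          A toy PyTorch sketch of where that gap comes from (the layer sizes are made up; the point is just what each mode has to keep around):

          ```python
          import torch
          from torch import nn

          # Toy stand-in for a transformer block; sizes are illustrative only.
          model = nn.Sequential(nn.Linear(4096, 4096), nn.GELU(), nn.Linear(4096, 4096))
          x = torch.randn(8, 4096)

          # Inference: no gradients, no optimizer state, and activations can be
          # freed as soon as the next layer has consumed them.
          with torch.no_grad():
              y = model(x)

          # Training: every activation is kept for the backward pass, a gradient
          # is materialized for every parameter, and Adam adds two more tensors
          # per parameter on top of that.
          opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
          loss = model(x).square().mean()
          loss.backward()
          opt.step()
          ```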

        • PlzGivHugs@sh.itjust.works · 2 days ago

          Really? When I was trying to get it to run a little while ago, I kept running out of memory with my 3060 12GB running 20B models, but perhaps I had it configured wrong.

          • Arkthos@pawb.social · 2 days ago

            You can offload them into RAM. The response time gets way slower once that happens, but you can do it. I’ve run a 70B Llama model on my 3060 12GB at 2-bit quantisation (I do have plenty of RAM, so no offloading from RAM to disk at least, lmao). It took like 6–7 minutes to generate replies, but it did work.
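
            If you’re using llama.cpp or its Python bindings (which may not be what you had set up), the split is controlled by n_gpu_layers; the file name and layer count here are placeholders, so lower the number until the 12 GB card stops running out of memory:

            ```python
            from llama_cpp import Llama

            # Placeholder file name. The idea: push as many layers as fit onto
            # the GPU and leave the rest in system RAM; whatever stays on the
            # CPU side is what makes generation slow.
            llm = Llama(
                model_path="./models/20b-chat.Q4_K_M.gguf",
                n_gpu_layers=30,  # lower this if the card still runs out of VRAM
                n_ctx=2048,
            )

            print(llm("Hello!", max_tokens=64)["choices"][0]["text"])
            ```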