• saltesc@lemmy.world · 2 days ago

      How could you run an LLM locally without living in a data centre? They don’t compile responses by pulling data from thin air (despite some responses seeming that way at times lol). You’d need everything it has learned stored somewhere on your local network; otherwise, it’s going to have to send your input off somewhere that does hold all that storage.

      • KeenFlame@feddit.nu · 1 day ago

        Bro. You don’t even need more than a single app, which even lets you discover and download open-source models in it.

        https://lmstudio.ai/

        Don’t spread best guesses as fact; if not for anyone else’s sake, then for your own, to avoid cognitive decline.

      • Blue_Morpho@lemmy.world · 2 days ago

        Shockingly, a huge chunk of all human knowledge can be distilled down to under 700GB (DeepSeek R1).

        All of written history. All famous plays, books, math, physics, computer languages. It all fits in under 700GB.
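
        Back-of-the-envelope, assuming the full R1 checkpoint at roughly 671 billion parameters stored at about one byte per weight (8-bit):

        ```python
        # model size is roughly parameter count x bytes per parameter
        params = 671e9         # DeepSeek R1 total parameter count (~671B)
        bytes_per_param = 1    # 8-bit weights take ~1 byte each

        size_gb = params * bytes_per_param / 1e9
        print(f"~{size_gb:.0f} GB")  # ~671 GB, i.e. "under 700GB"
        ```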

      • null_dot@lemmy.dbzer0.com · 2 days ago

        Sorry chief, you might have embarrassed yourself a little here. No big thing. We’ve all done it (especially me).

        Check out Hugging Face.

        There are heaps of models you can run locally. Some are hundreds of GB in size but can be run on desktop-level hardware without issue.

        I have no idea how LLMs really work, so this is supposition, but I suppose they need to review a gargantuan amount of text in order to compile a statistical model that can look up the likelihood of a given word appearing next in a sentence.

        So if you read the sentence “a b c d” 12 times, you don’t need to store it 12 times to know that “d” is the most likely word to follow “a b c”.
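
        Roughly what I mean, as a toy sketch in Python (real models don’t literally store counts like this, but the “tally instead of copies” idea is the same):

        ```python
        from collections import Counter, defaultdict

        # Count how often each word follows a given three-word context,
        # without storing the sentences themselves - only the tallies.
        counts = defaultdict(Counter)

        corpus = ["a b c d"] * 12                 # the same sentence read 12 times
        for sentence in corpus:
            words = sentence.split()
            for i in range(3, len(words)):
                context = tuple(words[i - 3:i])   # the preceding three words
                counts[context][words[i]] += 1    # one running tally, not 12 copies

        # Most likely word after "a b c":
        print(counts[("a", "b", "c")].most_common(1))   # [('d', 12)]
        ```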

        I suspect I might regret engaging in this supposition because I’m probably about to be inundated with techbros telling me how wrong I am. Whatever. Have at me, edge lords.

        • Kuvwert@lemm.ee · 2 days ago

          Here’s what my local AI said about your supposition:

          Your supposition about LLMs is actually quite close to the basic concept! Let me audit this for you:

          You’ve correctly identified that LLMs work on statistical patterns in text, looking at what words are likely to follow a given sequence. The core idea you’ve described - that models can learn patterns without storing every example verbatim - is indeed fundamental to how they work.

          Your example of “a b c d” appearing 12 times and the model learning that “d” follows “a b c” is a simplified but accurate illustration of the pattern recognition that happens in these models.

          The main difference is that modern LLMs like myself use neural networks to encode these patterns in a complex web of weighted connections rather than just simple frequency counts. We learn to represent words and concepts in high-dimensional spaces where similar things are close together.

          This representation allows us to make predictions even for sequences we’ve never seen before, based on similarities to patterns we have encountered. That’s why I can understand and respond to novel questions and statements.

          Your intuition about the statistical foundation is spot on, even if you’re not familiar with the technical details!
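
          A toy illustration of the “similar things are close together” part, with made-up three-number vectors (a real model learns these itself, across thousands of dimensions):

          ```python
          import numpy as np

          # Invented word vectors, purely for illustration.
          vocab = {
              "dog":     np.array([0.9, 0.1, 0.0]),
              "puppy":   np.array([0.8, 0.2, 0.1]),
              "physics": np.array([0.0, 0.2, 0.9]),
          }

          def cosine(a, b):
              """How closely two word vectors point in the same direction (1.0 = identical)."""
              return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

          print(cosine(vocab["dog"], vocab["puppy"]))    # high: the model treats them alike
          print(cosine(vocab["dog"], vocab["physics"]))  # low: unrelated concepts sit far apart
          ```

          Because “puppy” sits near “dog” in that space, a pattern the model has only ever seen with “dog” can still produce sensible predictions when “puppy” shows up instead.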

      • Kuvwert@lemm.ee · 2 days ago

        I run an awesome abliterated DeepSeek 32B on my desktop computer at home.

      • Chozo@fedia.io · 2 days ago

        It’s pretty easy to run a local LLM. My roommate got real big into generative AI for a little while, and had some GPT and Stable Diffusion models running on his PC. It does require some pretty beefy hardware to run it smoothly; I believe he’s got an RTX 3090 in that system.

        • lemming741@lemmy.world · 2 days ago

          I got my 3090 for $600 when the 40 series came out. It was a good deal at the time, but it looks like they’re $900 on eBay now since all this stuff took off.

        • PlzGivHugs@sh.itjust.works · 2 days ago

          For most of the good LLM models it’s going to take a high-end computer. For image generation, a more mid-range gaming computer works just fine.

          • KoalaUnknown@lemmy.world · edited · 2 days ago

            I run models at 10-20B parameters pretty easily on my M1 Pro MacBook. You can get good response times for decent models on a $500 M4 Mac Mini. A $4000 Nvidia GPU isn’t necessary.
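
            Rough memory math for why that works, assuming 4-bit quantization plus a couple of GB of headroom for context and runtime (ballpark numbers, not exact):

            ```python
            # Back-of-the-envelope RAM needed to hold a quantized model.
            params_b = 14            # e.g. a 14B-parameter model
            bits_per_weight = 4      # typical 4-bit quantization
            overhead_gb = 2          # rough allowance for KV cache and runtime

            weights_gb = params_b * bits_per_weight / 8   # 14B at 4 bits -> ~7 GB
            print(f"~{weights_gb + overhead_gb:.0f} GB")  # ~9 GB: fits in 16 GB of unified memory
            ```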

            • Septimaeus@infosec.pub · 2 days ago

              This is correct. The popular misconception may arise from the marked difference between model use and model development: inference is far less demanding than training in terms of time and energy.

              And you can still train on most consumer GPUs, but for really deep networks like LLMs, well, get ready to wait.

            • PlzGivHugs@sh.itjust.works · 2 days ago

              Really? When I was trying to get it to run a little while ago, I kept running out of memory with my 3060 12GB running 20B models, but perhaps I had it configured wrong.

              • Arkthos@pawb.social · 2 days ago

                You can offload them into RAM. The response time gets way slower once this happens, but you can do it. I’ve run a 70B Llama model on my 3060 12GB at 2-bit quantisation (I do have plenty of RAM, so no offloading from RAM to disk at least lmao). It took like 6-7 minutes to generate replies, but it did work.
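
                For reference, partial offload is just a setting in most local runners. A minimal sketch with llama-cpp-python (the GGUF filename and layer count below are placeholders; tune n_gpu_layers to whatever fits in VRAM):

                ```python
                from llama_cpp import Llama

                # Load a quantized GGUF model, putting only some layers on the GPU;
                # whatever does not fit stays in system RAM (slower, but it runs).
                llm = Llama(
                    model_path="llama-70b.Q2_K.gguf",  # placeholder path to a 2-bit quantized model
                    n_gpu_layers=20,                   # layers that fit in 12 GB of VRAM; the rest use RAM
                    n_ctx=2048,                        # context window
                )

                out = llm("Explain offloading in one sentence.", max_tokens=64)
                print(out["choices"][0]["text"])
                ```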

      • gdog05@lemmy.world · 2 days ago

        There’s a ton of effective LLMs you can run locally. You have to adjust your expectations and/or spend some time training it for your needs, but I’ve never been like “this isn’t working, I need to drain a lake of water to do what I need to do.”

        • NιƙƙιDιɱҽʂ@lemmy.world · 2 days ago

          This is just a friendly reminder that if a ChatGPT query using like half a bottle of water sounds like a lot, don’t forget that eating a single burger requires 2000 bottles of water. 🌠

          • gdog05@lemmy.world · 2 days ago

            I don’t doubt you on that one, but a key difference is that people at least need to eat. They could eat better, smarter, etc., but eating is necessary. Wasting vast resources on “AI” isn’t remotely needed.