• RussianBot8453@lemmy.world
    19 days ago

    I’m a data engineer who processes 2-billion-row, 3,000-column datasets every day, and I open shit in Excel with more than 60k rows. What the hell is this chick talking about?

    • person420@lemmynsfw.com
      19 days ago

      Some interesting facts about excel I learned the hard way.

      1. It only supports about a million rows (1,048,576 per worksheet).
      2. It completely screws up numbers if the column is a number and the number is over 15 digits long: it only keeps 15 significant digits and turns the rest into zeros.

      Not really related to what you said, but I’m still sore about the bad data import that caused me days of work to clean up.
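
The 15-digit problem in point 2 can be sketched in a few lines of Python. This is an illustration, not how Excel is implemented: `excel_like_truncate` is a hypothetical helper that mimics keeping only the first 15 significant digits of a whole number, and the float comparison shows the related limit of 64-bit floating point, which cannot tell apart some neighbouring integers above 2**53.

```python
def excel_like_truncate(number_text: str, significant_digits: int = 15) -> str:
    """Hypothetical helper: keep the first 15 significant digits, zero the rest."""
    digits = number_text.lstrip("0") or "0"
    if len(digits) <= significant_digits:
        return digits
    kept = digits[:significant_digits]
    return kept + "0" * (len(digits) - significant_digits)


# A 19-digit identifier loses its last four digits when treated as a number.
original_id = "1234567890123456789"
damaged_id = excel_like_truncate(original_id)

# 64-bit floats have the same kind of ceiling: these two integers collide.
floats_collide = float(2**53) == float(2**53 + 1)

# Keeping long identifiers as text avoids the problem entirely.
safe_id = str(original_id)
```

A common workaround, assuming the long numbers are really identifiers rather than quantities, is to import that column as text so no numeric conversion ever happens.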

      • Mniot@programming.dev
        18 days ago

        The row limitation seems, to me, like an actually-good thing. Excel is for data where you might conceivably scroll up and down looking at it and 1M is definitely beyond the ability of a human even to just skim looking for something different.

        An older version of Excel could only handle 64k rows and I had a client who wanted large amounts of data in Excel format. “Oh sorry, it’s a Microsoft limitation,” I was thrilled to say. “I have no choice but to give you a useful summarization of the data instead of 800k rows (each 1000 columns wide) of raw data.”

  • zalgotext@sh.itjust.works
    edit-2
    18 days ago

    my hard drive overheated

    So, this means they either have a local copy on disk of whatever database they’re querying, or they’re dumping a remote db to disk at some point before/during/after their query, right?

    Either way, I have just one question - why?

    Edit: found a more in-depth explanation elsewhere in the thread: https://xcancel.com/DataRepublican/status/1900593377370087648#m

    So yeah, she’s apparently toting around an external hard drive with a copy of the “multiple terabytes” large US spending database, running queries against it, then dumping the 60k-row result set to CSV for further processing.

    I’m still confused about at what point the external drive overheats, even if she is doing all this in a “hot humid” hotel room where she can’t run any fans, I guess because her kids were asleep?

    But like, all of that just adds more questions, and doesn’t really answer the first one - why?

  • Tiefling IRL@lemmy.blahaj.zone
    19 days ago

    60k isn’t that much, I frequently run scripts against multiple hundreds of thousands at work. Wtf is he doing? Did he duplicate the government database onto his 2015 MacBook Air?

  • rumba@lemmy.zip
    18 days ago

    Unless I’m misreading it, which is possible since it’s awfully late, he said he processed 60,000 rows and didn’t find what he was looking for, but his hard drive overheated on the full pass.

    Disks don’t overheat because there was load. Even if he f***** up and didn’t index the data correctly (I assume it’s a relational database since he’s talking about rows), the disk isn’t just going to overheat because the job is big. It’s going to be lack of airflow or lack of a heatsink.

    I guarantee you he was running on an external NVMe in one of those little shitty-ass Chinese enclosures. Or maybe one of those self-immolating SanDisk enclosures. Hell, maybe he’s on a desktop and he slapped a raw NVMe on his motherboard without a heatsink.

    There are times when you want a brilliant college student on your team, but you need seasoned professionals to help them through the things they’ve never seen before and never done before.

  • MystikIncarnate@lemmy.ca
    edit-2
    17 days ago

    IT guy checking in.

    The only time I’ve even seen drive temp sensor alarms is on server RAID arrays and other similar hard drives/SSDs… Never in my life have I seen one available on a consumer device, nor have I seen any drive temp alarm go off. It just doesn’t happen.

    IMO, this is one of those language barriers where people call their computer chassis (and everything in it) the “hard drive”.

    Applying that assumption, their updated statement is: His computer overheated.

    Idk what kind of shit system he’s running on that 60k rows would cause overheating, but ok.

  • GreenKnight23@lemmy.world
    18 days ago

    I smell something, but it’s not overheating electronics.

    I’ve processed over 5 million records on a laptop that’s almost 10 years old. it took two days to get my results.

    there’s no way 60,000 records overheated ANYTHING.

    • wewbull@feddit.uk
      18 days ago

      Doesn’t actually say that 60k overheated his drive. He says that he did a run on 60k, and that he couldn’t do the whole database due to overheating. Two unrelated statements, except that 60k is the lower bound for what he could process.

      Doesn’t mean he knows what he’s doing though, as pretty huge datasets are processable on quite modest hardware if you do it right.
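
One way to "do it right" on modest hardware is to stream the data instead of loading it all at once. The sketch below is only an illustration under assumed names: the `agency` and `amount` columns and the `total_by_key` helper are hypothetical, and the in-memory sample stands in for a large file opened with `open(...)`.

```python
import csv
import io
from collections import defaultdict


def total_by_key(lines, key_column, value_column):
    """Read rows one at a time and keep only the running totals in memory."""
    totals = defaultdict(float)
    for row in csv.DictReader(lines):
        totals[row[key_column]] += float(row[value_column])
    return dict(totals)


# Tiny stand-in for a huge CSV; a real file object works the same way.
sample = io.StringIO("agency,amount\nA,100\nB,75\nA,250\n")
totals = total_by_key(sample, "agency", "amount")
```

Memory use here depends on the number of distinct keys, not on the number of rows, which is why this pattern scales to files far larger than RAM.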

      • GreenKnight23@lemmy.world
        18 days ago

        that’s somehow worse.

        a “data analyst” couldn’t cut up the work into parallel processes and run them concurrently? what the actual fuck?

        “sorry, I can only do 60k at a time.”

        just fucking split them up into 6 parallel batch processes running 10k at a time. it’s fucking math, not rocket science. I’m not even an analyst and I could fucking do that much.
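
For what it’s worth, the 6 x 10k split described above can be sketched roughly like this. It is a minimal illustration, not anyone’s actual pipeline: the row shape and `summarize_batch` are made up, and it uses a thread pool from the standard library to keep the example simple. For CPU-bound work, `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface and runs real separate processes.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_batches(rows, batch_size):
    """Cut a list of rows into consecutive batches of at most batch_size."""
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]


def summarize_batch(batch):
    """Hypothetical per-batch work: sum the 'amount' field."""
    return sum(row["amount"] for row in batch)


def process_in_parallel(rows, batch_size=10_000, workers=6):
    """Run summarize_batch over every batch concurrently and combine the results."""
    batches = split_into_batches(rows, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(summarize_batch, batches))
    return sum(partial_sums)


# 60,000 made-up rows, split into 6 batches of 10,000.
rows = [{"id": i, "amount": i % 7} for i in range(60_000)]
batches = split_into_batches(rows, 10_000)
total = process_in_parallel(rows)
```

The combine step only works this simply because a sum can be computed per batch and then added up; other aggregations may need a different merge.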

        • sniggleboots@lemm.ee
          18 days ago

          I don’t want to take away from the valid point of your comment, but rocket science is almost exclusively math

  • Psaldorn@lemmy.world
    19 days ago

    From the same group that doesn’t understand joins and thinks nobody uses SQL, this is hardly surprising.

    Probably got an LLM running locally and is asking it to get data, which is then running 10-level-deep subqueries to achieve what 2 inner joins would in a fraction of the time.
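
To make the joins-versus-subqueries point concrete, here is a small sketch using Python’s built-in `sqlite3` module. The `agencies`, `vendors`, and `payments` tables are invented for illustration and have nothing to do with the real spending database. Both queries return the same rows; the first says it with two inner joins, the second with a subquery per column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agencies (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE vendors  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE payments (id INTEGER PRIMARY KEY, agency_id INTEGER,
                       vendor_id INTEGER, amount REAL);
INSERT INTO agencies VALUES (1, 'Agency A'), (2, 'Agency B');
INSERT INTO vendors  VALUES (1, 'Vendor X'), (2, 'Vendor Y');
INSERT INTO payments VALUES (1, 1, 1, 100.0), (2, 1, 2, 250.0), (3, 2, 1, 75.0);
""")

# Two inner joins: one pass over payments, each lookup stated once.
join_query = """
SELECT a.name, v.name, p.amount
FROM payments AS p
INNER JOIN agencies AS a ON a.id = p.agency_id
INNER JOIN vendors  AS v ON v.id = p.vendor_id
ORDER BY p.id
"""

# Same result written with correlated subqueries, one per looked-up column.
subquery_query = """
SELECT (SELECT name FROM agencies WHERE id = p.agency_id),
       (SELECT name FROM vendors  WHERE id = p.vendor_id),
       p.amount
FROM payments AS p
ORDER BY p.id
"""

join_rows = conn.execute(join_query).fetchall()
subquery_rows = conn.execute(subquery_query).fetchall()
```

How much slower the nested form is depends on the database engine and its optimizer, so the sketch makes no timing claim; the point is that the join version is shorter and easier to reason about.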

  • ryedaft@sh.itjust.works
    19 days ago

    This sounds like trying to do stuff in Excel? The computer isn’t overheating but the amount of memory needed is very high which would make it run poorly. They might interpret that as overheating?

    • monkeyman512@lemmy.world
      19 days ago

      It also makes sense if they are calling the entire computer “the hard drive” like grandma and the fans kicked on.

  • LillyPip@lemmy.ca
    18 days ago

    This cannot be real, wtf. This is cartoon levels of ineptitude.

    Or sabotage by someone heading out? Please let this be resistance sabotage they haven’t noticed yet.

    • explodicle@sh.itjust.works
      19 days ago

      Literally every time someone dismisses Wikipedia, it’s because they believe something crazy that Wikipedia told them is wrong.

      • irelephant [he/him]🍭@lemm.ee
        19 days ago

        I checked Conservapedia once, and it’s actually unhinged. If someone tells you to look at that, or recommends it, they’re crazy.

  • Onno (VK6FLAB)@lemmy.radio
    19 days ago

    Wow.

    I’ve been processing a couple of billion rows of data on my machine, the fans didn’t even come on. WTF are they teaching “experts” these days, or has Elmo only hired people who claim that they can “wrangle data” and say “yes”?

    • wise_pancake@lemmy.ca
      18 days ago

      60k rows is generally very usable with even wide tables in row formats.

      I’ve had pandas work with 1M plus rows with 100 columns in memory just fine.

      After 1M rows, move on to something better like Dask, Polars, Spark, or literally any DB.

      The first thing I’d do with whatever data they’re running into issues with is rewrite it as partitioned and sorted parquet.
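
The partition-and-sort idea can be sketched without any third-party libraries. Note the swap: writing real Parquet needs a library such as pyarrow, so this example uses plain CSV files from the standard library purely to show the layout (one directory per partition value, rows sorted inside each file). The `year`, `agency`, and `amount` columns and both helper functions are hypothetical.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def write_partitioned_sorted(rows, out_dir, partition_key, sort_key):
    """Write one CSV per partition value, with rows sorted inside each file."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_key]].append(row)
    written = {}
    for value, group in groups.items():
        group.sort(key=lambda r: r[sort_key])
        part_dir = Path(out_dir) / f"{partition_key}={value}"
        part_dir.mkdir(parents=True, exist_ok=True)
        path = part_dir / "part-0.csv"
        with path.open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(group[0].keys()))
            writer.writeheader()
            writer.writerows(group)
        written[value] = path
    return written


def read_partition(out_dir, partition_key, value):
    """Read back a single partition without touching the others."""
    path = Path(out_dir) / f"{partition_key}={value}" / "part-0.csv"
    with path.open(newline="") as handle:
        return list(csv.DictReader(handle))


rows = [
    {"year": 2023, "agency": "B", "amount": 75},
    {"year": 2024, "agency": "A", "amount": 100},
    {"year": 2023, "agency": "A", "amount": 250},
]
out_dir = tempfile.mkdtemp()
paths = write_partitioned_sorted(rows, out_dir, "year", "agency")
rows_2023 = read_partition(out_dir, "year", 2023)
```

A query for one year then only opens that year’s directory. This sketch groups everything in memory first, so a dataset that does not fit in RAM would need the grouping done in chunks or by a tool built for it.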

    • bleistift2@sopuli.xyz
      19 days ago

      Even if querying data was processing-heavy and even if somehow the ‘hard drive’ got warm during this, then there still would need to be a hardware defect in order for the drive to overheat.

      • IrateAnteater@sh.itjust.works
        19 days ago

        Yes, but this may be a symptom of an issue I’ve been seeing with younger programmers; they’ve siloed themselves so specifically into whatever programming they “specialize” in that they become absolutely useless at dealing with absolutely anything else related to their job. And exacerbating this issue is the fact that they’ve grown up with systems that “just work”. Windows, iOS, and Android are all at the point where fucking around with hardware issues is very uncommon for the average person.

        Asking this guy to solve a hardware problem is like asking him to tune a carburetor. He likely has not the slightest clue how to start.

        • bleistift2@sopuli.xyz
          19 days ago

          That’s the price of specialization. Don’t ask a software engineer to troubleshoot hardware. Don’t ask a backend dev to write a frontend. Don’t ask a proctologist to look at your cough.

          You simply cannot be proficient at every sub-sub-specialty. That’s why we collaborate and hand the ‘my computer gets hot’ problems to the hardware people. The alternative would be only moderately useful generalists.

          • IrateAnteater@sh.itjust.works
            19 days ago

            I’m not asking everyone to be able to become a hardware specialist, but if you can’t even figure out “my computer gets hot” I’m not going to be able to trust anything you do. Identifying a heat issue does not take a rocket surgeon.

        • Snot Flickerman@lemmy.blahaj.zone
          edit-2
          19 days ago

          In my experience, a lot of software dev degree paths basically don’t even have relevant classes on hardware at all. Classes on hardware are all in IT Helpdesk and Network Admin degree paths whereas the software dev students are dumped straight into Visual Studio right off the bat with no relevant understanding of the underlying hardware or OS.

          • atomicbocks@sh.itjust.works
            19 days ago

            My experience does not reflect yours. Computer Architecture, Discrete Math (logic gate math), and Operating System Concepts were all required classes in my CS degree from just a few years ago.

          • kryptonianCodeMonkey@lemmy.world
            edit-2
            19 days ago

            My CS degree had a hardware/IT support class, but A) it was entirely simulation based. We never touched any actual hardware. We “built” PCs or identified physical issues in 3D sim software, set up RAID arrays in software, etc. B) it was super hand-holdy and you only ever went over a problem once, so nothing from the class has stuck. I know much more from having built, troubleshot, and maintained my own computers and network than I ever learned from that class, and then learned more by doing in an actual IT support position before becoming an engineer.

            • applebusch@lemmy.blahaj.zone
              18 days ago

              I mean to be fair the sheer amount of material most university engineering programs require these days makes spending significant time on specific problems almost impossible. They try to shove so much theory into your head they lose track of practical implementation. Basically everyone I went to school with complained about the lack of practical application relative to theory, and I studied mechanical engineering which is theoretically and literally chiefly concerned with hardware.

          • bleistift2@sopuli.xyz
            19 days ago

            You don’t teach a farmer how an internal combustion engine works. Computers are tools to software engineers. What they need to know is how to operate them, not how to maintain them.

            • hayalci@fstab.sh
              19 days ago

              No, not really. Programming requires understanding of the underlying hardware, at least to a certain extent. Otherwise performance issues will look like dark magic and optimizing anything would be impossible.

              Where do you start debugging if something goes wrong with the software and your information level is this low? Do you look at network stats? CPU utilization? Paging/swapping? Is the hard disk bandwidth the bottleneck? Without at least some passable understanding of computer architecture, people like this just throw up their hands, or throw whatever tricks they know at the wall and see what sticks.

            • bane_killgrind@slrpnk.net
              19 days ago

              What the fuck

              How is he going to fix his tractor? Wait days for John Deere to send somebody? Let the crop rot on the vine?

            • chickenf622@sh.itjust.works
              19 days ago

              A lot of farmers are learning how they work ’cause the companies that sell them the equipment keep fucking them over. I would argue that farmers nowadays need to know how that works, along with basic programming, to get past the anti-consumer bullshit companies put in to make it nigh impossible to fix things yourself.

              • KillingTimeItself@lemmy.dbzer0.com
                18 days ago

                doesnt matter if you know how to program, john deere is just going to put some autistic encryption and ID locking on their shit, what needs to happen is for john deere to stop fucking doing this.

                Most tractors are walking computers anyway, farmers are genuinely the most multi talented people you will ever meet in your life.

            • KnitWit@lemmy.world
              19 days ago

              I’m not sure how well that analogy holds up. Farmers are usually pretty well versed in mechanical systems, to the point that, now that John Deere has been screwing them over on right to repair, some farmers are even becoming versed in computer programming so they can flash the firmware on their tractors.

              • bleistift2@sopuli.xyz
                19 days ago

                I never said that it was impossible for a farmer to learn things outside their immediate field. Just like computer programmers often have knowledge of hardware and the general technology stack.

                My point, to make it explicit to a few of the illiterates who’ve replied to my comment so far, is that it is not necessary to teach a web developer how a goddamn CPU works. They can gain nothing from that knowledge because there are at least 3 levels of abstraction between JavaScript and assembly.

                • KillingTimeItself@lemmy.dbzer0.com
                  edit-2
                  18 days ago

                  no but a web dev should have some knowledge basis on what the ever living fuck their AIDs code fuelled by nothing but the cheapest source of caffeine and brain damage they have even does.

                  This is the entire reason why half of the internet is just broken, stupid developers who don’t know how anything works, but know how to code, making dogshit implementations of anything and everything they can get their hands on.

                  It doesn’t matter that the learning is segmented, you should STILL be learning about computer hardware and its architectural choices, it’s literally the reason why programming languages work the way that they do.

                • KnitWit@lemmy.world
                  edit-2
                  19 days ago

                  And my point is that the example you used does not make the point you are trying to make, but rather the opposite. I get what you’re saying, it just doesn’t apply to farmers and mechanics.

                • bane_killgrind@slrpnk.net
                  19 days ago

                  Operating your tools and being able to maintain and repair your tools are the unequivocally essential skills for everyone in every single industry.

                  If you can’t, you are not a professional.

                  The concepts of machine logic, registers/lookups/etc are essential for every programmer. If you don’t have a clear idea about how the simplest CPU functions, you don’t have any basis of understanding the abstractions in front of you, scripting in JS. Not a professional.

                • KnitWit@lemmy.world
                  19 days ago

                  No, but if a farmer’s tractor is overheating (as in the hard drive comparison), I’m sure they could diagnose it.

            • kryptonianCodeMonkey@lemmy.world
              19 days ago

              Horseshit. Computers aren’t tools for a software engineer. Computers are tools to an administrator, an accountant. Computers are the sandbox you are building castles in as a software engineer. If you don’t understand the system upon which you build, its abilities and features, its limitations, its dependencies, you are going to make some stupid mistakes.

              You need to understand discrete mathematics as a consequence of computer computation. You need to understand parallel processing and threading for multi-core processors. You need to understand networking, package management, security vulnerabilities, etc. from different architectures and protocols. And it ALWAYS helps to understand the very basics of a computer’s functioning, from hardware, CPU architecture, machine code, assembly/low level programming, memory management, etc.

              print('Hello, World!') is day one shit for a reason. Programming language and logic is the basics. The real expertise comes from your 3rd and 4th year materials. Databases, architecture, theory of computation, discrete mathematics, networking, operating systems, compilers, etc.

              • KillingTimeItself@lemmy.dbzer0.com
                18 days ago

                computers are a tool to anybody who uses them?

                If you’re using a tool, it goes without saying, you should probably have, at the very least, a cursory understanding of its function. Lest you injure yourself gravely.

    • rtxn@lemmy.world
      19 days ago

      I’ve read a story on the forbidden website where a “database” was a single table with a single column holding a single row that contained the actual data as a CSV blob. I’m willing to bet the muskies are not beyond such acts of genius.

      • Jo Miran@lemmy.ml
        19 days ago

        There is nothing wrong with being 19-25. There’s something wrong with being wholly incompetent.

      • Kane@femboys.biz
        19 days ago

        Hey! That’s offensive to 19-25 year olds, there are many who just finished college/university and are more than aware.

        They’re just role playing like in movies, with no idea of the consequences.

          • T156@lemmy.world
            18 days ago

            If they went into uni straight out of high school, they could. A lot of Bachelor holders would be around that age, since they start at 18.

    • Jax@sh.itjust.works
      19 days ago

      You have to understand that the average Trump voter probably knows everything they know about computers from watching the ‘wacky-zany hacker with personality issues/quirks’ “hack” into things by tippity tapping their fingies on a keyboard in your average copaganda performance.

      This is something those types of people will believe.

      • Sanctus@lemmy.world
        19 days ago

        You’re on the mark. I’m like Help Desk Level 2, I wouldn’t even consider myself an actual wizard. The average person in my office thinks I’m Gandalf. It’s scary how much these people don’t know. And each one of them is out there on the internet.