- cross-posted to:
- programmerhumor@lemmy.ml
The right one is after 10+ hours of debugging.
It’s WYSIWYG all over again…
I personally find copilot is very good at rigging up test scripts based on usings and a comment or two. Babysit it closely and tune the first few tests and then it can bang out a full unit test suite for your class which allows me to focus on creative work rather than toil.
It can come up with some total shit in the actual meat and potatoes of the code, but boilerplate stuff like tests it seems pretty spot on.
I believe that, because test scripts tend to involve a lot of very repetitive code, and it’s normally pretty easy to read that code.
Still, I would bet that out of 1000 tests it writes, at least 1 will introduce a subtle logic bug.
Imagine you hired an intern for the summer and asked them to write 1000 tests for your software. The intern doesn’t know the programming language you use, doesn’t understand the project, but is really, really good at Googling stuff. They search online for tests matching what you need, copy what they find and paste it into their editor. They may not understand the programming language you use, but they’ve read the style guide back to front. They make sure their code builds and runs without errors. They are meticulous when it comes to copying over the comments from the tests they find and they make sure the tests are named in a consistent way. Eventually you receive a CL with 1000 tests. You’d like to thank the intern and ask them a few questions, but they’ve already gone back to school without leaving any contact info.
Do you have 1000 reliable tests?
Offtopic: But when I was a kid, I was obsessed with the complex subway rail system in NYC. I kept trying to draw and map it out.
OpenTTD is a good game.
When did you get diagnosed?
He’s got that ol’ New York City Metropolitan Area Transit Authority Blues again, momma!
The key is identifying how to use these tools and when.
Local models like Qwen are a good example of how these can be used, privately, to automate a bunch of repetitive non-deterministic tasks. However, they can spit out some crap when used mindlessly.
They are great for sketching out software ideas though, i.e. try 20 prompts for 4 versions, get some ideas and then move over to implementation.
God, seriously. Recently I was iterating with copilot for like 15 minutes before I realized that its complicated code changes could be reduced to an if statement.
AI can’t imagine an image of a full glass of wine because there are barely any images of that in any dataset out there. AI can’t think, just massage its dataset into something vaguely plausible.
I don’t understand how build times magically decrease with AI. Or did they mean built?
They mean time to write the code, not compile time. Let’s be honest, the AI will write it in Python or Javascript anyway
Not to be that guy, but the image with all the train tracks might just be doing its job perfectly.
That’s the problem. Maybe it is.
Maybe the code the AI wrote works perfectly. Maybe it just looks like how perfectly working code is supposed to look, but doesn’t actually do what it’s supposed to do.
To get to the train tracks on the right, you would normally have dozens of engineers working over probably decades, learning how the old system worked and adding to it. If you’re a new engineer and you have to work on it, you might be able to talk to the people who worked on it before you and find out how their design was supposed to work. There may be notes or designs generated as they worked on it. And so-on.
It might take you months to fully understand the system, but whenever there’s something confusing you can find someone and ask questions like “Where did you…?” and “How does it…?” and “When does this…?”
Now, imagine you work at a railroad and show up to work one day and there’s this whole mess in front of you that was laid down overnight by some magic railroad-laying machine. Along with a certificate the machine printed that says that the design works. You can’t ask the machine any questions about what it did. Or, maybe you can ask questions, but those questions are pretty useless because the machine isn’t designed to remember what it did (although it might lie to you and claim that it remembers what it did).
So, what do you do, just start running trains through those tracks, assured that the machine probably got things right? Or, do you start trying to understand every possible path through those tracks from first principles?
Engineers love moving parts, known for their reliability and vigor
Vigor killed me
“Might” is the important word here
It gives you the right picture when you asked for a single straight track in the prompt. Now you have to spend 10 hours debugging code and fixing hallucinations of functions that don’t exist on libraries it doesn’t even need to import.
Not a developer. I just wonder how AI hallucinations come about. Is it the ‘need’ to complete the task requested at the cost of being wrong?
No, it’s just that it doesn’t know if it’s right or wrong.
How “AI” learns is they go through a text - say a blog post - and turn it all into numbers. E.g. the word “blog” is 5383825526283. The word “post” is 5611004646463. Over a huge amount of text, a pattern emerges: the second number almost always follows the first number. Basically statistics. And it does that for all the words and word combinations it found - immense amounts of text are needed to find all those patterns. (Fun fact: That’s why companies like e.g. OpenAI, which makes ChatGPT, need hundreds of millions of dollars to “train the model” - they need enough computer power, storage and memory to read the whole damn internet.)
So now how do the LLMs “understand”? They don’t, it’s just a bunch of numbers and statistics of which word (turned into that number, or “token” to be more precise) follows which other word.
So now. Why do they hallucinate?
When they get your question, they turn all the words in your prompt into numbers again, and then look up, in their huge store of statistics, which words are likely to follow your words.
They add in a tiny bit of randomness: they sometimes replace a “closer” match with a synonym or a less likely match, so they even seem real.
They add “weights” so that they would rather pick one phrase over another, or e.g. give some topics very very small likelihoods - think pornography or something. “Tweaking the model”.
But there’s no knowledge as such, mostly it is statistics and dice rolling.
So the hallucination is not “wrong”, it’s just statistically likely that the words would follow based on your words.
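If it helps, the "statistics and dice rolling" idea can be sketched as a toy word-pair counter. This is only an illustration under big assumptions: real LLMs use a neural network over tokens, not a lookup table of word counts, and the corpus, function names and `temperature` knob here are all made up for the example.

```python
import random
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def next_word(counts, word, temperature=1.0, rng=random):
    """Roll weighted dice over the words seen after `word`."""
    followers = counts.get(word)
    if not followers:
        return None  # never saw anything after this word
    candidates = list(followers)
    # Low temperature favours the most common follower,
    # high temperature makes unlikely followers more probable.
    weights = [followers[w] ** (1.0 / temperature) for w in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def generate(counts, start, length=8, temperature=1.0, seed=None):
    """Chain next_word calls; no step checks whether the result is true."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        word = next_word(counts, out[-1], temperature, rng)
        if word is None:
            break
        out.append(word)
    return " ".join(out)


corpus = "the blog post is short . the blog post is long . the blog is new ."
model = train_bigrams(corpus)
print(generate(model, "the", seed=0))
```

Nothing in there knows whether the sentence it produces is correct. It only knows which word tended to come next in the text it counted.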
Did that help?
Full disclosure - my background is in operations (think IT) not AI research. So some of this might be wrong.
What’s marketed as AI is something called a large language model. This distinction is important because AI implies intelligence - whereas an LLM is something else. At a high level LLMs are using something called “tokens” to break apart natural language into elements that a machine can understand, and then recombining those tokens to “create” something new. When an LLM is creating output it does not know what it is saying - it knows what token statistically comes after the token(s) it has generated already.
So to answer your question. An AI can hallucinate because it does not know the answer - it’s using advanced math to know that the period goes at the end of the sentence and not in the middle.
While being more complex and costly to maintain
Depends on the usecase. It’s most likely at a trainyard or trainstation.
The image implies that the track on the left meets the use case criteria
The one on the right prints “hello world” to the terminal
And takes 5 seconds to do it
I think I would more picture planes taking off those railroads when it comes to AI. It tends to hallucinate API calls that don’t exist. If you don’t go check the docs yourself you will have a hard time debugging what went wrong.
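A cheap first check before you even open the docs, sketched in Python. The `api_exists` helper is a made-up name for this example, and it only tells you whether a name exists, not whether it does what the AI claimed, so it doesn't replace reading the docs.

```python
import importlib


def api_exists(module_name: str, attr_path: str) -> bool:
    """Return True if module_name imports and attr_path resolves on it."""
    try:
        obj = importlib.import_module(module_name)
    except ImportError:
        return False  # the module itself may be hallucinated
    for part in attr_path.split("."):
        if not hasattr(obj, part):
            return False
        obj = getattr(obj, part)
    return True


# json.loads is real; json.parse is the JavaScript spelling and does not
# exist in Python's json module.
for call in [("json", "loads"), ("json", "parse"), ("os", "path.join")]:
    print(call, api_exists(*call))
```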
It depends. AI can help writing good code. Or it can write bad code. It depends on the developer’s goals.
It depends. AI can help writing good code. Or it can write bad code
I’ll give you a hypothetical: a company is to hire someone for coding. They can either hire someone who writes clean code for $20/h, or someone who writes dirty but functioning code using AI for $10/h. What will many companies do?
Many companies choose cheap coders over good coders, even without AI. Companies I’ve heard of have pretty bad code bases, and they don’t use AI for software development. Even my company preferred cheap coders and fast development, and the code base from that time is terrible, because our management didn’t know what good code is and why it’s important. For such companies, AI can make development even faster, and I doubt code quality will suffer.
My goal is to write bad code
LLMs can be great for translating pseudo code into real code or creating boiler plate or automating tedious stuff, but ChatGPT is terrible at actual software engineering.
Honestly I just use it for the boilerplate crap.
Fill in that yaml config, write those lua bindings that are just a sequence of lua_pushinteger(L, 1), write the params of my do string kind of stuff.
Saves me a ton of time to think about the actual structure.
I gave it a harder software dev task a few weeks ago… Something that is not answered on the internet… It was as clueless as me, but compared to me, it made up shit that could never work.
OTOH humans did design the tracks in both images.
And then 12 hours spent debugging and pulling it apart.
And it still doesn’t work. Just “mostly works”.
A bunch of superfluous code that you find does nothing.
We’re in trouble when it learns to debug.
But then, as now, it won’t understand what it’s supposed to do, and will merely attempt to apply stolen code - ahem - training data in random permutations until it roughly matches what it interprets the end goal to be.
We’ve moved beyond a thousand monkeys with typewriters and a thousand years to write Shakespeare, and have moved into several million monkeys with copy and paste and only a few milliseconds to write “Hello, SEGFAULT”
And if you need anything else, you have to use a new prompt which will generate a brand new application, it’s fun!
That’s not really how agentic AI programming works anymore. Tools like Cursor automatically pick files as “context”, and you can manually add them or the whole codebase as well. That obviously uses way more tokens though.
I’m looking forward to the next 2 years, when AI apps are in the wild and I get to fix them lol.
As a SR dev, the wheel just keeps turning.
I’m being pretty resistant about AI code Gen. I assume we’re not too far away from “Our software product is a handcrafted bespoke solution to your B2B needs that will enable synergies without exposing your entire database to the open web”.
Our gluten-free code is handcrafted with all-natural intelligence.
It has its uses. For templating and/or getting a small project off the ground it’s useful. It can get you 90% of the way there.
But the meme is SOOO correct. AI does not understand what it is doing, even with context. The things JR devs are giving me really make me laugh. I legit asked why they were throwing a very old version of React on the front end of a new project and they stated they “just did what ChatGPT told them” and that it “works”. That’s just last month or so.
The AI that is out there is all based on old posts and isn’t keeping up with new stuff. So you get a lot of the same-ish looking projects that have some very strange/old decisions to get around limitations that no longer exist.
Holdup! You’ve got actual, employed, working, graduated juniors who are handing in code that they don’t even understand?
The AI also enabled some very bad practices.
It does not refactor and it makes writing repetitive code so easy you miss opportunities to abstract. In a week when you go to refactor you’re going to spend twice as long on that task.
As long as you know what you’re doing and guide it accordingly, it’s a good tool.
Yeah, I think personally LLMs are fine for like writing a single function, or to rubber duck with for debugging or thinking through some details of your implementation, but I’d never use one to write a whole file or project. They have their uses, and I do occasionally use something like ollama to talk through a problem and get some code snippets as a starting point for something. Trying to do too much more than that is asking for problems though. It makes it way harder to debug because it becomes reading code you haven’t written, it can make the code style inconsistent, and a non-insignificant amount of the time, even in short code segments, it will hallucinate a non-existent function or implement something incorrectly, so using it to write massive amounts of code makes that way more likely.
The Cursor AI debugging is the best experience ever.
It’s so much easier than googling the stack trace and then browsing GitHub issues and Stack Overflow.
without exposing your entire database to the open web until well after your payment to us has cleared, so it’s fine.
Lol.
You can get decent results from AI coding models, though…
…as long as somebody who actually knows how to program is directing it. Like if you tell it what inputs/outputs you want it can write a decent function - even going so far as to comment it along the way. I’ve gotten O1 to write some basic web apps with Node and HTML/CSS without having to hold its hand much. But we simply don’t have the training, resources, or data to get it to work on units larger than that. Ultimately it’d have to learn from large scale projects, and have the context size to be able to hold if not the entire project then significant chunks of it in context and that would require some very beefy hardware.
and only if you’re doing something that has been previously done and publicly released
Generally only for small problems. Like things lower than 300 lines of code. And the problem generally can’t be a novel problem.
But that’s still pretty damn impressive for a machine.
But that’s still pretty damn impressive for a machine.
Yeah. I’m so dang cranky about all the overselling, that how cool I think this stuff is often gets lost.
300 lines of boring code from thin air is genuinely cool, and gives me more time to tear my hair out over deployment problems.