jdlshore 24 hours ago [-]
Carson’s experience matches mine: AI is good at analysis and boilerplate, but not good at the kind of critical thinking necessary for good designs. If it were human, I would say that it jumps to solutions too quickly, rather than stepping back to consider the big picture and how everything should fit together to make a cohesive whole.
It’s not human, of course, and I think this problem actually relates to the fact that LLMs don’t have a world model. They don’t study and think through a design in the way that humans do. They don’t form a mental model of how everything fits together and how that design can be tweaked to most elegantly support a change.
I suspect that this is a fundamental limitation of LLMs, and that design will remain a weak point until some sort of bespoke design AI is bolted onto the side. In the meantime, we’ve got a lot of people producing a lot of code very quickly, and I think the debt in that code is going to be a millstone around our necks for a long time to come.
stymaar 13 hours ago [-]
> I suspect that this is a fundamental limitation of LLMs
I suspect there's also a strong sociological bias at play: LLMs are being made by people who are familiar with coding but aren't software engineers. So they design their RL policies around the idea that the LLM must learn how to code, not that it must learn to design a maintainable piece of software.
jacobedawson 12 hours ago [-]
I feel as though that world model strongly correlates with memory: the experience of having jumped to a conclusion early and charged full steam ahead, only to be bitten by constraints and problems further down the track.
Part of that is critical thinking and projecting forward / simulating potential issues, and part of it is the kind of memory that in humans we would probably call "wisdom".
I don't know if that's a fundamental limitation of LLMs, or, rather, that this can be solved moving forward with better memory systems, harnesses, and context windows.
mewpmewp2 10 hours ago [-]
Yeah, I think it's more about learning from experiences that didn't scale. E.g. I started out with Notepad and wrote everything for the website I wanted to build in a single massive PHP file. I of course don't do that anymore, but it was a step-by-step, iterative process to get to where I am now. Although I still miss how easy it was to see changes locally, deploy to prod quickly, and make hotfixes in prod. I sometimes think maybe I should go back to PHP.
recroad 19 hours ago [-]
Have to disagree with this, as it's excellent at helping you go wide and broad before converging. I suggest trying OpenSpec and using /ospx:explore to state your problem and go from there.
pgwhalen 17 hours ago [-]
These takes aren't necessarily incompatible. It can be a great tool for helping you do this kind of big-picture design, but it still needs you as a guide and taste-maker to get to a good end result.
rst 21 hours ago [-]
One partial mitigation is to ask it to use plan mode -- and then very carefully review the plan before allowing it to execute.
jdlshore 17 hours ago [-]
My experience with AI plans is that they’re a wall of text that’s very hard to extract meaning from. Combined with it not doing a good job to begin with, I don’t think plan+revise is a great use of time.
weakfish 16 hours ago [-]
I feel the same way. Maybe it’s the ADHD, maybe I’m just dumb, but I cannot parse the giant walls of text they tend to produce.
carljungslabtek 15 hours ago [-]
It’s melting my brain to read them all day. Our merge request descriptions are a mile long and so dense with jargon that it’s very difficult to figure out the important part of the changes.
They turned the English language into enterprise Java and my train of thought is now a series of NullPointerExceptions
clates 4 hours ago [-]
> They turned the English language into enterprise Java
...tell them not to do that if you don't like it?
"PR descriptions must explain the entirety of the PR's contents in 300 characters or less and be written at no greater than a 600 Lexile score. After writing the description, carefully review its claims against the changeset diff. If any staged changes cannot be traced back to a claim in the PR description, reject the creation and alert the user to the discrepancy, offering solutions for how to remediate it."
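Part of that instruction can also be enforced mechanically instead of trusting the model to follow it. Below is a minimal sketch, assuming a hypothetical pre-submit hook that receives the description text and the list of changed file paths. Traceability is approximated crudely (a file counts as "covered" if its base name is mentioned), and the Lexile readability requirement is not checked at all.

```python
import os

MAX_DESCRIPTION_CHARS = 300  # limit taken from the prompt above


def check_pr_description(description: str, changed_files: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the description passes.

    Hypothetical helper. Traceability is approximated crudely: a changed file
    counts as covered if its base name appears somewhere in the description.
    The readability (Lexile) requirement is not checked here.
    """
    problems = []
    if len(description) > MAX_DESCRIPTION_CHARS:
        problems.append(
            f"description is {len(description)} chars (limit {MAX_DESCRIPTION_CHARS})"
        )
    lowered = description.lower()
    for path in changed_files:
        # Flag any changed file the description never mentions.
        if os.path.basename(path).lower() not in lowered:
            problems.append(f"no claim covers {path}")
    return problems


if __name__ == "__main__":
    desc = "Fix operator precedence in parser.py and add a regression case to test_parser.py."
    print(check_pr_description(desc, ["src/parser.py", "tests/test_parser.py", "README.md"]))
    # → ['no claim covers README.md']
```

A non-empty result could be fed back to the agent as the "discrepancy" it must resolve, which keeps the length and traceability rules out of the model's judgment entirely.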
fugaziboutit 12 hours ago [-]
An LLM conversation is like handling clay. When I don't grok an answer I mold the LLM's approach to fit my level of mastery of the subject. It's one of the few interactions you can have in life where you can tell someone how to talk to you without considering how they feel about being ordered around.
mjfisher 12 hours ago [-]
That's interesting, and actually the opposite of my experience. I wonder if it's stack- or methodology-dependent? For reference, I'm usually using Cursor and Opus 4.6, and for a bigger piece of work:
- Start in ask mode - "I'm planning on doing X to achieve Y; are there any alternative approaches? What problems might I run into?"
- Chat for a bit and get the high level approach, switch to plan mode and ask for a nicely formatted plan
- What's kicked out is already in the rough shape of the discussion so far, so it's a case of following a nicely formatted doc through and highlighting sections of text and asking for clarification or changes
- Hitting "build" and then reviewing what's been done
For a new service I might spend an hour in ask/plan mode - but then it gets 95% of the build itself right first time.
Do you do the same with different results, or is there a different stack/methodology you go through?
knollimar 9 hours ago [-]
I get a lot of this in design docs every time I give it a negative constraint:
[Suboptimal choice]
And here's why it's not suboptimal -- you said X sucks and not to do X, but this choice is not technically X, it's just really similar and shares that sucky property.
bob1029 19 hours ago [-]
I've been in a lot of situations where I could step GPT-5.x through a big refactor if I spoon-feed it one type name at a time. If I let it try to do the whole thing at once, it will refuse or get stuck in apply-patch loops.
Planner / executor separation can make a huge difference in performance. LLMs are fantastic at coming up with elaborate narratives about what should be done. They are terrible at doing that prescribed work all at once. This impedance mismatch is best resolved with a simple role separation, and placing a shared collection of tasks between these roles is how you decouple them. The executors need significantly more tokens than your planners to get the job done, probably in the range of 10-100x more for really complicated jobs with a lot of iterations through compiler feedback, SQL provider errors, etc. This is why you can't do both things in the same context very well.
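The planner/executor split can be sketched roughly like this. It is a minimal sketch under stated assumptions: the planner and the per-attempt executor are plain callables standing in for LLM calls (the stubs and the names `run`, `execute`, `Task` are hypothetical), and the shared task collection is just an in-memory queue. Each task is executed on its own, so a real executor could get a fresh context and spend its larger token budget retrying on build or test feedback.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Task:
    description: str
    done: bool = False
    log: list[str] = field(default_factory=list)


Planner = Callable[[str], list[str]]  # goal -> list of step descriptions
Attempt = Callable[[Task, int], bool]  # (task, attempt number) -> success?


def execute(task: Task, attempt: Attempt, max_attempts: int = 3) -> Task:
    """Executor role: works on ONE task, retrying on feedback (compiler
    errors, failing tests, ...) up to max_attempts times."""
    for n in range(1, max_attempts + 1):
        ok = attempt(task, n)
        task.log.append(f"attempt {n}: {'ok' if ok else 'failed'}")
        if ok:
            task.done = True
            break
    return task


def run(goal: str, planner: Planner, attempt: Attempt) -> list[Task]:
    # The shared task queue is the only thing the two roles have in common:
    # the planner fills it once, and executors drain it one task at a time.
    queue = deque(Task(step) for step in planner(goal))
    finished = []
    while queue:
        finished.append(execute(queue.popleft(), attempt))
    return finished


if __name__ == "__main__":
    # Stub planner and executor standing in for LLM calls.
    def stub_planner(goal: str) -> list[str]:
        return [f"rename type {name}" for name in ("Foo", "Bar", "Baz")]

    def stub_attempt(task: Task, n: int) -> bool:
        return n >= 2  # pretend the first try always hits a compiler error

    for t in run("refactor", stub_planner, stub_attempt):
        print(t.description, t.done, t.log)
```

The design point is that the planner never sees executor transcripts and vice versa; the task list is the whole interface, which is what lets each side's context stay small.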
saagarjha 21 hours ago [-]
At that point I would rather just write the plan myself
altmanaltman 14 hours ago [-]
Okay, but that means you already know the plan, since you are qualified to review it. So why not just tell it the plan yourself (0-shot) vs. having it guess while you review multiple times (n-shot)? Wouldn't the former be more effective every time?
oulipo2 21 hours ago [-]
Exactly. An LLM is good at "code inpainting": define clear structures and goals, and it will fill in the boilerplate. But it doesn't work for reasoning and abstraction, so it fails to synthesise and propose novel views. That's integral to the way it's designed and trained: it does a kind of "averaging", which limits its capacity to explore novel designs.
thunky 20 hours ago [-]
> But it doesn't work for reasoning and abstraction, so it fails to synthesise and propose novel views
I disagree. Have a conversation with it about your problem and work through design decisions with it. When I do that, I find it gives me a lot of good ideas.
Disclaimer: I'm not working on anything groundbreaking (like most people)
appplication 15 hours ago [-]
I find I don’t necessarily need or want AI to give me ideas, but I would agree having a conversational back and forth generally yields decent results.
I have found being Socratic in my questions, and trying to get the AI to arrive at my intended design via such conversations supplies the right level of context for properly solving the problem. It’s token intensive, without a doubt, but I find the result is the AI tends to be better equipped to handle the many micro decisions that need to be made along the way.
The alternative is to give it a detailed prompt and have it ask me questions, which also generally works, but I find the AI tends to be less well equipped for decisions it needs to make mid-implementation.
It’s not perfect, and maybe not even a good fit for some. I also never know what to think when people tell me their idiosyncratic ways of using AI. Ultimately I think the most effective way is whatever lets you translate the vision in your head into the end result.
altmanaltman 14 hours ago [-]
Sure but you can also google your problem and check what is industry standard/what is the correct way to do things (imo in less time than it takes to go through a conversation).
But the problem is that when you ask AI to solve a problem on its own, its default plan can suck. You can mitigate that with research and context, but that doesn't mean the initial problem is solved. And even that requires skill and human judgement (whether the research happens in an AI conversation or the traditional way), and a lot of people want to skip that entirely.
sublinear 17 hours ago [-]
Yes, but "good ideas" compared to what? If you were aware of the better alternatives, you probably wouldn't be discussing those details with an LLM. You'd find that it just randomly gave you one. It might work, but you don't know how well until you're already entrenched.
Nobody knows everything, so of course LLMs can be useful sometimes. More useful than plain old search, books, or even discussion with real humans? Maybe.
Search can offer a much broader context than an LLM hyperfocused on just generating text. Books may lead you to realize you were asking the wrong questions. Discussions will provide an overall "vibe" of the topic.
These are not competing options. We can and should be using all of them when possible.
thunky 8 hours ago [-]
> Yes, but "good ideas" compared to what? If you were aware of the better alternatives, you probably wouldn't be discussing those details with an LLM
Even when I already have a good idea of how I plan to do something, I may still ask AI and then find it gave me a better idea for some particular thing.
I liken it to using GPS even when you know the route like the back of your hand. It can still steer you around an accident.
To do this effectively I have to drop the idea that I always know better than it does.
Zababa 6 hours ago [-]
I don't think this problem is related to the fact that they don't have a world model, or because they don't form a mental model of how everything fits together, or a fundamental limitation of LLMs. These claims are often meaningless, and the boring answer is usually something like "software architecture is harder to verify than code/maths so RLing on it is harder, and it's harder writing good evals/benchmarks for it".
epolanski 11 hours ago [-]
In my experience, a harness can do wonders to improve this.
Instead of generically asking it to analyze and do X, you can use brainstorming skills like those from superpowers [1].
This makes it approach the problem better and keeps you in the loop.
Another step is to have its plans reviewed by another LLM acting as an adversarial reviewer. I have a Claude skill [2] that calls Codex to do it, and they chat with each other.
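The author/reviewer back-and-forth can be sketched as a small loop. This does not reproduce the linked skill (its contents aren't shown here); it is a minimal sketch assuming two hypothetical callables, `author` (drafts or revises a plan given an optional critique) and `reviewer` (returns a critique, or None to approve), each standing in for a call to a different model. A round limit keeps the two from arguing forever.

```python
from typing import Callable, Optional

# Hypothetical stand-ins for two different models.
Author = Callable[[str, Optional[str]], str]  # (task, critique or None) -> plan
Reviewer = Callable[[str], Optional[str]]  # plan -> critique, or None if approved


def adversarial_plan(task: str, author: Author, reviewer: Reviewer,
                     max_rounds: int = 3) -> tuple[str, list[str], bool]:
    """Alternate author and reviewer until the reviewer approves or the round
    limit is hit. Returns (last plan, critiques raised, approved?) so a human
    can skim the objections instead of the whole transcript."""
    critiques: list[str] = []
    plan = author(task, None)
    for _ in range(max_rounds):
        critique = reviewer(plan)
        if critique is None:
            return plan, critiques, True
        critiques.append(critique)
        plan = author(task, critique)
    # Round limit reached: the latest revision has not been approved.
    return plan, critiques, False


if __name__ == "__main__":
    def stub_author(task: str, critique: Optional[str]) -> str:
        plan = f"1. implement {task}"
        if critique:
            plan += "\n2. add rollback path"
        return plan

    def stub_reviewer(plan: str) -> Optional[str]:
        return None if "rollback" in plan else "no rollback path if the migration fails"

    plan, notes, approved = adversarial_plan("schema migration", stub_author, stub_reviewer)
    print(approved, notes)
    print(plan)
```

Returning the list of critiques alongside the plan is one way to keep the human in the loop without asking them to read the full model-to-model conversation.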
It's just because not enough people had this very specific problem before.
This article will be part of the next model's training set, and it will probably be able to solve it despite not understanding anything about the world or studying or thinking.
recursivedoubts 1 days ago [-]
hello all, this is an article I wrote up on my interaction with an agent, Claude, in fixing a bug in the hyperscript parser
it was a rather mundane bug, but i thought the interaction was interesting and worth analyzing to show where AI is very strong and where it is not as strong
AloysB 20 hours ago [-]
I very much love your work, Carson; it has always been, and remains, a breath of fresh air.
The example is mundane but to the point; and I very much enjoyed this article. It's a concrete example which is rare to read when it comes to using LLMs.
At the risk of being told that we "hold it wrong", it resonates with my experience of using LLMs.
hugeBirb 22 hours ago [-]
Always exciting to see a former professor on the front page and always an enjoyable read Mr. Gross!
Michelangelo11 9 hours ago [-]
> Technical debt, I assert without evidence[1], grows exponentially, and therefore it is very important to minimize it in your projects.
This actually seems like a really important idea absolutely deserving of its own blog post.
I'd have to think about the exact argument for why this feels so right, but the kernel would go something like this: whatever you build on those parts of the codebase where you have technical debt incurs new technical debt, because you're building on top of abstractions you'll remove later. The reason you have to remove the new abstractions, too, is that abstractions are like puzzle pieces: their structure determines which other abstractions they can connect with. So, as a rule (there are some exceptions), you can't take out one bad part, replace it with another, and leave everything around it untouched.
And, of course, it's easier to build on top of something creaky but currently serviceable than it would be to first rip that out and replace it, so that's what you do in most cases ... and the whole codebase gets more creaky and less serviceable; you increase the amount of abstractions you'd have to rip out and replace before building something new. The problem does, indeed, grow exponentially.
The argument is free to a good home -- I don't have the time for a full, meticulous elaboration, but I'd love to read one if someone is interested in making it.
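The kernel of that argument can be turned into a toy model. This is a sketch under an explicit, made-up assumption: each change adds a fixed amount of fresh debt plus a fraction `rate` of the debt it is built on top of (the "building on abstractions you'll remove later" effect). The 10% rate is arbitrary and illustrative, not a measured number.

```python
def debt_after(changes: int, base: float = 1.0, rate: float = 0.0) -> float:
    """Toy model of technical debt after `changes` changes.

    Each change adds `base` units of fresh debt plus `rate` times the debt
    it is built on top of, i.e. work layered on creaky abstractions
    inherits a share of their problems. rate=0 gives linear growth;
    any rate > 0 gives compounding (geometric) growth.
    """
    debt = 0.0
    for _ in range(changes):
        debt += base + rate * debt
    return debt


if __name__ == "__main__":
    for n in (10, 25, 50):
        linear = debt_after(n)
        compounding = debt_after(n, rate=0.1)  # 10% is an arbitrary, illustrative rate
        print(f"{n:>3} changes: linear {linear:6.1f}  compounding {compounding:8.1f}")
```

In closed form the compounding case is D_n = b((1 + r)^n - 1)/r, the same shape as a loan accruing compound interest, so any positive r eventually dominates the linear term.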
rng-concern 5 hours ago [-]
Ward Cunningham, who coined the term "technical debt", describes it as accruing interest, and interest compounds exponentially:
> Shipping first time code is like going into debt. A little debt speeds development so long as it is paid back promptly with a rewrite.... The danger occurs when the debt is not repaid. Every minute spent on not-quite-right code counts as interest on that debt. Entire engineering organizations can be brought to a stand-still under the debt load of an unconsolidated implementation, object-oriented or otherwise.
Nemi 6 hours ago [-]
I agree, it is an interesting thing to ponder. I often phrased it to myself that the cost of technical debt compounds the lower in the code stack you go.
Said another way, tech debt has a multiplicative factor the farther away from the end user you get. Tech debt in the database is worse than in the data layer. It is worse in the data layer than in the business logic. It is worse in the business logic than in the UI code. etc.
This is related to the fact that it gets exponentially more difficult to refactor code the farther away you get from the end user. Changing the database is usually more difficult and impacts more things than the data layer code. And on and on we go back up.
wiremine 21 hours ago [-]
It's a good write-up, but it's missing some details, the most important being: which Claude model was used?
The second: what tooling and prompting approach were used?
(To be clear, I have no problem with the premise of the write-up. But without details like these, it's sort of like saying "I had a bad board on my deck, and my tape measure wasn't able to help me remove the nails. What a bad tape measure.")
recursivedoubts 19 hours ago [-]
Opus 4.whatever (it was last week) via a command line interface in the IntelliJ Claude plugin.
The series of prompts weren't particularly interesting or innovative on my part: a paste in of the user report then a few back and forths on fixing it, me reviewing the changes and coming up with the final answer.
wiremine 6 hours ago [-]
It might not feel interesting, but it's sort of the crux of the issue. Average or below-average prompts will produce average or below-average results. The model can't make up for that.
Not saying every problem can or should be solved by AI, but mastery of the tools is kind of important when evaluating the tools. It's like complaining that vi or emacs is slow to use because the bindings are complicated.
recursivedoubts 5 hours ago [-]
idk i think i'm pretty good w/AI in general, e.g. designed these using it:
It's more like asking what editor and keyboard layout they use. Highly relevant to the user but you should simply assume someone describing work is using a setup for it they find productive. If you decide to dismiss their output it wouldn't be over these details.
rolisz 12 hours ago [-]
Not quite. Model is extremely important in the quality of results. Harness can also influence things.
thorum 1 days ago [-]
Interesting read! Creating tests is highlighted as something Claude did well, but it strikes me that all the weaker rejected solutions could have been avoided if it were really good at designing intelligent tests for itself. For example, the first solution “was very specific to the reported bug and wouldn’t have fixed the general case” and the third suggestion “prevented the perfectly valid use of as conversion expressions in go commands as well”. I imagine both of these cases could have been noticed and avoided by the agent if it had planned out adequate tests ahead of time.
rapind 21 hours ago [-]
This is kind of what coding with LLMs feels like: gradually adding guard rails outside of its context (automated ones) to get the results you want out of it. Static typing, quick compilation, not having nulls, and linters are a great start (I would also argue for managed side effects and functional programming, but to each their own).
It gets pretty far toward the solution on its own, and quickly, but then you spend time adjacent to the problem, building out its cage while iterating through the remainder of the solution.
piskov 19 hours ago [-]
As humans we have a concept of viscosity. That resistance, like being in quicksand or a swamp, is how you “easily” identify a code smell, something that needs to be refactored, etc. Part of it is human laziness, part of it some concept of elegance, an itch of being not quite tidy as it can be, etc.
An LLM, being a tireless little helper, will gladly output hundreds of lines, hacks, and what have you.
I don’t think any amount of tests, prompts, harnesses and other “my shaman is a better shaman” will help it to acquire this trait. Some other AI architecture someday maybe — just not today.
And that’s why it is good at what it is and really bad at stuff like code “design” (unless it is a well-known solution being baked in the training set)
AloysB 16 hours ago [-]
> “my shaman is a better shaman”
This made me chuckle. I will steal this from you.
nok22kon 13 hours ago [-]
have you tried asking?
I've used with great success prompts like "when implementing this feature, did you encounter sections of code that were needlessly complex, that were making it hard for you to work? what would you change in the design/architecture to make it leaner?"
piskov 10 hours ago [-]
Everything is just one more prompt away, I swear — literally like a gambling addict with a slot machine.
You forgot the premise of the article and why the proposed solutions were not good. The problem wasn't the complexity of the solutions: they were simple, fast fixes, like tape on a leak, but hacky tape nonetheless.
(of course I tried; the code after the “refactor” is still shit unless you get very explicit about it, to the point where it's better and faster to do it yourself)
nok22kon 9 hours ago [-]
yes, LLMs are not perfect
this is what separates real engineers, who solve hard problems, adapt, and overcome, from the ones who complain that whatever they have access to is not perfect and give up on it because "it's unusable"
varun_ch 1 days ago [-]
maybe slightly unrelated, but the new htmx homepage (https://four.htmx.org/) feels a little ironic, seemingly built with Tailwind CSS and a full JS-ecosystem Astro build. It also has the ‘vibey’ ‘hypey’ landing page design that’s hard to describe but that you’ll find on any web framework, rather than dropping you into the docs like the old site.
Compared to the original simple HTML site it’s really surprising to see from the grugbrain.dev author!
recursivedoubts 1 days ago [-]
:) i let a younger person on the core team create the new website for something different
it is using Astro; we are scaling down the use of Tailwind (I wanted to give it a try, but it didn't really click with me.)
I don't mind someone doing something kind of fun with the website and trying something new out, I know some people don't like it but some people do. All good.
librasteve 12 hours ago [-]
i suppose you have to at least try Tailwind if you advocate for LOB … in https://harcstack.org, I have started with https://picocss.com, which keeps the HTML squeaky clean. it is open to other themes down the line, and I have not ruled Tailwind out, but I suspect that it will make me feel dirty when I come to it. in general, hArc is able to leverage Raku roles for code decomposition, and the optimum design is settling on pinning CSS styles to elements (grid, table, form, etc.) and encapsulating them so that changes to one thing do not cascade to another
varun_ch 1 days ago [-]
that’s fair! It definitely looks good and modern!! I just wonder if it compromises the initial impressions of the project in some way.
mistrial9 1 days ago [-]
isn't it obvious that some websites will become unreadable without serious machine assistance, while classic HTML web standards provide a fallback path that lets a human read them?
clear text with minimal markup has many desirable properties IMHO
I disagree with the trope -- (AI effects) "the slow dulling of our intellects". I am old enough to remember my career change: I was a developer in the Apple ecosystem, confident with Objective-C and native system libraries in iOS and macOS, and I changed direction to a very different software stack in cloud services, as a data engineer with deep utilization of Clojure. I have personal projects in the former world that I occasionally return to, often a decade or more later. I saw what I forgot immediately; but soon after, with engagement, I saw how quickly I was able to remember. Extended use of AI has exactly this footprint for me. Even "use it or lose it" is wrong -- "use it when you need to" is honestly closer to it -- the brain is plastic. Some AI fears are warranted; this isn't one of them.
ekidd 20 hours ago [-]
> I saw what I forgot immediately; but soon after, with engagement, I saw how quickly I was able to remember.
We actually have pretty good models for how long it takes to forget things. It's the same basic math that powers Anki. To oversimplify, if you force yourself to remember something right before you would have otherwise forgotten it, you will remember it roughly 2.5 times as long before forgetting it again. (This changes at both the shortest time intervals and the longer ones, so treat it as a rough rule of thumb, not an exact formula.)
But this provides a handy bound! If you've been doing something professionally for 20 years, you should expect to remember it for another 50. At which point you're likely well into old-age, and memory performance may decrease for other reasons.
Where AI kills you is actually at the other end: initial learning. You are much less likely to need to recall something after 1 day, 2.5 days, 6.25 days, etc. And thanks to the lack of the "testing effect", memory formation will be much weaker.
In other words, I would naively expect AI to make long-used skills a bit rusty, but to drastically impede formation of new skills and knowledge.
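The arithmetic behind that rule of thumb is easy to spell out. A minimal sketch, assuming a constant multiplier of 2.5 and a first interval of one day (the classic SM-2 algorithm, which Anki's original scheduler descends from, starts cards at an ease factor of 2.5, but real schedulers also use fixed early steps, adjust the factor per card, and handle lapses, all of which this ignores). The function names are hypothetical.

```python
def review_intervals(first_days: float = 1.0, factor: float = 2.5,
                     reviews: int = 5) -> list[float]:
    """Days between successive reviews under the rule of thumb: each recall
    made just before forgetting stretches the next interval by `factor`.
    Ignores per-card ease changes, fixed learning steps, and lapses."""
    intervals = []
    interval = first_days
    for _ in range(reviews):
        intervals.append(interval)
        interval *= factor
    return intervals


def expected_retention_years(years_in_use: float, factor: float = 2.5) -> float:
    """The bound from the comment: treat years of continuous use as the last
    interval, so the next one is roughly `factor` times longer."""
    return factor * years_in_use


if __name__ == "__main__":
    print(review_intervals())            # [1.0, 2.5, 6.25, 15.625, 39.0625]
    print(expected_retention_years(20))  # 50.0
```

The short early intervals are where skipped recall hurts most: if AI answers the question at day 1 and day 2.5, those first multiplications never happen.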
hankbond 21 hours ago [-]
are you suggesting it's closer to the idea that you can regain strength faster after having lost it (as in bodybuilding after extended time off)? Gaining something from scratch requires a lot of effort and experimentation; regaining it, less so?
luisln 21 hours ago [-]
In all my side projects, instead of thinking about architecture or design decisions, I just ask it what I want the end effect to be. "I want this button to do a thing". You're saying this is good for my brain?
effnorwood 20 hours ago [-]
read this to mean the construction material. was wrong.
zuzululu 13 hours ago [-]
has anybody successfully shipped anything with htmx and llm ?
i tried it before with sonnet and the results weren't very good
went back to react
nsonha 1 days ago [-]
AI makes the case for htmx, we don't have to think about the spaghetti code, AI does it for us /s
smokefoot 22 hours ago [-]
The author admits that the logic of the language and the design of the parser are idiosyncratic. Even the solution the author likes is an extension of an existing hacky trap door. He could be more open-minded about the solutions the AI proposed and in fact, I think AI could potentially rearchitect this in a more structured, sustainable, and legible way.
Many developer criticisms of AI coders could just as easily be directed at 95%+ of human developers. Much coding is monkey see, monkey do: keep trying until it does what we want it to do. AI can certainly do that cheaper and faster, and this is really why automated testing became such an important software discipline, with or without AI.
slopinthebag 22 hours ago [-]
Yeah, no. The AI was unable to come up with a good solution whereas the human was. Point human.
smokefoot 21 hours ago [-]
Maybe fair. I think my point was the author emphasizes how strange the software is. The further you are from the training data, the less well a model will perform. I haven't looked at the project, but it seems like it could maybe be written more conventionally. Or maybe not! In which case AI is bad at creativity and thinking outside the training data and that's a genuine insight.
It’s not human, of course, and I think this problem actually relates to the fact that LLMs don’t have a world model. They don’t study and think through a design in the way that humans do. They don’t form a mental model of how everything fits together and how that design can be tweaked to most elegantly support a change.
I suspect that this is a fundamental limitation of LLMs, and that design will remain a weak point until some sort of bespoke design AI is bolted onto the side. In the meantime, we’ve got a lot of people producing a lot of code very quickly, and I think the debt in that code is going to be a millstone around our necks for a long time to come.
I suspect there's also a strong sociological bias at play: LLMs are being made by people who are familiar with coding but aren't software engineers. So they design their RL policies around the idea that the LLM must learn how to code, not that they must learn to design a maintenable piece of software.
Part of that is critical thinking and projecting forward / simulating potential issues, and part of that is that memory which in humans we probably would see as "wisdom".
I don't know if that's a fundamental limitation of LLMs, or, rather, that this can be solved moving forward with better memory systems, harnesses, and context windows.
They turned the english language into enterprise java and my train of thought is now a series of NullPointerExceptions
.... tell them not to do that if you don't like it?
"PR Descriptions must explain the entirety of the PR's contents in 300 characters or less and be written at no greater than a 600 lexile score. After writing the description, carefully review it's claims against the changeset diff if any staged changes are unable to be tracked back to a claim in the PR description, reject the creation and alert the user of the discrepancy offering solutions on how to remediate"
- Start in ask mode - "I'm planning on doing X to achieve Y; are there any alternative approaches? What problems might I run into?"
- Chat for a bit and get the high level approach, switch to plan mode and ask for a nicely formatted plan
- What's kicked out is already in the rough shape of the discussion so far, so it's a case of following a nicely formatted doc through and highlighting sections of text and asking for clarification or changes
- Hitting "build" and then reviewing what's been done
For a new service I might spend an hour in ask/plan mode - but then it gets 95% of the build itself right first time.
Do you do the same with different results, or is there a different stack/methodology you go through?
[Suboptimal choice]
And here's why it's not suboptimal -- you said X sucks and notto do X, but this choice is not technically X, it's just really similar and shares that sucky property.
Planner / executor separation can make a huge difference in performance. LLMs are fantastic at coming up with a lot of elaborate narratives regarding what should be done. They are terrible about doing that prescribed work all at once. This impedance mismatch is best resolved with a simple role separation. Placing a shared collection of tasks between these roles is how you can decouple them. The executors need significantly more tokens than your planners to get the job done. It's probably in the range of 10-100x more for really complicated jobs with a lot of iterations through compiler feedback, sql provider errors, etc. This is why you can't do both things in the same context very well.
I disagree. Have a conversation with it about your problem and work through design decisions with it. When I do that, I find it gives me a lot of good ideas.
Disclaimer: I'm not working on anything groundbreaking (like most people)
I have found being Socratic in my questions, and trying to get the AI to arrive at my intended design via such conversations supplies the right level of context for properly solving the problem. It’s token intensive, without a doubt, but I find the result is the AI tends to be better equipped to handle the many micro decisions that need to be made along the way.
The contrast to this is I give it a detailed prompt where it then asks questions of me, which also generally works but I find the AI tends to not be as well equipped for decisions it needs to make mid implementation.
It’s not perfect, and maybe not even a good fit for some. I also never know what to think when people tell me their idiosyncratic ways of using AI. Ultimately I think the most effective way is whatever lets you translate the vision in your head into the end result.
But the problem is that when you ask ai to solve a problem on its own, its default plan can suck. You can mitigate that by research and context but it doesn't mean the initial problem is solved. But even that requires skill and human judgement (both ai conversation research or traditional research) and a lot of people want to skip that entirely.
Nobody knows everything, so of course LLMs can be useful sometimes. More useful than plain old search, books, or even discussion with real humans? Maybe.
Search can offer a much broader context than an LLM hyperfocused on just generating text. Books may lead you to realize you were asking the wrong questions. Discussions will provide an overall "vibe" of the topic.
These are not competing options. We can and should be using all of them when possible.
Even when I already have a good idea of how I plan to do something, I may still ask AI and then find it gave me better idea for some particular thing.
I liken it to using GPS even when you know the route like the back of your hand. It can still steer you around an accident.
To do this effectively I have to drop the idea that I always know better than it does.
Instead of asking it generically to analyze and do X, you can use brainstorming skills like those from superpowers [1].
This makes it approach the problem better and keeps you in the loop.
Another step is to have its plans adversarially reviewed by another LLM. I have a Claude skill [2] that calls Codex to do it, and they chat with each other.
It's a tremendous boost in design quality.
[1] https://github.com/obra/Superpowers
[2] https://gist.github.com/enricopolanski/6c5038a8e20cc4098cd99...
This article will be part of the next model's training set, and the model will probably be able to solve it despite not understanding anything about the world, or studying, or thinking.
it was a rather mundane bug, but i thought the interaction was interesting and worth analyzing to show where AI is very strong and where it is not as strong
The example is mundane but to the point; and I very much enjoyed this article. It's a concrete example which is rare to read when it comes to using LLMs.
At the risk of being told that we "hold it wrong", it resonates with my experience of using LLMs.
This actually seems like a really important idea absolutely deserving of its own blog post.
I'd have to think about the exact argument for why this feels so right, but the kernel would go something like this: whatever you build on those parts of the codebase where you have technical debt incurs new technical debt, because you're building on top of abstractions you'll remove later. The reason you have to remove the new abstractions, too, is that abstractions are like puzzle pieces: their structure determines which other abstractions they can connect with. So, as a rule (there are some exceptions), you can't take out one bad part, replace it with another, and leave everything around it untouched.
And, of course, it's easier to build on top of something creaky but currently serviceable than it would be to first rip that out and replace it, so that's what you do in most cases ... and the whole codebase gets more creaky and less serviceable; you increase the amount of abstractions you'd have to rip out and replace before building something new. The problem does, indeed, grow exponentially.
The argument is free to a good home -- I don't have the time for a full, meticulous elaboration, but I'd love to read one if someone is interested in making it.
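A toy model of the compounding argument, under loudly stated assumptions: every creaky abstraction picks up a fixed number of new abstractions built directly on top of it each round (`builds_per_step`, an arbitrary hypothetical parameter), and anything built on a creaky abstraction becomes part of what you'd eventually have to rip out. It is not a measurement of any real codebase; it only illustrates why "build on it now, replace it later" grows geometrically rather than linearly.

```python
def rip_out_set_size(steps: int, builds_per_step: int) -> int:
    """Toy model: abstractions you'd have to replace after `steps` rounds of building on debt.

    Assumption (hypothetical): each creaky abstraction gets `builds_per_step` new
    abstractions built directly on top of it per round, and those inherit its debt.
    """
    creaky = 1  # start from a single bad abstraction
    for _ in range(steps):
        # every existing creaky piece gains new dependents, which are creaky too
        creaky += creaky * builds_per_step
    return creaky


for steps in range(5):
    print(steps, rip_out_set_size(steps, builds_per_step=1))
```

With `builds_per_step=1` the rip-out set doubles every round (1, 2, 4, 8, 16); in general the toy model gives `(1 + builds_per_step) ** steps`.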
> Shipping first time code is like going into debt. A little debt speeds development so long as it is paid back promptly with a rewrite.... The danger occurs when the debt is not repaid. Every minute spent on not-quite-right code counts as interest on that debt. Entire engineering organizations can be brought to a stand-still under the debt load of an unconsolidated implementation, object-oriented or otherwise.
Said another way, tech debt has a multiplicative factor the farther away from the end user you get. Tech debt in the database is worse than in the data layer. It is worse in the data layer than in the business logic. It is worse in the business logic than in the UI code. etc.
This is related to the fact that it gets exponentially more difficult to refactor code the farther away you get from the end user. Changing the database is usually more difficult and impacts more things than the data layer code. And on and on we go back up.
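One way to make the "impacts more things" point concrete is to count transitive dependents. This sketch assumes a hypothetical, perfectly regular stack where every module is used by `fan_out` modules in the layer above it (`build_stack`, `impacted_by`, and the layer names are made up for illustration). Changing a module means revisiting everything that directly or indirectly depends on it, found here with a breadth-first walk over the reversed dependency edges.

```python
from collections import deque


def build_stack(layers: list[str], fan_out: int) -> dict[str, list[str]]:
    """Hypothetical stack: each module is used by `fan_out` modules in the layer above it.

    `layers` is ordered from farthest from the user (database) to closest (UI).
    Returns a map from module name to the modules it depends on.
    """
    root = f"{layers[0]}_0"
    depends_on: dict[str, list[str]] = {root: []}
    current = [root]
    for layer in layers[1:]:
        above: list[str] = []
        for below in current:
            for _ in range(fan_out):
                name = f"{layer}_{len(above)}"
                depends_on[name] = [below]
                above.append(name)
        current = above
    return depends_on


def impacted_by(changed: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Every module that directly or transitively depends on `changed`."""
    # Invert the edges so we can walk from a module to the modules that use it.
    dependents: dict[str, list[str]] = {m: [] for m in depends_on}
    for module, deps in depends_on.items():
        for dep in deps:
            dependents[dep].append(module)
    seen: set[str] = set()
    queue = deque([changed])
    while queue:
        for user in dependents[queue.popleft()]:
            if user not in seen:
                seen.add(user)
                queue.append(user)
    return seen


stack = build_stack(["database", "data_layer", "business_logic", "ui"], fan_out=2)
for module in ["database_0", "data_layer_0", "business_logic_0", "ui_0"]:
    print(module, len(impacted_by(module, stack)))
```

With `fan_out=2` this prints 14, 6, 2 and 0 impacted modules for the database, data layer, business logic and UI respectively, so the blast radius roughly doubles with each layer you move away from the user. Real stacks are not this regular; the fan-out is the assumption doing the work.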
The second issue is: what was tooling and the prompt approach?
(To be clear, I have no problem with the premise of the write-up. But without some details like this, it's sort of like saying "I had a bad board on my deck, and my tape measure wasn't able to help me remove the nails. What a bad tape measure.")
The series of prompts weren't particularly interesting or innovative on my part: a paste in of the user report then a few back and forths on fixing it, me reviewing the changes and coming up with the final answer.
Not saying every problem can or should be solved by AI, but mastery of the tools is kind of important when evaluating the tools. It's like complaining that vi or emacs is slow to use because the bindings are complicated.
https://mtmc.cs.montana.edu
https://bcp.cs.montana.edu
but we can all be better i guess
It gets pretty far toward the solution on its own, and quickly, but then you spend time adjacent to the problem, building out its cage while iterating through the remainder of the solution.
An LLM, being a tireless little helper, will gladly output hundreds of lines, hacks, and what have you.
I don’t think any amount of tests, prompts, harnesses and other “my shaman is a better shaman” will help it to acquire this trait. Some other AI architecture someday maybe — just not today.
And that’s why it is good at what it is good at and really bad at stuff like code “design” (unless it is a well-known solution baked into the training set)
This made me chuckle. I will steal this from you.
I've used with great success prompts like "when implementing this feature, did you encounter sections of code that were needlessly complex, that were making it hard for you to work? what would you change in the design/architecture to make it leaner?"
You forgot the premise of the article and why the proposed solutions were not good. The problem was not complexity: they were simple, fast fixes, like tape on a leak, but hacky tape nonetheless.
(of course I tried; the code after the “refactor” is still shit unless you get so explicit about it that it would be better and faster to do it yourself)
this is what separates real engineers, who solve hard problems, adapt and overcome, from the ones who complain that whatever they have access to is not perfect and give up on it because "it's unusable"
Compared to the original simple HTML site it’s really surprising to see from the grugbrain.dev author!
it is using astro, we are scaling down the use of tailwind (I wanted to give it a try, but didn't really click with it.)
I don't mind someone doing something kind of fun with the website and trying something new out, I know some people don't like it but some people do. All good.
clear text with minimal markup has many desirable properties IMHO
Shameless plug: https://open.substack.com/pub/deimos28/p/the-friction-collap...
We actually have pretty good models for how long it takes to forget things. It's the same basic math that powers Anki. To oversimplify, if you force yourself to remember something right before you would otherwise have forgotten it, you will remember it roughly 2.5 times as long before forgetting it again. (This changes at both the shortest time intervals and the longer ones, so treat it as a rough rule of thumb, not an exact formula.)
But this provides a handy bound! If you've been doing something professionally for 20 years, you should expect to remember it for another 50. At which point you're likely well into old-age, and memory performance may decrease for other reasons.
Where AI kills you is actually at the other end: initial learning. You are much less likely to need to recall something after 1 day, 2.5 days, 6.25 days, etc. And thanks to the lack of the "testing effect", memory formation will be much weaker.
In other words, I would naively expect AI to make long-used skills a bit rusty, but to drastically impede formation of new skills and knowledge.
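The arithmetic behind those numbers, as a sketch assuming a constant 2.5x growth factor per successful recall (the comment's rough rule of thumb; real schedulers such as Anki's adjust the factor per card). The helper names are hypothetical, and the 20-year case treats 20 years of use as the last interval the memory survived, which is how the "another 50" figure falls out.

```python
GROWTH = 2.5  # rough rule-of-thumb multiplier from the comment, not an exact scheduler value


def review_intervals(first_interval_days: float, reviews: int, growth: float = GROWTH) -> list[float]:
    """Gaps (in days) between successful recalls if each gap is `growth` times the last."""
    intervals: list[float] = []
    interval = first_interval_days
    for _ in range(reviews):
        intervals.append(interval)
        interval *= growth
    return intervals


def next_interval(last_interval: float, growth: float = GROWTH) -> float:
    """How long a memory should last after surviving `last_interval` and being recalled."""
    return last_interval * growth


print(review_intervals(1.0, 4))  # early learning: [1.0, 2.5, 6.25, 15.625] days
print(next_interval(20))         # 20 years of use -> roughly 50.0 more years
```

The short early gaps are where most of the recall opportunities (and the testing effect) sit, which is the window the comment argues AI assistance tends to skip.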
i tried it before with sonnet and the results weren't very good
went back to react
Many developer criticisms of AI coders could easily be directed at 95%+ of human developers. Much coding is monkey see, monkey do, and keep trying until it does the things we want it to do. AI can certainly do that cheaper and faster, and really this is why automated testing became such an important software discipline, with or without AI.