ai coding works best in small chunks, where the complexity is kept to one task at a time. projects of massive complexity tend to fail quickly: the ai ends up generating code that theoretically should work but somehow doesn't. it'll make you think you discovered some sort of million dollar idea, and then you'll realize the code has no soul, which is a different kind of existential dread.
I had to hammer this home over and over and over with my coworkers because I think most people don't understand that at a fundamental level it's just a black box of probability.
like, for example, let's assume it's 95% right like you said. then you need to isolate the tasks it works on to one prompt at a time for it to stay at a 95% chance of being correct.
if you incorrectly assume it will correct itself, or act like a human, the 5% chance it's wrong compounds with every prompt and soon you just have a pile of slop.
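to put rough numbers on the compounding: if you assume every prompt is an independent 95% shot (a simplification, real mistakes aren't independent and the 95% is just the number from above), the chance that a chain of n prompts is all correct is 0.95^n. a minimal sketch, where `chain_success_probability` is a made-up name for illustration:

```python
def chain_success_probability(per_prompt_accuracy: float, num_prompts: int) -> float:
    """Chance that every prompt in a chain is correct.

    Assumes each prompt succeeds independently with the same probability,
    which is a simplification rather than a measured property of any model.
    """
    return per_prompt_accuracy ** num_prompts


# Print how fast a 95% per-prompt accuracy decays over longer chains.
for n in (1, 5, 10, 14, 20, 50):
    print(f"{n:>2} prompts: {chain_success_probability(0.95, n):.1%}")
```

under that assumption a chain of 14 prompts is already worse than a coin flip (about 49%), and 50 prompts lands around 8%.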
it's simplest to isolate which tasks are "solved" or deterministic and never let the ai do them. for example: if you let the agent try to find files, it will loop around trying random stuff until it finds the file 80 prompts in. finding files is literally a solved and deterministic thing, so instead of letting the ai do it, make a script that uses grep and then calls the ai to act on the files to do something that isn't a solved problem.
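a rough sketch of what that script could look like, assuming `grep` is on your PATH. `find_files`, `run_on_matches` and `ask_ai` are hypothetical names: `ask_ai` stands in for whatever model client you actually use, it is not a real library call.

```python
import subprocess
from typing import Callable


def find_files(pattern: str, root: str = ".") -> list[str]:
    """Deterministic step: list files under root whose contents match pattern."""
    # -r: recurse into directories, -l: print only names of matching files,
    # -e: treat the next argument as the pattern even if it starts with "-".
    result = subprocess.run(
        ["grep", "-rl", "-e", pattern, root],
        capture_output=True,
        text=True,
    )
    # grep exits 0 when something matched, 1 when nothing matched,
    # and 2 or higher on a real error (bad pattern, unreadable path, ...).
    if result.returncode > 1:
        raise RuntimeError(result.stderr.strip())
    return sorted(result.stdout.splitlines())


def run_on_matches(
    pattern: str, root: str, ask_ai: Callable[[str, str], str]
) -> dict[str, str]:
    """Hand each matching file to the AI, one file per prompt."""
    answers = {}
    for path in find_files(pattern, root):
        with open(path, encoding="utf-8", errors="replace") as f:
            # ask_ai is a placeholder: swap in your own model call here.
            answers[path] = ask_ai(path, f.read())
    return answers
```

the point of the split is that the file search never costs a prompt and can't wander off, and the ai only ever sees one file and one task at a time.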
this way you remove all the solved problems where the ai could accidentally hit the 5% where it's wrong and waste time on useless stuff, so it only spends prompts on stuff that's actually useful.
it's all about managing the probability