Recent reporting and analysis present a mixed picture of agentic AI in software development. Multiple sources argue that coding is becoming faster because AI handles large parts of implementation: organizations see productivity gains and more PR throughput, and some teams use AI to move from writing code to reviewing AI-generated changes. However, several articles emphasize that faster code generation does not automatically translate into better or more reliable products. Studies and case-based writeups describe a “verification” or “review” bottleneck, where engineers spend more time checking AI output for correctness, security, and maintainability. Some sources highlight that coding often remains in a human–AI collaboration mode rather than full automation, partly because success criteria and error recoverability are less clear for feature work and architecture changes than for high-volume, well-bounded tasks. Other accounts describe a “production gap” or “technical cliff,” where AI-built prototypes work in demos but struggle with database configuration, authentication, deployment infrastructure, and security hardening. Several pieces also note organizational factors: adoption success depends on process redesign, orchestration/“governance” layers, and how companies redeploy or cut capacity after productivity gains. Collectively, the sources describe a transition from copilots to orchestration and “agentic QA,” but with ongoing risks around reliability, accountability, and production readiness.
AI coding boosts developer output but keeps software production and verification bottlenecks
- Organizations report productivity gains from AI-assisted or agentic coding, often shifting work from writing to reviewing code.
- Coding stays largely in human–AI collaboration because tasks have less clear success criteria and more risk of unrecoverable errors than higher-volume functions.
- Some studies and industry accounts describe a production gap: AI-built prototypes or apps often fail when moving from demos to production, especially around database setup, authentication, and deployment.
- Multiple sources point to increased review/verification burden and reliability concerns (including security vulnerabilities) as AI output scales.
- Adoption outcomes depend heavily on organizational process changes and governance, including orchestration, access control, and appropriate human approval gates.
When a feature is built with AI in a fraction of the usual time, it can feel like the hardest part is already behind you. But generating code that runs is not the same as generating code that is bug-free, secure, and built to last. The gap between the two rarely shows up in the demo. It shows up later, in three places: the cost of getting from "working" to actually production-ready, the cost of keeping that code alive as the project grows, and the loss of human judgment that AI can't fully replace. Understanding these three costs upfront is what separates a realistic AI-assisted project plan from one that runs into trouble six months in.

Cost of Quality: What You're Really Paying For

Bug-free, pixel-perfect code and a working demo with minor bugs are not the same deliverable, and they don't cost the same. A few things matter here:

- AI-generated code is a starting point, not a finished product. It will not come out with flawless architecture or production-grade quality on the first pass. That is normal, and planning for it early helps avoid surprises later.
- Speed and quality move on separate tracks. AI gets you to a working version faster, but faster isn't the same as ready. Treating them as one metric is where expectations go wrong.
- AI can speed up generation, but compressed timelines often leave less time for proper testing. As a result, the team may get a working feature quickly while the real quality check is pushed too close to release.

Speed is the right call in the right context. If the goal is to validate a hypothesis or test a prototype, a looser quality bar makes sense; that's exactly what speed is for. But production code comes with a bill that's due later. Code heading to production still needs to be refactored, reviewed for security, and checked for bugs, regardless of how it was generated. Skipping that step now doesn't remove the cost; it just delays it.

There is also a review cost that is often underestimated.
Human-written code is usually easier to review when the developer understands the architecture and makes deliberate decisions. With AI-generated code, developers often spend more time checking whether the solution fits the existing system, whether the logic is reliable, and whether the code introduces hidden risks.

The earlier this cost is planned for, the cheaper it is. A project that accounts for refactoring and review from day one spends less overall than one that treats AI-generated code as "done" and pays for the cleanup as an emergency later.

Cost of Ownership: Who Maintains the Code a Year From Now

Every project gets harder to maintain the longer it goes without attention. That has nothing to do with whether AI was involved. Frameworks get updated, dependencies change, and parts of a system quietly stop working as the ecosystem around them moves on. That's a normal part of any project staying alive and evolving, not a sign something went wrong.

The difference with AI-generated code is in how early that maintenance plan needs to start. Keeping AI-generated code running long-term means thinking about ownership from day one, not after the first issue shows up. Issues will show up from multiple directions at once, and that's expected, not exceptional. In practice, that means having a team in place to maintain the code, patch gaps, and keep it stable as the project grows.

There is also the cost of context. As the project grows, the model needs more information about the existing architecture, dependencies, business logic, and previous decisions. Passing that context properly takes time, and the cost of using AI effectively can grow along with the project itself. Code built the traditional way tends to carry fewer of these issues out of the gate, simply because more deliberate review happens earlier in the process.
AI-generated code, by comparison, tends to need a heavier maintenance investment to reach the same level of stability, which is worth factoring into the cost of ownership from the start, not after the fact.

Predictability: Why the Problem Is Usually Process, Not AI

There's a part of development that doesn't change no matter how good the tooling gets: thinking through risk. That's a form of critical thinking that still depends on a person doing the thinking, planning ahead for things like:

- Security vulnerabilities
- Payment and billing logic
- Cloud infrastructure costs and limits
- How a feature behaves in unusual or unexpected situations
- Dependencies on other parts of the system
- Data privacy and compliance requirements

Without clear instructions and boundaries, AI can also optimize for the immediate task instead of the long-term structure of the product. It may solve a narrow case, touch the wrong parts of the codebase, or add a workaround that works now but becomes harder to maintain later. This is why clear boundaries, context, and human review matter so much in AI-assisted development. Generating code skips a lot of that deliberation by design; it gets you to an output faster, with less time spent considering what might fail along the way.

This shows up most clearly in predictability. AI can answer questions about a specific situation or a piece of code, but it is much harder to count on a consistently good answer across different projects and contexts. Most real situations still need someone focused on the specific bug, with the judgment to understand what is actually wrong. Bug fixing is a good example: AI doesn't always identify what needs to change, even in code it generated itself. That's not a flaw to work around; it's exactly where a human still has to be in the loop.

Summary

Across all three areas (quality, ownership, and predictability), the same pattern shows up: the issues that surface usually trace back to process, not to AI itself.
Unstructured CI/CD, gaps in review, irrelevant context carried into a project, leftover code that should've been cleaned up: these are the things that actually cause problems, and they're not unique to AI-assisted development. Whether the time saved on generation gets eaten up later by code review and fixes depends entirely on how the development process is set up. Teams that build a solid process around AI-assisted development keep the time they saved. Teams that skip it usually end up spending it later, just under a different name. AI doesn't remove the need for engineering discipline. It just changes where that discipline needs to show up.
The real upgrade is your workflow contract: permissions, durability, and handoffs.

The Hook: the bottleneck moved, and most teams missed it

The fastest way to ship with AI right now is not model shopping; it’s workflow engineering. That sounds backwards in a week where everyone is benchmarking the latest model drops and arguing which assistant “feels smarter.” But if you’ve shipped anything non-trivial with LLMs lately, you already know the pain isn’t usually “the model wrote bad code.” The pain is:

- agent loops that die halfway through a task
- approval prompts nobody can reason about
- fragile context chains that can’t survive retries
- humans doing cleanup because automation forgot state

Translation: the constraint moved from intelligence to execution. And this week’s trend signals made that impossible to ignore.

What changed this week (and why it matters)

A few threads converged hard:

- Big excitement around new frontier model updates (like Claude Opus 4.8 discussions).
- Strong traction on “just use Postgres for durable workflows” thinking.
- A viral little game about AI agent permission fatigue that hit too close to home.
- Ongoing DEV conversations about how developers are actually using AI at work, not how slide decks say they should.
- DEV platform work on embeddings-powered relevance, reminding everyone that retrieval and ranking are now product-critical, not side quests.

Different posts, same message: capability is rising, but trust and operational control are lagging. We are entering the “orchestration tax” era. If you don’t pay that tax intentionally, you pay it as outages, silent failures, and engineers babysitting bots at 11:40 PM.

Why this lands hard in real teams

In real codebases, AI output is rarely the final artifact. It’s an intermediate step inside a larger system: ticket triage, PR drafting, test generation, migration planning, incident response, docs updates, and customer-facing changes.
That means your core problem isn’t “can the model produce text/code?” It’s:

- Can the task resume after a timeout?
- Can we audit who approved what?
- Can we re-run safely without duplicate side effects?
- Can a human take over mid-flight without starting from zero?

Most teams treat those as “later” concerns. Then later becomes now, usually after one failed launch week.

Here’s the uncomfortable part: senior engineers already know how to solve this class of problem. We solved it for payments, queues, and background jobs years ago. Idempotency keys, checkpoints, retries, compensating actions, transaction logs. Same movie, new actors. AI didn’t invent distributed systems pain. It just made junior failure modes happen at senior speed.

The wrong question everyone keeps asking

The wrong question is: “Which model should we standardize on?” Useful question, sure. But it’s not first-order. You can run an excellent model on a brittle workflow and still get chaos. You can run a merely good model on a robust workflow and get compounding value every sprint. Model quality matters, but it is now one variable in a larger reliability equation. If your process depends on uninterrupted context windows, manual approvals with no policy, and “hope-based retries,” the model leaderboard won’t save you. Choosing a model before choosing your execution contract is like picking a race engine for a car with no brakes.

The better question: what execution contract do we enforce?

Ask this instead: “What must be true for AI work to be safe, resumable, and reviewable in our stack?” That question leads to engineering decisions, not vibes. Here’s a practical playbook you can apply this week.

A concrete playbook for next week’s sprint

1. Define task boundaries before prompt quality
Split AI work into explicit steps with inputs/outputs:
- collect_context
- propose_change
- run_checks
- request_approval
- apply_change
- summarize_result
Do not let one giant prompt own the whole lifecycle.

2. Persist state in boring infrastructure
For many teams, Postgres is enough to start:
- a workflow table with status, step, attempt_count
- an event log table with append-only transitions
- payload snapshots at key checkpoints
If a worker crashes, you can recover from state, not memory.

3. Make retries idempotent by default
Every side-effecting action needs a stable operation key. If the same step runs twice, the outcome should be identical or safely deduplicated. No idempotency, no production.

4. Replace permission spam with policy tiers
Permission fatigue is real. Don’t ask for approval 17 times in a row. Create tiers:
- Tier 0: read-only ops, auto-approved
- Tier 1: low-risk write ops, batched approval
- Tier 2: high-impact ops, explicit human checkpoint
Then log every decision. Humans hate prompts; they like clear policy.

5. Instrument failure modes, not just token usage
Track:
- step timeout rate
- retry success rate
- human intervention points
- rollback frequency
- “completed but unusable” outcomes
If you only track latency and cost, you’re blind to operational quality.

6. Optimize prompts after workflow reliability
Prompt tuning matters, but sequence matters more:
- reliable state transitions
- recoverability
- approval ergonomics
- then output polish
Polishing unstable systems just gives you prettier failures.

7. Assign ownership like any other production system
Give one team explicit ownership of AI workflow reliability. If “everyone owns it,” nobody owns incident response, policy drift, or replay tooling.

The contrarian take

Here it is: the hottest AI teams in 2026 might look boring from the outside. They won’t brag about autonomous agents replacing everyone. They’ll quietly run durable, observable, policy-driven pipelines that keep shipping with fewer surprises. Their superpower won’t be mystical prompts. It’ll be disciplined systems engineering applied to AI-native work. That is less cinematic. It is also what survives contact with reality.
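To make steps 1 through 3 of the playbook concrete, here is a minimal sketch in Python. It assumes SQLite (through the standard-library sqlite3 module) as a stand-in for the Postgres tables the playbook describes; in Postgres, INSERT ... ON CONFLICT DO NOTHING plays the role of SQLite's INSERT OR IGNORE. The table names (workflow, workflow_event, side_effect), the start/run/apply_once helpers, and the handler signature are illustrative assumptions, not any particular framework's API, and payload snapshots are left out to keep it short.

```python
import sqlite3

# Step names come from the playbook; what each step does is up to your handlers.
STEPS = ["collect_context", "propose_change", "run_checks",
         "request_approval", "apply_change", "summarize_result"]

SCHEMA = """
CREATE TABLE IF NOT EXISTS workflow (
    id TEXT PRIMARY KEY,
    status TEXT NOT NULL,                  -- 'running' or 'done'
    step INTEGER NOT NULL,                 -- index into STEPS of the next step
    attempt_count INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS workflow_event (  -- append-only transition log
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    workflow_id TEXT NOT NULL,
    step TEXT NOT NULL,
    outcome TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS side_effect (     -- one row per applied operation key
    operation_key TEXT PRIMARY KEY
);
"""


def open_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db


def apply_once(db, operation_key, action):
    """Run a side effect at most once per stable operation key.

    The key is recorded in the same transaction as the action, so a failed
    action rolls the key back and the step can be retried. For calls to an
    external system, also pass the key downstream so it can deduplicate too.
    """
    with db:  # commits on success, rolls back if action() raises
        cur = db.execute(
            "INSERT OR IGNORE INTO side_effect(operation_key) VALUES (?)",
            (operation_key,))
        if cur.rowcount == 0:
            return False  # an earlier attempt already applied it
        action()
    return True


def start(db, workflow_id):
    with db:
        db.execute("INSERT OR IGNORE INTO workflow(id, status, step) "
                   "VALUES (?, 'running', 0)", (workflow_id,))


def run(db, workflow_id, handlers):
    """Resume from the persisted step and checkpoint after each one."""
    while True:
        status, step = db.execute(
            "SELECT status, step FROM workflow WHERE id = ?",
            (workflow_id,)).fetchone()
        if status == "done":
            return
        name = STEPS[step]
        with db:
            db.execute("UPDATE workflow SET attempt_count = attempt_count + 1 "
                       "WHERE id = ?", (workflow_id,))
        try:
            handlers[name](db, workflow_id)
        except Exception:
            with db:  # record the failure; the workflow stays on this step
                db.execute("INSERT INTO workflow_event(workflow_id, step, outcome) "
                           "VALUES (?, ?, 'failed')", (workflow_id, name))
            raise
        nxt = step + 1
        with db:  # the event and the state change are committed together
            db.execute("INSERT INTO workflow_event(workflow_id, step, outcome) "
                       "VALUES (?, ?, 'completed')", (workflow_id, name))
            db.execute("UPDATE workflow SET step = ?, status = ?, attempt_count = 0 "
                       "WHERE id = ?",
                       (nxt, "done" if nxt == len(STEPS) else "running", workflow_id))
```

The ordering is the design choice worth copying: a step is checkpointed only after its handler returns, so a crash between the side effect and the checkpoint re-runs the step on resume, and the operation key in side_effect turns that re-run into a no-op instead of a duplicate.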
Closing line that should stick

Models are getting smarter every month; your edge comes from building workflows that don’t panic when reality shows up.
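Step 4 of the playbook, policy tiers instead of permission spam, can also be sketched in a few lines. The three tiers mirror the playbook; the operation names in OPERATION_TIERS, the ApprovalPolicy class, and the fail-closed default for unknown operations are assumptions for illustration. A real audit log would live in durable storage rather than a Python list.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Tier(IntEnum):
    READ_ONLY = 0       # Tier 0: auto-approved
    LOW_RISK_WRITE = 1  # Tier 1: batched approval
    HIGH_IMPACT = 2     # Tier 2: explicit human checkpoint


# Hypothetical operation names; each team would maintain its own mapping.
OPERATION_TIERS = {
    "read_file": Tier.READ_ONLY,
    "run_tests": Tier.READ_ONLY,
    "edit_file": Tier.LOW_RISK_WRITE,
    "open_pr": Tier.LOW_RISK_WRITE,
    "run_migration": Tier.HIGH_IMPACT,
    "deploy": Tier.HIGH_IMPACT,
}


@dataclass
class ApprovalPolicy:
    audit_log: list = field(default_factory=list)  # every decision, in order
    pending: list = field(default_factory=list)    # Tier 1 ops awaiting a batch

    def submit(self, op: str) -> str:
        # Unknown operations fail closed: they get the strictest tier.
        tier = OPERATION_TIERS.get(op, Tier.HIGH_IMPACT)
        if tier == Tier.READ_ONLY:
            decision = "auto_approved"
        elif tier == Tier.LOW_RISK_WRITE:
            self.pending.append(op)
            decision = "queued_for_batch"
        else:
            decision = "needs_human"
        self.audit_log.append((op, int(tier), decision))
        return decision

    def approve_batch(self, approver: str) -> list:
        """One human decision covers every queued Tier 1 operation."""
        approved, self.pending = self.pending, []
        for op in approved:
            self.audit_log.append((op, int(Tier.LOW_RISK_WRITE), "approved_by:" + approver))
        return approved


policy = ApprovalPolicy()
for op in ["read_file", "edit_file", "open_pr", "deploy", "drop_table"]:
    print(op, policy.submit(op))
print(policy.approve_batch("alice"))  # ['edit_file', 'open_pr']
```

Two low-risk edits cost the human one approval instead of two, and an operation nobody classified (drop_table here) lands in Tier 2 rather than slipping through.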
The hype of AI is at its peak. Scroll through LinkedIn and Medium and you will see “vibe coders” claiming that programming is dead. Startups pitch investors on products “built entirely with AI.” Companies whisper in boardrooms that developers are replaceable, cutting cost and time. The hype has twisted the industry’s perception of AI. It is seen less as a tool and more as a replacement. That’s not just wrong; it’s dangerous.

The Illusion of AI as a Replacement

Hollywood might have been the first to plant the seed in our heads that AI will one day take over from us. We are not there yet, but AI can spit out code, debug snippets, or scaffold an app faster than a junior, or even a senior, developer. On the surface, it looks like a replacement. But here’s the catch: generating code is not the same as building and maintaining a product, or making critical decisions. Real software development is more than writing lines of neat code. It’s:

- Understanding business logic
- Architecting systems that scale
- Navigating security, compliance, and integrations
- Maintaining legacy code and evolving it over time

AI is a pattern machine. It draws on existing sources found on the internet and learns from human input. But it doesn’t understand whether a suggestion is relevant and accurate. It doesn’t make trade-offs. It doesn’t negotiate with stakeholders or know why one solution is better than another. Treating AI like a self-sufficient developer is like thinking a scientific calculator can replace a mathematician.

The Productivity Multiplier We’re Ignoring

The real promise of AI isn’t replacement; it’s augmentation. Developers can use AI to automate boilerplate and repetitive work, speed up debugging and testing, and focus on more important problems rather than getting bogged down in syntax. It’s the same story as every technological leap. Spreadsheets didn’t eliminate accountants. CAD didn’t eliminate architects. They amplified what humans could do.
AI should be the same for developers, if we use it right.

The “No Juniors” Trap

It’s in the headlines everywhere:

- “AI threatens entry-level jobs as university grads struggle to get hired”
- “AI-driven layoffs are shrinking the job market for recent grads”
- “‘A black hole’: New graduates discover a dismal job market”

If companies cut junior developer roles because “AI can do it cheaper and faster,” they destroy the pipeline of future talent. Picture this:

- In 2025, a product company replaces its junior devs with AI copilots.
- By 2030, their seniors start leaving, retiring, or burning out.
- Who will maintain the legacy systems? Who will inherit the institutional knowledge?
- AI keeps patching code, but without understanding the why, and the technical debt piles up.
- Hiring becomes a nightmare because there’s no generation of mid-level devs to replace the seniors.

The chain has been broken, and the company will be left dangling.

The Robot Fallacy

The industry keeps projecting sci-fi fantasies onto AI. We think of it as a self-aware robot that can “just do things” autonomously. Deploy it, press the red button, and away it goes 24/7. But AI today is context-blind. It can autocomplete based on what you are typing, but it can’t architect an optimal solution. It can generate lines of code within seconds, but it can’t govern what is happening under the hood. Until that changes, and we’re not even close to it, replacing humans outright is reckless.

We’ve Been Using AI Wrong

The hype has blinded us. It may even have brainwashed us all. The conversation keeps circling around “AI replacing developers” when it should be “AI helping developers.” AI is not here to replace the people who build, maintain, and evolve systems. It’s here to make those people more productive, just like every tool before it. The companies that understand this will thrive. The ones that don’t will eventually collapse under the weight of their own short-sightedness.
AI is your fast-thinking, fast-working assistant, not a self-aware architect. A tool, not a team. If we keep misusing it as a replacement instead of a force multiplier, we won’t just harm developers. We’ll cripple the very industry we’re trying to “revolutionise.” AI won’t replace developers. But companies that misuse AI will replace themselves out of existence.
AI wrote the first 80% of my feature in 10 minutes. The code was clean. The logic made sense. The happy path worked on the first try. I ran it, saw it work, and felt that specific kind of developer pride that makes you lean back in your chair slightly. I was impressed. I felt genuinely productive. I thought I'd be done in another 10, maybe 15 minutes.

That was Tuesday.

By Thursday evening I was still working on the same feature. Not because the AI had failed. Because it had succeeded at exactly the wrong thing (the easy part) and left the actual hard part entirely to me. The edge cases. The error handling. The null checks. The situations that only surface when a real user does something the happy path didn't anticipate. The AI didn't write those. It didn't even know they existed. It optimized confidently and completely for the world where everything goes right, and that world is not the one your users live in.

That's the 80/20 rule of AI code. The first 80% is fast, impressive, and kind of magical. The last 20% is where the real work actually lives. And it takes 80% of your total time. Here's what I've learned about that gap, and why I think it matters more than the 10 minutes you saved on Tuesday.

The 80%: Fast, Clean, and Genuinely Impressive

I want to be honest about this part before I get into the frustration. The AI is remarkable at the first 80%. You give it a clear prompt, it understands what the happy path looks like, and it generates code that works. Not kind-of works. Actually works, with reasonable variable names and logic that flows the way you'd expect. The first time I saw it in action I genuinely felt like I'd cheated at something. Tickets were closing. The velocity graph was going up. I was shipping things faster than I had in years. And that feeling is real; I'm not being sarcastic about it.

The AI is fast because it's operating in familiar territory. The happy path is the well-trodden path.
It's the version of your problem that exists in some form in the training data, that has been solved thousands of times before, that the model can pattern-match its way through with confidence. The 80% is real. The speed is real. The problem is that we've started treating the 80% like it's the whole thing. And it isn't.

The 20%: Where Tuesday Becomes Thursday

The AI wrote the happy path. Here's an honest list of what it didn't write:

The empty list. What happens when the user has no data yet? New account, nothing in the database, and the list the AI assumed would always have items turns out to be empty. The AI didn't check. You find out from a user report three days after launch, spend an hour tracing back to the unhandled case, and add the check you should have written on Tuesday.

The error handling. The AI assumes the network responds. It assumes the API returns what you asked for. It assumes the third-party service is up. Every try-catch block, every fallback, every "what do we show the user when this fails" decision: that's yours. The AI left it blank because things going wrong wasn't part of the prompt.

The domain-specific edge cases. This is the one that surprises me every time. The AI doesn't know your business logic. It doesn't know that "empty" means something different in three different parts of your application. It doesn't know about the legacy data that's formatted differently. It doesn't know about the enterprise customer who uses the product in a way nobody expected. You know those things. The AI has never heard of them.

The performance cliff. The AI writes code that works for the examples it was given. It doesn't stress-test for scale. You find the bottleneck when the feature goes live and the page suddenly takes four seconds to load for users with large datasets. The code isn't wrong. It just wasn't written with real load in mind.

The maintainability tax. This one is the slowest to show up. The AI writes code that solves today's problem.
Three months from now, when the requirements shift slightly and you're trying to extend it, you realize the abstraction doesn't quite fit the new shape. Refactoring it costs more time than writing it from scratch would have.

Each of those items takes time. Together, they consistently add up to about 80% of the total effort on any feature I've shipped using AI-generated code.

The 30 Seconds That Cost Me 3 Hours

I was looking at a pull request recently: maybe 200 lines of AI-generated code that I'd prompted in about 30 seconds. I spent the next 3 hours with it. Not because the code was broken. The code was fine. I spent 3 hours adding everything the AI had quietly decided wasn't its problem: the error paths, the null checks, the comments explaining the decisions that weren't obvious, the edge case I found by actually thinking about what our users do.

During the 30 seconds I felt fast. During the 3 hours I felt slow. But here's the thing I keep coming back to: the 3 hours was the actual work. The 30 seconds was the scaffolding. The AI didn't reduce the work; it relocated it. The time moved from writing the structure to making it real, and making it real is slower because it requires something the AI genuinely doesn't have: context about your specific situation, your specific users, your specific history with this codebase.

That was the moment I stopped caring about how long generation took and started tracking something more honest: how long until it's actually ready to ship.

Why This Isn't a Complaint About AI

I want to be clear: the 80/20 split isn't a failure of AI. It's basically the design. The AI is optimized for the common case. The common case is the happy path. Generating the common case quickly is genuinely useful; I'm not being dismissive of that. The issue isn't with the AI. The issue is with how we've started measuring productivity around it. We measure velocity. Tickets closed. Lines generated. Contribution graph.
And all of those metrics capture the 80% beautifully, because the 80% is fast and visible and shows up as green squares. The 20% is invisible to those metrics. Nobody's dashboard shows time spent adding error handling. Nobody's standup starts with "I spent yesterday on edge cases the AI didn't anticipate." It doesn't show up anywhere. But it's where most of the actual time goes. The 80% is what gets you to a demo. The 20% is what gets you to production. And if you're not tracking how long the 20% takes, you're not tracking your real productivity; you're tracking how quickly you can type a prompt and feel good about it.

What I'm Actually Doing Differently

Not quitting AI. Not even thinking about it. But I've changed a few things:

I budget for the 20% upfront. When I estimate any task involving AI-generated code, I budget roughly 4x whatever the generation time suggests. The AI says "this is a 10-minute feature." I tell my brain it's a 40-minute feature and plan accordingly. It's not pessimism; it's just the pattern holding.

I prompt for the unhappy path explicitly. Before I even generate the main code, I add to the prompt: What should happen with empty input? What should happen when the API fails? What edge cases exist here? The AI won't think of them on its own. If I name them, it at least takes a pass at them.

I write the failing tests before the code exists. What would break this? What would a mischievous user do? I write those tests first so the AI has a target. It doesn't catch everything, but it catches more than the AI would find by itself.

I remember the 3 hours. When I'm tempted to push something quickly because it works in the demo, I think about the 3 hours. The 30 seconds felt good. The 3 hours was the job.

None of this makes the 20% disappear. But it makes it predictable instead of surprising, which is the difference between managing it and being ambushed by it.
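Here is roughly what "failing tests first" could look like for the empty-list and API-failure cases from the 20% list. It's a minimal sketch using Python's built-in unittest; summarize_orders, the injected fetch_orders callable, the return shape, and UpstreamUnavailable are all hypothetical names, and what the right behavior is for each unhappy path is a product decision the tests only write down. The tests sit above the implementation on purpose: they are the target you hand to the AI.

```python
import unittest


# Written first: each test names an unhappy path before any code exists.
class SummarizeOrdersTest(unittest.TestCase):
    def test_new_account_with_no_orders(self):
        result = summarize_orders(lambda: [])
        self.assertEqual(result, {"count": 0, "total": 0, "message": "No orders yet"})

    def test_api_failure_is_reported_not_crashed(self):
        def failing_fetch():
            raise ConnectionError("network down")
        with self.assertRaises(UpstreamUnavailable):
            summarize_orders(failing_fetch)

    def test_happy_path(self):
        result = summarize_orders(lambda: [{"amount": 10}, {"amount": 5}])
        self.assertEqual((result["count"], result["total"]), (2, 15))


# Written second, with the tests above as the target.
class UpstreamUnavailable(Exception):
    """The (hypothetical) orders service could not be reached."""


def summarize_orders(fetch_orders):
    try:
        orders = fetch_orders()
    except (ConnectionError, TimeoutError) as exc:
        # Surface a domain error the UI can turn into a friendly message.
        raise UpstreamUnavailable("orders service unavailable") from exc
    if not orders:  # new account: nothing in the database yet
        return {"count": 0, "total": 0, "message": "No orders yet"}
    total = sum(order["amount"] for order in orders)
    return {"count": len(orders), "total": total, "message": None}
```

Save it as a test_*.py file and run python -m unittest; until summarize_orders handles both unhappy paths, the first two tests fail, which is exactly the signal you want before calling the feature done.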
One Question

What's the longest you've spent on the last 20% of something the AI generated quickly? I want actual numbers if you have them. The gap between how long generation took and how long it actually took to ship: that's the number I'm curious about. My answer: 30 seconds to generate, 3 hours to finish. What's yours? 👇

Heads up: I used AI to help structure this post and refine my thoughts. The experiences, stories, and opinions are my own.
AI coding tools are not reliable out of the box. Anyone who has used them on a real production project knows this. They work great on greenfield code. They fall apart on a codebase with history. I learned this the hard way.

The Problem

We have a fintech project that has been running for nearly three years: two React frontends (an admin panel and a client-facing app) sitting on top of a FastAPI backend with a complex relational database. Multiple tables, heavy joins, financial and personal user data that has to be handled carefully.

Adding new features should be simple. Most tasks are straightforward: a new frontend page, a new API route, a new integration with an external financial data provider. But it was taking longer than it should. The database structure was only fully understood by one person. Every new query, every new table join, needed that person in the loop. We decided to bring AI into the workflow to move faster. That is when things got interesting.

AI Just Completely Ignored Our Users Table

I asked the AI to create a contacts table. It did, with columns for first name, last name, email, and phone number: all fields we already had in our users table. Instead of linking the two tables with a foreign key, it just dumped everything into a new table and duplicated the data. I had to go in and fix the schema manually.

This was not the AI being stupid. It had no idea our users table existed. It made a reasonable decision with incomplete information, and that is exactly the kind of mistake that is hard to catch until it is already in your schema. That incident changed how I thought about the problem. Instead of asking "How do I get AI to write better code?" I started asking: "What does the AI need to know to make good decisions?"

The Fix: Make the Codebase Legible to AI

I took inspiration from Matt Pocock's work on structured AI workflows and adapted it for our team. The core idea is simple: AI is only as good as the context you give it.
So we made the context explicit and authoritative. Here's what we set up:

1. Architectural Decision Records (ADRs)

We created a docs/adr/ directory with ADR files: documents that record why we made specific architectural decisions, not just what they are. Each ADR answers a specific question the AI might face:

- How do we create a new table?
- How do we link tables together?
- How do we structure a new API route?
- Where does each type of file live in the codebase?

The contacts table mistake became ADR-001: When creating a new table, check existing tables first. If relevant fields already exist, use a foreign key. Never duplicate user data. Always ask before creating new columns that could belong in an existing table. Now when the AI encounters a new table task, it reads that rule first.

2. A Context File and Glossary

Our codebase had terms that meant specific things to us, words that an AI trained on general code would misinterpret. We wrote a context.md that explains what each term means in our codebase specifically, how different terms relate to each other, and which concepts that sound similar are actually different in our system. We also wrote a plot.md, a high-level map of what the project is, what it does, and how the pieces connect. Both files open with the same rule: "The docs directory is authoritative. These rules are not suggestions. Follow them in order. Do not skip steps."

3. Test Cases for Every API, No Exceptions

Every new API route now ships with test cases. Not optional. Not "when we have time." This turned out to matter more than I expected, not just for code quality but for keeping AI reliable over long sessions. Here is what happened once: the AI made a small change to a shared utility function, the kind of change that looks harmless in isolation. But that function was used in twelve places, and the change broke eight of them. The test suite caught it immediately.
The AI saw the failures, traced them back to the shared function, and instead of just reverting, it created a new version of the function that handled both the old behavior and the new requirement. It fixed its own mistake without me touching it. Without those tests, that bug would have shipped.

What Changed for the Team

When I showed this to my team, a few people were skeptical. AI tools had let them down before. They had seen hallucinations, broken code, confident wrong answers. After the session, three teammates asked me to set up their projects the same way.

The shift was not about trusting AI more. It was about understanding that AI needs structure to be reliable, the same way a new developer needs onboarding, documentation, and code review. You do not blame a new hire for not knowing your codebase. You document it. We did the same for the AI.

The Setup, If You Want to Try It

docs/
    context.md    # What this project is, what terms mean, how pieces connect
    plot.md       # High-level map of the codebase
    adr/
        001-table-creation.md
        002-api-structure.md
        003-query-patterns.md
        ...

A few things that matter:
- Be specific in your ADRs. "Check existing tables before creating new ones" beats "follow good database practices."
- Make the docs authoritative. Tell the AI these rules come first, always.
- Add a new ADR every time the AI makes a mistake. Turn failures into rules.
- Write test cases for every new API. Run them before every commit. When the AI breaks something, add a test for that case so it can never happen silently again.

One Last Thing

This system does not make AI perfect. It makes AI predictable. There is a difference. The AI still makes mistakes. But now the mistakes are smaller, easier to catch, and when they happen, they teach us something we can encode into the next ADR.

The goal was never a codebase where AI does everything right. It was a codebase where AI does things consistently, and where the team can move faster because of it. That is what we built.
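The shared-utility incident is worth sketching, because the fix the AI landed on is a common pattern: a version of the function that handles both the old behavior and the new requirement. What follows is an illustration under stated assumptions, not the team's code. `format_amount`, its `currency` parameter, and the formatting rules are all hypothetical. The new behavior is opt-in through a keyword-only argument whose default reproduces the old output. Pytest-style tests pin that old output, so any change that breaks existing callers fails immediately.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional


# Hypothetical shared utility; the post does not name the real function.
def format_amount(amount: Decimal, *, currency: Optional[str] = None) -> str:
    """Format a monetary amount with two decimals and thousands separators.

    Existing callers never pass ``currency``, so they keep getting exactly the
    old output. The new requirement is opt-in via the keyword-only argument.
    """
    # Round explicitly so the result does not depend on the decimal context.
    quantized = amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    text = f"{quantized:,.2f}"
    return text if currency is None else f"{text} {currency}"


# Regression tests that pin the old behavior existing call sites rely on.
# Plain functions with bare asserts, so pytest can collect them as-is.
def test_default_output_is_unchanged():
    assert format_amount(Decimal("1234.5")) == "1,234.50"
    assert format_amount(Decimal("1000000")) == "1,000,000.00"


def test_rounding_is_unchanged():
    assert format_amount(Decimal("0.005")) == "0.01"


# A separate test covers the new requirement.
def test_currency_suffix_is_opt_in():
    assert format_amount(Decimal("1234.5"), currency="EUR") == "1,234.50 EUR"
```

If a later edit changes the default format instead of adding to it, the first two tests fail before the change can reach any existing caller.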
I am a fullstack developer and tech lead working on backend systems in Python, Go, and Node.js.
2 days ago
For a long time I have been thinking about writing something on this topic, but since I was not sure about my thoughts or how to formulate them, I kept silent. Over the last few months things have become clearer in my mind, so here I am to share my opinion on current AI development and its influence on the software engineering industry and on a developer's everyday tasks. I am writing this post in the hope that my experience of more than a year of actively using LLMs and agents for coding could help someone who is struggling with the whole idea.

Productivity Fallacy

We have all heard the big announcements about AI boosting developer productivity by 20% to 30%, and once an idea becomes mainstream, it is not comfortable to speak against it. The main industry players are making fortunes on that assumption, and it is not popular to be against the idea or, as in my case, to be less for it than the main narrative suggests. I would even say that speaking up is not so much hard as it is pointless, given that your opinion won’t make any difference.

LLMs are indeed helping to write code, or better said, to generate text. That is, in the end, the whole idea of large language models: meaningful text generation. The main trap is that, as a developer, you get a sense of huge productivity because suddenly you are generating large amounts of code in seconds. Measured by that metric, the productivity percentage from the previous paragraph is definitely true. What matters, though, is that the same code still needs to be read and verified to work as it should.

Before, when you got a task, you would first think about how to implement it and then start with small portions, writing everything on your own or reading the docs of other libraries to be sure how to call their interfaces. In that process, your cognitive focus was entirely on writing the code and, at the same time, on understanding what it does.
On the other hand, with AI generating the code, you no longer go through the same cognitive process; instead, the cognitive load shifts toward code review. OK, we could say we are still fine as long as we review faster than we used to write code. That is true, but there are two things to consider:

1. AI, by default, generates more code than we used to write for the same task. Because of that, code review takes longer, and it is harder to grasp the entire context at once, because there is so much code to look at.
2. Code review is not the same as writing code. Here is the big difference: compare writing a book with reading it. Do the author and the reader have the same understanding of the book's content? The same happens with code. We will end up with situations where the people behind pull requests do not remember, in the long run, what that code does.

Fears On Productivity

If the main measure of productivity is the number of generated lines of code, we are already heading in the wrong direction, because it will result in two things:

1. Developers will spend less time reviewing because they compete to write new code. Reviewing is not something that brings a reward.
2. With less code review comes even less understanding of what the code does.

Both things are extremely dangerous in the long run. When something breaks in production, who can resolve it and take responsibility for a change to code that he/she does not understand?

AI-Assisted Coding vs AI-Driven Coding

Another trend I see is that vibe coding is steering AI-assisted coding toward AI-driven coding. Responses from AI agents are becoming more and more persuasive, and many developers tend to put all their trust in them, with a less critical approach. This complicates the game because it reverses the roles of the driver and the assistant.
Combine this with the points from the previous two sections, and we can imagine where it can lead. With the latest developments in skills and AI bots that review every pull request, the insistence on human code review has to be stronger than ever. If we rely on a bot to do the code review and then paste the bot's comments into a prompt asking for changes, or even let the bot create the changes for us automatically, that is already concerning. It should raise the question of who is steering development and who should be the main driver. These things could carry a big cost in the future when something goes wrong, and something always will. It is just a matter of time, and of cost in the end.

Asset vs Liability

Code is not an asset; it is a liability. The asset is its functionality, which brings in money once it is in production. But every new line of code is a liability by default, because it is technical debt: it requires maintenance and therefore costs money. The more of it you have, the deeper you are in that debt spiral. The main focus should be less code that still does its job.

Therefore, measuring productivity by how many lines of code a developer generates does not make any sense. In fact, it only has a negative impact, because what it produces is fewer code reviews and more generated code.

Final Thoughts

AI agents are very helpful when used carefully. Asking them to explain and summarize what some portion of legacy code does is extremely helpful. Asking them to write a unit of code that you steer from beginning to end, with your full focus on it, is extremely helpful. AI review bots are helpful too; they can find edge cases you did not think of during development. All of these things can really be used to your benefit, but not as the only source of truth.
One of the lessons I have learned so far is to aim to write as little code as possible, because that is one of the ways you can keep yourself focused and not lose track. Also, when a bug-bot does the review, ask yourself whether each comment makes sense for the business logic you intend to implement. Be critical, even more critical in review, just as you would be with someone touching the codebase for the first time. This will help break the illusion the agent creates when it persuades you that it did the correct thing by default.
3 days ago
Qualcomm nears deal to acquire AI startup Modular in talks valued around $4 billion
Qualcomm is reported to be in advanced talks to acquire Modular Inc., an AI chip and infrastructure software startup, in...
SPHERE AX partners with Blaize to develop AI semiconductor-based products
SPHERE AX, a South Korea-based company focused on Vision AI and edge AI computing, announces a partnership with U.S. AI...
Tecno expands EllaClaw AI agent with cross-app automation and system optimization
Tecno is expanding its EllaClaw AI agent, adding capabilities that move beyond simple chatbot-style responses into actio...