AI won't fix your engineering team. It will amplify everything - good and bad.


March 10, 2026
15 min read

Everyone’s talking about AI making engineers 10x more productive. The pitch is seductive: give your team Copilot, Cursor, or Claude Code and watch the PRs fly. In a way, they do fly. More tasks completed, nearly double the pull requests, faster cycle times. The R&D numbers look fantastic on weekly exec reports.

The question is: are completed features actually shipping faster?

We’re measuring the wrong engineering metrics, optimising for the wrong outcomes, and most dangerously, we’re about to break the pipeline that creates the next generation of senior engineers.

The amplifier, not the fixer

AI doesn’t fix a team, it amplifies what’s already there.

Strong teams with clean architectures, good testing practices, and loosely coupled systems? AI makes them fly.

Teams drowning in technical debt, tightly coupled codebases, and manual deployment processes? AI makes them ship the same bad code in larger quantities and faster.

This isn’t a tools problem, it’s a systems and practices problem.

I’ve seen successes and failures first-hand. When we introduced AI-assisted coding in a less experienced team, we saw more code and more PRs, but the quality was worse. A case in point: a contributor generated an event-processing Lambda with a long-running loop and sleep that polled an SQS queue, rather than using the more appropriate event-driven pattern. A senior caught it during an architectural spot check, but by then the code had already been generated and submitted as a PR for approval. Meanwhile, another team with better practices used the same tools to successfully migrate a legacy system to a modern architecture in golang and K8s. Same tool, completely different outcomes. The difference was the underlying software practices and architecture (test automation, well-defined requirements, industry-standard patterns), not the AI itself.
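The difference between the two patterns is easy to show. Below is a minimal Python sketch: the polling loop is roughly the shape of the generated anti-pattern (shown for contrast only; it needs boto3 and live AWS credentials, so it is not meant to run here), while the event-driven handler is what an SQS event source mapping invokes. The `process` function, queue details, and event payload are illustrative stand-ins, not the actual code from the incident.

```python
import json
import time


def polling_lambda(queue_url: str) -> None:
    """Anti-pattern: a long-running loop inside a Lambda that polls SQS itself.
    It burns invocation time, fights the 15-minute execution cap, and
    re-implements what an event source mapping already does."""
    import boto3  # imported here only; this function is for contrast, not for running

    sqs = boto3.client("sqs")
    while True:
        resp = sqs.receive_message(QueueUrl=queue_url, WaitTimeSeconds=20)
        for msg in resp.get("Messages", []):
            process(json.loads(msg["Body"]))
            sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
        time.sleep(5)


def handler(event, context):
    """Event-driven pattern: SQS invokes the Lambda with a batch of records.
    No loop, no sleep, no polling code to own or debug."""
    for record in event["Records"]:
        process(json.loads(record["body"]))
    return {"processed": len(event["Records"])}


def process(payload: dict) -> None:
    # Hypothetical business logic stand-in.
    print(f"handled order {payload.get('order_id')}")
```

With the event-driven version, retries, batching, and scaling are configuration on the event source mapping rather than hand-written loop logic.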

The DORA data backs this up. In 2024, AI adoption actually negatively correlated with delivery stability - teams were shipping more changes that broke more things. The 2025 numbers are better on throughput, but AI adoption still correlates with more change failures and longer resolution times. Faster isn’t better if you’re faster at creating incidents.

To be fair, the picture isn’t uniformly negative. A majority of developers in the 2025 DORA survey report that AI has a positive influence on code quality. But there’s a gap between perceived quality at the individual level and actual stability at the system level. The same report shows AI adoption still correlates with higher change failure rates. Developers feel more productive; the metrics say otherwise.

This is the Deploy on Friday problem all over again. If AI is creating instability, the answer isn’t to restrict and regulate AI, it’s to fix your testing, your architecture, and your feedback loops.

AI tooling accelerates technical debt as fast as it accelerates everything else. An engineer who used to write one poorly designed module a week can now produce five. If your architectural standards, your observability, or your test coverage are weak, you won’t notice the debt accumulating until it’s structural - baked into enough services that untangling it becomes a multi-quarter project. The velocity gains are real, but so is the compounding of technical debt, both known and unknown.

The system thinker problem

AI has dramatically lowered the barrier to writing code, but it has raised the bar for good engineering. I think this is the most underappreciated shift happening right now.

When anyone can generate a working function in seconds, the differentiator is no longer whether you can write code. It’s whether you understand the system end to end: how that code fits into the broader system, whether it respects your architecture’s boundaries and infrastructure constraints, what happens when it fails at 3am on a Saturday, and whether it’s actually solving the right problem.

Andrej Karpathy coined the term “vibe coding” for this: describing requirements in natural language and accepting whatever the AI produces without deeply understanding it. It’s powerful for prototyping, but the production quality gap is real. Vibe coding is the fast path to code that works in a demo and breaks in production.

The engineer who can hold a system in their head, reason about how components interact, and course-correct the AI at every juncture - that’s the engineer who thrives in this world. And that engineer is almost always a senior.

ThoughtWorks called it “coding assistant complacency”: relying on AI suggestions without the expert skills to evaluate the output. The result is bloated systems and subtle architectural decay. The cure is the same thing that’s always separated good engineers from average ones: systems thinking. To be clear, an expert can codify best practices in the CI/CD pipeline. Good requirements flowing into acceptance criteria, implemented in a TDD fashion with observability built in from the ground up, will take you a long way, especially with AI. But the best engineers will still be the ones who understand the system as a whole and can make judgement calls that no amount of test automation can catch.

Cost aside, there’s also a practical ceiling most teams are about to hit. Current models degrade as codebases grow. The context window that felt generous when you started becomes a constraint when your monorepo has half a million lines. You burn through your Claude credits as the AI loses track of cross-cutting concerns, duplicates logic that already exists, and generates code that doesn’t account for conventions buried in files it can’t see. Context engineering - structuring your codebase, documentation, and tooling so AI can reason about it effectively - is the emerging discipline that separates teams that scale with AI from those that plateau.

Anthropic’s guide on effective context engineering lays out the key practices: scope every model call to the minimum context required rather than filling the window, use embedding based retrieval to reduce token consumption by 60-80% while improving accuracy, and separate durable state from per call context. Teams with strong context discipline using a less powerful model consistently outperform large teams with frontier models and poor context engineering. The practical techniques include AGENTS.md files that codify project conventions, architecture decision records (ADRs) that give agents the why behind design choices, and curated reference implementations that anchor agent behaviour to proven patterns.
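As a sketch of what “scope every model call to the minimum context” can look like in practice - with a crude token-overlap score standing in for real embedding-based retrieval, and all file names and conventions hypothetical:

```python
def score(query: str, text: str) -> float:
    """Crude token-overlap relevance score. A stand-in for embedding
    cosine similarity; the scoping logic around it is the point."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / (len(q) or 1)


def build_context(query: str, files: dict[str, str], k: int = 2,
                  conventions: str = "") -> str:
    """Scope the model call: durable state (project conventions) plus only
    the k most relevant files, instead of dumping the repo into the window."""
    ranked = sorted(files, key=lambda name: score(query, files[name]), reverse=True)
    parts = [conventions] if conventions else []
    parts += [f"# {name}\n{files[name]}" for name in ranked[:k]]
    return "\n\n".join(parts)
```

Swap the scorer for real embeddings and the dict for a file index and the structure stays the same: conventions travel with every call, while per-call context is retrieved, not accumulated.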

The agentic shift changes the stakes

Everything above applies to copilot-style AI, the autocomplete and chat assistants that suggest code while a human stays in the driver’s seat. But the industry is already moving past that. Agentic AI, tools that can autonomously plan, execute multi-step tasks, run tests, and submit pull requests with minimal human prompting, is the current wave. And it raises the amplification problem to a new level.

The ThoughtWorks Technology Radar (Vol. 33) reflects this shift. The Model Context Protocol (MCP) is being described as “the ultimate integration protocol for powering agents.” New standards like Agent-to-Agent (A2A) are emerging to coordinate multi-agent workflows. Teams are experimenting with AGENTS.md files and curated shared instructions to give coding agents consistent project context. The ecosystem is moving fast.

In a copilot model, a human reviews every suggestion before it lands. In an agentic model, the AI is making sequences of decisions autonomously, choosing which files to read, which tests to run, which approach to take, and then presenting a finished result for review. The surface area for unreviewed decisions expands dramatically. A coding agent might refactor three services, update a database migration, and modify a deployment config in a single session. If your architecture is clean and your test suite is comprehensive, the agent has guardrails. If not, you’ve handed an autonomous actor the keys to compound your technical debt.

This is why context engineering matters even more in an agentic world. The ThoughtWorks Radar puts “anchoring coding agents to reference applications” in Trial, a technique where you give agents well-structured example code to follow rather than letting them invent patterns from scratch. It’s the architectural equivalent of onboarding a new hire by pointing them at the right examples. Teams that invest in clear documentation, consistent patterns, and strong CI pipelines will get far more from agentic AI than teams that just point an agent at a messy codebase and hope for the best.

The complacency risk is also amplified. When a copilot suggests a line of code, you’re likely to read it. When an agent submits a 500 line PR, nobody is reading every line, and pretending otherwise is dishonest. Writing code was never the bottleneck, defining the requirements and validating them was. Line-by-line code review doesn’t scale to AI-volume output. What does scale is seniors doing architectural spot checks, strong CI pipelines that enforce standards automatically, and rich observability that tells you whether the code actually works once it hits production. The real validation happens after deployment: canary releases, SLO alignment, and instrumentation that closes the feedback loop between what shipped and what’s actually happening. If your SLOs are holding and your error budgets are intact, the code is working. If they’re not, no amount of pre-merge review would have caught it anyway.

AI is a senior accelerant

For experienced engineers, AI is an extraordinary force multiplier. You know what you want to build. You understand the patterns. You are aware of the trade offs and constraints. You can look at AI generated code and immediately spot when it’s taken a naive approach, introduced a subtle bug, or violated an architectural principle.

Last week I was troubleshooting a production incident where a service was exhausting memory. The AI assistant found the problem and proposed a fix, but the fix would not have addressed the root cause. The existing code read all records from a database table into memory and performed the aggregation there. The AI correctly identified the issue and proposed batched processing, but the senior engineer doing an architectural spot check on the PR noticed that the records were still being read into memory before the batching, which would not have solved the problem. The engineer restructured the code to use a streaming approach that never loads all records into memory at once. The AI’s suggestion was a helpful starting point, but it was the engineer’s experience that turned it into a real solution.
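The shape of that fix generalises. A minimal Python sketch of the streaming approach, assuming a DB-API style cursor with `fetchmany` (the row schema and function names are hypothetical, not the actual incident code):

```python
from typing import Iterable, Iterator


def fetch_rows_batched(cursor, batch_size: int = 1000) -> Iterator[tuple]:
    """Stream rows from a DB cursor in fixed-size batches. At most one batch
    is ever resident in memory, regardless of table size."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            return
        yield from batch


def aggregate_totals(rows: Iterable[tuple]) -> dict:
    """Running aggregation over (key, amount) rows: constant memory, unlike
    the original load-everything-then-aggregate approach."""
    totals: dict = {}
    for key, amount in rows:
        totals[key] = totals.get(key, 0) + amount
    return totals
```

The subtle trap the AI fell into is equivalent to calling `cursor.fetchall()` before handing rows to the batcher - the batching becomes cosmetic because the full result set is already in memory.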

The cognitive load hasn’t decreased, it has shifted. Less time writing, more time reviewing, integrating, and making judgement calls. These are skills that come with experience. Context switching between the AI’s suggestions and your mental model of the system, evaluating trade-offs, and making decisions that no amount of test automation can catch - that’s the new senior skillset. The best engineers will be the ones who can leverage AI to do the boilerplate and heavy lifting while they focus on the hard problems.

And AI helps least with the hardest problems. Task completion rates improve meaningfully for routine work, but for high complexity tasks the improvement is marginal. AI excels at the predictable, well scoped stuff, exactly the work that seniors have historically delegated to juniors. For genuinely novel architectural challenges, ambiguous requirements, or systemic debugging, the model is still just a well read assistant.

For seniors, AI takes the tedious bits off their plate and lets them focus on the hard decisions. It’s the best thing that’s happened to experienced engineers in decades.

But it’s pulling juniors backward

And here’s the hard pill to swallow, especially for the many companies that declare efficiency their primary goal.

If AI handles the tasks that junior engineers traditionally cut their teeth on - the boilerplate, the simple CRUD operations, the bug fixes in well understood code - where do juniors learn? The apprenticeship model that has produced senior engineers for decades is built on a progression: you start with simple tasks, you make mistakes, you learn from code review, and you gradually take on more complexity. AI removes that loop.

A junior engineer can now produce code that looks senior. It passes tests, it looks good on the surface. But the engineer behind it hasn’t developed the mental model of why those conventions exist, or where the security vulnerabilities and performance issues may hide. They haven’t felt the pain of a poorly designed interface or debugged a cascading failure caused by tight coupling. They’ve learned to prompt, not to solve problems.

Recently, in a code review, I asked the engineer why they had chosen DynamoDB. The data model was well structured and all fields and their types were well defined, but there was no justification for why DynamoDB was the right choice for this use case instead of an RDBMS. The engineer’s response was “that’s what the AI assistant suggested.” This is coding assistant complacency in its purest form. The engineer had no understanding of the trade-offs between the database options and relied on the AI’s suggestion without evaluating it. That will surface later as performance, cost, and scalability problems.

The most common developer frustration with AI is “solutions that are almost right but not quite 100%.” Experienced developers have the instincts to catch these problems quickly, juniors don’t, which means they’re spending more time stuck in debugging loops on code they don’t fully understand.

There’s also a trust problem: experienced developers have the highest distrust of AI output, while juniors trust it more.

The team composition elephant in the room

If AI amplifies seniors and potentially stunts juniors, the economic incentive for engineering leaders is obvious: hire more seniors, give them AI tools, and watch productivity increase while costs go down. The pressure to maximise output and reduce costs right now is enormous.

The numbers show the industry is already acting on this. Employment for software developers aged 22-25 has declined nearly 20% from its 2022 peak. In the UK, entry-level tech roles fell 46% in 2024. Marc Benioff announced Salesforce would hire “no new engineers” in 2025, and Anthropic CEO Dario Amodei warned that entry-level jobs are “in the crosshairs.” A Claude Max subscription costs, at the time of writing, USD 200 per month versus USD 90,000 a year for a junior developer plus six to twelve months of onboarding. The maths is seductive.

I understand this temptation, I feel it daily.

But if every company shifts their senior/junior ratio toward seniors, where do future seniors come from?

We run triangle-shaped teams, where one senior is responsible for mentoring two to three juniors. This setup has historically been successful because juniors have a clear path to learning and growth. With the rise of AI coding agents, the intent from leadership is to invert that triangle: more seniors and fewer juniors, with seniors using AI agents for the tasks juniors used to do while juniors focus on prompting the agents. The perceived ROI is temporarily higher, but this approach has long-term consequences for the development of junior engineers and the future talent pipeline.

This is the classic tragedy of the commons. Every individual company benefits from shifting toward experienced engineers and leveraging AI. But collectively, the industry destroys the talent pipeline. AI acceleration makes the temptation to skip investing in junior talent irresistible, and the consequences will arrive soon.

The solution is to be intentional about how juniors learn, not just whether they’re on the team. Pair them with seniors on AI-assisted work, but build in deliberate friction: have juniors explain why the AI’s approach is correct before the PR is approved, not just confirm that it works. Assign them ownership of production incidents where they have to trace a failure through the system, not just fix the symptom. Create AI-free zones for foundational tasks, let them write a database migration or debug a race condition by hand before they learn to delegate it. And use AI-generated code as a teaching tool: have seniors review AI output with juniors, walking through what the model got right, what it got subtly wrong, and why. The apprenticeship model isn’t dead, but it needs to be redesigned around judgement and trade-offs rather than rote implementation.

What to do on Monday morning

AI is here and it’s not going away. The question isn’t whether to adopt it, it’s whether you have the engineering foundations in place to absorb it without breaking things. If you take one thing from this post, let it be this: AI rewards the teams that were already doing the right things and punishes bad behaviours heavily.

Practically, that means:

  • Fix the foundations first. Invest in test automation, CI/CD pipelines, and architectural standards before scaling AI adoption. AI on top of weak practices accelerates failure.
  • Treat context engineering as a first-class discipline. Structure your codebase, documentation, and tooling so AI, whether copilot or agent, can reason about your system effectively. This is the new competitive moat.
  • Shift validation to production. Line-by-line code review doesn’t scale to AI-volume output. Have seniors do architectural spot checks on PRs, automate standards enforcement in CI, and invest in observability that validates code where it matters, in production. Canary releases, SLO budgets and alerts, and tight feedback loops between deployment and impact are your real quality gates.
  • Protect the junior pipeline. Resist the temptation to eliminate junior roles. Instead, redesign their learning path: pair them with seniors, have them investigate production incidents, and give them ownership of problems where they need to reason about trade offs, not just generate code.
  • Measure outcomes, not output. PRs merged and lines of code are vanity metrics. Track what matters: SLO adherence, error budget burn rate, change failure rate, time to recovery, and customer impact. The real success metrics happen after release and validation, not at merge time. If AI is increasing your output but your SLOs are slipping, you have a practices problem, not a productivity win.
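For the error budget burn rate specifically, the arithmetic is simple enough to wire straight into an alert. A minimal sketch, with the SLO target and request counts purely illustrative:

```python
def error_budget_burn(slo_target: float, total_requests: int,
                      failed_requests: int) -> float:
    """Burn rate for an availability SLO: the observed failure rate divided
    by the failure rate the SLO allows. 1.0 means the budget burns exactly
    on pace for the window; above 1.0 it runs out early and should page."""
    allowed_failure_rate = 1.0 - slo_target
    observed_failure_rate = (failed_requests / total_requests
                             if total_requests else 0.0)
    return observed_failure_rate / allowed_failure_rate


# Illustrative numbers: a 99.9% SLO allows 1 failure per 1,000 requests.
# 300 failures in 100,000 requests is a 0.3% failure rate - a burn rate of 3,
# i.e. spending the budget three times faster than the window allows.
```

Evaluated over a short and a long window together, this one number answers the question the bullet list poses: is the extra AI-driven output actually holding up in production?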

These are the engineering fundamentals. But engineering practices are only half the picture. AI adoption also introduces organisational risks - runaway costs, security exposures from agentic tool chains, IP liabilities, and compliance gaps - that most companies aren’t adequately preparing for. The gap between companies adopting AI and companies actually getting value from it is widening. That’s the subject of the next post.