Most companies are getting no value from AI

April 10, 2026
11 min read

There’s a common trend in the market: an engineering leader rolls out AI tools across the org, adoption numbers look great in the quarterly deck, and velocity metrics are up. A few months later, nobody can point to a single measurable business outcome that improved, or a large initiative that got delivered.

The adoption value gap

The gap between coders and systems thinkers widens because of practices, not any specific tooling.

Companies that were already well run get massive returns from AI and the ones with shaky foundations just have their dysfunctions amplified.

The DORA 2025 report found that teams with high AI adoption merged a lot more pull requests, which sounds great until you keep reading: review time increased 91%, and delivery stability metrics were unchanged.

Steve Yegge calls this the “AI Vampire” effect. Engineers are burning out from unsustainable productivity expectations while the actual system level gains remain modest. “You’re damned if you do and damned if you don’t.”

When I audited AI tool usage across our teams, the picture was uneven. Some licences sat unused while a select few individuals accounted for most of the meaningful adoption. The teams getting value had something in common: strong development practices, clear architectural direction, and engineers who understood the codebase deeply enough to evaluate AI output rather than rubber stamp it. The teams where AI was just generating noise were the same ones that had been struggling with late features and bugs before AI adoption began.

The Engineering 2028 survey of 89 senior technology leaders puts numbers on the governance side of this gap. Fewer than half are confident their organisation will have the right governance in place. Almost everyone is using AI, and half of them don’t believe they’ll figure out how to govern it properly.

Why pilots die on the way to production

Most companies have AI pilots but few have AI at scale.

Pilots are exploratory, greenfield projects executed by motivated individuals with the freedom to experiment and the benefit of executive attention. Shipping to Prod means integration with legacy systems, support from teams that weren’t involved in the pilot, and security processes that weren’t designed with AI in mind.

If you’ve ever tried to take a POC to Prod, you know this gap. AI has properties that make it worse.

AI outputs are probabilistic, and pilots are built around the happy path. A happy path that works in a pilot looks like success, but in Prod, at scale, the edge cases and failures become a support burden and potentially a compliance issue.

AI costs, like any cloud based consumption model, come as a surprise at scale. Token based inference pricing means AI assisted development practices that were cheap with 20 users can be unexpectedly expensive with 200.
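
As a rough sketch of why the surprise happens, here’s a back-of-the-envelope cost model. The request volumes and per-million-token price below are made-up placeholders; substitute your provider’s actual pricing and your own usage data.

```python
# Back-of-the-envelope inference cost model. All numbers are
# hypothetical placeholders, not real provider pricing.

def monthly_cost(users, requests_per_user_per_day, tokens_per_request,
                 price_per_million_tokens, working_days=22):
    """Estimate monthly token spend for AI assisted development."""
    tokens = (users * requests_per_user_per_day
              * tokens_per_request * working_days)
    return tokens / 1_000_000 * price_per_million_tokens

pilot = monthly_cost(20, 50, 4_000, 10.0)    # 20-user pilot
scaled = monthly_cost(200, 50, 4_000, 10.0)  # same habits at 200 users

print(f"pilot: ${pilot:,.0f}/month, scaled: ${scaled:,.0f}/month")
# → pilot: $880/month, scaled: $8,800/month
```

The model is linear, which is the point: nothing about token pricing gets cheaper as adoption spreads, so a 10x user count is at least a 10x bill unless usage patterns change.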

We had an AI assisted feature that looked fantastic in the pilot. The team had a clean dataset and executive sponsorship. Testing was done by dogfooding our own data. When it went to Prod, the model started hallucinating on edge cases in customer data that the pilot never included. The 95% accuracy rate in the pilot turned into a 15% failure rate on real inputs. The team spent more time handling edge cases than they saved by building the feature. We ended up pulling it back to a limited rollout and rebuilding the data pipeline underneath it before it could scale. The lesson was painful but simple: your pilot conditions are nothing like production.

The security surface you’re not watching

AI models generate plausible code, not secure code. They’ll confidently produce SQL injection vulnerabilities and hardcoded API keys.
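
A minimal illustration of the first failure mode, using an in-memory SQLite table. The string-interpolated query is the pattern assistants often emit because it looks clean; the parameterised version is the fix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice'), (2, 'bob')")

def find_user_unsafe(name):
    # The pattern AI assistants often produce: interpolating input
    # straight into SQL. A payload like "' OR '1'='1" dumps the table.
    return conn.execute(
        f"SELECT * FROM users WHERE name = '{name}'").fetchall()

def find_user_safe(name):
    # Parameterised query: the driver escapes the value, so the same
    # payload matches nothing instead of everything.
    return conn.execute(
        "SELECT * FROM users WHERE name = ?", (name,)).fetchall()

payload = "' OR '1'='1"
print(len(find_user_unsafe(payload)))  # 2 rows: injection succeeded
print(len(find_user_safe(payload)))    # 0 rows: treated as a literal
```

Both functions compile, both pass a happy-path review with the name `"alice"`, and only one of them survives hostile input. That’s exactly why volume of plausible code is a security problem.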

We caught an access control problem in a PR last quarter that would have been a compliance nightmare if it shipped. An AI generated API endpoint was returning user records based on a sequential ID parameter with no authorisation checks, i.e. any user would be able to see the details of other users. The code was well structured and had good logging, but it didn’t verify that the requesting user had permission to see the record they were asking for. The engineer who submitted the PR hadn’t noticed because the rest of the code looked so professional. A senior engineer spotted it during an architectural review.
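
The missing step can be sketched framework-free. The `USERS` store, `get_user_record`, and `AuthorizationError` below are hypothetical stand-ins for illustration, not our actual endpoint.

```python
# Minimal sketch of the missing authorisation check. All names here
# are illustrative, not the real endpoint from the PR.

class AuthorizationError(Exception):
    pass

USERS = {
    1: {"id": 1, "owner": "alice", "email": "alice@example.com"},
    2: {"id": 2, "owner": "bob", "email": "bob@example.com"},
}

def get_user_record(requesting_user, record_id):
    record = USERS.get(record_id)
    if record is None:
        return None
    # This is the step the AI generated endpoint skipped: verify the
    # requester is allowed to see the record, not just that it exists.
    if record["owner"] != requesting_user:
        raise AuthorizationError(
            f"{requesting_user} may not read record {record_id}")
    return record
```

The check is three lines. The reason it gets missed is that nothing about the surrounding code hints it’s absent; the endpoint works perfectly for every user requesting their own record.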

If your SAST tools and developer security training haven’t kept up with AI accelerated output volumes then you have a widening gap. The breach will happen in code that nobody remembers writing because a model generated it over a weekend.

There’s also shadow AI: employees using personal AI accounts with company data, or connecting AI assistants to internal systems without IT visibility.

We discovered that an engineer had been using a personal account to connect to our repos and generate code. It wasn’t malicious; it was just somebody trying to be more productive. It was, however, a data governance nightmare. The discovery only happened because another engineer noticed unusual commit metadata. If you don’t have observability into where AI is touching your data now, you’ll get that visibility the hard way.

IP and compliance exposure

Do you have a clear policy on which data can be sent to which AI systems? Most people don’t.

Customer data in a prompt to a third party model is a support ticket away from a data breach notification. Code generated by certain models may carry licence obligations you haven’t accounted for.

If you’re in a regulated industry (financial services, healthcare, anything touching EU customers), the compliance surface is even broader. AI assisted decisions can create audit obligations. Customer facing AI may trigger transparency requirements under the EU AI Act (which entered into force on 1 August 2024 and becomes fully applicable on 2 August 2026). Data residency and cross border transfer rules create exposure most teams haven’t mapped.

Get your legal and compliance team into your AI adoption conversation before the tools are deployed.

A leader once told me: “I love compliance challenges. Everybody in the industry must comply, and if we get ahead of it we have a competitive advantage over everybody else.” Organisations in regulated industries with strong data governance cultures, built through years of GDPR compliance and PII handling, are finding those frameworks extend naturally to cover AI specific risks.
Our own experience has been similar. The data classification work we’d already done for compliance gave us a running start when we needed to define what could and couldn’t flow through AI tools.
Your compliance burden may actually be an asset. Lean into it rather than treating governance and compliance purely as overhead.

What observability looks like now

If you’re shipping AI generated code at volume, you need richer instrumentation than you’re probably used to. Plain printf logging won’t cut it anymore.

It was pretty apparent from the beginning that we needed more granular tracing. If you can’t rely on familiarity with the code to narrow down a problem, then your traces need to tell the story the code doesn’t. If your system has constraints, express them explicitly in runtime checks (null or empty checks, type safety, unit precision). AI generated bugs often manifest as gradual degradation rather than hard failures, and an error rate creeping up by 2% is worth a page when you don’t know the codebase as well as you think you do.
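
A minimal sketch of what “express constraints explicitly” can look like. The `check` helper and the `apply_discount` example are illustrative; the assumption is that the emitted structured events feed into whatever tracing or alerting pipeline you already run.

```python
import json
import time

def check(condition, event, **context):
    """Turn an assumed constraint into an explicit runtime check.
    On violation, emit a structured (wide) event a trace or alert
    pipeline can pick up, instead of failing silently."""
    if not condition:
        print(json.dumps({"event": event, "ts": time.time(), **context}))
    return condition

def apply_discount(price_cents, discount_pct):
    # Constraints this code assumes but callers (human or AI generated)
    # may violate; stated as checks rather than left implicit.
    check(price_cents >= 0, "negative_price", price_cents=price_cents)
    check(0 <= discount_pct <= 100, "discount_out_of_range",
          discount_pct=discount_pct)
    clamped = max(0, min(discount_pct, 100))
    return price_cents * (100 - clamped) // 100

print(apply_discount(1000, 10))   # → 900
print(apply_discount(1000, 150))  # emits an event, clamps, → 0
```

The point isn’t the clamping logic; it’s that a violated assumption now produces a countable event you can alert on, which is how a 2% creep becomes visible.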

Charity Majors makes a sharp point about this in the context of AI powered observability tools: “if the question is a mathematical one, computing the answer will always be faster, cheaper, and more accurate than using AI to derive an answer.” Don’t let vendors sell you AI powered dashboards when what you need is better instrumentation. AI is useful for suggesting remediation steps and finding the needle in the haystack of observability data. But AI suggestions during an incident need the same critical evaluation you’d apply during development: an opinion worth considering, not a runbook to execute without understanding.

The bottleneck was never the code

Overused phrase but it’s true.

Coding accounts for roughly 25 to 35% of the software development lifecycle (do a value stream mapping exercise on your product delivery lifecycle to validate this for yourself). Even if AI made coding 100% faster (it doesn’t), Amdahl’s Law tells you that’s roughly a 14 to 21% system level improvement at best. Ideation, requirements, architecture, testing, deployment, operations, and incident response are where the actual time goes. AI hasn’t meaningfully accelerated most of that yet.
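
The arithmetic is worth running yourself. A minimal Amdahl’s Law sketch, under the assumption that “coding 100% faster” means a 2x speedup on the coding share of the lifecycle:

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the total work
    is accelerated by a factor s (Amdahl's Law)."""
    return 1 / ((1 - p) + p / s)

# Coding is ~25-35% of the lifecycle; "100% faster" means s = 2.
for p in (0.25, 0.30, 0.35):
    gain = (amdahl_speedup(p, 2) - 1) * 100
    print(f"coding share {p:.0%}: {gain:.1f}% system-level gain")
# → coding share 25%: 14.3% system-level gain
# → coding share 30%: 17.6% system-level gain
# → coding share 35%: 21.2% system-level gain
```

Even letting `s` go to infinity (coding takes zero time) caps the gain at `1/(1-p)`, which is why optimising the coding share alone has a hard ceiling.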

“Writing code has never been the hardest part of software development.” The companies seeing real value from AI aren’t the ones that made coding faster. They’re the ones that used AI to improve the whole lifecycle. Better requirements flowing into better acceptance criteria. AI assisted testing catching regressions earlier. Observability pipelines that use AI to surface patterns humans would miss. That’s where the compound value is.

The companies stuck in the adoption value gap are optimising the 30% and ignoring the 70%, truly missing the forest for the trees.

As I described in Part 3, in our current setup we pull observability data down daily and run failure events and backtraces through AI models that generate PRs with fixes and improvements, ready for an engineer to review next working day. That’s AI applied to operations, not just coding. It reduced our incident rate from a couple a week to one every few weeks. That’s a measurable business outcome, not a vanity metric.

What actually works

The companies closing the adoption value gap in 2026 and 2027 won’t do it by adopting better tools, they’ll do it by fixing the foundations those tools depend on.

Measure outcomes, not adoption. I’ve said this in every post and I’ll say it again. Count what changed for the business, not how many people have licences. If you can’t point to a specific workflow that measurably improved (fewer incidents, faster time to customer value, reduced support burden) your adoption programme isn’t working yet. PRs merged and lines of code generated are vanity metrics. Track SLO adherence, error budget burn, change failure rate, time to recovery, and customer impact.
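
For a concrete sense of two of those metrics, here’s a toy calculation over a hypothetical deployment log; the field names and numbers are invented for illustration.

```python
from datetime import timedelta

# Toy deployment log. In practice this comes from your CI/CD and
# incident tooling; the structure here is illustrative.
deploys = [
    {"ok": True},
    {"ok": False, "time_to_recover": timedelta(minutes=42)},
    {"ok": True},
    {"ok": False, "time_to_recover": timedelta(minutes=18)},
    {"ok": True},
]

failures = [d for d in deploys if not d["ok"]]
change_failure_rate = len(failures) / len(deploys)
mttr = sum((f["time_to_recover"] for f in failures),
           timedelta()) / len(failures)

print(f"change failure rate: {change_failure_rate:.0%}")  # → 40%
print(f"mean time to recovery: {mttr}")                    # → 0:30:00
```

The calculation is trivial on purpose: these metrics are cheap to compute once the data exists, and unlike PR counts they move only when something changed for users.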

Fix the data before you scale the models. I know this isn’t exciting. Nobody wants to hear “clean up your data” when they could be deploying agents. But every team I’ve seen succeed started here. Every team I’ve seen fail skipped it. AI amplifies what’s in your data as much as what’s in your team.

Get governance in place now. Not when you scale and especially not when the regulator asks. Do it now. Define which data can go into which provider, which models can be used, who owns AI assisted decisions, and what your review process looks like for AI generated code. The Engineering 2028 survey shows 74% of leaders favour organisation wide governance frameworks, with team leads handling the judgement calls within those guardrails. That hybrid model matches what I’ve seen work: clear organisational boundaries with enough team level autonomy to move fast.
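
One way to make “which data can go into which provider” executable rather than aspirational is a policy table checked in code. The classifications and provider names below are invented for illustration; the shape matters more than the entries.

```python
# Sketch of a data-classification-to-provider policy. The labels and
# provider names are hypothetical; map them to your own taxonomy.
POLICY = {
    "public": {"provider_a", "provider_b"},
    "internal": {"provider_a"},
    "customer_pii": set(),  # never leaves the building
}

def may_send(classification, provider):
    """Default deny: unknown classifications go nowhere."""
    return provider in POLICY.get(classification, set())

print(may_send("public", "provider_b"))        # → True
print(may_send("customer_pii", "provider_a"))  # → False
print(may_send("unknown_label", "provider_a")) # → False
```

A table like this is also where the data classification work mentioned earlier pays off: if the labels already exist, the AI policy is one dictionary, not a six-month project.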

Update your security posture. Your SAST tools and your security training need to account for the volume and nature of AI generated code. If your security practices are the same as they were two years ago, they’re not enough.

Invest in observability like your career depends on it. Because it might. When AI generated code breaks in Prod and nobody on the team wrote it, your traces and your alerts are the only thing standing between you and a very long night.
As I covered in Part 1, the real validation happens after deployment: canary releases, SLO alignment, and instrumentation that closes the feedback loop between what shipped and what’s actually happening. Don’t assume the three pillars are enough; if all you have is logs, invest in observability with wide events (use a skill in your AI flow to get there faster).

Lead from the front. Mandating tool access and championing adoption are completely different things. When leaders actively model AI use, the whole organisation’s attitude shifts. When they just approve budgets and read adoption dashboards, teams treat AI as another checkbox. AI is not just for code, it applies to your entire business.

The companies that built strong engineering foundations before the AI wave are compounding their advantages.
The companies that didn’t are compounding a different kind of debt.
The bubble is real and so is the underlying value. “The existence of froth doesn’t disprove the existence of value.”
The question is whether your organisation is set up to capture that value or just contribute to the froth.

The gap between those two groups is widening. Every quarter of inaction makes it harder to close.


I’m still working through a lot of this. If you’re navigating the gap between AI adoption and actual value, or if you’ve found something that moved the needle for your teams, I’d like to hear what’s working and what isn’t.