On March 5, 2026, Amazon.com went down for six hours. The root cause was a faulty deployment linked to AI assisted code changes. Within a week, Amazon’s SVP of engineering called a company wide meeting and mandated that junior and mid level engineers must now get senior sign off before deploying any AI generated code to production. The problem wasn’t that AI wrote bad code. It was that the deployment pipeline wasn’t built for the speed and volume at which AI produces changes. Review processes couldn’t keep up, so teams skipped them.
This is the defining team structure challenge of 2026.
In Part 1 I argued that AI amplifies what already exists in your engineering team. In Part 2 I covered the people side: hiring, mentoring, and onboarding. This post covers the structural side: how to model teams, rethink the value of reviews, and manage the hybrid human and agent workforce.
Modelling engineering teams for AI first development
Here’s how I’m thinking about it:
Design for dual track
The CTO Craft/Damilah Engineering 2028 survey of 89 senior technology leaders found several organisations moving toward a bifurcated model. Core teams manage high stakes, regulated systems where stability and compliance come first. Edge teams are small, cross functional units (a PM, a designer, two AI augmented engineers) building micro apps at speed. Core teams optimise for safety and Edge teams optimise for speed.
Distribute AI capability, don’t silo it
The worst thing you can do is create an “AI team” separate from your product teams. AI expertise (prompting, knowing model capabilities, evaluating output quality) needs to be distributed across every team, not concentrated in one.
This applies to tooling too. As AI tools multiply, the Engineering 2028 survey shows organisations leaning toward a standard set of approved tools rather than letting every team pick their own. A governed core with space for controlled exploration at the edges.
Invest in platform engineering
The teams getting the most from AI are the ones who fixed their platform first. If your internal developer platform is poor, AI tools will simply generate code faster, and that code will still get stuck in your broken deployment pipeline. Platform quality directly determines how much value you capture from AI.
Invest in context engineering and staff for it
In Part 1 I covered context engineering in detail: the practices, the tooling, and why it separates teams that scale with AI from those that plateau. Someone on your team needs to own this as a first class responsibility. In most teams, that falls to platform engineering. They already own the developer experience, and context engineering is a natural extension.
Plan for the hybrid workforce
Hybrid is no longer remote vs in-office; it’s humans with agents. The key question is where the boundary sits. Which tasks can agents own end to end? Where do they need human checkpoints? This needs to be defined explicitly rather than discovered through incidents. Start by mapping where agents are already operating autonomously (you may be surprised by the access you have given them), then work backwards to establish the oversight model you actually need.
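One way to make the boundary explicit is a small, reviewable map from task type to oversight level. The task names and defaults below are hypothetical, a starting point rather than a recommendation:

```python
from enum import Enum

class Oversight(Enum):
    AUTONOMOUS = "agent owns the task end to end"
    CHECKPOINT = "agent drafts, a human approves"
    HUMAN_ONLY = "human owns the task, agent assists"

# Hypothetical starting map: the value is that it is explicit and reviewable,
# rather than discovered through incidents.
BOUNDARIES = {
    "dependency_bump": Oversight.AUTONOMOUS,
    "test_generation": Oversight.AUTONOMOUS,
    "bug_fix_pr": Oversight.CHECKPOINT,
    "schema_migration": Oversight.HUMAN_ONLY,
    "production_deploy": Oversight.HUMAN_ONLY,
}

def needs_human(task: str) -> bool:
    """Unknown task types default to human oversight: the safe failure mode."""
    return BOUNDARIES.get(task, Oversight.HUMAN_ONLY) is not Oversight.AUTONOMOUS
```

The useful property is the default: anything not yet mapped falls back to human oversight.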
In my current company we’re restructuring from our traditional senior:junior ratios (1 senior to 3 juniors) towards edge teams. The early reality is that the PM is quite adept at generating code and the engineers can rebuild capabilities in a modern tech stack from an existing legacy codebase very quickly.
Encode your standards where AI agents can enforce them
This is the idea most engineering leaders are sleeping on.
For decades, engineering standards have lived in places that stopped being useful long ago. Confluence pages last updated in 2022, onboarding docs describing an architecture two migrations behind, senior engineers who remember why things are done the way they are (if they still work there). Nobody reads the wiki, the onboarding deck is wrong, tribal knowledge decays every time someone leaves.
AI agents change this. Your coding standards, architectural principles, testing requirements and security policies can now be expressed as agent instructions: AGENTS.md files, SKILL.md files that package reusable capabilities, and project level configs. All enforced at generation time, not caught after the fact in review.
Every major agent tool now has a configuration surface for this: CLAUDE.md, .cursorrules, copilot instructions, and the emerging SKILL.md standard. The specific file matters less than the practice of codifying your standards where agents will actually read them.
Here’s what this looks like in practice. A snippet from one of our CLAUDE.md files:
```markdown
## Data access

- Always use the repository pattern. No direct database queries in controllers.

## Observability

- All service methods must include OTel spans with operation name, tenant ID, and correlation ID.
- Log at INFO for business events, WARN for recoverable failures, ERROR only for unrecoverable failures.

## Testing

- Every new API endpoint needs a contract test before merge.
```

An agent following these rules will apply them consistently across every PR. No cultural buy in required, no changes caught late by Sonar in the CI pipeline.
Your testing strategy, API design conventions, and observability requirements can all become agent enforced defaults.
Tribal knowledge that used to live in senior engineers’ heads now lives in AGENTS.md files, CI configs, and shared prompts.
The “extract knowledge, then cut headcount” playbook is failing where it’s been tried most aggressively.
- Klarna, which replaced 700 employees with AI, is now rehiring humans. The CEO publicly admitted the approach produced “lower quality” results.
- Duolingo declared itself “AI first” in 2025, then the CEO retracted claims that AI would replace the workforce.
- A Harvard Business Review study found that 60% of executives made headcount reductions in anticipation of AI efficiencies, but only 2% made large cuts as a result of actual implementation.
In our current setup we are codifying our coding standards and non functional requirements (OTel instrumentation with lots of execution context and metadata) in Claude friendly format. All the Jira/PR ceremony is handled by PreToolUse hooks and skills.
We pull observability data down daily and run the failure events and backtraces through AI models, which in turn generate PRs with fixes and improvements ready for an engineer to review the next working day.
This has improved stability and reduced incidents from a couple a week to one every few weeks.
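The triage step of a loop like that can be sketched as follows. The event field names (`error_type`, `backtrace`) and the fingerprinting heuristic are assumptions for illustration, not our actual pipeline:

```python
from collections import Counter

def fingerprint(event: dict) -> str:
    """Group failures by exception type plus the innermost application frame,
    not by message text, so variable payloads don't split one bug into many."""
    frames = event["backtrace"]
    top_frame = next((f for f in frames if "/app/" in f), frames[0])
    return f'{event["error_type"]}@{top_frame}'

def triage(events: list[dict], min_count: int = 2) -> list[dict]:
    """Return the recurring failure groups worth sending to a model for a draft fix."""
    counts = Counter(fingerprint(e) for e in events)
    seen: set[str] = set()
    groups = []
    for event in events:
        fp = fingerprint(event)
        if counts[fp] >= min_count and fp not in seen:
            seen.add(fp)
            groups.append({"fingerprint": fp, "count": counts[fp], "example": event})
    return groups
```

Grouping before prompting matters: it keeps one flaky endpoint from producing fifty near duplicate PRs for the engineer to wade through the next morning.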
Rethink how review works
If seniors are reviewing AI output, mentoring juniors, doing architectural spot checks, and still shipping their own work, that’s a capacity crisis, not a team model. Line by line code review doesn’t scale to AI output volumes.
It takes 60 seconds to prompt an agent to generate a PR, and days for a senior to review it properly. That asymmetry is already overwhelming open source maintainers: curl scrapped its bug bounty because fewer than 5% of AI submitted security reports were legitimate.
What works instead
Seniors do architectural spot checks and assess product fit rather than reading every line. Hooks, skills, and CI pipelines enforce standards automatically, and observability validates code in production.
The goal is a review process where senior time goes to judgement calls, not syntax.
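As a concrete example of that automatic enforcement, a naive check for a "no direct database queries in controllers" rule could run as a CI step or a pre commit hook. The path convention and regex here are assumptions; a real check would inspect the AST or use a linter rule:

```python
import re

# Naive generation time check: flag raw SQL keywords in controller files, since
# data access belongs behind the repository layer. Illustrative only; a regex
# will miss query builders and produce false positives on comments.
RAW_SQL = re.compile(r"\b(SELECT|INSERT|UPDATE|DELETE)\b", re.IGNORECASE)

def violations(sources: dict[str, str]) -> list[str]:
    """Return controller file paths that appear to contain raw SQL."""
    return [
        path
        for path, source in sources.items()
        if "controllers/" in path and RAW_SQL.search(source)
    ]
```

A check like this costs seconds per PR and never gets tired, which is exactly the kind of work to take off a senior’s review queue.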
The vibe coding wildcard
The structural challenges above assume professional engineering teams. There is one more that doesn’t fit that frame: non engineers shipping code.
“Vibe coding”, describing what you want in natural language and letting AI generate working software, has dramatically lowered the barrier to building, much as the introduction of WebForms did 20 odd years ago. Founders are prototyping entire products without an engineering team, product managers are building features, and designers are shipping components directly to production.
Some of this is genuinely exciting when used for prototyping and rapid validation; others argue it is like watching a train speed toward a tunnel painted on a mountainside.
While the barrier to entry has been lowered, the ceiling of complexity is still very high.
The problem isn’t that non engineers are building things, it’s that they’re building in production environments, on production databases, with production API keys, and without the mental models experienced engineers bring to those decisions.
They don’t think about SQL injection, and they don’t worry about rate limits until they’ve hit them. All of this can be mitigated with the standards and governance described a few paragraphs above.
Vibe coding in your organisation is shadow AI in its most tangible form. It’s happening in Slack, in Notion, in someone’s personal Cursor window connected to a production API key.
As an engineering leader, you have a few choices. You can try to control it (you won’t), you can ignore it and find out later what it cost, or you can get ahead of it.
What that looks like in practice:
Clear scope boundaries
Define what can be built outside of engineering review and what can’t. Internal tools that read data? Probably fine with guardrails. Anything that writes to a production database, processes customer PII, or exposes an external API needs engineering involvement. Make the policy explicit and easy to find.
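One way to make that policy explicit and easy to find is to encode it where both humans and CI can evaluate it. A sketch with hypothetical flag names:

```python
def requires_engineering_review(project: dict) -> bool:
    """Encode the scope boundary: production writes, customer PII, or an external
    surface means engineering gets involved; read-only internal tools do not.
    The flag names are illustrative, not a real schema."""
    risk_flags = (
        "writes_production_data",
        "processes_customer_pii",
        "exposes_external_api",
    )
    return any(project.get(flag, False) for flag in risk_flags)
```

The logic is trivial by design; what matters is that the boundary lives in one checkable place instead of in a policy document nobody reads.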
Sandboxed environments
Give vibe coders somewhere safe to build. Isolated environments with synthetic data, limited API access, and no path to production. The cost of a sandbox is trivial compared to the cost of a prototype that accidentally went live.
A fast track hardening process
When a prototype shows real value, you need a defined path from “works on my laptop” to production ready. Security scan, architectural review, data access audit, and a handoff to an engineer who can harden it. If this process is too slow, people will skip it. Design it to be fast enough that the right path is also the easy one.
Clear ownership
Someone needs to own governance of vibe coded projects. Whether that’s engineering, security, or a dedicated function depends on your organisation. What matters is that it’s someone’s job to know what’s being built, where it’s running, and what data it touches.
Even Andrej Karpathy, who coined the term “vibe coding,” has moved past it. In early 2026 he reframed the practice as “agentic engineering,” emphasising that AI assisted coding still requires professional oversight. The creator of the concept arrived at the same conclusion: you can’t stop it, but you can’t leave it ungoverned.
Where this goes
None of this works if your deployment pipeline is broken or your security posture is an afterthought. AI tools will happily generate code faster and multiply all of those problems. Fix the foundations first.
The engineering leadership role is shifting. It’s becoming as much about agent permissions, context engineering, review models and encoded standards as it is about technical delivery. Many organisations are already collapsing CTO and CPO into a single CPTO role because the line between product and technology decisions is disappearing.
The research on whether AI actually makes engineers faster is also more contested than it first appeared. METR’s 2025 randomised controlled trial found AI increased completion time by 19% among experienced open source developers. A follow up in early 2026 identified serious selection bias: developers who most valued AI opted out of the no AI condition, and 30 to 50% avoided submitting tasks they preferred to do with AI.
METR now acknowledges the true speedup could be much higher among the developers and tasks that were selected out. The productivity picture is genuinely unclear. If your engineering metrics are still lines of code, PR count, or story points, AI will inflate those numbers while actual customer value stays flat or goes down. Measure end to end outcomes, from idea to customer impact.
AI in engineering is moving very fast with lots of pivots (Ralph Wiggum? MCP who?). I’m still figuring this out. If you’re experimenting with different team models or review structures in an AI first world, I’d like to hear from you what’s working and what isn’t.