The Model Is Only 10%: The Real Lesson of the New SDLC


TL;DR

A recent Google whitepaper emphasizes that in AI-assisted coding, the model itself is only 10% of the system. The focus should be on harness design and context engineering, which have a much larger impact on performance and costs.

A new Google whitepaper published in May 2026 states that the model in an AI coding agent accounts for only about 10% of overall system behavior. Its primary lesson is that harness design and context engineering, not the size or sophistication of the model, are the real determinants of performance, cost, and reliability in AI-assisted development.

The whitepaper, authored by Addy Osmani, Shubham Saboo, and Sokratis Kartakis, argues that the prevailing focus on acquiring the latest AI models is misplaced. It attributes roughly 90% of an AI agent's behavior to the harness: the prompts, rules, tools, context, and observability layers surrounding the model. Its supporting example is a team that moved its agent from outside the top 30 to the top 5 on Terminal Bench 2.0 by changing only the harness. The model was held constant throughout.
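The minimal Python sketch below makes "Agent = Model + Harness" concrete. It assumes a model is any function that maps prompt text to reply text. The `Harness` class, the `TOOL:<name>:<arg>` reply convention, and `stub_model` are hypothetical illustrations. They are not the whitepaper's API or any vendor's. The model stays fixed, and only the harness (rules, tools, event log) changes. The outcome changes with it.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical model interface: prompt in, text out. A real harness would call a
# provider SDK here; the stub used below is an assumption for illustration.
Model = Callable[[str], str]


@dataclass
class Harness:
    """Everything around the model: prompt, rules, tools, and an event log."""
    system_prompt: str
    rules: list[str]
    tools: dict[str, Callable[[str], str]]
    log: list[str] = field(default_factory=list)  # minimal observability layer

    def build_prompt(self, task: str) -> str:
        # Rules and the tool list are rendered into every prompt the model sees.
        rule_text = "\n".join(f"- {r}" for r in self.rules)
        tool_text = ", ".join(sorted(self.tools))
        return (f"{self.system_prompt}\nRules:\n{rule_text}\n"
                f"Tools: {tool_text}\nTask: {task}")

    def run(self, model: Model, task: str) -> str:
        prompt = self.build_prompt(task)
        self.log.append(f"prompt_chars={len(prompt)}")
        reply = model(prompt)
        # Hypothetical convention: "TOOL:<name>:<arg>" asks the harness to call a tool.
        if reply.startswith("TOOL:"):
            parts = reply.split(":", 2)
            if len(parts) != 3:
                self.log.append("malformed_tool_call")
                return "error: malformed tool call"
            _, name, arg = parts
            if name not in self.tools:
                self.log.append(f"missing_tool={name}")
                return f"error: tool {name!r} not configured"
            self.log.append(f"tool_call={name}")
            return self.tools[name](arg)
        return reply


# Same stub model, two harnesses: only the configuration differs.
def stub_model(prompt: str) -> str:
    return "TOOL:read_file:main.py"


full = Harness("You are a coding agent.", ["Run tests before finishing."],
               {"read_file": lambda path: f"contents of {path}"})
bare = Harness("You are a coding agent.", [], {})
print(full.run(stub_model, "Fix the failing test"))  # → contents of main.py
print(bare.run(stub_model, "Fix the failing test"))  # → error: tool 'read_file' not configured
```

The bare harness fails because the tool the model asked for was never exposed. The paper classes this kind of failure as a configuration failure, not a model failure.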

The authors introduce the concept of agentic engineering, in which AI works inside a structured framework of specifications, automated tests, evals, and guardrails. They contrast it with vibe coding, which relies on casual prompts, quick fixes, and a check of whether the output "seems to work." They also argue that AI development costs depend more on how the harness is built and maintained than on the model itself. Ad-hoc prompting looks cheap at first but grows more expensive over time. Fix-it loops burn tokens, and unreviewed code accumulates maintenance and security debt.
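The verification loop that separates agentic engineering from vibe coding can be sketched as follows. This is a minimal illustration under stated assumptions:

- `generate` and `run_tests` are hypothetical callables standing in for a model call and a real test runner.
- `max_attempts=3` is an arbitrary budget.

```python
from typing import Callable, Optional

# Hypothetical interfaces, not any specific product's API:
#   generate(task, feedback) -> candidate code as a string
#   run_tests(candidate)     -> list of failure messages (empty list = all pass)
Generate = Callable[[str, str], str]
RunTests = Callable[[str], list[str]]


def verified_generate(task: str, generate: Generate, run_tests: RunTests,
                      max_attempts: int = 3) -> Optional[str]:
    """Accept a candidate only when its tests pass; otherwise retry with feedback."""
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        candidate = generate(task, feedback)
        failures = run_tests(candidate)
        if not failures:
            return candidate  # verified: the test suite is the acceptance gate
        # Concrete failures become context for the next attempt, instead of a
        # blind "try again" prompt.
        feedback = f"Attempt {attempt} failed: " + "; ".join(failures)
    return None  # budget exhausted: escalate to a human, do not ship unverified code
```

Capping attempts also controls cost. Each retry spends tokens, so an unbounded fix-it loop is exactly the hidden running cost the paper attributes to vibe coding.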

At a glance
When: published May 2026
The development: the new Google whitepaper argues that in AI-driven software development the model accounts for only about 10% of system behavior, shifting the focus to harness and context engineering.
The Model Is Only 10% — The New SDLC With Vibe Coding
AI Dispatch · Field Notes
Google · Osmani, Saboo & Kartakis · May 2026

The model is only 10%

A Google whitepaper argues software’s biggest shift is from writing code to expressing intent. Its sharpest claim: the model you obsess over is the smallest part of the system — the scaffolding around it does the real work.

A spectrum, not a binary: the differentiator is how outputs get verified.
Vibe Coding: casual prompts; “does it seem to work?”; disposable code; high risk.
Structured AI-Assisted: detailed prompts plus constraints; manual testing; features in real codebases.
Agentic Engineering: formal specs; automated tests, evals, and CI gates; production scale; low risk.
Tests verify the deterministic; evals verify the rest. Without both, it’s vibe coding, however clever the prompt.
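The rule that tests verify the deterministic and evals verify the rest can be expressed as a single acceptance gate. This is a hedged sketch:

- Check functions return pass or fail.
- Eval functions return a score between 0 and 1. In practice that might be a rubric or a model-graded check.
- The 0.8 threshold is a hypothetical value, not one taken from the paper.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable

CheckFn = Callable[[str], bool]   # deterministic test: pass or fail
EvalFn = Callable[[str], float]   # scored check in [0, 1], e.g. a rubric grader


@dataclass
class GateResult:
    tests_passed: bool
    eval_score: float
    accepted: bool


def ci_gate(output: str, tests: list[CheckFn], evals: list[EvalFn],
            threshold: float = 0.8) -> GateResult:
    """Accept an agent's output only if every test passes AND evals clear a bar."""
    # An empty test list counts as "not passed": without tests there is no gate.
    tests_passed = bool(tests) and all(t(output) for t in tests)
    # An empty eval list scores 0.0, so "tests only" never passes either.
    score = mean(e(output) for e in evals) if evals else 0.0
    return GateResult(tests_passed, score, tests_passed and score >= threshold)
```

Rejecting an output when either list is empty mirrors the claim above: without both tests and evals, the gate is closer to vibe coding than to agentic engineering.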
The idea worth building your strategy around: Agent = Model + Harness.
Model: ~10%
Harness: ~90% (prompts · tools · context · hooks · sandboxes · observability). That ~90% is your surface area, not the provider’s.
Outside the top 30 → top 5 on Terminal Bench 2.0 by changing only the harness, with the same model.
“Most agent failures, examined honestly, are configuration failures”: a missing tool, a vague rule, a noisy context.
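Some of the three configuration failures named above can be caught before an agent ever runs. The linter below is a hypothetical sketch. The list of vague words, the duplicate-chunk check, and the 20,000-character budget are illustrative heuristics chosen for this example. They are not rules from the whitepaper.

```python
# Hypothetical list of words that give a model nothing concrete to check against.
VAGUE_WORDS = {"properly", "good", "clean", "appropriate", "nice", "best"}


def lint_harness(required_tools: set[str], configured_tools: set[str],
                 rules: list[str], context_chunks: list[str],
                 max_context_chars: int = 20_000) -> list[str]:
    """Flag missing tools, vague rules, and noisy context (heuristics are hypothetical)."""
    findings: list[str] = []
    # Missing tool: the workflow needs it, but the harness never exposes it.
    for tool in sorted(required_tools - configured_tools):
        findings.append(f"missing tool: {tool}")
    # Vague rule: wording the model cannot check its own output against.
    for rule in rules:
        words = {w.strip(".,;:!?").lower() for w in rule.split()}
        if words & VAGUE_WORDS:
            findings.append(f"vague rule: {rule!r}")
    # Noisy context: duplicated chunks, or more text than the budget allows.
    if len(context_chunks) != len(set(context_chunks)):
        findings.append("noisy context: duplicate chunks")
    total = sum(len(c) for c in context_chunks)
    if total > max_context_chars:
        findings.append(f"noisy context: {total} chars exceeds {max_context_chars}")
    return findings
```

For example, `lint_harness({"read_file", "run_tests"}, {"read_file"}, ["Write clean code."], [])` reports both the unexposed `run_tests` tool and the unverifiable word "clean".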
The economics: it’s a token-cost problem (CapEx vs. OpEx).
Vibe Coding (low CapEx, high OpEx): looks free but hides debt in token burn (fix-it loops), a maintenance tax (AI spaghetti), and security remediation. It crosses over to 3–10× more per feature.
Agentic Engineering (high CapEx, low OpEx): pay upfront (specs, evals, context), then ship cheaply. Levers: context engineering for first-pass success, plus intelligent model routing that sends the easy work to cheap models.
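Intelligent model routing can start as simply as a rule that escalates only when a task looks hard. In this sketch the tier names (`small-model`, `large-model`), prices, and difficulty keywords are all hypothetical. A real router would be tuned against your own eval results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str                  # hypothetical model identifier
    usd_per_1k_tokens: float   # hypothetical price, for illustration only


CHEAP = ModelTier("small-model", 0.0005)
STRONG = ModelTier("large-model", 0.01)

# Hypothetical difficulty signals; a production router might use a classifier.
HARD_SIGNALS = ("refactor", "architecture", "concurrency", "security", "migration")


def route(task: str, files_touched: int) -> ModelTier:
    """Send easy work to the cheap tier; escalate when the task looks hard."""
    text = task.lower()
    if files_touched > 3 or any(signal in text for signal in HARD_SIGNALS):
        return STRONG
    return CHEAP


for task, files in [("Rename a variable in utils.py", 1),
                    ("Refactor the auth module", 2),
                    ("Update copyright headers", 12)]:
    print(f"{task!r} -> {route(task, files).name}")
```

The keyword rule is deliberately conservative. A wide but mechanical change still escalates because it touches many files, which trades some cost for a lower risk of a cheap model mishandling it.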
85% of devs use AI coding agents (51% daily)
41% of all new code is AI-generated
~90% of agent behavior is the harness, not the model
+19% longer on some tasks (METR): verification is the cost
The read

The clearest map yet of how serious AI development works — and mostly tool-agnostic. But it’s a Google funnel: the concepts are neutral, the on-ramps point to Gemini, Jules & the ADK. If the harness is 90% and it’s yours, your moat and your costs both live there — so own your scaffolding, route across models, and remember: AI amplifies whatever engineering culture it lands in.

Source: Osmani, Saboo & Kartakis, “The New SDLC With Vibe Coding,” Google (May 2026). Figures are the paper’s own, incl. METR & LangChain. Analysis is the author’s.

Why Harness and Context Engineering Are Game Changers

This shift in focus matters because it redefines where organizations should invest resources for AI development. Instead of chasing newer, larger models, companies should prioritize building robust harnesses and context management systems. This approach can reduce costs, improve reliability, and give organizations a durable competitive advantage, as their custom configurations and frameworks are less likely to be overtaken by model upgrades.
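Context management is, at its core, a budgeting problem: choose what the model sees within a fixed window. Below is a minimal sketch. It rests on two assumptions:

- Token counts use a crude four-characters-per-token estimate.
- Relevance scores come from some upstream retrieval step.

Standing instructions are pinned ahead of everything else.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    label: str
    text: str
    relevance: float  # 0..1; how it is scored (retrieval, recency) is an assumption


def approx_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token.
    return max(1, len(text) // 4)


def assemble_context(items: list[ContextItem], budget_tokens: int,
                     pinned: frozenset[str] = frozenset()) -> list[ContextItem]:
    """Pinned items first, then the rest by relevance, skipping whatever won't fit."""
    # Sort key: pinned labels sort first (False < True), then higher relevance first.
    ordered = sorted(items, key=lambda i: (i.label not in pinned, -i.relevance))
    chosen: list[ContextItem] = []
    used = 0
    for item in ordered:
        cost = approx_tokens(item.text)
        if used + cost > budget_tokens:
            continue  # too big for what's left; a smaller, less relevant item may still fit
        chosen.append(item)
        used += cost
    return chosen
```

The loop skips an oversized item instead of stopping at it. That lets small, moderately relevant files fill the leftover space rather than wasting it.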

Furthermore, understanding that verification and judgment are the new craft in AI development underscores the importance of human oversight and structured testing, which are critical for deploying AI safely and effectively at scale.
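Human oversight is often wired in as a hook that runs before a tool executes. The sketch below assumes a hypothetical policy in which deleting files, pushing code, or applying infrastructure changes needs sign-off. `execute` and `approve` are placeholder callables, not a specific agent framework's hook API.

```python
import shlex
from typing import Callable

# Hypothetical policy: command prefixes that can destroy data or ship code.
RISKY_PREFIXES = (("rm",), ("git", "push"), ("terraform", "apply"), ("kubectl", "delete"))


def needs_approval(command: str) -> bool:
    argv = shlex.split(command)  # tokenize the way a POSIX shell would
    return any(tuple(argv[:len(prefix)]) == prefix for prefix in RISKY_PREFIXES)


def guarded_execute(command: str, execute: Callable[[str], str],
                    approve: Callable[[str], bool]) -> str:
    """Pre-execution hook: risky commands run only after a human approves."""
    # `and` short-circuits, so safe commands never trigger an approval prompt.
    if needs_approval(command) and not approve(command):
        return f"blocked: {command}"
    return execute(command)
```

Prefix matching is crude. It would miss `sudo rm` or a command chained after `&&`, so a hook like this complements sandboxing rather than replacing it.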

The Evolution of AI Development Practices

Since early 2026, the AI development landscape has been shifting away from model size and raw performance and toward system design and configuration. The whitepaper builds on evidence that AI adoption is already widespread: 85% of developers use AI coding agents (51% daily), and 41% of all new code is AI-generated. The emphasis used to be on acquiring the latest models. It is now on how those models are integrated and controlled.

This development aligns with broader trends in software engineering, where the emphasis on verification, testing, and structured workflows has increased, especially in safety-critical applications.

“The model is only 10% of what determines behavior; the harness is 90%. Focus on configuration, tools, and context.”

— Addy Osmani

What Aspects of Harness Design Remain Unclear

It is not yet clear how different industries will adopt these principles at scale or how quickly organizations will shift their investment from models to harnesses. The long-term impact on AI development costs and safety protocols is still being evaluated, and practical implementation guidance is evolving.

Next Steps for AI Development and Adoption

Organizations are expected to begin prioritizing harness development and context engineering in their AI workflows. Future research and industry practices will likely focus on creating standardized frameworks, tools, and best practices for building durable, cost-effective AI systems. Additionally, further empirical studies are anticipated to quantify the benefits of this approach across different sectors.

Monitoring how AI vendors and enterprise teams adapt to this paradigm shift will be crucial in understanding the full impact of the new SDLC framework.

Key Questions

Why is the model only 10% of the system behavior?

The whitepaper argues that the model itself provides only the core generation capability, while the surrounding harness—including prompts, rules, tools, and oversight—controls the actual behavior and performance.

How does focusing on harness design reduce costs?

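A robust harness costs more to build upfront, but it raises first-pass success. That cuts the token burn of fix-it loops and the maintenance tax of unreviewed AI-generated code. It also makes it practical to route easy work to cheaper models, so AI development becomes more cost-effective over time.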
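If you assume a flat per-feature cost, the CapEx-versus-OpEx trade-off reduces to simple break-even arithmetic. Every figure below is hypothetical. The 4× per-feature ratio was chosen only because it falls inside the 3–10× range the paper cites. A flat per-feature cost is itself a simplification, because it ignores how maintenance and security debt accumulate over time.

```python
import math


def total_cost(capex: float, opex_per_feature: float, features: int) -> float:
    """Upfront investment plus a flat per-feature running cost (a simplification)."""
    return capex + opex_per_feature * features


def break_even_features(capex_agentic: float, opex_agentic: float,
                        capex_vibe: float, opex_vibe: float) -> int:
    """Smallest feature count at which the agentic approach costs no more in total."""
    if opex_vibe <= opex_agentic:
        raise ValueError("no crossover: vibe coding is not costlier per feature")
    # Solve capex_a + opex_a * n = capex_v + opex_v * n for n, then round up.
    n = (capex_agentic - capex_vibe) / (opex_vibe - opex_agentic)
    return max(0, math.ceil(n))


# Hypothetical figures: vibe coding costs 4x more per feature (inside the 3-10x range).
n = break_even_features(capex_agentic=5000, opex_agentic=50, capex_vibe=500, opex_vibe=200)
print(n)  # → 30
print(total_cost(5000, 50, n), total_cost(500, 200, n))  # → 6500 6500
```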

What is agentic engineering?

It is an approach that embeds AI within structured workflows, verification, and guardrails, emphasizing systematic configuration over minimal prompting.

Does this mean larger models are obsolete?

Not necessarily. The whitepaper suggests that while larger models offer capabilities, their advantage diminishes unless paired with well-designed harnesses and context management.

What should organizations do now?

They should evaluate and improve their harnesses—including prompts, tools, and verification processes—and shift focus from model size to system configuration and oversight.
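One way to evaluate a harness echoes the paper's Terminal Bench result, where only the harness changed and the model did not. Hold the model fixed and compare pass rates across harness variants on one task suite. This is a toy sketch: the stub model, the two harnesses, and the checker are hypothetical. A real suite would contain many tasks with test-based checkers.

```python
from typing import Callable

Model = Callable[[str], str]               # held fixed across the comparison
Harness = Callable[[str], str]             # turns a task into the full prompt
Task = tuple[str, Callable[[str], bool]]   # (task text, checker for the answer)


def pass_rate(model: Model, harness: Harness, suite: list[Task]) -> float:
    passed = sum(1 for task, check in suite if check(model(harness(task))))
    return passed / len(suite)


def compare_harnesses(model: Model, harnesses: dict[str, Harness],
                      suite: list[Task]) -> dict[str, float]:
    """Same model, same tasks; only the harness varies."""
    return {name: pass_rate(model, h, suite) for name, h in harnesses.items()}


# Stub model (assumption): it only runs tests when the prompt tells it to.
def stub_model(prompt: str) -> str:
    return "ran tests; done" if "run the tests" in prompt else "done"


ran_tests = lambda answer: "ran tests" in answer
suite = [("Fix the login bug", ran_tests), ("Add pagination", ran_tests)]
harnesses = {
    "bare": lambda task: task,
    "with_rules": lambda task: "Always run the tests before answering.\n" + task,
}
print(compare_harnesses(stub_model, harnesses, suite))  # → {'bare': 0.0, 'with_rules': 1.0}
```

Because the model and tasks are identical across variants, any difference in pass rate is attributable to the harness. That is the kind of comparison the "model is only 10%" claim rests on.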

Source: ThorstenMeyerAI.com
