I keep having the same conversation.
A VP of Engineering calls me. Their exec team has just walked offstage at the company’s annual conference. Standing ovation. They demo’d “autonomous AI agents” that would transform customer workflows. Stock bumped 3%. Press coverage. LinkedIn posts with fire emojis.
Then someone walked backstage and said four words:
“Ship it by Q3.”
And now the engineering team is staring at a prototype held together with curated data and scripted inputs, wondering how to turn a controlled demo into something that works when real users touch it.
The gap between what gets promised on stage and what it takes to actually deliver a reliable agent is the most dangerous disconnect in enterprise tech right now.
The Data Is Catching Up to the Hype
An S&P Global survey of over 1,000 IT leaders found that the percentage of companies abandoning the majority of their AI initiatives nearly tripled in one year.
[Chart: “The Cliff of AI Initiative Abandonment,” showing the share of companies abandoning the majority of their AI initiatives climbing from 17% to 42% in a single year. Credit: S&P Global]
Nearly half of all AI POCs are dying on the vine.
Not because the tech doesn’t work. Because the infrastructure doesn’t exist.
95% of “Agent” Products Are Rebadged Software
Gartner coined a term for what’s happening: agent washing. Of the thousands of vendors claiming agentic AI capabilities, roughly 95% are selling rebadged existing software.
[Chart: “The Agent Washing Field,” where each square represents 1% of “agent” vendors. Credit: Gartner]
When your CTO comes back from Dreamforce convinced agents are a solved problem because every vendor on the floor said so, the false confidence cascades into timelines, staffing, and customer promises your engineering team then has to live with.
The Demo Worked. Production Won’t.
A tech executive goes onstage. The agent performs beautifully: summarizing customer data, generating reports, even making a recommendation. The audience applauds.
Then the executive walks backstage and tells their product and engineering teams to deliver this capability across every workflow. By next quarter.
The teams are thinking: That demo ran against 50 curated test cases. We have 50,000 edge cases in production. These are not the same thing.
[Graphic: On stage, 50 curated test cases, hand-picked data, and scripted inputs. In production, 50,000 edge cases, messy real-world data, and unpredictable users.]
Enterprise Customers Give You One Shot
Getting a Fortune 500 company to adopt a new AI capability takes months of relationship building, security reviews, and proof-of-concept work.
When the agent finally goes live, the team gets one shot.
If the agent hallucinates on a customer’s first interaction — if it surfaces wrong data, makes a nonsensical recommendation, or loses context mid-conversation — that customer doesn’t file a bug report and wait for v2. They shut it off.
“We don’t get an alpha and a beta with these customers. We get one shot. And right now I can’t tell you with any confidence what will happen when we take the guardrails off.”
— VP of Engineering, enterprise AI company
Agent Reliability Has a Formula
And your denominator is killing you.
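The formula itself lived in the original chart. Going by the two scenarios below, one plausible reconstruction (the labels are my reading, not the author’s notation) is:

$$\text{Reliability} \propto \frac{\text{eval coverage} \times \text{lessons kept from each failure}}{\text{shipping pressure}}$$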
High numerator + managed pressure: a reliable agent that compounds daily, where every failure is a lesson.

Low numerator + high pressure: a brittle agent that embarrasses the company after being hyped on stage.
Investing in evals feels slower upfront.
It’s the only thing that makes you faster.
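Mechanically, “investing in evals” can be as unglamorous as a harness that runs the agent against a growing case set on every change and blocks the release when the pass rate drops. A minimal sketch, where `run_agent`, the case shape, and the 95% gate are all illustrative stand-ins rather than a real API:

```python
# Minimal eval harness sketch. Names and the gate threshold are illustrative.
from dataclasses import dataclass

@dataclass
class EvalCase:
    prompt: str
    expected: str  # reference answer; richer setups use rubrics or LLM judges

def run_agent(prompt: str) -> str:
    """Stand-in for your agent's actual entry point."""
    raise NotImplementedError

def run_evals(cases: list[EvalCase], gate: float = 0.95) -> bool:
    """Run every case, report failures, and block release below the gate."""
    failures = []
    for case in cases:
        output = run_agent(case.prompt)
        # Exact match is the crudest grader; swap in semantic scoring
        # or an LLM judge for open-ended tasks.
        if output.strip() != case.expected.strip():
            failures.append((case.prompt, case.expected, output))
    pass_rate = 1 - len(failures) / len(cases)
    for prompt, want, got in failures:
        print(f"FAIL {prompt!r}: expected {want!r}, got {got!r}")
    print(f"pass rate {pass_rate:.1%} (gate {gate:.0%})")
    return pass_rate >= gate
```

The harness matters less than the habit: every production failure becomes a new `EvalCase`, so the suite grows toward the 50,000 edge cases instead of staying at the 50 that worked on stage.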
Same Agent. Different Users. Wildly Different Results.
One of our customers discovered that agent performance wasn’t uniform across their user base. The same agent performing at 90%+ accuracy for one segment was hitting 40% for another.
[Chart: “The Performance Canyon,” the same agent scoring 90%+ for one user segment and 40% for another. Credit: Clarity API customer data]
That’s not a model problem. That’s a context and evaluation problem.
And it’s invisible without the right infrastructure.
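Surfacing a canyon like that takes nothing exotic: score the same eval results per user segment instead of reporting one global average. A minimal sketch, with invented field names (`segment`, `passed`) standing in for whatever your eval records actually carry:

```python
# Slice eval results by user segment to expose non-uniform performance.
# The record schema and segment names here are invented for illustration.
from collections import defaultdict

def accuracy_by_segment(results: list[dict]) -> dict[str, float]:
    totals = defaultdict(lambda: [0, 0])  # segment -> [passed, total]
    for r in results:
        totals[r["segment"]][0] += r["passed"]
        totals[r["segment"]][1] += 1
    return {seg: passed / total for seg, (passed, total) in totals.items()}

results = [
    {"segment": "enterprise_admins", "passed": True},
    {"segment": "enterprise_admins", "passed": True},
    {"segment": "field_reps",        "passed": False},
    {"segment": "field_reps",        "passed": True},
]
for segment, acc in sorted(accuracy_by_segment(results).items()):
    print(f"{segment}: {acc:.0%}")
# A single global accuracy of 75% here hides a 100% vs 50% split:
# the same shape as the 90%/40% canyon above, just at toy scale.
```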
The People Who Sang the Land Into Being
Aboriginal Australians navigated 7.7 million square kilometers of continent for over 65,000 years without a single written map, compass, or instrument.
They used songlines.
65,000+ years of navigation without written maps
A songline is a path across the land encoded in a song. Each verse describes a landmark, a waterhole, a ridge, a sacred site. To navigate, you sing. The rhythm and melody carry the topography.
Songs could be traded between groups who spoke completely different languages. Because the knowledge wasn’t in the words. It was in the structure.
Western mapmaking separates the map from the territory. You make the map once, print it, hand it out. The map is a product. The territory is somewhere else.
In the songline tradition, the map and the territory are the same thing.
You can’t separate knowing the land from traversing it. Understanding isn’t a deliverable. It’s a practice.
I keep thinking about what we lose when we try to turn understanding into a deliverable.