Rain-soaked streets of London, February chill biting through coats at an anti-AI march — a crumpled flyer from Pause AI catches the eye, mocking the industry’s grand plans.
That flyer. Step 1: Grow a digital super mind. Step 2: ? Step 3: ?
South Park gnomes did it better back in ’98, stealing underpants with dreams of profit sans plan. Now it’s AI’s turn: companies like OpenAI and Anthropic crank out models and hype transformative futures, but who fills that glaring Step 2?
Pause AI demands a pause, regulation to sort the mess. Boosters? They sprint toward salvation, glazing over the void. OpenAI’s chief scientist Jakub Pachocki calls it an “economically transformative technology” — sunny words, hazy map.
Why AI’s Step 2 Feels Like Gnomes’ Pitch Deck
Strip the PR. Two studies cut through. Anthropic guesses LLMs hit managers, architects, media types hardest — groundskeepers safe, for now. Guesses, though, pegged to lab tasks, not office chaos.
Mercor, an AI hiring outfit, tested top agents from OpenAI, Anthropic, and Google DeepMind on 480 real tasks from banking, consulting, and law. Result? Epic fails across the board.
“Every agent they tested failed to complete most of its duties.”
That’s the quote — brutal, no spin. Why the chasm? Claimants have skin in the game. Anthropic predicts disruption to sell safety. Hype rides coding tool speed, but strategy? Judgment? LLMs flop there.
Real world dirties it up — people, workflows resist. Rip them apart for AI? Takes time, guts businesses lack.
Vacuum sucks in wild claims. One post shakes markets. No evidence anchors us.
The pattern mirrors the dot-com boom of 1999: fiber optics and servers piled up (Step 1), profits never arrived (no Step 3), and it took a brutal bust to reveal Step 2, actual e-commerce viability. AI risks the same purge; without proven ROI in messy enterprises, it’s bubble 2.0.
Will AI Agents Ever Nail Workplace Tasks?
Doubt it soon. Studies scream failure on non-coding gigs. Banking precision? Legal nuance? Consultants thrive on human read-between-lines — AI’s blind spot.
Model makers hoard data and ship black-box deployments. We need transparency and real-world benchmarks. Businesses coordinate with researchers, or it’s endless guessing.
Industry bets the farm on transformation. Not yet a sure bet. Next bold claim? Recall the underpants.
Cynical? Twenty years in Valley trenches teach: Hype funds rounds, reality funds graves. Who profits now? Activists printing flyers, maybe. VCs? Until Step 2 solidifies.
Is Regulation the Answer to AI’s Step 2?
Pause AI says yes — halt till safe. But enforcement? Global mess. EU AI Act nibbles edges; US? Lobbyists feast.
Boosters cry innovation killer. Truth: Regulation forces Step 2 clarity, or we loop hype forever.
Mercor’s test isn’t an outlier. Chain agents, fine-tune them: they still flop on edge cases. Workflows entangle; AI drops in, snarls them worse.
Historical parallel bites: Telecom bust post-dot-com showed infrastructure alone flops without monetization. AI labs stack parameters (Step 1), promise AGI gold (Step 3). Step 2 demands enterprise pilots proving 10x gains — rare sightings.
Predictions? 2026 sees shakeout. Half these agents shelved; survivors niche in rote tasks. Broad transformation? 2030 earliest, if ever. Businesses hoard cash, wait for proof.
Economy hangs on promise. Skeptics like Pause AI poke holes. Smart money watches Mercor-style evals, not keynotes.
That flyer? Prophetic. Fill Step 2, or join gnomes in meme hell.
Frequently Asked Questions
What is AI’s ‘missing Step 2’?
The unclear path from building powerful models to actual profits and transformation — studies show agents failing real tasks.
Do AI agents really fail workplace jobs?
Yes. Mercor tested top agents from OpenAI, Anthropic, and Google DeepMind on 480 tasks, and every agent failed to complete most of its duties.
Will regulation fix AI hype?
It could force evidence over promises, but global enforcement lags far behind.