DoorDash Shipped the AI-Code Miracle, and Engineers Get the Invoice
Silicon Valley found a faster way to create legacy code.

DoorDash now says “well north of half” its code, probably closer to two-thirds, is written by AI. On the same call, in the same breath, the CEO admitted the company still has to figure out what AI productivity means for workflows and team structure, then asked the only sane question in the room: are customers getting better outcomes? There it is. The wreckage, already labeled as progress. More code. Faster code. Unclear ownership. Unproven outcomes. A delivery company with 933 million quarterly orders is bragging about machine-written software while moving toward a single global technology platform across payments, fraud, support, subscriptions, merchant tooling, and logistics. Apparently the future of software is a global replatforming performed with a leaf blower.
The Magic Trick Has a Burn-Down Chart
The story executives want is simple enough for an investor slide.
AI writes the code. Engineers review the code. Features ship faster. Teams flatten. Customers benefit. Margins improve. Everyone claps, except maybe the people on call, but those people have Slack muted during earnings calls anyway.
DoorDash’s version comes wrapped in caution, which almost makes it worse. The company says AI is boosting productivity, but also says productivity alone does not explain how workflows or teams should change. That admission belongs next to every AI productivity graph. Translation from boardroom to repo: we have accelerated output before deciding how accountability works.
Stunning process maturity. Ship the factory, then debate gravity.
You know what makes the whole thing ridiculous? The claims are all sitting on top of each other. DoorDash is pushing production traffic through a new global platform across three marketplace brands while admitting much of the functional product work remains ahead. Its Q1 release spreads that platform work across payments, fraud, support, subscriptions, merchant tooling, and logistics.
Which sounds tidy in a quarterly release and much worse inside the repo. A platform touching fraud, payments, support, merchant tools, subscriptions, and logistics is basically a machine for turning small assumptions into expensive ambiguity. One unclear ownership boundary can become a payout delay. One copied eligibility rule can become a fraud exception nobody trusts. One “shared” service can become the place every team depends on and no team wants during an incident.
Nothing says operational excellence like letting autocomplete help unify fraud and logistics across continents.
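The copied-eligibility-rule failure mode is easy to sketch. Everything below is hypothetical, invented for illustration, not DoorDash code: two services each carry a copy of the “same” rule, one copy gets patched, and the copies quietly disagree.

```python
# Hypothetical sketch: two services carrying copies of the "same"
# eligibility rule. Names and thresholds are invented for illustration.

def payments_is_eligible(order_total: float, chargebacks: int) -> bool:
    # The rule as originally written.
    return order_total <= 500 and chargebacks == 0

def fraud_is_eligible(order_total: float, chargebacks: int) -> bool:
    # The "same" rule, copied into another service and later patched
    # to tolerate one historical chargeback. The copies now disagree.
    return order_total <= 500 and chargebacks <= 1

def copies_agree(order_total: float, chargebacks: int) -> bool:
    # True when both services would make the same call for this input.
    return payments_is_eligible(order_total, chargebacks) == fraud_is_eligible(
        order_total, chargebacks
    )
```

For most inputs the copies agree, which is exactly why the drift survives review. `copies_agree(100.0, 1)` is the one input where they split, and that single disagreement is the fraud exception nobody trusts.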
The Productivity Claim Starts Leaking Under Light
METR gave the productivity story the kind of test it usually avoids: experienced open-source developers, mature codebases, real issues, repositories they already knew. With AI tools allowed, they took 19% longer to finish issues. Beforehand, they expected AI to make them 24% faster. Afterward, they still believed it had made them 20% faster.
Read that again like someone who has reviewed generated code after midnight.
The tool slowed them down, and the users still felt faster.
Of course they did. AI makes motion feel cheap. The screen fills, which feels better than staring at the empty part. A plausible diff appears before your coffee cools. The branch grows. The author feels unblocked for a while. Then review starts, and all that speed turns into suspicion with syntax highlighting.
Mature systems punish plausible work. Typing the code was never the expensive part. The cost lives in knowing which change should never enter the repo, spotting the invariant buried in an old test name, preserving the weird behavior finance accidentally depends on, and changing one endpoint without waking up a batch job nobody owns plus a compliance export nobody wants to discuss. Faster first drafts do not make that work go away.
AI is excellent at creating more material for a tired reviewer to distrust.
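The expensive knowledge described above can at least be written down. A minimal sketch, with every name and number invented: a characterization test that pins the weird behavior finance accidentally depends on, so a confident-looking generated diff cannot delete it quietly.

```python
# Hypothetical sketch of a characterization test. The function name and
# the rounding quirk are invented; the point is pinning existing behavior.

def invoice_total(cents: int, discount_pct: int) -> int:
    # Legacy behavior: the discount truncates toward zero instead of
    # rounding. Finance reports were built on top of this quirk.
    return cents - (cents * discount_pct) // 100

def test_invoice_total_preserves_truncation_quirk():
    # 999 * 15 // 100 == 149, so the total is 850, not the 849 that
    # round-half-up arithmetic would produce. Do not "fix" this without
    # talking to whoever owns the finance export.
    assert invoice_total(999, 15) == 850
```

A generated rewrite that swaps in tidy `round()` arithmetic fails this test on the spot, which is the whole job: the invariant now lives in the repo instead of in one engineer’s memory.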
The Review Step Is Where the Fantasy Hides
A Sonar survey reported 72% of developers use AI tools daily, with AI helping write up to 42% of committed code. Buried in the same story is the number that should make every engineering leader stop smiling: 96% said they do not fully trust AI-generated code to be functionally correct, yet fewer than half review it before committing.
That should have ended the panel discussion.
Instead, companies will call it enablement.
Review is where the cost moves. The code arrives faster, so the bottleneck shifts to the person who has to understand whether the code belongs in the system at all. The reviewer now has to parse a 1,200-line PR with perfect formatting, weird abstraction names, a test suite padded with happy-path theater, and a helper function duplicated from another service because the model found the old pattern and copied its smell.
The generated code looks calm. Very professional. Like a hotel lobby built over a sinkhole.
Then someone leaves “minor comments,” because the team is behind the milestone. Someone approves with a nervous “follow-up?” Someone calls the abstraction temporary, and temporary does what it always does: walks permanent code through the front door wearing a visitor badge and pretending not to know where the documentation lives.
Two sprints later, the service nobody owns starts timing out. The incident note says “unexpected interaction between legacy migration path and new eligibility logic,” because writing “we approved a haunted diff” sounds harsh in the postmortem.
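The happy-path theater above is easy to sketch. A hypothetical example, all names invented: every generated test passes, and none of them names the input that actually breaks.

```python
# Hypothetical sketch of "happy-path theater": a padded suite that looks
# thorough but never exercises the input that actually fails.

def average_item_price(prices: list[float]) -> float:
    # Crashes on an empty cart; no test below will ever find out.
    return sum(prices) / len(prices)

HAPPY_PATH_CASES = [
    ([10.0], 10.0),
    ([10.0, 20.0], 15.0),
    ([1.0, 2.0, 3.0], 2.0),
]

def run_happy_path_suite() -> bool:
    # Every case passes, which is all the reviewer sees in the PR.
    return all(
        abs(average_item_price(prices) - expected) < 1e-9
        for prices, expected in HAPPY_PATH_CASES
    )
```

`run_happy_path_suite()` returns `True`, while `average_item_price([])` raises `ZeroDivisionError`. The suite is green; the edge case is waiting for production.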
Production Does Not Care Who Typed It
The pager does not care whether code came from an engineer, a model, a prompt, or a senior director’s belief in leverage.
Production asks simpler questions.
Who understands the failure mode? Who can roll it back? Who knows whether rollback corrupts downstream state? Who checked the migration path? Who reviewed the generated SQL? Who noticed the permission boundary changed? Who owns the service after the team moved to the new platform org?
Generated code brings the same operational debt as any other code, only now it can arrive at industrial speed. Ownership has to exist. Threat models have to hold. Migration plans need rollback drills, audit trails, load tests, deploy sequencing, and someone with enough spine to say no when the diff looks impressive but the system shape gets worse.
The expensive failure mode is executives treating generated lines as productivity while starving the human work required to make those lines safe.
GitClear’s 2025 code-quality work found spikes in duplicate code blocks and short-term churn, alongside a decline in moved lines, meaning less reuse and refactoring. The maintenance bill is forming in real time. More copy. More churn. Less deliberate reshaping. The repo gets heavier while the slide says velocity improved.
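A duplicate-block signal of the kind GitClear tracks can be approximated in a few lines. This is a toy sketch, not GitClear’s methodology: the window size and normalization are arbitrary choices made here for illustration.

```python
# Toy sketch of one duplicate-code signal: counting repeated windows of
# normalized lines across files. Window size and normalization are
# invented for illustration, not GitClear's actual metric.

from collections import Counter

def normalized_windows(source: str, size: int = 3):
    # Strip indentation and blank lines, then yield sliding windows.
    lines = [ln.strip() for ln in source.splitlines() if ln.strip()]
    for i in range(len(lines) - size + 1):
        yield tuple(lines[i : i + size])

def duplicate_window_count(files: dict[str, str], size: int = 3) -> int:
    seen = Counter()
    for text in files.values():
        seen.update(normalized_windows(text, size))
    # Windows that appear more than once are copy/paste candidates.
    return sum(count - 1 for count in seen.values() if count > 1)
```

Two files sharing a pasted helper push the count up; a refactor that moves the helper to one place drives it back toward zero. That is the “moved lines” decline in miniature: copy raises the number, deliberate reshaping lowers it.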
The Lie Becomes Headcount Math
After leadership buys the miracle, every staffing conversation gets poisoned.
Why does the platform team need more people? AI writes code now.
Why is security review taking so long? AI made the implementation faster.
Why does the migration need another quarter? The code already exists.
Why are incidents up? Must be execution.
The blame lands exactly where it always lands: on the engineers closest to the damage. The reviewer missed it. The staff engineer was too negative. The infra team blocked progress. Security created friction. SRE lacked urgency. Nobody says the metric was garbage from the start.
DoorDash at least asked whether more shipped code improves customer outcomes, which is a useful question to ask before everyone runs off with the output number. Follow it all the way down. If two-thirds of code is AI-generated, two-thirds of the future maintenance surface may have arrived through a tool with no memory of the architecture, no pager, no shame, and no career-limiting fear of making fraud worse during a global platform transition.
A staff engineer will eventually explain this risk in a meeting after leadership has already sold the upside. The wording gets careful because nobody wants to sound emotional, let alone like the person blocking AI transformation over something allegedly minor, such as a cursed chargeback path in one market. “Operational readiness” shows up. “Ownership clarity.” “Verification load.” Maybe “audit exposure,” if the room still has oxygen.
Everyone nods.
The deadline remains.
The Invoice Has Line Items
The bill arrives in places the launch post will never mention.
It arrives when security asks who reviewed the generated permission logic and the room gets very interested in process. It arrives when the migration stalls because two “shared” helpers disagree about the same customer state. It arrives when rollback technically exists, except nobody tested what happens after half the traffic has already written new data. It arrives when support starts seeing cases engineering insists should be impossible.
That is the part executives never price into the miracle. Generated code still has to be owned by someone with a name, a calendar, and a pager. It has to survive audit. It has to survive incident review. It has to survive the next team trying to delete it six months later and discovering three product flows depend on the accident.
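One of those line items, the rollback that technically exists, is worth sketching. A hypothetical guard, every name invented: rollback is only safe before the new path has written data the old path cannot read.

```python
# Hypothetical sketch of a rollback guard during a platform migration.
# All names are invented; the point is that "rollback exists" and
# "rollback is safe" stop being the same claim once new-path writes land.

def rollback_is_safe(
    rows_written_by_new_path: int, dual_write_enabled: bool
) -> tuple[bool, str]:
    if rows_written_by_new_path == 0:
        return True, "no new-path writes; flipping back loses nothing"
    if dual_write_enabled:
        return True, "old path holds a copy of every new-path write"
    return False, "rollback would orphan new-path writes; backfill first"
```

The case the article describes is `rollback_is_safe(500_000, False)`: the switch still exists, but pulling it orphans half a day of traffic, and nobody rehearsed that before launch.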
Two-thirds AI-generated code can look like leverage from the top.
From the repo, it looks like debt with better typing speed.
