An AI state of the union: We’ve passed the inflection point, dark factories are coming, and automation timelines | Simon Willison

with Simon Willison

2 Apr 2026 · 11 min read · 1h 18m

TL;DR

We've crossed an inflection point where AI coding agents have become reliable enough to handle production software—November 2025 was the moment when GPT-5.1 and Claude Opus 4.5 shifted from "mostly works" to "almost always works." The next frontier isn't just writing code faster, but "dark factories" where software is built and tested by AI without humans reviewing the code, though this requires inventing entirely new quality assurance patterns.

Key Moments

Simon Willison

“in November we had what I call the inflection point where GPT-5.1 and Claude Opus 4.5 came along. And they were both just ex- They were incrementally better than the previous models, but in a way that crossed a threshold. Where previously, if you had these coding agents, you could get them to write you some code and most of the time it would mostly work. But you had to pay very close attention to it. And suddenly we went from that to almost all of the time it does what you told it to do.”

Simon explains the November 2025 inflection point when AI coding became reliable enough for production use

▶ 4:35

Simon Willison

“today, probably 95% of the code that I produce, I didn't type it myself. So, that world is practical already because the latest models are good enough that you can tell them, "Oh, no, rename that variable and refactor that and add this line there." And they'll just do it. And it's faster than you typing on the keyboard yourself.”

Simon describes how he's adopted AI code generation to the point where manual typing is now slower than agent-assisted development

▶ 14:29

Simon Willison

“I can fire up like four agents in parallel and have them work on four different problems and by like 11:00 a.m., I am wiped out for the day. Like I have cuz there is a limit on human cognition in how much even if you're not reviewing everything I'm doing, just how much you can hold in your head at one time and it's very easy to pop that stack at the moment.”

Simon explains the cognitive exhaustion of managing multiple parallel AI coding agents despite his 25 years of experience

▶ 26:39

Simon Willison

“What does it look like if you're not reviewing the code? If you're not looking at that code, but you're also not vibe coding. You're not throwing everything to the wind and seeing what happened. You're applying professional practices and quality expectations to code that you're not directly reviewing. The reason it's called the dark factory is there's this idea in factory automation that if your factory is so automated that you don't need any people there, you can turn the lights off.”

Simon introduces the concept of "dark factories" for AI-generated software—production code built without human code review

▶ 13:00

Simon Willison

“they had this swarm of simulated employees all in a simulated Slack channel saying things like, "Hey, could somebody give me access to Jira?" The Slack channel itself is simulated. We'll talk about that in a moment. And they 24 hours a day they're making requests and saying, "Hey, I need access to Jira." And all of those kinds of things at an enormous cost. Like, they were spending $10,000 a day on tokens, I think, simulating all of these end users.”

Simon describes StrongDM's innovative approach to QA by simulating entire user populations with AI agents running 24/7

▶ 16:18

About Simon Willison


Simon Willison is a legendary software engineer who co-created Django, the web framework powering Instagram, Pinterest, and Spotify. He's been at the forefront of AI's impact on software development, coining terms like "prompt injection" and popularizing "agentic engineering." With 25+ years of experience and 100+ open-source projects including Datasette, Simon is documenting in real time how AI is fundamentally reshaping how professional software gets built.

Takeaways

The inflection point happened in November 2025

GPT-5.1 and Claude Opus 4.5 crossed a critical threshold where AI-generated code works reliably enough for production. Previous models required heavy oversight; now agents can be given high-level specifications and produce working software with minimal human intervention. This shift is reshaping what's possible in software teams.

Dark factories automate the entire development pipeline

StrongDM is exploring whether software can be built, tested, and verified without humans reading a single line of code, using swarms of AI agents that simulate end users alongside automated security penetration testing. The new bottleneck isn't code generation; it's proving quality without code review. By Simon's estimate, StrongDM was spending around $10,000 a day on tokens to run these simulated users.
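
To make the simulated-user idea concrete, here is a minimal sketch in Python. It is not StrongDM's system: `ToyAccessSystem`, `simulate_users`, the role names, and the policy are all hypothetical, and plain seeded random choices stand in for the AI agents described in the episode. The point it illustrates is that the check treats the system as a black box and compares its behaviour against a separately written spec, so quality is judged without anyone reading the implementation.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyAccessSystem:
    """Hypothetical system under test: grants tool access based on role."""
    policy: dict[str, set[str]]
    grants: list[tuple[str, str]] = field(default_factory=list)

    def request_access(self, user: str, role: str, tool: str) -> bool:
        allowed = tool in self.policy.get(role, set())
        if allowed:
            self.grants.append((user, tool))
        return allowed


def simulate_users(system, spec, n_requests, seed=0):
    """Send simulated access requests and return every response that
    disagrees with the spec. Only the system's outputs are inspected."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    roles = sorted(spec)
    tools = sorted(set().union(*spec.values()))
    violations = []
    for i in range(n_requests):
        role = rng.choice(roles)
        tool = rng.choice(tools)
        user = f"sim-user-{i}"
        granted = system.request_access(user, role, tool)
        expected = tool in spec.get(role, set())
        if granted != expected:
            violations.append((user, role, tool, granted))
    return violations


if __name__ == "__main__":
    # Assumed example policy, written independently of the implementation.
    spec = {"engineer": {"jira", "github"}, "sales": {"crm"}}
    system = ToyAccessSystem(policy=spec)
    problems = simulate_users(system, spec, n_requests=1000)
    print(f"{len(problems)} violations in 1000 simulated requests")
```

In a real setup the random choices would presumably be replaced by agents generating realistic requests around the clock, which is where the token cost mentioned above comes from.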

Human expertise becomes more valuable, not less

Using coding agents effectively draws on deep software engineering experience (25+ years in Simon's case) to coordinate parallel agents, design the right architecture, and define quality standards. The cognitive load is high enough that even an expert running four agents in parallel can be wiped out by 11:00 a.m. AI amplifies existing skill rather than replacing it: the bottleneck shifts to human judgment and leadership.