You're not ready for minions: part 1

(Ironically, we’re an AI company, but this post was written entirely by a human named Josh.)

Stripe’s Minions posts (see part 1 and part 2) caused quite a stir back in February. Now, a lot of teams are racing to build their own version of Minions.

It is an intoxicating idea: drop a ticket onto an agentic conveyor belt and get a perfect PR out on the other side. It’s the AI software factory we’ve been promised! And frontier models like GPT 5.5 are now good enough to make this seem feasible. 2026 is now the year of agent orchestration platforms.

“Airflow for agents” products (e.g. Oz) that let you launch armies of clanker-clones via a cron schedule or an @-mention in GitHub/Slack/“Wherever the team works” are everywhere now. And if DIY is more your style, you can just plug into some webhooks, wire up some durable functions, and start naming every type in your codebase some variation of xxxOrchestrator…what fun!

But there’s a problem. Most teams aren’t set up to make minions actually work, Stripe included:

Comment by u/hronikbrent, from a discussion in r/ExperiencedDevs

Making Minions do real work (the kind that actually moves business metrics) is hard because:

Your tickets are not ready for minions. Your minions need Linear/Jira/Whatever tasks assigned to them. But your backlog is full of tickets with cryptic titles and empty descriptions. You can’t unleash your new minion army on that Grand Queue of Ambiguity without suffering The Monkey’s Paw.
Your infrastructure is not ready for minions. Minions need to run in a container that allows them to verify their work. They need to execute your tests, especially your integration tests that require a Dockerized database. Good luck with DinD, though, since most cloud agent vendors just hand you a “universal image” and tell you to run shell scripts. Your minions also need to do stuff. But arming them with a nuclear stockpile of CLIs whose tokens grant them the equivalent of full admin access to everything at your company is insanity (unless you’re trying to test the accuracy of your employer’s MTTR claims and/or your agent’s prompt adherence capabilities).
Your codebase is not ready for minions. LLMs are statistical pattern matchers. Every line of bad code and every bad architectural decision (whether human- or clanker-made) is now a prompt injection attack on your minions’ output. No, the AGENTS.md file you haven’t updated in 3 months, a linter, and hundreds of snapshot “tests” you let Claude write (mostly unsupervised on auto-mode, while you did the fun part of your job) are not sufficient. That will not save you from burying your team in a mountain of “garbagio” PRs, generated by highly-parallelized minions spewing 2,000-line diffs at 100+ tokens per second. Minions are like junior engineers with ADHD, amnesia and jetpacks. The quality vector of your codebase matters because, unless chaos engineering is your goal, you want all the little yellow guys pointed in the right direction (towards higher, not lower, quality).
Your team is not ready for minions. Reviewing 10+ PRs a day at 2k+ lines each isn’t sustainable for humans. Humans can’t stand at the end of the clanker PR-firehose and keep their sanity. AI code review isn’t a panacea. It doesn’t reduce work; it adds more comments humans have to read (the value is that it catches things the humans missed). Some teams have resorted to becoming machine priests who never look at code anymore. That isn’t the answer either. You need to shift the work left.
Your coding agent harness is not ready for minions. It’s a vibe-coded IKEA jetpack that only runs on a single brand’s jet fuel (Claude, Codex, etc.). Context is an agent’s most precious resource, especially on long-running async tasks. So why would you strap your minion into a harness whose system prompt is full of junk about stuff you don’t use (e.g. Three.js)? You’re paying for those tokens, so your harness should contain only the context and tools you actually want.
Your budget is not ready for minions. Minions incur inference costs at API rates, not the massively subsidized subscription rate you’ve become used to. Most engineering teams aren’t prepared for the sticker shock, or the “what’s the ROI on our AI spend?” discussion with the CFO that follows if you only use the biggest, most expensive frontier model for every minion-task. There’s a Pareto frontier of cost/performance with open-weight models that’s much more budget-friendly. But you need to solve your harness problem first in order to access it.
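To make the budget point concrete, here is a back-of-the-envelope estimator. It is only a sketch: the `ModelPricing` and `RunProfile` shapes are hypothetical, and it deliberately ships with no prices or token counts, because those have to come from your own vendor's price sheet and your own agent traces.

```typescript
// Hypothetical shapes for illustration; not any vendor's billing API.
interface ModelPricing {
  inputUsdPerMillionTokens: number;
  outputUsdPerMillionTokens: number;
}

interface RunProfile {
  // Total tokens sent to the model across the whole minion run
  // (long agent loops re-send context, so this is usually the big number).
  inputTokens: number;
  // Total tokens generated by the model across the run.
  outputTokens: number;
}

// Cost of a single minion run at API rates.
function costPerRunUsd(pricing: ModelPricing, run: RunProfile): number {
  const inputCost = (run.inputTokens / 1_000_000) * pricing.inputUsdPerMillionTokens;
  const outputCost = (run.outputTokens / 1_000_000) * pricing.outputUsdPerMillionTokens;
  return inputCost + outputCost;
}

// Rough monthly spend for a team launching the same kind of run repeatedly.
function monthlyCostUsd(
  pricing: ModelPricing,
  run: RunProfile,
  runsPerDay: number,
  workingDaysPerMonth: number,
): number {
  return costPerRunUsd(pricing, run) * runsPerDay * workingDaysPerMonth;
}
```

Run it once with your frontier model's rates and once with a cheaper model's rates, and you have the opening slide for that CFO conversation.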

Overcoming these hurdles is a lot of work. But async cloud agents are here to stay. And every newly minted minion-overlord has to start somewhere. So start with fixing your tickets. The rest of this post focuses on that.

The first step to becoming an AI overlord is having a plan

You’re probably aware that the cheapest place to fix AI slop is the plan. You start most local coding sessions in plan mode and iterate with your agent until it’s up to par (if not, go read the linked post).

Working with minion-style cloud agents is a bit different. The ticket is the plan. Unlike pairing with an agent locally, this is your last chance as a human to inject context, course correct and apply “taste” before your minion fires up their jetpack and flies off to open a PR all by themselves. To give your minion the best chance of success on their voyage, you need a good ticket.

That means your tickets need to change from looking like this:

ENG-1247 · Todo · Medium · Steve

Fix the Bingo service

Steve-o saw a bunch of Sentry errors in our Bingo service while he was on-call. He acknowledged them all in PagerDuty. It might be a race condition.

bug · backend · opened 20d ago

To something more like this:

ENG-1247 · Todo · High · Steve

Fix race condition in OrderProcessor double-charging customers within the Bingo service

Context

We found a race condition in OrderProcessor (see src/workers/orderProcessor.ts) where two queue workers can pick up the same order_id due to duplicate messages being emitted when the user sees a good deal, gets excited and mashes “buy now” repeatedly in the UI. Repro in #eng-payments thread; happens roughly 1 in 5,000 orders under peak load or inexplicably 4,999 in 5,000 orders if the user is named Chad.

Reproduction

Run the Bingo service locally
Change your username to Chad
Click Buy Now several times
Your local charges table should have multiple entries.

Acceptance criteria

Add a test in src/workers/orderProcessor.tests.ts that fails on main and demonstrates the duplicate charges bug. Add another for the Chad bug.
Make the tests pass
Add OTEL telemetry for an order.confirmation.a_chad_mashed_the_button metric so we can alert if Chad gets too excited.
No new dependencies; keep changes inside src/workers/.

Out of scope

Refactoring the broader worker base class — tracked in ENG-1190.
Backfilling already-double-charged orders — finance is handling manually.

bug · backend · payments · opened 2d ago
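If you want to keep tickets like the first one away from your minions, a small readiness gate can help. The sketch below is an assumption-heavy illustration: the `Ticket` shape is hypothetical (it is not Linear's or Jira's API), the required headings are simply the ones used in the example ticket, and the title-length threshold is arbitrary.

```typescript
// Hypothetical ticket shape; not tied to Linear's or Jira's API.
interface Ticket {
  id: string;
  title: string;
  description: string;
}

// Section headings borrowed from the example ticket; adjust to your own template.
const REQUIRED_SECTIONS = ["Context", "Acceptance criteria", "Out of scope"];

// Arbitrary threshold: shorter titles are treated as too cryptic to act on.
const MIN_TITLE_WORDS = 5;

// Returns a list of human-readable problems; an empty list means "ready".
function findTicketGaps(ticket: Ticket): string[] {
  const gaps: string[] = [];

  const titleWords = ticket.title.trim().split(/\s+/).filter(Boolean);
  if (titleWords.length < MIN_TITLE_WORDS) {
    gaps.push(`title has ${titleWords.length} words; want at least ${MIN_TITLE_WORDS}`);
  }

  // A section counts as present only if its heading sits on a line of its own.
  const lines = ticket.description.split("\n").map((line) => line.trim().toLowerCase());
  for (const section of REQUIRED_SECTIONS) {
    if (!lines.includes(section.toLowerCase())) {
      gaps.push(`missing section: ${section}`);
    }
  }

  return gaps;
}

function isMinionReady(ticket: Ticket): boolean {
  return findTicketGaps(ticket).length === 0;
}
```

A check like this only proves the sections exist, not that they say anything useful, so treat it as a filter in front of human judgment rather than a replacement for it.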

Writing detailed tickets like this by hand is time-consuming. As a tech lead, engineering manager, and director, I spent a lot of time writing tickets to give humans on my teams the context they needed to succeed. But the investment was worth it. A good ticket often saved hours, sometimes days, of engineering effort (especially for junior folks). With coding agents, the benefits are similar: a good ticket prevents hours of rework.

The great news is we don’t have to write tickets 100% manually anymore. Instead you can:

Create a plan locally with your agent
Use a free, open-source tool like PlanBridge to refine the plan locally in a nice Docs-like UI.
Have your agent sync the plan to your task management system using a CLI or MCP server (e.g. via the Linear MCP server). You can do this directly with a prompt like “sync my plan to Linear issue ENG-1247” or create a reusable Skill to sync plans after approval.
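Under the hood, the sync step mostly amounts to turning a structured plan into a ticket title and description. Here is a minimal sketch of that conversion, assuming a hypothetical `Plan` shape; this is not PlanBridge's actual format or the Linear MCP server's API, and the upload itself is left to whatever CLI or MCP server you use.

```typescript
// Hypothetical plan shape for illustration only.
interface Plan {
  title: string;
  context: string;
  reproduction: string[];
  acceptanceCriteria: string[];
  outOfScope: string[];
}

// Render steps as "1. ...", "2. ..." lines.
function numbered(items: string[]): string {
  return items.map((item, index) => `${index + 1}. ${item}`).join("\n");
}

// Render items as "- ..." lines.
function bulleted(items: string[]): string {
  return items.map((item) => `- ${item}`).join("\n");
}

// Build the ticket text, using the same section headings as the example ticket.
// Sections with no content are skipped rather than emitted empty.
function planToTicket(plan: Plan): { title: string; description: string } {
  const sections: [string, string][] = [
    ["Context", plan.context],
    ["Reproduction", numbered(plan.reproduction)],
    ["Acceptance criteria", bulleted(plan.acceptanceCriteria)],
    ["Out of scope", bulleted(plan.outOfScope)],
  ];

  const description = sections
    .filter(([, body]) => body.trim().length > 0)
    .map(([heading, body]) => `${heading}\n\n${body}`)
    .join("\n\n");

  return { title: plan.title.trim(), description };
}
```

Keeping the rendering separate from the upload means the same plan can be pushed to Linear, Jira, or anything else without touching the formatting logic.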

If you’d like to try PlanBridge you can get up and running for free in less than a minute here:

Install PlanBridge

In a future post, we’ll cover getting your infrastructure ready for minion-style cloud agents (Docker-in-Docker, safely equipping minions with tools, why CLIs are over-hyped, and more).