The agent fleet that scales itself
Why we use queue depth as the scaling signal — and what it means when a large refactor finishes in minutes instead of hours.
By Platform team
A fixed-size agent fleet is wrong in both directions. Idle during small tasks, it wastes resources. Undersized during large ones, it becomes the bottleneck — and a 50-file refactor that should take fifteen minutes ends up queued behind itself.
We scale on queue depth. When the number of pending subtasks crosses a threshold, the pool expands. When the queue drains, it contracts — to zero if there is nothing left to run. The signal is simple and direct: more work waiting means more agents needed, right now.
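As a rough illustration of the rule above, here is a minimal sketch of a queue-depth scaling decision. It is not our production controller: the function name `target_pool_size`, the per-agent backlog target `tasks_per_agent` (standing in for the threshold), and the `max_agents` cap are all assumptions made for the example.

```python
import math


def target_pool_size(pending: int, tasks_per_agent: int = 4, max_agents: int = 50) -> int:
    """Return how many agents the pool should have for a given queue depth.

    All parameters are hypothetical: `tasks_per_agent` is the backlog one agent
    is expected to absorb, and `max_agents` is an illustrative safety cap.
    """
    if pending <= 0:
        # Queue drained: contract all the way to zero.
        return 0
    # One more agent each time the backlog crosses another multiple of the target.
    wanted = math.ceil(pending / tasks_per_agent)
    return min(max_agents, wanted)
```

With these made-up defaults, 10 pending subtasks ask for 3 agents and 100 pending subtasks ask for 25, so the fleet grows in proportion to the backlog and disappears when the backlog does.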
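The practical result is that task duration stops scaling linearly with task size. A refactor that touches a hundred files fans out across a proportionally larger fleet and, as long as most of those files can be changed independently, finishes in roughly the same wall-clock time as one that touches ten. The orchestrator handles dependency ordering, so agents that can run in parallel do, and wall-clock time is bounded by the longest chain of dependent subtasks rather than by the total file count.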
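To make the dependency-ordering point concrete, the sketch below groups subtasks into "waves" that can run in parallel. It uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) and is only an illustration of the idea, not the orchestrator's actual scheduler; the function name `parallel_waves` and the example file names are hypothetical.

```python
from graphlib import TopologicalSorter


def parallel_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group subtasks into waves; every subtask in a wave can run concurrently.

    `deps` maps each subtask to the set of subtasks it must wait for.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are cyclic
    waves = []
    while sorter.is_active():
        # Everything whose prerequisites are finished is ready right now.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        # Mark the whole wave as finished to unlock the next one.
        sorter.done(*ready)
    return waves


# Hypothetical refactor: two modules depend on a shared type, a test depends on both.
example = {
    "api.py": {"types.py"},
    "cli.py": {"types.py"},
    "test_all.py": {"api.py", "cli.py"},
}
waves = parallel_waves(example)
```

Here `waves` comes out as three groups: `types.py` first, then `api.py` and `cli.py` together, then `test_all.py`. With enough agents, the number of waves (the longest dependency chain) drives wall-clock time, not the number of files; a hundred fully independent files would form a single wave.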
The worker pool now scales out in seconds rather than minutes. We rewrote the provisioning path to cut the latency between a queue spike and the first new pod becoming ready. For tasks that arrive in bursts (a batch of issue-triggered runs, say), the fleet meets demand before the first agent has finished its first step.
