Docker's Virtual Agent Fleet: A New Paradigm for CI/CD Automation

Introduction

Modern software development demands speed, reliability, and continuous iteration. To meet those demands, Docker's Coding Agent Sandboxes (sbx) team built a virtual team of seven AI agents that autonomously test the product, triage bugs, post release notes, and even fix code. Known internally as the Fleet, the system runs both on developers' laptops and in CI, and it has changed how the team ships software.

Source: www.docker.com

The Origin of the Fleet

The sbx project provides secure, microVM-based isolation for AI coding agents such as Claude Code, Gemini, Codex, Docker Agent, and Kiro. Each agent enjoys full autonomy inside a sandbox—complete with its own Docker daemon, network, and filesystem—while remaining completely isolated from the host system. This powerful foundation inspired the team to build something more: a fleet of specialized agent roles that run the product, triage issues, and manage releases without human intervention.

From Scripts to Agent Personas

Instead of writing traditional test scripts, the team designed agent skills using Claude Code's markdown-based system. A skill is not a rigid script of steps but a role description that defines a persona, responsibilities, and allowed tools. For example, a build engineer skill file tells the agent what it knows and how to make decisions. This distinction is critical because agents need judgment, not just instructions. When a test fails unexpectedly, a script stops—but an agent investigates.
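To make the distinction concrete, here is a minimal sketch of a skill modeled as a role description and rendered to markdown. The `Skill` class, the section headings, and the example persona text are all hypothetical illustrations; the team's real skill files and Claude Code's exact file layout may look different.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    """Hypothetical model of a skill: a role description, not a list of steps."""

    name: str
    persona: str
    responsibilities: list[str] = field(default_factory=list)
    allowed_tools: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the role as a markdown document (assumed layout)."""
        lines = [
            f"# {self.name}",
            "",
            "## Persona",
            self.persona,
            "",
            "## Responsibilities",
            *[f"- {item}" for item in self.responsibilities],
            "",
            "## Allowed tools",
            *[f"- {tool}" for tool in self.allowed_tools],
        ]
        return "\n".join(lines) + "\n"


# Illustrative content only; not taken from the team's actual skill file.
build_engineer = Skill(
    name="build-engineer",
    persona="You are the build engineer. You know how the binaries are built "
    "and you decide how to proceed when a build step fails.",
    responsibilities=[
        "Build the binaries for the current platform",
        "Investigate unexpected failures before reporting them",
    ],
    allowed_tools=["Bash", "Read"],
)

print(build_engineer.to_markdown())
```

Note that nothing in the rendered file enumerates commands to run. The file describes what the agent knows and what it is responsible for, which leaves room for the judgment described above.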

Local-First Development for Faster Iteration

A core principle of the Fleet is that every skill runs on the developer's machine first. The team didn't start by writing GitHub workflows; they invoked the skill locally, watched it build binaries, exercise CLI commands, find issues, and report results. They refined the skill file until it performed correctly in their own terminal.

This approach eliminates the painful commit-push-wait-read-logs cycle of debugging CI-only agents. Local iteration takes seconds, not minutes. Developers see the agent think, observe where it gets confused, fix the skill file, re-invoke, and try again. The result is a dramatically faster feedback loop.

From Laptop to CI: Seamless Deployment

Once a skill works locally, integrating it into CI is trivial. CI becomes just another runtime for the exact same skill. The /cli-tester skill, for instance, runs nightly on macOS, Linux, and Windows runners—the same skill that developers invoke from their terminals. The workflow sets up the environment, checks out the code, and calls the skill. There is no separate "CI version" or translation layer: one skill, two runtimes.
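The "one skill, two runtimes" idea can be sketched as follows. This is an illustration under stated assumptions, not the team's actual workflow: the `claude -p "/cli-tester"` invocation form, the report directory names, and the helper functions are hypothetical. The only detail relied on from the outside world is that GitHub Actions sets the `CI` environment variable to `true`.

```python
def detect_runtime(env: dict[str, str]) -> str:
    """Return "ci" when the CI variable is set to true, otherwise "local"."""
    return "ci" if env.get("CI", "").lower() == "true" else "local"


def skill_command(skill: str) -> list[str]:
    """Build the skill invocation (assumed form).

    The function deliberately takes no runtime argument, so there is
    no way to produce a separate "CI version" of the command.
    """
    return ["claude", "-p", f"/{skill}"]


def plan_run(skill: str, env: dict[str, str]) -> dict:
    """Describe a run: identical command, runtime-specific surroundings."""
    runtime = detect_runtime(env)
    return {
        "runtime": runtime,
        "command": skill_command(skill),
        # Hypothetical directory names; only the surroundings differ.
        "report_dir": "ci-artifacts" if runtime == "ci" else "local-reports",
    }


local_plan = plan_run("cli-tester", {})
ci_plan = plan_run("cli-tester", {"CI": "true"})

# The skill invocation is the same on a laptop and on a CI runner.
print(local_plan["command"] == ci_plan["command"])
```

In real use, `plan_run` would receive `dict(os.environ)`. The design point is that the runtime decides where results go, never what the skill is.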

This simplicity has a profound impact. The Fleet's agents autonomously handle tasks that previously required dedicated human effort: exploratory testing across multiple platforms, upgrade path validation, load testing to catch resource leaks, issue triage from a growing backlog, and release note generation. The team gains daily visibility into what shipped without it becoming a full-time job.
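As one example of what such a task involves, a load-testing agent needs some way to decide whether memory growth during a run looks like a leak. The heuristic below is a hypothetical sketch, not the check the Fleet uses: it compares the average of the first third of the samples with the average of the last third, and the 50 MB threshold is an arbitrary assumption.

```python
from statistics import mean


def looks_like_leak(samples_mb: list[float], min_growth_mb: float = 50.0) -> bool:
    """Flag sustained growth between the early and late parts of a run."""
    if len(samples_mb) < 6:
        return False  # too few samples to judge a trend
    third = len(samples_mb) // 3
    early = mean(samples_mb[:third])
    late = mean(samples_mb[-third:])
    return late - early >= min_growth_mb


# Made-up memory readings (MB) sampled across repeated operations.
steady = [210, 214, 209, 212, 211, 213, 210, 212, 211]
growing = [210, 230, 251, 270, 292, 310, 333, 350, 371]

print(looks_like_leak(steady))
print(looks_like_leak(growing))
```

A fixed rule like this is only a starting signal. The value of handing the task to an agent is that it can follow up on a flagged run instead of stopping at the threshold.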

The Seven Agent Roles

The Fleet comprises seven distinct agent roles, each with a specific persona and toolset:

CLI Tester: Performs exploratory testing of the sbx CLI across platforms.
Build Engineer: Handles compilation, packaging, and dependency management.
Triage Agent: Categorizes incoming issues and prioritizes them.
Release Manager: Generates release notes and manages versioning.
Bug Fixer: Diagnoses and patches simple bugs autonomously.
Integration Tester: Verifies end-to-end workflows.
Documentation Agent: Keeps internal and external docs up-to-date.

These agents communicate through shared artifacts and logs, creating a virtual team that never sleeps.
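Communication through shared artifacts can be sketched as one agent writing a report file and another reading it later. The JSON layout, the file naming, and the finding shown here are hypothetical; the source does not describe the Fleet's actual artifact format.

```python
import json
import tempfile
from pathlib import Path


def publish_report(shared_dir: Path, agent: str, findings: list[dict]) -> Path:
    """Write one agent's findings as a JSON artifact in the shared directory."""
    path = shared_dir / f"{agent}.json"
    path.write_text(json.dumps({"agent": agent, "findings": findings}, indent=2))
    return path


def collect_findings(shared_dir: Path) -> list[dict]:
    """Read every report and tag each finding with the agent that filed it."""
    collected = []
    for path in sorted(shared_dir.glob("*.json")):
        report = json.loads(path.read_text())
        for finding in report["findings"]:
            collected.append({"reported_by": report["agent"], **finding})
    return collected


with tempfile.TemporaryDirectory() as tmp:
    shared = Path(tmp)
    # Illustrative finding only; not a real sbx bug.
    publish_report(
        shared,
        "cli-tester",
        [{"title": "hypothetical crash on empty config", "severity": "high"}],
    )
    publish_report(shared, "integration-tester", [])
    # A triage agent could consume the reports without talking to their authors.
    triage_queue = collect_findings(shared)

print(triage_queue)
```

Because the agents share only files, each role can run on its own schedule, and a human can inspect the same artifacts the agents read.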

The Impact on Shipping Speed

The Fleet doesn't just automate repetitive tasks; it speeds up the entire development process. Traditional test scripts are brittle: they fail on unexpected changes and require manual rewrites. Agent skills, by contrast, adapt. They explore new scenarios, investigate failures instead of halting at them, and report richer feedback than a simple pass/fail.

Because the same skills run locally and in CI, developers can experiment with new testing strategies right on their machines. The boundary between development and CI blurs. This tight loop means that by the time a pull request lands, it has already been vetted by a full fleet of agents.

Practical Results

Early metrics from the sbx team show significant reductions in time spent on regression testing, issue triage, and release management. The Fleet catches resource leaks and platform-specific bugs that traditional suites missed. Most importantly, the system scales: adding new agent roles is as simple as writing a new skill file.

Conclusion

Docker's Coding Agent Sandboxes team has demonstrated that a virtual agent fleet can revolutionize CI/CD workflows. By designing skills as role descriptions rather than scripts, and by prioritizing local-first development, they created a system that is both powerful and transparent. The Fleet offers a glimpse into a future where autonomous agents not only test code but actively participate in the entire software delivery pipeline—shipping faster, with higher quality, and with less manual overhead.

For more details, see the sections "Local-First Development for Faster Iteration" and "The Seven Agent Roles" above.
