Wandaeps

How Docker’s Fleet of AI Agents Accelerates Development

Published: 2026-05-01 18:45:06 | Category: AI & Machine Learning

Introduction

At Docker, the Coding Agent Sandboxes team (also known as “sbx”) runs a fleet of seven AI agents that autonomously test, triage issues, post release notes, and even fix bugs. This virtual team runs in CI, yet every agent is designed to run on a developer’s local machine first. The result is faster iteration, increased reliability, and a more sustainable workflow for managing a complex CLI tool.

Source: www.docker.com

The Fleet: A Virtual Agent Team

The project is built on top of Coding Agent Sandboxes, which provides secure, microVM-based isolation for running AI coding agents like Claude Code, Gemini, Codex, and Docker Agent. Each agent gets full autonomy inside its own sandbox—complete with its own Docker daemon, network, and filesystem—without affecting the host system. The team leveraged this infrastructure to create seven distinct agent roles, collectively called The Fleet.

How Skills Work

Each agent is powered by a Claude Code skill: a markdown file that acts as a role description rather than a script. The skill defines a persona, a set of responsibilities, and the tools the agent is allowed to use. For example, a skill might say, “You are the build engineer. Here’s what you know and how you make decisions.” This distinction is crucial because agents need judgment, not just rigid instructions. When a test fails unexpectedly, a script stops; a role investigates.
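To make the "role, not script" idea concrete, here is a minimal sketch in Python that models a skill as a persona, a list of responsibilities, and a list of allowed tools, then renders it as markdown. The `Skill` class, its field names, the section headings, and the example responsibilities and tool names are all assumptions made for illustration; the article does not show the actual layout of the team's skill files.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    """Hypothetical in-memory model of a role-style skill (not Docker's schema)."""

    name: str
    persona: str
    responsibilities: list[str] = field(default_factory=list)
    allowed_tools: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the role as a markdown document an agent could read."""
        lines = [f"# {self.name}", "", self.persona, "", "## Responsibilities"]
        lines += [f"- {item}" for item in self.responsibilities]
        lines += ["", "## Allowed tools"]
        lines += [f"- {tool}" for tool in self.allowed_tools]
        return "\n".join(lines) + "\n"


# Illustrative values only; the real build-engineer skill is not published here.
build_engineer = Skill(
    name="build-engineer",
    persona="You are the build engineer. Here's what you know and how you make decisions.",
    responsibilities=[
        "Build the CLI binaries for the current platform.",
        "When a build or test fails, investigate the cause before reporting.",
    ],
    allowed_tools=["Bash", "Read"],
)

print(build_engineer.to_markdown())
```

Note that nothing in the rendered file is a step-by-step procedure. It describes who the agent is and what it may touch, and leaves the sequence of actions to the agent's judgment.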

Local First, CI Second

The design principle behind The Fleet is deceptively simple: every skill runs on the developer’s machine first. The team didn’t start by writing GitHub workflows. Instead, they invoked the skill locally, watched the agent build binaries, exercise CLI commands, find issues, and report them. Only after tweaking the skill until it behaved correctly in the terminal did they wire it into a CI workflow.

Benefits of Local-First Development

This approach eliminates the painful “commit-push-wait-read-logs” cycle. When troubleshooting a CI-only agent, each iteration can take minutes. In contrast, a local agent runs in seconds. Developers see the agent think in real time: they observe where it gets confused, correct the skill file, and re-invoke immediately. That feedback loop, seconds instead of minutes per iteration, is what makes tuning a skill practical.

One Skill, Two Runtimes

CI is simply another runtime for the same skill. The /cli-tester skill, for instance, runs nightly on macOS, Linux, and Windows runners, yet it is identical to the skill invoked from a developer’s terminal. The workflow sets up the environment, checks out the code, and calls the skill, nothing more. There is no separate “CI version” and no translation layer. This consistency keeps behavior predictable and ensures that problems are debugged locally before they reach CI.
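The "one skill, two runtimes" idea can be sketched as a small Python helper in which the runtime is detected from the environment but never changes the command that invokes the skill. The function names are hypothetical, and the `claude -p "/<skill>"` invocation is an assumption about how the skill might be called non-interactively; the article does not show the real workflow step.

```python
def skill_command(skill: str) -> list[str]:
    """Build the agent invocation. It is identical for every runtime.

    Assumption: the skill is started as `claude -p "/<skill>"`. The source
    article does not show the actual command.
    """
    return ["claude", "-p", f"/{skill}"]


def detect_runtime(env: dict[str, str]) -> str:
    """GitHub Actions sets CI=true; anything else is treated as local."""
    return "ci" if env.get("CI", "").lower() == "true" else "local"


def plan(skill: str, env: dict[str, str]) -> dict[str, object]:
    """Describe a run: the runtime label differs, the command does not."""
    return {"runtime": detect_runtime(env), "argv": skill_command(skill)}


local = plan("cli-tester", {})
ci = plan("cli-tester", {"CI": "true"})

print(local)
print(ci)
```

Because `skill_command` takes no runtime argument, there is no place for a CI-specific variant to creep in. Anything that differs between the two environments has to live in setup code outside the skill.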

The CLI Tool at the Core

Docker’s sbx is a command-line tool that manages sandbox lifecycles: create, start, stop, remove, configure networking, mount workspaces, and more. It supports macOS, Linux, and Windows. Every release requires testing across platforms, upgrade paths, and sustained load to catch resource leaks. Traditionally, these tasks would be handled by test scripts and reporting tools. Instead, the team built agent roles that handle them autonomously.
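As an illustration of what "sustained load to catch resource leaks" can mean in practice, the Python sketch below fits a least-squares line to memory samples taken after each create/remove cycle and flags a steady upward trend. The article does not describe how the fleet detects leaks; the function names, the threshold, and the sample values are all hypothetical.

```python
def slope(samples: list[float]) -> float:
    """Least-squares slope of samples against their index (units per cycle)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


def looks_like_leak(samples: list[float], threshold_per_cycle: float = 1.0) -> bool:
    """Flag a run whose memory grows faster than the threshold per cycle."""
    return slope(samples) > threshold_per_cycle


# Made-up resident memory readings (MiB), one per sandbox lifecycle cycle.
steady = [512.0, 515.0, 511.0, 514.0, 512.0, 513.0]
leaking = [512.0, 520.0, 529.0, 537.0, 546.0, 554.0]

print(looks_like_leak(steady))
print(looks_like_leak(leaking))
```

A fixed threshold like this is exactly the kind of rigid rule a script applies. An agent in a role can use the same signal as a starting point and then investigate which command in the cycle is responsible.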


The Seven Agent Roles

The /cli-tester role is the one described in detail above, but the full fleet includes seven distinct agents, each with a specific domain:

  • Exploratory tester: Exercises the CLI commands to find edge cases and unexpected failures.
  • Triage agent: Scans the issue backlog, categorizes, and prioritizes incoming reports.
  • Release notes writer: Automatically compiles changelogs and updates from completed work.
  • Bug fixer: Investigates reported bugs and proposes patches.
  • Integration tester: Verifies that the sandbox works correctly with different AI coding agents.
  • Performance monitor: Runs load tests to detect memory leaks or slowdowns.
  • Documentation reviewer: Checks that the project’s documentation stays current with new features.

Each role uses the same skill-based architecture, ensuring consistency and maintainability.

Shipping Faster with The Fleet

The real payoff is speed. By offloading repetitive but cognitively demanding tasks to autonomous agents, the human team can focus on higher-level design decisions and urgent issues. The fleet runs continuously in CI, but because it is built from the same skills used locally, debugging and improving an agent takes seconds rather than a full CI round trip. This blurring of the line between local development and CI is what makes The Fleet practical and powerful.

Future Directions

The team intends to expand the fleet with more specialized roles, such as a security auditor and a dependency updater. The modular design also allows external contributors to write their own skills for integration with other tools. As AI coding agents become more capable, Docker’s approach offers a blueprint for how human and AI teammates can collaborate seamlessly.

For more details on how the sandbox isolation works, see “The Fleet: A Virtual Agent Team.” To understand the skill file format, revisit “How Skills Work.”