Auto-scaling is the practice of automatically adjusting compute resources based on real-time demand. When traffic spikes, new instances spin up to handle the load. When it subsides, instances are terminated to save costs.

How It Works

Most auto-scaling systems rely on metrics such as CPU utilization, memory usage, or request queue depth. You define a target metric and an acceptable range (e.g., keep CPU between 40% and 70%). The orchestrator adds instances when the metric rises above that range and removes them when it falls below it.
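
To make the target-range idea concrete, here is a minimal Python sketch of a threshold-based scaling decision. The names `ScalingPolicy` and `desired_instances` are hypothetical, the 40/70 thresholds come from the example above, and the instance limits are assumptions. Real orchestrators typically add cooldown periods and averaging windows that this sketch leaves out.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical policy: keep the metric inside [low, high]."""
    low: float = 40.0          # scale in when CPU % drops below this
    high: float = 70.0         # scale out when CPU % rises above this
    min_instances: int = 1     # assumed floor
    max_instances: int = 10    # assumed ceiling


def desired_instances(current: int, cpu_percent: float, policy: ScalingPolicy) -> int:
    """Return the instance count to move toward, one step at a time."""
    if cpu_percent > policy.high:
        target = current + 1   # overloaded: add an instance
    elif cpu_percent < policy.low:
        target = current - 1   # underused: remove an instance
    else:
        target = current       # inside the range: do nothing
    # Never go below the floor or above the ceiling.
    return max(policy.min_instances, min(policy.max_instances, target))


if __name__ == "__main__":
    policy = ScalingPolicy()
    for cpu in (85.0, 55.0, 20.0):
        print(f"cpu={cpu}% -> {desired_instances(3, cpu, policy)} instances")
```

Stepping by one instance per evaluation keeps the sketch simple. A production policy might instead scale in proportion to how far the metric sits from the target.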

OpenClaw and Auto-Scaling

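OpenClaw's deployment on Fly.io inherits Fly's native autoscaling. By default, the Fly Proxy watches incoming traffic against each Machine's concurrency limits and starts or stops existing Machines as needed; it does not scale on CPU or memory unless you add a metrics-based autoscaler. You can tune this behavior in your fly.toml or manage Machines directly via the Fly Machines API.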
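
As a rough illustration of tuning in fly.toml, the fragment below shows the proxy-level settings Fly exposes under `[http_service]`. Every value here is a placeholder chosen for illustration, not an OpenClaw default, and the port is an assumption.

```toml
# Illustrative fly.toml fragment; all values are placeholders, not OpenClaw defaults.
[http_service]
  internal_port = 8080          # assumed port; use the one your app listens on
  auto_stop_machines = "stop"   # stop Machines when there is excess capacity
  auto_start_machines = true    # start stopped Machines when traffic returns
  min_machines_running = 1      # keep at least one Machine up in the primary region

  [http_service.concurrency]
    type = "requests"
    soft_limit = 200            # above this, the proxy prefers other Machines
    hard_limit = 250            # above this, the Machine receives no more requests
```

Lowering `soft_limit` makes the proxy bring additional Machines online sooner. Raising `min_machines_running` trades cost for fewer cold starts.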

When You Need It

Auto-scaling matters most for production workloads with variable traffic, such as API backends, chatbots, and data processing jobs. For personal projects or internal tools with steady traffic, the default settings are usually fine.

When You Don't

Side projects with predictable or low traffic don't need aggressive auto-scaling. Fly's default behavior handles occasional traffic bursts without configuration.
