What Is Auto-Scaling? A Plain-English Definition
Auto-scaling automatically adjusts compute resources based on demand. Here's how it works in practice and how OpenClaw handles it.
Auto-scaling is the practice of automatically adjusting compute resources based on real-time demand. When traffic spikes, new instances spin up to handle the load. When it subsides, instances are terminated to save costs.
How It Works
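Most auto-scaling systems rely on metrics like CPU utilization, memory usage, or request queue depth. You define a target metric and a range (e.g., keep CPU between 40–70%). The orchestrator then adds instances when the metric rises above that range and removes them when it falls below, pulling the metric back inside it.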
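To make the target-range idea concrete, here is a minimal Python sketch of a single scaling decision. It assumes a simple step policy that adds or removes one instance per evaluation. The names ScalingPolicy and desired_instances, and the min/max instance bounds, are hypothetical and are not part of OpenClaw or Fly.io.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical target-range policy: keep average CPU between low and high."""
    low: float = 40.0        # scale in when average CPU % drops below this
    high: float = 70.0       # scale out when average CPU % rises above this
    min_instances: int = 1   # never go below this many instances
    max_instances: int = 10  # never go above this many instances


def desired_instances(current: int, avg_cpu: float, policy: ScalingPolicy) -> int:
    """Return the instance count after one evaluation of the policy.

    Steps by one instance at a time and clamps the result to the policy's bounds.
    """
    if avg_cpu > policy.high:
        target = current + 1  # overloaded: add an instance
    elif avg_cpu < policy.low:
        target = current - 1  # underused: remove an instance
    else:
        target = current      # inside the target range: do nothing
    return max(policy.min_instances, min(policy.max_instances, target))


policy = ScalingPolicy()
print(desired_instances(3, 85.0, policy))   # 4
print(desired_instances(3, 55.0, policy))   # 3
print(desired_instances(3, 25.0, policy))   # 2
print(desired_instances(1, 10.0, policy))   # 1 (already at min_instances)
print(desired_instances(10, 95.0, policy))  # 10 (already at max_instances)
```

Production orchestrators usually also wait for the metric to stay out of range for a while (a cooldown or stabilization window) before acting, so brief spikes don't make the instance count flap. That refinement is left out here for brevity.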
OpenClaw and Auto-Scaling
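OpenClaw's deployment on Fly.io inherits Fly's native autoscaling. By default, Fly's proxy tracks request concurrency on each Machine. It starts a stopped Machine when the running ones exceed their soft_limit, and it stops Machines it no longer needs when traffic drops. It starts and stops Machines you have already created rather than creating new ones, so the number of Machines you provision sets the ceiling. You can tune this behavior in your fly.toml (auto_start_machines, auto_stop_machines, min_machines_running, and the concurrency soft_limit and hard_limit) or manage Machines directly through the Fly Machines API.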
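When proxy-driven autostart and autostop are enabled (auto_start_machines and auto_stop_machines in fly.toml), the start/stop decision is driven by request concurrency measured against each Machine's soft_limit. The Python model below is a simplified sketch of that idea, not Fly's actual proxy algorithm. It assumes one Machine is started or stopped per check, that load is the total number of concurrent requests across the app, and that a Machine is stopped only when the remaining ones could carry the load without exceeding soft_limit. The function name next_running_count is hypothetical; soft_limit and min_machines_running mirror the fly.toml settings of the same names.

```python
def next_running_count(
    running: int,
    provisioned: int,
    load: int,
    soft_limit: int,
    min_machines_running: int = 0,
) -> int:
    """Simplified model of one autostart/autostop check (hypothetical, not Fly's code).

    running: Machines currently started
    provisioned: Machines that exist, started or stopped (the hard ceiling)
    load: concurrent requests across the app
    soft_limit: per-Machine concurrency before another Machine should start
    """
    if running < min_machines_running:
        # Keep the configured minimum running, limited by what exists.
        return min(min_machines_running, provisioned)
    if load > running * soft_limit and running < provisioned:
        # Load exceeds what the running Machines handle under soft_limit
        # (or none are running): start one stopped Machine.
        return running + 1
    if running > min_machines_running and load <= (running - 1) * soft_limit:
        # One fewer Machine could carry the load within soft_limit: stop one.
        return running - 1
    return running


print(next_running_count(running=0, provisioned=3, load=5, soft_limit=20))   # 1
print(next_running_count(running=2, provisioned=3, load=45, soft_limit=20))  # 3
print(next_running_count(running=3, provisioned=3, load=90, soft_limit=20))  # 3
print(next_running_count(running=3, provisioned=3, load=30, soft_limit=20))  # 2
print(next_running_count(running=1, provisioned=3, load=0, soft_limit=20))   # 0
print(next_running_count(running=1, provisioned=3, load=0, soft_limit=20,
                         min_machines_running=1))                            # 1
```

The third call shows the ceiling: with every provisioned Machine already running, nothing more can be started, however high the load. The start threshold at N Machines matches the stop threshold at N + 1, so the model doesn't bounce between counts when load sits exactly on a boundary.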
When You Need It
Auto-scaling matters most for production workloads with variable traffic, such as API backends, chatbots, and data processing jobs. For personal projects or internal tools with steady traffic, the default settings are usually fine.
When You Don't
Side projects with predictable or low traffic don't need aggressive auto-scaling. Fly's default behavior absorbs occasional traffic bursts without extra configuration, as long as enough Machines are provisioned to cover the peak.