Vertical Scaling
Every system starts with one server. Understanding where it breaks, and when to fight that limit versus redesign around it, is the foundation of every scaling conversation.
What Is Vertical Scaling?
Vertical scaling means upgrading your existing server to a more powerful one: more CPU, more RAM, faster disk, and usually a bigger bill. No new machines, no distributed coordination. Just a bigger box.
The opposite, horizontal scaling, adds more machines and distributes the load across them. That comes later.
Vertical scaling is the right first move for most systems. It is simpler to operate, easier to reason about, and usually cheaper than introducing distributed complexity too early.
The CPU Ceiling
A server has a fixed amount of capacity. Each incoming request consumes some of it. As traffic climbs, so does CPU utilization. Once you get close to saturation, latency rises sharply, queues form, and eventually requests fail.
Utilization climbs roughly linearly with traffic, but latency does not: past roughly 70–80% utilization, queueing delay grows nonlinearly, and the system feels far worse than the utilization graph suggests.
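One way to see why the danger zone feels so bad is the classic M/M/1 queueing approximation, where mean latency scales with 1 / (1 − utilization). This is a sketch of the model, not a benchmark:

```typescript
// M/M/1 queueing estimate: mean latency grows as 1 / (1 - utilization).
// Illustrative model only — real servers saturate in messier ways.
function latencyMultiplier(utilization: number): number {
  if (utilization >= 1) return Infinity; // saturated: the queue grows without bound
  return 1 / (1 - utilization);
}

// At 50% utilization latency is ~2x the idle service time;
// at 90% it is ~10x. That is the cliff behind "close to saturation".
const samples = [0.5, 0.7, 0.9, 0.95].map((u) => ({
  utilization: u,
  multiplier: latencyMultiplier(u),
}));
console.log(samples);
```

Going from 50% to 90% utilization only changes the graph a little, but multiplies queueing delay by five.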
Instance Type Comparison
| Instance | vCPU | RAM | Cost/mo | Max req/s |
|---|---|---|---|---|
| t3.micro | 2 | 1 GB | $8.50 | ~500 |
| t3.small | 2 | 2 GB | $17 | ~1000 |
| t3.medium | 2 | 4 GB | $34 | ~2000 |
| t3.large | 2 | 8 GB | $68 | ~4000 |
| t3.xlarge | 4 | 16 GB | $136 | ~8000 |
These numbers are modelled for this simulator, not copied from AWS benchmarks. CPU-heavy or memory-bound workloads will hit their ceiling differently.
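Under the modelled numbers above, cost and throughput both roughly double at each tier, so cost per unit of capacity stays flat until you run out of tiers. A quick sketch using the table's data:

```typescript
interface Instance {
  name: string;
  costPerMonth: number; // USD, from the table above
  maxReqPerSec: number; // modelled for this simulator, not AWS benchmarks
}

const instances: Instance[] = [
  { name: "t3.micro", costPerMonth: 8.5, maxReqPerSec: 500 },
  { name: "t3.small", costPerMonth: 17, maxReqPerSec: 1000 },
  { name: "t3.medium", costPerMonth: 34, maxReqPerSec: 2000 },
  { name: "t3.large", costPerMonth: 68, maxReqPerSec: 4000 },
  { name: "t3.xlarge", costPerMonth: 136, maxReqPerSec: 8000 },
];

// Dollars per month for each 1000 req/s of modelled capacity.
function costPerThousandReqs(i: Instance): number {
  return (i.costPerMonth / i.maxReqPerSec) * 1000;
}
```

In this model every tier works out to $17 per 1000 req/s, which is the best case for vertical scaling; in practice the largest instance sizes tend to cost disproportionately more per unit of capacity.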
Architecture: Before and After
In the single-server stack, the traffic flow is as simple as it gets: clients send requests to one server, which handles them end to end. After upgrading the server, the topology is identical; only the box has more capacity.
Dependency Structure
Even a monolithic deployment hides multiple layers inside one box: web serving, application logic, and storage all share the same CPU, memory, and failure domain.
Amdahl's Law
Theoretical speedup from more compute is bounded by the part of the workload that cannot be parallelized.
For a workload where a fraction p is parallelizable and you add N processors:

Speedup = 1 / ((1 − p) + p / N)

If 80% of your work is parallelizable and you add 8 cores:

Speedup = 1 / (0.2 + 0.8 / 8) = 1 / 0.3 ≈ 3.3x

Not 8x. The serial 20% is still your bottleneck.
Amdahl's law is one reason vertical scaling has a ceiling. Once a single-threaded bottleneck dominates, adding more compute stops buying you much.
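The formula above is small enough to compute directly, including the limit as processor count goes to infinity:

```typescript
// Amdahl's law: speedup from n processors when fraction p of the work
// is parallelizable. The serial fraction (1 - p) bounds the speedup.
function amdahlSpeedup(p: number, n: number): number {
  return 1 / ((1 - p) + p / n);
}

amdahlSpeedup(0.8, 8);        // ≈ 3.33 — not 8
amdahlSpeedup(0.8, Infinity); // 5 — the hard ceiling when 20% is serial
```

Even with infinite cores, an 80%-parallel workload never gets past 5x. That ceiling, not hardware price lists, is the fundamental limit on buying a bigger box.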
Code: Detecting the Ceiling
```typescript
import os from "os";

// Aggregate CPU utilization in percent across all cores.
// Note: os.cpus() reports times accumulated since boot, so this is a
// since-boot average; for a live reading, sample twice and diff.
export function getCpuLoad(): number {
  const cpus = os.cpus();
  const total = cpus.reduce((sum, cpu) => {
    const times = Object.values(cpu.times);
    return sum + times.reduce((a, b) => a + b, 0);
  }, 0);
  const idle = cpus.reduce((sum, cpu) => sum + cpu.times.idle, 0);
  return Math.round(((total - idle) / total) * 100);
}

export function healthStatus(cpu: number): "critical" | "stressed" | "healthy" {
  if (cpu > 90) return "critical";
  if (cpu > 70) return "stressed";
  return "healthy";
}
```

When to Stop Scaling Vertically
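Because `os.cpus()` times accumulate since boot, `getCpuLoad` converges toward a long-run average. A sketch of a live reading instead takes two snapshots and diffs them (the helper names here are illustrative, not part of any library):

```typescript
import os from "os";

// Snapshot total and idle CPU time (ms since boot) across all cores.
function snapshot(): { total: number; idle: number } {
  return os.cpus().reduce(
    (acc, cpu) => {
      const t = Object.values(cpu.times).reduce((a, b) => a + b, 0);
      return { total: acc.total + t, idle: acc.idle + cpu.times.idle };
    },
    { total: 0, idle: 0 }
  );
}

// CPU utilization (%) over a short sampling window, by diffing snapshots.
export async function sampleCpuLoad(windowMs = 500): Promise<number> {
  const a = snapshot();
  await new Promise((resolve) => setTimeout(resolve, windowMs));
  const b = snapshot();
  const total = b.total - a.total;
  const idle = b.idle - a.idle;
  return total > 0 ? Math.round(((total - idle) / total) * 100) : 0;
}
```

Feeding `sampleCpuLoad` into `healthStatus` gives a reading that reflects the last half second of load rather than the lifetime of the machine.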
Vertical scaling is usually the first answer, not the final one. Stop when:
- You are near the largest practical instance size.
- Cost is rising faster than the headroom you gain.
- A single machine or AZ is still your whole failure domain.
- You need capacity beyond what one server can safely absorb.
At that point, the conversation shifts to load balancing, stateless services, and horizontal scaling.
Practice This Concept
The simulator lets you see this failure mode, and its fix, in action.