Vertical Scaling
Every system starts with one server. Understanding where it breaks, and when to fight that limit versus redesign around it, is the foundation of every scaling conversation.
What Is Vertical Scaling?
Vertical scaling means upgrading your existing server to a more powerful one: more CPU, more RAM, faster disk, and usually a bigger bill. No new machines, no distributed coordination. Just a bigger box.
The opposite, horizontal scaling, adds more machines and distributes the load across them. That comes later.
Vertical scaling is the right first move for most systems. It is simpler to operate, easier to reason about, and usually cheaper than introducing distributed complexity too early.
The CPU Ceiling
A server has a fixed amount of capacity. Each incoming request consumes some of it. As traffic climbs, so does CPU utilization. Once you get close to saturation, latency rises sharply, queues form, and eventually requests fail.
Utilization climbs roughly linearly with traffic, but latency does not: past roughly 70–80% utilization, queueing delay grows nonlinearly, and the system feels far worse than the utilization graph suggests.
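One way to see why the danger zone feels so bad is the classic M/M/1 queueing approximation, where mean latency scales with 1 / (1 − utilization). This is a sketch of the model, not a benchmark:

```typescript
// M/M/1 queueing estimate: mean latency grows as 1 / (1 - utilization).
// Illustrative model only — real servers saturate in messier ways.
function latencyMultiplier(utilization: number): number {
  if (utilization >= 1) return Infinity; // saturated: the queue grows without bound
  return 1 / (1 - utilization);
}

// At 50% utilization latency is ~2x the idle service time;
// at 90% it is ~10x. That is the cliff behind "close to saturation".
const samples = [0.5, 0.7, 0.9, 0.95].map((u) => ({
  utilization: u,
  multiplier: latencyMultiplier(u),
}));
console.log(samples);
```

Going from 50% to 90% utilization only changes the graph a little, but multiplies queueing delay by five.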
Instance Type Comparison
| Instance | vCPU | RAM | Cost/mo | Max req/s |
|---|---|---|---|---|
| t3.micro | 2 | 1 GB | $8.50 | ~500 |
| t3.small | 2 | 2 GB | $17 | ~1000 |
| t3.medium | 2 | 4 GB | $34 | ~2000 |
| t3.large | 2 | 8 GB | $68 | ~4000 |
| t3.xlarge | 4 | 16 GB | $136 | ~8000 |
These numbers are modelled for this simulator, not copied from AWS benchmarks. CPU-heavy or memory-bound workloads will hit their ceiling differently.
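Under the modelled numbers above, cost and throughput both roughly double at each tier, so cost per unit of capacity stays flat until you run out of tiers. A quick sketch using the table's data:

```typescript
interface Instance {
  name: string;
  costPerMonth: number; // USD, from the table above
  maxReqPerSec: number; // modelled for this simulator, not AWS benchmarks
}

const instances: Instance[] = [
  { name: "t3.micro", costPerMonth: 8.5, maxReqPerSec: 500 },
  { name: "t3.small", costPerMonth: 17, maxReqPerSec: 1000 },
  { name: "t3.medium", costPerMonth: 34, maxReqPerSec: 2000 },
  { name: "t3.large", costPerMonth: 68, maxReqPerSec: 4000 },
  { name: "t3.xlarge", costPerMonth: 136, maxReqPerSec: 8000 },
];

// Dollars per month for each 1000 req/s of modelled capacity.
function costPerThousandReqs(i: Instance): number {
  return (i.costPerMonth / i.maxReqPerSec) * 1000;
}
```

In this model every tier works out to $17 per 1000 req/s, which is the best case for vertical scaling; in practice the largest instance sizes tend to cost disproportionately more per unit of capacity.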
Architecture: Before and After
In the single-server stack, the traffic flow is as simple as it gets: clients send requests to one server, which handles them end to end. After upgrading the server, the topology is identical; only the box has more capacity.
Dependency Structure
Even a monolithic deployment hides multiple layers inside one box: web serving, application logic, and storage all share the same CPU, memory, and failure domain.
Amdahl's Law
Theoretical speedup from more compute is bounded by the part of the workload that cannot be parallelized.
For a workload where a fraction p is parallelizable and you add N processors:

Speedup = 1 / ((1 − p) + p / N)

If 80% of your work is parallelizable and you add 8 cores:

Speedup = 1 / (0.2 + 0.8 / 8) = 1 / 0.3 ≈ 3.3x

Not 8x. The serial 20% is still your bottleneck.
Amdahl's law is one reason vertical scaling has a ceiling. Once a single-threaded bottleneck dominates, adding more compute stops buying you much.
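The formula above is small enough to compute directly, including the limit as processor count goes to infinity:

```typescript
// Amdahl's law: speedup from n processors when fraction p of the work
// is parallelizable. The serial fraction (1 - p) bounds the speedup.
function amdahlSpeedup(p: number, n: number): number {
  return 1 / ((1 - p) + p / n);
}

amdahlSpeedup(0.8, 8);        // ≈ 3.33 — not 8
amdahlSpeedup(0.8, Infinity); // 5 — the hard ceiling when 20% is serial
```

Even with infinite cores, an 80%-parallel workload never gets past 5x. That ceiling, not hardware price lists, is the fundamental limit on buying a bigger box.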
Code: Detecting the Ceiling
```typescript
import os from "os";

// Aggregate CPU utilization in percent across all cores.
// Note: os.cpus() reports times accumulated since boot, so this is a
// since-boot average; for a live reading, sample twice and diff.
export function getCpuLoad(): number {
  const cpus = os.cpus();
  const total = cpus.reduce((sum, cpu) => {
    const times = Object.values(cpu.times);
    return sum + times.reduce((a, b) => a + b, 0);
  }, 0);
  const idle = cpus.reduce((sum, cpu) => sum + cpu.times.idle, 0);
  return Math.round(((total - idle) / total) * 100);
}

export function healthStatus(cpu: number): "critical" | "stressed" | "healthy" {
  if (cpu > 90) return "critical";
  if (cpu > 70) return "stressed";
  return "healthy";
}
```

When to Stop Scaling Vertically
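Because `os.cpus()` times accumulate since boot, `getCpuLoad` converges toward a long-run average. A sketch of a live reading instead takes two snapshots and diffs them (the helper names here are illustrative, not part of any library):

```typescript
import os from "os";

// Snapshot total and idle CPU time (ms since boot) across all cores.
function snapshot(): { total: number; idle: number } {
  return os.cpus().reduce(
    (acc, cpu) => {
      const t = Object.values(cpu.times).reduce((a, b) => a + b, 0);
      return { total: acc.total + t, idle: acc.idle + cpu.times.idle };
    },
    { total: 0, idle: 0 }
  );
}

// CPU utilization (%) over a short sampling window, by diffing snapshots.
export async function sampleCpuLoad(windowMs = 500): Promise<number> {
  const a = snapshot();
  await new Promise((resolve) => setTimeout(resolve, windowMs));
  const b = snapshot();
  const total = b.total - a.total;
  const idle = b.idle - a.idle;
  return total > 0 ? Math.round(((total - idle) / total) * 100) : 0;
}
```

Feeding `sampleCpuLoad` into `healthStatus` gives a reading that reflects the last half second of load rather than the lifetime of the machine.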
Vertical scaling is usually the first answer, not the final one. Stop when:
- You are near the largest practical instance size.
- Cost is rising faster than the headroom you gain.
- A single machine or AZ is still your whole failure domain.
- You need capacity beyond what one server can safely absorb.
At that point, the conversation shifts to load balancing, stateless services, and horizontal scaling.
Practice This Concept
The simulator lets you see this failure mode, and its fix, in action.