004 · LATENCY · THROUGHPUT · BANDWIDTH

Latency vs Throughput vs Bandwidth

Understand the three core performance axes and how they trade off.

If you are new here: People say “slow” for three different problems. Bandwidth is how wide the pipe could be. Throughput is how much you actually move. Latency is how long one round trip (or one request) takes. Fixing the wrong one wastes time and money.

Term         Unit / question                          Typical symptom when bad
Bandwidth    Max bits per second (ceiling)            Link pegged at 100% in graphs
Throughput   Achieved bits or requests per second     “We cap at 2k RPS no matter what”
Latency      Time (ms) per hop or end-to-end          High p95/p99, “feels laggy”

The Problem

Your dashboard shows a 10 Gbps link, green tiles everywhere, and plenty of idle CPU — yet users in another country say the app is sluggish. Meanwhile, your colleague is staring at a 100 Mbps circuit pegged at 99% utilization, and the issue is not "latency" at all: the system simply cannot move enough bytes per second to keep up.

Bandwidth, throughput, and latency are related, but they are not interchangeable. Mixing them up leads to wrong fixes: buying fatter pipes when the problem is serial request chains, or optimizing code when the NIC is already the bottleneck.

In plain terms: bandwidth is the size of the hose, throughput is water actually flowing, and latency is how long one drop takes to arrive — a wide hose does not fix long plumbing.

Bandwidth

Bandwidth is the maximum rate a link or interface can carry — the upper bound on bits per second.

Analogy: Think of it as the width of the pipe — the NIC, the fiber strand, or the VPC egress cap your cloud provider quotes.

It does not tell you how much traffic is actually moving. A 40 Gbps uplink can sit almost empty while a 1 Gbps link next to it runs hot. Bandwidth answers: "What is the ceiling?"
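A back-of-the-envelope sketch of what the ceiling does tell you: the best-case transfer time for a payload, before latency, overhead, or contention eat into it. The function name and the 1 GB example payload are ours.

```python
def min_transfer_seconds(payload_bytes: float, bandwidth_bps: float) -> float:
    """Lower bound on transfer time from bandwidth alone.

    Ignores latency, protocol overhead, and contention, so the real
    number is always worse. bandwidth_bps is bits per second.
    """
    return (payload_bytes * 8) / bandwidth_bps

# A 1 GB object on a 10 Gbps link: at best ~0.8 s of pure transfer time.
print(min_transfer_seconds(1e9, 10e9))  # 0.8
```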

Throughput

Throughput is the measured data rate under real conditions — how much useful payload crosses the link per second. It is always less than or equal to bandwidth (often much less), because of protocol overhead, contention, retransmits, and application behavior.

When ops says "we're doing 2.1 Gbps on a 10 Gbps port," that is throughput, not bandwidth. When you size a pipeline for "orders per second" or "egress to the analytics warehouse," you are usually talking throughput.
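To see why achieved throughput sits below the ceiling even on an uncongested link, here is a rough goodput estimate using standard Ethernet/IPv4/TCP framing sizes. The function name and defaults are ours, and real links lose more to retransmits, contention, and application stalls.

```python
def goodput_fraction(mss: int = 1460, headers: int = 40, l2_overhead: int = 38) -> float:
    """Fraction of raw link bandwidth left for application payload.

    mss:         TCP payload bytes per segment (1460 is typical for a
                 1500-byte Ethernet MTU)
    headers:     IPv4 (20) + TCP (20) header bytes, no options
    l2_overhead: Ethernet framing: 14 header + 4 FCS + 8 preamble
                 + 12 inter-frame gap
    """
    wire_bytes = mss + headers + l2_overhead
    return mss / wire_bytes

# ~95% before any retransmits or contention.
print(f"{goodput_fraction():.1%}")  # 94.9%
```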

Latency

Latency is delay: time from cause to effect. For request-driven systems, people often mean round-trip time (RTT) — how long until a packet goes out and an answer comes back — or p95/p99 latency for end-to-end requests.

Crucially, latency does not get "fixed" by making the pipe wider if the work is inherently sequential or far away. You can have huge bandwidth and miserable latency for a chatty protocol across regions, or low bandwidth and acceptable latency for tiny control messages.
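A crude way to feel RTT from code: time a TCP handshake, which costs roughly one round trip. This is a sketch, not a measurement tool; the host is a placeholder, and the number includes DNS resolution and kernel scheduling noise.

```python
import socket
import time

def rough_rtt_ms(host: str, port: int = 443) -> float:
    """Time a TCP handshake as a crude RTT estimate (one round trip,
    plus resolver and kernel noise; use ping or a real prober for
    numbers you will act on)."""
    start = time.perf_counter()
    socket.create_connection((host, port), timeout=5).close()
    return (time.perf_counter() - start) * 1000

print(f"{rough_rtt_ms('example.com'):.1f} ms")
```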

Worked example (latency vs bandwidth): Suppose RTT to your API is 200 ms and a page needs 50 sequential API calls (no batching). The floor time from networking alone is about 50 × 200 ms = 10 s — even on a 10 Gbps link, because each call waits for the last. Pipelining, batching, or moving data closer attacks latency; fatter pipes help when you are moving large payloads or many parallel streams.
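The same arithmetic as a small helper, so you can play with the knobs. The hypothetical `parallelism` parameter stands in for batching, pipelining, or fan-out; note that bandwidth appears nowhere in the formula.

```python
import math

def page_floor_seconds(rtt_ms: float, calls: int, parallelism: int = 1) -> float:
    """Network floor for a page that issues `calls` dependent API calls.

    Sequential calls pay one RTT each; batching or fanning out divides
    the length of the chain.
    """
    rounds = math.ceil(calls / parallelism)
    return rounds * rtt_ms / 1000

print(page_floor_seconds(200, 50))                  # 10.0 s, fully serial
print(page_floor_seconds(200, 50, parallelism=10))  # 1.0 s with 10-way batching
```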

Underutilized capacity

Underutilized capacity means high bandwidth and low throughput — you paid for a wide pipe but only a trickle uses it. That is not automatically a problem; it can be spare headroom for bursts.
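Utilization is just the ratio of the two, as in this one-liner; the 2.1 Gbps on a 10 Gbps port echoes the earlier ops example.

```python
def utilization(throughput_bps: float, bandwidth_bps: float) -> float:
    """How much of the ceiling you actually use; low can be healthy headroom."""
    return throughput_bps / bandwidth_bps

print(f"{utilization(2.1e9, 10e9):.0%}")  # 21%, plenty of burst headroom
```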

Saturation

Saturation means throughput has hit the bandwidth wall (or another limiting factor). Queues build, contention appears, and perceived "slowness" often comes from waiting in line — not from each hop being slow in isolation.
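Classic queueing theory makes the "waiting in line" effect concrete. The M/M/1 model below is a simplification (random arrivals, a single server), but the shape holds broadly: delay diverges as utilization approaches 1.

```python
def mm1_wait_ms(service_ms: float, utilization: float) -> float:
    """Mean time in system for an M/M/1 queue. Real systems differ,
    but the curve is the point: waiting explodes near saturation."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_ms / (1 - utilization)

for u in (0.5, 0.8, 0.9, 0.95, 0.99):
    print(f"{u:.0%} utilized -> {mm1_wait_ms(10, u):.0f} ms in system")
# 50% -> 20 ms, 90% -> 100 ms, 99% -> 1000 ms: same hops, 50x the wait.
```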

Diagnosing "Slow"

Before you change architecture, decide which beast you are hunting; the sketch after this list turns the symptoms into a first-pass check:

  • Latency-heavy symptoms: high p95 on a single request path, long RTT to a distant region, cold code paths, N+1 queries, lock contention — users wait on one trip through the system.
  • Throughput-heavy symptoms: sustained CPU or NIC pegged, requests per second flat at a ceiling, backlog that grows under steady load — the system cannot accept more total work, even if each request were instant.
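Something like the following can encode that triage. The thresholds here are invented for illustration and should be replaced with your own baselines.

```python
def triage(p95_ms: float, rtt_ms: float, link_utilization: float) -> str:
    """First-pass classification with made-up thresholds; tune them
    to your own baselines. This only encodes the section's vocabulary."""
    hints = []
    if link_utilization > 0.85:
        hints.append("throughput: link near saturation, expect queueing")
    if p95_ms > 5 * rtt_ms:
        hints.append("latency: requests spend far longer than one round trip")
    return "; ".join(hints) or "neither axis is obviously hot, look elsewhere"

print(triage(p95_ms=900, rtt_ms=40, link_utilization=0.3))
```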

Sometimes both apply. Fixing the wrong one wastes time and money.

Why this matters

Distributed systems are built from links, queues, and clocks. Bandwidth tells you the physical ceiling; throughput tells you how close you are to it; latency tells you how long each unit of work spends in flight and in queues. Getting the vocabulary right is the first step toward the right dashboard — and the right incident response.

[Interactive diagram, frame 1 of 6: Bandwidth is the maximum data rate the link can carry — the size of the pipe, not how much water is flowing right now.]