044 · GZIP · SNAPPY · COMPRESSION

Data Compression

Reduce storage cost and network bandwidth by encoding data more compactly.

If you are new here: Compression encodes data in fewer bits by exploiting patterns (repeated text, predictable numbers). Lossless means you get exactly the original back — required for JSON, code, and databases. Lossy throws away detail — fine for photos and video, wrong for invoices.

Type | Examples | Reversible?
Lossless | gzip, zstd, Snappy | Yes, bit-exact
Lossy | JPEG, MP4 | No, “good enough” for humans

The Problem

Your API response is 2 MB of JSON. Your user is on mobile. You hit send, and it takes 4 seconds. Now add Content-Encoding: gzip and the same response is 200 KB on the wire, a tenth of the bytes to transfer. JSON shrinks this well because it is full of patterns a compressor can exploit: repeated strings, smooth gradients, predictable sequences.

In plain terms: compression trades CPU time on both ends for fewer bytes on the wire or disk. That is usually a great deal for texty payloads and a bad deal for data that is random or already compressed.

Analogy: Packing a suitcase with vacuum bags — same clothes, smaller volume, but you spend time squeezing and later unpacking.

Lossless compression

gzip, zstd, Snappy — decompress to exactly the original bytes. Safe for source code, JSON, logs. Typical text ratios are dramatic; data with no patterns (like encrypted files or random noise) barely shrinks — there is nothing to compress.
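
To make the contrast concrete, here is a minimal sketch using Python's standard gzip module. The JSON-like payload is made up for illustration, and random bytes stand in for encrypted or already compressed data; exact sizes will vary from run to run.

```python
import gzip
import os

# Repetitive, text-like payload (illustrative): plenty of patterns to reuse.
text = b'{"status": "ok", "region": "eu-west-1"}\n' * 1000

# Random bytes stand in for encrypted or already compressed data.
noise = os.urandom(len(text))

packed_text = gzip.compress(text)
packed_noise = gzip.compress(noise)

# Lossless: decompressing returns exactly the original bytes.
assert gzip.decompress(packed_text) == text
assert gzip.decompress(packed_noise) == noise

print(f"text:  {len(text)} -> {len(packed_text)} bytes")
print(f"noise: {len(noise)} -> {len(packed_noise)} bytes")
```

The text line should report a small fraction of the input size, while the noise line should come out about the same size as the input, or slightly larger because of the gzip framing.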

Analogy: Think of it like writing 5× the letter A instead of AAAAA — you encode the pattern, not every character, and the receiver reconstructs the original exactly.
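
The "5× the letter A" idea is run-length encoding, and a toy version fits in a few lines. This is only a sketch of the analogy: gzip itself uses more elaborate dictionary and entropy coding (LZ77 plus Huffman codes), and the function names here are made up for illustration.

```python
from itertools import groupby

def rle_encode(text: str) -> list[tuple[str, int]]:
    """Collapse each run of identical characters into (character, count)."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]

def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand (character, count) pairs back into the original string."""
    return "".join(ch * count for ch, count in pairs)

encoded = rle_encode("AAAAABBBC")
print(encoded)  # [('A', 5), ('B', 3), ('C', 1)]

# The receiver reconstructs the original exactly.
assert rle_decode(encoded) == "AAAAABBBC"
```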

Lossy compression

JPEG and video codecs discard detail humans rarely notice. Files get much smaller, but the exact original cannot be recovered. That is fine for photos and streaming, wrong for source archives.
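
A toy illustration of why lossy output cannot be reversed: the sketch below keeps only the top 4 bits of each 8-bit sample. This is plain quantization chosen for brevity, not how JPEG or any video codec actually works, and the function names are hypothetical.

```python
def quantize(samples: list[int]) -> list[int]:
    """Keep only the top 4 bits of each 8-bit sample (0..255 -> 0..15)."""
    return [s >> 4 for s in samples]

def restore(codes: list[int]) -> list[int]:
    """Map each 4-bit code back to the midpoint of the range it covered."""
    return [(c << 4) + 8 for c in codes]

original = [12, 13, 200, 203, 255]
approx = restore(quantize(original))
print(approx)  # [8, 8, 200, 200, 248]
```

Each value is stored in half the bits and lands within 8 of the original, but 12 and 13 have become indistinguishable, so no decoder can tell them apart again.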

CPU vs bytes

Every compress and decompress burns CPU. On 10 Gbps internal networks the link is rarely the bottleneck, so lightweight codecs (or no compression at all) sometimes beat maximum-ratio gzip.

Knob | Effect
Higher compression level | Smaller bytes, more CPU
Snappy / LZ4 | Fast, modest ratio
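
One way to see the level knob for yourself is to time the standard zlib module at a few levels. The log-like sample below is invented for illustration, and the numbers depend on your data and machine. Snappy and LZ4 need third-party packages, so they are left out of this sketch.

```python
import time
import zlib

# Hypothetical log-like sample; real payloads will give different numbers.
data = b"".join(
    b"2024-01-01T00:00:%02d INFO request id=%d status=200 path=/api/items\n"
    % (i % 60, i)
    for i in range(20000)
)

sizes = {}
for level in (1, 6, 9):
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    elapsed_ms = (time.perf_counter() - start) * 1000
    # Every level is lossless; only size and CPU time differ.
    assert zlib.decompress(packed) == data
    sizes[level] = len(packed)
    print(f"level {level}: {len(packed)} bytes in {elapsed_ms:.1f} ms")
```

On repetitive text like this, higher levels usually produce smaller output and take longer, but it is worth measuring on your own payloads before picking a level.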

HTTP in practice

Clients list the encodings they accept in the Accept-Encoding request header. Servers reply with Content-Encoding: gzip and browsers decompress automatically, which reduces page weight for HTML and JSON APIs.

Response headers (conceptual):

HTTP/1.1 200 OK
Content-Type: application/json
Content-Encoding: gzip
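
A rough sketch of the server side, assuming a hypothetical helper called encode_response that is not tied to any particular web framework. It gzips the JSON body only when the client's Accept-Encoding header offers gzip, and its header parsing is deliberately simplified.

```python
import gzip
import json

def encode_response(payload: dict, accept_encoding: str) -> tuple[dict, bytes]:
    """Return (headers, body), gzip-compressing only if the client allows it."""
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json", "Vary": "Accept-Encoding"}
    # Simplified check: ignores q-values such as "gzip;q=0".
    offered = {
        token.split(";")[0].strip().lower()
        for token in accept_encoding.split(",")
    }
    if "gzip" in offered:
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return headers, body

payload = {"items": ["widget"] * 500}
headers, body = encode_response(payload, "gzip, deflate, br")
print(headers["Content-Encoding"], headers["Content-Length"])
```

In practice most web servers, reverse proxies, and frameworks can do this for you through configuration, so hand-rolled code like this is mainly useful for understanding what happens on the wire.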

When to skip

Do not gzip JPEG/PNG images or already compressed archives. Their bytes have little redundancy left, so you pay CPU for negligible savings.
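
The skip rule can be sketched as a small predicate. The media-type list and the 1 KB size cutoff are assumptions for illustration, not values taken from any standard; tune both for your own traffic.

```python
# Assumed list of media types that are already compressed; extend as needed.
ALREADY_COMPRESSED = (
    "image/jpeg",
    "image/png",
    "video/",
    "application/zip",
    "application/gzip",
)
MIN_SIZE_BYTES = 1024  # assumed cutoff: tiny bodies rarely repay the overhead

def should_compress(content_type: str, size: int) -> bool:
    """Return True when gzip is likely to be worth the CPU."""
    # Drop parameters such as "; charset=utf-8" before matching.
    media_type = content_type.split(";")[0].strip().lower()
    if size < MIN_SIZE_BYTES:
        return False
    return not media_type.startswith(ALREADY_COMPRESSED)

print(should_compress("application/json; charset=utf-8", 50_000))  # True
print(should_compress("image/jpeg", 50_000))  # False
```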

Trade-offs

Why this matters for you

Columnar databases and TSDBs often compress columns by default — understand the knob (compression level vs scan speed) when tuning warehouses.

Next: Erasure Coding is a related storage efficiency technique — instead of compressing bytes, it reduces the overhead of replication by reconstructing lost pieces from parity shards.


Run-length and dictionary coders exploit repetition — ‘aaaaa’ becomes a tiny token plus count; text and logs compress far better than encrypted noise.