Object Storage

Flat, practically unlimited storage for blobs such as images, logs, and backups.

If you are new here: Object storage (S3, GCS, Azure Blob) stores immutable blobs addressed by bucket + key. You PUT whole objects and GET them by key — no directories in the protocol (though prefixes look like folders in UIs).

Operation   Typical API
Upload      PutObject, multipart for large files
Download    GetObject
List        ListObjectsV2 with prefix
Share       Presigned URL (time-limited)
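
To make the bucket + key model concrete, here is a minimal in-memory sketch. The `Bucket` class and its method names are hypothetical stand-ins that mirror the operations in the table; they are not a real client library.

```python
class Bucket:
    """Toy in-memory bucket: a flat mapping from key to immutable bytes."""

    def __init__(self):
        self._objects = {}

    def put_object(self, key, body):
        # A PUT replaces the whole object; there is no partial update.
        self._objects[key] = bytes(body)

    def get_object(self, key):
        # Raises KeyError when the key does not exist.
        return self._objects[key]

    def list_objects(self, prefix=""):
        # Listing is a string-prefix filter over a sorted flat keyspace.
        return sorted(k for k in self._objects if k.startswith(prefix))


demo_bucket = Bucket()
demo_bucket.put_object("reports/2024/report.pdf", b"v1")
demo_bucket.put_object("reports/2023/report.pdf", b"old")
demo_bucket.put_object("images/cat.png", b"png-bytes")
demo_bucket.put_object("reports/2024/report.pdf", b"v2")  # whole-object overwrite

print(demo_bucket.get_object("reports/2024/report.pdf"))
print(demo_bucket.list_objects(prefix="reports/"))
```

The "folders" here exist only because the keys happen to share a prefix; the store itself never creates a directory.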

The Problem

You need to store billions of images, logs, and model artifacts cheaply and durably without managing RAID or NFS clusters. Object storage (e.g. S3) offers a simple API: PUT and GET bytes by key.

In plain terms: object storage is giant, cheap blob storage with HTTP verbs — great for immutable files, awkward for tiny random rewrites like a database page.

Analogy: An infinite warehouse where every box has a barcode (s3://bucket/key) — you replace whole boxes, not patch byte 7 inside a box.

PUT and GET

Upload the entire object (often multipart for large files). Download by key. Integrity is checked with checksums — the service handles replication under the hood.
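
A rough sketch of the multipart idea: split the payload into numbered parts, checksum each part, and reassemble in order. The helper names and the tiny `part_size` are assumptions for the demo; real services enforce much larger minimum part sizes and define their own checksum headers.

```python
import hashlib


def split_into_parts(data, part_size):
    """Yield (part_number, chunk) pairs, numbering parts from 1."""
    for offset in range(0, len(data), part_size):
        yield offset // part_size + 1, data[offset:offset + part_size]


def sha256_hex(data):
    return hashlib.sha256(data).hexdigest()


payload = b"x" * 2500
part_size = 1000  # demo value only

parts = list(split_into_parts(payload, part_size))
part_checksums = {number: sha256_hex(chunk) for number, chunk in parts}

# Receiver side: verify every part, then join the parts in part-number order.
all_parts_ok = all(sha256_hex(chunk) == part_checksums[number] for number, chunk in parts)
reassembled = b"".join(chunk for _, chunk in sorted(parts))

print(len(parts), all_parts_ok, reassembled == payload)
```

Because each part carries its own checksum, a failed part can be retried on its own instead of restarting the whole upload.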

CLI sketch:

aws s3 cp ./report.pdf s3://my-bucket/reports/2024/report.pdf

Durability across zones

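Providers store each object redundantly across multiple AZs (availability zones: isolated groups of one or more data centers within the same region). S3 Standard, for example, is designed for 99.999999999% (11 nines) durability of stored objects. That durability protects against hardware and facility failures, not against accidental deletes or overwrites.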

You still need versioning or backups to recover from a mistaken DeleteObject.
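
The sketch below shows why versioning helps with a mistaken delete. It models a delete as a marker appended to the key's history, which is the general idea behind S3-style versioning; the `VersionedBucket` class and `restore_previous` method are hypothetical names for illustration.

```python
class VersionedBucket:
    """Toy versioned bucket: every key keeps its full history."""

    def __init__(self):
        self._versions = {}  # key -> list of bodies; None is a delete marker

    def put_object(self, key, body):
        self._versions.setdefault(key, []).append(bytes(body))

    def delete_object(self, key):
        # A delete only appends a marker; earlier versions stay stored.
        self._versions.setdefault(key, []).append(None)

    def get_object(self, key):
        history = self._versions.get(key, [])
        if not history or history[-1] is None:
            raise KeyError(key)
        return history[-1]

    def restore_previous(self, key):
        # Undo an accidental delete by removing the trailing delete marker.
        history = self._versions.get(key, [])
        if history and history[-1] is None:
            history.pop()


versioned = VersionedBucket()
versioned.put_object("logs/app.log", b"important")
versioned.delete_object("logs/app.log")

try:
    versioned.get_object("logs/app.log")
    looks_deleted = False
except KeyError:
    looks_deleted = True

versioned.restore_previous("logs/app.log")
recovered = versioned.get_object("logs/app.log")
print(looks_deleted, recovered)
```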

Consistency and caching

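Amazon S3 has provided strong read-after-write consistency for PUTs, overwrites, and deletes since December 2020, so a GET after a successful overwrite returns the new object. CDN or edge caches in front of the bucket can still serve a stale copy until it expires, and other providers or S3-compatible stores may offer weaker guarantees. Design idempotent uploads, and use versioning or ETags (content-derived identifiers that change when the object changes) to confirm that a reader sees the version it expects.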
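
One way to use ETags is as a precondition on writes, in the style of HTTP `If-Match`. The sketch below is an in-memory model only: `ConditionalStore` is hypothetical, a SHA-256 digest stands in for the ETag, and whether a given service supports conditional writes is something to check in its documentation.

```python
import hashlib


def etag_of(body):
    # Stand-in fingerprint; real services compute ETags in their own way.
    return hashlib.sha256(body).hexdigest()


class ConditionalStore:
    def __init__(self):
        self._objects = {}

    def put(self, key, body, if_match=None):
        """Write body; when if_match is given, refuse unless the current ETag matches."""
        current = self._objects.get(key)
        if if_match is not None and (current is None or etag_of(current) != if_match):
            return None  # precondition failed; caller should re-read and retry
        self._objects[key] = body
        return etag_of(body)

    def get(self, key):
        body = self._objects[key]
        return body, etag_of(body)


store = ConditionalStore()
first_etag = store.put("config.json", b'{"v": 1}')
_, seen_etag = store.get("config.json")              # writer A reads
store.put("config.json", b'{"v": 2}')                # writer B overwrites
stale_write = store.put("config.json", b'{"v": 9}', if_match=seen_etag)

_, current_etag = store.get("config.json")           # writer A re-reads
fresh_write = store.put("config.json", b'{"v": 3}', if_match=current_etag)
print(stale_write, fresh_write is not None)
```

Writer A's first attempt is rejected because the object changed after it was read, which is the lost-update case the ETag check is meant to catch.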

Lifecycle tiers

Lifecycle rules move older objects to cheaper storage classes. Archive tiers suit compliance data, but reading from Glacier-class tiers requires a restore job that can take minutes to hours.

Class        Good for
Standard     Frequent access
Infrequent   Cheaper, slight retrieval cost
Archive      Compliance, rare reads
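
A lifecycle rule is essentially a mapping from object age to storage class. The sketch below uses the class names from the table; the 30-day and 365-day thresholds are illustrative assumptions, not provider defaults.

```python
from datetime import date

# Hypothetical policy, checked from the oldest threshold down.
TRANSITIONS = [(365, "Archive"), (30, "Infrequent"), (0, "Standard")]


def storage_class_for(last_modified, today):
    """Pick the storage class an object of this age would be moved to."""
    age_days = (today - last_modified).days
    for min_age_days, storage_class in TRANSITIONS:
        if age_days >= min_age_days:
            return storage_class
    return "Standard"  # objects dated in the future stay in the default class


today = date(2024, 6, 1)
print(storage_class_for(date(2024, 5, 25), today))  # 7 days old
print(storage_class_for(date(2024, 3, 1), today))   # 92 days old
print(storage_class_for(date(2022, 1, 1), today))   # more than a year old
```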

Not a POSIX filesystem

Do not hammer object storage with tiny random writes: every change rewrites a whole object, so per-request latency and cost are a poor fit. Use block or file storage when the workload looks like a database or many small edits.
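
A quick back-of-the-envelope calculation shows the mismatch. The sizes are assumptions chosen for illustration: a database-style 8 KiB page edit inside a 1 GiB object.

```python
def write_amplification(object_size_bytes, changed_bytes):
    """Bytes uploaded per byte actually changed when the whole object is rewritten."""
    return object_size_bytes / changed_bytes


object_size = 1024 ** 3   # 1 GiB object (assumed)
page_size = 8 * 1024      # 8 KiB edit (assumed)

amplification = write_amplification(object_size, page_size)
print(amplification)  # 131072.0
```

Under these assumptions, every small edit uploads about 131,000 times more data than it changes.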

Trade-offs

Why this matters for you

Presigned URLs let browsers upload/download directly to S3 — your app server stops being a bandwidth bottleneck for large files.
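
The idea behind a presigned URL can be sketched with an HMAC over the method, path, and expiry time. This is a simplified illustration, not the signing algorithm any provider actually uses; `SECRET`, the host `storage.example.com`, and the function names are all hypothetical.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical signing key held only by the server


def _signature(method, path, expires_at):
    message = f"{method}\n{path}\n{expires_at}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def presign(method, path, now, ttl_seconds):
    """Return a URL that authorizes one method on one path until it expires."""
    expires_at = now + ttl_seconds
    query = urlencode(
        {"expires": expires_at, "signature": _signature(method, path, expires_at)}
    )
    return f"https://storage.example.com{path}?{query}"


def verify(method, url, now):
    """Storage-side check: signature must match and the URL must not be expired."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    expires_at = int(params["expires"][0])
    expected = _signature(method, parsed.path, expires_at)
    return now <= expires_at and hmac.compare_digest(expected, params["signature"][0])


issued_at = 1_700_000_000
url = presign("GET", "/my-bucket/reports/2024/report.pdf", issued_at, ttl_seconds=900)

print(verify("GET", url, issued_at + 60))    # inside the window
print(verify("GET", url, issued_at + 901))   # expired
print(verify("PUT", url, issued_at + 60))    # wrong method
```

The app server only signs a short string; the browser then moves the bytes directly to or from storage, which is why the server stops being the bottleneck.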

Next: Erasure Coding is what cloud providers use inside object storage to achieve durability without storing three full copies of every byte.


Buckets hold keys — `folder/` is only a prefix in the key string; listing by prefix is fast, but there are no real directories on disk.
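
Folder-style views are typically built by listing with a prefix and a delimiter, then grouping keys by the next path segment. The sketch below imitates that grouping over a plain list of keys; `list_with_delimiter` is a hypothetical helper, not a client API.

```python
def list_with_delimiter(keys, prefix="", delimiter="/"):
    """Return (objects, common_prefixes) directly under prefix, like a folder view."""
    objects, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Deeper keys collapse into one "folder" entry.
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)


keys = [
    "images/cat.png",
    "reports/2023/report.pdf",
    "reports/2024/report.pdf",
    "reports/index.html",
    "readme.txt",
]

print(list_with_delimiter(keys))
print(list_with_delimiter(keys, prefix="reports/"))
```

The "folders" in the result are computed from key strings at list time; nothing is stored for them.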