Infinitely scalable flat storage for blobs, images, and backups.
If you are new here: Object storage (S3, GCS, Azure Blob) stores immutable blobs addressed by bucket + key. You PUT whole objects and GET them by key — no directories in the protocol (though prefixes look like folders in UIs).
| Operation | Typical API |
|---|---|
| Upload | PutObject, multipart for large files |
| Download | GetObject |
| List | ListObjectsV2 with prefix |
| Share | Presigned URL (time-limited) |
You need to store billions of images, logs, and model artifacts cheaply and durably without managing RAID or NFS clusters. Object storage (e.g. S3) offers a simple API: PUT and GET bytes by key.
In plain terms: object storage is giant, cheap blob storage with HTTP verbs — great for immutable files, awkward for tiny random rewrites like a database page.
Analogy: An infinite warehouse where every box has a barcode (s3://bucket/key) — you replace whole boxes, not patch byte 7 inside a box.
Upload the entire object (often multipart for large files). Download by key. Integrity is checked with checksums — the service handles replication under the hood.
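The upload flow above can be sketched in a few lines of Python. This is a toy model, not a real client: `split_into_parts`, `receive_part`, and `multipart_upload` are hypothetical names, the "service" is simulated in-process, and SHA-256 stands in for whichever checksum a given provider uses. Real multipart uploads also enforce minimum part sizes (S3 requires at least 5 MiB for every part except the last), which the tiny sizes here ignore.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def split_into_parts(data: bytes, part_size: int) -> list[bytes]:
    """Cut a payload into fixed-size parts; the last part may be shorter."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]


def receive_part(body: bytes, claimed_sha256: str) -> bytes:
    """Simulated service side: accept a part only if its checksum matches."""
    if sha256_hex(body) != claimed_sha256:
        raise ValueError("checksum mismatch: part corrupted in transit")
    return body


def multipart_upload(data: bytes, part_size: int) -> bytes:
    """Send each part with its checksum and return the reassembled object."""
    stored = []
    for part in split_into_parts(data, part_size):
        # The client computes the checksum; the service re-checks it on receipt.
        stored.append(receive_part(part, sha256_hex(part)))
    return b"".join(stored)
```

A corrupted part fails at `receive_part` instead of silently becoming part of the stored object, which is the point of sending a checksum with every request.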
CLI sketch:
aws s3 cp ./report.pdf s3://my-bucket/reports/2024/report.pdf
Providers copy objects across multiple AZs (availability zones: independent data centers within the same region). The service is designed for an extremely low probability of losing stored objects, which is distinct from backup against accidental deletes.
You still need: versioning or backups against DeleteObject mistakes.
After an overwrite, edge caches and metadata may lag briefly, so some readers can see the old version for a short time. Design uploads to be idempotent, and use versioning or ETags (content fingerprints that change whenever the object changes) when you need strict read-your-writes semantics.
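As a rough illustration of how ETags support idempotent and conditional writes, here is a minimal in-memory sketch. `ToyBucket`, `PreconditionFailed`, and the `if_match` parameter are hypothetical names for this example only, and the ETag is computed as a SHA-256 digest as an assumption; real services treat the ETag as an opaque value and may derive it differently.

```python
import hashlib
from typing import Optional


class PreconditionFailed(Exception):
    """Raised when the caller's expected ETag does not match the stored object."""


class ToyBucket:
    """In-memory stand-in for a bucket; not a real storage client."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    @staticmethod
    def etag_of(body: bytes) -> str:
        # Assumption: fingerprint = SHA-256 of the content.
        return hashlib.sha256(body).hexdigest()

    def put(self, key: str, body: bytes, if_match: Optional[str] = None) -> str:
        """Replace the whole object; with if_match, only if the ETag still matches."""
        if if_match is not None:
            current = self._objects.get(key)
            if current is None or self.etag_of(current) != if_match:
                raise PreconditionFailed(key)
        self._objects[key] = body
        return self.etag_of(body)

    def get(self, key: str) -> tuple[bytes, str]:
        """Return the object body together with its current ETag."""
        body = self._objects[key]
        return body, self.etag_of(body)
```

Uploading the same bytes twice yields the same ETag, so a retry is harmless. A writer that passes the ETag it last read is rejected if someone else overwrote the object in between.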
Move older objects to cheaper storage classes as they age. Archive tiers suit compliance data that is rarely read, but restore jobs from Glacier-class tiers take longer.
| Class | Good for |
|---|---|
| Standard | Frequent access |
| Infrequent | Occasional access; cheaper storage, small retrieval cost |
| Archive | Compliance, rare reads |
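A lifecycle policy boils down to a rule that maps an object's age to a storage class. The sketch below shows that mapping with the three classes from the table. The 30-day and 365-day thresholds and the function name `storage_class_for` are assumptions chosen for illustration; real policies are configured on the bucket and applied by the provider.

```python
from datetime import date

# Hypothetical thresholds; pick values that match your own access patterns.
INFREQUENT_AFTER_DAYS = 30
ARCHIVE_AFTER_DAYS = 365


def storage_class_for(last_modified: date, today: date) -> str:
    """Pick a storage class from the object's age in days."""
    age_days = (today - last_modified).days
    if age_days >= ARCHIVE_AFTER_DAYS:
        return "Archive"
    if age_days >= INFREQUENT_AFTER_DAYS:
        return "Infrequent"
    return "Standard"
```

For example, `storage_class_for(date(2024, 1, 1), date(2024, 3, 1))` covers 60 days and returns `"Infrequent"`.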
Do not hammer object storage with tiny random writes: every change rewrites a whole object, so both latency and per-request cost work against you. Use block or file storage when the workload looks like a database or many small edits.
Presigned URLs let browsers upload/download directly to S3 — your app server stops being a bandwidth bottleneck for large files.
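The idea behind a presigned URL is that the app server signs the method, path, and expiry with a secret, and the storage service later verifies that signature without contacting the app server. The sketch below is a deliberately simplified HMAC scheme, not the real AWS Signature Version 4 format. `SECRET`, `presign`, `verify`, and the host `storage.example.com` are all hypothetical.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret-key"  # hypothetical shared secret, for illustration only


def _signature(method: str, path: str, expires_at: int) -> str:
    """HMAC over the fields the URL is allowed to authorize."""
    message = f"{method}\n{path}\n{expires_at}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def presign(method: str, path: str, now: int, ttl_seconds: int) -> str:
    """App-server side: build a URL that is valid until now + ttl_seconds."""
    expires_at = now + ttl_seconds
    query = urlencode({"expires": expires_at,
                       "sig": _signature(method, path, expires_at)})
    return f"https://storage.example.com{path}?{query}"


def verify(method: str, url: str, now: int) -> bool:
    """Storage side: accept only unexpired URLs with a matching signature."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    if now > expires_at:
        return False
    expected = _signature(method, parsed.path, expires_at)
    return hmac.compare_digest(sig, expected)
```

Because the method and path are part of the signed message, a URL issued for downloading one key cannot be reused to upload, or to read a different key. With a real SDK you would call its presigning helper instead of writing this yourself.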
Next: Erasure Coding is what cloud providers use inside object storage to achieve durability without storing three full copies of every byte.
Buckets hold keys — `folder/` is only a prefix in the key string; listing by prefix is fast, but there are no real directories on disk.
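To make the "prefixes, not directories" point concrete, here is a small sketch of how a prefix and delimiter listing can be computed over a flat set of keys. It loosely mirrors the Prefix, Delimiter, and CommonPrefixes behaviour of ListObjectsV2, but `list_objects` is a hypothetical helper working on an in-memory list, with no pagination.

```python
def list_objects(keys, prefix="", delimiter=""):
    """Return (contents, common_prefixes) for a flat collection of keys.

    Keys under the prefix that contain the delimiter in their remainder are
    rolled up into a single "folder-like" common prefix instead of being listed.
    """
    contents = []
    common_prefixes = []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            folder = prefix + rest.split(delimiter, 1)[0] + delimiter
            if folder not in common_prefixes:
                common_prefixes.append(folder)
        else:
            contents.append(key)
    return contents, common_prefixes


keys = [
    "reports/2024/report.pdf",
    "reports/2023/summary.pdf",
    "reports/index.html",
    "logs/app.log",
]
# Looks like opening the "reports" folder, but it is only string matching.
files, folders = list_objects(keys, prefix="reports/", delimiter="/")
```

Here `files` holds `reports/index.html` and `folders` holds `reports/2023/` and `reports/2024/`. The "folders" exist only because some keys happen to share those leading characters.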