
Object Storage

Effectively unlimited, flat storage for blobs, images, and backups.

If you are new here: Object storage stores whole blobs behind a bucket and key. You do not mount it like a local disk. You call HTTP-style APIs such as PutObject, GetObject, and ListObjectsV2, and the service handles durability, replication, access control, and lifecycle movement behind the scenes.

Term	Plain meaning
Bucket	Top-level container, like `my-app-uploads`
Key	Object name, like `users/42/avatar.png`; prefixes look like folders but are not real directories
Object	The bytes plus metadata: content type, ETag, encryption state, tags
Presigned URL	Temporary URL that lets a browser upload or download without your server proxying the bytes
Lifecycle rule	Policy that transitions objects to another storage class or deletes them, based on age and filtered by prefix or tag
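
To make the "prefixes are not real directories" row concrete, here is a minimal Python sketch of a flat key space. The `list_keys` helper and the sample keys are hypothetical. It only imitates how a prefix-plus-delimiter listing (in the style of ListObjectsV2) groups keys, and is not a client for any real service.

```python
# Toy model: a bucket is a flat mapping from key string to bytes.
bucket = {
    "users/42/avatar.png": b"...",
    "users/42/invoice.pdf": b"...",
    "users/7/avatar.png": b"...",
    "logs/2026-01-01.gz": b"...",
}


def list_keys(bucket, prefix="", delimiter=None):
    """Return (keys, common_prefixes) for keys starting with prefix.

    With a delimiter, keys that contain it after the prefix are rolled up
    into a "common prefix", which is what makes prefixes look like folders.
    """
    keys, common = [], set()
    for key in sorted(bucket):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            common.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            keys.append(key)
    return keys, sorted(common)


print(list_keys(bucket, prefix="users/", delimiter="/"))
print(list_keys(bucket, prefix="users/42/", delimiter="/"))
```

The "folders" in the first listing exist only because the keys happen to share a string prefix; nothing was ever created for `users/42/` itself.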

The Problem

Your product lets users upload images, invoices, exports, and logs. At first, saving files to the app server feels obvious: write to /uploads, store the path in Postgres, serve it later. Then you add a second app instance. Half the files are on one machine, half are on another. Backups become scary. Deploys risk deleting local state. Big downloads turn your API fleet into a bandwidth bottleneck.

That is the gap object storage fills. It gives you a durable blob store that scales independently from your compute tier.

In plain terms: object storage is giant, cheap blob storage with an HTTP API. It is excellent for whole-file reads and writes, and awkward for tiny random edits.

Analogy: Think of a warehouse where every box has a barcode. You can put a box on a shelf, fetch it by barcode, list boxes with a prefix, or move old boxes to cheaper storage. You do not open a box and rewrite byte 7 in place.

Tiny example: Instead of sending a 200 MB video through your API server, the backend creates a presigned upload URL. The browser uploads directly to s3://media-bucket/videos/abc.mp4, and your app stores only the object key in the database.
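
The idea behind a presigned URL can be sketched with a plain HMAC signature and an expiry time. This is a deliberately simplified illustration, not AWS Signature Version 4: the secret, the host `storage.example.com`, and the `presign` and `verify` helpers are all assumptions made for the example. With a real SDK you would call its presign helper instead (for example, boto3's `generate_presigned_url`).

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical signing key held only by the backend


def sign(method, path, expires):
    message = f"{method}\n{path}\n{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def presign(method, path, ttl_seconds, now=None):
    """Backend side: build a URL that is valid for ttl_seconds."""
    now = int(time.time()) if now is None else now
    expires = now + ttl_seconds
    query = urlencode({"expires": expires, "signature": sign(method, path, expires)})
    return f"https://storage.example.com{path}?{query}"


def verify(method, url, now=None):
    """Storage side: accept the request only if signed and not expired."""
    now = int(time.time()) if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    expires = int(params["expires"][0])
    expected = sign(method, parsed.path, expires)
    signature_ok = hmac.compare_digest(expected, params["signature"][0])
    return signature_ok and now <= expires


url = presign("PUT", "/media-bucket/videos/abc.mp4", ttl_seconds=900, now=1_000)
print(verify("PUT", url, now=1_500))   # within the 15-minute window
print(verify("PUT", url, now=2_000))   # after expiry
print(verify("GET", url, now=1_500))   # method was not the one signed
```

The browser never sees the secret. It only receives a URL that authorizes one method on one key for a limited time.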

PUT and GET

The basic API is intentionally small: upload the whole object, then fetch it later by key. Large uploads usually use multipart upload, where the client sends chunks and the service assembles them into one object. Reads can also request byte ranges, but the object is still conceptually one immutable blob.
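
As a rough sketch of how a client might split a large upload into parts, the helper below computes inclusive byte ranges. The function names and the default part size are assumptions for illustration. The 5 MiB minimum part size and 10,000-part ceiling are Amazon S3's documented multipart limits, and other services may differ. The same ranges can be expressed as HTTP `Range` header values for partial reads.

```python
MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB   # S3's documented minimum for every part except the last
MAX_PARTS = 10_000        # S3's documented maximum number of parts


def plan_parts(total_size, part_size=8 * MIB):
    """Return [(part_number, first_byte, last_byte)] covering total_size bytes."""
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the minimum part size")
    count = -(-total_size // part_size)  # ceiling division
    if count > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part_size")
    parts = []
    for number in range(1, count + 1):
        first = (number - 1) * part_size
        last = min(first + part_size, total_size) - 1
        parts.append((number, first, last))
    return parts


def range_header(first, last):
    """HTTP Range header value for an inclusive byte range."""
    return f"bytes={first}-{last}"


parts = plan_parts(200 * MIB, part_size=64 * MIB)
for number, first, last in parts:
    print(number, range_header(first, last))
```

Each part can be retried independently, which is a large part of why multipart upload copes better with flaky connections than one long request.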

In plain terms: you replace objects; you do not edit them like a file opened with vim.

CLI sketch:

aws s3 cp ./report.pdf s3://my-bucket/reports/2026/report.pdf
aws s3 cp s3://my-bucket/reports/2026/report.pdf ./report.pdf

The key design choice is where the bytes flow. For small files, your app server can accept the upload and forward it to the bucket. For large files, direct browser-to-object-storage upload is usually better: less API CPU, less API bandwidth, fewer timeout problems.

Durability across zones

Object stores are designed to survive disk, node, and availability-zone failures. Providers replicate or erasure-code objects across independent failure domains so one broken rack does not erase your data.

Important distinction: durability is not the same as backup. The service can protect bytes from hardware loss and still faithfully carry out your accidental delete. If your code calls DeleteObject on an unversioned bucket, the platform deletes the object exactly as asked.

Concrete sketch: For user uploads, enable versioning for important buckets, block public access by default, and use lifecycle rules to expire old versions after a retention window. That gives you hardware durability plus a recovery path for human and application mistakes.
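
The recovery path that versioning provides can be illustrated with an in-memory toy. `VersionedBucket` and its methods are hypothetical names, and the model is much simpler than any real service: each put appends a version, and a delete appends a marker instead of destroying history.

```python
class VersionedBucket:
    """In-memory toy: every put adds a version, delete adds a marker."""

    DELETE_MARKER = object()

    def __init__(self):
        self._versions = {}  # key -> list of bodies, oldest first

    def put(self, key, body):
        self._versions.setdefault(key, []).append(body)

    def delete(self, key):
        # A versioned delete hides the object instead of destroying its history.
        self._versions.setdefault(key, []).append(self.DELETE_MARKER)

    def get(self, key):
        history = self._versions.get(key, [])
        if not history or history[-1] is self.DELETE_MARKER:
            raise KeyError(key)
        return history[-1]

    def restore_previous(self, key):
        """Recover from a mistake by dropping the newest version or marker."""
        history = self._versions.get(key, [])
        if len(history) < 2:
            raise KeyError(f"no earlier version of {key}")
        history.pop()


bucket = VersionedBucket()
bucket.put("invoices/1001.pdf", b"original")
bucket.delete("invoices/1001.pdf")        # the accidental delete
bucket.restore_previous("invoices/1001.pdf")
print(bucket.get("invoices/1001.pdf"))
```

Old versions still occupy storage, which is why the retention window and the lifecycle rule that expires them belong in the same design.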

Consistency and caching

Modern object stores generally give strong read-after-write consistency. Amazon S3, for example, has been strongly consistent for new objects, overwrites, deletes, and listings since December 2020. The bigger system around the store may still feel stale: CDN edges, browser caches, metadata indexes, and application-level lists can lag behind the object write.

In plain terms: the object may be saved, but every cache and listing path in front of it might not agree immediately.

Use ETags or versioned keys when freshness matters. A key like `avatar-v17.png` is easier to cache safely than repeatedly overwriting `avatar.png` and hoping every edge forgets the old copy at the same time.

Tiny example: A user uploads a new profile photo. The object write succeeds in 200 ms, but the CDN keeps serving the old image at the same URL for 5 minutes because of its Cache-Control max-age. Changing the key or issuing an invalidation fixes the user-visible stale image.
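
One common way to get versioned keys is to derive part of the key from the content itself. The sketch below is an assumption about how that might look: the `versioned_key` helper, the 12-character digest length, and the Cache-Control value are illustrative choices, not requirements of any service.

```python
import hashlib


def versioned_key(prefix, filename, body):
    """Build a key that changes whenever the bytes change."""
    digest = hashlib.sha256(body).hexdigest()[:12]
    stem, dot, ext = filename.rpartition(".")
    if not dot:  # filename had no extension
        return f"{prefix}/{filename}-{digest}"
    return f"{prefix}/{stem}-{digest}.{ext}"


old_key = versioned_key("users/42", "avatar.png", b"old photo bytes")
new_key = versioned_key("users/42", "avatar.png", b"new photo bytes")
print(old_key)
print(new_key)

# Because the key is unique per content, it can be cached for a long time.
CACHE_CONTROL = "public, max-age=31536000, immutable"
```

The application then stores the new key in its database, so the next page render points at a URL no cache has ever seen.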

Lifecycle tiers

Object storage is not one price. Hot objects cost more to store but are cheap and fast to read. Archive tiers cost less per GB-month but charge retrieval fees and may take minutes or hours to restore.

In plain terms: lifecycle policies turn storage into a time-based cost model. Fresh logs stay hot. Old compliance archives move cold.

Object age	Common choice	Why
0-30 days	Standard	Frequent reads, fast restore
30-180 days	Infrequent access	Lower storage cost, occasional reads
180+ days	Archive	Compliance or rare recovery

The trap is moving data cold before you understand access patterns. A cheap archive tier can become expensive if your analytics job rehydrates terabytes every morning.
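
The arithmetic behind that trap is easy to sketch. The prices below are hypothetical round numbers chosen only to show the shape of the trade-off, and the tier names and helper functions are likewise assumptions. Real prices vary by provider, region, and storage class.

```python
# Hypothetical prices for illustration only; real prices vary by provider,
# region, and storage class.
TIERS = [
    # (name, minimum age in days, storage $/GB-month, retrieval $/GB)
    ("standard", 0, 0.023, 0.00),
    ("infrequent_access", 30, 0.0125, 0.01),
    ("archive", 180, 0.004, 0.03),
]


def tier_for_age(age_days):
    """Pick the coldest tier whose minimum age the object has reached."""
    chosen = TIERS[0]
    for tier in TIERS:
        if age_days >= tier[1]:
            chosen = tier
    return chosen[0]


def monthly_cost(tier_name, stored_gb, retrieved_gb_per_month):
    """Storage plus retrieval cost for one month under the toy price table."""
    for name, _min_age, storage_price, retrieval_price in TIERS:
        if name == tier_name:
            return stored_gb * storage_price + retrieved_gb_per_month * retrieval_price
    raise ValueError(f"unknown tier: {tier_name}")


print(tier_for_age(10), tier_for_age(45), tier_for_age(400))

# 10 TB stored; an analytics job that reads 2 TB every day for 30 days.
stored_gb, read_gb = 10_000, 2_000 * 30
for name in ("standard", "archive"):
    print(name, round(monthly_cost(name, stored_gb, read_gb), 2))
```

Under these assumed prices the archive tier wins easily when nothing is read, and loses badly once the daily job starts pulling data back out.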

Not a POSIX filesystem

Object storage does not provide normal filesystem semantics: no directory inode tree, no POSIX locks, no cheap append, and no low-latency random overwrites. Some tools make buckets look like mounted folders, but that is an adapter, not the native model.

Analogy: A mounted bucket is like a translation app in a conversation. Useful, but every sentence still passes through a layer that can misunderstand the culture.

Use block storage for database files, file storage for shared POSIX-ish directories, and object storage for blobs, backups, exports, logs, and data lake files.
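
To see why "no cheap append" matters, consider a toy store that only supports whole-object put and get. `FlatStore` and `append_line` are hypothetical, and the sketch ignores any native append or compose features a particular provider might offer. It simply counts how many bytes are uploaded when append is emulated by rewriting the object.

```python
class FlatStore:
    """Toy object store: whole-object put and get, nothing else."""

    def __init__(self):
        self._objects = {}
        self.bytes_uploaded = 0

    def put(self, key, body):
        self._objects[key] = bytes(body)
        self.bytes_uploaded += len(body)

    def get(self, key):
        return self._objects[key]


def append_line(store, key, line):
    """Emulate append: download the object, extend it, upload all of it again."""
    try:
        current = store.get(key)
    except KeyError:
        current = b""
    store.put(key, current + line)


store = FlatStore()
for i in range(1000):
    append_line(store, "logs/app.log", b"x" * 100)

print(len(store.get("logs/app.log")))  # size of the final object
print(store.bytes_uploaded)            # total bytes sent to produce it
```

The upload volume grows quadratically with the number of appends, which is why log pipelines usually buffer locally and write one object per batch.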

Trade-offs

You gain	You pay
Massive durable blob storage	Higher per-operation latency than local disk
Direct browser upload/download with presigned URLs	More IAM, bucket policy, and CORS discipline
Lifecycle tiers for cost control	Retrieval delays and fees from cold classes
CDN-friendly immutable keys	Awkward random writes and filesystem adapters

Why this matters for you

Object storage is the default home for user uploads, backups, exports, logs, static assets, and data lake files. The clean mental model is: keep metadata in your database, keep bytes in object storage, and route large transfers around your API servers, not through them, whenever possible.
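
The "metadata in the database, bytes in object storage" split might look like the sketch below. It uses an in-memory SQLite database as a stand-in for the application database, and the `uploads` table, its columns, and the sample values are all hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for the application database
db.execute(
    """
    CREATE TABLE uploads (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        bucket TEXT NOT NULL,
        object_key TEXT NOT NULL UNIQUE,
        content_type TEXT NOT NULL,
        size_bytes INTEGER NOT NULL
    )
    """
)

# The row records where the bytes live; the bytes themselves stay in the bucket.
db.execute(
    "INSERT INTO uploads (user_id, bucket, object_key, content_type, size_bytes) "
    "VALUES (?, ?, ?, ?, ?)",
    (42, "my-app-uploads", "users/42/avatar.png", "image/png", 48_213),
)

row = db.execute(
    "SELECT bucket, object_key FROM uploads WHERE user_id = ?", (42,)
).fetchone()
print(f"s3://{row[0]}/{row[1]}")
```

Queries, permissions checks, and listings run against the small table, and the bucket is touched only when someone actually needs the bytes.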

Next: Erasure Coding is what cloud providers use inside object storage to achieve durability without storing three full copies of every byte.

Diagram (frame 1 of 7): Buckets hold keys. `folder/` is only a prefix in the key string; listing by prefix is fast, but there are no real directories on disk.