Deploy functions without managing servers — pay per invocation.
If you are new here: Serverless doesn't mean there are no servers — there are always servers. It means you don't manage them. In serverless architecture (specifically Function-as-a-Service, or FaaS), you write a function, upload it to AWS Lambda (or Google Cloud Functions, Azure Functions), and the cloud provider handles everything else: provisioning servers, scaling to meet demand, patching OS vulnerabilities, and billing you only for the milliseconds your code actually runs. No idle servers, no capacity planning, no paying for compute when nothing is happening. AWS Lambda, Cloudflare Workers, and Vercel Functions are all implementations of this model.
| Term | Plain meaning |
|---|---|
| FaaS (Function-as-a-Service) | The serverless compute model — deploy individual functions, not servers |
| AWS Lambda | AWS's FaaS platform — runs functions in response to triggers |
| Trigger | The event that invokes a Lambda function (HTTP request, S3 upload, SQS message, schedule) |
| Cold start | The latency penalty when a function runs for the first time after being idle — runtime initialization |
| Warm instance | A Lambda container already initialized from a previous invocation — no cold start |
| Execution environment | The container Lambda creates to run your function — it may be reused across invocations |
| Provisioned concurrency | Pre-warming Lambda instances to eliminate cold starts for latency-sensitive functions |
| Stateless | Functions have no local memory between invocations — all state must live in external storage |
| Pay-per-invocation | Billing is based on the number of calls and the milliseconds your code runs, not idle time |
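To make the model concrete, here is a minimal sketch of what "a function" means in practice. It assumes an API Gateway HTTP trigger (so the event carries `queryStringParameters`); the greeting logic and response shape are purely illustrative.

```javascript
// A complete Lambda function: export a handler, upload it, attach a trigger.
// This sketch assumes an API Gateway (HTTP) trigger; S3, SQS, or scheduled
// triggers would pass a different event payload.
export const handler = async (event) => {
  const name = event.queryStringParameters?.name ?? "world";
  return {
    statusCode: 200,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message: `Hello, ${name}!` }),
  };
};
```

There is no web server to configure and no process to keep alive: the provider invokes `handler` once per event and tears the environment down when it is no longer needed.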
You're running a server to handle API requests. At 2am, almost no users are online, but your EC2 instances (or containers) are running and incurring costs. At 2pm, traffic spikes 10×, and you're scrambling to provision more capacity. Either you over-provision (waste money at night) or under-provision (users experience slow responses during spikes).
The mental model of "I need N servers to handle my traffic" requires constant tuning. You pay for reserved capacity whether it's used or not. Scaling events are slow — spinning up a new EC2 instance takes 1–3 minutes. And you spend engineering time patching servers, managing Kubernetes clusters, and babysitting infrastructure that isn't your product.
Serverless eliminates this entire class of problems. You don't provision servers — you upload a function. The cloud provider runs that function exactly when it's needed, scales it instantly to match demand, and charges you only for what you use.
In plain terms: serverless converts infrastructure management from "I own a fleet of servers" to "I pay for the exact compute I consume, moment to moment."
Analogy: Renting a taxi versus owning a car. Owning a car means paying insurance, maintenance, and parking 24/7 — even when you're not driving. A taxi charges you only when you're in it. For sporadic trips, taxis are cheaper and simpler. For daily commutes, ownership makes more sense. Serverless is the taxi model for compute.
The fundamental difference between serverless and traditional servers is the scaling model. With Lambda:

- Each concurrent request is handled by its own execution environment.
- One request in flight means one instance; 1,000 simultaneous requests mean Lambda runs up to 1,000 instances in parallel.
- When traffic drops, idle instances are reclaimed and you stop paying for them.
This scaling is essentially instant (milliseconds, not minutes) and automatic. You don't configure autoscaling rules. You don't maintain a fleet of warm instances. Lambda handles it.
The billing model matches: you pay for (number of invocations × execution duration in milliseconds). If a function runs for 100ms, you pay for 100ms. If it's idle, you pay nothing.
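To see how that plays out, here is a back-of-the-envelope cost sketch. The helper function and the rates are illustrative placeholders (in the ballpark of Lambda's published pricing, but check current AWS pricing and the free tier before relying on them).

```javascript
// Rough Lambda cost model. Rates are illustrative placeholders, not a quote.
const PRICE_PER_MILLION_REQUESTS = 0.20;   // USD per 1M invocations
const PRICE_PER_GB_SECOND = 0.0000166667;  // USD per GB-second of compute

function monthlyCost({ invocations, avgDurationMs, memoryMb }) {
  const requestCost = (invocations / 1_000_000) * PRICE_PER_MILLION_REQUESTS;
  const gbSeconds = invocations * (avgDurationMs / 1000) * (memoryMb / 1024);
  return requestCost + gbSeconds * PRICE_PER_GB_SECOND;
}

// 5M invocations/month at 100ms average and 256MB memory ≈ $3.08/month at these rates.
console.log(monthlyCost({ invocations: 5_000_000, avgDurationMs: 100, memoryMb: 256 }));
```

Compare that with the fixed monthly cost of even a small always-on instance and the trade-off for spiky, low-duty-cycle workloads becomes obvious.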
In plain terms: serverless scales linearly with demand, automatically, with no configuration. You can go from 1 to 10,000 requests per second without touching any infrastructure settings.
Tiny example: a startup launches a product on Hacker News. Traffic goes from 10 req/min to 50,000 req/min in 5 minutes. With a fixed server fleet, this would cause an outage. With Lambda, it just works — AWS spins up more instances to match the load. The startup pays for the spike; their servers don't fall over.
The account-level limit: AWS Lambda has a default concurrency limit of 1,000 per region (can be increased). If you genuinely need more than 1,000 simultaneous invocations, you need to request a limit increase. This is a real consideration for very high-traffic applications.
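If you need to check where a workload stands relative to that limit, the AWS SDK exposes the account setting; the sketch below also reserves a slice of the pool for one function so a spike elsewhere cannot starve it. It assumes the v3 `@aws-sdk/client-lambda` package, and the function name `checkout-handler` is made up for illustration.

```javascript
import {
  LambdaClient,
  GetAccountSettingsCommand,
  PutFunctionConcurrencyCommand,
} from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({ region: "us-east-1" });

// Read the account-level concurrency limit for this region.
const settings = await lambda.send(new GetAccountSettingsCommand({}));
console.log("Concurrency limit:", settings.AccountLimit?.ConcurrentExecutions);

// Reserve 100 concurrent executions for one function ("checkout-handler" is
// illustrative) so other functions' spikes cannot consume its share.
await lambda.send(new PutFunctionConcurrencyCommand({
  FunctionName: "checkout-handler",
  ReservedConcurrentExecutions: 100,
}));
```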
The most important serverless limitation is the cold start. When your function hasn't been invoked recently, Lambda doesn't have a warm container ready. It must:

- provision a new execution environment,
- download your deployment package,
- start the language runtime (Node.js, Python, the JVM, etc.),
- run your initialization code (everything outside the handler),
- and only then invoke your handler.

This initialization can add 100ms to 2 seconds of latency, depending on the runtime and package size: lightweight runtimes such as Node.js and Python with small packages sit at the fast end of that range, while JVM and .NET runtimes and large deployment packages sit at the slow end.
Cold starts happen on the first invocation after a period of inactivity (typically 5–15 minutes idle). Under steady load, most invocations hit warm containers and don't pay the cold start penalty.
In plain terms: a cold start is the cost of spinning up a new container for your function. Under load, Lambda reuses warm containers and cold starts are rare. For infrequently called functions, every call might be a cold start.
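One way to observe this in your own logs is a module-level flag, sketched below: module scope runs once per execution environment, so the first invocation on a fresh container reports a cold start and every reuse reports warm.

```javascript
// Module scope executes once per execution environment, i.e. on a cold start.
let coldStart = true;

export const handler = async (event) => {
  const wasCold = coldStart;
  coldStart = false; // later invocations that reuse this container are "warm"

  console.log(JSON.stringify({ coldStart: wasCold }));
  return { statusCode: 200, body: `cold start: ${wasCold}` };
};
```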
Solutions for cold start sensitivity:

- Provisioned concurrency: pay to keep a fixed number of instances pre-initialized, which eliminates cold starts for latency-sensitive functions (see the sketch after this list).
- Keep deployment packages small: don't ship your entire `node_modules` if only 10% of packages are needed.
- Prefer fast-starting runtimes (Node.js, Python) over heavier ones (JVM, .NET) on latency-sensitive paths.
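A sketch of enabling provisioned concurrency with the v3 `@aws-sdk/client-lambda` client follows; the function name, the `live` alias, and the instance count are all illustrative, and the same setting can be made in the console or in infrastructure-as-code templates.

```javascript
import {
  LambdaClient,
  PutProvisionedConcurrencyConfigCommand,
} from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({ region: "us-east-1" });

// Keep 25 pre-initialized instances warm for the "live" alias of this function.
// You pay for the warm capacity, but requests routed to it skip the cold start.
await lambda.send(new PutProvisionedConcurrencyConfigCommand({
  FunctionName: "checkout-handler",   // illustrative function name
  Qualifier: "live",                  // targets an alias or a published version
  ProvisionedConcurrentExecutions: 25,
}));
```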
Each Lambda invocation runs in an isolated container with no shared memory across invocations. You can't store data in a global variable and expect it to be there on the next call — the next call might run on a completely different container.
This is a feature, not a bug — it's what makes horizontal scaling trivial. But it forces architectural discipline:
All state must be external: if you need to persist data between invocations, it must go in a database (DynamoDB, RDS), a cache (ElastiCache/Redis), or object storage (S3). You can't use localStorage, in-memory caches, or file system writes as persistent state.
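For example, a hit counter has to live in a table rather than in a variable. A minimal sketch, assuming a DynamoDB table named `visit-counts` (illustrative) with a `pageId` partition key:

```javascript
import { DynamoDBClient, UpdateItemCommand } from "@aws-sdk/client-dynamodb";

const db = new DynamoDBClient({ region: "us-east-1" });

export const handler = async (event) => {
  // Atomically increment the count in DynamoDB. The table name and key shape
  // are illustrative; a module-level counter would reset on new containers and
  // diverge across parallel ones.
  const result = await db.send(new UpdateItemCommand({
    TableName: "visit-counts",
    Key: { pageId: { S: event.pageId ?? "home" } },
    UpdateExpression: "ADD visits :inc",
    ExpressionAttributeValues: { ":inc": { N: "1" } },
    ReturnValues: "UPDATED_NEW",
  }));
  return { visits: Number(result.Attributes.visits.N) };
};
```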
Warm container optimization: Lambda does reuse containers between invocations for performance. Code outside your handler function (database connections, SDK initialization) runs once when the container starts and is reused on subsequent warm invocations. This is worth exploiting for expensive initializations like DB connection pools.
```javascript
import { DynamoDBClient, GetItemCommand } from "@aws-sdk/client-dynamodb";

// This runs once when the container initializes (not on every invocation)
const dbClient = new DynamoDBClient({ region: "us-east-1" });

export const handler = async (event) => {
  // dbClient is already initialized — fast!
  const result = await dbClient.send(new GetItemCommand({
    TableName: "users",                    // example table and key for illustration
    Key: { userId: { S: event.userId } },
  }));
  return result;
};
```

In plain terms: think of each invocation as a clean slate. Your function starts with no memory of previous calls. If it needs memory of previous calls, it must read it from an external store.
Serverless is excellent for:

- Event-driven work: processing S3 uploads, SQS messages, webhooks, and other triggers.
- APIs with spiky or unpredictable traffic, where paying only for actual invocations beats running idle servers.
- Scheduled jobs and cron-style tasks that run briefly and then go quiet.
- Glue code that connects managed services.

Serverless is a poor fit for:

- Long-running jobs: Lambda caps a single execution at 15 minutes.
- Latency-critical paths that can't absorb an occasional cold start.
- Workloads that depend on in-memory state or long-lived in-process connections.
- Steady, high-volume traffic, where an always-on server is often cheaper than millions of per-invocation charges.
| Property | Serverless | Traditional Servers |
|---|---|---|
| Provisioning | Zero — fully managed | You provision and manage |
| Scaling | Instant — automatic per-request | Minutes — autoscaling groups |
| Cold starts | Yes — 100ms–2s on first call | No — always warm |
| Idle cost | Zero — pay only when invoked | Yes — servers run 24/7 |
| Max execution time | 15 minutes (Lambda) | Unlimited |
| State | Stateless — external storage required | Stateful in-memory possible |
| Vendor lock-in | High — Lambda-specific APIs | Lower — portable containers |
| Debugging | Harder — distributed, ephemeral logs | Easier — persistent log streams |
Serverless has fundamentally changed how teams build and operate backend systems. Understanding the FaaS model — stateless execution, pay-per-invocation billing, instant autoscaling, cold starts — lets you reason about when to reach for Lambda vs a long-running container. The most impactful mental shift: stop thinking in terms of "servers I need to manage" and start thinking in terms of "events I need to respond to." Most event-driven, asynchronous workloads map naturally to serverless. Most long-running, stateful, or latency-critical workloads don't.
Next: CQRS — what happens when your read and write patterns are so different they deserve separate models.
FaaS model — an event triggers a function; the cloud provider allocates compute, runs the function, then tears it down.