094 · LAMBDA · FAAS · SERVERLESS

Serverless Architecture

Deploy functions without managing servers — pay per invocation.

If you are new here: Serverless doesn't mean there are no servers — there are always servers. It means you don't manage them. In serverless architecture (specifically Function-as-a-Service, or FaaS), you write a function, upload it to AWS Lambda (or Google Cloud Functions, Azure Functions), and the cloud provider handles everything else: provisioning servers, scaling to meet demand, patching OS vulnerabilities, and billing you only for the milliseconds your code actually runs. No idle servers, no capacity planning, no paying for compute when nothing is happening. AWS Lambda, Cloudflare Workers, and Vercel Functions are all implementations of this model.

Term · Plain meaning
FaaS (Function-as-a-Service) · The serverless compute model — deploy individual functions, not servers
AWS Lambda · AWS's FaaS platform — runs functions in response to triggers
Trigger · The event that invokes a Lambda function (HTTP request, S3 upload, SQS message, schedule)
Cold start · The latency penalty when a function runs for the first time after being idle — runtime initialization
Warm instance · A Lambda container already initialized from a previous invocation — no cold start
Execution environment · The container Lambda creates to run your function — it may be reused across invocations
Provisioned concurrency · Pre-warming Lambda instances to eliminate cold starts for latency-sensitive functions
Stateless · Functions have no local memory between invocations — all state must live in external storage
Pay-per-invocation · Billing is based on the number of calls and the milliseconds your code runs, not idle time

The Problem

You're running a server to handle API requests. At 2am, almost no users are online, but your EC2 instances (or containers) are running and incurring costs. At 2pm, traffic spikes 10×, and you're scrambling to provision more capacity. Either you over-provision (waste money at night) or under-provision (users experience slow responses during spikes).

The mental model of "I need N servers to handle my traffic" requires constant tuning. You pay for reserved capacity whether it's used or not. Scaling events are slow — spinning up a new EC2 instance takes 1–3 minutes. And you spend engineering time patching servers, managing Kubernetes clusters, and babysitting infrastructure that isn't your product.

Serverless eliminates this entire class of problems. You don't provision servers — you upload a function. The cloud provider runs that function exactly when it's needed, scales it instantly to match demand, and charges you only for what you use.

In plain terms: serverless converts infrastructure management from "I own a fleet of servers" to "I pay for the exact compute I consume, moment to moment."

Analogy: Renting a taxi versus owning a car. Owning a car means paying insurance, maintenance, and parking 24/7 — even when you're not driving. A taxi charges you only when you're in it. For sporadic trips, taxis are cheaper and simpler. For daily commutes, ownership makes more sense. Serverless is the taxi model for compute.

The Auto-Scaling Model

The fundamental difference between serverless and traditional servers is the scaling model. With Lambda:

  • 1 concurrent request → 1 Lambda instance
  • 10 concurrent requests → 10 Lambda instances in parallel
  • 10,000 concurrent requests → up to 10,000 Lambda instances, all running simultaneously

This scaling is essentially instant (milliseconds, not minutes) and automatic. You don't configure autoscaling rules. You don't maintain a fleet of warm instances. Lambda handles it.

The billing model matches: you pay for (number of invocations × execution duration in milliseconds). If a function runs for 100ms, you pay for 100ms. If it's idle, you pay nothing.

In plain terms: serverless scales linearly with demand, automatically, with no configuration. You can go from 1 to 10,000 requests per second without touching any infrastructure settings.

Tiny example: a startup launches a product on Hacker News. Traffic goes from 10 req/min to 50,000 req/min in 5 minutes. With a fixed server fleet, this would cause an outage. With Lambda, it just works — AWS spins up more instances to match the load. The startup pays for the spike; their servers don't fall over.

The account-level limit: AWS Lambda has a default concurrency limit of 1,000 per region (can be increased). If you genuinely need more than 1,000 simultaneous invocations, you need to request a limit increase. This is a real consideration for very high-traffic applications.
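To make the billing formula concrete, here is a back-of-envelope cost estimator in JavaScript. The prices used are assumptions based on published us-east-1 x86 pricing at the time of writing (check AWS's pricing page for current rates), and the monthly free tier is ignored.

```javascript
// Rough Lambda cost estimator. Assumed prices (verify against AWS):
//   $0.20 per 1M requests, ~$0.0000166667 per GB-second of compute.
// The free tier (1M requests / 400,000 GB-s per month) is ignored.
const PRICE_PER_MILLION_REQUESTS = 0.20;
const PRICE_PER_GB_SECOND = 0.0000166667;

function estimateMonthlyCost(invocations, avgDurationMs, memoryMB) {
  const requestCost = (invocations / 1_000_000) * PRICE_PER_MILLION_REQUESTS;
  const gbSeconds = invocations * (avgDurationMs / 1000) * (memoryMB / 1024);
  const computeCost = gbSeconds * PRICE_PER_GB_SECOND;
  return requestCost + computeCost;
}

// 1M invocations/month at 100ms on a 128MB function: well under a dollar.
console.log(estimateMonthlyCost(1_000_000, 100, 128).toFixed(2)); // → "0.41"
```

Duration is billed per unit of memory allocated, which is why memory size appears in the formula: doubling a function's memory roughly doubles its per-millisecond cost.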

Cold Starts: The Latency Tax

The most important serverless limitation is the cold start. When your function hasn't been invoked recently, Lambda doesn't have a warm container ready. It must:

  1. Allocate a container
  2. Initialize the runtime (JVM startup, Node.js module loading, Python interpreter)
  3. Load and initialize your code and its dependencies
  4. Then run your handler function

This initialization can add 100ms to 2 seconds of latency, depending on the runtime and package size:

  • Node.js: ~100–300ms cold start
  • Python: ~100–400ms
  • Java (JVM): ~500ms–2s (JVM startup is slow)
  • Go: ~50–100ms (compiled binary, fast startup)
  • Rust: ~10–50ms (extremely fast)

Cold starts happen on the first invocation after a period of inactivity (typically 5–15 minutes idle). Under steady load, most invocations hit warm containers and don't pay the cold start penalty.

In plain terms: a cold start is the cost of spinning up a new container for your function. Under load, Lambda reuses warm containers and cold starts are rare. For infrequently called functions, every call might be a cold start.
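Container reuse can be observed directly from inside a function: module scope runs once per container, so a module-level flag distinguishes the first invocation (cold) from later invocations on the same container (warm). A minimal sketch, assuming the Node.js runtime (in a real Lambda module this function would be exported as the handler):

```javascript
// Module scope executes once per container, i.e. on a cold start.
let isColdStart = true;

const handler = async (event) => {
  const wasCold = isColdStart;
  isColdStart = false; // every later invocation on this container is warm
  return { coldStart: wasCold };
};
```

Logging that flag is a cheap way to measure how often your traffic actually pays the cold start penalty before reaching for provisioned concurrency.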

Solutions for cold start sensitivity:

  • Provisioned concurrency: tell Lambda to keep N instances warm at all times. Eliminates cold starts, but you pay for the idle compute.
  • Lightweight runtimes: use Node.js or Go instead of Java to reduce initialization time.
  • Minimize package size: smaller dependency trees mean faster loading. Avoid bundling your entire node_modules if only 10% of packages are needed.
  • Scheduled warm-up pings: a CloudWatch event that pings your function every 5 minutes keeps it warm. Hacky but free.

Stateless Execution: The Architecture Constraint

Each Lambda invocation runs in an isolated container with no shared memory across invocations. You can't store data in a global variable and expect it to be there on the next call — the next call might run on a completely different container.

This is a feature, not a bug — it's what makes horizontal scaling trivial. But it forces architectural discipline:

All state must be external: if you need to persist data between invocations, it must go in a database (DynamoDB, RDS), a cache (ElastiCache/Redis), or object storage (S3). You can't rely on in-memory variables, process-local caches, or writes to the ephemeral /tmp directory as persistent state.

Warm container optimization: Lambda does reuse containers between invocations for performance. Code outside your handler function (database connections, SDK initialization) runs once when the container starts and is reused on subsequent warm invocations. This is worth exploiting for expensive initializations like DB connection pools.

// AWS SDK v3 imports (Node.js runtime)
import { DynamoDBClient, GetItemCommand } from "@aws-sdk/client-dynamodb";

// This runs once when the container initializes (not on every invocation)
const dbClient = new DynamoDBClient({ region: "us-east-1" });

export const handler = async (event) => {
  // dbClient is already initialized — fast!
  const result = await dbClient.send(new GetItemCommand(...));
  return result;
};

In plain terms: think of each invocation as a clean slate. Your function starts with no memory of previous calls. If it needs memory of previous calls, it must read it from an external store.
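A tiny sketch of why this discipline matters, assuming the Node.js runtime: a module-level counter silently becomes a per-container counter, not a global one.

```javascript
// Anti-pattern sketch: module-level state only "works" per container.
// Each warm container keeps its own copy of requestCount, and containers
// are created and destroyed freely, so this is NOT a global counter.
let requestCount = 0;

const handler = async (event) => {
  requestCount += 1; // counts invocations on THIS container only
  return { seenByThisContainer: requestCount };
};

// The durable version keeps the counter in external storage instead,
// e.g. a DynamoDB atomic counter (UpdateItem with an ADD expression),
// so every container reads and writes the same shared value.
```

Under low traffic this bug can hide for weeks, because a single warm container handles everything; it surfaces the moment Lambda scales out to a second instance.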

When Serverless Shines vs Struggles

Serverless is excellent for:

  • Event-driven workloads: image processing on S3 upload, processing SQS messages, responding to DynamoDB stream changes
  • Spiky traffic: products that get occasional traffic bursts (Hacker News launches, marketing campaigns)
  • Scheduled jobs: replacing cron jobs — trigger a Lambda on a schedule to clean up old records, send weekly reports
  • Webhooks: handling incoming webhooks from third-party services — one function, zero server management
  • API backends: for many startups, a Lambda-backed API Gateway handles millions of requests per month for pennies
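To illustrate the event-driven case, here is a minimal sketch of a handler for an S3 upload trigger. The event shape (Records[].s3.bucket.name, Records[].s3.object.key) follows AWS's documented S3 notification format, and the thumbnail step is a hypothetical placeholder.

```javascript
// Sketch: a handler for an S3 "ObjectCreated" trigger. S3 delivers keys
// URL-encoded, with spaces encoded as "+", so we decode them first.
const handler = async (event) => {
  const objects = event.Records.map((record) => ({
    bucket: record.s3.bucket.name,
    key: decodeURIComponent(record.s3.object.key.replace(/\+/g, " ")),
  }));
  // A real function would fetch each object here and do the work
  // (e.g. generate a thumbnail) before returning.
  return { processed: objects.length, objects };
};
```

Note there is no server, queue consumer loop, or scaling configuration anywhere in this picture: S3 invokes the function once per notification, in parallel as needed.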

Serverless is a poor fit for:

  • Long-running processes: Lambda has a 15-minute maximum execution time. Batch jobs that run for hours don't fit.
  • Latency-sensitive APIs: if your SLA is sub-10ms and cold starts are 500ms, serverless creates too much unpredictability
  • Persistent connections: WebSockets and long-lived TCP connections don't map cleanly to stateless functions
  • High-frequency steady-state traffic: at very high volume (millions of req/min, always), the economics often favor dedicated containers or servers

The Trade-offs

Property · Serverless · Traditional Servers
Provisioning · Zero — fully managed · You provision and manage
Scaling · Instant — automatic per-request · Minutes — autoscaling groups
Cold starts · Yes — 100ms–2s on first call · No — always warm
Idle cost · Zero — pay only when invoked · Yes — servers run 24/7
Max execution time · 15 minutes (Lambda) · Unlimited
State · Stateless — external storage required · Stateful in-memory possible
Vendor lock-in · High — Lambda-specific APIs · Lower — portable containers
Debugging · Harder — distributed, ephemeral logs · Easier — persistent log streams

Why this matters for you

Serverless has fundamentally changed how teams build and operate backend systems. Understanding the FaaS model — stateless execution, pay-per-invocation billing, instant autoscaling, cold starts — lets you reason about when to reach for Lambda vs a long-running container. The most impactful mental shift: stop thinking in terms of "servers I need to manage" and start thinking in terms of "events I need to respond to." Most event-driven, asynchronous workloads map naturally to serverless. Most long-running, stateful, or latency-critical workloads don't.

Next: CQRS — what happens when your read and write patterns are so different they deserve separate models.

Diagram (frame 1 of 5): FaaS model — an event triggers a function; the cloud provider allocates compute, runs the function, then tears it down.