GAME LEVELS

Choose a level

40 system design challenges across four formats. Build, fix, optimise, design — each level teaches one clear concept. Start with 4 free sample levels; Pro unlocks the remaining 36.

BEGINNER
01
A-01 · SURVIVE · FREE SAMPLE
First Deploy
One EC2 instance. Traffic ramps from 100 to 800 req/s. Keep uptime above 95% for 90 seconds.
Vertical scaling · EC2 · RDS
02
A-02 · SURVIVE · PRO
Scalable Web App
Traffic doubled overnight. Add a load balancer and scale horizontally.
ALB · Horizontal scaling · EC2 fleet
~10 min
01
B-01 · INCIDENT · FREE SAMPLE
Database on Fire
RDS connections are maxed. Users timing out. Fix it before SLA breach.
Connection pooling · ElastiCache · RDS
02
B-02 · INCIDENT · PRO
The Stampede
Cache expired at 3am. Every user hit the DB simultaneously. Stop the bleeding.
Cache stampede · TTL jitter · ElastiCache
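
One way to picture the TTL jitter idea named on this card: if every key is written with the same TTL, all keys expire together and the database absorbs the whole miss burst. The sketch below spreads expiries by a random offset. The function name `jittered_ttl` and the 10% default spread are illustrative assumptions, not part of the level.

```python
import random
from typing import Optional


def jittered_ttl(base_ttl_s: float,
                 jitter_fraction: float = 0.1,
                 rng: Optional[random.Random] = None) -> float:
    """Return base_ttl_s shifted by a random offset of up to +/- jitter_fraction.

    Hypothetical helper: the 10% default is an assumption chosen for illustration.
    """
    rng = rng if rng is not None else random.Random()
    spread = base_ttl_s * jitter_fraction
    # Uniform offset so keys written at the same moment expire at different times.
    return base_ttl_s + rng.uniform(-spread, spread)


if __name__ == "__main__":
    # Five keys cached at the same moment get five different expiry times.
    for key in ["user:1", "user:2", "user:3", "user:4", "user:5"]:
        print(key, round(jittered_ttl(300.0), 1))
```

The returned value would be passed as the expiry when writing to the cache. Jitter only spreads expiries; it does not by itself protect a single very hot key.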
03
B-03 · INCIDENT · PRO
Single Point Down
Your load balancer is a single node. It just crashed. All traffic is dead.
SPOF · ALB · Redundancy
01
C-01 · COST · FREE SAMPLE
The AWS Bill
Overprovisioned fleet burning $12k/mo. Cut to $6k without downtime.
Right-sizing · EC2 · Reserved instances
02
C-02 · COST · PRO
Always-On Dev Env
Dev/staging stack runs 24/7. It only needs 8 hours a day.
Scheduled stop/start · Lambda · Cost lifecycle
03
C-03 · COST · PRO
Reserved vs On-Demand
All your EC2 is on-demand. Baseline load is predictable. Commit and save.
Reserved instances · Savings plans · On-demand
~10 min
01
D-01 · DESIGN · FREE SAMPLE
Blog Platform
50k readers/day, 100 writers. 95% reads. Budget: $500/mo. Uptime: 99.5%.
CloudFront · S3 · ALB · RDS
02
D-02 · DESIGN · PRO
URL Shortener
1B URLs, 10k writes/s, 100k reads/s, global P99 < 50 ms. Budget: $3k/mo.
DynamoDB · ElastiCache · CloudFront · Lambda
~12 min
03
D-03 · DESIGN · PRO
File Upload Service
10k concurrent uploads up to 5 GB. Virus scan required. Budget: $1k/mo.
S3 · Lambda · SQS · API Gateway
~10 min
INTERMEDIATE
03
A-03 · SURVIVE · PRO
Static Asset Storm
80% of requests are images. Your origin is drowning. Offload with CDN.
CloudFront · S3 · Origin offload
~12 min
04
A-04 · SURVIVE · PRO
Cache or Die
Read-heavy traffic is hammering RDS directly. Add caching before it collapses.
ElastiCache · Cache-aside · RDS
~15 min
05
A-05 · SURVIVE · PRO
Write Thunderstorm
A viral event sends 5,000 writes/s. RDS is the bottleneck. Buffer them.
SQS · Write buffering · Async processing
~15 min
06
A-06 · SURVIVE · PRO
Flash Sale
10× traffic spike in 60 seconds. Auto-scaling, queue buffering, rate limiting.
Auto-scaling · Flash sale · Traffic spike
~18 min
04
B-04 · INCIDENT · PRO
Cascading Failure
One slow service is timing out callers. Timeouts are backing up everywhere.
Circuit breaker · Retry storm · ElastiCache
~15 min
05
B-05 · INCIDENT · PRO
Hot Cache Flush
Ops flushed the cache for a deploy. All 2M users hit DB cold.
Cache warming · RDS read replica · Cold start
~15 min
06
B-06 · INCIDENT · PRO
Queue Backup
SQS queue depth at 2 million. Workers can't keep up. Jobs are expiring.
SQS consumers · Dead-letter queue · Queue scaling
~18 min
07
B-07 · INCIDENT · PRO
Read Replica Lag
Read replica is 45 seconds behind primary. Analytics reads are stale. Reports wrong.
RDS replication · Replica lag · Read topology
~18 min
04
C-04 · COST · PRO
Spot Fleet Swap
Stateless web tier runs on On-Demand instances. Swap to Spot and save 70%.
Spot Instances · Cost optimisation · Interruption handling
~15 min
05
C-05 · COST · PRO
EBS to S3 Migration
50 TB of user uploads on EBS at $0.10/GB-month. Move them to S3 at $0.023/GB-month.
S3 · EBS · Storage cost · Zero-downtime migration
~15 min
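
The storage arithmetic behind this card can be worked through as a rough sketch. It uses only the two per-GB prices quoted on the card and assumes 1 TB = 1,000 GB for round numbers; request, retrieval, and data transfer charges are deliberately ignored, and the names `GB_PER_TB` and `monthly_storage_cost` are illustrative.

```python
from decimal import Decimal

GB_PER_TB = 1000  # assumption: decimal terabytes, for round numbers


def monthly_storage_cost(terabytes: int, price_per_gb_month: Decimal) -> Decimal:
    """Storage-only monthly cost; ignores request and transfer fees."""
    return Decimal(terabytes) * GB_PER_TB * price_per_gb_month


ebs_cost = monthly_storage_cost(50, Decimal("0.10"))   # price quoted on the card
s3_cost = monthly_storage_cost(50, Decimal("0.023"))   # price quoted on the card
savings = ebs_cost - s3_cost

if __name__ == "__main__":
    print(f"EBS: ${ebs_cost:,.2f}/mo")
    print(f"S3:  ${s3_cost:,.2f}/mo")
    print(f"Saved: ${savings:,.2f}/mo")
```

Under these assumptions the move takes storage from $5,000/mo to $1,150/mo, a difference of $3,850/mo.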
06
C-06 · COST · PRO
Lambda vs Always-On
Your report generator runs once a night. It lives on a $300/mo EC2.
Lambda · Event-driven · Serverless cost
~15 min
07
C-07 · COST · PRO
Cache Pays for Itself
RDS is over-spec'd to handle read load. One cache node costs less than the DB upgrade.
ElastiCache ROI · Right-sizing · Cache economics
~15 min
04
D-04 · DESIGN · PRO
Social Feed
500k users, reads 20× writes, P95 feed load < 100 ms. Budget: $3k/mo.
Feed caching · Read replicas · Fan-out
~18 min
05
D-05 · DESIGN · PRO
Notification System
Send 10M notifications/day. Guaranteed delivery. At-least-once. Budget: $2k/mo.
SNS fan-out · SQS queues · Lambda consumers
~18 min
06
D-06 · DESIGN · PRO
Real-time Dashboard
100k events/s from IoT sensors. Dashboard refreshes every 5 s. 30-day history.
Kinesis · Lambda · DynamoDB · Streaming
~18 min
07
D-07 · DESIGN · PRO
Rate Limiter Service
50k tenants. Per-tenant rate limit: 1k req/s. Enforce globally. P99 overhead < 5 ms.
Token bucket · ElastiCache · Rate limiting
~18 min
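
As a rough illustration of the token bucket technique tagged on this card, the sketch below keeps one bucket per tenant in process memory. That is a simplifying assumption: the level asks for global enforcement, which would need shared state (the card's tags suggest ElastiCache). The class names and the injectable `clock` parameter are hypothetical.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Refills at `rate` tokens per second, up to `capacity` tokens."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity          # start full so an initial burst is allowed
        self._last_refill = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        elapsed = now - self._last_refill
        # Add tokens earned since the last call, capped at capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last_refill = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


class TenantRateLimiter:
    """One bucket per tenant, created lazily on first request (in-memory only)."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate
        self._capacity = capacity
        self._clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, tenant_id: str) -> bool:
        bucket = self._buckets.get(tenant_id)
        if bucket is None:
            bucket = TokenBucket(self._rate, self._capacity, self._clock)
            self._buckets[tenant_id] = bucket
        return bucket.allow()


if __name__ == "__main__":
    limiter = TenantRateLimiter(rate=1000, capacity=1000)  # 1k req/s per tenant
    allowed = sum(limiter.allow("tenant-a") for _ in range(1500))
    print("allowed in initial burst:", allowed)
```

Setting `capacity` equal to `rate` allows a one-second burst; a smaller capacity smooths traffic at the cost of rejecting short spikes.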
ADVANCED
07
A-07 · SURVIVE · PRO
Multi-AZ Under Fire
Traffic peaks. Then AZ-1 dies mid-ramp. You can't afford downtime.
Multi-AZ · Failover · RDS Multi-AZ
~20 min
08
A-08 · SURVIVE · PRO
Hot Partition
One DynamoDB partition absorbs 90% of requests. Table throttling.
DynamoDB · DAX · Partition keys
~20 min
09
A-09 · SURVIVE · PRO
Lambda Stampede
Serverless functions cold-starting under load. Concurrency limits hit.
Lambda · Provisioned concurrency · API Gateway
~22 min
10
A-10 · SURVIVE · PRO
Black Friday
Full-stack stress test. 50× normal traffic. Everything must hold.
CDN · Auto Scaling · SQS · ElastiCache
~25 min
08
B-08 · INCIDENT · PRO
Region Down
us-east-1 is degraded. Reroute to failover region before SLA burns.
Route 53 · Multi-region · Failover
~20 min
09
B-09 · INCIDENT · PRO
DDoS Under Way
500k req/s incoming. Bot traffic is real. Your origin is drowning.
WAF · CloudFront · Rate limiting
~20 min
10
B-10 · INCIDENT · PRO
Split Brain
Network partition created two RDS primaries. Data is diverging.
Split brain · Leader election · RDS
~22 min
08
C-08 · COST · PRO
Cross-AZ Data Transfer
App in AZ-1 reads from DB in AZ-2. $0.01/GB × 50 TB/mo = surprise bill.
Cross-AZ costs · AZ placement · Data transfer
~20 min
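
The multiplication on this card can be made concrete with a short sketch. It takes the card's $0.01/GB rate and 50 TB/mo volume as given and assumes 1 TB = 1,000 GB. The `directions_billed` parameter is a hypothetical knob for the case where transfer is charged on both the sending and the receiving side; check current AWS pricing before relying on either figure.

```python
from decimal import Decimal

GB_PER_TB = 1000  # assumption: decimal terabytes, for round numbers


def cross_az_monthly_cost(tb_per_month: int,
                          price_per_gb: Decimal = Decimal("0.01"),
                          directions_billed: int = 1) -> Decimal:
    """Monthly cross-AZ transfer cost for a given traffic volume."""
    return Decimal(tb_per_month) * GB_PER_TB * price_per_gb * directions_billed


one_way = cross_az_monthly_cost(50)
both_ways = cross_az_monthly_cost(50, directions_billed=2)

if __name__ == "__main__":
    print(f"Billed one way:   ${one_way:,.2f}/mo")
    print(f"Billed both ways: ${both_ways:,.2f}/mo")
```

With the card's numbers this comes to $500/mo, or $1,000/mo if both directions are billed. Placing the app and its database in the same AZ removes that line item for this traffic.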
09
C-09 · COST · PRO
Cold Storage Tiering
200 TB in S3 Standard. Only 5% accessed in last 90 days. Tier the rest.
S3 Glacier · Lifecycle policy · Storage classes
~20 min
10
C-10 · COST · PRO
Monolith to Serverless
Always-on monolith serves bursty, low-frequency traffic. Replatform it.
Lambda · DynamoDB · API Gateway · Serverless
~22 min
08
D-08 · DESIGN · PRO
Ride-sharing Backend
1M active riders, 100k drivers. Match in < 500 ms. Location updates 1/s.
DynamoDB · Lambda · SNS · Geospatial
~25 min
09
D-09 · DESIGN · PRO
Video Streaming Platform
10M viewers/day, 1M concurrent. Adaptive bitrate. Global < 2 s start.
CloudFront · S3 · Lambda · HLS
~25 min
10
D-10 · DESIGN · PRO
The Interview
FAANG-style system design. 20-minute clock. Design for 1B users.
All node types · Synthesis · Trade-offs
~30 min