090 · PUB/SUB · SNS · TOPIC

Pub/Sub

Broadcast events to many subscribers without the publisher knowing who they are.

If you are new here: Pub/Sub (Publish/Subscribe) is a messaging pattern where a publisher sends a message to a named channel called a topic, and every subscriber to that topic receives a copy of the message. The publisher doesn't know who is listening — it just publishes. Subscribers don't know who published — they just receive. This is fundamentally different from a message queue, where each message goes to exactly one consumer. In Pub/Sub, the same message is delivered to every subscriber independently. AWS SNS, Google Pub/Sub, and Kafka topics all implement this pattern.

Key terms:

  • Publisher: the service that sends messages to a topic
  • Subscriber: a service that registers to receive messages from a topic
  • Topic: a named channel; publishers send to it, subscribers receive from it
  • Fan-out: one published message being delivered to many subscribers simultaneously
  • Subscription: a subscriber's registration to a topic, optionally with filter rules
  • Push delivery: the broker sends messages to subscribers (e.g., HTTP endpoint, Lambda trigger)
  • Pull delivery: subscribers poll the topic for new messages
  • Late subscriber: a subscriber that subscribes after a message was published — it misses that message
  • Fan-out + SQS: connecting SNS → multiple SQS queues for durable fan-out
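The core mechanics above fit in a few lines. This is a minimal in-memory sketch of a topic (not any real broker API): subscribers register handlers, and a publish delivers a copy to every one of them, with the publisher knowing nothing about who is listening.

```python
from typing import Callable


class Topic:
    """Minimal in-memory topic: every subscriber gets a copy of every message."""

    def __init__(self) -> None:
        self._subscribers: list[Callable[[dict], None]] = []

    def subscribe(self, handler: Callable[[dict], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, message: dict) -> None:
        # The publisher doesn't know who is listening; deliver to all.
        for handler in self._subscribers:
            handler(message)


# Usage: two independent subscribers each receive the same event.
orders = Topic()
received: list[str] = []
orders.subscribe(lambda m: received.append(f"email: {m['order_id']}"))
orders.subscribe(lambda m: received.append(f"analytics: {m['order_id']}"))
orders.publish({"order_id": "o-1"})
# Both subscribers saw the one published message.
```

Note that adding a third subscriber is just another `subscribe` call — the `publish` side never changes, which is the decoupling the rest of this section builds on.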

The Problem

An order is placed. Multiple downstream systems need to react:

  • Email service sends a confirmation
  • Analytics records the event
  • Inventory service reserves stock
  • Loyalty service awards points
  • Fraud service reviews the transaction

With direct calls, the order service must know about and call each of these systems. When you add a new downstream system, you modify the order service. When the email service goes down, the order service fails. The order service — a critical component — becomes a dependency hub for every team in the company.

Pub/Sub solves this by inverting the dependency. The order service publishes an "OrderCreated" event to a topic and forgets. Every downstream service subscribes to that topic independently. Adding a new loyalty system doesn't require changing the order service. Removing analytics doesn't break order creation. The order service is completely isolated from downstream changes.

In plain terms: with direct calls, the order service must know about every system that cares about orders. With Pub/Sub, those systems subscribe to orders — the order service doesn't need to know they exist.

Analogy: A radio broadcast. The radio station (publisher) broadcasts its signal to a channel (topic). Any radio (subscriber) tuned to that channel receives the signal. The station doesn't know or care who is listening. Listeners can tune in or out without the station doing anything. Adding a new listener doesn't require the station to do anything differently.

Queue vs Pub/Sub: The Critical Difference

This is one of the most commonly confused distinctions in distributed systems:

Message Queue: each message is consumed by one worker. If you have 3 workers reading from a queue and 1 message arrives, exactly one worker processes it. This is the competing consumers pattern — it's great for distributing load.

Pub/Sub Topic: each message is received by every subscriber. If you have 3 subscribers to a topic and 1 message is published, all 3 receive a copy. Each subscriber processes it independently.

When you want a queue: you have one type of work to do, and you want multiple workers to share the load. Example: resizing uploaded images — any worker can do it, and you only want each image resized once.

When you want a topic: different systems need to react to the same event in different ways. Example: an order is placed — email service, analytics, inventory, and loyalty all need to react differently. Each should receive their own copy.

Tiny example:

  • Queue scenario: 100 emails to send → 5 email workers each send 20 emails. Total: 100 emails sent.
  • Topic scenario: 1 order event → email worker sends confirmation, analytics records purchase, inventory reserves stock, loyalty awards points. Total: 4 different reactions to 1 event.
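The two delivery semantics can be contrasted side by side. This is an illustrative sketch using plain Python data structures, not a broker API: the queue half shows competing consumers (each message consumed once), the topic half shows fan-out (each subscriber gets its own copy).

```python
from collections import deque

# Queue: competing consumers — each message goes to exactly one worker.
queue = deque(["msg-1", "msg-2", "msg-3"])
workers: dict[str, list[str]] = {"w1": [], "w2": [], "w3": []}
for name in workers:
    # Each worker pulls one message; once pulled, it's gone from the queue.
    workers[name].append(queue.popleft())
# Every message was processed exactly once, split across the workers.

# Topic: fan-out — each subscriber receives its own copy of every message.
subscribers: dict[str, list[str]] = {"email": [], "analytics": [], "inventory": []}

def publish(event: str) -> None:
    for inbox in subscribers.values():
        inbox.append(event)

publish("OrderCreated")
# One publish, three independent copies — one per subscriber.
```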

Subscription Filters

In practice, not every subscriber cares about every message. A topic might carry multiple event types — "OrderCreated," "OrderShipped," "OrderCancelled." The shipping service only cares about "OrderCreated." The refunds service only cares about "OrderCancelled."

Subscription filters let subscribers declare rules: "only deliver messages where status=shipped" or "only deliver messages where amount > 100." The broker evaluates these filters before delivery, so subscribers receive only the messages they care about.

This keeps subscriber code clean (no need to check message type and discard irrelevant messages) and reduces unnecessary processing.

In plain terms: filters are like email inbox rules — the broker sorts messages to the right subscribers so each subscriber only receives what it asked for.

Concrete sketch: AWS SNS subscription filter policies. Your OrderEvents topic carries all order events. The shipping Lambda has a filter: {"status": ["created"]}. The refunds Lambda has: {"status": ["cancelled"]}. The analytics Lambda has no filter (receives everything). SNS evaluates the filter at publish time — the shipping Lambda never sees cancelled orders, and the refunds Lambda never sees new orders.

Fan-Out with Queues: The Production Pattern

Raw Pub/Sub has a durability problem: if a subscriber is down when a message is published, it misses the message. The topic doesn't buffer — it just delivers to whoever is currently subscribed and discards the rest.

The production solution: SNS → SQS fan-out. Instead of delivering directly to services, the topic delivers to individual SQS queues — one per subscriber. Each service reads from its own queue, which is durable and will retry delivery if the service is temporarily down.

This gives you the best of both worlds:

  • Fan-out from the topic (one publish, many destinations)
  • Durability from the queues (messages buffered until consumers process them)
  • Independent scaling (each service scales its own SQS consumers independently)

In plain terms: the topic handles the "broadcast to everyone" part; the queues handle the "make sure each subscriber actually gets it" part.

Concrete sketch: you publish "OrderCreated" to SNS. SNS immediately delivers to three SQS queues: email-queue, analytics-queue, inventory-queue. Email service is temporarily down — its queue buffers the message. When it restarts, it processes the message from its queue. The analytics and inventory services already processed their copies. No message lost, no coupling.
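The same scenario can be simulated in a few lines. This in-memory sketch (the queue names are from the example above, not real AWS resources) shows the key property: a copy lands in every queue at publish time, so a subscriber that is down simply finds its copy waiting when it comes back.

```python
from collections import deque

# The topic fans out to one durable queue per subscriber (the SNS -> SQS shape).
queues: dict[str, deque] = {
    "email-queue": deque(),
    "analytics-queue": deque(),
    "inventory-queue": deque(),
}

def publish(event: str) -> None:
    # One publish, one independent copy per queue.
    for q in queues.values():
        q.append(event)

publish("OrderCreated:o-42")

# Analytics and inventory consume their copies immediately...
analytics_seen = queues["analytics-queue"].popleft()
inventory_seen = queues["inventory-queue"].popleft()

# ...while the email service is down. Its copy just waits in its queue.
assert len(queues["email-queue"]) == 1

# When the email service restarts, it drains its own queue. Nothing was lost.
email_seen = queues["email-queue"].popleft()
```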

The Late Subscriber Problem

Unlike a durable message queue, most Pub/Sub systems are ephemeral by default: messages are delivered once to currently active subscribers and not stored. A new subscriber that registers after a message was published doesn't receive that message.

Why this matters: if you add a new downstream service that needs to process all historical orders, you can't just subscribe to the SNS topic and expect to receive past events. The past events are gone.

Solutions:

  • Kafka with log retention: Kafka is a Pub/Sub system that stores messages on disk for a configurable retention period (days, weeks, forever). New subscribers can replay from any offset in the log.
  • Event sourcing: store all events in an append-only log (database table, S3). New subscribers replay from the log.
  • Backfill jobs: run a one-off batch job to populate the new service with historical data before it starts consuming live events.

In plain terms: classic Pub/Sub is "live TV" — if you weren't watching at broadcast time, you missed it. Kafka is DVR — you can rewind and replay.

Tiny example: you launch a new fraud detection service 6 months into production. With SNS, it only sees orders from today forward. With Kafka (1-year retention), it can replay 6 months of order events to build its fraud detection models before going live.
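The replay idea reduces to an append-only log plus per-consumer offsets. This is a toy sketch of the Kafka model, not Kafka's API: events are retained after delivery, and a late subscriber reads from offset 0 to catch up on history.

```python
class Log:
    """Kafka-style append-only log: events are retained, consumers track offsets."""

    def __init__(self) -> None:
        self.entries: list[str] = []

    def append(self, event: str) -> int:
        self.entries.append(event)
        return len(self.entries) - 1  # offset of the new entry

    def read_from(self, offset: int) -> list[str]:
        # Any consumer can read from any retained offset, any number of times.
        return self.entries[offset:]


log = Log()
for i in range(3):
    log.append(f"order-{i}")  # historical events, still retained

# A late subscriber replays from offset 0 and sees everything.
fraud_service = log.read_from(0)
# fraud_service → ["order-0", "order-1", "order-2"]
```

In an ephemeral broker like plain SNS, those three historical events would have been delivered once and discarded; here the late subscriber rebuilds full history before going live.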

The Trade-offs

Property               | Pub/Sub (SNS)            | Message Queue (SQS)              | Kafka
Delivery               | Every subscriber         | One consumer                     | Every consumer group
Durability             | Ephemeral (unless → SQS) | Durable (configurable retention) | Durable (configurable retention)
Late subscriber        | Misses events            | Sees all buffered messages       | Can replay from any offset
Ordering               | No guarantee             | FIFO queue available             | Per partition
Fan-out                | Native                   | Requires multiple queues         | Native (consumer groups)
Operational complexity | Low                      | Low                              | High

When to use Pub/Sub:

  • Multiple independent services must react to the same event
  • You want publishers to be completely decoupled from subscribers
  • You're using the SNS → SQS fan-out pattern for durable delivery

When Kafka is a better fit:

  • You need message replay (new service needs historical data)
  • Multiple independent consumer groups, each with different processing speeds
  • Event streaming with exactly-once semantics
  • You're building a data pipeline or audit log that must never lose events

Why this matters for you

The Pub/Sub pattern shows up in every distributed system at scale. The SNS → SQS fan-out is the AWS idiomatic way to implement durable Pub/Sub — you'll see it in almost every serious AWS architecture. Understanding the core difference (queue = one consumer, topic = all subscribers) prevents the most common misuse. And understanding the late subscriber problem is what pushes teams toward Kafka when they need replay capability — an architectural decision that's very hard to reverse after the fact.

Next: Event-Driven Architecture — what happens when you build an entire system around events as first-class citizens.


Pub/Sub — publisher emits to a topic; all subscribers receive a copy independently.