Event Routing & Kafka Integration

This guide covers the routing layer of a CDC Pipeline Implementation with Python & Debezium: how row changes leaving the PostgreSQL logical decoding subsystem are named into Kafka topics, partitioned for deterministic ordering, and consumed by Python ETL with delivery guarantees intact. Scope is the boundary between a running Debezium connector and the downstream consumers — topic naming, partition-key design, exactly-once semantics, and the failure modes that surface specifically at this hand-off.

Get routing wrong and the pipeline stays “green” while corrupting downstream state. Hash a topic on the wrong key and two updates to the same order land on different partitions, so a stale UPDATE overwrites a newer one at the sink: silent, per-row data loss that no connector metric reports. Under-provision partitions, or key on a low-cardinality field, and a single hot tenant serializes an entire domain behind one consumer thread, driving consumer lag into the hundreds of thousands of records. Skip manual offset commits and idempotent sinks, and a mid-batch consumer crash replays events a downstream service already applied. Every one of these is a routing decision made once at deploy time and paid for continuously in production.

Prerequisites & Configuration Objects

Before routing is meaningful, the extraction side must already be stable: wal_level = logical, a narrowly scoped publication, and a persistent slot created through initializing replication slots. The routing layer then adds three coordinated objects — the connector’s topic/transform config, the Kafka topic set, and the Schema Registry subject policy.

Server-side GUCs that the routing layer depends on:

sql

-- Primary must decode logically. wal_level, max_replication_slots and
-- max_wal_senders all take effect only after a server restart.
ALTER SYSTEM SET wal_level = 'logical';
-- One slot per bounded domain (orders, inventory, users), plus failover headroom.
ALTER SYSTEM SET max_replication_slots = 10;
ALTER SYSTEM SET max_wal_senders = 10;
-- Cap reorder-buffer memory per decoding session; beyond it the largest
-- transaction spills to pg_replslot/. A reload is enough for this one.
ALTER SYSTEM SET logical_decoding_work_mem = '256MB';

Provision max_replication_slots deliberately, not generously: every slot is an independent WAL-retention liability. The safe sizing procedure is documented in configuring max_replication_slots safely. The database role Debezium connects as needs REPLICATION, SELECT on the published tables, and ownership of the publication (or CREATE on the database, if Debezium is to create the publication itself):

sql

CREATE ROLE cdc_router WITH LOGIN REPLICATION PASSWORD '***';
GRANT SELECT ON ALL TABLES IN SCHEMA public TO cdc_router;
ALTER PUBLICATION cdc_prod OWNER TO cdc_router;

On the Kafka side, disable broker auto-topic-creation (auto.create.topics.enable=false) so routing is explicit and partition counts are chosen, not defaulted. Pre-create each domain topic with a partition count sized to peak parallelism and a replication factor of at least 3. Register a Schema Registry subject-compatibility policy of BACKWARD_TRANSITIVE before the first record is produced — retrofitting compatibility after producers exist forces a subject reset.
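Partition count caps consumer parallelism and cannot be lowered later, so it pays to size it from measured numbers. The following minimal sketch applies one rule of thumb, which is an assumption and not something Kafka enforces: take enough partitions to cover peak throughput divided by what one consumer sustains in load tests, then multiply by a headroom factor. plan_partitions is a hypothetical helper.

```python
import math


def plan_partitions(peak_events_per_s: float, per_consumer_events_per_s: float,
                    headroom: float = 1.5) -> int:
    """Size a topic's partition count from peak load, with growth headroom.

    Both throughput inputs are assumed to come from load tests; the 1.5x
    headroom default is illustrative. Round up at each step, because a topic
    cannot shed partitions later.
    """
    consumers_needed = math.ceil(peak_events_per_s / per_consumer_events_per_s)
    return math.ceil(consumers_needed * headroom)


if __name__ == "__main__":
    print(plan_partitions(12000, 1500))
```

The result feeds the pre-created topic (or topic.creation.default.partitions), alongside a replication factor of at least 3.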

One connector fans three domain publications into prefixed topics; the record-key hash pins every event for an entity to a single partition, so ordering holds within a partition but never across them.

Step-by-Step Implementation

The following sequence takes a stable connector to a fully routed, idempotently consumed pipeline. Each step is independently verifiable.

1. Define the topic namespace via routing SMTs. Debezium defaults to <topic.prefix>.<schema>.<table>, which leaks database structure into topic names. Remap to domain-aligned, versioned topics with a RegexRouter transform so consumers bind to stable contracts:

json

{
  "transforms": "route",
  "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
  "transforms.route.regex": "pg-analytics\\.public\\.(.*)",
  "transforms.route.replacement": "$1.events.v1"
}

This turns pg-analytics.public.orders into orders.events.v1. Version the suffix (.v1) so a breaking schema change can be rolled out on a parallel topic without disrupting live consumers.
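RegexRouter leaves unchanged any topic that its pattern does not fully match, so a typo in the regex silently keeps the default names. A small sketch that mirrors this behavior lets you check the mapping in CI before deploy. It assumes Java's full-match semantics and `$N` group references in the replacement. route_topic is a hypothetical helper, not part of Kafka Connect.

```python
import re

# Same pattern and replacement as the connector config, with JSON escaping removed.
ROUTE_REGEX = re.compile(r"pg-analytics\.public\.(.*)")
ROUTE_REPLACEMENT = "$1.events.v1"


def route_topic(topic: str) -> str:
    """Approximate RegexRouter: rewrite only on a full match, else pass through."""
    match = ROUTE_REGEX.fullmatch(topic)
    if match is None:
        return topic  # non-matching topics keep their original name
    # Translate Java-style $N group references into Python's \g<N> syntax.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", ROUTE_REPLACEMENT)
    return match.expand(template)


if __name__ == "__main__":
    for name in ("pg-analytics.public.orders", "pg-analytics.transaction"):
        print(name, "->", route_topic(name))
```

The transaction-metadata topic (pg-analytics.transaction) does not match the pattern and keeps its name, so the rename does not affect its consumers.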

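2. Pin the partition key to the ordering domain. Ordering in Kafka is guaranteed only within a partition. The producer's default partitioner picks the partition by hashing the Kafka record key that Debezium sets, which by default is the table's primary key. That default is correct only when the primary key is the ordering domain. For a table where per-entity ordering must hold across a business identifier (for example, all events for one account_id), rewrite the key so co-located events share a partition. The snippet below is merged with the step 1 router settings. message.key.columns alone already makes account_id the key. The optional ExtractField$Key flattens that key to a bare value, and its predicate confines it to the ledger topic, because ExtractField fails on keys that lack the named field:

json

{
  "transforms": "route,key",
  "message.key.columns": "public.ledger_entries:account_id",
  "transforms.key.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
  "transforms.key.field": "account_id",
  "transforms.key.predicate": "isLedger",
  "predicates": "isLedger",
  "predicates.isLedger.type": "org.apache.kafka.connect.transforms.predicates.TopicNameMatches",
  "predicates.isLedger.pattern": "ledger_entries\\.events\\.v1"
}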

Changing the key changes the partition assignment, so apply key changes only on a new topic version — never in place on a topic with committed consumer offsets.
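The following sketch shows why re-keying or re-partitioning moves events. It ports the Java client's default partitioner for keyed records, which is what a Kafka Connect worker uses: murmur2 over the serialized key bytes, masked to a positive int, modulo the partition count. It is an illustrative port and assumes the key bytes are exactly what the converter emits (for JsonConverter, the serialized JSON key). murmur2 and partition_for are hypothetical helper names.

```python
MASK32 = 0xFFFFFFFF


def murmur2(data: bytes) -> int:
    """Port of Kafka's Utils.murmur2 using unsigned 32-bit arithmetic."""
    length = len(data)
    m, r = 0x5BD1E995, 24
    h = (0x9747B28C ^ length) & MASK32
    # Mix the key four bytes at a time (little-endian words).
    for i in range(0, length - length % 4, 4):
        k = data[i] | (data[i + 1] << 8) | (data[i + 2] << 16) | (data[i + 3] << 24)
        k = (k * m) & MASK32
        k ^= k >> r
        k = (k * m) & MASK32
        h = (h * m) & MASK32
        h ^= k
    # Fold in the 1-3 trailing bytes (mirrors the Java switch fall-through).
    tail, rem = length & ~3, length % 4
    if rem == 3:
        h ^= data[tail + 2] << 16
    if rem >= 2:
        h ^= data[tail + 1] << 8
    if rem >= 1:
        h ^= data[tail]
        h = (h * m) & MASK32
    h ^= h >> 13
    h = (h * m) & MASK32
    h ^= h >> 15
    return h


def partition_for(key: bytes, num_partitions: int) -> int:
    """Default keyed partitioning: toPositive(murmur2(key)) % num_partitions."""
    return (murmur2(key) & 0x7FFFFFFF) % num_partitions


if __name__ == "__main__":
    keys = [('{"account_id":%d}' % n).encode() for n in range(1000)]
    moved = sum(partition_for(k, 12) != partition_for(k, 16) for k in keys)
    print(f"{moved} of {len(keys)} keys change partition going from 12 to 16")
```

The same key always lands on the same partition for a fixed partition count. Change either the key or the count, though, and a large share of entities moves, which is the in-place hazard described above. If a Python producer (confluent-kafka, built on librdkafka) ever writes to these topics, note that librdkafka's default partitioner is consistent_random. Set its partitioner property to murmur2_random to agree with Java clients.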

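3. Enable idempotent and, where supported, transactional production. An idempotent producer with acks=all stops the broker from storing duplicates caused by producer retries. On its own, though, delivery is still at-least-once: after a task restart the connector re-emits everything since its last committed offset. Exactly-once from source to broker additionally needs Kafka Connect's exactly-once source support (KIP-618, Kafka 3.3+). Set exactly.once.source.support=enabled on every worker and exactly.once.support=required on the connector; the connector version in use must implement this support. The producer.override.* keys take effect only if the worker's connector.client.config.override.policy permits them:

json

{
  "exactly.once.support": "required",
  "producer.override.enable.idempotence": "true",
  "producer.override.acks": "all",
  "producer.override.max.in.flight.requests.per.connection": "5",
  "provide.transaction.metadata": "true"
}

provide.transaction.metadata=true is a separate feature and does not make Kafka writes transactional. It emits a companion <topic.prefix>.transaction topic carrying BEGIN/END markers for each source transaction. It also adds a transaction block (id, total_order, data_collection_order) to every change event, which lets consumers reassemble a source transaction's events across topics.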
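A sketch of that reassembly follows, under assumptions worth checking against your Debezium version: values are JSON without embedded schemas, and each END marker carries event_count plus the same id that appears in every data event's transaction.id. The sketch also assumes total_order numbers events within the transaction. BEGIN markers are ignored because they carry no count. TxAssembler is a hypothetical helper. Because the transaction topic and the data topics are consumed independently, it accepts the END marker either before or after the events it counts.

```python
from __future__ import annotations

from collections import defaultdict


class TxAssembler:
    """Buffer change events per source transaction until END's count is met."""

    def __init__(self) -> None:
        self._events: dict[str, list[dict]] = defaultdict(list)
        self._expected: dict[str, int] = {}

    def on_metadata(self, marker: dict) -> list[dict] | None:
        # Only END markers carry event_count; BEGIN markers are skipped.
        if marker.get("status") == "END":
            self._expected[marker["id"]] = marker["event_count"]
            return self._flush_if_complete(marker["id"])
        return None

    def on_change(self, event: dict) -> list[dict] | None:
        tx_id = event["transaction"]["id"]
        self._events[tx_id].append(event)
        return self._flush_if_complete(tx_id)

    def _flush_if_complete(self, tx_id: str) -> list[dict] | None:
        expected = self._expected.get(tx_id)
        if expected is None or len(self._events[tx_id]) < expected:
            return None  # still waiting for the END marker or for more events
        del self._expected[tx_id]
        events = self._events.pop(tx_id)
        # total_order is the event's position within the whole source transaction.
        return sorted(events, key=lambda e: e["transaction"]["total_order"])
```

If a consumer reads only some of the data topics, event_count (which spans all topics) will never be reached. In that case compare against the per-collection counts in the END marker's data_collections list instead.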

4. Emit non-tabular signals without schema changes. For events that have no table — maintenance flags, cross-service coordination, backfill markers — inject a transactional message straight into the WAL stream rather than creating a signal table:

sql

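-- pg_logical_emit_message exists since 9.6, but pgoutput decodes messages only from PG 14.
-- transactional = true: the message rides the same slot and is delivered in commit order.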
SELECT pg_logical_emit_message(true, 'cdc.control', 'reindex:orders');

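Debezium emits these as message events (op = "m") on the <topic.prefix>.message topic. Consumers branch on message.prefix and decode message.content, which carries the raw payload bytes. Because the message is transactional, it is decoded in commit order relative to the DML around it. It lands on a different topic from that DML, however, so a consumer that uses it as a fence for downstream cutovers should compare the LSN in each event's source block rather than rely on cross-topic arrival order.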
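A sketch of the consumer-side branch follows. It assumes JsonConverter without embedded schemas, so the bytes in message.content arrive base64-encoded, and a colon-separated command format like the reindex:orders example. parse_control_message and CONTROL_PREFIX are illustrative names.

```python
from __future__ import annotations

import base64
import json

CONTROL_PREFIX = "cdc.control"  # the prefix passed to pg_logical_emit_message


def parse_control_message(raw_value: bytes) -> tuple[str, str] | None:
    """Return (command, argument) for a control message, else None.

    Assumes a Debezium message event serialized by JsonConverter with
    schemas.enable=false: op == "m", message.prefix, base64 message.content.
    """
    event = json.loads(raw_value)
    if event.get("op") != "m":
        return None
    message = event.get("message") or {}
    if message.get("prefix") != CONTROL_PREFIX:
        return None  # someone else's logical message; ignore it
    content = base64.b64decode(message["content"]).decode("utf-8")
    command, _, argument = content.partition(":")  # "reindex:orders" -> ("reindex", "orders")
    return command, argument
```

With an Avro value converter the content field would already be bytes, and the base64 step would go away.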

5. Route and consume idempotently in Python. The consumer keys deduplication and upserts on the business identifier, never on the Kafka offset (offsets are not stable across topic re-partitioning). This parsing and coercion layer follows Python CDC Parser Development; the serialization contract follows JSON to Avro Transformation:

python

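import json
import os
from datetime import datetime, timedelta

import psycopg  # assumption: psycopg 3 as the sink driver
from confluent_kafka import Consumer, KafkaError, KafkaException

EPOCH = datetime(1970, 1, 1)
UPSERT = """INSERT INTO orders (id, status, updated_at)
            VALUES (%(id)s, %(status)s, %(updated_at)s)
            ON CONFLICT (id) DO UPDATE
              SET status = EXCLUDED.status, updated_at = EXCLUDED.updated_at
              WHERE orders.updated_at < EXCLUDED.updated_at"""  # LWW guard
DELETE = "DELETE FROM orders WHERE id = %(id)s"


def decode(raw: bytes) -> tuple[str, dict]:
    """Flatten a Debezium envelope (assumes JsonConverter, schemas.enable=false)."""
    env = json.loads(raw)
    op = env["op"]                                  # c, u, d or r
    row = dict(env["before"] if op == "d" else env["after"])
    if row.get("updated_at") is not None:           # assumes MicroTimestamp: epoch microseconds
        row["updated_at"] = EPOCH + timedelta(microseconds=row["updated_at"])
    return op, row


# BROKERS and SINK_DSN environment variable names are illustrative.
consumer = Consumer({
    "bootstrap.servers": os.environ["BROKERS"],
    "group.id": "orders-sink-v1",
    "enable.auto.commit": False,          # commit only after the sink transaction
    "isolation.level": "read_committed",  # skip aborted transactional batches
    "max.poll.interval.ms": 300000,
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["orders.events.v1"])

try:
    with psycopg.connect(os.environ["SINK_DSN"], autocommit=True) as conn:
        while True:
            msg = consumer.poll(1.0)
            if msg is None:
                continue
            if msg.error():
                if msg.error().code() == KafkaError._PARTITION_EOF:
                    continue
                raise KafkaException(msg.error())   # client/broker error, not a bad record
            if msg.value() is not None:             # None = tombstone after a delete (compaction marker)
                op, row = decode(msg.value())
                with conn.transaction():            # one sink transaction per record
                    conn.execute(DELETE if op == "d" else UPSERT, row)
            consumer.commit(message=msg, asynchronous=False)  # offset advances only after sink commit
finally:
    consumer.close()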

The WHERE orders.updated_at < EXCLUDED.updated_at clause makes the upsert a last-write-wins merge, so an at-least-once replay of an older event can never overwrite newer state — the property that makes redelivery safe.
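You can exercise the guard without a PostgreSQL instance. This sketch uses SQLite, which since 3.24 supports the same ON CONFLICT ... DO UPDATE ... WHERE upsert form, as a stand-in for the sink. It replays an older event after a newer one. The table layout and apply_events are illustrative.

```python
import sqlite3

# Same guard as the PostgreSQL upsert, in SQLite's compatible UPSERT syntax.
UPSERT = """
INSERT INTO orders (id, status, updated_at) VALUES (?, ?, ?)
ON CONFLICT (id) DO UPDATE
  SET status = excluded.status, updated_at = excluded.updated_at
  WHERE orders.updated_at < excluded.updated_at
"""


def apply_events(events):
    """Apply (id, status, updated_at) events in order, one transaction each."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, updated_at INTEGER)"
    )
    for event in events:
        with conn:  # commit per event, mirroring one sink transaction per record
            conn.execute(UPSERT, event)
    return conn


# The newer event lands, then an at-least-once redelivery of the older one arrives.
conn = apply_events([(1, "pending", 100), (1, "shipped", 200), (1, "pending", 100)])
print(conn.execute("SELECT status, updated_at FROM orders WHERE id = 1").fetchone())
# prints ('shipped', 200)
```

The redelivered pending event matches the conflict but fails the WHERE clause, so the row keeps the newer state instead of regressing.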

Parameter Reference Table

Parameter	Layer	Recommended	Routing behavior
`topic.prefix`	connector	domain namespace	Root of every topic name; changing it re-homes all topics.
`message.key.columns`	connector	ordering-domain column(s)	Sets the Kafka record key → controls partition assignment and ordering scope.
`topic.creation.default.partitions`	connector	sized to peak parallelism	Max effective consumers per topic; cannot be lowered later without a new topic.
`producer.override.enable.idempotence`	connector	`true`	Prevents producer-retry duplicates on the broker.
`provide.transaction.metadata`	connector	`true`	Emits the `.transaction` topic for cross-topic transaction reassembly.
`isolation.level`	consumer	`read_committed`	Hides records from aborted transactional batches.
`enable.auto.commit`	consumer	`false`	Couples offset commit to sink-transaction success (at-least-once).
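`max.poll.records`	consumer	tuned to sink throughput	Batch size in the Java client; confluent-kafka-python has no such property and batches via `consume(num_messages=...)`. Oversizing risks `max.poll.interval.ms` eviction mid-batch.
`compatibility.level`	Schema Registry	`BACKWARD_TRANSITIVE`	Consumers on the newest schema can read records written with every earlier version, so upgrade consumers before producers.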
`heartbeat.interval.ms`	connector	`10000`–`30000`	Advances `confirmed_flush_lsn` on idle tables so the slot does not pin WAL.
`slot.drop.on.stop`	connector	`false`	Preserves the slot across restarts; `true` drops it on stop, so changes made while the connector is down are lost unless you resnapshot.

Diagnostic Queries

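Routing failures often surface first on the database as slot lag. When the connector stalls (broker unreachable, producer errors, a failed task, a routed topic that does not exist), it stops confirming LSNs and the slot retains WAL. Slow downstream consumers do not hold the slot back by themselves; their backlog shows up as consumer-group lag on the broker. Watch the slot first, then the broker.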

sql

-- Retained WAL per logical slot. Suggested alerts: retained_bytes > 1 GB, or
-- active = false sustained for more than 300 s (track across polls).
SELECT slot_name,
       active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes,
       pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS flush_lag_bytes
FROM pg_replication_slots
WHERE slot_type = 'logical'
ORDER BY flush_lag_bytes DESC NULLS LAST;

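restart_lsn far behind pg_current_wal_lsn() with active = false means nothing is reading the slot (the connector is down or its task has failed) and WAL is accumulating: the direct precursor to disk exhaustion. confirmed_flush_lsn lagging while active = true means the connector is connected but not confirming progress, typically because its writes to Kafka are slow or failing. Some lag is normal, since the connector confirms LSNs only after Kafka Connect commits its offsets (every offset.flush.interval.ms).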
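The two readings above can be turned into an alert rule. This sketch rests on stated assumptions: rows come from the slot query (slot_name, active, retained_bytes), and the 1 GB and 300 s thresholds are the ones suggested in the query comment. The caller must also track how long each slot has been inactive across polls, because pg_replication_slots does not expose that before PostgreSQL 17, which adds inactive_since. SlotWatcher is a hypothetical helper.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field

RETAINED_LIMIT_BYTES = 1 << 30  # 1 GB in binary units, matching pg_size_pretty
INACTIVE_LIMIT_S = 300.0


@dataclass
class SlotWatcher:
    """Track per-slot inactivity across polls and classify each reading."""
    inactive_since: dict[str, float] = field(default_factory=dict)

    def classify(self, slot: str, active: bool, retained_bytes: int,
                 now: float | None = None) -> str:
        now = time.monotonic() if now is None else now
        if active:
            self.inactive_since.pop(slot, None)  # reader is back; reset the clock
        else:
            self.inactive_since.setdefault(slot, now)
        idle_for = now - self.inactive_since.get(slot, now)
        if not active and idle_for > INACTIVE_LIMIT_S:
            return "connector-gone"   # nobody is reading the slot; WAL accumulates
        if retained_bytes > RETAINED_LIMIT_BYTES:
            return "wal-retention"    # slot is pinning too much WAL
        return "ok"
```

Call classify once per slot on every poll of the query and route anything other than "ok" to the alerting stack.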

sql

-- Confirm the sender side matches: state should be 'streaming', not 'catchup'.
SELECT application_name, state, sync_state,
       pg_wal_lsn_diff(sent_lsn, replay_lsn) AS unacked_bytes
FROM pg_stat_replication;

On the Kafka side, per-partition lag exposes the skew that slot metrics only imply:

bash

# Skewed LAG across partitions of one topic = a hot key. Rebalance or re-key on a new topic version.
kafka-consumer-groups.sh --bootstrap-server "$BROKER" \
  --describe --group orders-sink-v1

If one partition’s LAG grows while siblings drain, the partition key is concentrating traffic — the routing defect, not a consumer-capacity problem.
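The same check can run unattended. This sketch parses the describe output and flags partitions whose lag is both large and far above the group's median. The column layout is assumed from the tool's usual table output. The 10x and 10,000-record thresholds are illustrative, not Kafka recommendations.

```python
from __future__ import annotations

from statistics import median


def partition_lags(describe_output: str) -> dict[tuple[str, int], int]:
    """Parse LAG per (topic, partition) from `kafka-consumer-groups.sh --describe` text.

    Assumes a whitespace-separated table whose header row names TOPIC,
    PARTITION and LAG; rows whose LAG is '-' (no committed offset) are skipped.
    """
    lags: dict[tuple[str, int], int] = {}
    header: list[str] | None = None
    for line in describe_output.splitlines():
        cols = line.split()
        if {"TOPIC", "PARTITION", "LAG"} <= set(cols):
            header = cols                  # start of a table
            continue
        if header is None or len(cols) != len(header):
            continue                       # blank lines, notices, malformed rows
        row = dict(zip(header, cols))
        if row["LAG"].isdigit():
            lags[(row["TOPIC"], int(row["PARTITION"]))] = int(row["LAG"])
    return lags


def hot_partitions(lags: dict[tuple[str, int], int],
                   factor: float = 10.0, floor: int = 10_000) -> list[tuple[str, int]]:
    """Flag partitions whose lag is both large and far above the median."""
    if not lags:
        return []
    typical = median(lags.values())
    return sorted(tp for tp, lag in lags.items()
                  if lag >= floor and lag > factor * max(typical, 1))
```

A flagged partition next to near-zero siblings points at the key, not at consumer capacity, which matches the diagnosis above.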

Failure Modes & Gotchas

1. Cross-partition reordering after a key change. Symptom: newer updates overwritten by older ones downstream. Root cause: message.key.columns was changed on a live topic, so historical events sit on partitions assigned by the old key and new events by the new key; ordering is no longer preserved within any single key. Remediation: never re-key in place — produce to a new topic version and cut consumers over, or add a last-write-wins guard (as in the upsert above) so ordering violations are absorbed rather than applied.

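2. Slot pins WAL on an idle table. Symptom: retained_wal climbs on a low-traffic domain even though nothing is changing. Root cause: with no captured DML there is nothing to advance confirmed_flush_lsn, so the slot holds WAL at its last position while other tables or databases keep writing. Remediation: set heartbeat.interval.ms to 10000–30000 so the connector commits offsets and advances the flush LSN on silent tables. If the captured database itself is quiet while other databases on the server write WAL, also set heartbeat.action.query to a small write into a heartbeat table, so there is a change to confirm.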

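3. Hot partition serializes a domain. Symptom: one partition’s consumer lag rises unbounded while others stay near zero. Root cause: a monolithic tenant or a low-cardinality partition key concentrates records. Remediation options:
- Salt the key with a per-entity bucket (for example tenant_id plus hash(pk) % N). This spreads one tenant over up to N partitions while keeping each entity's events together, at the cost of tenant-wide ordering.
- Shard the tenant onto its own topic.
- Raise the partition count on a new topic version. Partitions can be added to an existing topic, but doing so remaps keys exactly like a re-key, and the count can never be lowered.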
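A sketch of the salting option follows. It assumes the salted value ends up as the record key; in this pipeline that means a source column listed in message.key.columns, or a custom SMT, and neither is shown here. salted_key and SALT_BUCKETS are illustrative names. The point of the sketch is the hash choice: Python's built-in hash() is randomized per process for strings, so every worker needs a stable function such as CRC32 to compute the same salt.

```python
import zlib

SALT_BUCKETS = 8  # N: how many partitions one hot tenant may spread over (assumption)


def salted_key(tenant_id: str, entity_pk: str, buckets: int = SALT_BUCKETS) -> str:
    """Build a partition key that spreads one tenant over `buckets` sub-keys.

    The salt depends only on the entity's primary key, so every event for one
    entity gets the same key (and partition); different entities of the same
    tenant fan out across up to `buckets` keys.
    """
    salt = zlib.crc32(entity_pk.encode("utf-8")) % buckets
    return f"{tenant_id}:{salt}"


if __name__ == "__main__":
    print(sorted({salted_key("acme", f"order-{i}") for i in range(1000)}))
```

Because every entity keeps a single key, per-entity ordering still holds, while ordering across the tenant as a whole does not.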

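4. Duplicate application after a consumer crash. Symptom: downstream side effects (emails, ledger postings) fire twice. Root cause: the consumer crashed after the sink applied a batch but before its offset was committed, so the batch is redelivered and re-run. enable.auto.commit=true makes this timing-dependent and adds the opposite risk: an offset committed before the sink finished means a crash loses those records. Remediation: set enable.auto.commit=false and commit the offset only after the sink transaction, which rules out loss. Then make the sink itself idempotent (ON CONFLICT upserts keyed on the business identifier, or a record of applied event IDs) so the remaining redeliveries are absorbed.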
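A sketch of the idempotent-sink half for side effects follows, using SQLite as a stand-in for the sink database. An applied-events table keyed on a stable event identifier is written in the same transaction as the side effect, so the sink recognizes and skips a redelivered event. apply_once and the applied_events table are hypothetical. For effects outside the database, such as sending an email, this narrows the window but does not close it: if the email goes out and the commit then fails, the redelivery sends it again.

```python
import sqlite3
from typing import Callable


def apply_once(conn: sqlite3.Connection, event_id: str,
               side_effect: Callable[[], object]) -> bool:
    """Record event_id and run side_effect in one transaction; skip if already applied.

    event_id is assumed to be a stable business identifier (for example the
    order id plus the source LSN), never a Kafka offset.
    """
    with conn:  # commits on success, rolls back if side_effect raises
        cur = conn.execute(
            "INSERT INTO applied_events (event_id) VALUES (?) ON CONFLICT DO NOTHING",
            (event_id,),
        )
        if cur.rowcount == 0:
            return False  # redelivery: this event was already applied
        side_effect()
        return True


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE applied_events (event_id TEXT PRIMARY KEY)")
    for _ in range(2):  # simulate a redelivery of the same event
        print(apply_once(db, "order-42:0/16B3748", lambda: print("posting ledger entry")))
```

If the side effect raises, the rollback also removes the applied_events row, so a later redelivery gets another chance to apply it.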

5. Deserialization wall after an unregistered DDL change. Symptom: every consumer in a group fails on the same offset after a source ALTER TABLE. Root cause: the producer registered a new schema the consumers cannot read, or compatibility was not BACKWARD_TRANSITIVE. Remediation: enforce BACKWARD_TRANSITIVE, pre-register schemas in CI before the DDL ships, and route unreadable records to a quarantine/dead-letter topic with the original offset preserved for replay rather than blocking the partition.
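A sketch of the quarantine step follows. It keeps the original coordinates so the record can be replayed once a readable schema exists. The `<topic>.dlq` naming and the header names are assumptions, not a Kafka or Debezium convention. build_quarantine_record is a hypothetical helper that only builds the record and leaves the actual produce call to the consumer's producer.

```python
from __future__ import annotations


def build_quarantine_record(topic: str, partition: int, offset: int,
                            raw_value: bytes | None, error: Exception):
    """Package an undecodable record for a dead-letter topic without losing its position.

    Returns (dlq_topic, key, value, headers); the value is passed through
    byte-for-byte so it can be re-decoded later.
    """
    headers = [
        ("source.topic", topic.encode()),
        ("source.partition", str(partition).encode()),
        ("source.offset", str(offset).encode()),
        ("error", repr(error).encode()),
    ]
    key = f"{topic}/{partition}/{offset}".encode()
    return f"{topic}.dlq", key, raw_value, headers
```

With confluent-kafka, the result maps onto producer.produce(dlq_topic, value=value, key=key, headers=headers). Commit the source offset only after that produce is confirmed (for example after flush()); otherwise a crash could drop the record from both topics.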

Integration Touchpoints

Routing is the middle of the pipeline, so its correctness depends on the objects upstream and the consumers downstream. Upstream, topic and partition guarantees are only as strong as the WAL stream mechanics feeding them — commit order in Kafka is inherited from commit order in the WAL, which is why one slot per bounded domain (not one per table, not one for everything) is the partitioning unit. Durability latency at the source is tunable through synchronous_commit for logical replication; relaxing it trades stricter delivery timing for lower commit latency on the primary.

Downstream, the consumer contract is defined by the parser and serializer layers: Python CDC Parser Development owns op-field routing (c/u/d/r), tombstone handling, and type coercion, while JSON to Avro Transformation owns the Schema Registry compatibility contract that keeps routed records deserializable across evolution. Operationally, the slot-lag and consumer-lag thresholds above should be exported to the alerting stack described in async monitoring integration rather than watched by hand. When a failover breaks routing entirely, recovery starts by rebuilding the slot and re-establishing subscription sync from a known-good LSN before consumers resume.

Partition-by-key routing preserves per-entity ordering across parallel consumers: hashing on account_id pins each account to one partition, and a partition is owned by exactly one consumer in the group.

For authoritative references, consult the official PostgreSQL Logical Decoding documentation, the Apache Kafka documentation on exactly-once semantics, and the Confluent Schema Registry Avro guidelines.
