CDC Pipeline Implementation with Python & Debezium

Change Data Capture pipelines built on PostgreSQL logical replication and Debezium fail in production for structural reasons, not exotic ones: a disconnected consumer pins restart_lsn and floods pg_wal until the primary runs out of disk, an unregistered DDL change silently breaks every downstream deserializer, or a failover promotes a replica that never carried the replication slot and forces a full resnapshot. This reference treats the pipeline as one observable data flow — spanning the logical decoding subsystem, the Debezium connector, Kafka, and Python consumers — and specifies the configuration, state mechanics, security boundaries, and diagnostic thresholds required to run it safely on PostgreSQL 15, 16, and 17.

End-to-end CDC pipeline — PostgreSQL changes flow through Debezium and Kafka to Python consumers and downstream sinks.

Core Architecture

A CDC pipeline is a chain of durability handoffs. PostgreSQL commits a transaction to the Write-Ahead Log; the WAL stream mechanics decode those records into logical row changes; a replication slot guarantees the WAL is retained until the consumer acknowledges it; Debezium serializes the change into a Kafka record; and Python consumers apply it to a warehouse or downstream service. Every link either preserves at-least-once delivery or breaks it, so the architecture must be reasoned about end to end rather than component by component.

Logical decoding operates strictly at the WAL layer. Every committed transaction emits WAL records describing tuple-level modifications; the pgoutput plugin reads those records, reassembles them into INSERT/UPDATE/DELETE change events with relation metadata and commit timestamps, and streams them in commit order. For CDC-heavy workloads this decoding is the throughput ceiling. Three server parameters govern it directly:

sql

-- Required on the primary; changing wal_level needs a restart.
ALTER SYSTEM SET wal_level = 'logical';
-- Cap the memory a single decoding session buffers before spilling to disk.
ALTER SYSTEM SET logical_decoding_work_mem = '256MB';
-- Headroom for total WAL between checkpoints under decoding pressure.
ALTER SYSTEM SET max_wal_size = '4GB';

logical_decoding_work_mem is the parameter most operators miss. When a single transaction’s reorder buffer exceeds it, PostgreSQL spills the transaction to disk in pg_replslot/<slot>/, adding I/O latency to every large batch. Streaming of in-progress transactions (streaming = on for a native subscription) arrived in PostgreSQL 14 and lets a consumer that requests it receive changes before COMMIT, which keeps memory bounded on multi-million-row transactions; PG 16 added parallel apply of streamed transactions on the subscriber (streaming = parallel). A consumer that does not request streaming receives each transaction only after it commits, so a nightly bulk UPDATE of 5M rows can spill several GB before the first change event reaches Kafka. Confirm which mode your Debezium version negotiates before sizing this parameter.

Topology follows from the decoding model. A slot is single-consumer and ordered, so parallelism comes from partitioning the source — one slot and publication per bounded domain (orders, inventory, users) rather than one slot for the whole database. This isolates a high-churn table from stalling a low-churn one (head-of-line blocking) and lets you scale Debezium tasks and Kafka partitions per domain. The trade-off is more slots to monitor, which raises the stakes on the state mechanics covered below. For the underlying replication model — how publishers expose changes and how physical and logical streaming differ — see the architecture fundamentals reference.

Declarative Configuration Model

The pipeline is defined by three declarative objects: a PostgreSQL publication, a replication slot, and a Debezium connector. Getting these three to agree — same tables, same slot name, same plugin — is what separates a reproducible deployment from one that drifts on every restart.

A publication declares which tables and operations enter the WAL decoding stream. Create a narrow, explicitly enumerated publication rather than FOR ALL TABLES; a broad publication decodes churn from tables no consumer wants and inflates decoding CPU. The full mechanics of predicate filtering and column lists live in creating publications, but the CDC-relevant shape is:

sql

-- Scope the stream to exactly the CDC surface. PG 15+ supports row filters
-- and column lists, evaluated on the publisher during decoding.
-- A column list and WHERE clause attach to the table they directly follow.
CREATE PUBLICATION cdc_orders
  FOR TABLE public.orders (id, status, total, updated_at)
              WHERE (status <> 'draft'),
            public.order_items;

-- REPLICA IDENTITY controls what the UPDATE/DELETE change event carries.
-- DEFAULT emits only the primary key in "before"; FULL emits every column.
-- Caveat: a column list must cover the replica identity, so under FULL it
-- has to name every column of the table or UPDATE/DELETE will be rejected.
ALTER TABLE public.orders REPLICA IDENTITY FULL;

REPLICA IDENTITY is a correctness decision, not a tuning knob. With the default identity, an UPDATE that does not touch a TOASTed column carries no value for that column (Debezium emits its unavailable-value placeholder in place of the data), and a DELETE carries only the key, which breaks any consumer that needs the prior row state or reconstructs strict Avro records. FULL fixes this at the cost of larger WAL volume; the JSON to Avro transformation layer documents the hydration fallback when FULL is too expensive.

The connector is the declarative object that binds the other two: it names the publication and the slot it streams from. Deploy it idempotently via the Kafka Connect REST API so repeated applies converge to one connector rather than spawning duplicates:

yaml

# PUT /connectors/pg-cdc-orders/config  — idempotent apply
connector.class: io.debezium.connector.postgresql.PostgresConnector
database.hostname: pg-primary.internal
database.dbname: analytics_prod
plugin.name: pgoutput          # built into PostgreSQL; wal2json was removed in Debezium 2.0
publication.name: cdc_orders   # must match the publication above
slot.name: dbz_orders          # explicit per connector; do not rely on the default name
slot.drop.on.stop: false       # keep the slot across restarts in production
snapshot.mode: initial         # snapshot once, then stream
heartbeat.interval.ms: 10000   # advance the slot during idle periods
topic.prefix: pg-analytics

Three settings here are load-bearing. plugin.name should be set to pgoutput explicitly: it ships with PostgreSQL, whereas Debezium’s default, decoderbufs, requires an extension installed on the server, and the wal2json plugin that many 1.x configs used was removed in Debezium 2.0, so a config carried over from 1.x with wal2json will fail to start. An explicit, per-connector slot.name keeps two connectors on the same database from colliding on the default name (debezium) and makes every slot attributable in monitoring. And heartbeat.interval.ms is what keeps the slot’s confirmed_flush_lsn advancing when the captured tables are idle but the rest of the database is busy; without it, a low-traffic table’s slot pins WAL generated by unrelated writes. The complete parameter matrix, including snapshot.mode variants, is documented in the Debezium connector configuration reference. When you operate the raw PostgreSQL side directly rather than through Debezium, the equivalent publisher/subscriber objects are covered under logical replication setup and management.
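
The idempotent apply described above can be sketched in Python. This is a minimal illustration that only builds the request: PUT /connectors/{name}/config is the Kafka Connect REST endpoint that creates or updates a connector, while the helper name build_connector_put and the connect.internal:8083 address are assumptions made for the example.

```python
import json
import urllib.request


def build_connector_put(connect_url: str, name: str, config: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT that creates or updates one connector."""
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        url=f"{connect_url.rstrip('/')}/connectors/{name}/config",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


config = {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "plugin.name": "pgoutput",
    "publication.name": "cdc_orders",
    "slot.name": "dbz_orders",
}
# connect.internal:8083 is a hypothetical Kafka Connect address.
request = build_connector_put("http://connect.internal:8083", "pg-cdc-orders", config)
# urllib.request.urlopen(request) would send it; repeating the call with the
# same name updates the existing connector instead of creating a second one.
```

Because the connector name is part of the URL, re-running the same deployment step converges on one connector, which is the property the paragraph above relies on.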

State Persistence & Lifecycle

The replication slot is the single most consequential piece of state in the pipeline. It records restart_lsn (the oldest WAL the consumer might still need) and confirmed_flush_lsn (the position the consumer has acknowledged). PostgreSQL will not remove or recycle any WAL segment newer than the minimum restart_lsn across all slots. That guarantee is exactly why an abandoned slot is dangerous: a consumer that stops acknowledging freezes restart_lsn, and pg_wal grows without bound until the volume fills and the primary refuses writes.

One logical slot's LSN timeline: WAL from the frozen restart_lsn to the write head is pinned on disk; if it reaches the max_slot_wal_keep_size cap the slot is invalidated rather than filling the volume.

The lifecycle has four states worth designing around: created, active-and-advancing, active-but-lagging, and inactive. A slot is created once (by Debezium’s first snapshot, or explicitly during replication slot initialization) and must survive every subsequent connector restart. The bounded-safety control is max_slot_wal_keep_size: it caps how much WAL a lagging slot can pin before PostgreSQL invalidates the slot rather than exhausting disk. Invalidation is a controlled failure — the slot becomes unusable and the consumer must resnapshot — which is almost always preferable to a full-disk outage that takes down every writer.

sql

-- Bound the blast radius of a stuck consumer. PG 13+.
ALTER SYSTEM SET max_slot_wal_keep_size = '10GB';
-- Fail a dead replication connection quickly so the slot goes inactive
-- and alerting fires, instead of silently pinning WAL.
ALTER SYSTEM SET wal_sender_timeout = '60s';
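
To reason about how much time the cap buys, a rough estimate is the remaining headroom divided by the WAL generation rate. The sketch below is plain arithmetic under stated assumptions: the function name and the 20 MiB/s write rate are illustrative, and the actual invalidation point depends on when PostgreSQL checks the limit, so treat the result as an order-of-magnitude figure.

```python
def seconds_until_invalidation(retained_bytes: int, cap_bytes: int,
                               wal_bytes_per_sec: float) -> float:
    """Estimate how long a stalled slot has before it reaches the cap."""
    headroom = cap_bytes - retained_bytes
    if headroom <= 0:
        return 0.0               # already at or past the cap
    if wal_bytes_per_sec <= 0:
        return float("inf")      # no WAL being generated, nothing accumulates
    return headroom / wal_bytes_per_sec


GiB = 1024 ** 3
MiB = 1024 ** 2
# A slot pinning 4 GiB against a 10 GiB cap while the primary writes 20 MiB/s.
eta = seconds_until_invalidation(4 * GiB, 10 * GiB, 20 * MiB)
```

With these example numbers the slot has roughly five minutes before it is invalidated, which is the window your inactive-slot alert has to beat.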

The slot type also encodes durability semantics. Persistent logical slots survive restarts and are the correct choice for a production connector; temporary slots vanish when their session ends and suit ephemeral backfills; and PG 17+ can create failover-enabled slots synchronized to a standby so a promotion does not lose the CDC position. The full comparison, including two_phase decoding and slot synchronization, is in replication slot types. On the consumer side, durable progress means idempotent apply keyed on the primary key plus an LSN or source ts_ms high-water mark, so replaying from confirmed_flush_lsn after a restart produces no duplicates. This is the pattern the Python CDC parser implements. When you drive subscribers natively instead of through Kafka, the equivalent resume-and-resync flow is documented in subscription sync procedures.
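
A minimal sketch of idempotent apply with an LSN high-water mark follows. It is an in-memory stand-in, not the parser's actual code: the IdempotentSink class and the flattened event shape (pk, lsn, op, after) are assumptions for illustration, with op values following Debezium's c/u/d/r convention.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentSink:
    """In-memory stand-in for a sink table keyed on the primary key."""
    rows: dict = field(default_factory=dict)        # pk -> current row
    high_water: dict = field(default_factory=dict)  # pk -> last applied LSN

    def apply(self, event: dict) -> bool:
        """Apply one change event; return False if it is a stale replay."""
        pk, lsn = event["pk"], event["lsn"]
        if lsn <= self.high_water.get(pk, -1):
            return False  # not newer than what was already applied
        if event["op"] == "d":
            self.rows.pop(pk, None)
        else:  # "c", "u", and snapshot reads "r" all become an upsert
            self.rows[pk] = event["after"]
        self.high_water[pk] = lsn  # kept after deletes so replays cannot resurrect a row
        return True


sink = IdempotentSink()
sink.apply({"pk": 1, "lsn": 100, "op": "c", "after": {"status": "new"}})
sink.apply({"pk": 1, "lsn": 110, "op": "u", "after": {"status": "paid"}})
# A restart replays the first event; it is dropped and the row stays "paid".
replayed = sink.apply({"pk": 1, "lsn": 100, "op": "c", "after": {"status": "new"}})
```

In a real sink the row and its high-water mark would be written in the same transaction, so a crash between the two cannot leave them inconsistent.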

Security & Privilege Boundaries

CDC extracts the entire committed history of the tables it captures, so the replication identity is effectively a read-everything credential for that surface. Treat it as least-privilege from the start rather than reusing a superuser. The connector needs a role with LOGIN and REPLICATION plus SELECT on the captured tables. CREATE on the database is needed only if you let Debezium create the publication, in which case the role must also own the captured tables; pre-creating the publication as an administrator avoids both grants.

sql

-- Dedicated, non-superuser CDC role.
CREATE ROLE cdc_replicator WITH LOGIN REPLICATION PASSWORD 'from-secrets-manager';
GRANT USAGE ON SCHEMA public TO cdc_replicator;
GRANT SELECT ON public.orders, public.order_items TO cdc_replicator;
-- Let the role own only the objects it must create.
GRANT CREATE ON DATABASE analytics_prod TO cdc_replicator;

Two network boundaries matter. First, pg_hba.conf must admit the CDC role from the connector’s address only. Logical replication connections match ordinary database entries rather than the replication keyword, so scope the rule to the captured database and the CDC role; a broad all/all line is a common and serious over-grant, and the exact host-based rules are covered in security boundaries and permissions. Second, both hops carry the raw row data in the clear unless you force TLS: PostgreSQL-to-Debezium with sslmode=require (prefer verify-full with a pinned CA), and Debezium-to-Kafka with security.protocol=SSL or SASL_SSL. Never inline the database password in the connector config; reference it through a Kafka Connect config provider backed by a secrets manager, as shown in the connector reference. Finally, decide per topic whether PII columns should be masked or dropped in a transform before they land in Kafka, where retention and consumer sprawl make deletion hard.
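
One way to enforce the no-inline-password rule is a pre-deploy check on the connector config. The sketch below is a hypothetical lint, not part of Debezium or Kafka Connect: it assumes config-provider placeholders take the ${provider:path:key} form and that sensitive keys can be recognized by name, and the file path shown is invented for the example.

```python
import re

# Kafka Connect config-provider placeholders look like ${provider:path:key}.
PLACEHOLDER = re.compile(r"^\$\{[^:}]+:[^}]*\}$")
SENSITIVE = ("password", "secret", "token")  # assumed naming convention


def inline_secrets(config: dict) -> list:
    """Return keys that look sensitive but hold a literal value."""
    return sorted(
        key for key, value in config.items()
        if any(word in key.lower() for word in SENSITIVE)
        and not PLACEHOLDER.match(str(value))
    )


bad = inline_secrets({"database.user": "cdc_replicator",
                      "database.password": "hunter2"})
ok = inline_secrets({"database.user": "cdc_replicator",
                     "database.password": "${file:/opt/secrets/pg.properties:password}"})
```

Run in CI, a non-empty result can fail the deployment before a literal credential ever reaches the Connect REST API.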

Observability & Diagnostics

You cannot operate this pipeline blind; the difference between a five-minute incident and a full-disk outage is whether slot lag alerts fired. Instrument three layers — PostgreSQL slots, Debezium/Kafka Connect, and consumer lag — and alert on concrete thresholds, not vibes. The cross-cutting collectors and dashboards are covered under async monitoring integration; the queries below are the primitives those dashboards are built on.

The primary slot-health query watches retained WAL and inactivity. Alert when retained bytes climb past a few GB or when a production slot sits inactive for more than a minute:

sql

SELECT
  slot_name,
  active,
  pg_size_pretty(
    pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
  ) AS retained_wal,
  pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS flush_lag_bytes
FROM pg_replication_slots
WHERE slot_type = 'logical';
-- ALERT: active = false on a production slot  -> consumer down, WAL pinning
-- ALERT: retained_wal > 4 GB                  -> approaching max_slot_wal_keep_size
-- ALERT: flush_lag_bytes > 256 MB sustained   -> consumer falling behind
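
The alert comments above translate directly into a small evaluation function. This sketch assumes each row has been fetched as a dict with raw byte counts (retained_bytes is a hypothetical column holding the undecorated pg_wal_lsn_diff value rather than the pretty-printed one), and it does not model the "sustained" qualifier, which needs a time window.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def slot_alerts(slot: dict, retained_limit: int = 4 * GIB,
                flush_limit: int = 256 * MIB) -> list:
    """Map one pg_replication_slots row (as a dict) to alert names."""
    alerts = []
    if not slot["active"]:
        alerts.append("slot_inactive")       # consumer down, WAL pinning
    if slot["retained_bytes"] > retained_limit:
        alerts.append("retained_wal_high")   # approaching the slot cap
    if slot["flush_lag_bytes"] > flush_limit:
        alerts.append("flush_lag_high")      # consumer falling behind
    return alerts


stuck = slot_alerts({"slot_name": "dbz_orders", "active": False,
                     "retained_bytes": 6 * GIB, "flush_lag_bytes": 6 * GIB})
healthy = slot_alerts({"slot_name": "dbz_orders", "active": True,
                       "retained_bytes": 64 * MIB, "flush_lag_bytes": 8 * MIB})
```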

pg_stat_replication shows the live streaming connection and its per-stage lag, which tells you whether the bottleneck is the network (sent→write), the consumer’s disk (write→flush), or apply (flush→replay):

sql

SELECT
  application_name,
  state,                              -- expect 'streaming'
  pg_wal_lsn_diff(sent_lsn, replay_lsn) AS total_lag_bytes,
  write_lag, flush_lag, replay_lag    -- time-based; PG 10+
FROM pg_stat_replication;
-- ALERT: state <> 'streaming'   -> connection stalled or catching up
-- ALERT: replay_lag > 60 s      -> downstream apply cannot keep pace
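
Because the three lag columns are cumulative, the per-stage cost is the difference between neighbours. The helper below is a sketch under that reading; the name dominant_stage is an assumption, NULL lag values (which PostgreSQL reports when there is no recent measurement) are not handled, and for a logical consumer the values reflect whatever positions the client chooses to report.

```python
from datetime import timedelta


def dominant_stage(write_lag: timedelta, flush_lag: timedelta,
                   replay_lag: timedelta) -> str:
    """Name the stage that contributes most to total replication lag."""
    zero = timedelta(0)
    stages = {
        "network": write_lag,                              # sent -> write
        "consumer_flush": max(flush_lag - write_lag, zero),  # write -> flush
        "apply": max(replay_lag - flush_lag, zero),          # flush -> replay
    }
    return max(stages, key=stages.get)


slow_apply = dominant_stage(timedelta(seconds=1), timedelta(seconds=2),
                            timedelta(seconds=30))
slow_network = dominant_stage(timedelta(seconds=5), timedelta(seconds=5.5),
                              timedelta(seconds=6))
```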

On the pipeline side, scrape Debezium’s MilliSecondsBehindSource and records-sent rate from its JMX metrics, and Kafka consumer-group lag per partition. A useful composite alert: page when consumer lag exceeds five minutes of ingest and is still rising, since a lag that is draining needs no intervention. Export all three into the same time series so an incident responder can see at a glance whether the stall originates in PostgreSQL, in Connect, or in the Python consumers rather than guessing.
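
The composite alert can be expressed as a two-condition check over recent lag samples. This is a minimal sketch: it assumes lag is already expressed in seconds of ingest, compares only the last two samples, and uses the hypothetical name should_page; a production rule would smooth over a longer window.

```python
def should_page(lag_samples: list, threshold_s: float = 300.0) -> bool:
    """Page only if the latest lag is above the threshold and still rising."""
    if len(lag_samples) < 2:
        return False  # not enough history to tell rising from draining
    latest, previous = lag_samples[-1], lag_samples[-2]
    return latest > threshold_s and latest > previous


rising = should_page([200.0, 320.0, 400.0])    # above 5 min and growing
draining = should_page([600.0, 450.0, 350.0])  # above 5 min but recovering
```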

Resilience Patterns & Failure Modes

Production CDC fails in a small, well-understood set of ways. Design the runbook around these signatures before you meet them at 3 a.m.

WAL exhaustion from an inactive slot. Signature: active = false in pg_replication_slots, retained_wal climbing, disk alerts on the primary. Root cause: the consumer died or the network partitioned and the slot froze restart_lsn. Remediation: restore the consumer to resume acknowledgment; if the consumer is gone for good, SELECT pg_drop_replication_slot('dbz_orders'); to release the WAL immediately — accepting that the next connector start must resnapshot. The preventive control is max_slot_wal_keep_size plus the active=false alert above.

Failover loses the slot. Signature: after promoting a replica, Debezium cannot find its slot and triggers a full snapshot. Root cause: before PG 17, logical slots did not fail over; they existed only on the old primary. Remediation on PG 17: create failover-enabled slots and synchronize them to the standby so the position survives promotion. On PG 16 and earlier, the practical pattern is a controlled snapshot with the when_needed snapshot mode and an idempotent consumer that tolerates the replayed rows.

Schema drift breaks deserialization. Signature: consumers throw on an unknown field or a type mismatch after a source DDL change. Root cause: a column was added, dropped, or retyped without a compatible schema evolution path. Remediation: enforce BACKWARD_TRANSITIVE compatibility in the schema registry, register schemas in CI before the DDL ships, and route non-conforming records to a dead-letter queue instead of blocking the stream — the routing and DLQ patterns live in event routing and Kafka integration.
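
The dead-letter routing in that remediation can be sketched independently of any broker. The example below deliberately swaps Avro and the schema registry for plain JSON with a required-field check so it stays self-contained; route, handle, and dead_letter are hypothetical names, and in a real consumer the two callbacks would produce to Kafka topics.

```python
import json


def route(raw_records, required_fields: set, handle, dead_letter) -> None:
    """Send conforming records to handle(); divert the rest to dead_letter()."""
    for raw in raw_records:
        try:
            record = json.loads(raw)  # JSONDecodeError is a ValueError
            if not isinstance(record, dict):
                raise ValueError("record is not a JSON object")
            missing = required_fields - set(record)
            if missing:
                raise ValueError(f"missing fields: {sorted(missing)}")
        except ValueError as exc:
            dead_letter(raw, str(exc))  # park it and keep the stream moving
            continue
        handle(record)


good, parked = [], []
route(
    ['{"id": 1, "status": "paid"}', '{"id": 2}', "not json"],
    {"id", "status"},
    good.append,
    lambda raw, reason: parked.append((raw, reason)),
)
```

The important property is that a bad record costs one dead-letter write rather than a stalled partition; the parked payload and reason are enough to replay it once the schema is fixed.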

TOAST columns arrive empty on update. Signature: large text/JSON columns are populated on INSERT but missing in UPDATE events, where they show up as null or as Debezium’s unavailable-value placeholder (__debezium_unavailable_value by default). Root cause: REPLICA IDENTITY DEFAULT omits unchanged out-of-line values. Remediation: set REPLICA IDENTITY FULL on the affected tables, or add a primary-key-keyed hydration SELECT in the transform layer.
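
A hydration step for the second remediation might look like the sketch below. It assumes the connector marks missing values with Debezium's default placeholder string, and fetch_row is a hypothetical callable standing in for the primary-key-keyed SELECT; if your pipeline surfaces these values as null instead, the detection condition would need to change.

```python
UNAVAILABLE = "__debezium_unavailable_value"  # Debezium's default placeholder


def hydrate(after: dict, pk_column: str, fetch_row) -> dict:
    """Replace placeholder values with current values fetched by primary key."""
    stale = {col for col, value in after.items() if value == UNAVAILABLE}
    if not stale:
        return after  # nothing missing, no database round trip
    current = fetch_row(after[pk_column])  # e.g. SELECT ... WHERE id = %s
    return {col: (current[col] if col in stale else value)
            for col, value in after.items()}


# A dict stands in for the source table in this example.
table = {7: {"id": 7, "status": "paid", "notes": "long TOASTed text"}}
event_after = {"id": 7, "status": "paid", "notes": UNAVAILABLE}
hydrated = hydrate(event_after, "id", table.__getitem__)
```

The fetched value is the row's current state, not its state at the event's LSN, so this fallback trades strict point-in-time accuracy for lower WAL volume than REPLICA IDENTITY FULL.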

Consumer restart replays committed changes. Signature: duplicate rows or double-counted aggregates after a pod restart or rebalance. Root cause: at-least-once delivery plus non-idempotent apply. Remediation: upsert on the primary key and drop any event whose source LSN/ts_ms is not newer than the stored high-water mark. Backpressure belongs in the same runbook: when consumer lag crosses its threshold, tune max.poll.records and fetch.min.bytes, scale consumer replicas, or briefly throttle the connector — never let unbounded lag translate into unbounded slot growth on the primary.

Conclusion

A resilient PostgreSQL CDC pipeline is the product of a few non-negotiable disciplines: narrow, explicitly enumerated publications; static persistent slots bounded by max_slot_wal_keep_size; a least-privilege REPLICATION role over TLS on both hops; idempotent, LSN-aware Python consumers; and slot-lag alerting wired before the first byte flows. Get those right and the pipeline degrades gracefully — a dead consumer trips an alert and, at worst, invalidates a slot, instead of filling a disk and taking the primary offline.

Version choice shapes the operational envelope. Streaming of in-progress transactions has been available since PostgreSQL 14, so on 15, 16, and 17 the memory profile of bulk writes depends on whether the consumer requests streaming; when it does not, plan for a larger logical_decoding_work_mem. PostgreSQL 15 and 16 have no logical-slot failover, so plan for controlled resnapshots on promotion. PG 16 adds parallel apply of streamed transactions for native subscribers and logical decoding on standbys. PG 17 adds failover slots with slot synchronization to standbys and exposes inactive_since and invalidation_reason in pg_replication_slots, materially simplifying failover runbooks. Pin a Debezium version that is tested against the PostgreSQL major version you run, standardize on pgoutput, and validate every configuration change against a staging replica before promotion.

Related

Debezium Connector Configuration: the full connector parameter matrix, snapshot modes, and secrets handling.
Python CDC Parser Development: decoding change envelopes and building idempotent, LSN-aware consumers.
JSON to Avro Transformation: schema-registry governance, TOAST hydration, and binary serialization.
Event Routing & Kafka Integration: partitioning for ordering, dead-letter queues, and exactly-once delivery.
PostgreSQL Logical Replication Architecture & Fundamentals: the decoding, slot, and publication internals underpinning every CDC pipeline.
Replication Topologies & Failover Operations: running this pipeline across nodes, covering failover slot recovery, conflict resolution, and online schema change under an active slot.
