For database engineers, data platform teams, and Python ETL developers automating change data capture (CDC) pipelines, pg_create_logical_replication_slot is the deterministic entry point for streaming logical WAL changes. This guide isolates the exact operational sequence required to provision, tune, and validate a logical replication slot for production CDC workloads. Proper execution prevents WAL bloat, ensures consumer offset consistency, and aligns with enterprise Logical Replication Setup & Management standards.
Prerequisite Parameter Tuning
Before invoking the slot creation function, the PostgreSQL instance must be configured to retain and decode WAL records at the cluster level. The following parameters require precise tuning and a full service restart:
wal_level = logical(mandatory; cannot be hot-reloaded)max_replication_slots >= N + 2(whereNis the number of concurrent CDC consumers; default10is often insufficient for multi-tenant data platforms)max_wal_senders >= max_replication_slots + 2(must exceed slot count to allow connection overhead and background workers)max_slot_wal_keep_size(set to a strict upper bound, e.g.,10GBor20GB, to enforce automatic slot invalidation if a consumer stalls and prevents unbounded disk exhaustion)
Apply these via postgresql.conf or ALTER SYSTEM, restart the cluster, and verify with SHOW wal_level; and SHOW max_replication_slots;.
Step 1: Privilege & Database Boundary Validation
Logical replication slots are strictly database-scoped. The executing role must hold REPLICATION privileges and connect directly to the target database. Cross-database slot creation is unsupported by the PostgreSQL architecture.
Grant privileges explicitly:
ALTER ROLE cdc_pipeline_user REPLICATION;
GRANT CONNECT ON DATABASE target_db TO cdc_pipeline_user;
Verify role capabilities before proceeding. Attempting slot creation from a superuser context without explicit REPLICATION grants on the target database will fail with ERROR: must be replication role to use logical decoding.
Step 2: Execute Slot Creation
The core invocation follows this exact signature. Parameter selection must align with your CDC consumer architecture and PostgreSQL version baseline (14+):
SELECT pg_create_logical_replication_slot(
slot_name := 'cdc_etl_pipeline_v1',
plugin := 'pgoutput',
temporary := false,
two_phase := false,
failover := false
);
plugin:pgoutputis the native PostgreSQL 10+ output plugin. It streams changes in binary format, supports publication filtering, and minimizes CPU overhead compared to JSON-based decoders.temporary: Set tofalsefor persistent pipelines. Temporary slots drop on disconnect, which destroys offset tracking and forces full resyncs in Python ETL consumers.two_phase: Enable (true) only if your pipeline requires decodingPREPARE TRANSACTION/COMMIT PREPAREDsequences. Defaults tofalse.failover: Available in PostgreSQL 16+. Set totrueif the slot must survive physical standby promotion. Defaults tofalse.
For architectural context on how this function fits into the broader replication topology, consult Initializing Replication Slots before proceeding to consumer binding.
Step 3: Validate Offset Anchoring & State
Immediately after creation, verify the slot’s initial state and anchor LSNs. The function returns (slot_name, xlog_position), where xlog_position represents the restart_lsn at creation time.
SELECT slot_name, plugin, slot_type, active, restart_lsn, confirmed_flush_lsn, wal_status
FROM pg_replication_slots
WHERE slot_name = 'cdc_etl_pipeline_v1';
restart_lsn: The oldest WAL position the slot requires. The server will retain WAL from this point forward.confirmed_flush_lsn: Initially matchesrestart_lsn. Advances only when the consumer explicitly acknowledges processed changes.wal_status: Monitor forextendedorlost. A value ofextendedindicates the slot is retaining more WAL thanmax_wal_keep_sizeallows and will be invalidated if the consumer does not advance.
Step 4: Consumer Integration & Acknowledgment Flow
Python ETL pipelines must consume via the logical replication protocol and explicitly advance the confirmed LSN. Using psycopg2’s replication interface or asyncpg’s logical replication support, the consumer must send periodic StandbyStatusUpdate messages with the latest processed LSN. Failure to send these ACKs causes confirmed_flush_lsn to stall, triggering WAL accumulation.
# Conceptual psycopg2 replication loop
cur.start_replication(slot_name='cdc_etl_pipeline_v1', options={'proto_version': '1', 'publication_names': 'cdc_pub'})
for msg in cur.consume_stream():
process_change(msg)
cur.send_feedback(flush_lsn=msg.data_start) # Critical: advances confirmed_flush_lsn
Refer to the official Logical Decoding documentation for protocol message formats and LSN arithmetic rules.
Step 5: Async Monitoring & WAL Guardrails
Deploy asynchronous monitoring that queries pg_replication_slots and pg_stat_replication at 15–30 second intervals. Track the delta between restart_lsn and confirmed_flush_lsn to calculate consumer lag in bytes or WAL segments.
SELECT
slot_name,
pg_wal_lsn_diff(restart_lsn, confirmed_flush_lsn) AS pending_bytes,
wal_status,
active
FROM pg_replication_slots
WHERE slot_name = 'cdc_etl_pipeline_v1';
Alert thresholds:
pending_bytes > 5GB: Trigger PagerDuty/Slack alert for consumer lag investigation.wal_status = 'extended': Immediate intervention required. The slot is approachingmax_slot_wal_keep_sizeand will be invalidated if not advanced.active = false: Consumer disconnected. Iftemporary = false, the slot persists but WAL retention continues.
Step 6: Error Handling, Retry Logic & Emergency Failover
Production CDC pipelines must handle slot invalidation, network partitions, and LSN gaps gracefully.
- Slot Invalidation Recovery: If
max_slot_wal_keep_sizeis breached, PostgreSQL marks the slot invalid and drops it. The ETL pipeline must detect theERROR: replication slot "..." does not existexception, trigger a full snapshot resync, and recreate the slot. - Retry Logic: Implement exponential backoff for connection drops. Do not attempt to recreate an existing slot; use
SELECT pg_replication_slot_advance('slot_name', 'target_lsn')to manually reconcile offsets if the consumer processed changes but failed to ACK. - Emergency Failover Procedures: During primary failover, logical replication slots do not automatically migrate to the new primary unless
failover = true(PG 16+). For earlier versions, you must manually recreate the slot on the promoted standby and coordinate a snapshot-based catch-up. Ensure your orchestration layer handles publication recreation and subscription sync procedures before restarting the ETL consumer.
Maintain strict idempotency in your deployment scripts. Slot creation is not idempotent by default; wrap it in DO $$ BEGIN IF NOT EXISTS (SELECT 1 FROM pg_replication_slots WHERE slot_name = 'cdc_etl_pipeline_v1') THEN PERFORM pg_create_logical_replication_slot('cdc_etl_pipeline_v1', 'pgoutput'); END IF; END $$; to prevent race conditions during parallel pipeline deployments.