pg_create_logical_replication_slot step-by-step

For database engineers, data platform teams, and Python ETL developers automating change data capture (CDC) pipelines, pgcreatelogicalreplicationslot is the…

For database engineers, data platform teams, and Python ETL developers automating change data capture (CDC) pipelines, pg_create_logical_replication_slot is the deterministic entry point for streaming logical WAL changes. This guide isolates the exact operational sequence required to provision, tune, and validate a logical replication slot for production CDC workloads. Proper execution prevents WAL bloat, ensures consumer offset consistency, and aligns with enterprise Logical Replication Setup & Management standards.

Prerequisite Parameter Tuning

Before invoking the slot creation function, the PostgreSQL instance must be configured to retain and decode WAL records at the cluster level. The following parameters require precise tuning and a full service restart:

  • wal_level = logical (mandatory; cannot be hot-reloaded)
  • max_replication_slots >= N + 2 (where N is the number of concurrent CDC consumers; default 10 is often insufficient for multi-tenant data platforms)
  • max_wal_senders >= max_replication_slots + 2 (must exceed slot count to allow connection overhead and background workers)
  • max_slot_wal_keep_size (set to a strict upper bound, e.g., 10GB or 20GB, to enforce automatic slot invalidation if a consumer stalls and prevents unbounded disk exhaustion)

Apply these via postgresql.conf or ALTER SYSTEM, restart the cluster, and verify with SHOW wal_level; and SHOW max_replication_slots;.

Step 1: Privilege & Database Boundary Validation

Logical replication slots are strictly database-scoped. The executing role must hold REPLICATION privileges and connect directly to the target database. Cross-database slot creation is unsupported by the PostgreSQL architecture.

Grant privileges explicitly:

sql
ALTER ROLE cdc_pipeline_user REPLICATION;
GRANT CONNECT ON DATABASE target_db TO cdc_pipeline_user;

Verify role capabilities before proceeding. Attempting slot creation from a superuser context without explicit REPLICATION grants on the target database will fail with ERROR: must be replication role to use logical decoding.

Step 2: Execute Slot Creation

The core invocation follows this exact signature. Parameter selection must align with your CDC consumer architecture and PostgreSQL version baseline (14+):

sql
SELECT pg_create_logical_replication_slot(
    slot_name := 'cdc_etl_pipeline_v1',
    plugin := 'pgoutput',
    temporary := false,
    two_phase := false,
    failover := false
);
  • plugin: pgoutput is the native PostgreSQL 10+ output plugin. It streams changes in binary format, supports publication filtering, and minimizes CPU overhead compared to JSON-based decoders.
  • temporary: Set to false for persistent pipelines. Temporary slots drop on disconnect, which destroys offset tracking and forces full resyncs in Python ETL consumers.
  • two_phase: Enable (true) only if your pipeline requires decoding PREPARE TRANSACTION / COMMIT PREPARED sequences. Defaults to false.
  • failover: Available in PostgreSQL 16+. Set to true if the slot must survive physical standby promotion. Defaults to false.

For architectural context on how this function fits into the broader replication topology, consult Initializing Replication Slots before proceeding to consumer binding.

Step 3: Validate Offset Anchoring & State

Immediately after creation, verify the slot’s initial state and anchor LSNs. The function returns (slot_name, xlog_position), where xlog_position represents the restart_lsn at creation time.

sql
SELECT slot_name, plugin, slot_type, active, restart_lsn, confirmed_flush_lsn, wal_status
FROM pg_replication_slots
WHERE slot_name = 'cdc_etl_pipeline_v1';
  • restart_lsn: The oldest WAL position the slot requires. The server will retain WAL from this point forward.
  • confirmed_flush_lsn: Initially matches restart_lsn. Advances only when the consumer explicitly acknowledges processed changes.
  • wal_status: Monitor for extended or lost. A value of extended indicates the slot is retaining more WAL than max_wal_keep_size allows and will be invalidated if the consumer does not advance.

Step 4: Consumer Integration & Acknowledgment Flow

Python ETL pipelines must consume via the logical replication protocol and explicitly advance the confirmed LSN. Using psycopg2’s replication interface or asyncpg’s logical replication support, the consumer must send periodic StandbyStatusUpdate messages with the latest processed LSN. Failure to send these ACKs causes confirmed_flush_lsn to stall, triggering WAL accumulation.

python
# Conceptual psycopg2 replication loop
cur.start_replication(slot_name='cdc_etl_pipeline_v1', options={'proto_version': '1', 'publication_names': 'cdc_pub'})
for msg in cur.consume_stream():
    process_change(msg)
    cur.send_feedback(flush_lsn=msg.data_start)  # Critical: advances confirmed_flush_lsn

Refer to the official Logical Decoding documentation for protocol message formats and LSN arithmetic rules.

Step 5: Async Monitoring & WAL Guardrails

Deploy asynchronous monitoring that queries pg_replication_slots and pg_stat_replication at 15–30 second intervals. Track the delta between restart_lsn and confirmed_flush_lsn to calculate consumer lag in bytes or WAL segments.

sql
SELECT
    slot_name,
    pg_wal_lsn_diff(restart_lsn, confirmed_flush_lsn) AS pending_bytes,
    wal_status,
    active
FROM pg_replication_slots
WHERE slot_name = 'cdc_etl_pipeline_v1';

Alert thresholds:

  • pending_bytes > 5GB: Trigger PagerDuty/Slack alert for consumer lag investigation.
  • wal_status = 'extended': Immediate intervention required. The slot is approaching max_slot_wal_keep_size and will be invalidated if not advanced.
  • active = false: Consumer disconnected. If temporary = false, the slot persists but WAL retention continues.

Step 6: Error Handling, Retry Logic & Emergency Failover

Production CDC pipelines must handle slot invalidation, network partitions, and LSN gaps gracefully.

  1. Slot Invalidation Recovery: If max_slot_wal_keep_size is breached, PostgreSQL marks the slot invalid and drops it. The ETL pipeline must detect the ERROR: replication slot "..." does not exist exception, trigger a full snapshot resync, and recreate the slot.
  2. Retry Logic: Implement exponential backoff for connection drops. Do not attempt to recreate an existing slot; use SELECT pg_replication_slot_advance('slot_name', 'target_lsn') to manually reconcile offsets if the consumer processed changes but failed to ACK.
  3. Emergency Failover Procedures: During primary failover, logical replication slots do not automatically migrate to the new primary unless failover = true (PG 16+). For earlier versions, you must manually recreate the slot on the promoted standby and coordinate a snapshot-based catch-up. Ensure your orchestration layer handles publication recreation and subscription sync procedures before restarting the ETL consumer.

Maintain strict idempotency in your deployment scripts. Slot creation is not idempotent by default; wrap it in DO $$ BEGIN IF NOT EXISTS (SELECT 1 FROM pg_replication_slots WHERE slot_name = 'cdc_etl_pipeline_v1') THEN PERFORM pg_create_logical_replication_slot('cdc_etl_pipeline_v1', 'pgoutput'); END IF; END $$; to prevent race conditions during parallel pipeline deployments.