Subscription initialization in PostgreSQL logical replication is a deterministic synchronization phase where the subscriber establishes a replication slot, negotiates publication metadata, and optionally executes a full initial table data copy. When this phase fails, downstream CDC pipelines stall, Python ETL workers encounter connection resets, and data platform teams face inconsistent LSN tracking across distributed environments. This document isolates the exact failure vectors encountered during CREATE SUBSCRIPTION execution and provides reference-grade remediation procedures aligned with Logical Replication Setup & Management standards.
Prerequisite Validation & Configuration Baseline
Initialization failures frequently originate from publisher-side infrastructure constraints rather than subscriber configuration. Validate the following parameters on the publisher before provisioning subscriptions. PostgreSQL 13+ defaults wal_level to replica, which explicitly blocks logical decoding.
wal_levelmust equallogical. Transitioning fromreplicaorminimalrequires a full cluster restart.max_replication_slotsmust exceed the total number of active and planned subscriptions. Each subscription consumes exactly one slot at initialization.max_wal_sendersmust accommodate the subscription apply worker plus any physical replication streams, backup clients, or streaming replication slots.max_worker_processesmust be sufficient to handle the table-sync workers spawned during the initial copy whencopy_data = true(the default). Insufficient workers causeERROR: out of background worker slots.
If these thresholds are breached, the publisher rejects the connection with FATAL: number of requested standby connections exceeds max_wal_senders or ERROR: could not create replication slot: slot limit reached. Adjust parameters in postgresql.conf, reload configuration via SELECT pg_reload_conf();, and verify allocation with SHOW max_wal_senders; and SELECT slot_name, slot_type, active FROM pg_replication_slots;.
Diagnosing Initialization State via System Catalogs
The initialization lifecycle is tracked across pg_subscription, pg_subscription_rel, and pg_stat_subscription. Query pg_stat_subscription to observe the apply worker pid, relid, and last_msg_send_time. A NULL PID indicates the subscription launcher failed to spawn the apply worker, typically due to authentication rejection, missing REPLICATION role privilege, or publication mismatch.
If srsubstate in pg_subscription_rel remains i (initialize) or d (data copy) beyond expected throughput windows, the initial synchronization is blocked by lock contention, network fragmentation, or I/O saturation. Cross-reference pg_stat_activity for wait_event_type = Lock or wait_event = LogicalReplicationTableSyncWorker to identify blocking transactions. For comprehensive state reconciliation and safe re-synchronization workflows, consult Subscription Sync Procedures before executing manual recovery commands.
stateDiagram-v2 state "i — initialize" as I state "d — data copy" as D state "s — synchronized" as S state "r — ready / streaming" as R [*] --> I: CREATE SUBSCRIPTION I --> D: start table sync D --> S: initial copy complete S --> R: caught up to apply R --> [*]: DROP SUBSCRIPTION
Exact Error Resolution Matrix
| Error Signature | Root Cause | Deterministic Resolution |
|---|---|---|
FATAL: number of requested standby connections exceeds max_wal_senders |
max_wal_senders exhausted by apply workers + physical replication/backup streams. |
Increase max_wal_senders by at least N + 2 (where N = planned subscriptions). Reload and verify with SHOW max_wal_senders;. |
ERROR: could not create replication slot "sub_slot": slot limit reached |
max_replication_slots exhausted or orphaned slots retained after failed CREATE SUBSCRIPTION. |
Run SELECT pg_drop_replication_slot(slot_name) FROM pg_replication_slots WHERE active = false;. Increase max_replication_slots if required. |
ERROR: publication "pub_name" does not exist |
Publication missing on publisher, case-sensitivity mismatch, or search path resolution failure. | Verify with SELECT pubname FROM pg_publication;. Recreate publication with exact casing. Ensure subscriber connection uses matching dbname and search_path. |
ERROR: relation "schema.table" does not exist |
Subscriber lacks target table, schema mismatch, or publication filter excludes the table. | Pre-create table schema on subscriber. Verify publication includes table: SELECT * FROM pg_publication_tables WHERE pubname = 'pub_name';. |
ERROR: could not receive data from WAL stream: SSL error: decryption failed or bad record mac |
MTU mismatch, aggressive TCP keepalive timeouts, or TLS renegotiation failure during large initial copy. | Set tcp_keepalives_idle, tcp_keepalives_interval, tcp_keepalives_count in postgresql.conf on both nodes. Disable ssl_renegotiation_limit if applicable. |
ERROR: logical decoding requires wal_level >= logical |
Publisher wal_level set to replica or minimal. |
Update postgresql.conf, restart cluster, verify with SHOW wal_level;. |
Operational Recovery & State Reconciliation
When initialization fails mid-copy, PostgreSQL leaves the replication slot in an inactive state and marks pg_subscription_rel rows as i or d. Do not manually delete rows from system catalogs. Follow this deterministic recovery sequence:
-
Halt the apply worker:
ALTER SUBSCRIPTION sub_name DISABLE; -
Verify slot inactivity:
SELECT slot_name, active FROM pg_replication_slots WHERE slot_name = 'sub_slot'; -
If the slot retains stale LSNs, advance it to the current publisher WAL position to prevent immediate WAL bloat upon restart:
sql -- On publisher SELECT pg_replication_slot_advance('sub_slot', pg_current_wal_lsn()); -
Reset subscriber state:
ALTER SUBSCRIPTION sub_name REFRESH PUBLICATION; -
Re-enable:
ALTER SUBSCRIPTION sub_name ENABLE;
If the initial copy must be skipped entirely (e.g., for pre-seeded subscribers), recreate the subscription with copy_data = false. Monitor pg_stat_subscription for state transitions to r (ready).
Integration Considerations for Python ETL & DevOps
Python ETL workers consuming logical replication streams must implement idempotent connection initialization and exponential backoff for transient network resets. When using psycopg2: Logical Replication Connection or asyncpg, wrap CREATE SUBSCRIPTION in a try/except block that catches psycopg2.errors.DuplicateObject and psycopg2.errors.ObjectInUse. Implement retry logic with jitter to prevent thundering herd during cluster restarts.
DevOps monitoring pipelines should alert on pg_stat_subscription where pid IS NULL or last_msg_send_time < NOW() - INTERVAL '5 minutes'. Integrate these checks into Prometheus exporters or Datadog custom metrics. For production deployments, enforce connect_timeout and keepalives parameters in connection strings to surface network fragmentation before it triggers silent initialization stalls. Reference official PostgreSQL Documentation: CREATE SUBSCRIPTION for parameter precedence and transactional guarantees.