Subscription Sync Procedures

A subscription is the consuming half of a PostgreSQL logical replication topology, and driving its synchronization correctly is the operational core of the Logical Replication Setup & Management workflow — the moment CREATE SUBSCRIPTION runs it opens a replication slot on the publisher, negotiates the publication metadata, spawns table-sync workers to COPY a consistent baseline, and only then transitions to streaming decoded WAL. For database engineers, data platform teams, Python ETL developers, and DevOps operators, subscription sync is where a Change Data Capture (CDC) pipeline either reaches a clean, verifiable steady state or silently strands a table in a half-copied condition that corrupts every downstream consumer reading from it.

The failure consequences are asymmetric. A subscription that never finishes its initial copy holds a slot open on the publisher, pinning WAL in pg_wal that the publisher cannot recycle until the consumer drains it, so a stalled sync is also a disk-exhaustion risk on the primary. A subscription that appears healthy but has one relation stuck in data-copy state (srsubstate = 'd') will stream changes for every other table while that one relation serves stale rows, so analytics and replicas diverge without any error in the logs. Because each table's initial COPY runs under the snapshot of its own tablesync slot while incremental changes queue behind it, getting the sequence, the parallelism, and the retry semantics right is what separates a repeatable runbook from a 3 a.m. manual recovery. This page is the reference for driving, monitoring, and recovering subscription synchronization against PostgreSQL 14 through 17.

Prerequisites & Configuration Objects

Subscription sync failures overwhelmingly originate in configuration that was never validated before CREATE SUBSCRIPTION ran, and most of them are on the publisher, not the subscriber. Confirm the following before provisioning any subscription; the deep catalog-level triage for a sync that has already failed lives in resolving subscription initialization failures.

On the publisher, the decoding surface and connection ceilings must already be in place:

sql

-- Publisher-side prerequisites. All three settings take effect only after a
-- full server restart; pg_reload_conf() is not sufficient for any of them.
ALTER SYSTEM SET wal_level = 'logical';
ALTER SYSTEM SET max_replication_slots = 10;   -- >= number of subscriptions + headroom for tablesync slots
ALTER SYSTEM SET max_wal_senders     = 12;     -- >= slots, plus physical standbys/backups
-- Restart the publisher, then confirm with: SHOW wal_level;

On the subscriber, the initial COPY is executed by background table-sync (tablesync) workers, and their concurrency is bounded by two GUCs that are easy to under-provision:

sql

-- Subscriber-side prerequisites governing the initial COPY.
ALTER SYSTEM SET max_worker_processes = 16;                 -- restart required
ALTER SYSTEM SET max_logical_replication_workers = 8;       -- restart required; apply workers + tablesync pool
ALTER SYSTEM SET max_sync_workers_per_subscription = 4;     -- reload is enough; parallel table copies per sub
SELECT pg_reload_conf();                                    -- applies only the reloadable setting above

If max_logical_replication_workers is too low, CREATE SUBSCRIPTION succeeds but tables queue for a sync worker slot and sit at srsubstate = 'i'; if max_worker_processes is exhausted, the subscriber log shows WARNING: out of background worker slots and the copy never starts. Provision a dedicated, least-privilege replication role on the publisher rather than reusing an application superuser; the full privilege model, including the PG 16+ pg_create_subscription predefined role, is covered in security boundaries and permissions.
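
The worker arithmetic behind these limits can be checked before a rollout. The sketch below is a rough sizing helper, not an official formula: it assumes each subscription needs one apply worker plus up to max_sync_workers_per_subscription tablesync workers during the initial COPY, and `other_bgworkers` is a hypothetical allowance for anything else that draws on max_worker_processes (parallel query, extensions).

```python
def check_subscriber_budget(subscriptions, max_worker_processes,
                            max_logical_replication_workers,
                            max_sync_workers_per_subscription,
                            other_bgworkers=0):
    """Return a list of human-readable sizing problems (empty list = OK)."""
    problems = []
    # Worst case during initial sync: one apply worker per subscription plus
    # a full set of tablesync workers for each of them.
    needed = subscriptions * (1 + max_sync_workers_per_subscription)
    if max_logical_replication_workers < needed:
        problems.append(
            f"max_logical_replication_workers={max_logical_replication_workers} "
            f"is below the {needed} workers needed for full copy parallelism"
        )
    # Logical replication workers are taken from the max_worker_processes pool.
    pool_needed = max_logical_replication_workers + other_bgworkers
    if max_worker_processes < pool_needed:
        problems.append(
            f"max_worker_processes={max_worker_processes} "
            f"is below the {pool_needed} background workers that may be requested"
        )
    return problems


# One subscription with the settings shown above fits; a second one does not.
print(check_subscriber_budget(1, 16, 8, 4))
print(check_subscriber_budget(2, 16, 8, 4))
```

An undersized max_logical_replication_workers does not fail the sync outright; tables simply wait their turn, so treat the first message as a throughput warning rather than an error.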

Finally, every table in the subscribed publication must exist on the subscriber with a compatible schema and a REPLICA IDENTITY that lets UPDATE/DELETE apply — logical replication never ships DDL, so the subscriber schema is your responsibility. The decoding mechanics that produce the change stream the apply worker consumes are documented in WAL stream mechanics.

Step-by-Step Implementation

The sequence below takes a subscription from creation through a verified steady state, then covers the two operations you will run most often afterward: re-synchronizing a single divergent table and refreshing the publication set. Each step is safe against a live subscriber.

1. Create the subscription and drive the initial COPY. With copy_data = true (the default) the apply worker dispatches tablesync workers, at most max_sync_workers_per_subscription at a time, and each one creates its own slot and snapshot on the publisher to COPY the baseline of one relation. streaming = 'parallel' (PG 16+) lets large in-progress transactions be applied as they stream in rather than after the commit arrives; binary = true avoids text encode/decode on both ends.

sql

-- On the subscriber. Creates slot "orders_sub" on the publisher and starts
-- the initial per-table COPY; each tablesync worker uses its own snapshot.
CREATE SUBSCRIPTION orders_sub
  CONNECTION 'host=pub.internal port=5432 dbname=app user=repl sslmode=verify-full'
  PUBLICATION orders_pub
  WITH (copy_data = true, streaming = 'parallel', binary = true, origin = 'none');
-- origin = 'none' (PG 16+): skip changes that originated elsewhere — prevents
-- loops in bidirectional / multi-origin topologies.

2. Watch every relation converge to ready. The per-table state machine in pg_subscription_rel.srsubstate walks i (init) → d (data copy) → f (finished copy) → s (synced, catching up) → r (ready, streaming). The subscription is only fully consistent once every row reads r.

sql

-- Poll until no rows remain below 'r'. Empty result = sync complete.
SELECT sr.srrelid::regclass AS table_name, sr.srsubstate, sr.srsublsn
FROM pg_subscription_rel sr
JOIN pg_subscription s ON s.oid = sr.srsubid
WHERE s.subname = 'orders_sub' AND sr.srsubstate <> 'r'
ORDER BY sr.srsubstate;
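
When the poll feeds a dashboard or a gate in a pipeline, it can help to reduce the rows to a small summary. The following is a minimal sketch under one assumption that differs from the query above: it expects every row for the subscription (no srsubstate filter), so that an empty result is reported as not ready instead of being mistaken for success. The function name and return shape are illustrative.

```python
from collections import Counter

# srsubstate codes as described in step 2.
STATE_NAMES = {"i": "init", "d": "data copy", "f": "finished copy",
               "s": "synced", "r": "ready"}


def summarize_sync(rows):
    """Summarize (table_name, srsubstate) rows from pg_subscription_rel."""
    rows = list(rows)
    by_state = Counter(
        STATE_NAMES.get(state, f"unknown({state})") for _, state in rows
    )
    pending = sorted(table for table, state in rows if state != "r")
    return {
        # No rows at all means the subscription has nothing registered yet.
        "ready": bool(rows) and not pending,
        "by_state": dict(by_state),
        "pending": pending,
    }


summary = summarize_sync([("public.orders", "r"), ("public.order_items", "d")])
print(summary["ready"], summary["pending"])
```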

3. Pre-create the slot for a zero-orphan rollout (optional). If a network partition interrupts CREATE SUBSCRIPTION mid-handshake it can leave an orphaned slot on the publisher. For controlled cutovers, create the slot first and attach with create_slot = false, slot_name = … so the slot’s lifecycle is decoupled from the DDL statement.

sql

-- On the publisher: pre-create the slot.
SELECT pg_create_logical_replication_slot('orders_sub', 'pgoutput');

sql

-- On the subscriber: attach without letting CREATE SUBSCRIPTION create the slot.
CREATE SUBSCRIPTION orders_sub
  CONNECTION 'host=pub.internal port=5432 dbname=app user=repl sslmode=verify-full'
  PUBLICATION orders_pub
  WITH (copy_data = true, create_slot = false, slot_name = 'orders_sub');
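
Orphaned slots from earlier interrupted attempts can be spotted by comparing the publisher's slots with the slot names subscribers actually reference. The sketch below assumes you have already collected (slot_name, active) pairs for logical slots from pg_replication_slots on the publisher and the subslotname values from pg_subscription on every subscriber; the function name is hypothetical. It only lists candidates: slots used by tablesync workers or by non-PostgreSQL consumers will also appear, so review the list before dropping anything.

```python
def find_orphan_candidates(publisher_slots, known_slot_names):
    """Return inactive publisher slots that no subscription references.

    publisher_slots:  iterable of (slot_name, active) pairs.
    known_slot_names: slot names referenced by subscriptions (subslotname).
    """
    known = set(known_slot_names)
    return sorted(
        name for name, active in publisher_slots
        if not active and name not in known
    )


slots = [("orders_sub", True), ("old_sub", False), ("billing_sub", False)]
print(find_orphan_candidates(slots, {"orders_sub", "billing_sub"}))
# ['old_sub']
```

A disabled subscription keeps an inactive slot, which is why inactive slots that are still referenced are deliberately excluded.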

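4. Re-synchronize a single divergent table without touching the rest. When one relation drifts (after a schema fix, a manual write to the subscriber, or a filtered row that never arrived) you do not need to drop the subscription. ALTER SUBSCRIPTION … REFRESH PUBLICATION copies only tables that are new to the subscription, and truncating a table on the subscriber does not reset its srsubstate, so a truncate followed by a refresh will not re-copy it. To force a re-copy of an existing table, cycle it out of the publication and back in, refreshing the subscription after each change.

sql

-- Force a clean re-copy of one table. REFRESH copies only relations that are
-- new to pg_subscription_rel, so the table must leave the publication first.
ALTER PUBLICATION orders_pub DROP TABLE public.order_items;   -- on publisher
ALTER SUBSCRIPTION orders_sub REFRESH PUBLICATION;            -- on subscriber: forgets the table
TRUNCATE public.order_items;                                  -- on subscriber
ALTER PUBLICATION orders_pub ADD TABLE public.order_items;    -- on publisher
ALTER SUBSCRIPTION orders_sub REFRESH PUBLICATION WITH (copy_data = true);  -- on subscriber: re-COPY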
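
Because REFRESH PUBLICATION only copies relations that are new to the subscription, one way to script a single-table re-copy is to cycle the table out of the publication and back in. The sketch below only builds the ordered statements and labels which side each runs on; it assumes the publication lists tables explicitly (not FOR ALL TABLES) and has no row filter or column list on the table, since re-adding it this way would drop them. `resync_plan` and `quote_ident` are illustrative helpers, not part of any library.

```python
def quote_ident(name):
    """Quote an SQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def resync_plan(publication, subscription, schema, table):
    """Return (side, statement) pairs that force a re-copy of one table."""
    rel = f"{quote_ident(schema)}.{quote_ident(table)}"
    pub = quote_ident(publication)
    sub = quote_ident(subscription)
    return [
        ("publisher", f"ALTER PUBLICATION {pub} DROP TABLE {rel}"),
        # REFRESH cannot run inside a transaction block: use autocommit.
        ("subscriber", f"ALTER SUBSCRIPTION {sub} REFRESH PUBLICATION"),
        ("subscriber", f"TRUNCATE {rel}"),
        ("publisher", f"ALTER PUBLICATION {pub} ADD TABLE {rel}"),
        ("subscriber",
         f"ALTER SUBSCRIPTION {sub} REFRESH PUBLICATION WITH (copy_data = true)"),
    ]


for side, statement in resync_plan("orders_pub", "orders_sub",
                                   "public", "order_items"):
    print(f"{side:<10} {statement};")
```

Keeping the plan as data makes it easy to log, review, or dry-run before an executor sends each statement to the right server.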

5. Pause and resume safely for maintenance or DDL. Logical replication does not propagate DDL, so a schema change is a coordinated, subscription-aware operation: disable the subscription, apply the compatible DDL on publisher and subscriber, then re-enable. Disabling stops the apply worker but keeps the slot, so streamed changes accumulate as retained WAL — keep the window short.

sql

ALTER SUBSCRIPTION orders_sub DISABLE;
-- ... apply the same forward-compatible DDL on both sides ...
ALTER SUBSCRIPTION orders_sub ENABLE;
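
In an orchestration script, the disable/enable pair is easy to leave half-done when the migration in between fails. A minimal sketch of one way to guard against that is a context manager that always re-enables the subscription. Here `execute` is a hypothetical callable that runs one SQL statement on the subscriber (for example a thin wrapper around a database cursor); nothing in it is specific to a driver.

```python
from contextlib import contextmanager


def quote_ident(name):
    """Quote an SQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


@contextmanager
def paused_subscription(execute, subname):
    """Disable `subname`, run the body, then re-enable it even on error."""
    sub = quote_ident(subname)
    execute(f"ALTER SUBSCRIPTION {sub} DISABLE")
    try:
        yield
    finally:
        # Re-enable unconditionally so a failed migration does not leave the
        # publisher slot retaining WAL indefinitely.
        execute(f"ALTER SUBSCRIPTION {sub} ENABLE")


# Dry run: collect the statements instead of sending them to a server.
statements = []
with paused_subscription(statements.append, "orders_sub"):
    statements.append("-- apply the DDL on publisher and subscriber here")
print("\n".join(statements))
```

Re-enabling in `finally` is a design choice: if a half-applied schema change would make the apply worker fail, you may prefer to leave the subscription disabled and alert instead.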

6. Refresh after the publication set changes. When the publisher runs ALTER PUBLICATION … ADD TABLE, subscribers ignore the new table until they refresh. A refresh with copy_data = true snapshots only the newly added relations and leaves everything already at r streaming uninterrupted.

sql

ALTER SUBSCRIPTION orders_sub REFRESH PUBLICATION WITH (copy_data = true);

7. Reconcile sequences at cutover. Logical replication never streams sequence advances. Before promoting a subscriber to primary, read the publisher high-water mark and advance the local sequence past it, or the promoted node reissues primary-key values that collide with already-emitted rows.

sql

-- Publisher: read the current value.  Subscriber (at cutover): advance past it.
-- 9000000 is a placeholder: use the publisher's last_value plus a safety margin.
SELECT last_value FROM public.orders_id_seq;         -- on publisher
SELECT setval('public.orders_id_seq', 9000000, true); -- on subscriber
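
For a scripted cutover, the value passed to setval() should be derived from what was read on the publisher rather than typed by hand. The helper below is a small sketch of that arithmetic; it assumes an ascending sequence, and `margin` is a hypothetical safety gap for values the publisher may still hand out between the read and the cutover.

```python
def setval_statement(sequence, publisher_last_value, margin=1000):
    """Build the subscriber-side setval() call for one ascending sequence."""
    if margin < 0:
        raise ValueError("margin must be >= 0")
    target = publisher_last_value + margin
    seq_literal = sequence.replace("'", "''")  # escape for an SQL string literal
    return f"SELECT setval('{seq_literal}', {target}, true)"


print(setval_statement("public.orders_id_seq", 8999000))
# SELECT setval('public.orders_id_seq', 9000000, true)
```

With the third argument true, the next nextval() on the subscriber returns the value after the target, so the promoted node starts issuing ids above everything the old primary emitted.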

8. Drive the whole procedure from Python for repeatable, idempotent runs. Orchestration layers (Airflow, Kubernetes Jobs, custom controllers) should treat sync as a poll-to-ready loop with bounded backoff, never a fire-and-forget DDL call.

python

import time
from contextlib import closing

import psycopg2

def wait_for_sync(dsn: str, subname: str, timeout_s: int = 1800) -> None:
    """Block until every relation of `subname` reaches srsubstate='r'."""
    deadline = time.monotonic() + timeout_s
    backoff = 2.0
    # psycopg2's `with conn:` only scopes a transaction and never closes the
    # connection, so closing() is used to release it on every exit path.
    with closing(psycopg2.connect(dsn)) as conn:
        conn.autocommit = True   # each poll reads the current catalog state
        while time.monotonic() < deadline:
            with conn.cursor() as cur:
                cur.execute(
                    """
                    SELECT count(*) FROM pg_subscription_rel sr
                    JOIN pg_subscription s ON s.oid = sr.srsubid
                    WHERE s.subname = %s AND sr.srsubstate <> 'r';
                    """,
                    (subname,),
                )
                pending = cur.fetchone()[0]
            if pending == 0:
                return
            time.sleep(backoff)
            backoff = min(backoff * 1.5, 30.0)   # exponential backoff, capped at 30 s
    raise TimeoutError(f"{subname}: sync did not reach ready within {timeout_s}s")

Parameter Reference Table

Parameter / clause	Object	Default	Logical-replication behavior
`copy_data`	subscription	`true`	Runs the initial per-table COPY. Set `false` only when the subscriber already holds an identical baseline; a wrong `false` leaves the tables missing every row that existed before the subscription was created.
`streaming`	subscription	`off`	`on` spills large in-progress txns to disk; `parallel` (PG 16+) applies them concurrently, cutting apply lag on bulk writes.
`binary`	subscription	`false`	Transfers values in binary, skipping text encode/decode. Requires matching types on both ends.
`origin`	subscription	`any`	`none` (PG 16+) skips changes not originating on the publisher — required to break loops in bidirectional topologies.
`create_slot`	subscription	`true`	When `false`, attaches to a pre-existing slot named by `slot_name`, decoupling slot lifecycle from the DDL.
`two_phase`	subscription	`false`	Decodes prepared transactions at PREPARE time (PG 15+). Cannot be toggled after creation without a re-sync.
`synchronous_commit`	subscription	`off`	Per-subscription apply durability; see tuning synchronous_commit.
`run_as_owner`	subscription	`false`	PG 16+: apply as each table’s owner (`false`) vs the subscription owner (`true`), tightening the privilege boundary.
`max_sync_workers_per_subscription`	server	`2`	Parallel tablesync workers per subscription during initial COPY. Raise for wide schemas; bounded by `max_logical_replication_workers`.
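
Because several of these options only exist on newer servers, a provisioning script can reject an option set before it ever reaches CREATE SUBSCRIPTION. The following sketch encodes only the version notes stated in the table for origin, two_phase, and streaming = parallel; any option not listed is assumed to be available on every version in scope, and the function name is illustrative.

```python
# Minimum PostgreSQL major version per option, taken from the table above.
OPTION_MIN_MAJOR = {"origin": 16, "two_phase": 15}


def unsupported_options(options, server_major):
    """Return messages for options the given server version cannot accept."""
    problems = []
    for name, value in options.items():
        needed = OPTION_MIN_MAJOR.get(name, 0)
        # streaming itself is older; only the 'parallel' value needs PG 16.
        if name == "streaming" and str(value).lower() == "parallel":
            needed = 16
        if server_major < needed:
            problems.append(f"{name} = {value} requires PostgreSQL {needed}+")
    return problems


opts = {"copy_data": True, "streaming": "parallel",
        "binary": True, "origin": "none"}
print(unsupported_options(opts, 15))
print(unsupported_options(opts, 16))
```

The server major version can be derived from SHOW server_version_num by integer-dividing the result by 10000.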

Diagnostic Queries

Sync state lives across three relations: the catalogs pg_subscription (definition) and pg_subscription_rel (per-table state), and the statistics view pg_stat_subscription (live worker + lag). Query all three; a subscription can be enabled in the first while a relation is wedged in the second.

sql

-- 1. Per-table sync state. Any row not 'r' is not yet consistent.
--    i=init  d=data copy  f=finished copy  s=synced  r=ready(streaming)
SELECT srrelid::regclass AS table_name, srsubstate, srsublsn
FROM pg_subscription_rel sr
JOIN pg_subscription s ON s.oid = sr.srsubid
WHERE s.subname = 'orders_sub'
ORDER BY (srsubstate = 'r'), table_name;

sql

-- 2. Live apply + tablesync workers and end-to-end lag. A NULL pid on the
--    main apply row means the worker is not running (auth/slot/priv failure).
SELECT subname, pid, relid::regclass AS syncing_table,
       received_lsn, latest_end_lsn,
       pg_size_pretty(pg_wal_lsn_diff(latest_end_lsn, received_lsn)) AS apply_gap,
       last_msg_send_time, last_msg_receipt_time
FROM pg_stat_subscription
WHERE subname = 'orders_sub';

sql

-- 3. Publisher-side slot retention for this subscription. Alert if the
--    retained WAL keeps growing or active flips to false while syncing.
SELECT slot_name, active, restart_lsn, confirmed_flush_lsn,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots
WHERE slot_name = 'orders_sub';
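
If a monitoring agent only has the raw LSN strings, the same retained-WAL arithmetic can be done client-side. A pg_lsn value prints as two hexadecimal numbers separated by a slash, the high and low 32 bits of a 64-bit byte position. The helper names below are illustrative.

```python
def lsn_to_int(lsn):
    """Convert a pg_lsn string such as '16/B374D848' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def retained_wal_bytes(current_lsn, restart_lsn):
    """Bytes of WAL between the slot's restart_lsn and the current position."""
    return lsn_to_int(current_lsn) - lsn_to_int(restart_lsn)


print(retained_wal_bytes("0/3000000", "0/1000000"))  # 33554432 bytes (32 MiB)
```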

Threshold guidance: any relation stuck below srsubstate = 'r' for longer than the table’s expected COPY time (benchmark it — roughly 1 GB/min on commodity storage) is a paging condition, as is retained_wal climbing past 1 GB or pg_stat_subscription.pid being NULL on an enabled subscription. The full streaming-lag metric set and alert-rule templates are wired up in asynchronous monitoring integration.
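
The three paging conditions above can be expressed as a small rule function for an alerting job. This is a sketch under stated assumptions: the defaults reuse the rough 1 GB/min copy rate and 1 GB retained-WAL limit from the guidance (treated as GiB here), both of which you should replace with benchmarked values, and the function and argument names are hypothetical.

```python
GIB = 1024 ** 3


def sync_alerts(table_bytes, seconds_below_ready, retained_wal_bytes,
                apply_pid, enabled,
                copy_bytes_per_min=GIB, wal_limit_bytes=GIB):
    """Return the paging conditions that currently hold for one relation."""
    alerts = []
    # Expected COPY duration for this table at the assumed copy rate.
    expected_copy_s = table_bytes / copy_bytes_per_min * 60
    if seconds_below_ready > expected_copy_s:
        alerts.append("relation below ready longer than expected COPY time")
    if retained_wal_bytes > wal_limit_bytes:
        alerts.append("retained WAL above limit")
    if enabled and apply_pid is None:
        alerts.append("enabled subscription has no apply worker")
    return alerts


# A 10 GiB table is expected to copy in about 600 s at the default rate.
print(sync_alerts(10 * GIB, 300, 0, 1234, True))
print(sync_alerts(10 * GIB, 900, 2 * GIB, None, True))
```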

Failure Modes & Gotchas

Initial COPY never starts; tables sit at srsubstate = 'i'. Signature: CREATE SUBSCRIPTION returns instantly but no data arrives and pg_stat_subscription.pid is NULL. Root cause: the subscriber ran out of logical-replication or background-worker slots, or the apply worker cannot authenticate to the publisher. Remediation: raise max_logical_replication_workers / max_worker_processes (restart), confirm the replication role and pg_hba.conf, then ALTER SUBSCRIPTION … ENABLE. Full triage is in resolving subscription initialization failures.

One table wedged in data-copy while others stream. Signature: query 1 shows a single relation at d for far longer than its size warrants; the rest are r. Root cause: a lock conflict on the subscriber target (a long transaction or an exclusive lock held by a migration) blocks the tablesync worker’s COPY. Remediation: find the tablesync worker in pg_stat_activity (backend_type beginning with logical replication, wait_event_type = 'Lock'), pass its pid to pg_blocking_pids() to identify the blocking session, and clear that session; the worker resumes automatically.

UPDATE/DELETE abort the apply worker after a clean initial COPY. Signature: the baseline loaded fine, then the apply worker restarts in a loop with logical replication target relation … has neither REPLICA IDENTITY index nor PRIMARY KEY and published relation does not have REPLICA IDENTITY FULL. Root cause: the subscriber table has no primary key or replica identity index, so the apply worker cannot locate the row to change. Remediation: add a primary key on the subscriber, or select a unique index with ALTER TABLE … REPLICA IDENTITY USING INDEX <unique>, or set REPLICA IDENTITY FULL on the publisher table; the stalled change then re-applies when the worker restarts.

Stalled sync exhausts publisher disk. Signature: pg_wal on the publisher grows steadily while a subscription is mid-sync and its slot shows a frozen restart_lsn. Root cause: the open slot cannot advance restart_lsn until the initial COPY completes, so WAL past that point cannot be recycled. Remediation: complete or abandon the sync — either unblock the copy, or DROP SUBSCRIPTION (which drops the slot) and re-seed. Prevention: bound the exposure with max_slot_wal_keep_size on the publisher and alert on retained WAL (query 3).

Duplicate-key violations immediately after promotion. Signature: the subscriber applies fine until it is promoted, then throws unique-violation errors on inserts. Root cause: sequence values were never replicated, so the promoted node reissues ids the old primary already used. Remediation: reconcile with setval() against the publisher high-water mark (step 7) as a scripted, mandatory cutover step.

Frequently Asked Questions

How do I re-sync just one table without re-copying everything?

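Remove the table from the publication on the publisher, run ALTER SUBSCRIPTION … REFRESH PUBLICATION on the subscriber so it forgets the relation, truncate the table on the subscriber, add it back to the publication, and refresh again WITH (copy_data = true). Truncating alone is not enough, because a refresh copies only relations that are new to the subscription. Tables already at srsubstate = 'r' keep streaming untouched. Never drop the whole subscription to fix one divergent table.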

Can I add tables to an active subscription without downtime?

Yes. Add them to the publication on the publisher with ALTER PUBLICATION … ADD TABLE, then run ALTER SUBSCRIPTION … REFRESH PUBLICATION WITH (copy_data = true) on the subscriber. Only the newly added tables are snapshotted; existing streaming tables are unaffected.

What does copy_data = false actually skip, and when is it safe?

It skips the entire initial COPY and starts streaming from the slot’s current position, assuming the subscriber already holds a matching baseline (for example, restored from a pg_dump taken at the snapshot the slot was created against). Use it wrong and the affected tables silently lack every row that predates the slot, with no error. Only use false when you can prove the baseline corresponds to the point at which the slot starts streaming.

Why is my subscription enabled but no data is moving?

Check pg_stat_subscription.pid: a NULL pid on an enabled subscription means the apply worker cannot start — almost always authentication, a missing REPLICATION privilege, a publication-name mismatch, or exhausted worker slots. Cross-reference the subscriber log and the diagnostics in resolving subscription initialization failures.

Integration Touchpoints

Subscription sync is the consuming end of a contract whose source side is defined elsewhere. The exposure boundary it copies from is set by creating publications, and the durable WAL cursor it advances against is provisioned in initializing replication slots; the topology reasoning behind fan-out and cascading subscribers lives in publication and subscription models, part of the broader logical replication architecture fundamentals.

Downstream, once a subscription reaches steady state its apply durability is tuned through synchronous_commit for logical replication, and its slot, apply-lag, and worker metrics are exported with SLO alerting via asynchronous monitoring integration. When the consumer is an event-streaming pipeline rather than a native subscription, the same publication and slot are read instead by the Debezium connector, where the initial-snapshot and offset-tracking responsibilities of subscription sync move into the connector’s own state store.

Related guides

Resolving subscription initialization failures: catalog-level triage for a sync that fails at CREATE SUBSCRIPTION or stalls before ready.
Creating publications: define the exact table, column, and row set a subscription copies and streams.
Initializing replication slots: pre-allocate the durable WAL cursor a subscription advances against.
Asynchronous monitoring integration: export apply-lag, slot, and worker metrics with alerting thresholds.
Tuning synchronous_commit for logical replication: set the apply-side durability boundary for a subscription.
Logical Replication Setup & Management: the management layer this synchronization workflow belongs to.
