In production Change Data Capture (CDC) pipelines, synchronous_commit dictates the precise durability boundary at which the primary PostgreSQL instance acknowledges a transaction commit. For logical replication workloads, this parameter directly governs commit latency, WAL generation velocity, and downstream consumer backpressure characteristics. Misalignment between synchronous_commit and logical decoding throughput routinely manifests as replication slot lag, WAL retention exhaustion, or Python ETL consumer timeout cascades. This document isolates the operational procedure for tuning synchronous_commit specifically for logical replication, providing exact parameter mappings, diagnostic queries, and safe deployment sequences validated against PostgreSQL 14–17.
Commit Semantics and Logical Decoding Throughput
The logical decoding subsystem reads WAL records asynchronously via the pgoutput plugin, but the client-facing acknowledgment path remains strictly bound by the primary’s synchronous_commit setting. The parameter does not alter how the logical worker decodes data; it alters when the producer application receives COMMIT OK.
| Value | Durability Guarantee | Latency Impact | Logical Replication Behavior |
|---|---|---|---|
on (default) |
WAL flushed to persistent storage via fsync() before commit acknowledgment |
1–5 ms/tx | Zero committed data loss on primary crash. Optimal for financial/audit workloads. |
local |
WAL written to OS page cache; fsync() deferred to background writer |
0.1–0.5 ms/tx | Accepts up to wal_writer_delay (default 200 ms) of data loss on unclean shutdown. Ideal for high-throughput CDC where downstream consumers implement idempotent replay. |
remote_write |
Primary waits for WAL receiver on a synchronous standby to acknowledge network receipt | 1–3 ms + RTT | Guarantees at least one standby has WAL in OS cache. Rarely used for pure logical replication unless paired with physical streaming. |
off |
WAL handed to OS; no explicit sync or network wait | <0.1 ms/tx | Strictly contraindicated for production logical replication. Unpredictable commit ordering and potential wal_writer_delay loss during crashes. |
When configuring the initial topology, engineers must align publication scope with commit behavior. During Creating Publications, restrict FOR TABLE clauses to actively mutating relations. Publishing high-churn system catalogs or append-only logs under synchronous_commit = local will saturate the WAL writer, trigger wal_keep_size exhaustion, and cause consumer desynchronization.
Diagnostic Patterns and Slot Lag Thresholds
Improper tuning follows predictable diagnostic signatures. The most frequent failure mode is rapid WAL accumulation paired with slot inactivity, yielding ERROR: replication slot "cdc_slot" is not active or FATAL: too many replication slots. This occurs when the logical replication worker cannot decode and transmit WAL records faster than the primary generates them under lowered commit durability, causing restart_lsn to fall behind wal_level=logical retention boundaries.
Execute the following diagnostic sequence at 15-second intervals during peak load:
SELECT
slot_name,
plugin,
active,
restart_lsn,
confirmed_flush_lsn,
pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS wal_retention_gap_bytes,
pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS apply_lag_bytes
FROM pg_replication_slots
WHERE slot_type = 'logical';
Operational Thresholds:
apply_lag_bytes > 256 MB: Consumer serialization or network I/O bottleneck. Investigate Python ETL batch sizing orpgoutputfilter overhead.wal_retention_gap_bytes > 1 GB: WAL generation outpaces archive/consumer consumption. Increasemax_wal_sizeor throttle producer write rate.active = falsefor >300s: Worker crash or subscription disconnect. Verify Initializing Replication Slots state and network keepalives.
Safe Deployment Sequence
Never modify synchronous_commit globally without validating downstream idempotency. Follow this zero-downtime rollout:
1. Baseline measurement — Record current commit_latency_ms (via pg_stat_statements or APM) and apply_lag_bytes under synchronous_commit = on.
2. Session-level validation — Apply SET synchronous_commit = local; to a single ETL connection. Run a representative write batch. Verify downstream consumers process duplicate/missing records without error.
3. Cluster-wide application
ALTER SYSTEM SET synchronous_commit = local;
SELECT pg_reload_conf();
The change takes effect immediately for new transactions. Existing connections retain their session value until reconnect.
4. Backpressure monitoring — Track pg_stat_replication.write_lag and consumer queue depth. If Python ETL connection pools exhaust or psycopg/asyncpg raise OperationalError: server closed the connection unexpectedly, revert immediately:
ALTER SYSTEM RESET synchronous_commit;
SELECT pg_reload_conf();
Pipeline Integration: ETL, Retry Logic, and Failover
Logical replication operates independently of physical standby promotion, but synchronous_commit tuning directly impacts Subscription Sync Procedures and emergency recovery paths.
Python ETL and Error Handling
When synchronous_commit = local, the primary acknowledges commits before disk persistence. Python consumers must implement:
- Idempotent Upserts: Use
INSERT ... ON CONFLICT DO UPDATEorMERGE-equivalent patterns with deterministic primary keys. - Exponential Backoff Retry Logic: Wrap transaction fetches in a retry loop with jitter. On
psycopg.errors.ConnectionFailure, reconnect and resume from the last acknowledgedconfirmed_flush_lsn. - Async Monitoring Integration: Export
apply_lag_bytesandwal_retention_gap_bytesto Prometheus viapg_exporter. Alert when lag exceeds 500 MB for >2 minutes.
Emergency Failover Procedures
Logical replication slots are not automatically promoted during primary failover. If the primary crashes with synchronous_commit = local, unflushed WAL records are lost. The downstream consumer must:
- Query the new primary for
SELECT pg_current_wal_lsn(); - Compare against the last successfully applied LSN in the ETL state table.
- If a gap exists, trigger a full snapshot sync or recreate the subscription with
CREATE SUBSCRIPTION ... WITH (copy_data = true)to reconcile state. Document this procedure in your runbook to prevent silent data divergence.
For authoritative parameter behavior and plugin architecture, consult the official PostgreSQL WAL Configuration and Logical Replication Architecture documentation.