The publication/subscription architecture serves as the operational backbone for PostgreSQL’s logical replication framework, enabling granular, row-level change data capture (CDC) without imposing full-instance synchronization overhead. For database engineers and data platform teams, this model decouples source transactional workloads from downstream analytical, event-driven, or polyglot persistence systems. Before deploying production CDC pipelines, teams must internalize the architectural constraints and state management principles detailed in PostgreSQL Logical Replication Architecture & Fundamentals, as misconfigured routing boundaries frequently introduce unbounded replication lag or silent schema drift.
flowchart TD PUB["Publication<br/>analytics_cdc_pub"] PUB --> S1(["Slot: warehouse"]) PUB --> S2(["Slot: search"]) PUB --> S3(["Slot: cache"]) S1 --> SUB1["Warehouse subscriber"] S2 --> SUB2["Search indexer"] S3 --> SUB3["Cache invalidator"]
Idempotent Publication Configuration
A publication defines the logical boundary of replicated data. Production deployments require idempotent, declarative configurations that survive infrastructure-as-code (IaC) re-runs and cluster failovers. CREATE PUBLICATION does not support IF NOT EXISTS, so guard the definition with an explicit catalog check inside a DO block rather than relying on ad-hoc migration statements.
-- Idempotent publication creation with explicit filtering.
-- CREATE PUBLICATION has no IF NOT EXISTS, so guard it with a catalog check.
DO $$
BEGIN
IF NOT EXISTS (SELECT 1 FROM pg_publication WHERE pubname = 'analytics_cdc_pub') THEN
CREATE PUBLICATION analytics_cdc_pub
FOR TABLE
public.orders (id, customer_id, total, status, updated_at),
public.inventory (sku, warehouse_id, quantity, last_synced)
WITH (publish = 'insert, update, delete', publish_via_partition_root = true);
END IF;
END $$;
Key production considerations:
- Row/Column Filtering: Use
FOR TABLE ... (col1, col2)to minimize network payload and reduce apply worker CPU on subscribers. - Partition Routing: Enable
publish_via_partition_root = trueto ensure consistent schema mapping across dynamically created child partitions. - Transaction Boundaries: Logical replication preserves transactional atomicity. Partial commits are never streamed, but large transactions will block downstream apply workers until fully decoded.
Subscription Lifecycle & Slot Management
Subscriptions establish persistent connections to publishers and automatically provision logical replication slots. Slot retention guarantees are non-negotiable for data integrity but require strict operational oversight. As documented in Replication Slot Types, logical slots retain WAL segments until the subscriber explicitly acknowledges receipt. During network partitions or consumer stalls, unacknowledged slots will prevent WAL recycling, eventually triggering disk full conditions on the primary.
Implement automated slot monitoring and safe decommissioning workflows:
-- Monitor slot lag and retention pressure
SELECT slot_name, restart_lsn, confirmed_flush_lsn,
pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS wal_retained_bytes
FROM pg_replication_slots
WHERE slot_type = 'logical';
Idempotent Subscription Setup:
-- CREATE SUBSCRIPTION supports neither IF NOT EXISTS nor execution inside a
-- transaction block while it connects and creates a slot. Enforce idempotency
-- in deployment tooling by checking pg_subscription before issuing this command.
CREATE SUBSCRIPTION analytics_cdc_sub
CONNECTION 'host=pub-primary port=5432 dbname=source_db user=replicator'
PUBLICATION analytics_cdc_pub
WITH (
copy_data = false,
create_slot = true,
slot_name = 'analytics_cdc_slot',
synchronous_commit = off
);
Always set copy_data = false for existing tables to avoid initial snapshot bottlenecks, and explicitly name slots to prevent orphaned auto-generated identifiers during IaC drift.
WAL Decoding & Performance Boundaries
When a transaction commits, the publisher routes modified tuples through the pgoutput logical decoding plugin, which translates binary WAL records into a consumable stream. This transformation pipeline relies heavily on WAL Stream Mechanics to guarantee ordered delivery, crash recovery, and transactional consistency.
DevOps teams must account for measurable CPU and I/O overhead introduced by logical decoding. Filtering large tables, applying row-level security (RLS) policies, or streaming high-throughput UPDATE workloads will increase primary node load. Mitigate this by:
- Isolating replication connections via dedicated
replicationroles andpg_hba.confCIDR restrictions. - Tuning
wal_level = logicalandmax_replication_slotsduring cluster provisioning. - Monitoring
pg_stat_replicationforwrite_lag,flush_lag, andreplay_lagspikes.
Debugging & Troubleshooting Workflows
Production CDC pipelines require deterministic debugging procedures. Common failure modes include apply worker crashes, schema mismatch errors, and slot exhaustion.
1. Apply Worker Stalls & Conflicts
-- Check subscription state and apply-worker status.
-- subname/subenabled/subconninfo live on pg_subscription; runtime LSNs on pg_stat_subscription.
SELECT s.subname, s.subenabled, st.pid, st.received_lsn, st.latest_end_lsn
FROM pg_subscription s
LEFT JOIN pg_stat_subscription st ON st.subid = s.oid;
If a worker crashes due to a constraint violation or missing column, PostgreSQL logs the exact failing transaction ID and LSN. Use pg_replication_origin_status to identify the stuck origin, then either patch the downstream schema or use pg_replication_origin_advance() to skip irrecoverable transactions (use with extreme caution).
2. Slot Exhaustion & Disk Pressure
When restart_lsn falls significantly behind pg_current_wal_lsn(), WAL archives accumulate. Implement a runbook that:
- Alerts when
wal_retained_bytesexceeds 20% of available disk. - Archives pending WAL to object storage via
pg_waldumporwal-g. - Safely drops orphaned slots only after verifying downstream consumers have migrated or are permanently decommissioned.
Strategic Integration & Migration Paths
Engineering teams frequently evaluate logical versus physical replication for CDC pipelines. The trade-offs are comprehensively outlined in Logical vs physical replication differences. For modern data platforms requiring heterogeneous targets, in-flight transformations, or selective table routing, logical replication remains the default. Physical replication is reserved for read-scaling, high-availability failover, and exact binary copies.
Time-Series & Hypertable Integration:
When replicating partitioned time-series workloads, ensure the subscriber schema matches the publisher’s partitioning strategy. Detailed guidance on handling hypertable metadata and continuous aggregates is available in Integrating logical replication with TimescaleDB. Always validate that publish_via_partition_root aligns with your retention policies to prevent duplicate inserts during partition swaps.
Physical-to-Logical Migration: Transitioning from streaming replication to logical CDC requires careful state synchronization. Follow the established procedures in Migrating from physical to logical replication safely to prevent data divergence. The standard approach involves:
- Taking a consistent base backup.
- Restoring to the target cluster.
- Creating a logical slot at the exact LSN of the backup.
- Initializing the subscription with
copy_data = falseto resume streaming from the backup point.
Security Boundaries & Fallback Routing
Replication credentials must adhere to the principle of least privilege. Grant only SELECT on published tables and REPLICATION at the role level. Never use superuser accounts for subscription connections. For failover scenarios, implement fallback routing strategies that redirect subscribers to a promoted standby once the new primary assumes the pgoutput role. Configure connection parameters with target_session_attrs=read-write and implement exponential backoff in Python ETL consumers using libraries like psycopg or asyncpg to gracefully handle transient network partitions.
For protocol-level message formatting and stream parsing, consult the official PostgreSQL Logical Replication Protocol documentation. When building custom consumers, validate message boundaries against the PostgreSQL Logical Replication Reference to ensure idempotent apply logic and correct transaction commit sequencing.