Logical vs physical replication differences

For database engineers, data platform teams, and Python ETL developers architecting continuous data pipelines, the divergence between physical streaming and…

For database engineers, data platform teams, and Python ETL developers architecting continuous data pipelines, the divergence between physical streaming and logical decoding dictates fault tolerance boundaries, schema evolution strategies, and downstream consumer compatibility. Physical replication operates at the block level, transmitting Write-Ahead Log (WAL) records verbatim to standby instances. Logical replication decodes WAL entries into row-level change events. When the operational intent is zero-downtime CDC extraction for heterogeneous sinks—message brokers, object storage, or Python-based ETL consumers—the replication model directly impacts slot retention policies, conflict resolution, and pipeline observability.

flowchart LR
  subgraph PHYS["Physical replication"]
    direction LR
    PW["WAL blocks"] --> PS["Standby<br/>exact binary copy"]
  end
  subgraph LOG["Logical replication"]
    direction LR
    LW["WAL"] --> LD["Decode to rows"]
    LD --> LT["Any target<br/>filtered or transformed"]
  end

WAL Stream Mechanics & Decoding Overhead

The primary instance processes transaction logs differently depending on the replication mode. Physical replication requires wal_level = replica (the default in PostgreSQL 14+) and streams binary WAL segments directly to standby servers via walsender processes. This mechanism bypasses SQL parsing, delivering near-zero serialization overhead and enabling high-throughput read scaling or synchronous high-availability failover. However, it enforces strict schema parity, identical major versions, and identical collation settings between primary and standby.

Logical replication mandates wal_level = logical and activates the pgoutput decoding plugin by default. A background worker parses WAL records into structured logical messages (BEGIN, INSERT, UPDATE, DELETE, COMMIT, TRUNCATE). This decoding layer introduces measurable CPU overhead on the primary during high-write workloads but unlocks cross-version compatibility, selective table routing, and heterogeneous sink integration. For Python ETL consumers, this means change events arrive as discrete, schema-annotated payloads rather than opaque binary blocks, enabling direct mapping to Avro/Protobuf schemas or DataFrame ingestion without intermediate WAL parsing.

Topology & Routing: Publisher/Subscriber vs. Block Mirroring

Physical replication enforces a rigid 1:1 or cascading standby topology. Every change on the primary is mirrored identically to the standby, leaving no room for selective filtering or schema transformation at the replication layer.

Logical replication implements a decoupled publisher-subscriber topology. Publishers define change sets via CREATE PUBLICATION, applying granular filters (WHERE clauses, column lists, or operation types), while subscribers instantiate CREATE SUBSCRIPTION to consume and apply those changes. This model enables precise data partitioning and schema transformation at the edge, but introduces latency and memory pressure during large transaction decoding. Understanding the Publication/Subscription Models is critical when designing Python ETL consumers that must handle out-of-order delivery, partial transaction commits, or DDL drift. Unlike physical replication, where the standby applies WAL in strict commit order, logical replication can be configured with streaming = on (PostgreSQL 14+) to flush large transactions incrementally, reducing pg_replication_origin memory footprint and preventing subscriber OOM conditions.

Replication Slot Types & Retention Policies

Replication slots govern WAL retention and prevent premature log recycling. Physical slots (type = physical) retain WAL until the standby acknowledges receipt via pg_last_wal_replay_lsn(). Logical slots (type = logical) retain WAL until the subscriber confirms application via confirmed_flush_lsn.

Operationalizing logical replication for CDC requires strict slot lifecycle management. Unacknowledged logical slots cause unbounded WAL accumulation on the primary, triggering disk exhaustion. Mitigation requires:

  • Setting max_slot_wal_keep_size to cap retention.
  • Tuning logical_decoding_work_mem to spill large transactions to disk rather than RAM.
  • Implementing heartbeat mechanisms in Python consumers to advance confirmed_flush_lsn during idle periods.
  • Monitoring pg_replication_slots for restart_lsn lag and triggering automated alerts before max_wal_size thresholds are breached.

Security Boundaries & Permissions

Physical replication relies on the REPLICATION system privilege and pg_hba.conf entries scoped to standby IP ranges. Data in transit is protected via TLS, but no row-level filtering occurs at the replication layer.

Logical replication introduces finer-grained security boundaries. Subscribers require REPLICATION plus SELECT privileges on published tables, or the pg_read_all_data role (PostgreSQL 14+). Row-Level Security (RLS) is bypassed by default for replication roles to maintain consistency, which necessitates explicit audit logging for sensitive tables. Network segmentation must isolate logical replication traffic from public endpoints, and ssl = on with certificate verification is mandatory for cross-VPC or cloud-to-on-prem pipelines. Additionally, logical decoding exposes plaintext payloads to decoding plugins; ensure pgoutput is restricted to authorized consumers and that downstream sinks enforce encryption at rest.

Fault Tolerance & Fallback Routing Strategies

Physical replication supports synchronous commit (synchronous_standby_names) for zero-data-loss guarantees. Failover is instantaneous via pg_ctl promote or automated orchestrators (Patroni, Stolon), with no slot migration required.

Logical replication is inherently asynchronous. During primary failover, logical slots must be migrated or recreated on the new primary. The standard fallback routing strategy involves:

  1. Capturing the last known confirmed_flush_lsn on the failed primary.
  2. Creating a new logical slot on the promoted standby.
  3. Using pg_replication_origin_advance to skip already-applied transactions and prevent duplicate delivery.
  4. If slot migration fails, falling back to snapshot-based extraction (pg_dump or COPY) with WHERE updated_at > last_sync to bridge the gap.

Conflict resolution in logical replication relies on origin tracking to prevent replication loops in multi-primary topologies. For ETL consumers, application-level idempotency (upserts, ON CONFLICT DO UPDATE, or deduplication keys) is mandatory, as logical replication does not guarantee exactly-once delivery under network partitions.

Operational Decision Matrix

Dimension Physical Replication Logical Replication
WAL Processing Binary block streaming Row-level decoding via pgoutput
Schema Requirements Strict parity, same major version Cross-version, selective table/column routing
Topology 1:1 or cascading standbys Decoupled publisher/subscriber mesh
Slot Retention restart_lsn tracks replay confirmed_flush_lsn tracks application
Failover Behavior Instant promotion, no slot migration Requires pg_replication_origin_advance or snapshot fallback
CDC Suitability Low (requires WAL parsing) High (native change events)
Primary Overhead Minimal Moderate (CPU/memory for decoding)

For high-availability clusters and read replicas, physical replication remains the production standard. For data platform teams building CDC pipelines, Python ETL consumers, or heterogeneous sink integrations, logical replication provides the necessary routing flexibility and schema evolution support, provided slot retention, security boundaries, and fallback routing are explicitly engineered into the pipeline architecture.