Replication Slot Types

A replication slot is the durable, server-side position marker that tells a PostgreSQL primary how much Write-Ahead Log (WAL) it must retain for a given consumer. Choosing the right slot type is a foundational decision within the PostgreSQL Logical Replication Architecture & Fundamentals: it decides whether a change-data-capture pipeline survives a consumer outage or takes the primary offline. PostgreSQL exposes two families, physical and logical, plus orthogonal modifiers (temporary, two-phase, failover) that change lifecycle and durability semantics. This reference covers the prerequisites, the exact creation procedure, the parameters that govern each type, the diagnostic queries that expose slot lag, and the failure signatures that recur in production.

The consequence of getting slot type or lifecycle wrong is rarely visible at creation time. A logical slot left active = false after a Python consumer crashes will pin restart_lsn in place and grow pg_wal without bound until the volume fills and the primary refuses new writes — a full-cluster outage caused by an idle downstream. Conversely, a physical slot pointed at a workload that actually needs row-level, cross-version decoding will silently ship opaque byte ranges that a Debezium connector or a hand-written consumer cannot parse at all. Slot type is therefore not an implementation detail; it is the contract that binds WAL retention on the primary to delivery guarantees on the consumer.

Prerequisites & Configuration Objects

Before any logical slot can be created, the primary must be running with the right server-level configuration. These parameters are set in postgresql.conf (or via ALTER SYSTEM). wal_level, max_replication_slots, and max_wal_senders all require a full restart and cannot be hot-reloaded. The precise batching and retention behaviour these parameters drive is documented in WAL stream mechanics; the minimum viable set is:

wal_level = logical: instructs PostgreSQL to log the extra relation and old-tuple metadata that logical decoding needs to reconstruct row changes. replica (the PostgreSQL default, and the default in many managed offerings) supports physical slots only. Changing this requires a restart.
max_replication_slots: the hard ceiling on concurrent slots (logical + physical) held in shared memory. Exceeding it raises ERROR: all replication slots are in use. Size it deliberately; see configuring max_replication_slots safely.
max_wal_senders: the ceiling on concurrent walsender processes. Each active streaming consumer needs one; keep it at least one above max_replication_slots to leave headroom for base backups.
A role with the REPLICATION attribute (or the platform's equivalent replication role on managed services, for example rds_replication on Amazon RDS) plus SELECT on the published tables. Privilege scoping is covered under security boundaries and permissions.

sql

-- Verify the running configuration before building anything on top of it.
-- pending_restart = true means the value is staged but not yet live.
SELECT name, setting, pending_restart
FROM pg_settings
WHERE name IN ('wal_level', 'max_replication_slots',
               'max_wal_senders', 'logical_decoding_work_mem');
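
The rows returned by the query above can be checked mechanically before anything is built on top of them. The following is a minimal Python sketch rather than a definitive implementation: the function name check_replication_settings and the (name, setting, pending_restart) row shape are assumptions made for illustration, and the headroom rule simply encodes the guidance in the list above.

```python
def check_replication_settings(rows):
    """Return a list of human-readable problems found in pg_settings rows.

    `rows` is an iterable of (name, setting, pending_restart) tuples, the
    shape produced by the verification query above (assumed, not enforced).
    """
    settings = {name: (setting, pending) for name, setting, pending in rows}
    problems = []

    wal_level, _ = settings.get("wal_level", (None, False))
    if wal_level != "logical":
        problems.append(f"wal_level is {wal_level!r}, logical slots need 'logical'")

    # Values staged by ALTER SYSTEM are not live until the server restarts.
    for name, (_, pending) in sorted(settings.items()):
        if pending:
            problems.append(f"{name} has a pending restart")

    # pg_settings reports every value as text, so convert before comparing.
    slots = int(settings.get("max_replication_slots", ("0", False))[0])
    senders = int(settings.get("max_wal_senders", ("0", False))[0])
    if senders <= slots:
        problems.append(
            f"max_wal_senders ({senders}) should exceed "
            f"max_replication_slots ({slots})"
        )
    return problems
```

An empty list means the three checks passed; a bootstrap script could refuse to create slots while the list is non-empty.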

A logical slot also binds to exactly one output plugin and one database at creation. pgoutput (built in) is what native subscriptions use and is a common choice for Debezium; wal2json and decoderbufs are third-party alternatives that change the wire format but not the decoding guarantees. The plugin cannot be changed after creation: a plugin switch means dropping and recreating the slot, which resets its starting position.

Step-by-Step Implementation

The following procedure creates, inspects, advances, and drops each slot type safely. Every step is idempotent or explicitly guarded, because pg_create_logical_replication_slot has no IF NOT EXISTS form and a blind second call raises duplicate_object.

1. Create a physical slot (for a streaming standby or pg_basebackup). Physical slots carry no decoding cost and track only restart_lsn.

sql

-- On the primary. The standby references this name via primary_slot_name.
SELECT * FROM pg_create_physical_replication_slot('standby_1', true);
-- The second arg (immediately_reserve) pins WAL from creation, not first connect.

2. Create a logical slot bound to pgoutput. This is the slot a native subscription or a Python CDC parser attaches to.

sql

-- Returns the slot name and the LSN at which decoding will start.
SELECT * FROM pg_create_logical_replication_slot('cdc_orders', 'pgoutput');

3. Make creation idempotent in automation. Query the catalog first and create only when absent — the safe pattern for infrastructure-as-code and Python ETL bootstrap.

python

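import psycopg2
from psycopg2 import errors

def ensure_logical_slot(conn, slot_name, plugin="pgoutput"):
    """Create the slot if it is absent; return True only when it was created."""
    try:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT 1 FROM pg_replication_slots WHERE slot_name = %s",
                (slot_name,),
            )
            if cur.fetchone():
                # Already present: do not recreate (that would reset position).
                # Roll back so the session is not left idle in transaction.
                conn.rollback()
                return False
            cur.execute(
                "SELECT pg_create_logical_replication_slot(%s, %s)",
                (slot_name, plugin),
            )
        conn.commit()
        return True
    except errors.DuplicateObject:
        # Another process created the slot between the check and the call.
        conn.rollback()
        return False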

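4. Advance the slot only after the consumer has durably persisted the change to its target. Blind advancement discards unacknowledged data; delayed advancement bloats WAL. A connected streaming consumer advances the slot through its own flush feedback; pg_replication_slot_advance is for a slot that no consumer currently holds, and it fails while the slot is active.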

sql

-- Move confirmed_flush_lsn forward to a position the consumer has committed.
SELECT pg_replication_slot_advance('cdc_orders', '0/1A2B3C4D');
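
Deciding whether an advance is safe means comparing LSNs numerically, and a textual LSN is two hexadecimal halves of a 64-bit position. The sketch below illustrates that comparison under stated assumptions: persisted_lsn stands for a position the consumer tracks itself after a durable write, and both function names are hypothetical.

```python
def parse_lsn(text):
    """Convert a textual LSN such as '0/1A2B3C4D' into a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def safe_advance_target(confirmed_flush_lsn, persisted_lsn):
    """Return the LSN to advance to, or None when no advance is warranted.

    confirmed_flush_lsn: the slot's current position (pg_replication_slots).
    persisted_lsn: the last position the consumer durably wrote to its
    target (a hypothetical value the consumer records on its own side).
    """
    if parse_lsn(persisted_lsn) <= parse_lsn(confirmed_flush_lsn):
        return None  # nothing new is durable, so leave the slot where it is
    return persisted_lsn
```

Comparing the strings directly would be wrong, because '0/9' sorts after '0/10' as text even though it is the smaller position.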

5. Drop a slot that is genuinely dead — but only after confirming it is inactive, or the drop raises replication slot is active.

sql

-- active_pid must be NULL. Terminate a zombie walsender first if needed.
SELECT pg_drop_replication_slot('cdc_orders');
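
The inactive-before-drop check in step 5 can be wrapped in a small guard for automation. This is a sketch that assumes a DB-API cursor (psycopg2's, for example) connected to the primary; the function name and its three return values are illustrative choices, not an established API.

```python
def drop_slot_if_inactive(cur, slot_name):
    """Drop slot_name only when no backend holds it; report what happened.

    `cur` is any DB-API cursor on the primary.
    Returns 'absent', 'active', or 'dropped'.
    """
    cur.execute(
        "SELECT active_pid FROM pg_replication_slots WHERE slot_name = %s",
        (slot_name,),
    )
    row = cur.fetchone()
    if row is None:
        return "absent"   # already gone, so a retry is harmless
    if row[0] is not None:
        return "active"   # a walsender still holds the slot; do not drop
    cur.execute("SELECT pg_drop_replication_slot(%s)", (slot_name,))
    return "dropped"
```

A consumer can still reconnect between the check and the drop, in which case the drop itself raises the "is active" error; the guard narrows that window but does not close it.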

For the equivalent DDL-driven and Ansible-driven flows that a native subscription uses (where the slot is created implicitly from the subscriber side), see initializing replication slots.

Parameter Reference Table

Parameter / modifier	Type	Default	Logical-replication behavior
`wal_level`	GUC (`enum`)	`replica`	Must be `logical` for any logical slot; `replica` allows physical slots only. Restart required.
`max_replication_slots`	GUC (`int`)	`10`	Hard ceiling on total slots in shared memory. Restart required; undersizing → `all replication slots are in use`.
`max_wal_senders`	GUC (`int`)	`10`	Ceiling on concurrent walsenders. One per active streaming slot; keep above `max_replication_slots`.
`temporary`	slot flag	`false`	If `true`, the slot is dropped automatically when the creating session ends or hits an error. Useful for one-off snapshots; unsuitable for durable CDC because the slot does not survive a disconnect.
`two_phase`	slot flag	`false`	PG 14+: decode `PREPARE`/`COMMIT PREPARED` at prepare time. Required when the source uses two-phase commit; must match the subscription’s `two_phase` setting.
`failover`	slot flag	`false`	PG 17+: synchronize the logical slot to standbys so it survives a failover; requires `sync_replication_slots = on` and a physical slot feeding the standby.
`plugin`	slot property	—	Output plugin, fixed at creation (`pgoutput`, `wal2json`, `decoderbufs`). Immutable — a change means drop + recreate.
`logical_decoding_work_mem`	GUC (`mem`)	`64MB`	Per-walsender reorder-buffer budget before decoded transactions spill to disk. Raise for large transactions to cut spill I/O.
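
Because the two_phase and failover flags only exist from the versions listed in the table, automation that targets mixed fleets has to gate them. The sketch below builds the creation call with positional arguments in the order slot name, plugin, temporary, two-phase, failover. The function name is hypothetical, and server_version_num is assumed to be the integer reported by SHOW server_version_num.

```python
def logical_slot_create_call(server_version_num, slot_name, plugin="pgoutput",
                             temporary=False, two_phase=False, failover=False):
    """Build (sql, params) for pg_create_logical_replication_slot.

    server_version_num is the integer form of the server version, for
    example 140000 for PG 14 (assumed to be fetched by the caller).
    """
    if two_phase and server_version_num < 140000:
        raise ValueError("two_phase requires PostgreSQL 14 or later")
    if failover and server_version_num < 170000:
        raise ValueError("failover requires PostgreSQL 17 or later")

    # Only pass the arguments that the target server version accepts.
    params = [slot_name, plugin, temporary]
    if server_version_num >= 140000:
        params.append(two_phase)
    if server_version_num >= 170000:
        params.append(failover)
    placeholders = ", ".join(["%s"] * len(params))
    sql = f"SELECT pg_create_logical_replication_slot({placeholders})"
    return sql, tuple(params)
```

The returned pair can be handed straight to a DB-API cursor, as in cur.execute(sql, params).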

Diagnostic Queries

Most slot problems are visible in pg_replication_slots, with pg_stat_replication adding the walsender's view of connected consumers. These are the copy-paste queries that isolate the failure domain, with the thresholds that should trigger action.

sql

-- 1. Full slot inventory with retained-WAL size and health flag.
--    Alert when retained_wal exceeds ~25% of the pg_wal volume, or > 10 GB.
SELECT
    slot_name,
    slot_type,                                   -- 'physical' | 'logical'
    plugin,
    active,
    active_pid,
    wal_status,                                  -- reserved | extended | unreserved | lost
    pg_size_pretty(
        pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
    ) AS retained_wal
FROM pg_replication_slots
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC;

sql

-- 2. Per-consumer byte lag for logical slots (PG 10+ layout).
--    confirmed_flush_lsn far behind pg_current_wal_lsn() = a stalled consumer.
--    Page when lag > 256 MB or grows monotonically for > 300 s.
SELECT
    slot_name,
    pg_size_pretty(
        pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)
    ) AS flush_lag
FROM pg_replication_slots
WHERE slot_type = 'logical';
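
The paging rule in the comments of query 2 (lag above 256 MB, or lag growing monotonically for more than 300 s) needs a little state across polls. A minimal sketch of that rule follows; the polling loop that collects the samples is assumed, and the function name is hypothetical.

```python
LAG_LIMIT_BYTES = 256 * 1024 * 1024   # 256 MB, from the query comment
GROWTH_WINDOW_SECONDS = 300           # 300 s, from the query comment


def should_page(samples):
    """Decide whether a slot's flush lag warrants a page.

    `samples` is a list of (unix_seconds, lag_bytes) pairs in time order,
    one per poll of the flush-lag query (the polling itself is assumed).
    """
    if not samples:
        return False
    if samples[-1][1] > LAG_LIMIT_BYTES:
        return True

    # Walk backwards while each sample is strictly larger than the one
    # before it, to find where the current growth streak began.
    start = len(samples) - 1
    while start > 0 and samples[start][1] > samples[start - 1][1]:
        start -= 1
    growing_for = samples[-1][0] - samples[start][0]
    return growing_for > GROWTH_WINDOW_SECONDS
```

A single dip in lag resets the streak, so a consumer that is slow but still draining does not page on the growth rule alone.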

sql

-- 3. Catalog xmin retention: the silent catalog-bloat driver.
--    A pinned catalog_xmin stops VACUUM from removing dead system-catalog tuples.
SELECT slot_name, xmin, catalog_xmin,
       age(catalog_xmin) AS catalog_xmin_age
FROM pg_replication_slots
WHERE catalog_xmin IS NOT NULL
ORDER BY age(catalog_xmin) DESC;

sql

-- 4. wal_status = 'lost' means the slot fell behind max_slot_wal_keep_size
--    and required WAL was removed — the slot is now unrecoverable.
SELECT slot_name, wal_status, safe_wal_size
FROM pg_replication_slots
WHERE wal_status IN ('unreserved', 'lost');

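The wal_status and safe_wal_size columns exist from PG 13 onward; safe_wal_size reports how many bytes can still be written before this slot is in danger of becoming lost. A negative or zero value is an imminent-loss alarm. The column is NULL for slots that are already lost and when max_slot_wal_keep_size is -1 (unlimited), so treat NULL as "no limit configured" only after checking wal_status.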
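
The two columns combine into a simple severity mapping for alerting. The sketch below is one possible mapping, not a rule PostgreSQL defines: treating extended as a warning is an assumption, and the function name is hypothetical.

```python
def classify_slot(wal_status, safe_wal_size):
    """Map wal_status and safe_wal_size to 'ok', 'warning', or 'critical'.

    safe_wal_size is None when PostgreSQL reports NULL for the column.
    """
    if wal_status == "lost":
        return "critical"   # required WAL is gone; the slot cannot recover
    if wal_status == "unreserved":
        return "critical"   # required WAL may be removed at a checkpoint
    if safe_wal_size is not None and safe_wal_size <= 0:
        return "critical"   # no headroom left before the slot is at risk
    if wal_status == "extended":
        return "warning"    # retention exceeds max_wal_size but is still held
    return "ok"
```

For example, classify_slot("extended", 5000000) yields a warning, while the same status with a safe_wal_size of 0 is critical.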

Failure Modes & Gotchas

Inactive slot pins WAL and fills the volume. Signature: pg_wal grows steadily, pg_replication_slots.active = false, retained_wal climbing. Root cause: a consumer crashed or was decommissioned without dropping its slot, so restart_lsn never advances. Remediation: set max_slot_wal_keep_size (PG 13+, e.g. 10GB) so PostgreSQL invalidates a runaway slot rather than exhausting disk, and run a scheduled reaper that drops slots which have been inactive for longer than a defined window (for example 24 h) after confirming the consumer is truly gone.
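
The selection step of such a reaper can be sketched as a pure function. It assumes the monitor records, per slot, the time it first observed the slot inactive; that bookkeeping, the dictionary keys, and the function name are all illustrative assumptions, and the actual drop is left to a separately guarded step.

```python
from datetime import datetime, timedelta, timezone


def slots_to_reap(slots, now, max_inactive=timedelta(hours=24)):
    """Return names of slots that have been inactive longer than max_inactive.

    `slots` is a list of dicts with keys 'slot_name', 'active', and
    'inactive_since' (a timezone-aware datetime recorded by the monitor
    when it first saw the slot inactive, or None if never observed).
    """
    doomed = []
    for slot in slots:
        if slot["active"] or slot["inactive_since"] is None:
            continue  # in use, or no evidence yet of how long it has idled
        if now - slot["inactive_since"] > max_inactive:
            doomed.append(slot["slot_name"])
    return doomed
```

Returning names rather than dropping immediately leaves room for the "confirm the consumer is truly gone" step before anything destructive runs.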

Zombie walsender blocks the drop. Signature: pg_drop_replication_slot raises replication slot "…" is active, but no consumer is connected. Root cause: a walsender backend survived a network partition and still holds active_pid. Remediation: confirm the PID in pg_stat_activity, then SELECT pg_terminate_backend(active_pid) before retrying the drop.

Long-running transaction freezes catalog_xmin. Signature: consumer keeps up on LSN, yet the age of catalog_xmin and catalog bloat both climb. Root cause: an open long transaction on the primary holds the snapshot the slot needs for catalog visibility, so catalog_xmin cannot advance even though confirmed_flush_lsn does. Remediation: hunt idle-in-transaction sessions (state = 'idle in transaction' in pg_stat_activity), enforce idle_in_transaction_session_timeout, and treat unbounded catalog-xmin age as a page.
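
Hunting those sessions amounts to filtering pg_stat_activity rows by state and transaction age. The sketch below assumes the rows have already been fetched as (pid, state, xact_start) tuples with xact_start converted to Unix seconds; the function name and the 300-second default are illustrative assumptions.

```python
def stale_transactions(activity_rows, now_seconds, limit_seconds=300):
    """Return (pid, age_seconds) for sessions idle in a transaction too long.

    `activity_rows` is a list of (pid, state, xact_start_seconds) tuples
    taken from pg_stat_activity; xact_start_seconds may be None.
    """
    stale = []
    for pid, state, xact_start in activity_rows:
        if state != "idle in transaction" or xact_start is None:
            continue
        age = now_seconds - xact_start
        if age > limit_seconds:
            stale.append((pid, age))
    # Oldest transactions first: they hold the snapshot back the furthest.
    return sorted(stale, key=lambda item: item[1], reverse=True)
```

The resulting PIDs are candidates for investigation; whether to terminate them is a policy decision this sketch does not make.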

Wrong slot type for the workload. Signature: a consumer expecting row-level INSERT/UPDATE/DELETE receives opaque WAL, or a standby cannot attach. Root cause: a physical slot was created where logical was needed (or vice versa), or wal_level is replica rather than logical. Remediation: physical and logical are not interchangeable and cannot be converted in place — drop and recreate as the correct type, accepting that a new logical slot starts decoding from the current LSN and needs an initial snapshot to backfill.

Temporary slot vanishes on reconnect. Signature: after a consumer restart the slot is simply gone and CDC restarts from an empty state. Root cause: the slot was created with temporary = true and dropped when its session ended. Remediation: durable pipelines must always use non-temporary slots; reserve temporary for ad-hoc pg_recvlogical inspection.

Frequently Asked Questions

When should I use a physical slot instead of a logical one?

Use a physical slot when the consumer is another PostgreSQL instance that needs a byte-exact copy of the whole cluster, such as a streaming replica or a pg_basebackup run. Physical slots carry no decoding CPU and track only restart_lsn. Use a logical slot the moment you need row-level changes, selective tables or columns, cross-major-version delivery (PG 15 → 17), or delivery into a non-PostgreSQL target such as Kafka or a warehouse. If you find yourself wanting to filter or transform the stream, you need logical.

Does dropping and recreating a logical slot lose data?

Yes, unless you re-snapshot. A new logical slot begins decoding from the current WAL position, so any change committed before its restart_lsn will never be streamed. Recreating a slot therefore requires a fresh initial copy of the affected tables (via the subscriber’s copy phase or a manual snapshot) to backfill the gap before resuming streaming. Never treat a slot drop as a benign reset for a live pipeline.

What is the difference between restart_lsn and confirmed_flush_lsn?

restart_lsn is the oldest WAL position the slot still needs and is what pins WAL retention on disk. confirmed_flush_lsn is the last position the consumer has acknowledged as durably flushed. On a healthy logical slot the two advance together with a small gap. A widening gap between confirmed_flush_lsn and pg_current_wal_lsn() means the consumer is falling behind, while a restart_lsn that stops moving entirely usually means the consumer has stopped confirming or a long-running transaction is holding the slot back.

How do I keep a logical slot alive across a failover?

Before PG 17, logical slots did not follow a promotion: a failover orphaned the slot and forced a re-snapshot on the new primary. PG 17+ adds failover slots. Create the slot with failover = true and run standbys with sync_replication_slots = on, and PostgreSQL synchronizes the slot's state, including confirmed_flush_lsn, to the standby so the promoted node can resume the same slot. On older versions, build an explicit re-initialization step into your failover runbook.

Integration Touchpoints

Slot type is the hinge between several adjacent topics. Upstream, the WAL that a slot pins is governed by the retention and decoding rules in WAL stream mechanics, and the set of rows a logical slot ever sees is fixed by the publication and subscription models and the concrete DDL in creating publications. The privileges required to create and consume a slot are scoped in security boundaries and permissions. Downstream, a native subscription’s subscription sync drives the copy-then-stream lifecycle a slot anchors, while a Debezium connector or a Python CDC parser attaches to the same slot to receive the decoded stream. Sizing the shared-memory ceiling for all of these lives in configuring max_replication_slots safely.

For column-level semantics and state transitions, consult the official PostgreSQL pg_replication_slots view and the replication slot functions reference.

Related

WAL Stream Mechanics: how WAL is generated, retained, and decoded behind every slot.
Publication/Subscription Models: what a logical slot is allowed to stream and to whom.
Security Boundaries & Permissions: the REPLICATION privilege and access scoping slot creation requires.
Configuring max_replication_slots safely: deterministic sizing of the slot ceiling.
Initializing Replication Slots: the hands-on creation and Ansible automation workflow.
