Setting up pg_hba.conf for replication users

Configuring pg_hba.conf for logical replication consumers is a discrete, high-impact operational task that directly governs the stability and security of change data capture (CDC) pipelines. For database engineers, data platform teams, Python ETL developers, and DevOps practitioners, an improperly scoped host-based authentication rule shows up as connection rejections the moment a consumer tries to connect. A consumer that stays locked out also leaves its replication slot inactive, and an inactive slot keeps retaining WAL on the publisher. This reference isolates the configuration parameters, cryptographic enforcement, and validation workflows required to provision replication users without exposing the primary cluster to unauthorized network access.

Logical replication decodes the Write-Ahead Log (WAL) into a stream of row-level changes, delivered through publication/subscription mechanisms or external logical decoding plugins. The authentication layer must explicitly admit these connections while enforcing strict cryptographic verification. Understanding how the server evaluates connection requests against the PostgreSQL Logical Replication Architecture & Fundamentals model ensures that pg_hba.conf entries align with the underlying streaming protocol rather than standard client query routing. Misalignment here typically results in FATAL: no pg_hba.conf entry errors at connection time, before the consumer can issue START_REPLICATION.

Role Provisioning & Cryptographic Baseline

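Before modifying the host-based authentication file, provision the replication role with precise, isolated privileges. The REPLICATION attribute is what allows the role to open replication connections and stream decoded changes. Table-level SELECT grants are not required for the stream itself, but a built-in subscription's initial table copy reads the published tables and therefore needs SELECT on them. Publications are managed separately by a role that owns the published tables; the consumer role does not need to own them.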

Execute the following on the publisher node (PostgreSQL 14+):

sql
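-- Enforce modern cryptographic standards cluster-wide before any password is stored
ALTER SYSTEM SET password_encryption = 'scram-sha-256';
SELECT pg_reload_conf();
-- Also set it for this session so the CREATE ROLE below does not race the reload
SET password_encryption = 'scram-sha-256';

-- The password is hashed with whatever password_encryption is in effect right now
CREATE ROLE cdc_pipeline_user WITH LOGIN REPLICATION PASSWORD 'strong_scram_password';
GRANT CONNECT ON DATABASE target_db TO cdc_pipeline_user;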

Verify baseline parameters: wal_level must be logical, max_wal_senders must cover every concurrent replication connection, and max_replication_slots should exceed the total number of active CDC consumers plus a 20% buffer for rolling deployments and failover rotation.
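The sizing rule above is simple arithmetic, sketched here as a small helper. The function name is hypothetical, and rounding the 20% buffer up to a whole slot is an assumption; the text only asks for "a 20% buffer".

```python
def recommended_max_replication_slots(active_consumers: int) -> int:
    """Return the consumer count plus a 20% buffer, rounded up to a whole slot."""
    if active_consumers < 0:
        raise ValueError("active_consumers must be non-negative")
    # Integer ceiling of 20%: ceil(n / 5) == (n + 4) // 5, avoiding float rounding.
    buffer = (active_consumers + 4) // 5
    return active_consumers + buffer


if __name__ == "__main__":
    for consumers in (4, 10, 23):
        print(consumers, "consumers ->", recommended_max_replication_slots(consumers))
```

Treat the result as a floor for max_replication_slots rather than an exact target; changing the parameter requires a server restart, so some extra headroom is cheap.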

pg_hba.conf Syntax & Sequential Evaluation

pg_hba.conf is evaluated top to bottom, and the first record whose connection type, database, user, and address all match decides the outcome; later records are never consulted. A misplaced or overly permissive rule will therefore either block the replication connection or violate least-privilege mandates. For logical replication consumers, the database column must name the specific target database. The replication keyword in that column matches physical replication connections only (streaming standbys, pg_basebackup); it does not match logical replication connections, which always specify a database.

The recommended production entry format is:

code
hostssl target_db cdc_pipeline_user 10.0.5.0/24 scram-sha-256

Each field carries an operational constraint:

  • Connection Type: Use hostssl exclusively. Logical replication streams decoded row changes, including full column values; sending them unencrypted violates compliance baselines and exposes payload data to network sniffing.
  • Database Column: Name the database that holds the publication or logical slot (target_db here). This one entry covers both the consumer's ordinary SQL connections and its replication=database connections. Add a separate replication entry only if the same role also opens physical replication connections.
  • Authentication Method: Use scram-sha-256. md5 is deprecated and cryptographically weak. A role whose stored password is still an MD5 hash cannot authenticate under scram-sha-256 until its password is reset.
  • CIDR Scoping: Restrict to the exact subnet of your replication workers or CDC proxies. Never use 0.0.0.0/0.
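To make the first-match behavior concrete, the following is a simplified Python model of how a connection is compared against an ordered rule list. It is a sketch, not PostgreSQL's parser: it assumes five whitespace-separated fields per record and ignores local sockets, host names, comma-separated lists, group syntax, and option fields. The names HbaRule, parse_rules, first_match, and EXAMPLE_RULES are hypothetical. The handling of the replication keyword follows the PostgreSQL documentation, which states that it matches physical replication connections only.

```python
import ipaddress
from typing import NamedTuple, Optional


class HbaRule(NamedTuple):
    conn_type: str  # host, hostssl, or hostnossl
    database: str
    user: str
    network: str    # CIDR block, e.g. 10.0.5.0/24
    method: str


def parse_rules(text: str) -> list[HbaRule]:
    """Parse simplified five-field records, skipping blanks and comments."""
    rules = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 5:
            raise ValueError(f"expected 5 fields, got {len(fields)}: {raw!r}")
        rules.append(HbaRule(*fields))
    return rules


def first_match(rules, *, ssl: bool, database: str, user: str, client_ip: str,
                physical_replication: bool = False) -> Optional[HbaRule]:
    """Return the first rule that matches, or None (the server then denies access)."""
    addr = ipaddress.ip_address(client_ip)
    for rule in rules:
        if rule.conn_type == "hostssl" and not ssl:
            continue
        if rule.conn_type == "hostnossl" and ssl:
            continue
        if rule.database == "replication":
            # Keyword matches physical replication connections only.
            if not physical_replication:
                continue
        elif physical_replication or rule.database not in ("all", database):
            # "all" and database names never match physical replication.
            continue
        if rule.user not in ("all", user):
            continue
        net = ipaddress.ip_network(rule.network)
        if addr.version != net.version or addr not in net:
            continue
        return rule
    return None


EXAMPLE_RULES = """
hostssl target_db cdc_pipeline_user 10.0.5.0/24 scram-sha-256
hostssl all       all               10.0.0.0/8  reject
"""

if __name__ == "__main__":
    rules = parse_rules(EXAMPLE_RULES)
    conn = dict(ssl=True, database="target_db",
                user="cdc_pipeline_user", client_ip="10.0.5.17")
    print(first_match(rules, **conn).method)                  # scram-sha-256
    print(first_match(list(reversed(rules)), **conn).method)  # reject
```

Reversing the two records is enough to lock the consumer out, because the broader reject rule then matches first. This is why specific replication entries belong above general ones.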

WAL Stream Mechanics & Connection Routing

Stateful CDC agents typically open two kinds of connections. An ordinary SQL connection to the target database is used to validate table mappings and inspect catalog state. A replication-mode connection (replication=database in the connection string) to the same database is used to create or attach to the logical replication slot and stream changes via START_REPLICATION.

Both connections are matched against the pg_hba.conf record for the target database, so a single correctly ordered entry serves both. Two failure modes are common. If the only record for the role uses the replication keyword, every logical connection is rejected, because that keyword matches physical replication connections only. If the role lacks the REPLICATION attribute, ordinary SQL connections succeed but the replication-mode connection is refused, so the agent appears to authenticate during initialization and then fails as soon as it requests the WAL stream.

Publication/Subscription Models & Slot Management

Each active consumer requires a dedicated logical replication slot (pg_create_logical_replication_slot). Slots prevent WAL from being removed until the subscriber confirms it has consumed the changes, so an abandoned slot retains WAL indefinitely. The pg_hba.conf rules must admit every consumer's source address, and each consumer must attach to its own slot.

When deploying Python ETL workers (e.g., using psycopg2 or pg8000), do not route replication streams through a connection pooler. A slot can be held by only one connection at a time, and a second attempt to attach fails with an ERROR: replication slot "slot_name" is active message that names the PID of the backend holding it. Refer to the official PostgreSQL documentation on logical replication for slot lifecycle management and pg_replication_slots monitoring patterns.
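As an illustration of a dedicated, unpooled replication connection, here is a minimal sketch using psycopg2's replication support. The helper names connection_params and stream_changes are hypothetical, the test_decoding output plugin is an assumption chosen because it ships with PostgreSQL, and the slot-creation fallback mirrors the pattern in the psycopg2 documentation. psycopg2 is imported lazily so the module loads without the driver installed.

```python
def connection_params(host: str, dbname: str, user: str, password: str) -> dict:
    """Build keyword arguments for one dedicated (unpooled) connection."""
    return {
        "host": host,
        "dbname": dbname,      # must match the database column in pg_hba.conf
        "user": user,
        "password": password,
        "sslmode": "require",  # needed to match a hostssl record
    }


def stream_changes(params: dict, slot_name: str, plugin: str = "test_decoding") -> None:
    """Attach to a logical slot and print decoded changes until interrupted."""
    import psycopg2
    import psycopg2.extras

    # LogicalReplicationConnection adds replication=database to the connection.
    conn = psycopg2.connect(
        connection_factory=psycopg2.extras.LogicalReplicationConnection,
        **params,
    )
    cur = conn.cursor()
    try:
        cur.start_replication(slot_name=slot_name, decode=True)
    except psycopg2.ProgrammingError:
        # Slot does not exist yet: create it, then start streaming.
        cur.create_replication_slot(slot_name, output_plugin=plugin)
        cur.start_replication(slot_name=slot_name, decode=True)

    def on_message(msg):
        print(msg.payload)
        # Acknowledge the change so the server can release retained WAL.
        msg.cursor.send_feedback(flush_lsn=msg.data_start)

    cur.consume_stream(on_message)


if __name__ == "__main__":
    params = connection_params("publisher", "target_db",
                               "cdc_pipeline_user", "strong_scram_password")
    stream_changes(params, "cdc_slot")  # "cdc_slot" is a placeholder slot name
```

One process holding one connection per slot keeps slot ownership unambiguous. Running a second copy of this worker against the same slot reproduces the "is active" error described above.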

Security Boundaries & Network Isolation

Network segmentation and cryptographic enforcement must operate in tandem. The Security Boundaries & Permissions framework dictates that replication endpoints should reside in isolated subnets, accessible only via dedicated load balancers or direct VPC peering. Implement pg_hba.conf rules that deny all traffic by default at the end of the file:

code
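# Explicit deny-all fallback (must be the final rules); host covers SSL and non-SSL
host all all 0.0.0.0/0 reject
host all all ::/0 reject

PostgreSQL already denies any connection that matches no record, so these lines do not change the outcome for unmatched traffic. Their value is that the default-deny intent is visible in the file, the rejection is logged as an explicit pg_hba.conf decision, and any permissive rule later appended below them for the same addresses never takes effect.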

Fallback Routing Strategies & Validation Workflows

Production deployments require deterministic validation before promoting configuration changes. Use the following operational workflow to verify routing and authentication:

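  1. Syntax Validation: After editing the file, run SELECT line_number, error FROM pg_hba_file_rules WHERE error IS NOT NULL; and confirm it returns no rows, then apply the change with pg_ctl reload -D /path/to/data or SELECT pg_reload_conf();. A reload does not fail on malformed entries; the server logs the error and keeps the previous rules in effect.
  2. Connection Verification: psql "host=publisher dbname=target_db user=cdc_pipeline_user sslmode=require replication=database" -c "IDENTIFY_SYSTEM" (the replication=database keyword opens a replication connection, which is required for IDENTIFY_SYSTEM)
  3. Slot Monitoring: Query SELECT slot_name, active, restart_lsn, confirmed_flush_lsn FROM pg_replication_slots; to confirm active = true and that confirmed_flush_lsn advances as the consumer acknowledges changes.
  4. Driver-Level Validation: For Python ETL pipelines, retry transient connection failures with exponential backoff, but treat FATAL authentication rejections as configuration errors. Catch psycopg2.OperationalError or pg8000.exceptions.DatabaseError, log the full server message (it names the host, user, and database that failed to match), and stop retrying.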
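The driver-level step can be sketched as a small, driver-agnostic retry wrapper. Everything here is illustrative: connect_with_retry, is_auth_rejection, and backoff_delays are hypothetical helpers, and the marker strings are message fragments the server emits for pg_hba.conf and password failures. In a real worker, pass the driver's exception class (for example psycopg2.OperationalError) as retry_on instead of the broad default.

```python
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")

# Fragments of server messages that indicate a configuration problem, not a blip.
AUTH_MARKERS = (
    "no pg_hba.conf entry",
    "pg_hba.conf rejects connection",
    "password authentication failed",
)


def is_auth_rejection(message: str) -> bool:
    """True when the error text points at pg_hba.conf or credentials."""
    return any(marker in message for marker in AUTH_MARKERS)


def backoff_delays(attempts: int = 5, base: float = 1.0, cap: float = 30.0) -> Iterator[float]:
    """Yield exponentially growing delays: base, 2*base, 4*base, ... up to cap."""
    for attempt in range(attempts):
        yield min(cap, base * (2 ** attempt))


def connect_with_retry(connect: Callable[[], T],
                       sleep: Callable[[float], None] = time.sleep,
                       retry_on: tuple = (Exception,)) -> T:
    """Retry transient failures with backoff; re-raise auth rejections at once."""
    last_error = None
    for delay in backoff_delays():
        try:
            return connect()
        except retry_on as exc:
            if is_auth_rejection(str(exc)):
                raise  # retrying cannot fix a pg_hba.conf or password problem
            last_error = exc
            sleep(delay)
    raise RuntimeError("gave up after repeated connection failures") from last_error
```

Failing fast on authentication errors keeps a misconfigured worker from hammering the publisher and makes the rejected host, user, and database visible in the first log line.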

If a consumer fails to connect, verify the pg_hba.conf order first. Place the most specific replication rules above general hostssl all entries. Enable log_connections = on and log_disconnections = on to trace authentication decisions in the server log. For high-availability setups, keep pg_hba.conf identical on standby nodes; the file is not propagated through WAL, so each change must be applied to every node to support failover without consumer reconfiguration.