Building a Python logical decoding plugin

A logical decoding output plugin is a C shared library PostgreSQL loads into the backend to translate WAL changes into your own wire format, and this page shows how to embed a CPython interpreter inside that plugin so change transformation runs in Python without leaving the walsender process — the in-process counterpart to the client-side decoder covered in Python CDC parser development. Get the memory and reference-counting boundaries wrong and you do not lose an event; you corrupt the backend that owns the replication slot and take the primary down with it.

PostgreSQL’s logical decoding subsystem is architecturally constrained to C because output plugins run in the backend with direct access to reorder-buffer memory, strict transaction isolation, and zero-copy tuple pointers. Embedding CPython buys you native Python transformation — schema-evolution handling, per-table routing, contract enforcement — with none of the IPC or serialization cost of shipping raw WAL to an external process first. The price is that your Python code now executes under PostgreSQL’s memory-context and error-handling rules, not CPython’s, and every PyObject* you create crosses a boundary the backend does not manage for you.

Callback and Memory Semantics

An output plugin exports _PG_output_plugin_init(OutputPluginCallbacks *cb) and declares PG_MODULE_MAGIC. PostgreSQL invokes the callbacks you populate for every transaction the slot decodes; the plugin must target PostgreSQL 14+ and Python 3.10+ (Python 3.9 reached end of life in October 2025) for a stable C-API and the current OutputPluginCallbacks shape. The table below fixes what each callback guarantees and where its allocations must live — the semantics that decide whether an embedded interpreter is safe or a slow-motion memory corruption.

| Callback | Invoked | Memory / durability guarantee | Embedded-Python behavior |
| --- | --- | --- | --- |
| `startup_cb` | Once, when the slot begins streaming | Allocate long-lived state in `TopMemoryContext` via `MemoryContextAlloc`; store on `ctx->output_plugin_private` | Initialize the interpreter here (`Py_Initialize()` or `Py_InitializeFromConfig()`); import the transform module once; cache the module `PyObject*` for the session |
| `begin_cb` | Start of each decoded transaction | Per-transaction context; nothing durable yet | Open a per-transaction Python buffer/list; record `commit_lsn` placeholder |
| `change_cb` | Per row change (`ReorderBufferChange`) | Tuple memory is PostgreSQL-owned and valid only for this call | Build a dict from `Datum` values, push to the queue, then `Py_DECREF` before returning |
| `truncate_cb` | On `TRUNCATE` of a published table (PG 11+) | Same per-change lifetime | Emit a truncate marker; do not treat as a row image |
| `commit_cb` | Transaction commit, in commit order | The change is durable on the primary; safe to release WAL only after your sink confirms | Serialize the buffered batch, hand it to the sink, clear the buffer |
| `shutdown_cb` | Slot detaches / session ends | Free everything allocated in `startup_cb` | `Py_FinalizeEx()`; clear the interpreter to prevent bloat across reconnects |

Two rules are non-negotiable. First, memory ownership never crosses the boundary without a context: state that outlives a single callback must be allocated in TopMemoryContext. Memory palloc'd in a short-lived context disappears when that context is reset, while memory from malloc or CPython's default allocator is invisible to PostgreSQL's memory contexts, so the server neither frees it on error nor reports it. Second, every PyObject* created inside change_cb must have its reference released before the callback returns: call Py_DECREF immediately after pushing the payload downstream. A missed decrement is not a leak you notice next week; it is unbounded heap growth inside the backend process that decodes your WAL stream. Resolve column types with TupleDescAttr and the DatumGet* macros, assemble the row with Py_BuildValue("{s:O,s:O}", "schema", py_schema, "data", py_row), and keep a Python-side schema cache keyed by relid and relfilenode so a RELATION shape change triggers a registry reload rather than a decode error.
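
The schema cache described above could look roughly like the following. This is a minimal sketch under stated assumptions: the `SchemaCache` class, its `load` callback (standing in for a registry lookup), and the tuple-of-column-names representation are all hypothetical.

```python
from typing import Callable, Dict, Tuple

Columns = Tuple[str, ...]


class SchemaCache:
    """Caches column layouts per relid and reloads when relfilenode changes."""

    def __init__(self, load: Callable[[int], Columns]) -> None:
        self._load = load  # hypothetical registry lookup, called with the relid
        self._by_relid: Dict[int, Tuple[int, Columns]] = {}
        self.reloads = 0

    def columns(self, relid: int, relfilenode: int) -> Columns:
        cached = self._by_relid.get(relid)
        if cached is None or cached[0] != relfilenode:
            # Unknown table or changed relfilenode: refresh instead of failing the decode.
            self._by_relid[relid] = (relfilenode, self._load(relid))
            self.reloads += 1
        return self._by_relid[relid][1]
```

One caveat on the design: relfilenode changes only when the table is rewritten, and some ALTER TABLE forms change the column list without a rewrite, so a production cache would probably also compare the column list it is handed against the cached one.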

Diagnostic Patterns

Because the interpreter runs inside the backend, its health shows up in server-side views, not application logs. Watch the slot and the backend’s memory contexts; a plugin that leaks or stalls is visible before it is fatal.

Slot liveness and retained WAL. An embedded plugin that blocks in change_cb pins restart_lsn exactly like a stalled client would:

```sql
SELECT
  slot_name,
  active,                                              -- expect true while streaming
  pg_size_pretty(
    pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
  ) AS retained_wal,                                   -- ALERT: > 25% of max_slot_wal_keep_size
  pg_size_pretty(
    pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)
  ) AS unconfirmed
FROM pg_replication_slots
WHERE slot_name = 'python_cdc_slot';
-- active = true while retained_wal climbs suggests the plugin is stalled inside a callback;
-- active = false means no consumer is attached (for example, after the backend exited).
```
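
The retained-WAL figure is plain arithmetic on LSNs, which makes the 25% alert easy to reproduce in a monitoring script. The sketch below assumes the textual `X/Y` LSN format PostgreSQL prints (two hexadecimal halves of a 64-bit byte position); the function names and the default fraction are illustrative choices.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' to a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def retained_wal_bytes(current_lsn: str, restart_lsn: str) -> int:
    """Bytes of WAL the slot is holding back, like pg_wal_lsn_diff(current, restart)."""
    return lsn_to_int(current_lsn) - lsn_to_int(restart_lsn)


def should_alert(current_lsn: str, restart_lsn: str,
                 keep_size_bytes: int, fraction: float = 0.25) -> bool:
    """True when retained WAL exceeds the given fraction of max_slot_wal_keep_size."""
    return retained_wal_bytes(current_lsn, restart_lsn) > fraction * keep_size_bytes
```

With `max_slot_wal_keep_size = '10GB'`, `keep_size_bytes` would be `10 * 1024**3`, so the alert fires once the slot retains more than 2.5 GiB.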

Backend memory growth — the refcount-leak signature. The walsender backend running the plugin should hold steady memory across transactions. A monotonically rising total for a single PID is the fingerprint of a missed Py_DECREF:

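```sql
-- PG 14+: pg_backend_memory_contexts reports only the session that runs the query,
-- so use it in a session that decodes through the SQL interface (for example, peek).
SELECT name, pg_size_pretty(total_bytes) AS total, pg_size_pretty(used_bytes) AS used
FROM pg_backend_memory_contexts
ORDER BY total_bytes DESC
LIMIT 10;
-- For the streaming walsender, dump its contexts to the server log instead:
SELECT pg_log_backend_memory_contexts(pid)
FROM pg_stat_activity
WHERE backend_type = 'walsender';
-- ALERT when TopMemoryContext (or a plugin-owned child) grows across many commits
-- while transaction volume is flat: that is a leaked palloc allocation. Leaked PyObjects
-- live in CPython's own heap and show up as process RSS growth, not in these contexts.
```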
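
To turn "monotonically rising" into something a monitor can evaluate, one option is to sample the backend's total memory at intervals and test the series. The function below is a sketch; the name `looks_like_leak`, the minimum sample count, and the 10% growth threshold are assumptions to tune against real traffic.

```python
from typing import Sequence


def looks_like_leak(samples: Sequence[int], min_samples: int = 5,
                    min_growth: float = 0.10) -> bool:
    """True when memory never decreases across samples and grows past min_growth overall."""
    if len(samples) < min_samples:
        return False  # too little history to call it a trend
    never_shrinks = all(later >= earlier for earlier, later in zip(samples, samples[1:]))
    grew_enough = samples[-1] > samples[0] * (1 + min_growth)
    return never_shrinks and grew_enough
```

Requiring both conditions keeps a flat series and a sawtooth (allocate, then release at commit) from raising the alarm.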

Isolate a decode bug without acknowledging. peek reads what the slot would emit next without advancing it, so you can tell a plugin defect from a slot problem:

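```sql
SELECT lsn, xid, left(data, 120) AS preview
FROM pg_logical_slot_peek_changes('python_cdc_slot', NULL, 10);
-- peek runs the slot's output plugin in this session. An error here reproduces a defect
-- in your C/Python decode path; clean output while the streaming consumer still fails
-- points at session-lifetime state or the sink instead.
```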

Track the interpreter’s own view with gc.get_stats() and sys.getrefcount() on a debug endpoint, and export both server metrics to the dashboards described under async monitoring integration so backend memory and slot lag alert together.
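
A snapshot function for that debug endpoint might look like this. It is a sketch only: `interpreter_snapshot` and the `tracked` mapping of names to long-lived objects are hypothetical, and how the result is exposed (log line, metric, endpoint) is left open.

```python
import gc
import sys
from typing import Dict


def interpreter_snapshot(tracked: Dict[str, object]) -> dict:
    """Collect per-generation GC counters and refcounts of named long-lived objects."""
    generations = [
        {
            "generation": index,
            "collections": stats["collections"],
            "collected": stats["collected"],
            "uncollectable": stats["uncollectable"],
        }
        for index, stats in enumerate(gc.get_stats())
    ]
    # sys.getrefcount counts its own temporary argument reference too, so compare
    # values between snapshots rather than reading them as absolute numbers.
    refcounts = {name: sys.getrefcount(obj) for name, obj in tracked.items()}
    return {"gc": generations, "refcounts": refcounts}
```

A refcount on the cached transform module or schema cache that rises between snapshots, with flat transaction volume, is a reasonable first hint of where a reference is being retained.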

Safe Deployment Sequence

A plugin is loaded into every backend that decodes the slot, so a bad build is a database-wide event. Roll it out deterministically with an explicit revert at each step.

Enable logical decoding (one-time, requires restart). Provision slots and senders before the first consumer attaches, and cap WAL retention so a stalled plugin cannot fill the disk:

```sql
ALTER SYSTEM SET wal_level = 'logical';           -- restart required
ALTER SYSTEM SET max_replication_slots = '10';    -- restart required
ALTER SYSTEM SET max_wal_senders = '10';          -- restart required
ALTER SYSTEM SET max_slot_wal_keep_size = '10GB'; -- PG 13+: bound retention for a lagging slot
```

Revert: reset wal_level to replica and restart only after every logical slot is dropped.

Compile against the exact server and Python ABI. Mismatched pg_config or python3-config output is the most common load failure:

```bash
gcc -shared -fPIC \
  -I"$(pg_config --includedir-server)" \
  $(python3-config --includes) \
  plugin.c -o plugin.so \
  $(python3-config --ldflags --embed)
```

Revert: keep the previous plugin.so; deployment is a file swap, so rollback is restoring the prior artifact.

Stage on a replica or throwaway slot first. Install to the library directory and validate against a disposable slot before any production consumer sees it:

```bash
cp plugin.so "$(pg_config --pkglibdir)/"
pg_recvlogical -d app --slot=plugin_smoke --create-slot -P plugin
pg_recvlogical -d app --slot=plugin_smoke --start -f - | head
```

Revert: pg_recvlogical -d app --slot=plugin_smoke --drop-slot leaves production untouched.

Cut the production slot over. Create the real slot only once the smoke slot decodes cleanly, matching the provisioning discipline in initializing replication slots:

```sql
SELECT * FROM pg_create_logical_replication_slot('python_cdc_slot', 'plugin');
```

Revert: stop the consumer, SELECT pg_drop_replication_slot('python_cdc_slot'), and redeploy the prior .so. Never restart the backend to force-unload a plugin mid-transaction.

Restrict the connecting role to REPLICATION plus SELECT on the published tables — the same least-privilege boundary detailed under security boundaries and permissions — and confirm the publication enumerates only the tables the plugin is built to transform.

Pipeline Integration

Inside commit_cb, the embedded interpreter hands each decoded batch to Python ETL code, and the same failover discipline a standalone consumer needs applies here — except a stall now blocks the backend directly, so backpressure and a fallback path are mandatory, not optional.

```python
import queue

_HIGH_WATER = 10_000              # explicit high-water mark; SimpleQueue itself is unbounded
_BATCH_SIZE = 500
_q = queue.SimpleQueue()


class _ListSink:
    """Placeholder sink so the example runs; swap in the real client."""

    def __init__(self) -> None:
        self.rows = []

    def upsert(self, batch: list) -> None:
        # A real sink must be idempotent: ON CONFLICT (pk) DO UPDATE, keyed by __lsn.
        self.rows.extend(batch)


sink = _ListSink()


def on_change(row: dict) -> None:
    """Called from change_cb via the FFI boundary, once per row."""
    # PostgreSQL pauses WAL decoding while this callback runs, so blocking here
    # applies natural backpressure instead of dropping changes (there is no
    # 'skip output' return path in the decoding API).
    while _q.qsize() >= _HIGH_WATER:
        _drain_once()             # push a batch to the sink before accepting more
    _q.put(row)


def _drain_once() -> None:
    """Move up to _BATCH_SIZE queued rows to the sink in one call."""
    batch = []
    while len(batch) < _BATCH_SIZE:
        try:
            batch.append(_q.get_nowait())
        except queue.Empty:
            break
    if batch:
        sink.upsert(batch)
```

Route decoded events by relid to Kafka topics with a static map or a metadata service, produce with confluent-kafka using enable.idempotence=true and acks=all, and align partition keys to the source primary key so commit order survives the network — the partitioning and dead-letter mechanics are covered under event routing and Kafka integration. Make the sink write idempotent (ON CONFLICT ... DO UPDATE gated on the carried __lsn) so a replay after a crash is a no-op rather than a duplicate. For a registry-validated typed contract, hand the batch to the JSON to Avro transformation stage before it leaves the process; to interoperate with existing consumers, shape the envelope to match the Debezium connector output.
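
The routing and idempotency rules above can be exercised without Kafka or a database. The sketch below uses an in-memory stand-in for the LSN-gated upsert; the `TOPIC_BY_RELID` map, the topic names, the `cdc.dead_letter` fallback, and the event fields `relid`, `pk`, and `__lsn` (as an integer byte position) are all assumptions for illustration.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical static routing map: relation OID -> topic name.
TOPIC_BY_RELID = {16384: "cdc.public.users", 16390: "cdc.public.orders"}


def route(event: dict) -> Tuple[str, bytes]:
    """Pick the topic from relid and use the source primary key as the partition key."""
    topic = TOPIC_BY_RELID.get(event["relid"], "cdc.dead_letter")
    return topic, str(event["pk"]).encode("utf-8")


class LsnGatedSink:
    """In-memory stand-in for an upsert that only applies a strictly newer __lsn."""

    def __init__(self) -> None:
        self.rows: Dict[Tuple[int, int], dict] = {}
        self.applied = 0

    def upsert(self, batch: Iterable[dict]) -> None:
        for event in batch:
            key = (event["relid"], event["pk"])
            current = self.rows.get(key)
            if current is not None and current["__lsn"] >= event["__lsn"]:
                continue  # replayed or stale change: skip it
            self.rows[key] = event
            self.applied += 1
```

Replaying the same batch after a crash leaves `applied` unchanged, which is the property the `ON CONFLICT ... DO UPDATE` gate is meant to provide in the real sink.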

Wrap interpreter startup in a circuit breaker. Py_Initialize() returns void and terminates the process on failure, so initialize with Py_InitializeFromConfig(), which returns a PyStatus you can test with PyStatus_Exception(). If initialization fails or the transform module raises during import, log via ereport(ERROR, ...), detach the slot cleanly, and fall back to a pre-compiled C-only passthrough decoder rather than crashing the backend. The byte-level tuple decoding those callbacks depend on is documented in parsing pgoutput format with psycopg2, which applies identically whether the tuples are read in-process or over a replication connection.
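
The import half of that circuit breaker can also be guarded on the Python side. The sketch below covers only a failing transform module, not interpreter initialization, and it falls back to a Python passthrough rather than the C decoder; `load_transform`, `passthrough`, and the `transform` attribute name are hypothetical.

```python
import importlib
import logging
from typing import Callable

log = logging.getLogger("cdc_plugin")


def passthrough(row: dict) -> dict:
    """Fallback transform: emit the row unchanged."""
    return row


def load_transform(module_name: str, attr: str = "transform") -> Callable:
    """Return module.attr if it imports cleanly and is callable, else the passthrough."""
    try:
        module = importlib.import_module(module_name)
        candidate = getattr(module, attr)
        if not callable(candidate):
            raise TypeError(f"{module_name}.{attr} is not callable")
        return candidate
    except Exception:
        # Import errors, missing attributes, and exceptions raised by module-level
        # code all trip the breaker instead of propagating into the C caller.
        log.exception("transform %s.%s unavailable; using passthrough", module_name, attr)
        return passthrough
```

Catching `Exception` broadly is deliberate here: any exception that escapes into the C caller has to be handled there before control returns to PostgreSQL, so it is simpler to contain it in Python.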

Authoritative references

PostgreSQL logical decoding output plugins — the OutputPluginCallbacks contract and callback invocation order.
PostgreSQL logical decoding concepts — slots, reorder buffering, and WAL retention semantics behind the plugin.
Python C-API initialization — Py_Initialize/Py_FinalizeEx, thread state, and sub-interpreter isolation for embedding.
PostgreSQL server programming: memory contexts — why TopMemoryContext allocation, not malloc, is mandatory for long-lived plugin state.

Related

Parsing pgoutput Format with psycopg2 — byte-level decoding of BEGIN/RELATION/INSERT/UPDATE/DELETE/COMMIT tuples the plugin’s callbacks produce.
JSON to Avro Transformation — turning the decoded change envelope into a registry-validated, typed contract.
Event Routing & Kafka Integration — partitioning for ordering, dead-letter queues, and delivery guarantees for the plugin’s output.
Replication Slot Types — the slot mechanics that determine WAL retention behind an output plugin.
