pg_durable: Durable Workflows Inside PostgreSQL Without External Infrastructure

Every backend team eventually builds the same thing: a rickety scaffolding of cron jobs, background workers, message queues, and status tables to keep long-running processes reliable. You’ve probably written it yourself — a pg_cron schedule that kicks off a procedure, a polling loop that checks a status column, retry counters, and a separate worker process that ties it all together. When the database restarts mid-job, you reconstruct state from whatever survived. When one step fails, you figure out which parts completed and which need replay.

Microsoft’s new open-source extension, pg_durable, takes a different approach. Instead of orchestrating workflows from outside PostgreSQL, it brings durable execution inside the database. You define multi-step workflows in SQL, the runtime checkpoints each step to disk, and if anything crashes, execution resumes from the last successful checkpoint — no external infrastructure required.

The Core Idea: Durable Execution in SQL

Durable execution is a pattern popularized by orchestration tools like Temporal and Azure Durable Functions. The key insight is simple: treat each step of a workflow as a recoverable unit. Before executing step N, save enough state that if the process dies, you can pick up at step N instead of step 1.

pg_durable implements this as a PostgreSQL extension built with pgrx (Rust). It registers a background worker process inside PostgreSQL that executes a graph of SQL steps, persisting state to dedicated df.* and duroxide.* schemas. The DSL uses composable operators to chain, parallelize, branch, and loop SQL steps — all with automatic checkpointing between them.

A Quick First Example

After installing the extension and adding pg_durable to shared_preload_libraries, the simplest possible workflow is a single SQL statement:

CREATE EXTENSION pg_durable;

-- Returns an instance ID like 'a1b2c3d4'
SELECT df.start('SELECT ''Hello, durable world!'' AS message');

That one-liner registers the query as a durable function. A background worker picks it up, executes it, and persists the result. If PostgreSQL crashes between submission and completion, the worker resumes execution on restart — no data loss, no manual replay.

Chaining Steps with Operators

Real workflows have multiple steps. The ~> operator chains them sequentially — each step runs only after the previous one completes:

-- ETL pipeline: cleanup old data, mark staging rows, load to target
SELECT df.start(
    'DELETE FROM target WHERE loaded_at < now() - interval ''7 days'''
    ~> 'UPDATE staging SET processed_at = now() WHERE processed_at IS NULL'
    ~> 'INSERT INTO target (data, source_id)
        SELECT data, source_id FROM staging WHERE processed_at IS NOT NULL',
    'etl-pipeline'
);

If the INSERT fails, only step 3 retries — steps 1 and 2 already completed and their results are checkpointed. No need to re-run the cleanup or the staging update.

Passing Data Between Steps

The => operator binds a step’s result to a named variable, and $variable_name substitutes it in subsequent steps:

SELECT df.start(
    'SELECT id FROM orders WHERE status = ''pending'' LIMIT 1'
        |=> 'order_id'

    ~> 'UPDATE orders SET status = ''processing'' WHERE id = $order_id'

    ~> df.sleep(2)

    ~> 'UPDATE orders SET status = ''completed'', processed_at = now()
        WHERE id = $order_id',
    'process-order'
);

Variables are scoped per user via row-level security, so different sessions can run the same workflow template with different data without colliding. You can also set variables before starting a function with df.setvar('name', 'value').

Parallel Execution and Racing

The & operator fans out parallel branches and waits for all of them to complete. This is useful for dashboard data gathering, multi-source aggregation, or any workload where independent queries can run simultaneously:

SELECT df.start(
    (
        'SELECT COUNT(*) AS user_count FROM users'
        & 'SELECT COUNT(*) AS order_count FROM orders'
        & 'SELECT SUM(amount) AS total_revenue FROM orders'
    )
    ~> 'INSERT INTO dashboard_cache (metric, value, computed_at)
        VALUES (''refresh_complete'', now(), now())',
    'dashboard-parallel'
);

The | operator runs branches in parallel too, but returns as soon as any branch completes — a race. This is useful for timeouts and cancellation patterns:

-- Race: whichever signal arrives first wins
df.wait_for_signal('approval', 3600)
| df.wait_for_signal('cancel')

Conditionals and Loops

The ?> and !> operators create if-then-else branches based on a query’s truthiness:

SELECT df.start(
    'SELECT balance > 100 FROM accounts WHERE id = 1'
        ?> 'UPDATE accounts SET tier = ''gold'' WHERE id = 1'
        !> 'UPDATE accounts SET tier = ''standard'' WHERE id = 1',
    'tier-check'
);

For repeating work, the @> prefix creates an infinite loop, and df.loop() supports conditional loops with df.break() for exiting:

-- Scheduled data sync: fetch from an API every 30 minutes, forever
SELECT df.start(
    @> (
        (df.http('https://api.example.com/data', 'GET') |=> 'response')
        ~> 'INSERT INTO external_data (payload) VALUES ($response::jsonb)'
        ~> df.wait_for_schedule('*/30 * * * *')
    ),
    'scheduled-sync'
);

Because the loop state is durably persisted, a PostgreSQL restart mid-loop doesn’t lose progress — the next iteration picks up where it left off.

Human-in-the-Loop Workflows

One of the most practical patterns is pausing for external input. df.wait_for_signal() suspends a workflow until another session sends a signal with df.signal(instance_id, signal_name). This enables approval workflows without any external orchestrator:

-- Start a workflow that waits for human approval
SELECT df.start(
    'INSERT INTO deployments (service, version)
        VALUES (''api'', ''v2.3.1'') RETURNING id' |=> 'deploy_id'
    ~> df.wait_for_signal('approval', 86400) |=> 'sig'  -- 24-hour timeout
    ~> (
        '$sig.timed_out'
            ?> 'UPDATE deployments SET status = ''expired'' WHERE id = $deploy_id'
            !> 'UPDATE deployments SET status = ''approved'' WHERE id = $deploy_id'
    ),
    'deploy-api'
);

-- Later, from another session:
SELECT df.signal('a1b2c3d4', 'approval', '{"approved": true}');

Operational Visibility

All running and completed instances are queryable through standard PostgreSQL tables. No separate dashboard, no separate monitoring system — just SQL:

-- Check status of a specific workflow
SELECT df.status('a1b2c3d4');

-- Get the result of a completed instance
SELECT df.result('a1b2c3d4');

-- List all recent instances with their status
SELECT instance_id, label, status, execution_count
FROM df.list_instances()
ORDER BY created_at DESC
LIMIT 10;

-- Inspect individual step execution (for debugging)
SELECT execution_id, node_id, node_type, status
FROM df.instance_nodes('a1b2c3d4');

When pg_durable Fits (and When It Doesn’t)

pg_durable is designed for workflows that live next to the data they touch. It excels at ETL pipelines, scheduled maintenance, fan-out aggregation, and any multi-step process where the state already lives in PostgreSQL. The zero-infrastructure model — no Redis, no Temporal server, no queue broker — is a significant operational simplification.

It’s less suitable for workflows that span heterogeneous systems outside PostgreSQL, require sub-millisecond synchronous handling, or need arbitrary application logic that doesn’t map cleanly to SQL steps. For those cases, a general-purpose orchestrator is still the right tool.

The extension currently supports PostgreSQL 17 and 18, is licensed under the PostgreSQL License, and is in preview status. The project provides Debian packages, Docker support, and a GitHub Codespace configuration for quick experimentation.

Getting Started

Install the extension, add pg_durable to shared_preload_libraries in your PostgreSQL configuration, restart, and create the extension:

-- In postgresql.conf:
shared_preload_libraries = 'pg_durable'

-- After restart:
CREATE EXTENSION pg_durable;

-- Grant access to application roles (not PUBLIC by default)
SELECT df.grant_usage('app_role');

From there, every workflow starts with df.start() and every step is ordinary SQL. The DSL operators handle the orchestration, checkpointing, and recovery — the parts you’d otherwise build yourself.