Stream PostgreSQL WALs with Zero Data Loss
pgrwl is a PostgreSQL write-ahead log (WAL) receiver written in Go. It’s a drop-in, container-friendly alternative
to pg_receivewal, supporting streaming replication, encryption, compression, and remote storage (S3, SFTP).
Designed for disaster recovery and PITR (Point-in-Time Recovery), pgrwl ensures zero data loss (RPO=0) and seamless
integration with Kubernetes environments.
- About
- Operating Modes
- Usage
- Configuration Reference
- Installation
- Disaster Recovery Use Cases
- Architecture
- Contributing
- Links
- License
- The project is a production-ready tool for streaming WAL archiving, designed to achieve an RPO of 0 during recovery.
- It is primarily designed for use in containerized environments.
- The utility replicates all key features of `pg_receivewal`, including automatic reconnection on connection loss, streaming into partial files, extensive error checking, and more.
- Install as a single binary. Debug with your favorite editor and a local PostgreSQL container (local-dev-infra).
*(screenshot: basic dashboard)*
pgrwl intentionally uses three separate operating modes: `receive`, `serve`, and `backup`.
This separation is important.
The WAL receiver should stay focused on one critical job: receiving WAL from PostgreSQL and safely writing it to disk/storage. Running base backups alongside the receiver may look convenient, but it can create unnecessary contention for network bandwidth, disk I/O, CPU, compression, encryption, and object-storage uploads. During heavy backup activity, the receiver must still keep up with WAL streaming, otherwise replication lag can grow or WAL retention pressure can increase on PostgreSQL.
The serve mode is different from receive mode.
It is used when the receive loop is stopped and recovery needs access to the WAL files that are still present in the
receiver directory. This is especially important in Kubernetes setups where the receiver uses a ReadWriteOnce volume.
Only that pod can access the local files, and the directory may contain completed
WAL files or .partial files that have not been uploaded yet.
In this situation, pgrwl can be switched to serve mode to safely expose the available WAL files for recovery.
The pod becomes a controlled WAL file server for recovery purposes.
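As a sketch, switching to `serve` mode needs only the `main` section of the configuration (port and directory below are examples; see the configuration reference for details):

```yaml
# Minimal 'serve' mode configuration sketch (example values).
main:
  listen_port: 7070              # HTTP port that exposes archived WAL files
  directory: "/mnt/wal-archive"  # directory holding completed and .partial WALs
```

The instance would then be started with `pgrwl daemon -c config.yaml -m serve`, reusing the same volume the receiver was writing to.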
The backup mode keeps base backup creation isolated from the WAL receive loop. It can be scheduled separately, run as
a Kubernetes StatefulSet, and interact with the receiver through the API when needed. This keeps the architecture
simpler, safer, and easier to reason about.
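For illustration, the scheduling side of `backup` mode is a small config fragment (values mirror the configuration reference and are examples only):

```yaml
# 'backup' mode scheduling sketch (example values).
backup:
  cron: "0 0 */3 * *"   # run a base backup every three days at midnight
  retention:
    enable: true
    type: time          # prune by age...
    value: "48h"        # ...removing backups older than 48 hours
    keep_last: 1        # but always keep the most recent backup
```

The built-in cron scheduler means no external orchestration (Kubernetes CronJobs, systemd timers) is required for this.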
pgrwl running in receive mode
pgrwl running in serve mode
pgrwl running in backup mode
See examples
Expand the docker-compose.yml section below, copy the file content into docker-compose.yml,
then run: docker compose up -d
docker-compose.yml
# docker-compose.yml
#
# Local end-to-end pgrwl playground.
#
# It starts:
# - PostgreSQL primary
# - WAL traffic generator
# - pgrwl receiver
# - pgrwl backup worker
# - pgrwl dashboard UI
# - SeaweedFS S3-compatible storage
# - SeaweedFS admin dashboard
#
# Useful URLs:
# pgrwl dashboard: http://localhost:8585/ui
# SeaweedFS admin: http://localhost:23646
# SeaweedFS filer: http://localhost:8888
# SeaweedFS bucket view: http://localhost:8888/buckets/backups/
# SeaweedFS S3 API: http://localhost:8333
# PostgreSQL: localhost:15432
services:
# ---------------------------------------------------------------------------
# PostgreSQL primary
# ---------------------------------------------------------------------------
pg-primary:
image: postgres:17.9-bookworm
container_name: pg-primary
restart: unless-stopped
environment:
TZ: "Asia/Aqtau"
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres
ports:
- "15432:5432"
volumes:
- pg-primary-data:/var/lib/postgresql/17/main
command: >
-c config_file=/etc/postgresql/postgresql.conf
-c hba_file=/etc/postgresql/pg_hba.conf
configs:
- source: pg_hba.conf
target: /etc/postgresql/pg_hba.conf
mode: "0755"
- source: postgresql.conf
target: /etc/postgresql/postgresql.conf
mode: "0755"
healthcheck:
test: [ "CMD", "pg_isready", "-U", "postgres" ]
interval: 2s
timeout: 2s
retries: 10
# ---------------------------------------------------------------------------
# WAL generator
#
# This service continuously writes data into PostgreSQL and forces WAL switches.
# It exists only to make local testing visible and active.
# ---------------------------------------------------------------------------
wal-generator:
image: postgres:17.9-bookworm
container_name: wal-generator
restart: unless-stopped
environment:
TZ: "Asia/Aqtau"
PGHOST: pg-primary
PGPORT: 5432
PGUSER: postgres
PGPASSWORD: postgres
PGDATABASE: postgres
INTERVAL_SECONDS: 5
configs:
- source: wal-generator.sh
target: /scripts/generate-wal.sh
mode: "0755"
command:
- /bin/sh
- /scripts/generate-wal.sh
depends_on:
pg-primary:
condition: service_healthy
# ---------------------------------------------------------------------------
# pgrwl receiver
#
# Streams PostgreSQL WAL files and uploads completed WAL segments to S3.
# ---------------------------------------------------------------------------
pgrwl-receive:
container_name: pgrwl-receive
image: quay.io/pgrwl/pgrwl:1.0.33
restart: unless-stopped
environment:
TZ: "Asia/Aqtau"
PGHOST: pg-primary
PGPORT: 5432
PGUSER: postgres
PGPASSWORD: postgres
ports:
- "7070:7070"
command: daemon -c /etc/pgrwl-receive-config.yaml -m receive
configs:
- source: pgrwl-receive-config.yaml
target: /etc/pgrwl-receive-config.yaml
mode: "0755"
volumes:
- pgrwl-wal-archive-data:/mnt
depends_on:
pg-primary:
condition: service_healthy
seaweedfs-provision:
condition: service_completed_successfully
# ---------------------------------------------------------------------------
# pgrwl backup worker
#
# Runs scheduled base backups and stores them in the same S3 bucket.
# ---------------------------------------------------------------------------
pgrwl-backup:
container_name: pgrwl-backup
image: quay.io/pgrwl/pgrwl:1.0.33
restart: unless-stopped
environment:
TZ: "Asia/Aqtau"
PGHOST: pg-primary
PGPORT: 5432
PGUSER: postgres
PGPASSWORD: postgres
ports:
- "7071:7070"
command: daemon -c /etc/pgrwl-backup-config.yaml -m backup
configs:
- source: pgrwl-backup-config.yaml
target: /etc/pgrwl-backup-config.yaml
mode: "0755"
depends_on:
pg-primary:
condition: service_healthy
seaweedfs-provision:
condition: service_completed_successfully
# ---------------------------------------------------------------------------
# pgrwl dashboard
#
# Reads receiver/backup status over the internal Docker Compose network.
# Open: http://localhost:8585/ui
# ---------------------------------------------------------------------------
pgrwl-ui:
container_name: pgrwl-ui
image: quay.io/pgrwl/ui:1.0.33
restart: unless-stopped
environment:
TZ: "Asia/Aqtau"
PGRWL_UI_CONFIG_PATH: /etc/pgrwl-ui-config.yaml
ports:
- "8585:8585"
configs:
- source: pgrwl-ui-config.yaml
target: /etc/pgrwl-ui-config.yaml
mode: "0755"
# ---------------------------------------------------------------------------
# SeaweedFS
#
# Runs SeaweedFS in all-in-one mode with S3 support enabled.
# This is convenient for local testing and behaves like a lightweight
# S3-compatible object storage service.
# ---------------------------------------------------------------------------
seaweedfs:
image: chrislusf/seaweedfs:4.21
container_name: seaweedfs
restart: unless-stopped
command:
- server
- -s3
- -dir=/data
- -ip=seaweedfs
- -ip.bind=0.0.0.0
- -master.port=9333
- -volume.port=8080
- -filer.port=8888
- -s3.port=8333
- -s3.config=/etc/seaweedfs/s3.json
ports:
- "9333:9333" # master UI/API
- "8080:8080" # volume UI/API
- "8888:8888" # filer UI/API
- "8333:8333" # S3 API
volumes:
- seaweedfs-data:/data
configs:
- source: seaweedfs-config.json
target: /etc/seaweedfs/s3.json
mode: "0444"
healthcheck:
test: [ "CMD", "wget", "-q", "-O", "-", "http://127.0.0.1:8888/" ]
interval: 3s
timeout: 2s
retries: 40
# ---------------------------------------------------------------------------
# SeaweedFS admin dashboard
#
# Open: http://localhost:23646
# ---------------------------------------------------------------------------
seaweedfs-admin:
image: chrislusf/seaweedfs:4.21
container_name: seaweedfs-admin
restart: unless-stopped
command:
- admin
- -port=23646
- -port.grpc=33646
- -master=seaweedfs:9333
- -dataDir=/data
ports:
- "23646:23646"
- "33646:33646"
volumes:
- seaweedfs-admin-data:/data
depends_on:
seaweedfs:
condition: service_healthy
# ---------------------------------------------------------------------------
# SeaweedFS bucket provisioning
#
# Creates the S3 bucket used by pgrwl.
# ---------------------------------------------------------------------------
seaweedfs-provision:
image: chrislusf/seaweedfs:4.21
container_name: seaweedfs-provision
restart: "no"
depends_on:
seaweedfs:
condition: service_healthy
environment:
BUCKETS: "backups"
entrypoint: [ "/bin/sh" ]
command:
- -ec
- |
echo "waiting for SeaweedFS shell..."
until echo "cluster.ps" | weed shell \
-master=seaweedfs:9333 \
-filer=seaweedfs:8888 >/dev/null 2>&1; do
echo "SeaweedFS shell is not ready yet..."
sleep 2
done
for bucket in ${BUCKETS}; do
echo "creating bucket: ${bucket}"
echo "s3.bucket.create -name ${bucket}" | weed shell \
-master=seaweedfs:9333 \
-filer=seaweedfs:8888 || true
done
echo "created buckets:"
echo "s3.bucket.list" | weed shell \
-master=seaweedfs:9333 \
-filer=seaweedfs:8888
volumes:
pgrwl-wal-archive-data:
pg-primary-data:
seaweedfs-data:
seaweedfs-admin-data:
configs:
pg_hba.conf:
content: |
local all all trust
local replication all trust
host all all all trust
host replication all all trust
postgresql.conf:
content: |
# log_error_verbosity:
# TERSE, DEFAULT, VERBOSE
# log_min_messages:
# DEBUG5, DEBUG4, DEBUG3, DEBUG2, DEBUG1,
# INFO, NOTICE, WARNING, ERROR, LOG, FATAL, PANIC
listen_addresses = '*'
logging_collector = on
log_directory = '/var/log/postgresql'
log_filename = 'pg.log'
log_lock_waits = on
log_temp_files = 0
log_checkpoints = on
log_connections = off
log_destination = 'stderr'
log_error_verbosity = 'DEFAULT'
log_hostname = off
log_min_messages = 'WARNING'
log_timezone = 'Asia/Aqtau'
log_line_prefix = '%t [%p-%l] %r %q%u@%d '
wal_level = replica
max_wal_senders = 10
max_replication_slots = 10
wal_keep_size = 64MB
log_replication_commands = on
datestyle = 'iso, mdy'
timezone = 'Asia/Aqtau'
shared_preload_libraries = 'pg_stat_statements'
seaweedfs-config.json:
content: |
{
"identities": [
{
"name": "pgrwl",
"credentials": [
{
"accessKey": "pgrwl",
"secretKey": "pgrwl-secret"
}
],
"actions": [
"Admin",
"Read",
"Write",
"List",
"Tagging"
]
}
]
}
pgrwl-backup-config.yaml:
content: |
backup:
cron: '* * * * *'
retention:
enable: true
type: time
value: 10m
log:
format: text
level: info
main:
directory: "/mnt/wal-archive"
listen_port: 7070
storage:
compression:
algo: gzip
name: s3
s3:
url: http://seaweedfs:8333
access_key_id: pgrwl
secret_access_key: pgrwl-secret
bucket: backups
region: us-east-1
use_path_style: true
disable_ssl: true
pgrwl-receive-config.yaml:
content: |
main:
listen_port: 7070
directory: "/mnt/wal-archive"
receiver:
slot: pgrwl_v5
no_loop: true
uploader:
sync_interval: 10s
max_concurrency: 4
retention:
enable: false
sync_interval: 10s
keep_period: "5m"
log:
level: trace
format: text
add_source: true
metrics:
enable: false
storage:
name: s3
compression:
algo: gzip
encryption:
algo: aes-256-gcm
pass: qwerty123
s3:
url: http://seaweedfs:8333
access_key_id: pgrwl
secret_access_key: pgrwl-secret
bucket: backups
region: us-east-1
use_path_style: true
disable_ssl: true
pgrwl-ui-config.yaml:
content: |
listen_addr: ":8585"
receivers:
- label: localhost
addr: http://pgrwl-receive:7070
wal-generator.sh:
content: |
#!/usr/bin/env sh
set -eu
echo "starting WAL generator"
wait_for_postgres() {
echo "waiting for PostgreSQL to become ready..."
until pg_isready; do
echo "PostgreSQL is not ready yet, sleeping..."
sleep 2
done
until psql \
-v ON_ERROR_STOP=1 \
-c "SELECT 1;"; do
echo "PostgreSQL accepts connections, but query failed, sleeping..."
sleep 2
done
echo "PostgreSQL is ready"
}
wait_for_postgres
while true; do
echo "generating WAL at $(date -Iseconds)"
psql \
-v ON_ERROR_STOP=1 \
-c "DROP TABLE IF EXISTS tmp_test_data_table_gen;" \
-c "CREATE TABLE IF NOT EXISTS tmp_test_data_table_gen (id serial, payload text);" \
-c "INSERT INTO tmp_test_data_table_gen(payload) SELECT md5(random()::text) FROM generate_series(1, 10000);" \
-c "SELECT pg_switch_wal();"
sleep "${INTERVAL_SECONDS}"
done

| Service | URL | Description |
|---|---|---|
| pgrwl dashboard | http://localhost:8585/ui | Receiver and backup overview |
| SeaweedFS admin | http://localhost:23646 | SeaweedFS cluster/storage dashboard |
| SeaweedFS filer | http://localhost:8888 | Browse files stored by SeaweedFS |
| SeaweedFS bucket view | http://localhost:8888/buckets/backups/ | Browse uploaded WALs and backups |
| SeaweedFS S3 API | http://localhost:8333 | S3-compatible API endpoint |
| PostgreSQL | psql -U postgres -h localhost -p 15432 | PostgreSQL primary instance |
restore_command example for postgresql.conf:
# where 'k8s-worker5:30266' represents the host and port
# of a 'pgrwl' instance running in 'serve' mode.
restore_command = 'pgrwl restore-command --serve-addr=k8s-worker5:30266 %f %p'

The configuration file is in JSON or YAML format (*.json is preferred).
It supports environment variable placeholders like ${PGRWL_SECRET_ACCESS_KEY}.
You may either use pgrwl daemon -c config.yml -m receive or provide the corresponding environment variables and run
pgrwl daemon.
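For example, a `receive` setup can be expressed purely through environment variables (values below are illustrative; the full variable list follows the configuration reference). The daemon itself would then be started with just `pgrwl daemon`, which is omitted from this sketch:

```shell
# Environment-variable equivalent of a minimal 'receive' configuration.
# (Example values; 'pgrwl daemon' itself is not invoked in this sketch.)
export PGRWL_DAEMON_MODE=receive
export PGRWL_MAIN_LISTEN_PORT=7070
export PGRWL_MAIN_DIRECTORY=/var/lib/pgwal
export PGRWL_RECEIVER_SLOT=replication_slot
export PGRWL_STORAGE_NAME=s3
export PGRWL_STORAGE_S3_BUCKET=postgres-backups

# show the resulting configuration
env | grep '^PGRWL_' | sort
```

This style is convenient in Kubernetes, where the same variables can be supplied via a ConfigMap or Secret instead of mounting a config file.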
main: # Required for both modes: receive/serve
listen_port: 7070 # HTTP server port (used for management)
directory: "/var/lib/pgwal" # Base directory for storing WAL files
receiver: # Required for 'receive' mode
slot: replication_slot # Replication slot to use
no_loop: false # If true, do not loop on connection loss
uploader: # Required for non-local storage type
sync_interval: 10s # Interval for the upload worker to check for new files
max_concurrency: 4 # Maximum number of files to upload concurrently
retention: # Optional
enable: true # Perform retention rules
sync_interval: 10s # Interval for the retention worker (shouldn't run frequently - 12h is typically sufficient)
keep_period: "1m" # Remove WAL files older than given period
backup: # Required for 'backup' mode
cron: "0 0 */3 * *" # Basebackup cron schedule
retention: # Optional
enable: true # Perform retention rules
type: time # One of: (time / count)
value: "48h" # Remove backups older than given period (if time), keep last N backups (if count)
keep_last: 1 # Always keep last N backups (suitable when 'retention.type = time')
walretention: # Optional (WAL archive cleanup settings)
enable: true # After basebackup is done, cleanup WAL-archive by oldest backup stop-LSN
    receiver_addr: "pgrwl-receive:7070" # Address of the WAL-receiver instance (required when walretention.enable is true)
log: # Optional
level: info # One of: (trace / debug / info / warn / error)
format: text # One of: (text / pretty / json)
add_source: true # Include file:line in log messages (for local development)
metrics: # Optional
enable: true # Optional (used in receive mode: http://host:port/metrics)
devconfig: # Optional (various dev options)
pprof: # pprof settings
enable: true # Enable pprof handlers
storage: # Optional
name: s3 # One of: (s3 / sftp)
compression: # Optional
algo: gzip # One of: (gzip / zstd)
encryption: # Optional
algo: aes-256-gcm # One of: (aes-256-gcm)
pass: "${PGRWL_ENCRYPT_PASSWD}" # Encryption password (from env)
sftp: # Required section for 'sftp' storage
host: sftp.example.com # SFTP server hostname
port: 22 # SFTP server port
user: backupuser # SFTP username
pass: "${PGRWL_VM_PASSWORD}" # SFTP password (from env)
pkey_path: "/home/user/.ssh/id_rsa" # Path to SSH private key (optional)
pkey_pass: "${PGRWL_SSH_PKEY_PASS}" # Required if the private key is password-protected
base_dir: "/mnt/wal-archive" # Base directory with sufficient user permissions
s3: # Required section for 's3' storage
url: https://s3.example.com # S3-compatible endpoint URL
access_key_id: AKIAEXAMPLE # AWS access key ID
secret_access_key: "${PGRWL_AWS_SK}" # AWS secret access key (from env)
bucket: postgres-backups # Target S3 bucket name
region: us-east-1 # S3 region
use_path_style: true # Use path-style URLs for S3
disable_ssl: false # Disable SSL
Corresponding environment variables:
PGRWL_DAEMON_MODE # receive/serve/backup
PGRWL_MAIN_LISTEN_PORT # HTTP server port (used for management)
PGRWL_MAIN_DIRECTORY # Base directory for storing WAL files
PGRWL_RECEIVER_SLOT # Replication slot to use
PGRWL_RECEIVER_NO_LOOP # If true, do not loop on connection loss
PGRWL_RECEIVER_UPLOADER_SYNC_INTERVAL # Interval for the upload worker to check for new files
PGRWL_RECEIVER_UPLOADER_MAX_CONCURRENCY # Maximum number of files to upload concurrently
PGRWL_RECEIVER_RETENTION_ENABLE # Perform retention rules
PGRWL_RECEIVER_RETENTION_SYNC_INTERVAL # Interval for the retention worker (shouldn't run frequently - 12h is typically sufficient)
PGRWL_RECEIVER_RETENTION_KEEP_PERIOD # Remove WAL files older than given period
PGRWL_BACKUP_CRON # Basebackup cron schedule
PGRWL_BACKUP_RETENTION_ENABLE # Perform retention rules
PGRWL_BACKUP_RETENTION_TYPE # One of: (time / count)
PGRWL_BACKUP_RETENTION_VALUE # Remove backups older than given period (if time), keep last N backups (if count)
PGRWL_BACKUP_RETENTION_KEEP_LAST # Always keep last N backups (suitable when 'retention.type = time')
PGRWL_BACKUP_WALRETENTION_ENABLE # After basebackup is done, cleanup WAL-archive by oldest backup stop-LSN
PGRWL_BACKUP_WALRETENTION_RECEIVER_ADDR # Address of the WAL-receiver instance (required when walretention.enable is true)
PGRWL_LOG_LEVEL # One of: (trace / debug / info / warn / error)
PGRWL_LOG_FORMAT # One of: (text / pretty / json)
PGRWL_LOG_ADD_SOURCE # Include file:line in log messages (for local development)
PGRWL_METRICS_ENABLE # Optional (used in receive mode: http://host:port/metrics)
PGRWL_DEVCONFIG_PPROF_ENABLE # Enable pprof handlers
PGRWL_STORAGE_NAME # One of: (s3 / sftp)
PGRWL_STORAGE_COMPRESSION_ALGO # One of: (gzip / zstd)
PGRWL_STORAGE_ENCRYPTION_ALGO # One of: (aes-256-gcm)
PGRWL_STORAGE_ENCRYPTION_PASS # Encryption password (from env)
PGRWL_STORAGE_SFTP_HOST # SFTP server hostname
PGRWL_STORAGE_SFTP_PORT # SFTP server port
PGRWL_STORAGE_SFTP_USER # SFTP username
PGRWL_STORAGE_SFTP_PASS # SFTP password (from env)
PGRWL_STORAGE_SFTP_PKEY_PATH # Path to SSH private key (optional)
PGRWL_STORAGE_SFTP_PKEY_PASS # Required if the private key is password-protected
PGRWL_STORAGE_SFTP_BASE_DIR # Base directory with sufficient user permissions
PGRWL_STORAGE_S3_URL # S3-compatible endpoint URL
PGRWL_STORAGE_S3_ACCESS_KEY_ID # AWS access key ID
PGRWL_STORAGE_S3_SECRET_ACCESS_KEY # AWS secret access key (from env)
PGRWL_STORAGE_S3_BUCKET # Target S3 bucket name
PGRWL_STORAGE_S3_REGION # S3 region
PGRWL_STORAGE_S3_USE_PATH_STYLE # Use path-style URLs for S3
PGRWL_STORAGE_S3_DISABLE_SSL # Disable SSL
Dashboard configuration example:
The PGRWL_UI_CONFIG_PATH env-var is used to locate the config file (default: ./pgrwl-ui.yaml).
listen_addr: ":8080"
receivers:
- label: localhost
addr: http://127.0.0.1:7070
- label: prod-db-01
addr: http://10.0.0.11:9090
docker pull quay.io/pgrwl/pgrwl:latest

See pgrwl helm-chart:
helm repo add pgrwl https://pgrwl.github.io/charts
helm repo update pgrwl
helm search repo pgrwl

To install the chart with the release name pgrwl:
helm upgrade pgrwl pgrwl/pgrwl \
--install --debug --atomic --wait --timeout=10m \
--namespace=pgrwl

- Download the latest binary for your platform from the Releases page.
- Place the binary in your system's PATH (e.g., /usr/local/bin).
requires: tar, curl, jq
(
set -euo pipefail
OS="$(uname | tr '[:upper:]' '[:lower:]')"
ARCH="$(uname -m | sed -e 's/x86_64/amd64/' -e 's/\(arm\)\(64\)\?.*/\1\2/' -e 's/aarch64$/arm64/')"
TAG="$(curl -s https://api.github.com/repos/pgrwl/pgrwl/releases/latest | jq -r .tag_name)"
curl -L "https://github.com/pgrwl/pgrwl/releases/download/${TAG}/pgrwl_${TAG}_${OS}_${ARCH}.tar.gz" |
tar -xzf - -C /usr/local/bin && \
chmod +x /usr/local/bin/pgrwl
)

sudo apt update -y && sudo apt install -y curl
curl -LO https://github.com/pgrwl/pgrwl/releases/latest/download/pgrwl_linux_amd64.deb
sudo dpkg -i pgrwl_linux_amd64.deb

apk update && apk add --no-cache bash curl
curl -LO https://github.com/pgrwl/pgrwl/releases/latest/download/pgrwl_linux_amd64.apk
apk add pgrwl_linux_amd64.apk --allow-untrusted

The full process may look like this (a typical, simplified example):
- A typical production setup runs two `pgrwl` StatefulSets in the cluster: one in `receive` mode for continuous WAL streaming, and another in `backup` mode for scheduled base backups.
- In `receive` mode, `pgrwl` continuously streams WAL files, applies optional compression and encryption, uploads them to remote storage (such as S3 or SFTP), and enforces retention policies, for example keeping WAL files for four days.
- In `backup` mode, it performs a full base backup of your PostgreSQL cluster on a configured schedule, for instance once every three days, using streaming basebackup with optional compression and encryption. The resulting backup is also uploaded to the configured remote storage and is subject to retention policies for cleanup. The built-in cron scheduler enables fully automated backups without external orchestration.
- During recovery, the same `receive` StatefulSet can be reconfigured to run in `serve` mode, exposing previously archived WALs via HTTP to support Point-in-Time Recovery (PITR) through `restore_command`.
- With this setup, you can restore your cluster, in the event of a crash, to any second within the past three days, using the most recent base backup and the available WAL segments.
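The recovery step can be sketched end to end as follows. The data-directory path and serve address are placeholders, and the base-backup restore and server start are deliberately left as comments:

```shell
# Hypothetical PITR restore flow against a pgrwl instance in 'serve' mode.
# Paths and the serve address are placeholders for illustration.
PGDATA="${PGDATA:-/tmp/pgdata-restore-demo}"
mkdir -p "$PGDATA"

# 1. (not shown) unpack the most recent base backup into "$PGDATA"

# 2. point restore_command at the serve-mode instance
cat >> "$PGDATA/postgresql.conf" <<'EOF'
restore_command = 'pgrwl restore-command --serve-addr=k8s-worker5:30266 %f %p'
EOF

# 3. ask PostgreSQL to enter archive recovery on the next start
touch "$PGDATA/recovery.signal"

# 4. (not shown) start PostgreSQL; it replays WAL fetched via restore_command
ls "$PGDATA"
```

Once PostgreSQL reaches the recovery target, it removes `recovery.signal` and resumes normal operation.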
pgrwl is designed to always stream WAL data to the local filesystem first. This design ensures durability and
correctness, especially in synchronous replication setups where PostgreSQL waits for the replica to confirm the commit.
- Incoming WAL data is written directly to `*.partial` files in a local directory.
- These `*.partial` files are synced (fsync) after each write to ensure that WAL segments are fully durable on disk.
- Once a WAL segment is fully received, the `*.partial` suffix is removed, and the file is considered complete.
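This lifecycle can be simulated with plain files to show which segments the upload stage would consider (the directory and segment names are made up for illustration):

```shell
# Simulate the receiver directory: one in-flight segment, one completed one.
dir=$(mktemp -d)
printf 'wal bytes' > "$dir/000000010000000000000042.partial"  # still streaming
printf 'wal bytes' > "$dir/000000010000000000000041.partial"
mv "$dir/000000010000000000000041.partial" \
   "$dir/000000010000000000000041"                            # segment completed

# Only finalized segments are candidates for upload:
find "$dir" -type f ! -name '*.partial' -exec basename {} \;
# prints: 000000010000000000000041
```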
Compression and encryption are applied only after a WAL segment is completed:
- Completed files are passed to the uploader worker, which may compress and/or encrypt them before uploading to a remote backend (e.g., S3, SFTP).
- The uploader worker ignores partial files and operates only on finalized, closed segments.
This model avoids the complexity and risk of streaming incomplete WAL data directly to remote storage, which can lead to
inconsistencies or partial restores. By ensuring that all WAL files are locally durable and only completed files are
uploaded, pgrwl guarantees restore safety and clean segment handoff for disaster recovery.
In short: PostgreSQL requires acknowledgments for commits in synchronous setups, and relying on external systems for critical paths (like WAL streaming) could introduce unacceptable delays or failures. This architecture mitigates that risk.
- After each WAL segment is written, an `fsync` is performed on the currently open WAL file to ensure durability.
- An `fsync` is triggered when a WAL segment is completed and the `*.partial` file is renamed to its final form.
- An `fsync` is triggered when a keepalive message is received from the server with the `reply_requested` option set.
- Additionally, `fsync` is called whenever an error occurs during the receive-copy loop.
There’s a significant difference between using archive_command and archiving WAL files via the streaming replication
protocol.
The archive_command is triggered only after a WAL file is fully completed, typically when it reaches 16 MiB (the
default segment size). This means that in a crash scenario, you could lose up to 16 MiB of data.
You can mitigate this by setting a lower archive_timeout (e.g., 1 minute), but even then, in a worst-case scenario,
you risk losing up to 1 minute of data.
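For comparison, a file-based archiving setup with that mitigation might look like this in postgresql.conf (the copy command is a placeholder, not a recommendation):

```ini
# postgresql.conf sketch: file-based archiving, loss bounded at ~1 minute
archive_mode = on
archive_command = 'cp %p /mnt/wal-archive/%f'  # placeholder archiving command
archive_timeout = 60                           # force a segment switch every 60s
```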
Also, it’s important to note that PostgreSQL preallocates WAL files to the configured wal_segment_size, so they are
created at full size regardless of how much data has been written. (Quote from the documentation:
"It is therefore unwise to set a very short archive_timeout - it will bloat your archive storage.")
In contrast, streaming WAL archiving, when used with replication slots and the synchronous_standby_names
parameter, ensures that the system can be restored to the latest committed transaction.
This approach provides true zero data loss (RPO=0), making it ideal for high-durability requirements.
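The synchronous variant relies on PostgreSQL waiting for the receiver's flush confirmation before acknowledging commits. A postgresql.conf sketch, assuming the receiver registers with application_name 'pgrwl' (the actual name depends on how your instance connects; verify it in pg_stat_replication):

```ini
# postgresql.conf sketch for RPO=0 (assumption: the WAL receiver's
# application_name is 'pgrwl'; adjust to match your setup)
synchronous_commit = on
synchronous_standby_names = 'pgrwl'
```

Note that with this setting, commits will block if the receiver is down, so it should be paired with monitoring on the receiver's health.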
Contributions are welcomed and greatly appreciated. See CONTRIBUTING.md for details on submitting patches and the contribution workflow.
Check also the Developer Notes for additional information and guidelines.
- pg_receivewal Documentation
- pg_receivewal Source Code
- Streaming Replication Protocol
- Continuous Archiving and Point-in-Time Recovery
- Setting Up WAL Archiving
MIT License. See LICENSE for details.