Reading Time: 3 minutes

This quick guide won’t replace the documentation. It provides an example of PostgreSQL backup, PITR configuration & restore.

Table of Contents

2.2. Delete the DB data & restore full backup

2.3. Start the DB and monitor the Point-in-time recovery

2.4. Open the database

3. Conclusion

Backup configuration

This is not intended to supersede any backup configuration manual, but before recovery, let’s check how the backup is running.

The /mnt/cephfs is a shared filesystem on which I place the backups and the archivelogs. Archivelogs are saved after each WAL rotation. Let’s review the config.

Full Backup

A full backup is triggered every 2 weeks via the postgres crontab. ( crontab -e as the postgres user)

# m h  dom mon dow   command
0 4 1,15 * * /var/lib/postgresql/bin/backup_database.sh && find /mnt/cephfs/backup/postgresql/archivelogs -mtime +30 -delete

$ cat /var/lib/postgresql/bin/backup_database.sh
#!/bin/bash

DIR=/mnt/cephfs/backup/postgresql/$( date +%F_%Hh%M )
mkdir "$DIR"
pg_basebackup -z -Z8 -Ft -D "$DIR"

WAL backup

postgres@minetest:~$ grep ^archive /etc/postgresql/14/main/postgresql.conf
archive_mode = on               # enables archiving; off, on, or always
archive_command = 'nice xz -1 -zc %p > /mnt/cephfs/backup/postgresql/archivelogs/%f.xz  && nice xz -t /mnt/cephfs/backup/postgresql/archivelogs/%f.xz'

Restore

Check that you have the restore_command aligned with your archive_command (location, compression, etc…), and set the recovery_target_time. The stop point must be after the ending time of the base backup.

You can see here a simple one line command that

checks for the compressed WAL / archivelog integrity (-t)
decompresses the archive and ‘cat’ the output to stdout (`xz -dc`)
sends the decompressed output to the WAL file path to restore to (> %p)

postgres@minetest:~$ grep -E '^(restore|recovery)' /etc/postgresql/14/main/postgresql.conf
restore_command = 'xz -t /mnt/cephfs/backup/postgresql/archivelogs/%f.xz && xz -dc /mnt/cephfs/backup/postgresql/archivelogs/%f.xz > %p'
recovery_target_time = '2023-04-08 16:00:00'

After checking the postgresql.conf

Stop the DB

For Debian/Ubuntu:
as root

systemctl stop postgresql

Delete the DB data & restore full backup

Make a copy if you have enough space.

mv /var/lib/postgresql/14/main /var/lib/postgresql/14/main_original_backup
mkdir /var/lib/postgresql/14/main
tar -zxvf /mnt/cephfs/backup/postgresql/2023-04-01_04h00/base.tar.gz -C /var/lib/postgresql/14/main/
chmod 700 /var/lib/postgresql/14/main
touch /var/lib/postgresql/14/main/recovery.signal

The presence of the recovery.signal file will trigger the recovery process on the next start.

Start the DB and monitor the Point-in-time recovery

(as root) systemctl start postgresql
Watch the process progress with the log : tail -f /var/log/postgresql/postgresql-14-main.log

2023-04-09 11:00:31.389 UTC [10090] LOG:  starting PostgreSQL 14.7 (Ubuntu 14.7-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0, 64-bit
2023-04-09 11:00:31.390 UTC [10090] LOG:  listening on IPv6 address "::1", port 5432
2023-04-09 11:00:31.390 UTC [10090] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2023-04-09 11:00:31.452 UTC [10090] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-04-09 11:00:31.533 UTC [10091] LOG:  database system was interrupted; last known up at 2023-04-01 04:06:15 UTC
xz: /mnt/cephfs/backup/postgresql/archivelogs/00000002.history.xz: No such file or directory
2023-04-09 11:00:31.594 UTC [10091] LOG:  starting point-in-time recovery to 2023-04-08 16:00:00+00
2023-04-09 11:00:32.422 UTC [10091] LOG:  restored log file "00000001000000A400000057" from archive
2023-04-09 11:00:32.602 UTC [10091] LOG:  redo starts at A4/57000028
2023-04-09 11:00:32.663 UTC [10091] LOG:  consistent recovery state reached at A4/577AD5A8
2023-04-09 11:00:32.664 UTC [10090] LOG:  database system is ready to accept read-only connections
2023-04-09 11:00:33.814 UTC [10091] LOG:  restored log file "00000001000000A400000058" from archive
2023-04-09 11:00:36.060 UTC [10091] LOG:  restored log file "00000001000000A400000059" from archive
2023-04-09 11:00:37.675 UTC [10091] LOG:  restored log file "00000001000000A40000005A" from archive
... [Deleted hundreds of lines]
2023-04-09 11:43:38.906 UTC [10091] LOG:  restored log file "00000001000000A7000000B9" from archive
2023-04-09 11:43:40.502 UTC [10091] LOG:  restored log file "00000001000000A7000000BA" from archive
2023-04-09 11:43:41.949 UTC [10091] LOG:  restored log file "00000001000000A7000000BB" from archive
2023-04-09 11:43:44.134 UTC [10091] LOG:  restored log file "00000001000000A7000000BC" from archive
2023-04-09 11:43:45.965 UTC [10091] LOG:  restored log file "00000001000000A7000000BD" from archive
2023-04-09 11:43:47.774 UTC [10091] LOG:  restored log file "00000001000000A7000000BE" from archive
2023-04-09 11:43:49.698 UTC [10091] LOG:  restored log file "00000001000000A7000000BF" from archive
2023-04-09 11:43:50.194 UTC [10091] LOG:  recovery stopping before commit of transaction 6522027, time 2023-04-08 16:00:00.391957+00
2023-04-09 11:43:50.194 UTC [10091] LOG:  pausing at the end of recovery
2023-04-09 11:43:50.194 UTC [10091] HINT:  Execute pg_wal_replay_resume() to promote.

Open the database

postgres@minetest:~/14/main$ psql
psql (14.7 (Ubuntu 14.7-0ubuntu0.22.04.1))
Type "help" for help.
postgres=# SELECT pg_wal_replay_resume();
 pg_wal_replay_resume 
----------------------
 
(1 row)

postgres=# SELECT pg_current_wal_lsn();
 pg_current_wal_lsn 
--------------------
 A7/BFE63C48
(1 row)


postgres=# \q
postgres@minetest:~/14/main$ 
postgres@minetest:~/14/main$ ls -l recovery.signal
ls: cannot access 'recovery.signal': No such file or directory

We can see the the process completely finished, and after resuming the pg_wal_replay, the recovery.signal file has been removed.

Conclusion

PostgreSQL PITR is very simple, and can be triggered easily anytime you need it
pg_basebackup is your friend
Always test recovery after configuring PITR. This will ensure you have the correct restore_command, and ensure you can quickly proceed to a PITR
It’s possible to use a dedicated recovery.conf and start the DB

2 responses to “Point-in-time recovery (PITR) of PostgreSQL database”

jeevan says:

19 December 2023 at 12:04 pm

I followed these steps, but still getting below error

2023-12-19 11:03:53.488 UTC [13179] LOG: invalid checkpoint record
2023-12-19 11:03:53.488 UTC [13179] FATAL: could not locate required checkpoint record
2023-12-19 11:03:53.488 UTC [13179] HINT: If you are restoring from a backup, touch “/data/databases/pgsql14/main/recovery.signal” and add required recovery options.
If you are not restoring from a backup, try removing the file “/data/databases/pgsql14/main/backup_label”.
Be careful: removing “/data/databases/pgsql14/main/backup_label” will result in a corrupt cluster if restoring from a backup

- François says:
  
  22 December 2023 at 12:45 am
  
  Hi Jeevan,
  Interesting case. Can you:
  – ls -lh the recovery.signal file ?
  – Did you try to restore to an earlier date (Check a couple of previous archived logs to pick a date) ? You might have to pg_resetwal, since the current WALs are of no use if you restore to an earlier date. Possibly choose a date before the last full backup (if you have the previous & WALs archives.)
  – Anyway, make sure you have a file backup of the existing situation so you can always undo what you’re doing (tar of the data folder)
  – You mentioned a corrupt cluster. Is the DB part of a cluster with replication ? How did you get there ? Unsafe power off ?
  
  Keep us posted,
  Keep me posted.

Pivert's Blog

Point-in-time recovery (PITR) of PostgreSQL database