
Instant PostgreSQL Cloning with SUSE and Btrfs


What if you could clone a PostgreSQL database instantly, without affecting the original source database, without impacting performance, and without any external tool, simply by using the Linux Btrfs storage layer?
This is what I will demonstrate in this blog post.

Introduction

Sometimes developers urgently need a copy of a PostgreSQL database where they can test new developments or make changes that won't modify the source database.
The usual way is to ask for the most recent backup of the database and restore it on another server (staging or test).
The major problem with this solution is the time needed, which grows with the database size, and nowadays it is not acceptable to wait several hours every time.
But with SUSE and Btrfs it is possible to circumvent this pitfall by using a nice feature called “Copy-On-Write” snapshots (Btrfs is the default filesystem since SLES version 12).
Of course, as a prerequisite, your source PostgreSQL cluster must reside on a Btrfs filesystem.

Installation

For my demonstration, which you can easily reproduce, I will use a SLES version 15 minimal installation.
As usual, we start by creating a PostgreSQL user and group. We add the user to the sudo configuration so we don't need to switch between postgres and root all the time.
sles15:~ # groupadd postgres
sles15:~ # useradd -g postgres -m postgres
sles15:~ # passwd postgres
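
The sudo configuration itself is not shown above. A minimal sketch could be a sudoers drop-in file (an assumption, and a passwordless catch-all rule is convenient for a lab only; adjust it to your own security policy):

sles15:~ # echo "postgres ALL=(ALL) NOPASSWD: ALL" > /etc/sudoers.d/postgres
sles15:~ # chmod 0440 /etc/sudoers.d/postgres
sles15:~ # visudo -c -f /etc/sudoers.d/postgres   # validate the syntax before relying on it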

We now need to install the required packages, prepare the directories, and then download, compile and install PostgreSQL 12.4:
postgres@sles15:~> sudo zypper -n install wget gcc readline-devel zlib-devel libopenssl-devel pam-devel libxml2-devel libxslt-devel openldap2-devel python3-devel sysstat make systemd-devel bzip2 llvm7-devel llvm7 clang7 libicu-devel

postgres@sles15:~> sudo mkdir -p /u01/app/postgres
postgres@sles15:~> sudo chown postgres:postgres /u01/app/postgres
postgres@sles15:~> wget https://ftp.postgresql.org/pub/source/v12.4/postgresql-12.4.tar.bz2
postgres@sles15:~> tar -axf postgresql-12.4.tar.bz2
postgres@sles15:~> cd postgresql-12.4/
postgres@sles15:~/postgresql-12.4> ./configure --prefix=/u01/app/postgres/product/12/db_4
postgres@sles15:~/postgresql-12.4> make all
postgres@sles15:~/postgresql-12.4> make install
postgres@sles15:~/postgresql-12.4> cd contrib
postgres@sles15:~/postgresql-12.4/contrib> make install
postgres@sles15:~/postgresql-12.4/contrib> cd ../..
postgres@sles15:~> rm -rf postgresql-12.4

We now create a new Btrfs filesystem and a subvolume for the source PostgreSQL cluster.
postgres@sles15:~> sudo mkdir -p /pgdatas
postgres@sles15:~> lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 30G 0 disk
├─sda1 8:1 0 8M 0 part
├─sda2 8:2 0 18.6G 0 part /boot/grub2/i386-pc
├─sda3 8:3 0 9.4G 0 part /home
└─sda4 8:4 0 2G 0 part [SWAP]
sdb 8:16 0 52.7G 0 disk
sr0 11:0 1 373M 0 rom
postgres@sles15:~> sudo mkfs.btrfs /dev/sdb
postgres@sles15:~> exit
sles15:~ # echo "/dev/sdb /pgdatas btrfs defaults" >> /etc/fstab
sles15:~ # mount -a
sles15:~ # df -h /pgdatas
/dev/sdb 53G 3.8M 53G 1% /pgdatas
sles15:~ # su - postgres
postgres@sles15:~> sudo chown postgres:postgres /pgdatas/
postgres@sles15:~> sudo btrfs subvolume create /pgdatas/pg1
Create subvolume '/pgdatas/pg1'
postgres@sles15:~> sudo btrfs subvolume list /pgdatas/
ID 257 gen 8 top level 5 path pg1

Let’s create and start the PostgreSQL cluster to be cloned.
As we need a consistent database for our tests, we will populate some data by using pgbench, the PostgreSQL benchmarking tool, in order to get a 15 GB database.
postgres@sles15:~> sudo chown postgres:postgres /pgdatas/pg1
postgres@sles15:~> /u01/app/postgres/product/12/db_4/bin/initdb -D /pgdatas/pg1
postgres@sles15:~> /u01/app/postgres/product/12/db_4/bin/pg_ctl -D /pgdatas/pg1 -l /dev/null start
postgres@sles15:~> /u01/app/postgres/product/12/db_4/bin/psql -l
                             List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    | Access privileges
-----------+----------+----------+-------------+-------------+-------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
(3 rows)
postgres@sles15:~> export PATH=/u01/app/postgres/product/12/db_4/bin/:$PATH
postgres@sles15:~> createuser --login --pwprompt dbi
postgres@sles15:~> createdb -e --owner=dbi dbi
postgres@sles15:~> pgbench --initialize --scale=1000 -U dbi dbi
postgres@sles15:~> psql -c "select pg_size_pretty(pg_database_size('dbi'))"
pg_size_pretty
----------------
15 GB

For my demonstration to be effective, I also need to generate some load on the source cluster, and I will again use pgbench for that, with a target rate of 60 transactions per second and 16 concurrent clients. I let it run for 10 minutes and meanwhile, in another session, I create two clones, staging and test, to show you that this takes no resources from the original database.
postgres@sles15:~> pgbench -U dbi --rate=60 --client=16 --progress=5 --time=600 dbi
postgres@sles15:~> cd /pgdatas/
postgres@sles15:/pgdatas> time sudo btrfs subvolume snapshot pg1 staging
Create a snapshot of 'pg1' in './staging'
real 0m0.123s
user 0m0.011s
sys 0m0.034s
postgres@sles15:/pgdatas> time sudo btrfs subvolume snapshot pg1 test
Create a snapshot of 'pg1' in './test'
real 0m0.098s
user 0m0.024s
sys 0m0.014s

You can see below from the pgbench statistics that there is absolutely no impact on performance, meaning that this could easily be done on a production cluster.
progress: 5.0 s, 51.8 tps, lat 945.726 ms stddev 203.743, lag 655.979 ms
progress: 10.0 s, 67.0 tps, lat 245.902 ms stddev 259.702, lag 105.411 ms
progress: 15.0 s, 61.8 tps, lat 77.370 ms stddev 52.207, lag 0.556 ms
progress: 20.0 s, 61.2 tps, lat 67.853 ms stddev 42.487, lag 2.345 ms
progress: 25.0 s, 60.6 tps, lat 63.429 ms stddev 71.115, lag 3.930 ms
progress: 30.0 s, 66.2 tps, lat 49.639 ms stddev 49.599, lag 5.884 ms

Our two clones are ready, but some additional work is still needed.
Because the snapshot is an atomic copy of our PostgreSQL subvolume, it has exactly the same content as the source, including the postmaster.pid file that contains the process ID of the source cluster. As you might know, we can't start a new PostgreSQL instance while this file is present, so we remove it.
But that's not enough: we also have to change the port, which we do by echoing the new port into postgresql.auto.conf.
postgres@sles15:/pgdatas> rm -f /pgdatas/staging/postmaster.pid
postgres@sles15:/pgdatas> rm -f /pgdatas/test/postmaster.pid
postgres@sles15:/pgdatas> echo "port=5433" > /pgdatas/staging/postgresql.auto.conf
postgres@sles15:/pgdatas> echo "port=5434" > /pgdatas/test/postgresql.auto.conf

The startup will take some time because the clones must be consistent and include everything that has been committed, so PostgreSQL performs a crash recovery.
postgres@sles15:/pgdatas> /u01/app/postgres/product/12/db_4/bin/pg_ctl -D /pgdatas/staging start
waiting for server to start....2020-09-23 16:03:41.930 CEST [4248] LOG: starting PostgreSQL 12.4 dbi services build on x86_64-pc-linux-gnu, compiled by gcc (SUSE Linux) 7.5.0, 64-bit
2020-09-23 16:03:41.934 CEST [4248] LOG: listening on IPv6 address "::1", port 5433
2020-09-23 16:03:41.934 CEST [4248] LOG: listening on IPv4 address "127.0.0.1", port 5433
2020-09-23 16:03:41.943 CEST [4248] LOG: listening on Unix socket "/tmp/.s.PGSQL.5433"
2020-09-23 16:03:41.980 CEST [4249] LOG: database system was interrupted; last known up at 2020-09-23 15:48:30 CEST
2020-09-23 16:03:42.137 CEST [4249] LOG: database system was not properly shut down; automatic recovery in progress
2020-09-23 16:03:42.144 CEST [4249] LOG: redo starts at 11/6545A890
.2020-09-23 16:03:43.034 CEST [4249] LOG: invalid record length at 11/66662ED8: wanted 24, got 0
2020-09-23 16:03:43.034 CEST [4249] LOG: redo done at 11/66662EB0
............2020-09-23 16:03:55.151 CEST [4248] LOG: database system is ready to accept connections
done
server started
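
As a quick check (a sketch, assuming the same binaries in the PATH and the dbi database created earlier), we can connect to the staging clone on its new port and verify that the data is there:

postgres@sles15:~> psql -p 5433 -d dbi -c "select pg_size_pretty(pg_database_size('dbi'))"
postgres@sles15:~> psql -p 5433 -d dbi -c "select count(*) from pgbench_branches"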

Test

To illustrate that queries on the clones won't affect the source database, let's make a simple test on both clones.
First we check the filler field of pgbench_tellers on all three clusters. Then we update it on the clones, delete most of the rows of pgbench_tellers on the test clone only, and check again.
for val in 2 3 4
do
psql -p 543${val} -U dbi -d dbi -c "select * from pgbench_tellers order by 1 limit 2"
psql -p 543${val} -U dbi -d dbi -c "select count(*) from pgbench_tellers"
done

Source
------
 tid | bid | tbalance | filler
-----+-----+----------+--------
   1 |   1 |    24028 |
   2 |   1 |   -27698 |
 count
-------
 10000
Clone staging
-------------
 tid | bid | tbalance | filler
-----+-----+----------+--------
   1 |   1 |    22651 |
   2 |   1 |   -34706 |
 count
-------
 10000
Clone test
----------
 tid | bid | tbalance | filler
-----+-----+----------+--------
   1 |   1 |    22651 |
   2 |   1 |   -34706 |
# psql -p 5433 -d dbi -c "update pgbench_tellers set filler = 'I am the Clone of pg1'"
UPDATE 10000
# psql -p 5434 -d dbi -c "update pgbench_tellers set filler = 'I am the second Clone of pg1'"
UPDATE 10000
# psql -p 5434 -d dbi -c "delete from pgbench_tellers where tid > 100 and tid < 9800"
DELETE 9699
Source
------
  tid | bid | tbalance | filler
------+-----+----------+--------
   61 |   7 |   -27082 |
 8892 | 890 |    14471 |
 count
-------
 10000
Clone staging
-------------
 tid | bid | tbalance | filler
-----+-----+----------+-----------------------
  73 |   8 |    25292 | I am the Clone of pg1
  48 |   5 |   -34248 | I am the Clone of pg1
 count
-------
 10000
Clone test
----------
 tid | bid | tbalance | filler
-----+-----+----------+------------------------------
  73 |   8 |    25292 | I am the second Clone of pg1
  48 |   5 |   -34248 | I am the second Clone of pg1
 count
-------
   301

So, when the tests are over, it's easy to remove the clones. But first, don't forget to stop them, and check afterwards that the subvolumes have been removed.
# pg_ctl -D /pgdatas/staging stop
# pg_ctl -D /pgdatas/test stop
# sudo btrfs subvolume delete staging
Delete subvolume (no-commit): '/pgdatas/staging'
# sudo btrfs subvolume delete test
Delete subvolume (no-commit): '/pgdatas/test'
# sudo btrfs subvolume list /pgdatas
ID 258 gen 6958 top level 5 path pg1

Conclusion

Instant cloning of a PostgreSQL cluster with Btrfs is easy and straightforward, whether for developers (validation procedures), rapid testing or even PostgreSQL upgrades.
It's simple to set up, you don't need a licence, there is no impact on performance and it does not affect the source cluster.
So, happy instant cloning!



PostgreSQL 13: Trusted extensions


PostgreSQL 13 is just around the corner and one little, but important, feature that was added is trusted extensions. When you want to add an extension to a database you need to be superuser for most of the extensions, or you need to implement something like this (please also note the comment from Hans at the bottom of the blog). This is where the new feature helps: if an extension is marked as “trusted” you do not need to be superuser anymore to install it into a database.

Whether an extension is trusted or not is specified in the extension's control file, so all of the following should be fine to install without being superuser (depending on how you installed PostgreSQL, you need to check where the extension files are actually located):

postgres@centos8pg:/u01/app/postgres/product/DEV/db_1/share/extension/ [pgdev] grep trusted *.control | grep -v comment
bool_plperl.control:trusted = true
btree_gin.control:trusted = true
btree_gist.control:trusted = true
citext.control:trusted = true
cube.control:trusted = true
dict_int.control:trusted = true
fuzzystrmatch.control:trusted = true
hstore.control:trusted = true
intarray.control:trusted = true
isn.control:trusted = true
jsonb_plperl.control:trusted = true
lo.control:trusted = true
ltree.control:trusted = true
pgcrypto.control:trusted = true
pg_trgm.control:trusted = true
plperl.control:trusted = true
plpgsql.control:trusted = true
seg.control:trusted = true
tablefunc.control:trusted = true
tcn.control:trusted = true
tsm_system_rows.control:trusted = true
tsm_system_time.control:trusted = true
unaccent.control:trusted = true
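
If you prefer not to look at the filesystem, the same information should also be available from SQL: in PostgreSQL 13 the pg_available_extension_versions catalog view exposes a trusted column. A small sketch (any user can run it):

postgres@centos8pg:~/ [pgdev] psql -c "select name, comment from pg_available_extension_versions where trusted order by name;"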

Using seg as an example, we should be able to install it as a normal user, as it is marked as trusted:

postgres=# create user u with login password 'u';
CREATE ROLE
postgres=# \c postgres u
You are now connected to database "postgres" as user "u".
postgres=> create extension seg;
ERROR:  permission denied to create extension "seg"
HINT:  Must have CREATE privilege on current database to create this extension.

Being able to connect to the database is not enough, you need to have the create privilege on the database:

postgres=> \c postgres postgres
You are now connected to database "postgres" as user "postgres".
postgres=# grant create on database postgres to u;
GRANT
postgres=# \c postgres u
You are now connected to database "postgres" as user "u".
postgres=> create extension seg;
CREATE EXTENSION

That is a huge help if you are using many extensions in many databases as users now can do that on their own.

What you could also do, if you trust your users, is to adjust the control file of a specific extension. The bloom extension, for example, is not marked as trusted:

postgres@centos8pg:/u01/app/postgres/product/DEV/db_1/share/extension/ [pgdev] cat bloom.control
# bloom extension
comment = 'bloom access method - signature file based index'
default_version = '1.0'
module_pathname = '$libdir/bloom'
relocatable = true

If we modify it like this (be aware that you'll lose that change once you patch or re-install PostgreSQL):

postgres@centos8pg:/u01/app/postgres/product/DEV/db_1/share/extension/ [pgdev] cat bloom.control
# bloom extension
comment = 'bloom access method - signature file based index'
default_version = '1.0'
module_pathname = '$libdir/bloom'
relocatable = true
trusted = true

… this one can be installed by a normal user as well:

postgres=# \c postgres u
You are now connected to database "postgres" as user "u".
postgres=> create extension bloom;
CREATE EXTENSION
postgres=> 

This is of course not recommended.


Azure Database for PostgreSQL


This blog is about some findings on Microsoft's Azure service for PostgreSQL which I think are worth noting.

Findings about the infrastructure

The Microsoft Azure service for PostgreSQL uses Windows as its OS infrastructure; there are several places where the underlying OS can be identified at the PostgreSQL level:

  • select version(); => PostgreSQL 11.6, compiled by Visual C++ build 1800, 64-bit
  • show dynamic_shared_memory_type; => windows
  • show archive_command; => c:\postgres\bin\xlogcopy\xlogcopy.exe archive blob “%f” “%p”

So I think it is safe to say that the Azure service for PostgreSQL is running on Windows.

Findings about performance

Microsoft offers 3 IOPS per GB of storage, as stated in their own quick tips, which can be found here: https://techcommunity.microsoft.com/t5/azure-database-support-blog/azure-database-for-postgresql-performance-quick-tips/ba-p/369125

Under points B and C, Microsoft describes:
B. IOPS throughput
Please remember that the server has 3 IOPS per 1 GB of Storage. If your application requires higher IOPS, then it is recommended that you scale up your Azure Database for PostgreSQL server storage size to get more IOPS so that your application performance is not impacted by storage throttling.

C. IO waits
If IO waits are observed from PostgreSQL performance troubleshooting, then increasing the storage size should be considered for higher IO throughput. Check the wait queries using the portal.
This means that a small database with higher performance requirements needs more storage than the actual data volume in order to get that performance; for example, at 100 GB you only get 300 IOPS, which is not as fast as PostgreSQL can be.

Collation

The default collation on Azure is US Western 1252, which is not the PostgreSQL default, so be careful when creating your databases and try to use UTF8 instead.
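
A minimal sketch of explicitly asking for UTF8 at creation time (the database name is an illustration, the connection string is the one from the Azure portal, and depending on the service you may also need to specify LC_COLLATE/LC_CTYPE):

psql "<connection string from the Azure portal>" <<'SQL'
-- hypothetical database name, created from template0 with the desired encoding
CREATE DATABASE appdb WITH TEMPLATE template0 ENCODING 'UTF8';
-- verify encoding and collation afterwards
SELECT datname, pg_encoding_to_char(encoding) AS encoding, datcollate FROM pg_database;
SQL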

Extensions

Not all extensions provided by the contrib package are available on Azure; this may have consequences if your application requires one of them. The following are not available:

  • adminpack administrative functions for PostgreSQL
  • amcheck functions for verifying relation integrity
  • autoinc functions for autoincrementing fields
  • bloom bloom access method – signature file based index
  • dict_xsyn text search dictionary template for extended synonym processing
  • file_fdw foreign-data wrapper for flat file access
  • hstore_plperl transform between hstore and plperl
  • hstore_plperlu transform between hstore and plperlu
  • insert_username functions for tracking who changed a table
  • intagg integer aggregator and enumerator (obsolete)
  • jsonb_plperl transform between jsonb and plperl
  • jsonb_plperlu transform between jsonb and plperlu
  • lo Large Object maintenance
  • moddatetime functions for tracking last modification time
  • pageinspect inspect the contents of database pages at a low level
  • pg_freespacemap examine the free space map (FSM)
  • pg_qualstats An extension collecting statistics about quals
  • pg_stat_kcache Kernel statistics gathering
  • pg_visibility examine the visibility map (VM) and page-level visibility info
  • refint functions for implementing referential integrity (obsolete)
  • repmgr Replication manager for PostgreSQL
  • seg data type for representing line segments or floating-point intervals
  • sslinfo information about SSL certificates
  • tcn Triggered change notifications
  • timetravel functions for implementing time travel
  • tsm_system_rows TABLESAMPLE method which accepts number of rows as a limit
  • tsm_system_time TABLESAMPLE method which accepts time in milliseconds as a limit
  • xml2 XPath querying and XSLT

Findings about Security

This is a very important point for anyone who has to fulfill corporate security requirements. On the configuration page it is possible to enable a firewall rule to allow Azure services to connect. What does this mean in reality?
With select * from pg_hba_file_rules; it is possible to read out the pg_hba.conf used by the Azure service for PostgreSQL. With this firewall rule switched on, the pg_hba.conf contains more than 10,900 entries, including 45 /16 networks; those 45 /16 networks alone allow 45 x 65,536 (about 2.9 million) IP addresses to connect with host and password security only.
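
To get an overview of what is in there, a simple aggregation over pg_hba_file_rules is enough (a sketch, the connection string is the one provided by the Azure portal):

psql "<connection string from the Azure portal>" <<'SQL'
-- how many rules exist per connection type and authentication method
select type, auth_method, count(*) from pg_hba_file_rules group by type, auth_method order by count(*) desc;
-- how many of the entries are /16 networks
select count(*) as slash16_networks from pg_hba_file_rules where netmask = '255.255.0.0';
SQL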

It is also interesting that, with SSL enforcement enabled, the entries inside pg_hba.conf are still of type host and not hostssl. According to the PostgreSQL documentation, this means non-SSL connections are allowed as well: https://www.postgresql.org/docs/11/auth-pg-hba-conf.html

host
This record matches connection attempts made using TCP/IP. host records match either SSL or non-SSL connection attempts.
hostssl
This record matches connection attempts made using TCP/IP, but only when the connection is made with SSL encryption.
password
Require the client to supply an unencrypted password for authentication. Since the password is sent in clear text over the network, this should not be used on untrusted networks.

With the “allow Azure services to connect” firewall rule enabled, we found on some services that select * from pg_stat_ssl; showed no connections using SSL at all! Restarting the service solved the issue for the moment.
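
A quick way to check this yourself (a sketch, joining with pg_stat_activity to see which user and application each connection belongs to):

psql "<connection string from the Azure portal>" -c "
select a.pid, a.usename, a.application_name, s.ssl, s.version
  from pg_stat_activity a
  join pg_stat_ssl s using (pid);"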

So, in my mind, there are some points that need to be discussed before starting with the Azure service for PostgreSQL, but this is true for any cloud service. Following some discussions on a project, we had follow-up meetings about the future of the Microsoft Azure service for PostgreSQL.

First, Microsoft is moving to a Flexible Server implementation based on Linux and is getting rid of the connection gateway, which caused the performance and security issues, by using VNet integration; so the pg_hba nightmare should be solved with the new solution.
The new solution is VM based: one VM per instance, no multiple-instance setup.
I still have a few concerns: Microsoft is building the installation packages on their own, using the PostgreSQL built-in replication (no repmgr, no Patroni), and the Linux used is Ubuntu. If it becomes available to me for testing, I will follow up and check whether the issues we found on the Windows-based offering are gone. But I see things going in the right direction.

 


Getting started with Exasol – Loading data from PostgreSQL


In the first post of this series we brought up an Exasol test system to have something to play with. Because playing without data is a bit boring, this post is about loading data from PostgreSQL into Exasol. We'll use pgbench on the PostgreSQL side to generate some data and then load it into Exasol.

Exasol comes with a concept called “virtual schemas”. If you know PostgreSQL, you can compare it to foreign data wrappers. The purpose is to access data on foreign systems of any kind and integrate it into the system. The data on the foreign system is accessed through standard SQL commands, very much the same way PostgreSQL does it. In contrast to PostgreSQL, Exasol usually uses JDBC to connect to a foreign data source, and here is the list of currently supported JDBC dialects. You are not limited to one of those: you can combine any of them and centralize your reporting (or whatever you plan to do with the foreign data) in Exasol. There is also a GitHub project which provides additional information around adapters and virtual schemas.
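
For comparison, here is roughly what the PostgreSQL-side equivalent would look like with postgres_fdw, reusing the connection details from this post (a sketch only, meant to run on a second PostgreSQL instance acting as the consumer):

psql <<'SQL'
create extension if not exists postgres_fdw;
-- the foreign server and user mapping play the role of Exasol's CONNECTION object
create server remote_pg foreign data wrapper postgres_fdw
  options (host '192.168.22.11', port '5433', dbname 'exasol');
create user mapping for current_user server remote_pg
  options (user 'exasol', password 'exasol');
-- the imported schema plays the role of the virtual schema
create schema postgresql_remote;
import foreign schema public
  limit to (pgbench_accounts, pgbench_branches, pgbench_history, pgbench_tellers)
  from server remote_pg into postgresql_remote;
SQL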

Before you can connect to an external system you need to install the corresponding JDBC driver but in the case of PostgreSQL the JDBC driver is already there by default. You can check that in the web interface of Exasol:

We will upload the latest PostgreSQL driver anyway, to show the procedure. To upload the latest driver, again go to EXAOperations, locate the drivers tab under software and press “Add”:

Add the driver details:

Locate the driver (can be downloaded from jdbc.postgresql.org) and upload the jar file:

That’s it for the driver update.

Before we connect Exasol to PostgreSQL, let’s prepare the data on the PostgreSQL side:

postgres=# create user exasol with login password 'exasol';
CREATE ROLE
postgres=# create database exasol with owner = exasol;
CREATE DATABASE
postgres=# \! pgbench -i -s 100 -U exasol exasol
dropping old tables...
creating tables...
generating data...
100000 of 10000000 tuples (1%) done (elapsed 0.07 s, remaining 7.14 s)
200000 of 10000000 tuples (2%) done (elapsed 0.19 s, remaining 9.21 s)
...
10000000 of 10000000 tuples (100%) done (elapsed 17.42 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done.

That gives us 10 million rows in the pgbench_accounts table:

postgres=# \c exasol exasol
You are now connected to database "exasol" as user "exasol".
exasol=> select count(*) from pgbench_accounts;
  count   
----------
 10000000
(1 row)

On the Exasol side, the first thing you need to do is to create a connection (In PostgreSQL that would be the foreign server and the user mapping):

dwe@dwe:~/EXAplus-7.0.0$ ./exaplus -c 192.168.22.117:8563 -u sys -p exasol
EXAplus 7.0.0 (c) EXASOL AG

Wednesday, September 30, 2020 at 9:49:38 AM Central European Summer Time
Connected to database EXAone as user sys.
EXASolution 7.0.2 (c) EXASOL AG

SQL_EXA> CREATE OR REPLACE CONNECTION JDBC_POSTGRESQL
1    >  TO 'jdbc:postgresql://192.168.22.11:5433/exasol'
2    >  USER 'exasol'
3    >  IDENTIFIED BY 'exasol';
EXA: CREATE OR REPLACE CONNECTION JDBC_POSTGRESQL...

Rows affected: 0

To test that connection you can simply do:

SQL_EXA> IMPORT FROM JDBC AT JDBC_POSTGRESQL STATEMENT ' SELECT ''OK'' ';
EXA: IMPORT FROM JDBC AT JDBC_POSTGRESQL STATEMENT ' SELECT ''OK'' ';

?column?                                                                                                                                                                                                
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
OK                                                                                                                                                                                                      

1 row in resultset.

SQL_EXA> 

Now is the time to create the virtual schema so we can access the tables on the PostgreSQL side via plain SQL. Before we can do that we need to deploy the adapter. The PostgreSQL JDBC driver must be uploaded to BucketFS, and I've created a new bucket for that:

BucketFS is an internal file system that gets synchronized automatically between the Exasol cluster nodes.

The easiest way to upload files is to use the bucket explorer but you could also use curl against the REST API:

dwe@dwe:~/Downloads$ curl --user admin -v -X PUT -T postgresql-42.2.16.jar  http://192.168.22.117:2580/bucketfs/bucketfs1/drivers/postgresql-42.2.16.jar

Also upload the latest virtual schema distribution from here, so it looks like this:

Now we need to create the so-called “adapter script”:

SQL_EXA> CREATE SCHEMA ADAPTER;
EXA: CREATE SCHEMA ADAPTER;

Rows affected: 0

SQL_EXA> CREATE OR REPLACE JAVA ADAPTER SCRIPT ADAPTER.JDBC_ADAPTER AS
         %scriptclass com.exasol.adapter.RequestDispatcher;
         %jar /buckets/bucketfs1/drivers/virtual-schema-dist-5.0.4-bundle-4.0.3.jar;
         %jar /buckets/bucketfs1/drivers/postgresql-42.2.16.jar;
/
EXA: CREATE OR REPLACE JAVA ADAPTER SCRIPT ADAPTER.JDBC_ADAPTER AS...

Rows affected: 0

SQL_EXA> 

Finally we can create the virtual schema:

SQL_EXA> CREATE VIRTUAL SCHEMA POSTGRESQL_REMOTE
         USING ADAPTER.JDBC_ADAPTER 
         WITH
         SQL_DIALECT = 'POSTGRESQL'
         SCHEMA_NAME = 'public'
         CONNECTION_NAME = 'JDBC_POSTGRESQL';
EXA: CREATE VIRTUAL SCHEMA POSTGRESQL_REMOTE...

Rows affected: 0

Now, that the virtual schema is there, the tables on the PostgreSQL side are visible and can be queried:

SQL_EXA> select table_name from exa_dba_tables where TABLE_SCHEMA = 'POSTGRESQL_REMOTE';
EXA: select table_name from exa_dba_tables where TABLE_SCHEMA = 'POSTGRESQL...

TABLE_NAME                                                                                                                      
--------------------------------------------------------------------------------------------------------------------------------
PGBENCH_ACCOUNTS                                                                                                                
PGBENCH_BRANCHES                                                                                                                
PGBENCH_HISTORY                                                                                                                 
PGBENCH_TELLERS                                                                                                                 

4 rows in resultset.

SQL_EXA> select count(*) from POSTGRESQL_REMOTE.PGBENCH_ACCOUNTS;
EXA: select count(*) from POSTGRESQL_REMOTE.PGBENCH_ACCOUNTS;

COUNT(*)            
--------------------
            10000000

1 row in resultset.

Compared to PostgreSQL it is a bit more work to set this up, but once you have been through the process the first time it is not an issue anymore. In the next post we will load that data into Exasol and go into more detail on how Exasol works when it comes to transactions.


Getting started with Exasol – Some words about indexes and transactions


If you followed Getting started with Exasol – Setting up an environment and Getting started with Exasol – Loading data from PostgreSQL you should have an Exasol test system up and running and a virtual schema pointing to a PostgreSQL schema. What we’ll be doing in this post is to load the data from PostgreSQL into Exasol and then have a look at how transactions work in Exasol and how Exasol is handling indexes.

As a quick reminder the virtual schema we’ve created in the previous post looks like this:

SQL_EXA> select table_name from exa_dba_tables where TABLE_SCHEMA = 'POSTGRESQL_REMOTE';
EXA: select table_name from exa_dba_tables where TABLE_SCHEMA = 'POSTGRESQL...

TABLE_NAME                                                                                                                      
--------------------------------------------------------------------------------------------------------------------------------
PGBENCH_ACCOUNTS                                                                                                                
PGBENCH_BRANCHES                                                                                                                
PGBENCH_HISTORY                                                                                                                 
PGBENCH_TELLERS                                                                                                                 

4 rows in resultset.

As these tables are actually in PostgreSQL, we're going to load them locally into an Exasol schema:

SQL_EXA> create schema demo;
EXA: create schema demo;

Rows affected: 0

SQL_EXA> open schema demo;
EXA: open schema demo;

Rows affected: 0

SQL_EXA> create table PGBENCH_ACCOUNTS as select * from POSTGRESQL_REMOTE.PGBENCH_ACCOUNTS;
EXA: create table PGBENCH_ACCOUNTS as select * from POSTGRESQL_REMOTE.PGBEN...

Rows affected: 10000000

An alternative method to do the same is to use the SELECT INTO statement in Exasol:

SQL_EXA> select * into table PGBENCH_BRANCHES from POSTGRESQL_REMOTE.PGBENCH_BRANCHES;
EXA: select * into table PGBENCH_BRANCHES from POSTGRESQL_REMOTE.PGBENCH_BR...

Rows affected: 100

SQL_EXA> select * into table PGBENCH_HISTORY from POSTGRESQL_REMOTE.PGBENCH_HISTORY;
EXA: select * into table PGBENCH_HISTORY from POSTGRESQL_REMOTE.PGBENCH_HIS...

Rows affected: 0

SQL_EXA> select * into table PGBENCH_TELLERS from POSTGRESQL_REMOTE.PGBENCH_TELLERS;
EXA: select * into table PGBENCH_TELLERS from POSTGRESQL_REMOTE.PGBENCH_TEL...

Rows affected: 1000

Of course we lost all primary keys, foreign keys and check constraints by doing this:

SQL_EXA> select count(*) from EXA_DBA_CONSTRAINTS where CONSTRAINT_SCHEMA = 'DEMO';
EXA: select count(*) from EXA_DBA_CONSTRAINTS where CONSTRAINT_SCHEMA = 'DE...

COUNT(*)             
---------------------
                    0

1 row in resultset.

Let’s add the same constraints as we have them on the PostgreSQL side:

SQL_EXA> alter table PGBENCH_ACCOUNTS add constraint PGBENCH_ACCOUNTS_PK primary key (AID);
EXA: alter table PGBENCH_ACCOUNTS add constraint PGBENCH_ACCOUNTS_PK primar...

Rows affected: 0

SQL_EXA> alter table PGBENCH_BRANCHES add constraint PGBENCH_BRANCHES_PK primary key (BID);
EXA: alter table PGBENCH_BRANCHES add constraint PGBENCH_BRANCHES_PK primar...

Rows affected: 0

SQL_EXA> alter table PGBENCH_TELLERS add constraint PGBENCH_TELLERS_PK primary key (TID);
EXA: alter table PGBENCH_TELLERS add constraint PGBENCH_TELLERS_PK primary ...

Rows affected: 0

SQL_EXA> select count(*) from EXA_DBA_CONSTRAINTS where CONSTRAINT_SCHEMA = 'DEMO';
EXA: select count(*) from EXA_DBA_CONSTRAINTS where CONSTRAINT_SCHEMA = 'DE...

COUNT(*)             
---------------------
                    3

1 row in resultset.

Now we have exactly the same setup as in PostgreSQL. Creating the primary keys should also have created some indexes in the background:

SQL_EXA> col index_schema for a30;
COLUMN   index_schema ON
FORMAT   a30
SQL_EXA> col index_table for a30;
COLUMN   index_table ON
FORMAT   a30
SQL_EXA> col index_type for a15;
COLUMN   index_type ON
FORMAT   a15
SQL_EXA> select INDEX_SCHEMA,INDEX_TABLE,INDEX_TYPE from EXA_DBA_INDICES where INDEX_SCHEMA = 'DEMO';
EXA: select INDEX_SCHEMA,INDEX_TABLE,INDEX_TYPE from EXA_DBA_INDICES where ...

INDEX_SCHEMA                   INDEX_TABLE                    INDEX_TYPE     
------------------------------ ------------------------------ ---------------
DEMO                           PGBENCH_ACCOUNTS               GLOBAL         
DEMO                           PGBENCH_BRANCHES               GLOBAL         
DEMO                           PGBENCH_TELLERS                GLOBAL         

3 rows in resultset.

SQL_EXA> 

If you’re looking for an index name, there is none:

SQL_EXA> col column_name for a30;
COLUMN   column_name ON
FORMAT   a30
SQL_EXA> col column_comment for a50;
COLUMN   column_comment ON
FORMAT   a50
SQL_EXA> desc full EXA_DBA_INDICES;
EXA: desc full EXA_DBA_INDICES;

COLUMN_NAME                    SQL_TYPE                                 NULLABLE DISTRIBUTION_KEY PARTITION_KEY    COLUMN_COMMENT                                    
------------------------------ ---------------------------------------- -------- ---------------- ---------------- --------------------------------------------------
INDEX_SCHEMA                   VARCHAR(128) UTF8                                                                   Schema of the index                               
INDEX_TABLE                    VARCHAR(128) UTF8                                                                   Table of the index                                
INDEX_OWNER                    VARCHAR(128) UTF8                                                                   Owner of the index                                
INDEX_OBJECT_ID                DECIMAL(18,0)                                                                       ID of the index                                   
INDEX_TYPE                     VARCHAR(20) UTF8                                                                    Index type                                        
IS_GEOMETRY                    BOOLEAN                                                                             States whether this is a geospatial index         
MEM_OBJECT_SIZE                DECIMAL(18,0)                                                                       Index size in bytes (at last COMMIT)              
CREATED                        TIMESTAMP                                                                           Timestamp of when the index was created           
LAST_COMMIT                    TIMESTAMP                                                                           Last time the object was changed in the DB        
REMARKS                        VARCHAR(100000) UTF8                                                                Additional information about the index            

10 rows in resultset.
SQL_EXA> 

There isn’t even a CREATE INDEX statement in Exasol. A “GLOBAL” index exists on all nodes of the cluster while a “LOCAL” index only exists on the given node. Indexes are created and maintained automatically, based on the queries the system is processing. We can easily see this when joining two simple tables:

SQL_EXA> create table demo1 ( a int, b int, c int );
EXA: create table demo1 ( a int, b int, c int );

Rows affected: 0

SQL_EXA> insert into demo1 select 1,1,1 from pgbench_accounts where aid < 1000;
EXA: insert into demo1 select 1,1,1 from pgbench_accounts where aid < 1000;

Rows affected: 999

SQL_EXA> create table demo2 ( a int, b int, c int );
EXA: create table demo2 ( a int, b int, c int );

Rows affected: 0

SQL_EXA> insert into demo2 select * from demo1;
EXA: insert into demo2 select * from demo1;

Rows affected: 999

SQL_EXA> select count(*) from demo1 a, demo2 b where a.a = b.a;
EXA: select count(*) from demo1 a, demo2 b where a.a = b.a;

COUNT(*)             
---------------------
               998001

1 row in resultset.

SQL_EXA> select INDEX_SCHEMA,INDEX_TABLE,INDEX_TYPE from EXA_DBA_INDICES where INDEX_SCHEMA = 'DEMO';
EXA: select INDEX_SCHEMA,INDEX_TABLE,INDEX_TYPE from EXA_DBA_INDICES where ...

INDEX_SCHEMA                                 INDEX_TABLE                    INDEX_TYPE          
-------------------------------------------- ------------------------------ --------------------
DEMO                                         PGBENCH_ACCOUNTS               GLOBAL              
DEMO                                         PGBENCH_BRANCHES               GLOBAL              
DEMO                                         PGBENCH_TELLERS                GLOBAL              
DEMO                                         DEMO1                          LOCAL               

If an index is not used for 5 weeks it will be dropped automatically.

Coming to transaction handling: first of all, you need to know that auto commit is enabled by default in the clients (the same as in PostgreSQL):

SQL_EXA> show autocommit;
AUTOCOMMIT = "ON"

Exasol supports the serializable transaction isolation level. This is the strongest level but it also comes with some downsides: it can happen that transactions need to wait or even abort when the order of the transactions cannot be guaranteed. DDL is transactional as well:

-- session 1
SQL_EXA> set AUTOCOMMIT off;
SQL_EXA> create table t1 ( a int );
EXA: create table t1 ( a int );


-- session 2
SQL_EXA> set AUTOCOMMIT off;
SQL_EXA> 
SQL_EXA> select * from t1;
EXA: select * from t1;
Error: [42000] object T1 not found [line 1, column 15] (Session: 1679358615345561600)

-- session 1
SQL_EXA> commit;
EXA: commit;

-- session 2
SQL_EXA> select * from t1;
EXA: select * from t1;

A                    
---------------------

0 rows in resultset.

SQL_EXA> 

The recommendation is to go with the default and leave auto commit on, to keep transactions as short as possible. Otherwise this can happen:

-- session 1
SQL_EXA> set AUTOCOMMIT off;
SQL_EXA> insert into t1 values(1);
EXA: insert into t1 values(1);

Rows affected: 1

SQL_EXA> insert into t1 values(2);
EXA: insert into t1 values(2);

Rows affected: 1

SQL_EXA> insert into t1 values(3);
EXA: insert into t1 values(3);

Rows affected: 1

SQL_EXA> commit;
EXA: commit;

Rows affected: 0

SQL_EXA> update t1 set a = 1 where a = 2;
EXA: update t1 set a = 1 where a = 2;

Rows affected: 1

SQL_EXA> 


-- session 2
SQL_EXA> set AUTOCOMMIT off;
SQL_EXA> insert into t1 values (1);
EXA: insert into t1 values (1);    -- must wait for session one to either commit or rollback

… or even this:

-- session 2
SQL_EXA> select * from t1;
EXA: select * from t1;

A                    
---------------------
                    1
                    1
                    3
                    1

4 rows in resultset.


-- session 1
SQL_EXA> update t1 set a = 1 where a = 2;
EXA: update t1 set a = 1 where a = 2;

Rows affected: 0

SQL_EXA> 

-- session 2
SQL_EXA> insert into t1 values (5);
EXA: insert into t1 values (5);
Error: [40001] GlobalTransactionRollback msg: Transaction collision: automatic transaction rollback. (Session: 1679360319468797952)

These kinds of conflicts are recorded in the catalog and you can check them:

SQL_EXA> col conflict_objects for a30;
COLUMN   conflict_objects ON
FORMAT   a30
SQL_EXA> select * from EXA_DBA_TRANSACTION_CONFLICTS;
EXA: select * from EXA_DBA_TRANSACTION_CONFLICTS;

SESSION_ID            CONFLICT_SESSION_ID   START_TIME                 STOP_TIME                  CONFLICT_TYPE        CONFLICT_OBJECTS               CONFLICT_INFO                           
--------------------- --------------------- -------------------------- -------------------------- -------------------- ------------------------------ ----------------------
  1679358615345561600                       2020-10-01 14:13:12.871000 2020-10-01 14:13:12.871000 TRANSACTION ROLLBACK DEMO.T1                        intern merged sessions                  
  1679358615345561600   1679357953088880640 2020-10-01 14:23:31.669000 2020-10-01 14:27:35.941000 WAIT FOR COMMIT      DEMO.T1                                                                
  1679359718013206528   1679357953088880640 2020-10-01 14:24:05.772000 2020-10-01 14:27:35.941000 WAIT FOR COMMIT      DEMO.T1                                                                
  1679359718013206528   1679357953088880640 2020-10-01 14:29:05.514000 2020-10-01 14:32:06.309000 WAIT FOR COMMIT      DEMO.T1                                                                
  1679360160561037312   1679357953088880640 2020-10-01 14:31:11.577000 2020-10-01 14:32:06.309000 WAIT FOR COMMIT      DEMO.T1                                                                
  1679360185071960064   1679357953088880640 2020-10-01 14:31:59.416000 2020-10-01 14:32:06.309000 WAIT FOR COMMIT      DEMO.T1                                                                
  1679360185071960064   1679359718013206528 2020-10-01 14:32:06.352000 2020-10-01 14:32:12.255000 WAIT FOR COMMIT      DEMO.T1                                                                
  1679359718013206528   1679357953088880640 2020-10-01 14:32:51.164000 2020-10-01 14:32:51.164000 TRANSACTION ROLLBACK DEMO.T1                                                                

8 rows in resultset.

As soon as you write to an object you are working on your own (temporary) copy of the object; Exasol calls this Multi-Copy. When a transaction commits, that version of the object becomes the valid one. Transactions that started before will still use the previous version of the object.

The recommendation therefore is to always have auto commit turned on in the client, and for the cases where you need to turn it off: keep the transactions as short as possible. That's it for this introduction to indexes and transactions in Exasol. In the next post we'll look at database sessions and auditing.


What is a database backup (back to the basics)


By Franck Pachot

TL;DR:

  • do not consider a dump (like PostgreSQL pg_dump or Oracle expdp) as a database backup
  • do not consider that your backup is successful if you didn’t test recovery
  • databases provide physical database backups, easy and safe to restore and recover to any point-in-time between first backup and point of failure
  • managed databases provide an easy recovery interface, but don’t trust it before you try it
  • and…


I’ve written this after reading “We deleted the production database by accident 💥” by Caspar von Wrede on the keepthescore.co blog. What scares me the most is not that they dropped a production database, but that they lost 7 hours of transactions when restoring it.

I appreciate it when people write a public transparent postmortem and lessons learned. People read this and think about it. I’ve seen 3 categories of comments here:

  • The “I give lessons without knowing the context” comments: many people criticizing what happened. Yes, sure, dropping a production database should not happen. But always look at the context. One guy is running this service on a low budget. This is how IT runs today: a startup company where a single person is the CEO and the full-stack DevOps at the same time. Of course, nobody can operate this without risk. Do not blame the guys. Just advocate for more common sense in IT.
  • The “I don’t know but give recommendations” comments: many people assuming that, because the team restored a daily backup and lost the 7 hours of transactions between this backup and the failure, they must rely on a pg_dump. And some even suggest running a pg_dump export more frequently!
  • The “I try to understand and see if I can help” comments: this is where the most important lessons learned are, and this is what I detail here.

The funny thing is that I wrote this thinking I was in the 3rd category, because I had read the DigitalOcean documentation and left a comment to tell them how they could still recover those 7 hours. And while writing this blog post, and testing this idea, I realized that I was actually in the second category… not knowing how it actually works.

Database Backup

Databases do not only apply changes to their persistent storage. They also log all modifications in order to be able to redo or undo the changes in case of failure. I'll take the PostgreSQL example because that is the database used in this “we deleted the production database” article. The changes are made to the files under PGDATA/base and the transaction log is written under PGDATA/pg_wal as Write Ahead Logging (WAL).

Then when we want a physical copy of the database, for backup or other purposes, we:

  • copy the database files, online, while the changes happen in memory and are flushed asynchronously to those files, because databases are built to serve multiple users 24/7. The file copies are “fuzzy” because of those concurrent changes: they contain most of the data, but some blocks (and even partial blocks) are from different points in time because no copy is instantaneous
  • archive the WAL as it is generated, because this is what can make the copy of database files consistent, and bring them to a further point-in-time

There are really two independent threads to protect the database: backup of files (usually daily) and archival of WAL (which are smaller and backed-up more often)
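
As a minimal sketch of those two threads on a self-managed PostgreSQL (the archive destination and paths are assumptions, and dedicated tools like pgBackRest or Barman handle this much better):

# thread 1: continuous WAL archival, configured once
psql -c "alter system set archive_mode = on;"
psql -c "alter system set archive_command = 'test ! -f /backup/wal/%f && cp %p /backup/wal/%f';"
pg_ctl -D $PGDATA restart   # archive_mode requires a restart to take effect

# thread 2: daily base backup of the database files
pg_basebackup -D /backup/base/$(date +%F) -Ft -z --checkpoint=fast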

Of course, there are many ways to do those copies, but because it is a physical copy, a recovery is:

  • #fast: copying files with large I/O calls is faster than inserting rows and maintaining indexes
  • #predictable: the time it takes depends on the size of the database and the storage throughput (MB/s) and nothing else
  • #reliable: after recovering a physical backup, the database is exactly the same, physically, which guarantees that the behaviour, state and performance do not change

With this, we can safely throw numbers to define an SLA (Service Level Agreement). The Recovery Time Objective (RTO) is the time it takes to get the database recovered. With a physical database backup, it can be estimated from the database size (and storage specification, and number of threads). The Recovery Point Objective (RPO) is about the accepted data loss. In an ACID database, the committed transactions must be durable. This means that no data loss is accepted. When you commit a transaction, the database ensures that the WAL is fsync'ed to disk. Your RPO depends on the backup of the WAL.

That’s a common misconception. People think of a backup like saving a document to disk: in case of a crash, you can get it back to the point in time when it was saved. But that’s wrong. The SQL equivalent of “save to disk” is COMMIT, not BACKUP. The frequency of the backup of database files does not determine the RPO. That is determined by the backup of the WAL, which happens more frequently (and because it is sequentially written it can even be streamed to another availability zone). The frequency of the backup of database files is important only for the RTO: the less WAL you have to apply on the fuzzy datafiles you restored, the faster you can open the recovered database.
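
Streaming the WAL instead of waiting for a full segment to be archived is exactly what pg_receivewal does; a sketch, assuming a user with the REPLICATION attribute (the user name and directory are illustrative) and a target directory on another host or availability zone:

# also needs a replication entry in pg_hba.conf for this user
pg_receivewal -h dbhost -U streamer -D /safe/wal_archive --synchronous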

It is important to understand those two steps in database recovery:

  • 1. RESTORE the files from the latest backup (large volume but deterministic time)
  • 2. RECOVER those files by applying the WAL until the desired point-in-time (usually the point of failure) and then rolling back the non-committed transactions

When a developer asks you to take a backup before an application release, in order to be able to restore this state if he makes a mistake, you do not need to take an additional backup. You just note this point-in-time (can be with pg_create_restore_point) and your normal backup plan is sufficient to recover to any point-in-time. A database is not like a file with data. A database is continuously moving with changes from concurrent sessions. Its state depends on how those transactions will end (commit or rollback). It holds your data (in tables), redundant data (like indexes), metadata, and a log of current and recent transactions. When you backup a database, you backup all that. When you recover a database, you restore all that and recover it to the desired point-in-time.
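
Marking such a point-in-time is a single SQL call; a minimal sketch (the label is an arbitrary example):

psql -c "select pg_create_restore_point('before_release_2020_10');"

The label can then be used as recovery_target_name during a recovery, instead of a timestamp.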

I’m talking about “backup of the database” here, which is different from “backup of data”.

Dump of data

When you take a pg_dump, you export some data and metadata as of a specific point-in-time. But you don't export the log of transactions, and you don't export the physical layout. This can be considered a logical “backup of data”. But it is not a “backup of the database”. The nuance here is database vs. data. It is also about physical vs. logical. When you import from a dump, all tables are created, rows are inserted, and then indexes are created. Even if you get the same data, you don't have the same database. The physical layout is different (and so is the performance: think about the index correlation factor). The time to build the indexes is long and hard to predict (it depends on available memory, cache, prefetching, CPU…). That's bad for RTO: long and unpredictable. For RPO it is even worse: you will always lose some committed transactions when you import from a dump, because you can restore it but not recover to a further point-in-time.

A dump is not a backup of your database. It can be useful, for sure, because a logical export is more flexible: to get back a previous database object, to generate DDL, to import into a different version or a different platform. It is very good for migrations and data movement. And it can be part of your recovery plan. But it cannot substitute for a database backup. Let's take an analogy with software. If a server crashes, I can start another one with the same image and it will be exactly the same server, which I can run immediately. If I have no image, I can install the OS and software again, apply the same patches, configure it… but that will not be exactly the same server: I need to test it again before opening it for production.

Database Recovery

The most important thing is not the backup but being able to recover from it easily (because when you need it, you will probably be at a high stress level, with many other problems to solve). It is like buying new snow chains for your car: easy to put them on when you are in your garage, not so easy if the first time you unbox them is on a slope with 1 meter of snow and freezing cold. And this “We deleted the production database by accident 💥” article is the best example:

Today at around 10:45pm CET, after a couple of glasses of red wine, we deleted the production database by accident 😨
After 5 minutes of hand-wringing and panic, we took the website into maintenance mode and worked on restoring a backup. At around 11:15pm CET, 30 minutes after the disaster, we went back online, however 7 hours of scoreboard data was gone forever 😵.

The time to restore (RTO) was 30 minutes, which is good given the context. But the data loss (RPO) of 7 hours was quite bad. And do you know why? Initially, when looking at the DigitalOcean documentation and console, I thought it was simply because they clicked “Restore to New Cluster” without changing “Latest Backup” for the point to restore to. It then restored the database as of the time of the latest backup (which was 7 hours earlier) rather than a specific point in time (just before the accidental drop). Because of the stress (and maybe the glasses of wine ;)), and because they never tested it, tried it, or documented it, they may have chosen the bad option instead of this one:

But I was wrong. Never rely on what you see in the documentation or console. When it is about backup recovery: test it, test it, test it.

Example on DigitalOcean

I have created the same database as the one they lost: the $15/month PostgreSQL managed service on DigitalOcean:

You can try the same, as there's a $100 credit trial. The creation of the database is easy. Just follow the steps to secure it (which IP or CIDR can access it) and to connect (you have the psql flags to copy/paste).

The documentation for this service says:

  • Daily point-in-time backups. Databases are automatically backed up every day, which lets you restore data to any point within the previous seven days.
  • Point-in-time-recovery (PITR) is limited to the last 7 days.
  • Point-in-time-recovery (PITR) operations are limited to the last 7 days. The date picker doesn’t restrict you from choosing an earlier date; you will receive an error if you try to recover from a date outside of the 7-day window.

I understand this as being able to restore to any point in time within the last 7 days, right? Is it that simple? Did they lose 7 hours of transactions when it would have been so easy to lose nothing?

In order to test it I’ve run the following:


PGPASSWORD=l3htrqdpa0zubbgu psql -U doadmin -h db-postgresql-fra1-63512-do-user-8199131-0.a.db.ondigitalocean.com -p 25060 -d defaultdb --set=sslmode=require <<'SQL'

drop table if exists DEMO;
create table DEMO as select current_timestamp ts;
select * from DEMO;

SQL

export PGPASSWORD=l3htrqdpa0zubbgu
while true
do
  psql -e -U doadmin -h db-postgresql-fra1-63512-do-user-8199131-0.a.db.ondigitalocean.com -p 25060 -d defaultdb --set=sslmode=require <<<'insert into DEMO select current_timestamp;'
  sleep 15
done

This creates a table and inserts the current timestamp every 15 seconds. I started it on Oct 21 09:13:52 CEST 2020 (I'm running in CEST but the PostgreSQL server is in UTC, so the first timestamp recorded is 07:13:52). Two days later, here is what I have in the table:


Oct 22 20:39:40 insert into DEMO select current_timestamp;
Oct 22 20:39:40 INSERT 0 1

defaultdb=> select count(*),min(ts),max(ts) from DEMO;

 count |              min              |             max
-------+-------------------------------+------------------------------
  8438 | 2020-10-21 07:13:52.159067+00 | 2020-10-22 18:39:25.19611+00
(1 row)

At Oct 22 20:39:40, after the last INSERT, I have 8438 rows, from the beginning (2020-10-21 07:13:52 UTC) to now (2020-10-22 18:39:25 UTC).

I have no idea how they managed to drop the database, because I tried and cannot: it is a managed database, I can connect only to ‘defaultdb’, and you cannot drop the database you are connected to. So I just dropped the table:

defaultdb=> drop table DEMO;
DROP TABLE
defaultdb=>

I am at Oct 22 20:39:45 CEST here and the last inserted row (committed, as I am in autocommit) is from Oct 22 20:39:25 CEST. The next insert fails as the table is not there anymore:


Oct 22 20:39:55 insert into DEMO select current_timestamp;
ERROR:  relation "demo" does not exist
LINE 1: insert into DEMO select current_timestamp;

My goal now is to recover, with no data loss, to the point in time just before the drop of the table. I'll restore and recover to Oct 22 20:39:42 CEST.
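
For reference, on a self-managed PostgreSQL 12 this point-in-time recovery would look roughly like the following (a sketch only; the backup layout and paths are assumptions):

# restore the latest base backup taken before the target time
rm -rf $PGDATA && mkdir -p $PGDATA && chmod 700 $PGDATA
tar -xzf /backup/base/2020-10-22/base.tar.gz -C $PGDATA

# tell PostgreSQL where to find the archived WAL and where to stop
cat >> $PGDATA/postgresql.conf <<'EOF'
restore_command = 'cp /backup/wal/%f %p'
recovery_target_time = '2020-10-22 18:39:42+00'
recovery_target_action = 'promote'
EOF
touch $PGDATA/recovery.signal   # ask for recovery mode at startup (PostgreSQL 12+)

pg_ctl -D $PGDATA start         # replays WAL up to the target time, then promotes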

It seems quite straightforward: just select “Restore from backup”, then “Choose point in time”, then wait and connect with the provided database host:

Oooops

But… look at the psql screenshot connecting to the restored database, there’s a problem here. Here is what was restored:


[opc@a ~]$ PGPASSWORD=l3htrqdpa0zubbgu psql -U doadmin -h db-postgresql-fra1-63512-oct-22-backup-do-user-8199131-0.a.db.ondigitalocean.com
-p 25060 -d defaultdb --set=sslmode=require
psql (12.4)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.

defaultdb=> select count(*),min(ts),max(ts) from DEMO;

 count |              min              |              max
-------+-------------------------------+-------------------------------
  4866 | 2020-10-21 07:13:52.159067+00 | 2020-10-22 03:39:29.391713+00
(1 row)

defaultdb=> \q

The latest row is from 2020-10-22 03:39:29 UTC, which is Oct 22 05:39:29 CEST… 15 hours of data lost! I've taken all the screenshots and I don't see any mistake in the point in time I've set (actually there's one: the screenshot here shows 10pm where it should be 8pm, but at 10pm the table was already dropped, so same conclusion). Don't rely on what you've heard. Don't even trust the documentation. Always test it. Here, I clearly missed something, whether it is a bug or a feature.

I tried again, and then realized that when you select a day for the “point in time”, the hour/minute/seconds are set to a default. You can change them, but the form shows a message below. When you select the first day available you get “This is the minimum time you can select.”, which makes sense as you cannot recover to before the first backup. But you also get “This is the maximum time you can select.” when you choose the last day. Does this mean that you cannot recover beyond the last backup? I played with some variations, moving the hour/minute/second around the proposed date:


We probably have a bug in the GUI there. If 05:51:08 is the last point in time that can be recovered, how can I select 06:50:08 without a warning?

I did a final test to see if I can recover to any point-in-time when it is between the first and last backup:


[opc@a tmp]$ PGPASSWORD=l3htrqdpa0zubbgu psql -U doadmin -h db-postgresql-fra1-63512-oct-21-backup-do-user-8199131-0.a.db.ondigitalocean.com -p 25060 -d defaultdb --set=sslmode=require
psql (12.4)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.

defaultdb=> select count(*),min(ts),max(ts) from DEMO;

 count |              min              |             max
-------+-------------------------------+------------------------------
  3083 | 2020-10-21 07:13:52.159067+00 | 2020-10-21 20:10:05.42378+00
(1 row)

This is fine. I selected Oct 21 22:10:10 CEST and I get 20:10:05 UTC – no data loss here.

So, in summary, on this DigitalOcean service you can recover to any point in time between the first backup available (7 days ago) and the last backup available (which can be up to 24 hours ago). This clearly means that they have the data file backups and the WAL. Such granularity cannot be achieved with a dump. But it seems that they only consider the WAL that was there at the time of the last backup. Technically, this is a pity because the WAL is there: the database could be restored to the point of failure without any data loss. But this service is cheap, so maybe they do it on purpose to sell another one with a lower RPO? They have a service with a standby database, where the WAL is streamed to another node. In any case, I think that the documentation and the GUI are not clear, and this will, unfortunately, continue to feed the myth that the last backup is the highest point in time that can be recovered, which is wrong. For user errors (like dropping a table or a database) all the WAL is available and this can be recovered with no data loss.

Update one day later

So, I was unable to recover to the point in time just before the failure because this DigitalOcean managed database ignores the WAL generated after the last backup. What is sad is that the WAL is still there. One day later (so, one backup later) the no-data-loss point in time I wanted to restore to is within the recoverable range. Let's try again to validate my guess:


Perfect. I recovered all my data. Even more than I thought: there was an insert at 20:39:40 just 200 milliseconds before the drop of the table.

This proves that the backup technique is perfectly ok, ready to provide a no-data-loss point-in-time recovery with low RTO and RPO. But, through a lack of understanding of how database backup recovery works (the backup of files plus the WAL), a buggy GUI, and a documentation mistake, the user has lost 7 hours of data. If they had kept the service for one more day (for a cost of only 50 cents) they could have recovered it on the next day (actually a few hours later, as they got the problem at night) and pg_dump it to the production (yes, pg_dump is not a backup, but it can help with data movement after a backup recovery). But they terminated the service thinking all was lost, and all the recovery files (backups and WAL) are gone with it. That's another problem: many managed database cloud services consider the database backups as pertaining to the database service and remove them when you terminate the service. That makes no sense: the backup should also protect you from this kind of mistake. Backups must go to object storage and stay there, for the retention duration, even when the database is gone.

The article What is a database backup (back to the basics) first appeared on the dbi services Blog.

PostgreSQL in AWS: clearing the doubts


By Franck Pachot

I've heard and read people saying that the managed PostgreSQL service in AWS is not the true open-source PostgreSQL from the community.
This is wrong, and I'm writing this post to clarify it.

PostgreSQL on EC2

Obviously, you can install PostgreSQL on an EC2 instance, as a database running on IaaS (Infrastructure as a Service). You have the full choice of version, you can even compile it from sources, and add whatever extensions you want. This has the lowest cost because PostgreSQL is free of any subscription. But you need to do all the "Ops" work (so the TCO may be higher than what you think). Please take care of your backups if you do that. There's a trend to build microservices with the database embedded alongside the stateless application, and people forget that the database is a stateful component (we called that persistent 15 years ago, or durable 30 years ago) that cannot simply be stopped and started elsewhere. But if you consider the cloud as a hosting solution, installing PostgreSQL on EC2 + EBS is a valid choice. There is no doubt about this: you run the community PostgreSQL.

Managed PostgreSQL on RDS

Here is where I've heard some wrong messages, so let's be clear: Amazon RDS for PostgreSQL runs the real PostgreSQL, compiled from the PostgreSQL community sources. RDS is the family name for all managed relational databases, and this includes open-source databases (PostgreSQL, MySQL, MariaDB), some commercial databases (Oracle Database, Microsoft SQL Server), and Amazon Aurora (I will talk about it later). Here is how you create a PostgreSQL database in RDS: you select "PostgreSQL" with the PostgreSQL logo, and you can choose mostly any supported version (at the time of writing: any minor version between 9.5.2 and 12.4):

There is no ambiguity there: only one service has the PostgreSQL name and logo. You cannot mistakenly select Aurora here. If you create an Amazon RDS PostgreSQL service, you have the “real” PostgreSQL. And you can even do that on the Free tier.

You can check the version and compilation:


$ PGHOST=database-1.ce5fwv4akhjp.eu-central-1.rds.amazonaws.com PGPORT=5432 PGPASSWORD=postgres psql -U postgres

psql (12.4)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.

postgres=> select version();
                                                version
--------------------------------------------------------------------------------------------------------
 PostgreSQL 12.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
(1 row)

The only difference in the features is that, as it is a managed database, you don’t have all privileges:


postgres=> \du
                                                                     List of roles
    Role name    |                         Attributes                         |                          Member of

-----------------+------------------------------------------------------------+---------------------------------------------------
 postgres        | Create role, Create DB                                    +| {rds_superuser}
                 | Password valid until infinity                              |
 rds_ad          | Cannot login                                               | {}
 rds_iam         | Cannot login                                               | {}
 rds_password    | Cannot login                                               | {}
 rds_replication | Cannot login                                               | {}
 rds_superuser   | Cannot login                                               | {pg_monitor,pg_signal_backend,rds_replication,rds_password}
 rdsadmin        | Superuser, Create role, Create DB, Replication, Bypass RLS+| {}
                 | Password valid until infinity                              |
 rdsrepladmin    | No inheritance, Cannot login, Replication                  | {}

postgres=> select * from pg_hba_file_rules;

 line_number | type  |   database    | user_name  | address  | netmask | auth_method | options | error
-------------+-------+---------------+------------+----------+---------+-------------+---------+-------
           4 | local | {all}         | {all}      |          |         | md5         |         |
          10 | host  | {all}         | {rdsadmin} | samehost |         | md5         |         |
          11 | host  | {all}         | {rdsadmin} | 0.0.0.0  | 0.0.0.0 | reject      |         |
          12 | host  | {rdsadmin}    | {all}      | all      |         | reject      |         |
          13 | host  | {all}         | {all}      | 0.0.0.0  | 0.0.0.0 | md5         |         |
          14 | host  | {replication} | {all}      | samehost |         | md5         |         |

But this is exactly the same as a PostgreSQL installed from the community sources where you are not the superuser.
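
If you want to check this for yourself from the connected session, a small query is enough (a sketch of my own, not an official RDS procedure; pg_roles and pg_has_role() are standard PostgreSQL):


postgres=> select current_user,
                  rolsuper,
                  pg_has_role(current_user, 'rds_superuser', 'member') as member_of_rds_superuser
             from pg_roles
            where rolname = current_user;

On RDS the master user is not a real superuser (rolsuper is false) but, as the \du output above shows, it is a member of rds_superuser.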

There are a few additional RDS-specific libraries (which are not open source):


postgres=> show shared_preload_libraries;

  shared_preload_libraries
-----------------------------
 rdsutils,pg_stat_statements

postgres=> select name,setting from pg_settings where name like 'rds.%';

                  name                  |                                                                                                                                                                                                                                                                                                                                                                                          setting                                                                                                                                                                                                                                                                                            
----------------------------------------+------------------------------------------------------------------------
 rds.extensions                         | address_standardizer, address_standardizer_data_us, amcheck, aws_commons, aws_s3, bloom, btree_gin, btree_gist, citext, cube, dblink, dict_int, dict_xsyn, earthdistance, fuzzystrmatch, hll, hstore, hstore_plperl, intagg, intarray, ip4r, isn, jsonb_plperl, log_fdw, ltree, orafce, pageinspect, pgaudit, pgcrypto, pglogical, pgrouting, pgrowlocks, pgstattuple, pgtap, pg_buffercache, pg_freespacemap, pg_hint_plan, pg_prewarm, pg_proctab, pg_repack, pg_similarity, pg_stat_statements, pg_transport, pg_trgm, pg_visibility, plcoffee, plls, plperl, plpgsql, plprofiler, pltcl, plv8, postgis, postgis_tiger_geocoder, postgis_raster, postgis_topology, postgres_fdw, prefix, rdkit, sslinfo, tablefunc, test_parser, tsm_system_rows, tsm_system_time, unaccent, uuid-ossp
 rds.force_admin_logging_level          | disabled
 rds.force_autovacuum_logging_level     | info
 rds.internal_databases                 | rdsadmin,template0
 rds.logical_replication                | off
 rds.rds_superuser_reserved_connections | 2
 rds.restrict_logical_slot_creation     | off
 rds.restrict_password_commands         | off
 rds.superuser_variables                | session_replication_role
 rds.tablespace_path_prefix             | /rdsdbdata/db/base/tablespace

This is still the community edition, which allows extensibility.
There are also some additional functionalities, like RDS Performance Insights, which shows the database activity with time on the x-axis and active sessions on the y-axis, drilling down to wait events.
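
As a quick illustration of that extensibility (a sketch; pg_trgm is one of the extensions listed in rds.extensions above), you can install an extension with a plain CREATE EXTENSION, no real superuser needed:


postgres=> create extension if not exists pg_trgm;
postgres=> select extname, extversion from pg_extension where extname = 'pg_trgm';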

Out of curiosity, I've run the regression tests that are provided by the PostgreSQL distribution:


PGHOST=database-1.ce5fwv4akhjp.eu-central-1.rds.amazonaws.com PGPORT=5432 PGPASSWORD=postgres psql -U postgres -c "select version();"
cd /var/tmp
git clone --branch REL_12_STABLE https://github.com/postgres/postgres.git
cd postgres
./configure
cd src/test/regress
export PGHOST=database-1.ce5fwv4akhjp.eu-central-1.rds.amazonaws.com
export PGPORT=5432
export PGPASSWORD=postgres
export PGUSER=postgres
make installcheck

It starts by setting some parameters, which fails because of missing privileges:


============== dropping database "regression"         ==============
DROP DATABASE
============== creating database "regression"         ==============
CREATE DATABASE
ERROR:  permission denied to set parameter "lc_messages"
command failed: "/usr/local/pgsql/bin/psql" -X -c "ALTER DATABASE \"regression\" SET lc_messages TO 'C';ALTER DATABASE \"regression\" SET lc_m
onetary TO 'C';ALTER DATABASE \"regression\" SET lc_numeric TO 'C';ALTER DATABASE \"regression\" SET lc_time TO 'C';ALTER DATABASE \"regressio
n\" SET bytea_output TO 'hex';ALTER DATABASE \"regression\" SET timezone_abbreviations TO 'Default';" "regression"
make: *** [installcheck] Error 2

In a managed database, the cloud provider needs to lock down some administration commands in order to secure its platform, but we have the possibility to define those parameters by creating a parameter group. This is what I did in the console, and then I removed those ALTER DATABASE statements from pg_regress.c:


sed -ie 's/"ALTER DATABASE/--&/' pg_regress.c

Then, I’m ready to run the regression tests:


[opc@a regress]$ make installcheck

...

../../../src/test/regress/pg_regress --inputdir=. --bindir='/usr/local/pgsql/bin'    --dlpath=. --max-concurrent-tests=20  --schedule=./serial_schedule
(using postmaster on database-1.ce5fwv4akhjp.eu-central-1.rds.amazonaws.com, port 5432)
============== dropping database "regression"         ==============
DROP DATABASE
============== creating database "regression"         ==============
CREATE DATABASE
============== running regression test queries        ==============
test tablespace                   ... FAILED     1406 ms
test boolean                      ... ok          684 ms
test char                         ... ok          209 ms
test name                         ... ok          311 ms
test varchar                      ... ok          201 ms
test text                         ... ok          549 ms
test int2                         ... ok          354 ms
test int4                         ... ok          580 ms
test int8                         ... ok          954 ms
test oid                          ... ok          222 ms
test float4                       ... FAILED      660 ms
test float8                       ... FAILED     1136 ms
test bit                          ... ok         1751 ms
test numeric                      ... ok         5388 ms

...

test hash_part                    ... ok          240 ms
test indexing                     ... FAILED     6406 ms
test partition_aggregate          ... ok         1569 ms
test partition_info               ... ok          620 ms
test event_trigger                ... FAILED     1237 ms
test fast_default                 ... ok         1990 ms
test stats                        ... FAILED      643 ms

======================================================
 90 of 194 tests failed, 1 of these failures ignored.
======================================================

The differences that caused some tests to fail can be viewed in the
file "/var/tmp/postgres/src/test/regress/regression.diffs".  A copy of the test summary that you see
above is saved in the file "/var/tmp/postgres/src/test/regress/regression.out".

There are multiple tests that fail because of missing privileges. Actually, pg_regress expects to have all privileges. Using pg_regress is not a good idea for testing that RDS PostgreSQL behaves as expected. It is not made for functional tests.

PostgreSQL-like API on RDS Aurora

This is where the confusion comes from. Aurora is a proprietary database built by Amazon. They started it by forking MySQL and modifying the storage layer in order to build a cloud-native database. Rather than storing the database files in EBS block storage attached to the database node where the instance is running, like all other RDS databases, the database service is split into separate (micro)services for the compute (EC2) and the storage (distributed over multiple AZs to provide High Availability, similar to the DynamoDB storage). The Aurora code is not open source and is very different from the community MySQL, more and more so because of many improvements. Part of it runs on EC2 to parse, optimize and execute SQL statements and transactions, and to update the buffers in cache. And part of it runs in the storage servers, which receive the redo to apply it to the data file blocks, which are distributed and shared with reader instances. This was in 2014, and Amazon has always presented Aurora as another database engine in RDS. I've copy-pasted below a few archive.org snapshots of the https://aws.amazon.com/rds/ page if you want to look at the history. Amazon modified the lower layers of the code a lot but kept the upper layer to stay compatible with MySQL (version 5) in order to ease the application migration to Aurora. And this is how it is presented: Aurora is a new database with MySQL compatibility. Once you select the Aurora service in RDS, you can choose the version, which mentions which MySQL version it is compatible with. For example, the latest version Aurora 2.09.. is labelled "Aurora (MySQL 5.7)". Note that there are not only modifications to adapt to the cloud-native storage; Amazon also brings some interesting improvements, unfortunately not given back to the open-source community.

Here are the screenshots from archive.org where you can see, from the beginning, that RDS for Aurora is different from RDS for MySQL. And when MySQL or PostgreSQL is mentioned with Aurora, there's always the "compatible" mention:

What I mean by this is that, in my opinion, there was no ambiguity from Amazon about what is the open-source database and what is their proprietary engine.

That's a long story about the MySQL-compatible Aurora, because it was the only compatible API from 2014 to 2017. Then, as they had a strong storage engine and well-layered code, they built another flavor of Aurora, taking the PostgreSQL upper layer to replace the MySQL one. Now they have two proprietary databases: Aurora with MySQL compatibility and Aurora with PostgreSQL compatibility. As far as I know it has always been clear that it is only "compatibility", and Aurora has never been advertised as a real PostgreSQL engine. They have RDS PostgreSQL for that. And I think they have different use cases. Aurora is probably the best choice when High Availability, scalability and elasticity are the most important. PostgreSQL may provide lower latency on block storage, and is probably cheaper (but Aurora Serverless can also reduce the cost for rarely used databases).

However, there is still some confusion, and some Aurora users are asking the PostgreSQL community for help. The PostgreSQL community helps its users a lot because they know the engine (which is very well documented, and the source code and source comments are accessible). But they cannot do anything for Aurora, as we don't even know what has been modified. As an example, I mentioned in a previous post that some wait events are specific to Aurora – don't ask the postgres mailing lists about those. I also think that the PostgreSQL community would appreciate it if the improvements made by Amazon on the PostgreSQL code were shared with the community.

PostgreSQL on other public clouds

Not only is Amazon RDS PostgreSQL the real PostgreSQL from the community, it is also, in my opinion, one of the most advanced managed services for it.

  • Oracle Cloud provides no managed service for PostgreSQL but only a Bitnami image. I hope one day we will have a managed service with the same quality as the MySQL one.
  • Azure provides an old service running on Windows, but that will change as they bought Citus Data and hired well-known PostgreSQL contributors.
  • Google Cloud SQL has a decent managed service, but I still need to look at their backups.

Talking about backups, Amazon RDS PostgreSQL has all the features needed to avoid or recover from mistakes, which I listed in a 5-point tweet:
https://twitter.com/FranckPachot/status/1325565762946396162?s=20

The article PostgreSQL in AWS: clearing the doubts first appeared on the dbi services Blog.

Swiss Cloud Provider: Exoscale


Background

More and more companies are moving from on-premises hosting to cloud hosting. There are many justifications to move to the cloud, but also many justifications not to. It depends on the company.

Advantages of the cloud certainly include:

  • With just a few clicks you can scale your environment.

  • It can be extremely cost-effective, depending on what you need.

  • Reduced IT administration effort.

  • No more investment costs for server hardware.

  • Access to geographically distributed IT resources regardless of device, time and location.

If you are hosting personal data, then you need to care about privacy and the regulations of your local state/country, otherwise you could have issues with the GDPR. Because of this, the server location is very important. Exoscale is a Swiss company focusing on cloud computing. Their goal is simple: to be a solid European cloud hosting alternative. Servers are located in Switzerland, Austria, Germany and Bulgaria.

Usability

Once you have created a login and successfully logged in, you will notice at first glance that the website is kept very simple and is easy to understand. Before you create a new instance, I would recommend generating a new SSH key pair and uploading your public key to Exoscale.

Linux / Mac users:

ssh-keygen

Windows users:

ssh-keygen.exe

And then upload your public key under Compute -> SSH-KEYS -> Add

For security reasons, we recommend creating a new firewall rule which allows SSH connections only from your public IP address. For that, find out your public IP address and open Compute -> Firewalling -> default

As you see, it’s very simple.

Options

Templates

Creating a new server and connecting through SSH takes less than 5 minutes. Exoscale provides the most-used Linux distributions (CentOS, RedHat, Ubuntu, Debian, SLES, Fedora), OpenBSD, Windows Server 2016/2019 and other templates like CheckPoint, Cisco ASAv and F5 BIG-IP…

You can select a default instance type or choose between storage-, memory- and CPU-focused server types. Of course, you can upgrade/downgrade the size of your server, otherwise cloud computing would not make much sense 🙂 After the server has been successfully created, the login password is displayed and you can log in through SSH.

CLI

Like every cloud provider, Exoscale provides a CLI too; otherwise automating processes would be impossible. After you have installed the CLI from this link and configured your IAM user, you are ready to go. It's very easy to use: just type "exo" and you will get all available commands.

 

For example, if you want to create a template from one of your snapshots:

exo vm snapshot list
exo vm template register centos8-pg13 --from-snapshot 903fd897-380a-41e2-8425-c4e5488d0951 --description "CentOS 8 server with PostgreSQL 13"

What did we test?

We installed PostgreSQL 13 on CentOS 8 and Debian 10. To manage multiple PostgreSQL instances easily, we also installed our DMK, the Database Management Kit. We also increased the disk space, which worked without problems.
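
A quick sanity check after such an installation can be as simple as this (a sketch run in psql on the new server; nothing Exoscale-specific about it):


postgres=# select version();
postgres=# show data_directory;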

The good side:

First of all, it's a Swiss company and they provide 6 locations in 4 countries:

  • Switzerland, Zürich (CH-DK-2)

  • Switzerland, Geneva (CH-GVA-2)

  • Austria, Vienna (AT-VIE-1)

  • Germany, Munich (DE-MUC-1)

  • Germany, Frankfurt (DE-FRA-1)

  • Bulgaria, Sofia (BG-SOF-1)

For more information: https://www.exoscale.com/datacenters/. As you can see from the screenshots, it is very simple to use, and the CLI is very easy as well. They provide very good and understandable documentation: https://community.exoscale.com/

The bad side:

  • If you want to split your database volume from your OS volume by adding additional volumes, that's not possible. Exoscale provides object storage, which is like AWS S3 for storing files, but that means you cannot mount it into your OS to use it for databases.

  • Once you create a new instance from your own template, you are not able to create a user at the same time.

The article Swiss Cloud Provider: Exoscale first appeared on the dbi services Blog.


Loading data from S3 to AWS RDS for PostgreSQL


AWS RDS for PostgreSQL comes with an extension that allows you to fetch data from AWS S3 and to write back data to AWS S3. The use case for this is obvious: Either you use other AWS services that write data to S3 and you want to further process that data in PostgreSQL, or you want other AWS services to consume data from PostgreSQL by providing that data in S3. Let’s have a look at how that works.

The extension that AWS provides for working with S3 from inside PostgreSQL is called "aws_s3":

postgres=> select * from pg_available_extensions where name like '%aws%';
    name     | default_version | installed_version |                   comment                   
-------------+-----------------+-------------------+---------------------------------------------
 aws_commons | 1.0             |                   | Common data types across AWS services
 aws_s3      | 1.0             |                   | AWS S3 extension for importing data from S3
(2 rows)

If you try to install the extension you’ll notice that there is a dependency on the “aws_commons” extension:

postgres=> create extension aws_s3;
ERROR:  required extension "aws_commons" is not installed
HINT:  Use CREATE EXTENSION ... CASCADE to install required extensions too.

You can install both extensions in one step using the “CASCADE” option:

postgres=> create extension aws_s3 cascade;
NOTICE:  installing required extension "aws_commons"
CREATE EXTENSION

These extensions provide a couple of helper functions (aws_commons) and the function to import a file from S3 (aws_s3):

postgres=> \dx+ aws_commons
             Objects in extension "aws_commons"
                     Object description                      
-------------------------------------------------------------
 function aws_commons.create_aws_credentials(text,text,text)
 function aws_commons.create_s3_uri(text,text,text)
 schema aws_commons
 type aws_commons._aws_credentials_1
 type aws_commons._s3_uri_1
(5 rows)

postgres=> \dx+ aws_s3
                                       Objects in extension "aws_s3"
                                            Object description                                             
-----------------------------------------------------------------------------------------------------------
 function aws_s3.table_import_from_s3(text,text,text,aws_commons._s3_uri_1,aws_commons._aws_credentials_1)
 function aws_s3.table_import_from_s3(text,text,text,text,text,text,text,text,text)
 schema aws_s3
(3 rows)

Having the extension ready, we need a file we can import, so let's create one (exactly the same file as in the previous post, but with fewer rows):

dwe@dwe:~/Downloads$ cat gen_data.sh 
#!/bin/bash
 
FILE="/home/dwe/Downloads/sample.csv"
rm -rf ${FILE}
 
for i in {1..1000000}; do
    echo "${i},firstname${i},lastname${i},xxx${i}@xxx.com,street${i},country${i},description${i}" >> ${FILE}
done

dwe@dwe:~/Downloads$ chmod +x gen_data.sh 
dwe@dwe:~/Downloads$ ./gen_data.sh 
dwe@dwe:~/Downloads$ head -5 sample.csv 
1,firstname1,lastname1,xxx1@xxx.com,street1,country1,description1
2,firstname2,lastname2,xxx2@xxx.com,street2,country2,description2
3,firstname3,lastname3,xxx3@xxx.com,street3,country3,description3
4,firstname4,lastname4,xxx4@xxx.com,street4,country4,description4
5,firstname5,lastname5,xxx5@xxx.com,street5,country5,description5
dwe@dwe:~/Downloads$ ls -lha sample.csv 
-rw-rw-r-- 1 dwe dwe 96M Nov 10 11:11 sample.csv

We'll be using a new bucket for this demo, so let's create one and then upload the file we just generated:

dwe@dwe:~/Downloads$ aws s3 mb s3://s3-rds-demo --region eu-central-1
make_bucket: s3-rds-demo
dwe@dwe:~/Downloads$ aws s3 cp sample.csv s3://s3-rds-demo/
upload: ./sample.csv to s3://s3-rds-demo/sample.csv         

Before we can do anything against S3 from RDS for PostgreSQL, we need to set up the required permissions. You can use security credentials for this, but it is recommended to use IAM roles and policies. The first step is to create a policy that allows listing the bucket, reading and writing (write is required for writing data to S3 later on):

dwe@dwe:~$ aws iam create-policy \
>    --policy-name rds-s3-policy \
>    --policy-document '{
>      "Version": "2012-10-17",
>      "Statement": [
>        {
>          "Sid": "s3import",
>          "Action": [
>            "s3:GetObject",
>            "s3:ListBucket",
>            "S3:PutObject"
>          ],
>          "Effect": "Allow",
>          "Resource": [
>            "arn:aws:s3:::s3-rds-demo", 
>            "arn:aws:s3:::s3-rds-demo/*"
>          ] 
>        }
>      ] 
>    }' 
{
    "Policy": {
        "PolicyName": "rds-s3-policy",
        "PolicyId": "ANPA2U57KX3NFH4HU4COG",
        "Arn": "arn:aws:iam::xxxxxxxx:policy/rds-s3-policy",
        "Path": "/",
        "DefaultVersionId": "v1",
        "AttachmentCount": 0,
        "PermissionsBoundaryUsageCount": 0,
        "IsAttachable": true,
        "CreateDate": "2020-11-10T12:04:34+00:00",
        "UpdateDate": "2020-11-10T12:04:34+00:00"
    }
}

Once the policy is in place, we create an IAM role to which the policy we just created will be attached:

dwe@dwe:~$ aws iam create-role \
>    --role-name rds-s3-role \
>    --assume-role-policy-document '{
>      "Version": "2012-10-17",
>      "Statement": [
>        {
>          "Effect": "Allow",
>          "Principal": {
>             "Service": "rds.amazonaws.com"
>           },
>          "Action": "sts:AssumeRole"
>        }
>      ] 
>    }'
{
    "Role": {
        "Path": "/",
        "RoleName": "rds-s3-role",
        "RoleId": "AROA2U57KX3NP2XWVCELI",
        "Arn": "arn:aws:iam::xxxxxxxxxx:role/rds-s3-role",
        "CreateDate": "2020-11-10T12:07:20+00:00",
        "AssumeRolePolicyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {
                        "Service": "rds.amazonaws.com"
                    },
                    "Action": "sts:AssumeRole"
                }
            ]
        }
    }
}

Attaching the policy to the role (you will need the ARN of the policy from above):

dwe@dwe:~$ aws iam attach-role-policy \
>    --policy-arn arn:aws:iam::xxxxxxxxxx:policy/rds-s3-policy \
>    --role-name rds-s3-role

Finally you need to attach the IAM role to the RDS instance by providing the ARN of the role and the identifier of your RDS instance:

aws rds add-role-to-db-instance \
   --db-instance-identifier dwe-postgres-helvetia \
   --feature-name s3Import \
   --role-arn arn:aws:iam::xxxxxxxx:role/rds-s3-role   \
   --region eu-central-1

Your RDS instance needs to be running to do that, otherwise you’ll get this:

An error occurred (InvalidDBInstanceState) when calling the AddRoleToDBInstance operation: The status for the dwe-postgres DB instance is stopped. The DB instance is not available for s3Import feature.

With the IAM role attached to the RDS instance, we can load the CSV, but first the S3 URI needs to be defined (we do not want to use access keys and credentials):

postgres=> SELECT aws_commons.create_s3_uri('s3-rds-demo'
postgres(>                                 ,'sample.csv'
postgres(>                                 ,'eu-central-1'
postgres(>                                 ) AS s3_uri \gset
postgres=> select :'s3_uri';
               ?column?                
---------------------------------------
 (s3-rds-demo,sample.csv,eu-central-1)
(1 row)

Now we are ready to load the file:

postgres=> create table sample ( id int primary key
postgres(>                              , firstname varchar(20)
postgres(>                              , lastname varchar(20)
postgres(>                              , email varchar(20)
postgres(>                              , street varchar(20)
postgres(>                              , country varchar(20)
postgres(>                              , description varchar(20)
postgres(>                              );
CREATE TABLE
postgres=> SELECT aws_s3.table_import_from_s3 ( 'sample'
                                   , ''
                                   , '(format csv)'
                                   , :'s3_uri'
                                   );
                                 table_import_from_s3                                 
--------------------------------------------------------------------------------------
 1000000 rows imported into relation "sample" from file sample.csv of 100222272 bytes
(1 row)
postgres=> select * from sample limit 5;
 id |  firstname  |  lastname  |     email     |  street  |  country  |  description  
----+-------------+------------+---------------+----------+-----------+---------------
 77 | firstname77 | lastname77 | xxx77@xxx.com | street77 | country77 | description77
 78 | firstname78 | lastname78 | xxx78@xxx.com | street78 | country78 | description78
 79 | firstname79 | lastname79 | xxx79@xxx.com | street79 | country79 | description79
  1 | firstname1  | lastname1  | xxx1@xxx.com  | street1  | country1  | description1
  2 | firstname2  | lastname2  | xxx2@xxx.com  | street2  | country2  | description2
(5 rows)

And we're done. The follow-up post will show the opposite: writing back to S3 from RDS for PostgreSQL.
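
As a teaser for that follow-up post: the export direction is handled by aws_s3.query_export_to_s3(), which does not show up in the \dx+ output above because it only ships with more recent versions of the extension/engine. A rough sketch, assuming it is available on your instance (the target file name is just an example):


postgres=> SELECT * FROM aws_s3.query_export_to_s3(
postgres(>     'select * from sample',
postgres(>     aws_commons.create_s3_uri('s3-rds-demo','sample_export.csv','eu-central-1'),
postgres(>     'format csv');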

The article Loading data from S3 to AWS RDS for PostgreSQL first appeared on the dbi services Blog.

PostgreSQL 14 will support OUT parameters for procedures


Before PostgreSQL 11 there was no way to work with procedures in PostgreSQL; only functions were supported. Since PostgreSQL 11, procedures are supported, and many waited for that because procedures also brought transaction control (commit/rollback), which is not possible with functions. Next year, when PostgreSQL 14 is released, there will also be support for OUT parameters. Currently only IN, INOUT and VARIADIC are supported. This makes PostgreSQL's procedures more compatible with Oracle's implementation of procedures, so let's have a look.

The only way to return something from a procedure currently is to use an INOUT parameter:

postgres=# select version();
                                                          version                                                          
---------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 13.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.4.1 20200928 (Red Hat 8.4.1-1), 64-bit
(1 row)

postgres=# create table t1 ( a int, b text);
CREATE TABLE
postgres=# insert into t1 values (1,'aa');
INSERT 0 1
postgres=# create or replace procedure p1 ( p_val inout int )
postgres-# as $$
postgres$# declare
postgres$# begin
postgres$#   select a
postgres$#     into p_val
postgres$#    from t1;
postgres$# end;
postgres$# $$ language plpgsql;
CREATE PROCEDURE

This simple procedure has one INOUT parameter and the parameter will contain the result of the select once the procedure is executed:

postgres=# call p1(5);
 p_val 
-------
     1
(1 row)

You can, of course, use multiple INOUT parameters as well:

postgres=# create or replace procedure p1 ( p_val inout int 
postgres(#                                , p_val2 inout text)
postgres-# as $$
postgres$# declare
postgres$# begin
postgres$#   select a, b
postgres$#     into p_val, p_val2
postgres$#    from t1;
postgres$# end;
postgres$# $$ language plpgsql;
CREATE PROCEDURE
postgres=# call p1 (5,'ccc');
 p_val | p_val2 
-------+--------
     1 | aa
(1 row)

But if you try to use an OUT parameter this will not work:

postgres=# create or replace procedure p1 ( p_val out int )
postgres-# as $$
postgres$# declare
postgres$# begin
postgres$#   select a
postgres$#     into p_val
postgres$#    from t1;
postgres$# end;
postgres$# $$ language plpgsql;
ERROR:  procedures cannot have OUT arguments
HINT:  INOUT arguments are permitted.
postgres=# 

This will change with PostgreSQL 14:

postgres=# select version();
                                                  version                                                  
-----------------------------------------------------------------------------------------------------------
 PostgreSQL 14devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.4.1 20200928 (Red Hat 8.4.1-1), 64-bit
(1 row)

postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# insert into t1 values (1,'aaa');
INSERT 0 1
postgres=# create or replace procedure p1 ( p_val out int )
postgres-# as $$
postgres$# declare
postgres$# begin
postgres$#   select a
postgres$#     into p_val
postgres$#    from t1;
postgres$# end;
postgres$# $$ language plpgsql;
CREATE PROCEDURE

To test that, you need to declare a variable to hold the value that will be returned:

postgres=# do
postgres-# $$
postgres$# declare 
postgres$#   n int;
postgres$# begin
postgres$#   call p1(n);
postgres$#   raise notice '%', n;
postgres$# end;
postgres$# $$;
NOTICE:  1
DO

Nice.

The article PostgreSQL 14 will support OUT parameters for procedures first appeared on the dbi services Blog.

Will PostgreSQL 14 finally come with schema variables?


One of the bits you need to solve when you migrate from Oracle to PostgreSQL is this: In the Oracle database there are PL/SQL packages, and some of those have package variables defined. PostgreSQL does not know the concept of a package, but you can use schemas to group your PL/pgSQL functions and procedures. When it comes to package variables there is currently no easy solution.

There are workarounds to emulate that, but they are nasty and not the intended use case of the features they abuse. Schema variables have been under discussion for quite some time, and the good news is that the patch is now "Ready for Committer". This still is not a guarantee that PostgreSQL 14 will come with schema variables, but at least it is close to that.
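
One workaround that is sometimes used (a sketch of my own, with purely illustrative names; set_config() and current_setting() are standard PostgreSQL functions) abuses custom configuration parameters to emulate a session-scoped package variable:


postgres=# select set_config('mypkg.counter', '42', false);  -- false: session scope, not transaction-local
postgres=# select current_setting('mypkg.counter')::int + 1;

It works, but everything is stored as text, there is no type safety and no access control, which is exactly what schema variables are meant to provide.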

As always, let's do some simple demos to understand what is going on and how the feature might help once it is committed. The simplest example of a schema variable is this:

postgres=# create variable var1 as int;
CREATE VARIABLE

As the name of the feature (schema variables) implies, a variable is created in a specific schema. As I have not modified the default search_path the variable got created in the public schema. You can easily check this in the new catalog table pg_variable:

postgres=# select varname, varnamespace::regnamespace from pg_variable;
 varname | varnamespace 
---------+--------------
 var1    | public
(1 row)

By default, once a variable is created, it is persistent and available again after the instance is restarted:

postgres=# \! pg_ctl restart
waiting for server to shut down.... done
server stopped
waiting for server to start....2020-11-19 01:29:48.825 CET - 1 - 80179 -  - @ LOG:  redirecting log output to logging collector process
2020-11-19 01:29:48.825 CET - 2 - 80179 -  - @ HINT:  Future log output will appear in directory "pg_log".
 done
server started
postgres=# select 1;
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
postgres=# select var1;
 var1 
------
     
(1 row)

Up to now the variable does not contain any value, as we have not assigned anything. To do that you can use LET:

postgres=# \h let
Command:     LET
Description: change a schema variable's value
Syntax:
LET schema_variable = sql_expression
LET schema_variable = DEFAULT

URL: https://www.postgresql.org/docs/devel/sql-let.html

Assigning a value can be as simple as this:

postgres=# let var1 = 1;
LET
postgres=# select var1;
 var1 
------
    1
(1 row)

.. or you can calculate the new value:

postgres=# let var1 = var1 * 2;
LET
postgres=# select var1;
 var1 
------
    2
(1 row)

The value assigned to a variable is not persistent; it lives only for the duration of the session:

postgres=# \! pg_ctl restart
waiting for server to shut down.... done
server stopped
waiting for server to start....2020-11-19 01:44:42.837 CET - 1 - 80305 -  - @ LOG:  redirecting log output to logging collector process
2020-11-19 01:44:42.837 CET - 2 - 80305 -  - @ HINT:  Future log output will appear in directory "pg_log".
 done
server started
postgres=# select 1;
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
postgres=# select var1;
 var1 
------
     
(1 row)

If you want the value of a variable to be persistent, you need to make it immutable:

postgres=# create immutable variable var2 as int default 2;
CREATE VARIABLE
postgres=# select var2;
 var2 
------
    2
(1 row)

postgres=# \! pg_ctl restart
waiting for server to shut down.... done
server stopped
waiting for server to start....2020-11-19 01:58:53.365 CET - 1 - 80414 -  - @ LOG:  redirecting log output to logging collector process
2020-11-19 01:58:53.365 CET - 2 - 80414 -  - @ HINT:  Future log output will appear in directory "pg_log".
 done
server started
postgres=# select 1;
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
postgres=# select var2;
 var2 
------
    2
(1 row)

It is important to understand that variables are not transaction-safe by default:

postgres=# create variable var3 as int default 3;
CREATE VARIABLE 
postgres=# select var3;
 var3 
------
    3
(1 row)

postgres=# begin;
BEGIN
postgres=*# let var3=4;
LET
postgres=*# rollback;
ROLLBACK
postgres=# select var3;
 var3 
------
    4
(1 row)

But you can have transactional behavior if you want:

postgres=# create variable var4 as int default 5 on transaction end reset;
CREATE VARIABLE
postgres=# begin;
BEGIN
postgres=*# let var4 = 10;
LET
postgres=*# select var4;
 var4 
------
   10
(1 row)

postgres=*# rollback;
ROLLBACK
postgres=# select var4;
 var4 
------
    5
(1 row)

Like tables, variables can also be temporary:

postgres=# create temporary variable var6 as int default -1;
CREATE VARIABLE
postgres=# \! pg_ctl restart
waiting for server to shut down.... done
server stopped
waiting for server to start....2020-11-19 02:22:22.308 CET - 1 - 80611 -  - @ LOG:  redirecting log output to logging collector process
2020-11-19 02:22:22.308 CET - 2 - 80611 -  - @ HINT:  Future log output will appear in directory "pg_log".
 done
server started
postgres=# select 1;
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
postgres=# select var6;
ERROR:  column "var6" does not exist
LINE 1: select var6;
               ^
postgres=# 

… and you can also specify to drop the variable at commit time:

postgres=# begin;
BEGIN
postgres=*# create temporary variable var7 as int default -1 on commit drop;
CREATE VARIABLE
postgres=*# let var7 = -9;
LET
postgres=*# commit;
COMMIT
postgres=# select var7;
ERROR:  column "var7" does not exist
LINE 1: select var7;
               ^
postgres=# 

Variables can be referenced in procedures, functions and in SQL:

postgres=# create variable var8 as int default 100;
CREATE VARIABLE
postgres=# create function f1() returns int as $$select var8;$$ language SQL;
CREATE FUNCTION
postgres=# select f1();
 f1  
-----
 100
(1 row)

postgres=# create procedure p1() as $$
postgres$# declare
postgres$# begin
postgres$#   let var8 = 101;
postgres$#   raise notice '%', var8;
postgres$# end;
postgres$# $$ language plpgsql;
CREATE PROCEDURE
postgres=# call p1();
NOTICE:  101
CALL

postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# select var8;
 var8 
------
  101
(1 row)

postgres=# insert into t1 values (101,'aaa');
INSERT 0 1
postgres=# select * from t1 where a = var8;
  a  |  b  
-----+-----
 101 | aaa
(1 row)

This is really a great feature and I do hope it finally gets committed for PostgreSQL 14.

The article Will PostgreSQL 14 finally come with schema variables? first appeared on the dbi services Blog.

PostgreSQL 14: Add the number of de-allocations to pg_stat_statements?


In the last post we had a look at an interesting patch (schema variables) that is currently waiting to be committed for PostgreSQL 14. Another patch that is currently in the same state is named "[PATCH] Add features to pg_stat_statements". As this does not give much information about what the patch does, here is a short summary: when you reach the maximum number of statements that can be tracked, pg_stat_statements will de-allocate entries to free space for other statements. As de-allocations use resources as well, and frequent de-allocations might also mean that pg_stat_statements is not configured the right way for your workload, this patch provides the following information: how many de-allocations happened.
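
A rough way to see how close you are to that limit is to compare the number of tracked statements with the configured maximum (a quick sketch; the view and current_setting() are standard):


postgres=# select count(*) as tracked_statements,
postgres-#        current_setting('pg_stat_statements.max')::int as max_statements
postgres-#   from pg_stat_statements;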

In the default configuration pg_stat_statements tracks a maximum of 5000 statements:

postgres=# show pg_stat_statements.max;
 pg_stat_statements.max 
------------------------
 5000
(1 row)

The lowest value allowed for pg_stat_statements.max is 100:

postgres=# alter system set pg_stat_statements.max=5;
ERROR:  5 is outside the valid range for parameter "pg_stat_statements.max" (100 .. 2147483647)
postgres=# alter system set pg_stat_statements.max=100;
ALTER SYSTEM

As this change requires a restart it should be set carefully:

postgres=# select context from pg_settings where name = 'pg_stat_statements.max';
  context   
------------
 postmaster
(1 row)

postgres=# \! pg_ctl restart
waiting for server to shut down.... done
server stopped
waiting for server to start....2020-11-20 01:00:54.656 CET - 1 - 102610 -  - @ LOG:  redirecting log output to logging collector process
2020-11-20 01:00:54.656 CET - 2 - 102610 -  - @ HINT:  Future log output will appear in directory "pg_log".
 done
server started
postgres=# show pg_stat_statements.max;
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
postgres=# show pg_stat_statements.max;
 pg_stat_statements.max 
------------------------
 100
(1 row)

Let's generate some statements and then check what additional information we get with this patch:

postgres=# select 'create table t'||i||'( a int, b int, c int)' from generate_series(1,101) i; \gexec
                ?column?                 
-----------------------------------------
 create table t1( a int, b int, c int)
 create table t2( a int, b int, c int)
 create table t3( a int, b int, c int)
 create table t4( a int, b int, c int)
 create table t5( a int, b int, c int)
 create table t6( a int, b int, c int)
 create table t7( a int, b int, c int)
 create table t8( a int, b int, c int)
 create table t9( a int, b int, c int)
...
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
postgres=# 

This created 101 tables and should have forced pg_stat_statements to de-allocate some older statements. There is a new, very simple view which provides this information:

postgres=# select * from pg_stat_statements_info;
 dealloc 
---------
       3
(1 row)

Nice. This will give you an idea whether pg_stat_statements.max is properly configured for your environment. If you manually reset pg_stat_statements, the de-allocation counter will also be reset:

postgres=# select * from pg_stat_statements_info;
 dealloc 
---------
       3
(1 row)

postgres=# select pg_stat_statements_reset();
 pg_stat_statements_reset 
--------------------------
 
(1 row)

postgres=# select * from pg_stat_statements_info;
 dealloc 
---------
       0
(1 row)

The article PostgreSQL 14: Add the number of de-allocations to pg_stat_statements? first appeared on the dbi services Blog.

PostgreSQL 14: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly


It is a common misunderstanding that VACUUM FULL saves you from running out of disk space if you already have space pressure. Running a VACUUM FULL temporarily requires at least double the space, as the table (and the indexes on the table) gets completely re-written. PostgreSQL 14 will probably come with a solution for that, as this patch introduces the possibility to move relations from one tablespace to another when either CLUSTER, VACUUM FULL or REINDEX is executed.

As this is about moving relations from one tablespace to another we obviously need at least two tablespaces to play with:

postgres=# \! mkdir /var/tmp/tbs1
postgres=# \! mkdir /var/tmp/tbs2
postgres=# create tablespace tbs1 location '/var/tmp/tbs1';
CREATE TABLESPACE
postgres=# create tablespace tbs2 location '/var/tmp/tbs2';
CREATE TABLESPACE
postgres=# \db
          List of tablespaces
    Name    |  Owner   |   Location    
------------+----------+---------------
 pg_default | postgres | 
 pg_global  | postgres | 
 tbs1       | postgres | /var/tmp/tbs1
 tbs2       | postgres | /var/tmp/tbs2
(4 rows)

Let's assume we have a table in the first tablespace and we face space pressure on that file system:

postgres=# create table t1 ( a int, b date ) tablespace tbs1;
CREATE TABLE
postgres=# insert into t1 select x, now() from generate_series(1,1000000) x;
INSERT 0 1000000
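
Before rewriting anything it is worth checking how much space the relation currently occupies, since VACUUM FULL will temporarily need roughly that amount again in the target location (a quick sketch using the standard size functions):


postgres=# select pg_size_pretty(pg_relation_size('t1'))       as table_only,
postgres-#        pg_size_pretty(pg_total_relation_size('t1')) as table_plus_indexes;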

Without that patch there is not much you can do, except for this (which blocks for the duration of the operation):

postgres=# alter table t1 set tablespace tbs2;
ALTER TABLE
postgres=# \d t1
                 Table "public.t1"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 a      | integer |           |          | 
 b      | date    |           |          | 
Tablespace: "tbs2"

This will move the files of that table to the new tablespace (but not the indexes). If you really want to get back the space on disk with “vacuum full” you can now do that:

postgres=# vacuum (tablespace tbs1, full true)  t1;
VACUUM
postgres=# \d t1
                 Table "public.t1"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 a      | integer |           |          | 
 b      | date    |           |          | 
Tablespace: "tbs1"

The very same is possible with reindex:

postgres=# create index i1 on t1 (a);
CREATE INDEX
postgres=# reindex (tablespace tbs2) index i1;
REINDEX

… and cluster:

postgres=# cluster (tablespace tbs1, index_tablespace tbs1) t1 using i1;
CLUSTER
postgres=# \d t1
                 Table "public.t1"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 a      | integer |           |          | 
 b      | date    |           |          | 
Indexes:
    "i1" btree (a) CLUSTER, tablespace "tbs1"
Tablespace: "tbs1"

postgres=# 

Nice.
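
To double-check where the relations ended up, you can also query the catalog directly (a sketch; reltablespace is 0 when a relation lives in the database's default tablespace, hence the left join and the coalesce):


postgres=# select c.relname, coalesce(t.spcname, 'pg_default') as tablespace
postgres-#   from pg_class c
postgres-#   left join pg_tablespace t on t.oid = c.reltablespace
postgres-#  where c.relname in ('t1', 'i1');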

The article PostgreSQL 14: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly first appeared on the dbi services Blog.

The PostgreSQL shared/global catalog


A PostgreSQL instance (or cluster) can contain many databases; three of them (template0, template1 and postgres) are there by default. Over the last years we have trained many people on PostgreSQL Essentials, and there have been mainly two points that needed more clarification when it comes to catalogs and the postgres default database:

  1. Does the postgres default database define the catalog and somehow is the master database?
  2. What exactly is in the global catalog?

In this post we'll look into both points, and I hope to make it clearer what the shared/global catalog contains, and that the postgres default database is not a master database and does not define the PostgreSQL catalog.

For the first point (is the default postgres database a master database and does it define the catalog?) the answer can be given quite easily. The default postgres database is there for only one reason: most client utilities assume it is there and connect to that database by default. But this does not mean that the default postgres database is special in any way; you can go ahead and drop it:

postgres=# \l
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   
-----------+----------+----------+-------------+-------------+-----------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
(3 rows)

postgres=# \c template1
You are now connected to database "template1" as user "postgres".
template1=# drop database postgres;
DROP DATABASE
template1=# \l
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   
-----------+----------+----------+-------------+-------------+-----------------------
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
(2 rows)

template1=# 

We even have customers who do that by default. The default postgres database is nothing special, and initially it is exactly the same as template1. You can easily re-create it, if you want:

template1=# create database postgres;
CREATE DATABASE
template1=# \l
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   
-----------+----------+----------+-------------+-------------+-----------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
(3 rows)

This answers the first question: The default postgres database is not a master database and it does not define the PostgreSQL catalog. Again, check here if you want to have more details about the three default databases.

The second question can be answered easily as well: What exactly is in the global/shared catalog? Most of the PostgreSQL catalog tables are per database, such as pg_tables:

postgres=# \d pg_tables
              View "pg_catalog.pg_tables"
   Column    |  Type   | Collation | Nullable | Default 
-------------+---------+-----------+----------+---------
 schemaname  | name    |           |          | 
 tablename   | name    |           |          | 
 tableowner  | name    |           |          | 
 tablespace  | name    |           |          | 
 hasindexes  | boolean |           |          | 
 hasrules    | boolean |           |          | 
 hastriggers | boolean |           |          | 
 rowsecurity | boolean |           |          | 

All these catalog tables and views are in a system schema called “pg_catalog”. This schema is not listed by default when you use the “\dn” shortcut in psql:

postgres=# \dn
  List of schemas
  Name  |  Owner   
--------+----------
 public | postgres
(1 row)

You need to add "S" (for system) to list the system schemas:

postgres=# \dnS
        List of schemas
        Name        |  Owner   
--------------------+----------
 information_schema | postgres
 pg_catalog         | postgres
 pg_toast           | postgres
 public             | postgres
(4 rows)

Some catalog tables/views are global to the cluster/instance and are not per database. The obvious ones are users/roles and tablespaces. None of them are per database as users/roles can have access to various databases and various databases can store relations in the same tablespace. The question now is: How can I know if a catalog table/view is global or per database? Even global catalog tables/views are listed in the local catalog schema:

postgres=# \d pg_catalog.pg_roles
                         View "pg_catalog.pg_roles"
     Column     |           Type           | Collation | Nullable | Default 
----------------+--------------------------+-----------+----------+---------
 rolname        | name                     |           |          | 
 rolsuper       | boolean                  |           |          | 
 rolinherit     | boolean                  |           |          | 
 rolcreaterole  | boolean                  |           |          | 
 rolcreatedb    | boolean                  |           |          | 
 rolcanlogin    | boolean                  |           |          | 
 rolreplication | boolean                  |           |          | 
 rolconnlimit   | integer                  |           |          | 
 rolpassword    | text                     |           |          | 
 rolvaliduntil  | timestamp with time zone |           |          | 
 rolbypassrls   | boolean                  |           |          | 
 rolconfig      | text[]                   | C         |          | 
 oid            | oid                      |           |          | 

By only looking at the catalog schema we cannot answer that question. What we can do, however, is look at the data directory ($PGDATA). The databases are in "base" and the global/shared catalog is in "global":

postgres@centos8pg:/home/postgres/ [pgdev] cd $PGDATA
postgres@centos8pg:/u02/pgdata/DEV/ [pgdev] ls -l | egrep "base|global"
drwx------. 6 postgres postgres    58 Nov 21 09:50 base
drwx------. 2 postgres postgres  4096 Nov 21 09:48 global

When we look into the "global" directory we'll see a number of OIDs (object identifiers); this is how PostgreSQL internally references the relations:

postgres@centos8pg:/u02/pgdata/DEV/ [pgdev] ls -l global/
total 564
-rw-------. 1 postgres postgres  8192 Nov 21 03:52 1213
-rw-------. 1 postgres postgres 24576 Nov 20 22:52 1213_fsm
-rw-------. 1 postgres postgres  8192 Nov 21 03:53 1213_vm
-rw-------. 1 postgres postgres  8192 Nov 20 22:52 1214
-rw-------. 1 postgres postgres 24576 Nov 20 22:52 1214_fsm
-rw-------. 1 postgres postgres  8192 Nov 20 22:52 1214_vm
-rw-------. 1 postgres postgres 16384 Nov 20 22:52 1232
-rw-------. 1 postgres postgres 16384 Nov 20 22:52 1233
-rw-------. 1 postgres postgres  8192 Nov 20 22:57 1260
-rw-------. 1 postgres postgres 24576 Nov 20 22:52 1260_fsm
-rw-------. 1 postgres postgres  8192 Nov 20 22:52 1260_vm
...

Each of these OIDs is one relation of the global/shared catalog. As we are not interested in the visibility maps and free space maps let’s exclude them, and only list the unique OIDs:

postgres@centos8pg:/u02/pgdata/DEV/ [pgdev] ls -l global/ | awk -F " " '{print $9}' | egrep "^[0-9]" | egrep -v "fsm|vm"
1213
1214
1232
1233
1260
1261
1262
2396
2397
2671
2672
2676
2677
2694
2695
2697
2698
2846
2847
2964
2965
2966
2967
3592
3593
4060
4061
4175
4176
4177
4178
4181
4182
4183
4184
4185
4186
6000
6001
6002
6100
6114
6115

These are the relations in the global/shared catalog. For translating these OIDs into human-readable names there is oid2name. Without any additional parameters oid2name will give you the names of the databases listed in the "base" directory:

postgres@centos8pg:/u02/pgdata/DEV/ [pgdev] oid2name 
All databases:
    Oid  Database Name  Tablespace
----------------------------------
  24616       postgres  pg_default
  12905      template0  pg_default
      1      template1  pg_default

We can also pass the OIDs of the shared/global catalog to oid2name and the result will answer the second question: What, exactly, is in the global/shared catalog?

postgres@centos8pg:/u02/pgdata/DEV/ [pgdev] for i in `ls -l global/ | awk -F " " '{print $9}' | egrep "^[0-9]" | egrep -v "fsm|vm"`; do oid2name -x -S -q -o $i; done | grep -v "index"
      1213  pg_tablespace  1213  pg_catalog   pg_global
      1214  pg_shdepend  1214  pg_catalog   pg_global
      1260   pg_authid  1260  pg_catalog   pg_global
      1261  pg_auth_members  1261  pg_catalog   pg_global
      1262  pg_database  1262  pg_catalog   pg_global
      2396  pg_shdescription  2396  pg_catalog   pg_global
      2846  pg_toast_2396  2846  pg_toast   pg_global
      2964  pg_db_role_setting  2964  pg_catalog   pg_global
      2966  pg_toast_2964  2966  pg_toast   pg_global
      3592  pg_shseclabel  3592  pg_catalog   pg_global
      4060  pg_toast_3592  4060  pg_toast   pg_global
      4175  pg_toast_1260  4175  pg_toast   pg_global
      4177  pg_toast_1262  4177  pg_toast   pg_global
      4181  pg_toast_6000  4181  pg_toast   pg_global
      4183  pg_toast_6100  4183  pg_toast   pg_global
      4185  pg_toast_1213  4185  pg_toast   pg_global
      6000  pg_replication_origin  6000  pg_catalog   pg_global
      6100  pg_subscription  6100  pg_catalog   pg_global

Here is the answer (excluding the indexes). If we exclude the toast tables as well, you’ll notice that not many catalog tables/views are in the global/shared catalog:

postgres@centos8pg:/u02/pgdata/DEV/ [pgdev] for i in `ls -l global/ | awk -F " " '{print $9}' | egrep "^[0-9]" | egrep -v "fsm|vm"`; do oid2name -x -S -q -o $i; done | egrep -v "index|toast"
      1213  pg_tablespace  1213  pg_catalog   pg_global
      1214  pg_shdepend  1214  pg_catalog   pg_global
      1260   pg_authid  1260  pg_catalog   pg_global
      1261  pg_auth_members  1261  pg_catalog   pg_global
      1262  pg_database  1262  pg_catalog   pg_global
      2396  pg_shdescription  2396  pg_catalog   pg_global
      2964  pg_db_role_setting  2964  pg_catalog   pg_global
      3592  pg_shseclabel  3592  pg_catalog   pg_global
      6000  pg_replication_origin  6000  pg_catalog   pg_global
      6100  pg_subscription  6100  pg_catalog   pg_global
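
If you prefer to stay inside psql, the pg_class column relisshared gives the same answer. A minimal cross-check (my addition, not part of the data directory exercise above):

-- list the shared (global) catalog tables directly;
-- indexes and toast tables are filtered out via relkind = 'r'
select relname
  from pg_class
 where relisshared
   and relkind = 'r'
 order by relname;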

That’s it, hope it helps.

The article "The PostgreSQL shared/global catalog" first appeared on the dbi services blog.

PostgreSQL 14: Automatic hash and list partitioning?


Declarative partitioning was introduced in PostgreSQL 10 and has improved considerably over the last releases. Today almost everything you would expect from such a feature is there:

  • You can partition by range, list and hash
  • Attaching and detaching partitions
  • Foreign keys
  • Sub-partitioning
  • Indexes and constraints on partitions
  • Partition pruning

What is missing is the possibility to let PostgreSQL create partitions automatically. With this patch this will finally be possible for hash and list partitioning, once it gets committed.

Let's start with list partitioning. Looking at the patch, new syntax is introduced:

CREATE TABLE tbl_list (i int) PARTITION BY LIST (i)
CONFIGURATION (values in (1, 2), (3, 4) DEFAULT PARTITION tbl_default);

Taking that as an example, we should see all partitions created automatically if we create a partitioned table like this:

postgres=# create table tpart_list ( a text primary key, b int, c int )
           partition by list(a)
           configuration (values in ('a'),('b'),('c'),('d') default partition tpart_list_default);
CREATE TABLE

That should have created five partitions automatically: one for each of 'a', 'b', 'c' and 'd', plus the default partition:

postgres=# \d+ tpart_list
                           Partitioned table "public.tpart_list"
 Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description 
--------+---------+-----------+----------+---------+----------+--------------+-------------
 a      | text    |           | not null |         | extended |              | 
 b      | integer |           |          |         | plain    |              | 
 c      | integer |           |          |         | plain    |              | 
Partition key: LIST (a)
Indexes:
    "tpart_list_pkey" PRIMARY KEY, btree (a)
Partitions: tpart_list_0 FOR VALUES IN ('a'),
            tpart_list_1 FOR VALUES IN ('b'),
            tpart_list_2 FOR VALUES IN ('c'),
            tpart_list_3 FOR VALUES IN ('d'),
            tpart_list_default DEFAULT

Nice. The same works for hash partitioned tables but the syntax is slightly different:

CREATE TABLE tbl_hash (i int) PARTITION BY HASH (i)
CONFIGURATION (modulus 3);

The idea is the same, of course: You need to specify the “configuration” and when you go for hash partitioning you need to provide the modulus:

postgres=# create table tpart_hash ( a int primary key, b text)
           partition by hash (a)
           configuration (modulus 5);
CREATE TABLE
postgres=# \d+ tpart_hash
                           Partitioned table "public.tpart_hash"
 Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description 
--------+---------+-----------+----------+---------+----------+--------------+-------------
 a      | integer |           | not null |         | plain    |              | 
 b      | text    |           |          |         | extended |              | 
Partition key: HASH (a)
Indexes:
    "tpart_hash_pkey" PRIMARY KEY, btree (a)
Partitions: tpart_hash_0 FOR VALUES WITH (modulus 5, remainder 0),
            tpart_hash_1 FOR VALUES WITH (modulus 5, remainder 1),
            tpart_hash_2 FOR VALUES WITH (modulus 5, remainder 2),
            tpart_hash_3 FOR VALUES WITH (modulus 5, remainder 3),
            tpart_hash_4 FOR VALUES WITH (modulus 5, remainder 4)
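
As a quick sanity check (a small sketch on the patched build, not part of the original test), rows should be routed to the automatically created partitions exactly as with manually created ones:

-- list partitioning: listed values go to their partition, anything else to the default
insert into tpart_list values ('a', 1, 1), ('x', 2, 2);
select tableoid::regclass as partition, count(*) from tpart_list group by 1;

-- hash partitioning: rows are distributed by the hash of the partition key
insert into tpart_hash select x, x::text from generate_series(1, 1000) x;
select tableoid::regclass as partition, count(*) from tpart_hash group by 1;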

Really nice, great work and thanks to all involved. I hope that the next steps will be:

  • Support automatic partition creation for range partitioning
  • Support automatic partition creation on the fly when data comes in, which requires a new partition. In the thread this is referenced as “dynamic” partitioning and what is implemented here is referenced as “static” partitioning

The article "PostgreSQL 14: Automatic hash and list partitioning?" first appeared on the dbi services blog.


Even faster data loading with PostgreSQL 14? wal_level=none


PostgreSQL is already very fast at loading large amounts of data. You can follow this post to get some recommendations for loading data as fast as possible. In addition you can create unlogged tables, but this is on the table level and not for the whole cluster. With this patch there will be another option: wal_level=none. With this, only minimal WAL is written, but of course at the cost of losing durability. If the cluster crashes in that mode, the whole cluster is corrupted and can no longer be started. If you accept that risk, this can be something for you, especially when you do data warehousing and load time is one of the most important factors.

To have a baseline to start with, let's create a simple file we'll use for loading a table:

postgres=# create table t1 ( a int, b text, c date );
CREATE TABLE
postgres=# insert into t1 select x, md5(x::text), now() from generate_series(1,6000000) x;
INSERT 0 6000000
postgres=# copy t1 to '/var/tmp/demo.txt';
COPY 6000000
postgres=# \! ls -lha /var/tmp/demo.txt
-rw-r--r--. 1 postgres postgres 297M Nov 23 15:51 /var/tmp/demo.txt
postgres=# 

My current wal_level is replica, so let's change that to minimal:

postgres=# alter system set wal_level = minimal;
ALTER SYSTEM
postgres=# \! pg_ctl restart -m fast
waiting for server to shut down.... done
server stopped
waiting for server to start....2020-11-23 15:53:23.424 CET - 1 - 209537 -  - @ LOG:  redirecting log output to logging collector process
2020-11-23 15:53:23.424 CET - 2 - 209537 -  - @ HINT:  Future log output will appear in directory "pg_log".
 done
server started
postgres=# show wal_level;
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
postgres=# show wal_level;
 wal_level 
-----------
 minimal
(1 row)

How long does it take to load that file into a new table with wal_level=minimal and how much WAL was generated?

postgres=# create table t2 ( like t1 );
CREATE TABLE
postgres=# \timing
Timing is on.
postgres=# select pg_current_wal_lsn();
 pg_current_wal_lsn 
--------------------
 0/39872628
(1 row)

Time: 0.757 ms
postgres=# copy t2 from '/var/tmp/demo.txt';
COPY 6000000
Time: 10008.335 ms (00:10.008)
postgres=# select pg_current_wal_lsn();
 pg_current_wal_lsn 
--------------------
 0/4C693DD8
(1 row)

Time: 0.857 ms
postgres=# select pg_wal_lsn_diff('0/4C693DD8','0/39872628');
 pg_wal_lsn_diff 
-----------------
       316807088
(1 row)
Time: 2.714 ms
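
The LSN difference above is reported in bytes. A small sketch (reusing the same LSNs) to make it human readable with pg_size_pretty:

-- same difference as above, formatted for humans (roughly 300 MB here)
select pg_size_pretty(pg_wal_lsn_diff('0/4C693DD8','0/39872628'));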

The load takes around 10 seconds and we have generated around 316MB of WAL. How does that change if we go with wal_level=none?

postgres=# alter system set wal_level = none;
ALTER SYSTEM
Time: 28.625 ms
postgres=# \! pg_ctl restart -m fast
waiting for server to shut down.... done
server stopped
waiting for server to start....2020-11-23 16:00:25.648 CET - 1 - 209599 -  - @ LOG:  redirecting log output to logging collector process
2020-11-23 16:00:25.648 CET - 2 - 209599 -  - @ HINT:  Future log output will appear in directory "pg_log".
 done
server started
postgres=# show wal_level;
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
Time: 21.286 ms
postgres=# show wal_level;
 wal_level 
-----------
 none
(1 row)

Time: 1.251 ms

Same test as above:

postgres=# create table t3 ( like t1 );
CREATE TABLE
Time: 44.676 ms
postgres=# select pg_current_wal_lsn();
 pg_current_wal_lsn 
--------------------
 0/4CA0A550
(1 row)

Time: 7.053 ms
postgres=# copy t3 from '/var/tmp/demo.txt';
COPY 6000000
Time: 7968.204 ms (00:07.968)
postgres=# select pg_current_wal_lsn();
 pg_current_wal_lsn 
--------------------
 0/4CA0A550
(1 row)

Time: 0.948 ms
postgres=# select pg_wal_lsn_diff('0/4CA0A550','0/4CA0A550');
 pg_wal_lsn_diff 
-----------------
               0
(1 row)

Time: 3.857 ms

We come down to around 8 seconds and no WAL generated at all. That means faster loading and no space consumption in the pg_wal directory. Really nice, but be aware that the cluster gets corrupted if it crashes during loading:

postgres=# \! ps -ef | grep "postgres -D"
postgres  209599       1  0 16:00 ?        00:00:00 /u01/app/postgres/product/DEV/db_1/bin/postgres -D /u02/pgdata/DEV
postgres  209644  209534  0 16:04 pts/1    00:00:00 sh -c ps -ef | grep "postgres -D"
postgres  209646  209644  0 16:04 pts/1    00:00:00 grep postgres -D
postgres=# create table t4 ( like t1 );
CREATE TABLE
Time: 3.731 ms
postgres=# copy t4 from '/var/tmp/demo.txt';
COPY 6000000
Time: 8070.995 ms (00:08.071)

In another session kill the postmaster while the load is running:

postgres@centos8pg:/home/postgres/ [pgdev] kill -9 209599

If you try to restart the cluster afterwards this is the result:

2020-11-23 16:05:17.441 CET - 1 - 210089 -  - @ LOG:  database system was interrupted; last known up at 2020-11-23 16:00:25 CET
2020-11-23 16:05:17.441 CET - 2 - 210089 -  - @ FATAL:  detected an unexpected server shutdown when WAL logging was disabled
2020-11-23 16:05:17.441 CET - 3 - 210089 -  - @ HINT:  It looks like you need to deploy a new cluster from your full backup again.
2020-11-23 16:05:17.444 CET - 7 - 210087 -  - @ LOG:  startup process (PID 210089) exited with exit code 1
2020-11-23 16:05:17.444 CET - 8 - 210087 -  - @ LOG:  aborting startup due to startup process failure
2020-11-23 16:05:17.449 CET - 9 - 210087 -  - @ LOG:  database system is shut down

If you can accept that, this can be a huge step towards loading data faster than is possible now.
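
If I were to use this in a real load workflow, I would switch durability back on right after the load and immediately take a new base backup, because with no WAL written during the load the previous backups cannot be rolled forward over that period. A minimal sketch, assuming the same setup as above (the backup directory is just an example):

-- restore a durable setting and restart, as done above
alter system set wal_level = replica;
\! pg_ctl restart -m fast
-- take a fresh base backup to a new directory
\! pg_basebackup -D /var/tmp/backup_after_load -Fp -Xs -P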

The article "Even faster data loading with PostgreSQL 14? wal_level=none" first appeared on the dbi services blog.

Incremental materialized view maintenance for PostgreSQL 14?


Since PostgreSQL 9.3 it has been possible to create materialized views in PostgreSQL. PostgreSQL 9.4 (one year later) brought concurrent refresh, which already was a major step forward as it allowed querying the materialized view while it is being refreshed. What is still missing are materialized views which refresh themselves as soon as there are changes to the underlying tables. This might change with PostgreSQL 14, as this patch is in active development (at least since the middle of 2019). Let's have a look at how that currently works and what the limitations are. If you want to play with this yourself and do not want to apply the patches: there is a Docker container you can use for your testing as well.

If you want to have a materialized view that is incrementally updated you need to specify this when the materialized view is created:

postgres=# \h create materialized view
Command:     CREATE MATERIALIZED VIEW
Description: define a new materialized view
Syntax:
CREATE [ INCREMENTAL ] MATERIALIZED VIEW [ IF NOT EXISTS ] table_name
    [ (column_name [, ...] ) ]
    [ USING method ]
    [ WITH ( storage_parameter [= value] [, ... ] ) ]
    [ TABLESPACE tablespace_name ]
    AS query
    [ WITH [ NO ] DATA ]

URL: https://www.postgresql.org/docs/devel/sql-creatematerializedview.html

If you skip "INCREMENTAL", the materialized view will not be updated automatically and you get the behavior as it is now. As we want to have a look at the new feature, let's create a base table and then add an incrementally updated materialized view on top of it:

postgres=# create table t1 ( a int, b text, c date );
CREATE TABLE
postgres=# insert into t1 select x, x::text, now() from generate_series(1,1000000) x;
INSERT 0 1000000
postgres=# create incremental materialized view mv1 as select * from t1 with data;
SELECT 1000000
postgres=# 

“\d+” will show you that this materialized view is incrementally updated:

postgres=# \d+ mv1
                              Materialized view "public.mv1"
 Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description 
--------+---------+-----------+----------+---------+----------+--------------+-------------
 a      | integer |           |          |         | plain    |              | 
 b      | text    |           |          |         | extended |              | 
 c      | date    |           |          |         | plain    |              | 
View definition:
 SELECT t1.a,
    t1.b,
    t1.c
   FROM t1;
Access method: heap
Incremental view maintenance: yes

If we update the underlying table, the materialized view gets updated automatically:

postgres=# insert into t1 (a,b,c) values(-1,'aaa',now());
INSERT 0 1
postgres=# select * from mv1 where a = -1;
 a  |  b  |     c      
----+-----+------------
 -1 | aaa | 2020-11-23
(1 row)

postgres=# update t1 set a = -2 where a = -1;
UPDATE 1
postgres=# select * from mv1 where a = -2;
 a  |  b  |     c      
----+-----+------------
 -2 | aaa | 2020-11-23
(1 row)

postgres=# 
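
For contrast: a classic (non-incremental) materialized view would only pick up such changes after an explicit refresh. A minimal sketch (mv2 is a name I introduce here, it is not part of the original test):

-- a regular materialized view, without the INCREMENTAL keyword
create materialized view mv2 as select * from t1;
-- a unique index is required for REFRESH ... CONCURRENTLY
create unique index on mv2 (a);
-- changes to t1 only become visible in mv2 after this
refresh materialized view concurrently mv2;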

That's really cool, but you need to be aware that this comes with a cost: modifying (insert/update/delete) the underlying table(s) becomes more expensive. Let's compare a small bulk load into a table without a materialized view on top of it against the same load into a table with a materialized view on top:

postgres=# truncate table t1;
TRUNCATE TABLE
postgres=# create table t2 ( a int, b text, c date );
CREATE TABLE
postgres=# \timing
Timing is on.
postgres=# insert into t1 select x, x::text, now() from generate_series(1,1000000) x;
INSERT 0 1000000
Time: 3214.712 ms (00:03.215)
postgres=# insert into t2 select x, x::text, now() from generate_series(1,1000000) x;
INSERT 0 1000000
Time: 1285.578 ms (00:01.286)
postgres=# insert into t1 select x, x::text, now() from generate_series(1,1000000) x;
INSERT 0 1000000
Time: 4117.097 ms (00:04.117)
postgres=# insert into t2 select x, x::text, now() from generate_series(1,1000000) x;
INSERT 0 1000000
Time: 1511.681 ms (00:01.512)
postgres=# insert into t1 select x, x::text, now() from generate_series(1,1000000) x;
INSERT 0 1000000
Time: 3844.273 ms (00:03.844)
postgres=# insert into t2 select x, x::text, now() from generate_series(1,1000000) x;
INSERT 0 1000000
Time: 1463.377 ms (00:01.463)

Without a materialized view on top, the load is around three times faster, so you have to decide what is more important to you: fast loading or up-to-date materialized views.

Finally: Here is the Wiki page that summarizes the feature and also lists some limitations.

The article "Incremental materialized view maintenance for PostgreSQL 14?" first appeared on the dbi services blog.

Easy failover and switchover with pg_auto_failover


One of the really cool things with PostgreSQL is that you have plenty of choices when it comes to tooling. For high availability we usually go with Patroni, but there is also pg_auto_failover, and this will be the topic of this post. Because of the recent announcement around CentOS we'll go with Debian this time. What is already prepared is the PostgreSQL installation (version 13.1), but nothing else. We start from scratch to see if the claim that it "is optimized for simplicity and correctness", as stated on the GitHub page, holds true.

This is the setup we’ll start with:

Hostname                   IP address      Initial role
pgaf1.it.dbi-services.com  192.168.22.190  Primary and pg_auto_failover monitor
pgaf2.it.dbi-services.com  192.168.22.191  First replica
pgaf3.it.dbi-services.com  192.168.22.192  Second replica

As said above, all three nodes have PostgreSQL 13.1 already installed at this location (PostgreSQL was installed from source code, but that should not really matter):

postgres@pgaf1:~$ ls /u01/app/postgres/product/13/db_1/
bin  include  lib  share

What I did in addition is create SSH keys and then copy those from each machine to all nodes, so password-less SSH connections are available between the nodes (here is the example from the first node):

postgres@pgaf1:~$ ssh-keygen
postgres@pgaf1:~$ ssh-copy-id postgres@pgaf1
postgres@pgaf1:~$ ssh-copy-id postgres@pgaf2
postgres@pgaf1:~$ ssh-copy-id postgres@pgaf3

For installing pg_auto_failover from source make sure that pg_config is in your path:

postgres@pgaf1:~$ which pg_config
/u01/app/postgres/product/13/db_1//bin/pg_config

Once that is ready, getting pg_auto_failover installed is quite simple:

postgres@pgaf1:~$ git clone https://github.com/citusdata/pg_auto_failover.git
Cloning into 'pg_auto_failover'...
remote: Enumerating objects: 252, done.
remote: Counting objects: 100% (252/252), done.
remote: Compressing objects: 100% (137/137), done.
remote: Total 8131 (delta 134), reused 174 (delta 115), pack-reused 7879
Receiving objects: 100% (8131/8131), 5.07 MiB | 1.25 MiB/s, done.
Resolving deltas: 100% (6022/6022), done.
postgres@pgaf1:~$ cd pg_auto_failover/
postgres@pgaf1:~$ make
make -C src/monitor/ all
make[1]: Entering directory '/home/postgres/pg_auto_failover/src/monitor'
gcc -std=c99 -D_GNU_SOURCE -g -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -O2 -Wformat -Wall -Werror=implicit-int -Werror=implicit-function-declaration -Werror=return-type -Wno-declaration-after-statement -Wno-missing-braces  -fPIC -std=c99 -Wall -Werror -Wno-unused-parameter -Iinclude -I/u01/app/postgres/product/13/db_1/include -g -I. -I./ -I/u01/app/postgres/product/13/db_1/include/server -I/u01/app/postgres/product/13/db_1/include/internal  -D_GNU_SOURCE -I/usr/include/libxml2   -c -o metadata.o metadata.c
...
make[2]: Leaving directory '/home/postgres/pg_auto_failover/src/bin/pg_autoctl'
make[1]: Leaving directory '/home/postgres/pg_auto_failover/src/bin'
postgres@pgaf1:~$ make install
make -C src/monitor/ all
make[1]: Entering directory '/home/postgres/pg_auto_failover/src/monitor'
make[1]: Nothing to be done for 'all'.
...

This needs to be done on all hosts, of course. You will notice a new extension and new binaries in your PostgreSQL installation:

postgres@pgaf1:~$ ls /u01/app/postgres/product/13/db_1/share/extension/*pgauto*
/u01/app/postgres/product/13/db_1/share/extension/pgautofailover--1.0--1.1.sql
/u01/app/postgres/product/13/db_1/share/extension/pgautofailover--1.0.sql
/u01/app/postgres/product/13/db_1/share/extension/pgautofailover--1.1--1.2.sql
/u01/app/postgres/product/13/db_1/share/extension/pgautofailover--1.2--1.3.sql
/u01/app/postgres/product/13/db_1/share/extension/pgautofailover--1.3--1.4.sql
/u01/app/postgres/product/13/db_1/share/extension/pgautofailover--1.4--dummy.sql
/u01/app/postgres/product/13/db_1/share/extension/pgautofailover--1.4.sql
/u01/app/postgres/product/13/db_1/share/extension/pgautofailover.control
postgres@pgaf1:~$ ls /u01/app/postgres/product/13/db_1/bin/*auto*
/u01/app/postgres/product/13/db_1/bin/pg_autoctl

Having that available, we need to initialize the pg_auto_failover monitor, which is responsible for assigning roles and health checking. We'll do that on the first node:

postgres@pgaf1:~$ export PGDATA=/u02/pgdata/13/monitor
postgres@pgaf1:~$ export PGPORT=5433
postgres@pgaf1:~$ pg_autoctl create monitor --ssl-self-signed --hostname pgaf1.it.dbi-services.com --auth trust --run
14:45:40 13184 INFO  Using default --ssl-mode "require"
14:45:40 13184 INFO  Using --ssl-self-signed: pg_autoctl will create self-signed certificates, allowing for encrypted network traffic
14:45:40 13184 WARN  Self-signed certificates provide protection against eavesdropping; this setup does NOT protect against Man-In-The-Middle attacks nor Impersonation attacks.
14:45:40 13184 WARN  See https://www.postgresql.org/docs/current/libpq-ssl.html for details
14:45:40 13184 INFO  Initialising a PostgreSQL cluster at "/u02/pgdata/13/monitor"
14:45:40 13184 INFO  /u01/app/postgres/product/13/db_1/bin/pg_ctl initdb -s -D /u02/pgdata/13/monitor --option '--auth=trust'
14:45:42 13184 INFO   /usr/bin/openssl req -new -x509 -days 365 -nodes -text -out /u02/pgdata/13/monitor/server.crt -keyout /u02/pgdata/13/monitor/server.key -subj "/CN=pgaf1.it.dbi-services.com"
14:45:42 13184 INFO  Started pg_autoctl postgres service with pid 13204
14:45:42 13184 INFO  Started pg_autoctl listener service with pid 13205
14:45:42 13204 INFO   /u01/app/postgres/product/13/db_1/bin/pg_autoctl do service postgres --pgdata /u02/pgdata/13/monitor -v
14:45:42 13209 INFO   /u01/app/postgres/product/13/db_1/bin/postgres -D /u02/pgdata/13/monitor -p 5433 -h *
14:45:42 13205 ERROR Connection to database failed: could not connect to server: No such file or directory
14:45:42 13205 ERROR    Is the server running locally and accepting
14:45:42 13205 ERROR    connections on Unix domain socket "/tmp/.s.PGSQL.5433"?
14:45:42 13205 ERROR Failed to connect to local Postgres database at "port=5433 dbname=postgres", see above for details
14:45:42 13205 ERROR Failed to create user "autoctl" on local postgres server
14:45:42 13184 ERROR pg_autoctl service listener exited with exit status 12
14:45:42 13184 INFO  Restarting service listener
14:45:42 13204 INFO  Postgres is now serving PGDATA "/u02/pgdata/13/monitor" on port 5433 with pid 13209
14:45:43 13221 WARN  NOTICE:  installing required extension "btree_gist"
14:45:43 13221 INFO  Granting connection privileges on 192.168.22.0/24
14:45:43 13221 INFO  Your pg_auto_failover monitor instance is now ready on port 5433.
14:45:43 13221 INFO  Monitor has been successfully initialized.
14:45:43 13221 INFO   /u01/app/postgres/product/13/db_1/bin/pg_autoctl do service listener --pgdata /u02/pgdata/13/monitor -v
14:45:43 13221 INFO  Managing the monitor at postgres://autoctl_node@pgaf1.it.dbi-services.com:5433/pg_auto_failover?sslmode=require
14:45:43 13221 INFO  Reloaded the new configuration from "/home/postgres/.config/pg_autoctl/u02/pgdata/13/monitor/pg_autoctl.cfg"
14:45:44 13221 INFO  The version of extension "pgautofailover" is "1.4" on the monitor
14:45:44 13221 INFO  Contacting the monitor to LISTEN to its events.

This created a standard PostgreSQL cluster in the background:

postgres@pgaf1:~$ ls /u02/pgdata/13/monitor/
base              pg_dynshmem    pg_notify     pg_stat_tmp  pg_wal                         postmaster.opts
current_logfiles  pg_hba.conf    pg_replslot   pg_subtrans  pg_xact                        postmaster.pid
global            pg_ident.conf  pg_serial     pg_tblspc    postgresql.auto.conf           server.crt
log               pg_logical     pg_snapshots  pg_twophase  postgresql-auto-failover.conf  server.key
pg_commit_ts      pg_multixact   pg_stat       PG_VERSION   postgresql.conf                startup.log
postgres@pgaf1:~$ ps -ef | grep "postgres \-D"
postgres 13209 13204  0 14:45 pts/0    00:00:00 /u01/app/postgres/product/13/db_1/bin/postgres -D /u02/pgdata/13/monitor -p 5433 -h *

Before we can initialize the primary instance we need to get the connection string to the monitor:

postgres@pgaf1:~$ pg_autoctl show uri --monitor --pgdata /u02/pgdata/13/monitor/
postgres://autoctl_node@pgaf1.it.dbi-services.com:5433/pg_auto_failover?sslmode=require

Create the primary:

postgres@pgaf1:~$ pg_autoctl create postgres \
>     --hostname pgaf1.it.dbi-services.com \
>     --auth trust \
>     --ssl-self-signed \
>     --monitor 'postgres://autoctl_node@pgaf1.it.dbi-services.com:5433/pg_auto_failover?sslmode=require' \
>     --run
14:52:11 13354 INFO  Using default --ssl-mode "require"
14:52:11 13354 INFO  Using --ssl-self-signed: pg_autoctl will create self-signed certificates, allowing for encrypted network traffic
14:52:11 13354 WARN  Self-signed certificates provide protection against eavesdropping; this setup does NOT protect against Man-In-The-Middle attacks nor Impersonation attacks.
14:52:11 13354 WARN  See https://www.postgresql.org/docs/current/libpq-ssl.html for details
14:52:11 13354 INFO  Started pg_autoctl postgres service with pid 13356
14:52:11 13354 INFO  Started pg_autoctl node-active service with pid 13357
14:52:11 13356 INFO   /u01/app/postgres/product/13/db_1/bin/pg_autoctl do service postgres --pgdata /u02/pgdata/13/PG1 -v
14:52:11 13357 INFO  Registered node 1 (pgaf1.it.dbi-services.com:5432) with name "node_1" in formation "default", group 0, state "single"
14:52:11 13357 INFO  Writing keeper state file at "/home/postgres/.local/share/pg_autoctl/u02/pgdata/13/PG1/pg_autoctl.state"
14:52:11 13357 INFO  Writing keeper init state file at "/home/postgres/.local/share/pg_autoctl/u02/pgdata/13/PG1/pg_autoctl.init"
14:52:11 13357 INFO  Successfully registered as "single" to the monitor.
14:52:11 13357 INFO  FSM transition from "init" to "single": Start as a single node
14:52:11 13357 INFO  Initialising postgres as a primary
14:52:11 13357 INFO  Initialising a PostgreSQL cluster at "/u02/pgdata/13/PG1"
14:52:11 13357 INFO  /u01/app/postgres/product/13/db_1/bin/pg_ctl initdb -s -D /u02/pgdata/13/PG1 --option '--auth=trust'
14:52:14 13357 INFO   /usr/bin/openssl req -new -x509 -days 365 -nodes -text -out /u02/pgdata/13/PG1/server.crt -keyout /u02/pgdata/13/PG1/server.key -subj "/CN=pgaf1.it.dbi-services.com"
14:52:14 13385 INFO   /u01/app/postgres/product/13/db_1/bin/postgres -D /u02/pgdata/13/PG1 -p 5432 -h *
14:52:14 13357 INFO  CREATE DATABASE postgres;
14:52:14 13356 INFO  Postgres is now serving PGDATA "/u02/pgdata/13/PG1" on port 5432 with pid 13385
14:52:14 13357 INFO  The database "postgres" already exists, skipping.
14:52:14 13357 INFO  CREATE EXTENSION pg_stat_statements;
14:52:14 13357 INFO   /usr/bin/openssl req -new -x509 -days 365 -nodes -text -out /u02/pgdata/13/PG1/server.crt -keyout /u02/pgdata/13/PG1/server.key -subj "/CN=pgaf1.it.dbi-services.com"
14:52:14 13357 INFO  Contents of "/u02/pgdata/13/PG1/postgresql-auto-failover.conf" have changed, overwriting
14:52:14 13357 INFO  Transition complete: current state is now "single"
14:52:14 13357 INFO  keeper has been successfully initialized.
14:52:14 13357 INFO   /u01/app/postgres/product/13/db_1/bin/pg_autoctl do service node-active --pgdata /u02/pgdata/13/PG1 -v
14:52:14 13357 INFO  Reloaded the new configuration from "/home/postgres/.config/pg_autoctl/u02/pgdata/13/PG1/pg_autoctl.cfg"
14:52:14 13357 INFO  pg_autoctl service is running, current state is "single"

Repeating the same command on the second node (with a different --hostname) will initialize the first replica:

postgres@pgaf2:~$ export PGDATA=/u02/pgdata/13/PG1
postgres@pgaf2:~$ export PGPORT=5432
postgres@pgaf2:~$ pg_autoctl create postgres \
>     --hostname pgaf2.it.dbi-services.com \
>     --auth trust \
>     --ssl-self-signed \
>     --monitor 'postgres://autoctl_node@pgaf1.it.dbi-services.com:5433/pg_auto_failover?sslmode=require' \
>     --run
14:54:09 13010 INFO  Using default --ssl-mode "require"
14:54:09 13010 INFO  Using --ssl-self-signed: pg_autoctl will create self-signed certificates, allowing for encrypted network traffic
14:54:09 13010 WARN  Self-signed certificates provide protection against eavesdropping; this setup does NOT protect against Man-In-The-Middle attacks nor Impersonation attacks.
14:54:09 13010 WARN  See https://www.postgresql.org/docs/current/libpq-ssl.html for details
14:54:09 13010 INFO  Started pg_autoctl postgres service with pid 13012
14:54:09 13010 INFO  Started pg_autoctl node-active service with pid 13013
14:54:09 13012 INFO   /u01/app/postgres/product/13/db_1/bin/pg_autoctl do service postgres --pgdata /u02/pgdata/13/PG1 -v
14:54:09 13013 INFO  Registered node 2 (pgaf2.it.dbi-services.com:5432) with name "node_2" in formation "default", group 0, state "wait_standby"
14:54:09 13013 INFO  Writing keeper state file at "/home/postgres/.local/share/pg_autoctl/u02/pgdata/13/PG1/pg_autoctl.state"
14:54:09 13013 INFO  Writing keeper init state file at "/home/postgres/.local/share/pg_autoctl/u02/pgdata/13/PG1/pg_autoctl.init"
14:54:09 13013 INFO  Successfully registered as "wait_standby" to the monitor.
14:54:09 13013 INFO  FSM transition from "init" to "wait_standby": Start following a primary
14:54:09 13013 INFO  Transition complete: current state is now "wait_standby"
14:54:09 13013 INFO  New state for node 1 "node_1" (pgaf1.it.dbi-services.com:5432): single ➜ wait_primary
14:54:09 13013 INFO  New state for node 1 "node_1" (pgaf1.it.dbi-services.com:5432): wait_primary ➜ wait_primary
14:54:09 13013 INFO  Still waiting for the monitor to drive us to state "catchingup"
14:54:09 13013 WARN  Please make sure that the primary node is currently running `pg_autoctl run` and contacting the monitor.
14:54:09 13013 INFO  FSM transition from "wait_standby" to "catchingup": The primary is now ready to accept a standby
14:54:09 13013 INFO  Initialising PostgreSQL as a hot standby
14:54:09 13013 INFO   /u01/app/postgres/product/13/db_1/bin/pg_basebackup -w -d application_name=pgautofailover_standby_2 host=pgaf1.it.dbi-services.com port=5432 user=pgautofailover_replicator sslmode=require --pgdata /u02/pgdata/13/backup/node_2 -U pgautofailover_replicator --verbose --progress --max-rate 100M --wal-method=stream --slot pgautofailover_standby_2
14:54:09 13013 INFO  pg_basebackup: initiating base backup, waiting for checkpoint to complete
14:54:15 13013 INFO  pg_basebackup: checkpoint completed
14:54:15 13013 INFO  pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
14:54:15 13013 INFO  pg_basebackup: starting background WAL receiver
14:54:15 13013 INFO      0/23396 kB (0%), 0/1 tablespace (...ta/13/backup/node_2/backup_label)
14:54:16 13013 INFO   1752/23396 kB (7%), 0/1 tablespace (...ata/13/backup/node_2/base/1/2610)
14:54:16 13013 INFO  23406/23406 kB (100%), 0/1 tablespace (.../backup/node_2/global/pg_control)
14:54:16 13013 INFO  23406/23406 kB (100%), 1/1 tablespace                                         
14:54:16 13013 INFO  pg_basebackup:
14:54:16 13013 INFO   
14:54:16 13013 INFO  write-ahead log end point: 0/2000100
14:54:16 13013 INFO  pg_basebackup:
14:54:16 13013 INFO   
14:54:16 13013 INFO  waiting for background process to finish streaming ...
14:54:16 13013 INFO  pg_basebackup: syncing data to disk ...
14:54:17 13013 INFO  pg_basebackup: renaming backup_manifest.tmp to backup_manifest
14:54:17 13013 INFO  pg_basebackup: base backup completed
14:54:17 13013 INFO  Creating the standby signal file at "/u02/pgdata/13/PG1/standby.signal", and replication setup at "/u02/pgdata/13/PG1/postgresql-auto-failover-standby.conf"
14:54:17 13013 INFO   /usr/bin/openssl req -new -x509 -days 365 -nodes -text -out /u02/pgdata/13/PG1/server.crt -keyout /u02/pgdata/13/PG1/server.key -subj "/CN=pgaf2.it.dbi-services.com"
14:54:17 13021 INFO   /u01/app/postgres/product/13/db_1/bin/postgres -D /u02/pgdata/13/PG1 -p 5432 -h *
14:54:19 13013 INFO  PostgreSQL started on port 5432
14:54:19 13013 INFO  Fetched current list of 1 other nodes from the monitor to update HBA rules, including 1 changes.
14:54:19 13013 INFO  Ensuring HBA rules for node 1 "node_1" (pgaf1.it.dbi-services.com:5432)
14:54:19 13013 INFO  Transition complete: current state is now "catchingup"
14:54:20 13012 INFO  Postgres is now serving PGDATA "/u02/pgdata/13/PG1" on port 5432 with pid 13021
14:54:20 13013 INFO  keeper has been successfully initialized.
14:54:20 13013 INFO   /u01/app/postgres/product/13/db_1/bin/pg_autoctl do service node-active --pgdata /u02/pgdata/13/PG1 -v
14:54:20 13013 INFO  Reloaded the new configuration from "/home/postgres/.config/pg_autoctl/u02/pgdata/13/PG1/pg_autoctl.cfg"
14:54:20 13013 INFO  pg_autoctl service is running, current state is "catchingup"
14:54:20 13013 INFO  Fetched current list of 1 other nodes from the monitor to update HBA rules, including 1 changes.
14:54:20 13013 INFO  Ensuring HBA rules for node 1 "node_1" (pgaf1.it.dbi-services.com:5432)
14:54:21 13013 INFO  Monitor assigned new state "secondary"
14:54:21 13013 INFO  FSM transition from "catchingup" to "secondary": Convinced the monitor that I'm up and running, and eligible for promotion again
14:54:21 13013 INFO  Creating replication slot "pgautofailover_standby_1"
14:54:21 13013 INFO  Transition complete: current state is now "secondary"
14:54:21 13013 INFO  New state for node 1 "node_1" (pgaf1.it.dbi-services.com:5432): primary ➜ primary

The last lines of the output confirm that pgaf1 is the primary and pgaf2 now hosts a replica. Let's do the same on the third node:

postgres@pgaf3:~$ pg_autoctl create postgres \
>     --hostname pgaf3.it.dbi-services.com \
>     --auth trust \
>     --ssl-self-signed \
>     --monitor 'postgres://autoctl_node@pgaf1.it.dbi-services.com:5433/pg_auto_failover?sslmode=require' \
>     --run
14:57:19 12831 INFO  Using default --ssl-mode "require"
14:57:19 12831 INFO  Using --ssl-self-signed: pg_autoctl will create self-signed certificates, allowing for encrypted network traffic
14:57:19 12831 WARN  Self-signed certificates provide protection against eavesdropping; this setup does NOT protect against Man-In-The-Middle attacks nor Impersonation attacks.
14:57:19 12831 WARN  See https://www.postgresql.org/docs/current/libpq-ssl.html for details
14:57:19 12831 INFO  Started pg_autoctl postgres service with pid 12833
14:57:19 12831 INFO  Started pg_autoctl node-active service with pid 12834
14:57:19 12833 INFO   /u01/app/postgres/product/13/db_1/bin/pg_autoctl do service postgres --pgdata /u02/pgdata/13/PG1 -v
14:57:19 12834 INFO  Registered node 3 (pgaf3.it.dbi-services.com:5432) with name "node_3" in formation "default", group 0, state "wait_standby"
14:57:19 12834 INFO  Writing keeper state file at "/home/postgres/.local/share/pg_autoctl/u02/pgdata/13/PG1/pg_autoctl.state"
14:57:19 12834 INFO  Writing keeper init state file at "/home/postgres/.local/share/pg_autoctl/u02/pgdata/13/PG1/pg_autoctl.init"
14:57:19 12834 INFO  Successfully registered as "wait_standby" to the monitor.
14:57:19 12834 INFO  FSM transition from "init" to "wait_standby": Start following a primary
14:57:19 12834 INFO  Transition complete: current state is now "wait_standby"
14:57:19 12834 INFO  New state for node 1 "node_1" (pgaf1.it.dbi-services.com:5432): primary ➜ join_primary
14:57:20 12834 INFO  New state for node 1 "node_1" (pgaf1.it.dbi-services.com:5432): join_primary ➜ join_primary
14:57:20 12834 INFO  Still waiting for the monitor to drive us to state "catchingup"
14:57:20 12834 WARN  Please make sure that the primary node is currently running `pg_autoctl run` and contacting the monitor.
14:57:20 12834 INFO  FSM transition from "wait_standby" to "catchingup": The primary is now ready to accept a standby
14:57:20 12834 INFO  Initialising PostgreSQL as a hot standby
14:57:20 12834 INFO   /u01/app/postgres/product/13/db_1/bin/pg_basebackup -w -d application_name=pgautofailover_standby_3 host=pgaf1.it.dbi-services.com port=5432 user=pgautofailover_replicator sslmode=require --pgdata /u02/pgdata/13/backup/node_3 -U pgautofailover_replicator --verbose --progress --max-rate 100M --wal-method=stream --slot pgautofailover_standby_3
14:57:20 12834 INFO  pg_basebackup: initiating base backup, waiting for checkpoint to complete
14:57:20 12834 INFO  pg_basebackup: checkpoint completed
14:57:20 12834 INFO  pg_basebackup: write-ahead log start point: 0/4000028 on timeline 1
14:57:20 12834 INFO  pg_basebackup: starting background WAL receiver
14:57:20 12834 INFO      0/23397 kB (0%), 0/1 tablespace (...ta/13/backup/node_3/backup_label)
14:57:20 12834 INFO  23406/23406 kB (100%), 0/1 tablespace (.../backup/node_3/global/pg_control)
14:57:20 12834 INFO  23406/23406 kB (100%), 1/1 tablespace                                         
14:57:20 12834 INFO  pg_basebackup: write-ahead log end point: 0/4000100
14:57:20 12834 INFO  pg_basebackup: waiting for background process to finish streaming ...
14:57:20 12834 INFO  pg_basebackup: syncing data to disk ...
14:57:22 12834 INFO  pg_basebackup: renaming backup_manifest.tmp to backup_manifest
14:57:22 12834 INFO  pg_basebackup: base backup completed
14:57:22 12834 INFO  Creating the standby signal file at "/u02/pgdata/13/PG1/standby.signal", and replication setup at "/u02/pgdata/13/PG1/postgresql-auto-failover-standby.conf"
14:57:22 12834 INFO   /usr/bin/openssl req -new -x509 -days 365 -nodes -text -out /u02/pgdata/13/PG1/server.crt -keyout /u02/pgdata/13/PG1/server.key -subj "/CN=pgaf3.it.dbi-services.com"
14:57:22 12841 INFO   /u01/app/postgres/product/13/db_1/bin/postgres -D /u02/pgdata/13/PG1 -p 5432 -h *
14:57:22 12834 INFO  PostgreSQL started on port 5432
14:57:22 12834 INFO  Fetched current list of 2 other nodes from the monitor to update HBA rules, including 2 changes.
14:57:22 12834 INFO  Ensuring HBA rules for node 1 "node_1" (pgaf1.it.dbi-services.com:5432)
14:57:22 12834 INFO  Ensuring HBA rules for node 2 "node_2" (pgaf2.it.dbi-services.com:5432)
14:57:22 12834 ERROR Connection to database failed: could not connect to server: No such file or directory
14:57:22 12834 ERROR    Is the server running locally and accepting
14:57:22 12834 ERROR    connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
14:57:22 12834 ERROR Failed to connect to local Postgres database at "port=5432 dbname=postgres", see above for details
14:57:22 12834 ERROR Failed to reload the postgres configuration after adding the standby user to pg_hba
14:57:22 12834 ERROR Failed to update the HBA entries for the new elements in the our formation "default" and group 0
14:57:22 12834 ERROR Failed to update HBA rules after a base backup
14:57:22 12834 ERROR Failed to transition from state "wait_standby" to state "catchingup", see above.
14:57:22 12831 ERROR pg_autoctl service node-active exited with exit status 12
14:57:22 12831 INFO  Restarting service node-active
14:57:22 12845 INFO  Continuing from a previous `pg_autoctl create` failed attempt
14:57:22 12845 INFO  PostgreSQL state at registration time was: PGDATA does not exists
14:57:22 12845 INFO  FSM transition from "wait_standby" to "catchingup": The primary is now ready to accept a standby
14:57:22 12845 INFO  Initialising PostgreSQL as a hot standby
14:57:22 12845 INFO  Target directory exists: "/u02/pgdata/13/PG1", stopping PostgreSQL
14:57:24 12833 INFO  Postgres is now serving PGDATA "/u02/pgdata/13/PG1" on port 5432 with pid 12841
14:57:24 12833 INFO  Stopping pg_autoctl postgres service
14:57:24 12833 INFO  /u01/app/postgres/product/13/db_1/bin/pg_ctl --pgdata /u02/pgdata/13/PG1 --wait stop --mode fast
14:57:24 12845 INFO   /u01/app/postgres/product/13/db_1/bin/pg_basebackup -w -d application_name=pgautofailover_standby_3 host=pgaf1.it.dbi-services.com port=5432 user=pgautofailover_replicator sslmode=require --pgdata /u02/pgdata/13/backup/node_3 -U pgautofailover_replicator --verbose --progress --max-rate 100M --wal-method=stream --slot pgautofailover_standby_3
14:57:24 12845 INFO  pg_basebackup:
14:57:24 12845 INFO   
14:57:24 12845 INFO  initiating base backup, waiting for checkpoint to complete
14:57:24 12845 INFO  pg_basebackup:
14:57:24 12845 INFO   
14:57:24 12845 INFO  checkpoint completed
14:57:24 12845 INFO  pg_basebackup:
14:57:24 12845 INFO   
14:57:24 12845 INFO  write-ahead log start point: 0/5000028 on timeline 1
14:57:24 12845 INFO  pg_basebackup:
14:57:24 12845 INFO   
14:57:24 12845 INFO  starting background WAL receiver
14:57:24 12845 INFO      0/23397 kB (0%), 0/1 tablespace (...ta/13/backup/node_3/backup_label)
14:57:25 12845 INFO  16258/23397 kB (69%), 0/1 tablespace (...3/backup/node_3/base/12662/12512)
14:57:25 12845 INFO  23406/23406 kB (100%), 0/1 tablespace (.../backup/node_3/global/pg_control)
14:57:25 12845 INFO  23406/23406 kB (100%), 1/1 tablespace                                         
14:57:25 12845 INFO  pg_basebackup: write-ahead log end point: 0/5000100
14:57:25 12845 INFO  pg_basebackup: waiting for background process to finish streaming ...
14:57:25 12845 INFO  pg_basebackup: syncing data to disk ...
14:57:27 12845 INFO  pg_basebackup:
14:57:27 12845 INFO   
14:57:27 12845 INFO  renaming backup_manifest.tmp to backup_manifest
14:57:27 12845 INFO  pg_basebackup:
14:57:27 12845 INFO   
14:57:27 12845 INFO  base backup completed
14:57:27 12845 INFO  Creating the standby signal file at "/u02/pgdata/13/PG1/standby.signal", and replication setup at "/u02/pgdata/13/PG1/postgresql-auto-failover-standby.conf"
14:57:27 12845 INFO   /usr/bin/openssl req -new -x509 -days 365 -nodes -text -out /u02/pgdata/13/PG1/server.crt -keyout /u02/pgdata/13/PG1/server.key -subj "/CN=pgaf3.it.dbi-services.com"
14:57:27 12881 INFO   /u01/app/postgres/product/13/db_1/bin/postgres -D /u02/pgdata/13/PG1 -p 5432 -h *
14:57:29 12845 INFO  PostgreSQL started on port 5432
14:57:29 12845 INFO  Fetched current list of 2 other nodes from the monitor to update HBA rules, including 2 changes.
14:57:29 12845 INFO  Ensuring HBA rules for node 1 "node_1" (pgaf1.it.dbi-services.com:5432)
14:57:29 12845 INFO  Ensuring HBA rules for node 2 "node_2" (pgaf2.it.dbi-services.com:5432)
14:57:29 12845 INFO  Transition complete: current state is now "catchingup"
14:57:29 12845 INFO  keeper has been successfully initialized.
14:57:29 12845 INFO   /u01/app/postgres/product/13/db_1/bin/pg_autoctl do service node-active --pgdata /u02/pgdata/13/PG1 -v
14:57:29 12845 INFO  Reloaded the new configuration from "/home/postgres/.config/pg_autoctl/u02/pgdata/13/PG1/pg_autoctl.cfg"
14:57:29 12845 INFO  pg_autoctl service is running, current state is "catchingup"
14:57:29 12845 INFO  Fetched current list of 2 other nodes from the monitor to update HBA rules, including 2 changes.
14:57:29 12845 INFO  Ensuring HBA rules for node 1 "node_1" (pgaf1.it.dbi-services.com:5432)
14:57:29 12845 INFO  Ensuring HBA rules for node 2 "node_2" (pgaf2.it.dbi-services.com:5432)
14:57:29 12845 INFO  Monitor assigned new state "secondary"
14:57:29 12845 INFO  FSM transition from "catchingup" to "secondary": Convinced the monitor that I'm up and running, and eligible for promotion again
14:57:29 12833 WARN  PostgreSQL was not running, restarted with pid 12881
14:57:29 12845 INFO  Creating replication slot "pgautofailover_standby_1"
14:57:29 12845 INFO  Creating replication slot "pgautofailover_standby_2"
14:57:29 12845 INFO  Transition complete: current state is now "secondary"
14:57:29 12845 INFO  New state for node 1 "node_1" (pgaf1.it.dbi-services.com:5432): primary ➜ primary

That really was quite simple. We now have two replicas synchronizing from the same primary:

postgres=# select usename,application_name,client_hostname,sent_lsn,write_lsn,flush_lsn,replay_lsn,write_lag from pg_stat_replication ;
          usename          |     application_name     |      client_hostname      | sent_lsn  | write_lsn | flush_lsn | replay_lsn | write_lag 
---------------------------+--------------------------+---------------------------+-----------+-----------+-----------+------------+-----------
 pgautofailover_replicator | pgautofailover_standby_2 | pgaf2.it.dbi-services.com | 0/6000148 | 0/6000148 | 0/6000148 | 0/6000148  | 
 pgautofailover_replicator | pgautofailover_standby_3 | pgaf3.it.dbi-services.com | 0/6000148 | 0/6000148 | 0/6000148 | 0/6000148  | 
(2 rows)

If you prepare that well, it is a matter of a few minutes until a setup like this is up and running. For the setup part, one bit is missing: all these pg_autoctl commands did not detach from the console but run in the foreground, and everything stops if we cancel the commands or close the terminal.

Luckily pg_auto_failover comes with a handy command to create a systemd service file:

postgres@pgaf1:~$ pg_autoctl -q show systemd --pgdata /u02/pgdata/13/monitor/ > pgautofailover.service
postgres@pgaf1:~$ cat pgautofailover.service
[Unit]
Description = pg_auto_failover

[Service]
WorkingDirectory = /home/postgres
Environment = 'PGDATA=/u02/pgdata/13/monitor/'
User = postgres
ExecStart = /u01/app/postgres/product/13/db_1/bin/pg_autoctl run
Restart = always
StartLimitBurst = 0

[Install]
WantedBy = multi-user.target

This can easily be added to systemd so the monitor will start automatically:

postgres@pgaf1:~$ sudo mv pgautofailover.service /etc/systemd/system
postgres@pgaf1:~$ sudo systemctl daemon-reload
postgres@pgaf1:~$ sudo systemctl enable pgautofailover.service
Created symlink /etc/systemd/system/multi-user.target.wants/pgautofailover.service → /etc/systemd/system/pgautofailover.service.
postgres@pgaf1:~$ sudo systemctl start pgautofailover.service

From now on, the service will start automatically when the node boots up. Let's do the same for the PostgreSQL clusters:

postgres@pgaf1:~$ pg_autoctl -q show systemd --pgdata /u02/pgdata/13/PG1/ > postgresp1.service
postgres@pgaf1:~$ cat postgresp1.service
[Unit]
Description = pg_auto_failover

[Service]
WorkingDirectory = /home/postgres
Environment = 'PGDATA=/u02/pgdata/13/PG1/'
User = postgres
ExecStart = /u01/app/postgres/product/13/db_1/bin/pg_autoctl run
Restart = always
StartLimitBurst = 0

[Install]
WantedBy = multi-user.target
postgres@pgaf1:~$ sudo mv postgresp1.service /etc/systemd/system
postgres@pgaf1:~$ sudo systemctl daemon-reload
postgres@pgaf1:~$ sudo systemctl enable postgresp1.service
Created symlink /etc/systemd/system/multi-user.target.wants/postgresp1.service → /etc/systemd/system/postgresp1.service.
postgres@pgaf1:~$ sudo systemctl start postgresp1.service

Do the same on the remaining two nodes and reboot all systems. If all went fine, pg_auto_failover and the PostgreSQL clusters will come up automatically:

postgres@pgaf1:~$ pg_autoctl show state --pgdata /u02/pgdata/13/monitor/
  Name |  Node |                      Host:Port |       LSN | Reachable |       Current State |      Assigned State
-------+-------+--------------------------------+-----------+-----------+---------------------+--------------------
node_1 |     1 | pgaf1.it.dbi-services.com:5432 | 0/6002320 |       yes |             primary |             primary
node_2 |     2 | pgaf2.it.dbi-services.com:5432 | 0/6002320 |       yes |           secondary |           secondary
node_3 |     3 | pgaf3.it.dbi-services.com:5432 | 0/6002320 |       yes |           secondary |           secondary

That’s it for the first part. In the next post we’ll look at how robust pg_auto_failover is, e.g. what happens when the first node, which also runs the monitor, goes down?

The article "Easy failover and switchover with pg_auto_failover" first appeared on the dbi services blog.

Recovery in the ☁ with Google Cloud SQL (PostgreSQL)


By Franck Pachot

In a previous post I started this series of "Recovery in the ☁" with the Oracle Autonomous Database. My goal is to explain the recovery procedures, especially the point-in-time recovery procedures, because there is often confusion, which I tried to clarify in What is a database backup (back to the basics). And the terms used in managed cloud services and documentation are not very clear, not always the same, and sometimes misleading.

For example, the Google Cloud SQL documentation says: "Backups are lightweight; they provide a way to restore the data on your instance to its state at the time you took the backup", and this is right (you can also restore to another instance). The same documentation mentions a bit later that "Point-in-time recovery helps you recover an instance to a specific point in time". So all information is correct here. But the way it is put is misleading: it mentions backups (i.e. how the protection is implemented) for one and recovery (i.e. how the protection is used) for the other. In my opinion, the cloud practitioner should not be concerned with backups in a managed database. Of course, the cloud architect must know how it works. But only the recovery should be exposed to the user. Backups are what the cloud provider runs to ensure the recovery SLA. Here the term backup actually means "restore point": the only point in time you can recover to when point-in-time recovery is not enabled. But backups are actually used for both. The point-in-time recovery option just enables an additional kind of backup (the WAL/redo).

PostgreSQL

I have created a PostgreSQL instance on the Google Cloud (the service “Google Cloud SQL” offers MySQL, PostgreSQL and SQLServer):

You can see that I kept the default "Automate backups" enabled, with a time window in which the daily backups can occur, and that I also enabled "Enable point-in-time recovery", which is not enabled by default.

Point in Time Recovery

I can understand why it is not enabled by default: enabling it requires more storage for the backups, and it is fair not to activate a more expensive option by default. However, I think that when you choose a SQL database, you opt for persistence and durability and expect your database to be protected. I'm not talking only about daily snapshots of the database. All transactions must be protected. Any component can fail, and without WAL archiving a failure compromises durability.

From my consulting experience and contributions to database forums, I know how people read this. They see "backup" enabled and then think they are protected. It is a managed service; they may not know that their transactions are not protected if they don't enable WAL archiving. And when they discover it, it will be too late. I have seen too many databases where recovery settings do not fit what users expect. If I were to design this GUI, with my DBA wisdom, I would either make point-in-time recovery the default, or show a red warning saying: with this default you save storage but will lose transactions if you need to recover.

Here, I have enabled the option “Enable point-in-time recovery” which is clearly described: Allows you to recover data from a specific point in time, down to a fraction of a second, via write-ahead log archiving. Make sure your storage can support at least 7 days of logs. We will see later what happens if storage cannot support 7 days.

I’ve created a simple table, similar to what I did on DigitalOcean to understand their recovery possibilities in this post.


postgres=> create table DEMO as select current_timestamp ts;
SELECT 1
postgres=> select * from DEMO;
              ts
-------------------------------
 2020-12-09 18:08:24.818999+00
(1 row)

I have created a simple table with a timestamp


while true ; do
 PGUSER=postgres PGPASSWORD="**P455w0rd**" psql -h 34.65.91.234 postgres <<<'insert into DEMO select current_timestamp;'
sleep 15 ; done

This connects and inserts one row every 15 seconds.


[opc@a aws]$ PGUSER=postgres PGPASSWORD="**P455w0rd**" psql -h 34.65.91.234 postgres <<<'select max(ts) from DEMO;' | ts
Dec 09 21:53:25               max
Dec 09 21:53:25 -------------------------------
Dec 09 21:53:25  2020-12-09 20:53:16.008487+00
Dec 09 21:53:25 (1 row)
Dec 09 21:53:25

I'm interested in seeing the last value, especially as I'll do a point-in-time recovery.


[opc@a aws]$ PGUSER=postgres PGPASSWORD="**P455w0rd**" psql -h 34.65.91.234 postgres | ts
insert into DEMO select current_timestamp returning *;
Dec 09 21:55:58               ts
Dec 09 21:55:58 -------------------------------
Dec 09 21:55:58  2020-12-09 20:55:58.959696+00
Dec 09 21:55:58 (1 row)
Dec 09 21:55:58
Dec 09 21:55:58 INSERT 0 1
insert into DEMO select current_timestamp returning *;
Dec 09 21:55:59               ts
Dec 09 21:55:59 -------------------------------
Dec 09 21:55:59  2020-12-09 20:55:59.170259+00
Dec 09 21:55:59 (1 row)
Dec 09 21:55:59
Dec 09 21:55:59 INSERT 0 1
insert into DEMO select current_timestamp returning *;
Dec 09 21:55:59               ts
Dec 09 21:55:59 -------------------------------
Dec 09 21:55:59  2020-12-09 20:55:59.395784+00
Dec 09 21:55:59 (1 row)
Dec 09 21:55:59
Dec 09 21:55:59 INSERT 0 1
insert into DEMO select current_timestamp returning *;
Dec 09 21:55:59               ts
Dec 09 21:55:59 -------------------------------
Dec 09 21:55:59  2020-12-09 20:55:59.572712+00
Dec 09 21:55:59 (1 row)
Dec 09 21:55:59
Dec 09 21:55:59 INSERT 0 1

I have inserted a few more records at a higher frequency, and this is the point I want to recover to: 2020-12-09 20:55:59, where I expect to see the previous value committed (20:55:58.959696).

You do a Point In Time recovery with a clone. This is where naming may differ between cloud providers, and it is important to understand it. You do a Point In Time recovery when an error happened in the past: a table was dropped by mistake, the application updated the wrong data because of a user error or an application bug, maybe you need to check a past version of a stored procedure,… You want to recover the database to the state just before this error. But you also want to keep the modifications that happened later. And recovery is at database level (some databases offer tablespace subdivision), so it is all or none. Therefore, you can’t overwrite the current database. You keep it running, at its current state, and do your point-in-time recovery into another one. Actually, even with databases offering fast point-in-time recovery (PITR), like Oracle Flashback Database or Aurora Backtrack, I did in-place PITR only for special cases: a CI test database, or production during an offline application release. But usually production databases have transactions coming in that you don’t want to lose.

Then, with out-of-place PITR, you have access to the current state and to the previous state, and you merge what you have to merge in order to keep the current state with the errors corrected from the past state. This is a copy of the database from a previous state, and it is called a clone: it creates a new database instance that you will keep at least for the time you need to compare, analyze, export, and correct the error. So… do not search for a “recover” button. This is in the CLONE action.

The “Create a clone” action has two options: “Clone current state of instance” and “Clone from an earlier point in time”. The first one is not about recovery because there’s no error to recover from, just the need to get a copy. The second one is the Point In Time recovery.

So yes, this operation is possible because you enabled “Point in Time Recovery” and “Point in Time Recovery” (PITR) is what you want to do. But, in order to do that, you go to the “Clone” menu and you click on “Clone”. Again, it makes sense, it is a clone, but I think it can be misleading. Especially when the first time you go to this menu is when a mistake has been made and you are under stress to repair.

When you select “Clone from an earlier point in time” you choose the point in time with a precision of one second. This is where you select the latest point just before the failure. I’ll choose 2020-12-09 20:55:59 or, as this is the American format, 2020-12-09 8:55:59 PM.

While it runs (it can take time because the whole database is cloned even if you need only part of it), I’ll mention two things. The first one is that you have a granularity of 1 second in the GUI and can go even further with the CLI. The second one is that you can restore to a point in time that is only a few minutes before the current one. This is obvious when you work on on-premises databases because you know the WAL is there, but not all managed databases allow that. For example, in the previous post on Oracle Autonomous Database I got a message telling me that “the timestamp specified is not at least 2 hours in the past”. Here at Dec 9, 2020, 10:12:43 PM I’m creating a clone of the 2020-12-09 8:55:59 PM state with no problem.
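
The same clone can presumably be triggered from the CLI, which is also where sub-second precision would come in. A sketch with hypothetical instance names (at the time of the post this may have required the gcloud beta track):

gcloud sql instances clone my-postgres-instance my-postgres-clone \
  --point-in-time='2020-12-09T19:55:59.000Z'

Note that the timestamp would be given in UTC here, which avoids the timezone confusion I ran into with the console.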

Failed

Yes, my first PITR attempt failed. But that’s actually not bad because I’m testing the service, and it’s an opportunity to see what happens and how to troubleshoot.

One bad thing (which is unfortunately common to many managed clouds as they try to show a simple interface that hides the complexity of a database system): no clue about what happened:

The message says “Failed to create or a fatal error during maintenance” and the SEE DETAILS has the following details: “An unknown error occured”. Not very helpful.

But I also have three positive points. First, we have full access to the PostgreSQL logs. There’s even a nice interface to browse them (see the screenshot), but I downloaded them as text to browse with vi 😉

From here I see no problem at all. Just a normal point-in-time recovery:


,INFO,"2020-12-09 21:59:18.236 UTC [1]: [10-1] db=,user= LOG:  aborting any active transactions",2020-12-09T21:59:18.237683Z
,INFO,"2020-12-09 21:59:18.232 UTC [1]: [9-1] db=,user= LOG:  received fast shutdown request",2020-12-09T21:59:18.235100Z
,INFO,"2020-12-09 21:59:17.457 UTC [1]: [8-1] db=,user= LOG:  received SIGHUP, reloading configuration files",2020-12-09T21:59:17.457731Z
,INFO,"2020-12-09 21:59:12.562 UTC [1]: [7-1] db=,user= LOG:  received SIGHUP, reloading configuration files",2020-12-09T21:59:12.567686Z
,INFO,"2020-12-09 21:59:11.436 UTC [1]: [6-1] db=,user= LOG:  database system is ready to accept connections",2020-12-09T21:59:11.437753Z
,INFO,"2020-12-09 21:59:11.268 UTC [11]: [11-1] db=,user= LOG:  archive recovery complete",2020-12-09T21:59:11.268715Z
,INFO,"2020-12-09 21:59:11.105 UTC [11]: [10-1] db=,user= LOG:  selected new timeline ID: 2",2020-12-09T21:59:11.106147Z
,INFO,"2020-12-09 21:59:11.040 UTC [11]: [9-1] db=,user= LOG:  last completed transaction was at log time 2020-12-09 19:55:58.056897+00",2020-12-09T21:59:11.041372Z
,INFO,"2020-12-09 21:59:11.040 UTC [11]: [8-1] db=,user= LOG:  redo done at 0/123F71D0",2020-12-09T21:59:11.041240Z
,INFO,"2020-12-09 21:59:11.040 UTC [11]: [7-1] db=,user= LOG:  recovery stopping before commit of transaction 122997, time 2020-12-09 19:56:03.057621+00",2020-12-09T21:59:11.040940Z
,INFO,"2020-12-09 21:59:10.994 UTC [11]: [6-1] db=,user= LOG:  restored log file ""000000010000000000000012"" from archive",2020-12-09T21:59:10.996445Z
,INFO,"2020-12-09 21:59:10.900 UTC [1]: [5-1] db=,user= LOG:  database system is ready to accept read only connections",2020-12-09T21:59:10.900859Z
,INFO,"2020-12-09 21:59:10.899 UTC [11]: [5-1] db=,user= LOG:  consistent recovery state reached at 0/11000288",2020-12-09T21:59:10.899960Z
,ALERT,"2020-12-09 21:59:10.896 UTC [32]: [1-1] db=cloudsqladmin,user=cloudsqladmin FATAL:  the database system is starting up",2020-12-09T21:59:10.896214Z
,INFO,"2020-12-09 21:59:10.894 UTC [11]: [4-1] db=,user= LOG:  redo starts at 0/11000028",2020-12-09T21:59:10.894908Z
,INFO,"2020-12-09 21:59:10.852 UTC [11]: [3-1] db=,user= LOG:  restored log file ""000000010000000000000011"" from archive",2020-12-09T21:59:10.852640Z
,INFO,"2020-12-09 21:59:10.751 UTC [11]: [2-1] db=,user= LOG:  starting point-in-time recovery to 2020-12-09 19:55:59+00",2020-12-09T21:59:10.764881Z
,ALERT,"2020-12-09 21:59:10.575 UTC [21]: [1-1] db=cloudsqladmin,user=cloudsqladmin FATAL:  the database system is starting up",2020-12-09T21:59:10.576173Z
,ALERT,"2020-12-09 21:59:10.570 UTC [20]: [1-1] db=cloudsqladmin,user=cloudsqladmin FATAL:  the database system is starting up",2020-12-09T21:59:10.571169Z
,ALERT,"2020-12-09 21:59:10.566 UTC [19]: [1-1] db=cloudsqladmin,user=cloudsqladmin FATAL:  the database system is starting up",2020-12-09T21:59:10.567159Z
,ALERT,"2020-12-09 21:59:10.563 UTC [18]: [1-1] db=cloudsqladmin,user=cloudsqladmin FATAL:  the database system is starting up",2020-12-09T21:59:10.563188Z
,ALERT,"2020-12-09 21:59:10.560 UTC [17]: [1-1] db=cloudsqladmin,user=cloudsqladmin FATAL:  the database system is starting up",2020-12-09T21:59:10.560293Z
,ALERT,"2020-12-09 21:59:10.540 UTC [16]: [1-1] db=cloudsqladmin,user=cloudsqladmin FATAL:  the database system is starting up",2020-12-09T21:59:10.540919Z
,ALERT,"2020-12-09 21:59:10.526 UTC [14]: [1-1] db=cloudsqladmin,user=cloudsqladmin FATAL:  the database system is starting up",2020-12-09T21:59:10.526218Z
,ALERT,"2020-12-09 21:59:10.524 UTC [15]: [1-1] db=cloudsqladmin,user=cloudsqladmin FATAL:  the database system is starting up",2020-12-09T21:59:10.524291Z
,INFO,"2020-12-09 21:59:10.311 UTC [11]: [1-1] db=,user= LOG:  database system was interrupted; last known up at 2020-12-08 23:29:48 UTC",2020-12-09T21:59:10.311491Z
,INFO,"2020-12-09 21:59:10.299 UTC [1]: [4-1] db=,user= LOG:  listening on Unix socket ""/pgsql/.s.PGSQL.5432""",2020-12-09T21:59:10.299742Z
,INFO,"2020-12-09 21:59:10.291 UTC [1]: [3-1] db=,user= LOG:  listening on IPv6 address ""::"", port 5432",2020-12-09T21:59:10.291347Z
,INFO,"2020-12-09 21:59:10.290 UTC [1]: [2-1] db=,user= LOG:  listening on IPv4 address ""0.0.0.0"", port 5432",2020-12-09T21:59:10.290905Z
,INFO,"2020-12-09 21:59:10.288 UTC [1]: [1-1] db=,user= LOG:  starting PostgreSQL 13.0 on x86_64-pc-linux-gnu, compiled by Debian clang version 10.0.1 , 64-bit",2020-12-09T21:59:10.289086Z

The last transaction recovered was at 2020-12-09 19:55:58.056897+00, and this is exactly what I expected, as my recovery target was 19:55:59 (yes, I wanted to put 20:55:59 UTC in order to see the transaction from one second before, but looking at the screenshot I forgot that I was in UTC+1 there 🤷‍♂️)

While watching the logs I saw many messages like ERROR: relation “pg_stat_statements” does not exist.
It seems they use PMM, from Percona, for monitoring. I ran CREATE EXTENSION pg_stat_statements; to stop these errors from filling the logs.
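
For reference, a sketch of the statement (run it in the database the monitoring queries hit; the IF NOT EXISTS is my addition):

-- creates the extension in the current database so the monitoring queries stop failing
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;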

So, the first thing that is awesome: recovery happens exactly as expected and we can see the full log. My unknown fatal problem happened later. But there’s another very positive point: I’m running with trial credits but tried to find some support. And someone from the billing support (not really their job) tried to help me. It was not really helpful in this case, but it is always nice to find someone who tries to help and transparently tells you that he is trying but doesn’t have all the tech support access to go further. Thanks Dan.

And I mentioned a third positive thing. Knowing that this unexpected error happened after the recovery, I just tried again while Dan was looking for more information. And it worked (so I didn’t disturb the billing support anymore). So I was probably just unlucky.

Second try

The second try was successful. Here is the log of operations (I started the clone at 22:24 – I mean 10:24 PM, GMT+1, so actually 21:24 UTC…):


Dec 9, 2020, 11:00:30 PM	Backup	Backup finished
Dec 9, 2020, 10:54:22 PM	Clone	Clone finished

Great, a backup was initiated just after the clone. My clone is protected (and point-in-time recovery is enabled by default here, like in the source).

Let’s check the log:

,INFO,"2020-12-09 21:59:10.751 UTC [11]: [2-1] db=,user= LOG:  starting point-in-time recovery to 2020-12-09 19:55:59+00",2020-12-09T21:59:10.764881Z

Yes, again, I hadn’t yet realized that I entered the time in GMT+1, but no worry, I trust the PostgreSQL logs.

I quickly check the last record in my table in the clone:


Your Cloud Platform project in this session is set to disco-abacus-161115.
Use “gcloud config set project [PROJECT_ID]” to change to a different project.

franck@cloudshell:~ (disco-abacus-161115)$ PGUSER=postgres PGPASSWORD="**P455w0rd**" psql -h 34.65.191.96   postgres

psql (13.1 (Debian 13.1-1.pgdg100+1), server 13.0)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.

postgres=> select max(ts) from demo;

              max
-------------------------------
 2020-12-09 19:55:51.229302+00
(1 row)

postgres=>

19:55:51 is ok for a recovery at 19:55:59 as I insert every 15 seconds – this was my last transaction at this point in time. PITR is ok.

Disabling PITR

In order to test the recovery without point-in-time recovery enabled, I disabled it. This requires a database restart, but I have not seen any warning about it, so be careful when you change something.
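
I did it from the console, but the same change can presumably be scripted with the gcloud CLI. A sketch, where the instance name is hypothetical and I have not verified that this exact flag was available at the time of the post:

gcloud sql instances patch my-postgres-instance --no-enable-point-in-time-recovery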

I check the log to see the restart, and actually I see two of them:

And maybe this is the reason:


LOG:  parameter "archive_mode" cannot be changed without restarting the server

Yes, that’s the PostgreSQL message, but… there’s more:


LOG:  configuration file "/pgsql/data/postgresql.conf" contains errors; unaffected changes were applied

OK… this explains why there was another restart: to remove the wrong settings?

No, apparently, “Point-in-time recovery” is Disabled from the console and in the engine as well:


[opc@a gcp]$ PGUSER=postgres PGPASSWORD="**P455w0rd**" psql -h 34.65.191.96   postgres
psql (12.4, server 13.0)
WARNING: psql major version 12, server major version 13.
         Some psql features might not work.
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES128-GCM-SHA256, bits: 128, compression: off)
Type "help" for help.

postgres=> show archive_mode;
 archive_mode
--------------
 off
(1 row)

postgres=> show archive_command;
 archive_command
-----------------
 (disabled)

So all good, finally.

Recovery without point-in-time

Now that PITR is disabled, the “Clone from an earlier point in time” is disabled, which is very good to not mislead you:

You have backups but cannot use them to clone. I like that the GUI makes it very clear: when you restore a backup, you do it either in-place or to another instance that you have created before. We are not in clone creation here; we overwrite an existing database. And there are many warnings and confirmations: no risk.


[opc@a gcp]$ PGUSER=postgres PGPASSWORD="**P455w0rd**" psql -h 34.65.191.96   postgres
psql (12.4, server 13.0)
WARNING: psql major version 12, server major version 13.
         Some psql features might not work.
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES128-GCM-SHA256, bits: 128, compression: off)
Type "help" for help.

postgres=> select max(ts) from DEMO;
              max
-------------------------------
 2020-12-09 23:29:45.363173+00
(1 row)

I selected the backup from 12:29:40 AM in my GMT+1 timezone (23:29:40 UTC), and here my database state is from 23:29:45, the time when the backup finished. All perfect.

About PITR and WAL size…

I mentioned earlier that enabling Point In Time recovery uses more storage for the WAL. By default the storage for the database auto-increases, so the risk is only to pay more than expected; it is better to monitor it. For this test, I disabled “Auto storage increase”, which is displayed with a warning for a good reason: PostgreSQL does not like a full filesystem, and here I’ll show the consequence.
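
To keep an eye on the WAL volume from inside the database, a query like this can help (a sketch: pg_ls_waldir() is restricted to superusers and members of pg_monitor by default, so it may not be callable by the default user of a managed service):

-- number and total size of WAL segments currently kept in pg_wal
select count(*) as wal_segments,
       pg_size_pretty(sum(size)) as wal_size
from   pg_ls_waldir();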


postgres=> show archive_mode;

 archive_mode
--------------
 on
(1 row)

postgres=> show archive_command;
                                             archive_command
---------------------------------------------------------------------------------------------------------
 /utils/replication_log_processor -disable_log_to_disk -action=archive -file_name=%f -local_file_path=%p
(1 row)

I’m checking, from the database, that WAL archiving is on. I have inserted a few million rows in my demo table (a sketch of such a load is shown below) and will then run an update to generate a lot of WAL.
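
The exact load statement isn’t shown in the post; a minimal sketch that would produce a table of roughly that size (the row count is taken from the execution plan below) could be:

-- a sketch, not the author's exact statement: load ~11 million timestamp rows
insert into DEMO select current_timestamp from generate_series(1, 11259904);

The update, measured with EXPLAIN (ANALYZE, WAL), then shows how much WAL it generated: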


explain (analyze, wal) update DEMO set ts=current_timestamp;
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Update on demo  (cost=0.00..240401.80 rows=11259904 width=14) (actual time=199387.841..199387.842 rows=0 loops=1)
   WAL: records=22519642 fpi=99669 bytes=1985696687
   ->  Seq Scan on demo  (cost=0.00..240401.80 rows=11259904 width=14) (actual time=1111.600..8377.371 rows=11259904 loops=1)
 Planning Time: 0.216 ms
 Execution Time: 199389.368 ms
(5 rows)

vacuum DEMO;
VACUUM

With PostgreSQL 13 it is easy to measure the amount of WAL generated to protect the changes: 2 GB here, so my 15GB storage will quickly be full.

When the storage reached 15GB my query failed with:


WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited
 abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
SSL SYSCALL error: EOF detected
connection to server was lost
psql: error: could not connect to server: FATAL:  the database system is in recovery mode
FATAL:  the database system is in recovery mode

I’m used to Oracle where the database hangs in that case (if it can’t protect the changes by generating redo, it cannot accept new changes). But with PostgreSQL the instance crashes when there is no space in the filesystem:

And here, the problem is that, after a while, I cannot change anything, like increasing the storage. The instance is in a failure state (“Failed to create or a fatal error occurred during maintenance”) from the cloud point of view. I can’t even clone the database to another one. I could delete some backups to reclaim space, but I tried too late, when the instance was already out of service (I ran another identical test and was able to restart the instance when reclaiming space quickly enough). I think the only thing that I can do by myself (without cloud ops intervention) is restore the last backup. Fortunately, I had created a few manual backups, as I wanted to see whether they shorten the recovery window: I’ve read that only 7 backups are kept, but those are the daily automatic ones, so the recovery window is 7 days (by default; you can bring it up to 365). You create manual backups when you don’t have PITR and need a restore point (before an application release or a risky maintenance, for example), or even with PITR enabled when you want to reduce the recovery time.
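
Such on-demand backups can presumably also be taken from the CLI; a sketch with a hypothetical instance name:

gcloud sql backups create --instance=my-postgres-instance \
  --description="restore point before risky maintenance"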

I cannot restore in place, getting the following message: “You can’t restore an instance from a backup if it has replicas. To resolve, you can delete the replicas.” Anyway, I would never recommend restoring in-place even when you think you cannot do anything else. You never know. Here I am sure that the database is recoverable without data loss: I have backups, I have WAL, and they were fsync’d at commit. Actually, after deleting some backups to reclaim space, what I see in the postgres log looks good. So if this happens to you, contact the support immediately; I guess the cloud ops can check the state and bring it back to operational.

So always keep the failed instance, just in case the support can get your data back. And we are in the cloud: provisioning a new instance for a few days is not a problem. I have created a new instance and restored the backup from Dec 10, 2020, 6:54:07 PM to it. I must say that at that point I had no idea to which state it would be restored. On one hand, I’m in the RESTORE BACKUP action, not point-in-time recovery. On the other hand, I know that WAL is available up to the point of failure because PITR was enabled. It is always very important to rehearse the recovery scenarios, and it is even more critical in a managed cloud because what you know is technically possible may not be possible through the service.


franck@cloudshell:~ (disco-abacus-161115)$ PGUSER=postgres PGPASSWORD="**P455w0rd**" psql -h 34.65.38.32
psql (13.1 (Debian 13.1-1.pgdg100+1), server 13.0)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.

postgres=> select max(ts) from demo;
              max
-------------------------------
 2020-12-10 17:54:22.874403+00
(1 row)

postgres=>

This is the backup time, so no recovery happened. Even if the WAL is there, it is not applied, and this is confirmed by the PostgreSQL log, which shows no point-in-time recovery:

As you can see, I’m starting to master querying the Google Cloud logs and didn’t export them to a file this time 😉

So, because no WAL is applied, I think that backups are taken as consistent filesystem snapshots.

Summary

Here is my takeaway from those tests. I really like how the recovery possibilities are presented, even if I would prefer “backup” to be named “restore point” to avoid any confusion. But it is really good to differentiate the restore of a specific state from the possibility to clone from any point in time. I also like that the logical export/import (pg_dump) is in a different place than backup/recovery/clone, because a dump is not a database backup. I like the simplicity of the interface, and the visibility of the log. Google Cloud is a really good platform for a managed PostgreSQL. And no surprise about the recovery window: when you enable point-in-time recovery, you can recover to any time, from many days ago (you configure it for your RPO requirement and the storage cost consequence) up to the last second. But be careful with storage: don’t let it fill up, or it can be fatal. I think that auto-extensible storage is good, with thresholds and alerts, of course, to stay in control.

What I would see as nice improvements: stronger advocacy for point-in-time recovery, a big warning when a change requires a restart of the instance, better messages when something fails outside of PostgreSQL, and a no-data-loss possibility to clone the current state even when the instance is broken. But as always, if you practice the recovery scenarios in advance, you will be well prepared when you need them in a critical and stressful situation. And remember that I did this without contacting the database support, and I’m convinced, given what I see in the logs, that they could recover my database without data loss. In a managed cloud, like on-premises, contact your DBA rather than guessing and trying things that may break things further. I was only testing what is available from the console here.

Note that a backup RESTORE keeps the configuration of the destination instance (like PITR, firewall rules,…) but a clone gets the same configuration as the source. This may not be what you want, so change it after a clone (maybe PITR is not needed for a test database, and maybe you want to allow a different CIDR to connect).

All these may be different in your context, and in future versions, so the main message of this post is that you should spend some time to understand and test recovery, even in a managed service.

This article Recovery in the ☁ with Google Cloud SQL (PostgreSQL) first appeared on the dbi services Blog.

pg_auto_failover: Failover and switchover scenarios


In the last post we had a look at the installation and setup of pg_auto_failover. We currently have one primary cluster and two replicas synchronizing from this primary. But we potentially also have an issue in the setup: the monitor is running beside the primary instance on the same node, and if that node goes down the monitor is gone. What happens in that case and how can we avoid it? We also did not look at controlled switchovers, and this is definitely something you want to have in production: from time to time you’ll need to do some maintenance on one of the nodes, and switching the primary cluster to another node is very handy in such situations. Let’s start with the simple case and have a look at switchovers first.

This is the current state of the setup:

postgres@pgaf1:~$ pg_autoctl show state --pgdata /u02/pgdata/13/monitor/
  Name |  Node |                      Host:Port |       LSN | Reachable |       Current State |      Assigned State
-------+-------+--------------------------------+-----------+-----------+---------------------+--------------------
node_1 |     1 | pgaf1.it.dbi-services.com:5432 | 0/6002408 |       yes |             primary |             primary
node_2 |     2 | pgaf2.it.dbi-services.com:5432 | 0/6002408 |       yes |           secondary |           secondary
node_3 |     3 | pgaf3.it.dbi-services.com:5432 | 0/6002408 |       yes |           secondary |           secondary

Before we attempt to do a switch-over you should be aware of your replication settings:

postgres@pgaf1:~$ pg_autoctl get formation settings --pgdata /u02/pgdata/13/monitor/
  Context |    Name |                   Setting | Value                                                       
----------+---------+---------------------------+-------------------------------------------------------------
formation | default |      number_sync_standbys | 1                                                           
  primary |  node_1 | synchronous_standby_names | 'ANY 1 (pgautofailover_standby_2, pgautofailover_standby_3)'
     node |  node_1 |        candidate priority | 50                                                          
     node |  node_2 |        candidate priority | 50                                                          
     node |  node_3 |        candidate priority | 50                                                          
     node |  node_1 |        replication quorum | true                                                        
     node |  node_2 |        replication quorum | true                                                        
     node |  node_3 |        replication quorum | true                                     

What does this tell us:

  • synchronous_standby_names: We’re using synchronous replication and at least one of the two replicas needs to confirm a commit (this is a PostgreSQL setting)
  • number_sync_standbys=1: At least one standby needs to confirm the commit (this is a pg_auto_failover setting)
  • candidate priority=50: This specifies which replica gets promoted. At the default setting of 50 all replicas have the same chance to be selected for promotion and the monitor will pick the one with the most advanced LSN (this is a pg_auto_failover setting)
  • replication quorum=true: This means synchronous replication; a value of false means asynchronous replication (this is a pg_auto_failover setting)

You may have noticed the “formation” keyword above. A formation is a set of PostgreSQL clusters that are managed together, which means you can use the same monitor to manage multiple sets of PostgreSQL clusters. We are using the default formation in this example.
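
These settings are not fixed: they can be changed at runtime with pg_autoctl set. A sketch, using the paths of this setup and example values only (node-level commands are run against that node’s data directory):

# require two standbys to acknowledge each commit (formation-level setting)
pg_autoctl set formation number-sync-standbys --pgdata /u02/pgdata/13/monitor/ 2
# make sure this node is never promoted
pg_autoctl set node candidate-priority --pgdata /u02/pgdata/13/PG1/ 0
# switch this node to asynchronous replication
pg_autoctl set node replication-quorum --pgdata /u02/pgdata/13/PG1/ false

Running pg_autoctl get formation settings again would show the new values.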

Let’s assume we need to do some maintenance on our primary node and therefore want to switch the primary instance over to another node. The command to do that is simple:

postgres@pgaf1:~$ pg_autoctl perform switchover --pgdata /u02/pgdata/13/PG1/
16:10:05 15960 INFO  Targetting group 0 in formation "default"
16:10:05 15960 INFO  Listening monitor notifications about state changes in formation "default" and group 0
16:10:05 15960 INFO  Following table displays times when notifications are received
    Time |   Name |  Node |                      Host:Port |       Current State |      Assigned State
---------+--------+-------+--------------------------------+---------------------+--------------------
16:10:05 | node_1 |     1 | pgaf1.it.dbi-services.com:5432 |             primary |            draining
16:10:05 | node_1 |     1 | pgaf1.it.dbi-services.com:5432 |            draining |            draining
16:10:05 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |           secondary |          report_lsn
16:10:05 | node_3 |     3 | pgaf3.it.dbi-services.com:5432 |           secondary |          report_lsn
16:10:06 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |          report_lsn |          report_lsn
16:10:06 | node_3 |     3 | pgaf3.it.dbi-services.com:5432 |          report_lsn |          report_lsn
16:10:06 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |          report_lsn |   prepare_promotion
16:10:06 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |   prepare_promotion |   prepare_promotion
16:10:06 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |   prepare_promotion |    stop_replication
16:10:06 | node_1 |     1 | pgaf1.it.dbi-services.com:5432 |            draining |      demote_timeout
16:10:06 | node_3 |     3 | pgaf3.it.dbi-services.com:5432 |          report_lsn |      join_secondary
16:10:06 | node_1 |     1 | pgaf1.it.dbi-services.com:5432 |      demote_timeout |      demote_timeout
16:10:06 | node_3 |     3 | pgaf3.it.dbi-services.com:5432 |      join_secondary |      join_secondary
16:10:07 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |    stop_replication |    stop_replication
16:10:07 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |    stop_replication |        wait_primary
16:10:07 | node_1 |     1 | pgaf1.it.dbi-services.com:5432 |      demote_timeout |             demoted
16:10:07 | node_1 |     1 | pgaf1.it.dbi-services.com:5432 |             demoted |             demoted
16:10:07 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |        wait_primary |        wait_primary
16:10:07 | node_3 |     3 | pgaf3.it.dbi-services.com:5432 |      join_secondary |           secondary
16:10:07 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |        wait_primary |             primary
16:10:07 | node_1 |     1 | pgaf1.it.dbi-services.com:5432 |             demoted |          catchingup
16:10:07 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |        wait_primary |        join_primary
16:10:07 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |        join_primary |        join_primary
16:10:08 | node_3 |     3 | pgaf3.it.dbi-services.com:5432 |           secondary |           secondary
16:10:08 | node_1 |     1 | pgaf1.it.dbi-services.com:5432 |          catchingup |          catchingup
16:10:08 | node_1 |     1 | pgaf1.it.dbi-services.com:5432 |          catchingup |           secondary
16:10:08 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |        join_primary |             primary
16:10:08 | node_1 |     1 | pgaf1.it.dbi-services.com:5432 |           secondary |           secondary
16:10:08 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |             primary |             primary
postgres@pgaf1:~$ 

The progress messages are printed to the screen so you can actually see what happens. As the services are started with systemd, you can also have a look at the journal.
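
The exact journalctl command is not shown; assuming the unit file generated by pg_autoctl show systemd was installed under its default name (pgautofailover.service), something like this would do:

sudo journalctl -u pgautofailover --since today

The journal then shows the same state transitions: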

-- Logs begin at Thu 2020-12-10 15:17:38 CET, end at Thu 2020-12-10 16:11:26 CET. --
Dec 10 16:10:08 pgaf1 pg_autoctl[327]: 16:10:08 399 INFO  FSM transition from "catchingup" to "secondary": Convinced the monitor that I'm
Dec 10 16:10:08 pgaf1 pg_autoctl[327]: 16:10:08 399 INFO  Transition complete: current state is now "secondary"
Dec 10 16:10:08 pgaf1 pg_autoctl[341]: 16:10:08 397 INFO  node 1 "node_1" (pgaf1.it.dbi-services.com:5432) reported new state "secondary"
Dec 10 16:10:08 pgaf1 pg_autoctl[341]: 16:10:08 397 INFO  New state for node 1 "node_1" (pgaf1.it.dbi-services.com:5432): secondary ➜ sec
Dec 10 16:10:08 pgaf1 pg_autoctl[327]: 16:10:08 399 INFO  New state for this node (node 1, "node_1") (pgaf1.it.dbi-services.com:5432): se
Dec 10 16:10:08 pgaf1 pg_autoctl[341]: 16:10:08 397 INFO  node 2 "node_2" (pgaf2.it.dbi-services.com:5432) reported new state "primary"
Dec 10 16:10:08 pgaf1 pg_autoctl[341]: 16:10:08 397 INFO  New state for node 2 "node_2" (pgaf2.it.dbi-services.com:5432): primary ➜ prima
Dec 10 16:10:08 pgaf1 pg_autoctl[327]: 16:10:08 399 INFO  New state for node 2 "node_2" (pgaf2.it.dbi-services.com:5432): primary ➜ prima

The second node was selected as the new primary, and we can of course confirm that:

postgres@pgaf1:~$ pg_autoctl show state --pgdata /u02/pgdata/13/monitor/
  Name |  Node |                      Host:Port |       LSN | Reachable |       Current State |      Assigned State
-------+-------+--------------------------------+-----------+-----------+---------------------+--------------------
node_1 |     1 | pgaf1.it.dbi-services.com:5432 | 0/60026F8 |       yes |           secondary |           secondary
node_2 |     2 | pgaf2.it.dbi-services.com:5432 | 0/60026F8 |       yes |             primary |             primary
node_3 |     3 | pgaf3.it.dbi-services.com:5432 | 0/60026F8 |       yes |           secondary |           secondary

postgres@pgaf1:~$ 

Next test: what happens when we reboot a node that is currently running a replica? Let’s reboot pgaf3, as this one is currently a replica and does not run the monitor:

postgres@pgaf3:~$ sudo reboot
postgres@pgaf3:~$ Connection to 192.168.22.192 closed by remote host.
Connection to 192.168.22.192 closed.

Watching the state, the “Reachable” status changes to “no” for the third instance and its LSN falls behind:

postgres@pgaf1:~$ pg_autoctl show state --pgdata /u02/pgdata/13/monitor/
  Name |  Node |                      Host:Port |       LSN | Reachable |       Current State |      Assigned State
-------+-------+--------------------------------+-----------+-----------+---------------------+--------------------
node_1 |     1 | pgaf1.it.dbi-services.com:5432 | 0/60026F8 |       yes |           secondary |           secondary
node_2 |     2 | pgaf2.it.dbi-services.com:5432 | 0/60026F8 |       yes |             primary |             primary
node_3 |     3 | pgaf3.it.dbi-services.com:5432 | 0/6000000 |        no |           secondary |           secondary

Once it is back, the replica is brought back into the configuration and all is fine:

postgres@pgaf1:~$ pg_autoctl show state --pgdata /u02/pgdata/13/monitor/
  Name |  Node |                      Host:Port |       LSN | Reachable |       Current State |      Assigned State
-------+-------+--------------------------------+-----------+-----------+---------------------+--------------------
node_1 |     1 | pgaf1.it.dbi-services.com:5432 | 0/60026F8 |       yes |           secondary |           secondary
node_2 |     2 | pgaf2.it.dbi-services.com:5432 | 0/60026F8 |       yes |             primary |             primary
node_3 |     3 | pgaf3.it.dbi-services.com:5432 | 0/6000000 |       yes |           secondary |           secondary

...
postgres@pgaf1:~$ pg_autoctl show state --pgdata /u02/pgdata/13/monitor/
  Name |  Node |                      Host:Port |       LSN | Reachable |       Current State |      Assigned State
-------+-------+--------------------------------+-----------+-----------+---------------------+--------------------
node_1 |     1 | pgaf1.it.dbi-services.com:5432 | 0/6013120 |       yes |           secondary |           secondary
node_2 |     2 | pgaf2.it.dbi-services.com:5432 | 0/6013120 |       yes |             primary |             primary
node_3 |     3 | pgaf3.it.dbi-services.com:5432 | 0/6013120 |       yes |           secondary |           secondary

But what happens if we shutdown the monitor node?

postgres@pgaf1:~$ sudo systemctl poweroff
postgres@pgaf1:~$ Connection to 192.168.22.190 closed by remote host.
Connection to 192.168.22.190 closed.

Checking the status on the node which currently hosts the primary cluster:

postgres@pgaf2:~$ pg_autoctl show state --pgdata /u02/pgdata/13/PG1/
10:26:52 1293 WARN  Failed to connect to "postgres://autoctl_node@pgaf1.it.dbi-services.com:5433/pg_auto_failover?sslmode=require", retrying until the server is ready
10:26:52 1293 ERROR Connection to database failed: timeout expired
10:26:52 1293 ERROR Failed to connect to "postgres://autoctl_node@pgaf1.it.dbi-services.com:5433/pg_auto_failover?sslmode=require" after 1 attempts in 2 seconds, pg_autoctl stops retrying now
10:26:52 1293 ERROR Failed to retrieve current state from the monitor

As the monitor is down, we can no longer ask for the status. The primary and the remaining replica cluster are still up and running, but we lost the possibility to interact with pg_auto_failover. Booting up the monitor node brings it back into the game:

postgres@pgaf2:~$ pg_autoctl show state --pgdata /u02/pgdata/13/PG1/
  Name |  Node |                      Host:Port |       LSN | Reachable |       Current State |      Assigned State
-------+-------+--------------------------------+-----------+-----------+---------------------+--------------------
node_1 |     1 | pgaf1.it.dbi-services.com:5432 | 0/6000000 |       yes |           secondary |           secondary
node_2 |     2 | pgaf2.it.dbi-services.com:5432 | 0/6013240 |       yes |             primary |             primary
node_3 |     3 | pgaf3.it.dbi-services.com:5432 | 0/6013240 |       yes |           secondary |           secondary

This has a consequence: the monitor should not run on any of the PostgreSQL nodes but on a separate node which is dedicated to it. As you can manage more than one HA setup with the same monitor, this should not be an issue, though. But it also means that the monitor is a single point of failure and its health is critical for pg_auto_failover.

This article pg_auto_failover: Failover and switchover scenarios first appeared on the dbi services Blog.
