Active session history in PostgreSQL: Say hello to pgSentinel

July 11, 2018, 10:17 pm

There is a new project, currently in beta, which aims to bring active session history (and probably more) to PostgreSQL: pgSentinel. Because PostgreSQL is highly extensible, such projects are possible and usually come as an extension. pgSentinel is no exception, so let's see how it can be installed. If you want to try the beta yourself, please connect with the project on Twitter.

This is what I got:

postgres@pgbox:/home/postgres/beta/ [pg103] ll
total 120
-rw-r--r--. 1 postgres postgres   1108 Jul  8 22:13 pgsentinel--1.0.sql
-rw-r--r--. 1 postgres postgres    117 Jul  5 22:15 pgsentinel.control
-rwxr-xr-x. 1 postgres postgres 108000 Jul  9 11:12 pgsentinel.so
-rw-r--r--. 1 postgres postgres    623 Jul  9 11:22 readme.txt

You can already see from here that we probably need to load a library because of the pgsentinel.so file. Let's copy the files to the correct locations, in my case:

postgres@pgbox:/home/postgres/beta/ [pg103] cp pgsentinel--1.0.sql pgsentinel.control /u01/app/postgres/product/10/db_3/share/extension/
postgres@pgbox:/home/postgres/beta/ [pg103] cp pgsentinel.so /u01/app/postgres/product/10/db_3/lib/

As I plan to run pgbench later to get some load onto the system I’ve created a separate database for installing the extension:

postgres@pgbox:/home/postgres/ [PG10] psql -c "create database bench" postgres
CREATE DATABASE
postgres@pgbox:/home/postgres/ [PG10] pgbench -i -s 10 bench

A library that has to be loaded at server start must be listed in the shared_preload_libraries parameter. As pgsentinel depends on pg_stat_statements, that one needs to be loaded as well.

postgres=# alter system set shared_preload_libraries='pg_stat_statements,pgsentinel';
ALTER SYSTEM

So once we have that set and the instance is restarted:

postgres@pgbox:/home/postgres/beta/ [PG10] pg_ctl -D $PGDATA restart -m fast

… we should see the new extension in the pg_available_extensions view:

postgres=# select * from pg_available_extensions where name = 'pgsentinel';
    name    | default_version | installed_version |        comment         
------------+-----------------+-------------------+------------------------
 pgsentinel | 1.0             |                   | active session history
(1 row)

Ready to install the extensions:

postgres=# create extension pg_stat_statements;
CREATE EXTENSION
postgres=# create extension pgsentinel ;
CREATE EXTENSION
postgres=# \dx
                                     List of installed extensions
        Name        | Version |   Schema   |                        Description                        
--------------------+---------+------------+-----------------------------------------------------------
 pg_stat_statements | 1.5     | public     | track execution statistics of all SQL statements executed
 pgsentinel         | 1.0     | public     | active session history
 plpgsql            | 1.0     | pg_catalog | PL/pgSQL procedural language
(3 rows)

So what did we get? One option is to look at the SQL file:

cat pgsentinel--1.0.sql
-- complain if script is sourced in psql, rather than via CREATE EXTENSION
\echo Use "CREATE EXTENSION pgsentinel" to load this file. \quit

CREATE FUNCTION pg_active_session_history(
    OUT ash_time timestamptz,
    OUT datid Oid,
    OUT datname text,
    OUT pid integer,
    OUT usesysid Oid,
    OUT usename text,
    OUT application_name text,
    OUT client_addr text,
    OUT client_hostname text,
    OUT client_port integer,
    OUT backend_start timestamptz,
    OUT xact_start timestamptz,
    OUT query_start timestamptz,
    OUT state_change timestamptz,
    OUT wait_event_type text,
    OUT wait_event text,
    OUT state text,
    OUT backend_xid xid,
    OUT backend_xmin xid,
    OUT top_level_query text,
    OUT query text,
    OUT queryid bigint,
    OUT backend_type text
   
)
RETURNS SETOF record
AS 'MODULE_PATHNAME', 'pg_active_session_history'
LANGUAGE C STRICT VOLATILE PARALLEL SAFE;

-- Register a view on the function for ease of use.
CREATE VIEW pg_active_session_history AS
  SELECT * FROM pg_active_session_history();

GRANT SELECT ON pg_active_session_history TO PUBLIC;

The other solution is to ask PostgreSQL directly:

bench=# \dx+ pgsentinel 
  Objects in extension "pgsentinel"
          Object description          
--------------------------------------
 function pg_active_session_history()
 view pg_active_session_history
(2 rows)

Basically we got a function and a view over that function. Let's have a look at the view then:

postgres=# \d pg_active_session_history
                   View "public.pg_active_session_history"
      Column      |           Type           | Collation | Nullable | Default 
------------------+--------------------------+-----------+----------+---------
 ash_time         | timestamp with time zone |           |          | 
 datid            | oid                      |           |          | 
 datname          | text                     |           |          | 
 pid              | integer                  |           |          | 
 usesysid         | oid                      |           |          | 
 usename          | text                     |           |          | 
 application_name | text                     |           |          | 
 client_addr      | text                     |           |          | 
 client_hostname  | text                     |           |          | 
 client_port      | integer                  |           |          | 
 backend_start    | timestamp with time zone |           |          | 
 xact_start       | timestamp with time zone |           |          | 
 query_start      | timestamp with time zone |           |          | 
 state_change     | timestamp with time zone |           |          | 
 wait_event_type  | text                     |           |          | 
 wait_event       | text                     |           |          | 
 state            | text                     |           |          | 
 backend_xid      | xid                      |           |          | 
 backend_xmin     | xid                      |           |          | 
 top_level_query  | text                     |           |          | 
 query            | text                     |           |          | 
 queryid          | bigint                   |           |          | 
 backend_type     | text                     |           |          |

Most of the columns are already in pg_stat_activity, but there is more. Before going further, let's generate some load:

postgres@pgbox:/home/postgres/ [PG10] pgbench -c 5 -j 4 -T 60 bench 
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 5
number of threads: 4
duration: 60 s
number of transactions actually processed: 151397
latency average = 1.982 ms
tps = 2522.898859 (including connections establishing)
tps = 2523.280694 (excluding connections establishing)

Now we should see sampled data in the pg_active_session_history view:

bench=# select ash_time,top_level_query,query,queryid,wait_event_type,wait_event from pg_active_session_history limit 10;
           ash_time            |                               top_level_query                               |                                   query                                    |  queryid   | wait_event_type |  wait_event   
-------------------------------+-----------------------------------------------------------------------------+----------------------------------------------------------------------------+------------+-----------------+---------------
 2018-07-09 14:51:48.883599+02 | create database bench                                                       | create database bench                                                      | 3222771996 | CPU             | CPU
 2018-07-09 14:52:37.291115+02 | copy pgbench_accounts from stdin                                            | copy pgbench_accounts from stdin                                           | 4164808321 | CPU             | CPU
 2018-07-09 14:52:38.292674+02 | alter table pgbench_accounts add primary key (aid)                          | alter table pgbench_accounts add primary key (aid)                         | 4164808321 | CPU             | CPU
 2018-07-09 14:55:51.111621+02 | UPDATE pgbench_branches SET bbalance = bbalance + 2228 WHERE bid = 4;       | UPDATE pgbench_branches SET bbalance = bbalance + 2228 WHERE bid = 4       |  553956422 | Lock            | transactionid
 2018-07-09 14:55:51.111621+02 | END;                                                                        | END                                                                        | 3376944276 | CPU             | CPU
 2018-07-09 14:55:51.111621+02 | UPDATE pgbench_accounts SET abalance = abalance + -2408 WHERE aid = 973208; | UPDATE pgbench_accounts SET abalance = abalance + -2408 WHERE aid = 973208 | 2992934481 | CPU             | CPU
 2018-07-09 14:55:52.112507+02 | UPDATE pgbench_tellers SET tbalance = tbalance + -4957 WHERE tid = 87;      | UPDATE pgbench_tellers SET tbalance = tbalance + -4957 WHERE tid = 87      | 3459630226 | Client          | ClientRead
 2018-07-09 14:55:52.112507+02 | END;                                                                        | END                                                                        | 3376944276 | LWLock          | WALWriteLock
 2018-07-09 14:55:52.112507+02 | END;                                                                        | END                                                                        | 3376944276 | CPU             | CPU
 2018-07-09 14:55:52.112507+02 | UPDATE pgbench_branches SET bbalance = bbalance + -3832 WHERE bid = 8;      | UPDATE pgbench_branches SET bbalance = bbalance + -3832 WHERE bid = 8      |  553956422 | Lock            | transactionid
(10 rows)

The important point here is that we have the queryid which we can use to map that back to pg_stat_statements. So if we want to know what the shared_blks_* statistics for the update statement with query id 553956422 are, we can do that (or write a join over the two views, of course):

bench=# select shared_blks_hit,shared_blks_read,shared_blks_dirtied,shared_blks_written from pg_stat_statements where queryid = 553956422;
 shared_blks_hit | shared_blks_read | shared_blks_dirtied | shared_blks_written 
-----------------+------------------+---------------------+---------------------
          453201 |               29 |                  37 |                   0
(1 row)

Really looks promising, automatic session sampling in PostgreSQL. More tests to come …

The post Active session history in PostgreSQL: Say hello to pgSentinel first appeared on Blog dbi services.


pgSentinel: the sampling approach for PostgreSQL

July 12, 2018, 3:12 pm

Here is the first test I did with the beta of pgSentinel. This Active Session History sampling is a new approach to Postgres tuning. For people coming from Oracle, this is something that has made our lives a lot easier when optimizing database applications. Here is a quick example showing how it links together pieces of information that stay disconnected without this extension.

The installation of the extension is really easy (more details on Daniel’s post):
cp pgsentinel.control /usr/pgsql-10/share/extension
cp pgsentinel--1.0.sql /usr/pgsql-10/share/extension
cp pgsentinel.so /usr/pgsql-10/lib

and declare it in postgresql.conf:
grep -i pgSentinel $PGDATA/postgresql.conf
shared_preload_libraries = 'pg_stat_statements,pgsentinel'
#pgsentinel_ash.pull_frequency = 1
#pgsentinel_ash.max_entries = 1000000

and restart:
/usr/pgsql-10/bin/pg_ctl restart

Then create the views in psql:
CREATE EXTENSION pgsentinel;

I was running PGIO (the SLOB method for PostgreSQL from Kevin Closson, https://kevinclosson.net/).

Without the extension, here is what I can see about the current activity from the OS point of view, with ‘top -c’:
top - 21:57:23 up 1 day, 11:22,  4 users,  load average: 4.35, 4.24, 4.16
Tasks: 201 total,   2 running, 199 sleeping,   0 stopped,   0 zombie
%Cpu(s): 27.6 us, 19.0 sy,  0.0 ni, 31.0 id, 19.0 wa,  0.0 hi,  3.4 si,  0.0 st
KiB Mem :  4044424 total,    54240 free,   282220 used,  3707964 buff/cache
KiB Swap:   421884 total,   386844 free,    35040 used.  3625000 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 9766 postgres  20   0  440280 160036 150328 D  50.0  4.0  10:56.63 postgres: postgres pgio [local] SELECT
 9762 postgres  20   0  439940 160140 150412 D  43.8  4.0  10:55.95 postgres: postgres pgio [local] SELECT
 9761 postgres  20   0  440392 160088 150312 D  37.5  4.0  10:52.29 postgres: postgres pgio [local] SELECT
 9763 postgres  20   0  440280 160080 150432 R  37.5  4.0  10:41.94 postgres: postgres pgio [local] SELECT
 9538 postgres  20   0  424860 144464 142956 D   6.2  3.6   0:30.79 postgres: writer process

As I described in a previous post, PostgreSQL changes the title of the process to display the current operation. This looks interesting, but it is not very detailed (only ‘SELECT’ here) and very misleading, because here I’m running PGIO with 50% updates. The ‘SELECT’ here is the user call, not the actual SQL statement running.

We have more information from PG_STAT_ACTIVITY, but again only the top-level call is displayed, as I mentioned in a previous post:
select * from pg_stat_activity where pid=9766;
-[ RECORD 1 ]----+---------------------------------------------------------
datid            | 17487
datname          | pgio
pid              | 9766
usesysid         | 10
usename          | postgres
application_name | psql
client_addr      |
client_hostname  |
client_port      | -1
backend_start    | 2018-07-12 21:28:46.539052+02
xact_start       | 2018-07-12 21:28:46.542203+02
query_start      | 2018-07-12 21:28:46.542203+02
state_change     | 2018-07-12 21:28:46.542209+02
wait_event_type  | IO
wait_event       | DataFileWrite
state            | active
backend_xid      | 37554
backend_xmin     | 37553
query            | SELECT * FROM mypgio('pgio4', 50, 3000, 131072, 255, 8);
backend_type     | client backend

Here, I know what the user is doing: a call to mypgio() started at 21:28:46. And I know which resources are involved on the system: DataFileWrite. But again the most important piece is missing: the link between the user call and the system resources. You can only guess it here because you know that a SELECT does not write to datafiles. There’s something hidden in the middle, which is actually an UPDATE. Of course, we can see this UPDATE in PG_STAT_STATEMENTS. But there, it will not be linked with the current activity, the mypgio() call, nor the DataFileWrite wait event. And we also need some timing information to be able to see the database load over time.

Here is where the pgSentinel extension fills the gap, providing:

The actual query running, with the queryid which links to PG_STAT_STATEMENTS, but also the full text with all parameter values
Multiple samples of the activity, with their timestamp information

Everything is there: a timeline where each sample links together the user call (top_level_query), the running query (queryid and query, which is the text with parameter values), and the wait event (wait_event_type and wait_event).

Then, what do we do with this? This is a fact table with many dimensions. And we can drill down on the database activity.

A quick overview of the load shows that I have, on average, 4 foreground sessions running for my user calls, and very low vacuuming activity:
postgres=# select backend_type
postgres-#  ,count(*)/(select count(distinct ash_time)::float from pg_active_session_history) as load
postgres-# from pg_active_session_history
postgres-# group by backend_type
postgres-# ;
   backend_type    |       load
-------------------+------------------
 client backend    | 4.09720483938256
 autovacuum worker | 0.07467667918231
(2 rows)

I’ll show in a future post how to query this view to drill down into the details. For the moment, here is a short explanation about the reason to go to a sampling approach.

Here is an abstract sequence diagram showing some typical user calls to the database. Several components are involved: CPU for the backend process or for background processes, the OS, the storage… Our tuning goal is to reduce the user call duration, and then to reduce or optimize the work done in the different layers. With the current statistics available on PostgreSQL, like PG_STAT_ACTIVITY or PG_STAT_STATEMENTS, or available from the OS (strace to measure system call duration), we have a vertical approach on the load. We can look at each component individually:

This is basically what we did on Oracle before ASH (Active Session History) was introduced in 10g, 12 years ago. The activity sampling approach takes an orthogonal point of view. Rather than accumulating statistics for each component, it looks at what happens on the system at specific points in time, across all components. We don’t have all measures (such as the number of executions of a query) but only samples. However, each sample gives a complete view from the user call down to the system calls. And 1-second samples are sufficient to capture any relevant activity, without taking too much space for a short retention. For each sample, we cover all layers end-to-end:

This horizontal approach makes the link between the user calls (the user perception of the database performance) and the system resources where we can analyze and optimize. With this, we can ensure that our tuning activity always focuses on the problem (the user response time) by addressing the root cause on the right component.
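
To make the sampling idea concrete, here is a minimal Python sketch. It is not pgSentinel code: the session timeline and its durations are invented for illustration, and `take_samples` is a hypothetical helper. It observes one session once per second and compares the number of samples per wait event with the exact time spent on it.

```python
from collections import Counter

# Hypothetical activity of one session over 10 seconds:
# (start_second, end_second, wait_event). Invented values, for illustration only.
timeline = [
    (0.0, 3.4, "CPU"),
    (3.4, 8.6, "DataFileWrite"),
    (8.6, 10.0, "CPU"),
]

def take_samples(timeline, interval=1.0, duration=10.0):
    """Return the wait event observed at each sampling instant."""
    observed = []
    for i in range(int(duration / interval)):
        t = i * interval
        for start, end, event in timeline:
            if start <= t < end:
                observed.append(event)
                break
    return observed

# Each sample stands for roughly one interval (1 second) of activity.
estimated_seconds = Counter(take_samples(timeline))

# Exact time per event, which a cumulative statistic would report.
actual_seconds = Counter()
for start, end, event in timeline:
    actual_seconds[event] += end - start

for event in estimated_seconds:
    print(event, estimated_seconds[event], round(actual_seconds[event], 1))
```

The sample counts only approximate the exact durations, but every sample carries the full context of that instant, which is the trade-off described above.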


What a great event – PGDay Amsterdam

July 12, 2018, 10:14 pm

PostgreSQL conferences are always cool and this time it was in Amsterdam: PGDay Amsterdam. Besides the fun of meeting all the great people again, the location was really outstanding: the TOBACCO Theater.

Some impressions:

Here you can see Devrim preparing the opening of the event with the introduction session:

… and then it happened: We finally started:

Jan kicked off the sessions with his talk about the reasons he finally arrived in the PostgreSQL community after working for years in another one:

Oleksi took over to speak about ACID, transactions and much more, a great talk:

I had the pleasure to speak about PostgreSQL 11 to close the first sessions before the coffee break:

Stefanie followed with foreign data wrappers and data integration with PostgreSQL (another great one):

And then there was something special: You might know Devrim has a real PostgreSQL tattoo and that was taken as an opportunity to offer temporary tattoos to everyone and that looked like this:

Hans rocked the stage right after:

Devrim right after his talk about WAL:

As in Rapperswil two weeks ago Bruce closed the sessions with his talk: Will PostgreSQL live forever:

There were other sessions not mentioned here, which were great as well, but I didn’t ask whether it was fine to publish the pictures. I could not attend the party after the event but I am sure that was great as well. See you next year. And never forget: PostgreSQL rocks.


PGDay Amsterdam – follow up 1 – Adding columns with a default value and changing the default value right after

July 12, 2018, 11:32 pm

As always, this time during my talk about the PostgreSQL 11 new features in Amsterdam, there were questions I could not immediately answer. The first one was this: Suppose we add a column with a default value in PostgreSQL 11, what happens when we change that default afterwards? Does the table get rewritten? Do we have more than one distinct default value for that column? Here we go …

The sample table:

postgres=# select version();
                                                            version                                                            
-------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 11beta1 build on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28), 64-bit
(1 row)
postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# insert into t1 (a,b) 
           select a.*, md5(a::text) 
             from generate_series(1,1000) a;
INSERT 0 1000

Let's add a new column with a default value:

postgres=# alter table t1 add column c text default 'aa';
ALTER TABLE

This populates the two columns in pg_attribute as described in a previous post:

postgres=# select atthasmissing,attmissingval 
             from pg_attribute 
            where attrelid = 't1'::regclass and attname = 'c';
 atthasmissing | attmissingval 
---------------+---------------
 t             | {aa}
(1 row)

When we check for the distinct values in column “c” we should only see one result (which is “aa”):

postgres=# select c, count(*) from t1 group by c;
 c  | count 
----+-------
 aa |  1000
(1 row)

If I got the question right, the concern was: when we change the default now, do we see two results when we ask for the distinct values in column “c”? Of course not, and the table is not rewritten:

postgres=# alter table t1 alter column c set default 'bb';
ALTER TABLE
postgres=# select c, count(*) from t1 group by c;
 c  | count 
----+-------
 aa |  1000
(1 row)

postgres=# select atthasmissing,attmissingval from pg_attribute where attrelid = 't1'::regclass and attname = 'c';
 atthasmissing | attmissingval 
---------------+---------------
 t             | {aa}
(1 row)

What does that mean? For the existing rows the value is still “aa” as that was the default when the column was added. For new rows we will get “bb”:

postgres=# \d t1
                  Table "public.t1"
 Column |  Type   | Collation | Nullable |  Default   
--------+---------+-----------+----------+------------
 a      | integer |           |          | 
 b      | text    |           |          | 
 c      | text    |           |          | 'bb'::text

postgres=# insert into t1 (a,b) values (1001,'aa');
INSERT 0 1
postgres=# select c, count(*) from t1 group by c;
 c  | count 
----+-------
 bb |     1
 aa |  1000
(2 rows)
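
The behaviour above can be mimicked with a small Python model. This is only a sketch of the idea, not PostgreSQL's storage format: the `rows` list, the `column_c` dictionary and `read_c` are hypothetical names, and the assumption is that rows which existed before the column was added store no value for it and are answered from the remembered missing value.

```python
from collections import Counter

# Simplified model, not PostgreSQL's storage format: rows created before
# ADD COLUMN store no value for "c"; rows created afterwards store one.
rows = [{"a": i} for i in range(1, 1001)]

# ADD COLUMN c DEFAULT 'aa': remember 'aa' as the value for rows that lack "c"
# (the role played by attmissingval) and as the default for future inserts.
column_c = {"missing_value": "aa", "default": "aa"}

def read_c(row):
    """Return the stored value, or the missing value for older rows."""
    return row.get("c", column_c["missing_value"])

# ALTER COLUMN c SET DEFAULT 'bb': only the default for new rows changes.
column_c["default"] = "bb"

# A new insert that omits "c" stores the current default in the row.
rows.append({"a": 1001, "c": column_c["default"]})

counts = Counter(read_c(row) for row in rows)
print(counts)
```

In this model, changing the default never touches the 1000 existing rows, which matches the counts shown in the psql output.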

I hope that answers the question. If not, please leave a comment.


PGDay Amsterdam – follow up 2 – Where do null values go to in a hash partitioned table?

July 13, 2018, 9:17 pm

This is the second follow-up, which covers this question: when you hash partition a table in PostgreSQL 11, where do null values for the partition key column go? Let’s go…

In the demo I used this little table:

postgres=# select version();
                                                            version                                                          
-----------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 11beta1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28), 64-bi
(1 row)
postgres=# create table part2 ( a int, list varchar(10) ) partition by hash (a);
CREATE TABLE
postgres=# create table part2_1 partition of part2 FOR VALUES WITH (MODULUS 3, REMAINDER 0);
CREATE TABLE
postgres=# create table part2_2 partition of part2 FOR VALUES WITH (MODULUS 3, REMAINDER 1);
CREATE TABLE
postgres=# create table part2_3 partition of part2 FOR VALUES WITH (MODULUS 3, REMAINDER 2);
CREATE TABLE
postgres=# \d+ part2
                                          Table "public.part2"
 Column |         Type          | Collation | Nullable | Default | Storage  | Stats target | Description 
--------+-----------------------+-----------+----------+---------+----------+--------------+-------------
 a      | integer               |           |          |         | plain    |              | 
 list   | character varying(10) |           |          |         | extended |              | 
Partition key: HASH (a)
Partitions: part2_1 FOR VALUES WITH (modulus 3, remainder 0),
            part2_2 FOR VALUES WITH (modulus 3, remainder 1),
            part2_3 FOR VALUES WITH (modulus 3, remainder 2)

The data we played with was this:

postgres=# insert into part2 (a,list) values (1,'beer');
INSERT 0 1
postgres=# insert into part2 (a,list) values (2,'whine');
INSERT 0 1
postgres=# insert into part2 (a,list) values (3,'schnaps');
INSERT 0 1
postgres=# select * from only part2_1;
 a | list  
---+-------
 2 | whine
(1 row)

postgres=# select * from only part2_2;
 a |  list   
---+---------
 3 | schnaps
(1 row)

postgres=# select * from only part2_3;
 a | list 
---+------
 1 | beer
(1 row)

We have the data evenly distributed over the three partitions. When we insert a row which contains a NULL value for the column we partitioned on:

postgres=# insert into part2 (a,list) values (null,'cocktail');
INSERT 0 1

… where does that row go to?

postgres=# select * from only part2_1;
 a |   list   
---+----------
 2 | whine
   | cocktail
(2 rows)

postgres=# select * from only part2_2;
 a |  list   
---+---------
 3 | schnaps
(1 row)

postgres=# select * from only part2_3;
 a | list 
---+------
 1 | beer
(1 row)

It goes to the first partition, and every following row with a NULL key ends up there as well:

postgres=# insert into part2 (a,list) values (null,'rum');
INSERT 0 1
postgres=# select * from only part2_1;
 a |   list   
---+----------
 2 | whine
   | cocktail
   | rum
(3 rows)

I couldn’t find anything in the documentation about that, so I sent a mail to the general mailing list and here is the answer: “The calculated hash value for the null value will be zero, therefore, it will fall to the partition having remainder zero.”
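
The routing rule from that answer can be sketched in a few lines of Python. This is an illustration under assumptions: `stand_in_hash` is a hypothetical placeholder (PostgreSQL's real hash function is different, so the placement of non-null keys will not match the demo), and only the NULL case reflects what the mailing-list answer states.

```python
# Partition names from the demo, indexed by their REMAINDER (MODULUS 3).
partitions = {0: "part2_1", 1: "part2_2", 2: "part2_3"}

def stand_in_hash(value):
    """Hypothetical hash; PostgreSQL's real hash function is different."""
    return hash(value)

def route(key, modulus=3):
    # Per the mailing-list answer, a NULL key contributes a hash of zero,
    # so it always lands in the partition with remainder zero.
    h = 0 if key is None else stand_in_hash(key)
    return partitions[h % modulus]

print(route(None))
```

Because zero modulo any modulus is zero, every NULL key ends up in the partition declared with REMAINDER 0, which is part2_1 in the demo.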


Drilling down the pgSentinel Active Session History

July 15, 2018, 11:35 am

In pgSentinel: the sampling approach for PostgreSQL I mentioned that one of the advantages of the ASH approach is the ability to drill down from an overview of the database activity, down to the details where we can do some tuning. The idea is to always focus on the components which are relevant to our tuning goal:

Filter/Group by the relevant dimension to focus on what you want to tune (a program, session, query, time window,…)
Sort by the most active samples, to spend time only where you know you can improve significantly

The idea is to start at a high level. Here is a GROUP BY BACKEND_TYPE to show the activity of the ‘client backend’ and the ‘autovacuum worker’:
select count(*), backend_type
from pg_active_session_history
where ash_time>=current_timestamp - interval '5 minutes'
group by backend_type
order by 1 desc
;
 count |   backend_type
-------+-------------------
  1183 | client backend
    89 | autovacuum worker

I selected only the last 5 minutes (the total retention is defined by pgsentinel_ash.max_entries and the sampling frequency by pgsentinel_ash.pull_frequency).

I ordered by the number of samples for each one, which gives a good idea of the proportion: most of the activity here is from ‘client backend’. It may be more interesting to show a percentage, such as 93% of the activity coming from the client and 7% from the vacuum. However, this hides an interesting measure of the overall activity. The fact that we have 1183 samples within 5 minutes is an indication of the total load. In 5 minutes, we have 300 seconds, which means that, with one sample per second, each session can have at most 300 samples when it is 100% active in the database during that time. 1183 samples during 5 minutes mean that we have on average 1183/300 ≈ 4 sessions active. This measure, calculated from the number of samples divided by the number of seconds and known as Average Active Sessions (AAS), gives two different pieces of information:

The overall activity in the database, similar to the load average at OS level
The relative activity of an aggregate (per session, program, event, time…)
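
As a worked example of this arithmetic, the following Python sketch recomputes both measures from the sample counts shown above. The one-second sampling interval is an assumption (it corresponds to a pull frequency of 1), and the variable names are illustrative only.

```python
# Sample counts taken from the 5-minute query output above.
samples = {"client backend": 1183, "autovacuum worker": 89}

window_seconds = 5 * 60   # 300 seconds
sampling_interval = 1     # assumption: one sample per second

total = sum(samples.values())

# AAS: sampled seconds of activity divided by elapsed seconds.
aas = {k: v * sampling_interval / window_seconds for k, v in samples.items()}

# Relative share of each backend type, in percent.
share = {k: 100 * v / total for k, v in samples.items()}

overall_aas = total * sampling_interval / window_seconds

for backend_type in samples:
    print(f"{backend_type}: AAS={aas[backend_type]:.2f}, share={share[backend_type]:.0f}%")
print(f"overall AAS={overall_aas:.2f}")
```

The percentages alone (93% and 7%) would look the same on an idle system and on an overloaded one, while the AAS values also carry the absolute load.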

AAS (Average Active Sessions)

In the previous post I counted the number of samples with count(distinct ash_time) because I knew that I had several sessions active during the whole time. But if there are periods of inactivity during those 5 minutes, there are no samples at all. And when drilling down to more detail, there will be some samples with no activity for a specific group. Here I calculate the number of seconds covered by the samples, using a window function:
with ash as (
 select *,ceil(extract(epoch from max(ash_time)over()-min(ash_time)over()))::numeric samples
 from pg_active_session_history where ash_time>=current_timestamp - interval '5 minutes'
) select round(count(*)::numeric/samples,2) as "AAS",
 backend_type
 from ash
 group by samples,
 backend_type
 order by 1 desc fetch first 20 rows only
;
  AAS  |   backend_type
-------+-------------------
  3.95 | client backend
  0.29 | autovacuum worker
(2 rows)
From this output, I know that I have about 4 client sessions running. This is what I want to tune.
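
The difference between the two denominators can be illustrated with a small Python sketch on invented data: two hypothetical sessions (pids 101 and 102) are sampled during seconds 0 to 2 and 8 to 9, with an idle gap in between. Dividing by the number of distinct sample times ignores the gap, while dividing by the elapsed seconds between the first and last sample, as the query does, accounts for it.

```python
import math
from datetime import datetime, timedelta

# Invented samples: two sessions active at seconds 0, 1, 2, 8 and 9.
# Nothing is recorded during the idle gap between seconds 3 and 7.
base = datetime(2018, 7, 15, 11, 0, 0)
active_seconds = [0, 1, 2, 8, 9]
rows = [(base + timedelta(seconds=s), pid)
        for s in active_seconds
        for pid in (101, 102)]

times = [ash_time for ash_time, _ in rows]

# Denominator 1: number of distinct sample times (idle seconds are invisible).
distinct_times = len(set(times))

# Denominator 2: elapsed seconds between first and last sample, rounded up.
span_seconds = math.ceil((max(times) - min(times)).total_seconds())

aas_distinct = len(rows) / distinct_times
aas_span = len(rows) / span_seconds

print(aas_distinct, round(aas_span, 2))
```

With the gap included, the average drops from 2 active sessions to roughly 1.1, which is the more honest picture of the load over the whole window.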

Drill down on wait events

Adding the WAIT_EVENT_TYPE to the GROUP BY, I can have more detail about the resources used by those sessions:
with ash as (
 select *,ceil(extract(epoch from max(ash_time)over()-min(ash_time)over()))::numeric samples
 from pg_active_session_history where ash_time>=current_timestamp - interval '5 minutes'
) select round(count(*)::numeric/samples,2) as "AAS",
 backend_type,wait_event_type
 from ash
 group by samples,
 backend_type,wait_event_type
 order by 1 desc fetch first 20 rows only
;
  AAS  |   backend_type    | wait_event_type
-------+-------------------+-----------------
  2.57 | client backend    | IO
  0.94 | client backend    | CPU
  0.45 | client backend    | LWLock
  0.16 | autovacuum worker | CPU
  0.12 | autovacuum worker | IO
  0.00 | autovacuum worker | LWLock
(6 rows)

This gives a better idea about which system component may be tuned to improve the response time or the throughput. IO is the major component here, with 2.57 AAS being on an I/O call. Let’s get more information about which kind of I/O.

This gives more information. The average 2.57 sessions active on IO are actually writing for 1.52 of them, reading for 0.46 of them, and waiting for the datafile to be extended for 0.46 of them. That helps to focus on the areas where we might improve the performance, without wasting time on the events which are only a small part of the session activity.

Drill-down on queries

This was a drill-down on the system axis (wait events are system call instrumentation). This is useful when we think something is wrong on the system or the storage. But performance tuning must also drive the investigation on the application axis. The highest level is the user call, the TOP_LEVEL_QUERY:
with ash as (
 select *,ceil(extract(epoch from max(ash_time)over()-min(ash_time)over()))::numeric samples
 from pg_active_session_history where ash_time>=current_timestamp - interval '5 minutes'
) select round(count(*)::numeric/samples,2) as "AAS",
 backend_type,top_level_query
 from ash
 group by samples,
 backend_type,top_level_query
 order by 1 desc fetch first 20 rows only
;
  AAS  |   backend_type    |                                             top_level_query
-------+-------------------+---------------------------------------------------------------------------------------------------------
  0.95 | client backend    | SELECT * FROM mypgio('pgio3', 50, 3000, 131072, 255, 8);
  0.95 | client backend    | SELECT * FROM mypgio('pgio2', 50, 3000, 131072, 255, 8);
  0.95 | client backend    | SELECT * FROM mypgio('pgio4', 50, 3000, 131072, 255, 8);
  0.95 | client backend    | SELECT * FROM mypgio('pgio1', 50, 3000, 131072, 255, 8);
  0.25 | autovacuum worker | autovacuum: VACUUM ANALYZE public.pgio2
  0.02 | client backend    | commit;
  0.01 | client backend    | select * from pg_active_session_history where pid=21837 order by ash_time desc fetch first 1 rows only;
  0.01 | client backend    | with ash as (                                                                                          +
       |                   |  select *,ceil(extract(epoch from max(ash_time)over()-min(ash_time)over()))::numeric samples          +
       |                   |  from pg_active_session_history where ash_time>=current_timestamp - interval '5 minutes'              +
...

Here I see 4 user calls responsible for most of the 4 active sessions related to the ‘client backend’, each one with AAS=0.95, and this is actually what is running: the PGIO benchmark (see https://kevinclosson.net/) with 4 sessions calling the mypgio function.
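The AAS column in these queries is the number of samples counted for a group divided by the length of the sampled window in seconds. The sketch below reproduces that arithmetic in shell. It assumes one sample per second; the `aas` function name and the example count of 285 samples are illustrative only:

```shell
# aas SAMPLES SECONDS
# Average active sessions: samples seen for one group divided by the
# number of seconds covered by the window (assumes 1 sample per second).
aas() {
  awk -v s="$1" -v w="$2" 'BEGIN { printf "%.2f\n", s / w }'
}

# Hypothetical example: one session sampled 285 times in a 300 second window
aas 285 300   # prints 0.95
```

A session that is active during almost every sample of the window therefore contributes an AAS close to 1, which matches the four mypgio calls at 0.95 each.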

The function we see in TOP_LEVEL_QUERY is itself running some queries, and the big advantage of the pgSentinel extension, over pg_stat_activity, is the capture of the actual statement running, with the actual values of the parameters:
with ash as (
 select *,ceil(extract(epoch from max(ash_time)over()-min(ash_time)over()))::numeric samples
 from pg_active_session_history where ash_time>=current_timestamp - interval '5 minutes'
) select round(count(*)::numeric/samples,2) as "AAS",
 backend_type,substr(query,1,100)
 from ash
 group by samples,
 backend_type,substr(query,1,100)
 order by 1 desc fetch first 20 rows only
;
  AAS  |   backend_type    | substr
-------+-------------------+----------------------------------------------------------------------------------------
  0.26 | autovacuum worker |
  0.02 | client backend    | commit
  0.02 | client backend    | SELECT sum(scratch) FROM pgio1 WHERE mykey BETWEEN 3567 AND 3822
  0.01 | client backend    | SELECT sum(scratch) FROM pgio4 WHERE mykey BETWEEN 5729 AND 5984
  0.01 | client backend    | SELECT sum(scratch) FROM pgio4 WHERE mykey BETWEEN 5245 AND 5500
  0.01 | client backend    | truncate table l_ash.ps
  0.01 | client backend    | SELECT sum(scratch) FROM pgio1 WHERE mykey BETWEEN 3249 AND 3504
  0.01 | client backend    | SELECT sum(scratch) FROM pgio1 WHERE mykey BETWEEN 57 AND 312
  0.01 | client backend    | UPDATE pgio4 SET scratch = scratch + 1 WHERE mykey BETWEEN 3712 AND 3720
  0.01 | client backend    | SELECT sum(scratch) FROM pgio2 WHERE mykey BETWEEN 1267 AND 1522
  0.01 | client backend    | SELECT sum(scratch) FROM pgio1 WHERE mykey BETWEEN 703 AND 958
  0.01 | client backend    | SELECT sum(scratch) FROM pgio2 WHERE mykey BETWEEN 2025 AND 2280
  0.01 | client backend    | insert into l_ash.ps_diff +
       |                   | select ps1.pid,ps1.uname,ps1.pr,ps1.ni,ps1.virt,ps1.res,ps1.shr,ps1.s,ps1.
  0.01 | client backend    | UPDATE pgio4 SET scratch = scratch + 1 WHERE mykey BETWEEN 2690 AND 2698
  0.01 | client backend    | SELECT sum(scratch) FROM pgio3 WHERE mykey BETWEEN 5463 AND 5718
  0.01 | client backend    | SELECT sum(scratch) FROM pgio4 WHERE mykey BETWEEN 1467 AND 1722
  0.01 | client backend    | SELECT sum(scratch) FROM pgio1 WHERE mykey BETWEEN 4653 AND 4908
(20 rows)

Here, no single statement stands out at the top. We have only a few samples for each execution. This is because each execution is different (different values for the parameters) and they have a balanced execution time. If one query ran longer with one specific set of parameter values, it would show up at the top here.

Finally, we can also aggregate at a higher level than QUERY with QUERYID, which is per prepared statement and does not change when executing with different parameter values. If we want to get the text, then we can join with PG_STAT_STATEMENTS:
with ash as (
 select *,datid dbid,ceil(extract(epoch from max(ash_time)over()-min(ash_time)over()))::numeric samples
 from pg_active_session_history where ash_time>=current_timestamp - interval '5 minutes'
) select round(count(*)::numeric/samples,2) as "AAS",dbid,
 backend_type,queryid,pg_stat_statements.query
 from ash left outer join pg_stat_statements using(dbid,queryid)
 group by samples,dbid,
 backend_type,queryid,pg_stat_statements.query
 order by 1 desc fetch first 15 rows only
;
  AAS  | dbid  |  backend_type  |  queryid   | query
-------+-------+----------------+------------+------------------------------------------------------------------------------------------------------
  0.89 | 17487 | client backend |  837728477 | SELECT sum(scratch) FROM pgio2 WHERE mykey BETWEEN 100926 AND 101181
  0.70 | 17487 | client backend | 3411884874 | SELECT sum(scratch) FROM pgio4 WHERE mykey BETWEEN $1 AND $2
  0.68 | 17487 | client backend | 1046864277 | SELECT sum(scratch) FROM pgio3 WHERE mykey BETWEEN 1591 AND 1846
  0.67 | 17487 | client backend | 2994234299 | SELECT sum(scratch) FROM pgio1 WHERE mykey BETWEEN $1 AND $2
  0.33 | 17487 | client backend | 1648177216 | UPDATE pgio1 SET scratch = scratch + 1 WHERE mykey BETWEEN 2582 AND 2590
  0.32 | 17487 | client backend | 3381000939 | UPDATE pgio3 SET scratch = scratch + $1 WHERE mykey BETWEEN $2 AND $3
  0.30 | 17487 | client backend | 1109524376 | UPDATE pgio4 SET scratch = scratch + 1 WHERE mykey BETWEEN 5462 AND 5470
  0.11 | 17487 | client backend | 3355133240 | UPDATE pgio2 SET scratch = scratch + $1 WHERE mykey BETWEEN $2 AND $3
  0.05 | 17547 | client backend | 2771355107 | update l_ash.parameters set value=now(),timestamp=now() where name=$1
  0.05 | 17547 | client backend | 1235869898 | update l_ash.parameters set value=$1,timestamp=now() where name=$2
  0.02 | 13806 | client backend |  935474258 | select * from pg_active_session_history where pid=$1 order by ash_time desc fetch first $2 rows only
  0.01 | 13806 | client backend |  164740364 | with ash as ( +

This shows the main queries running: SELECT and UPDATE on PGIO1, PGIO2, PGIO3 and PGIO4. They run with different parameter values but have the same QUERYID. It seems that PG_STAT_STATEMENTS is not very consistent when capturing the query text: some rows show the parameter placeholders, others show the values. But you must know that those are the prepared statements. We do not have 0.89 average sessions running the ‘SELECT sum(scratch) FROM pgio2 WHERE mykey BETWEEN 100926 AND 101181’. This is the ‘SELECT sum(scratch) FROM pgio2’ running with different parameter values and, for whatever reason, the PG_STAT_STATEMENTS extension displays one of the sets of values rather than ‘BETWEEN $1 AND $2’.

Time dimension

Of course we can also query all samples and drill-down with a graphical tool. For the time axis, this is a better visualization. Here is a quick Excel PivotChart from those 5 minutes samples:

I always have 4 sessions running, as we have seen in the average, but the wait event detail is not uniform during the timeline. This is where you will drill down on the time axis. This can be helpful to investigate a short duration issue. Or to try to understand non-uniform response time. For example, coming from Oracle, I’m not used to this pattern where, from one second to the other, the wait profile is completely different. Probably because of all the background activity such as Vacuum, WAL, sync buffers to disk, garbage collection,… The workload here, PGIO, the SLOB method for PostgreSQL, is short uniform queries. It would be interesting to have some statistics about the response time variation.

Note that in this database cluster, in addition to the PGIO workload, I have a small application running and committing very small changes occasionally, and this is why you see the peaks with 1 session on WALWrite and 4 sessions waiting on WALWriteLock. This adds to the chaos of waits.
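For the time axis, the samples can be pre-aggregated per second and per wait event before they are handed to a charting tool. A minimal sketch, assuming the view exposes the wait_event_type and wait_event columns it inherits from pg_stat_activity; the function name, the output file name and the per-second grouping are illustrative choices, not part of pgSentinel:

```shell
# ash_timeline_sql MINUTES
# Prints a single-line query that counts sampled sessions per second and
# wait event over the last MINUTES minutes. Rejects non-numeric input.
ash_timeline_sql() {
  case "$1" in
    ''|*[!0-9]*) echo "minutes must be a positive integer" >&2; return 1 ;;
  esac
  printf "select date_trunc('second', ash_time) as sample_time, wait_event_type, wait_event, count(*) as sessions from pg_active_session_history where ash_time >= current_timestamp - interval '%s minutes' group by 1, 2, 3 order by 1, 2, 3\n" "$1"
}

# Possible usage (needs a running instance, so it is left commented out):
# psql -X -d postgres -c "\copy ($(ash_timeline_sql 5)) to 'ash_timeline.csv' csv header"
```

The resulting CSV has one row per second and wait event, which is a convenient shape for a pivot chart.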

This extension providing active session sampling is only the first component of pgSentinel, so do not spend too much time building queries, reports and graphs on this and let’s see what will come with pgSentinel:

pgSentinel is in progress….@postgresql @amplifypostgres @PostgreSQLFR @BertrandDrouvot @ckikof pic.twitter.com/Pwq8vB69MI

— pgSentinel (@Pg_Sentinel) July 11, 2018

The article Drilling down the pgSentinel Active Session History first appeared on Blog dbi services.

Syncing Active Directory users and groups to PostgreSQL

July 23, 2018, 8:55 am

A lot of companies use Active Directory to manage their users and groups. What most of these companies also want to do is to manage their database users and groups in Active Directory. PostgreSQL comes with ldap/kerberos authentication by default but does not provide anything that helps with managing users and groups in an external directory. And even for the authentication, the user must already exist in PostgreSQL. One tool you might want to have a look at, and that helps with this requirement, is pg-ldap-sync.

As usual I am using CentOS 7 for the scope of this post. For getting pg-ldap-sync onto the system PostgreSQL needs to be installed as pg_config is expected to be there. Once you have that several packages need to be installed (the openldap-clients is not required but it is handy to have it just in case you want to test some ldapsearch commands against Active Directory):

[root@pgadsync ~]$ yum install -y ruby rubygem-rake rubygems ruby-devel openldap-clients git

pg-ldap-sync can either be installed directly with ruby commands or you can install it from Git:

[postgres@pgadsync ~]$ git clone https://github.com/larskanis/pg-ldap-sync.git
[postgres@pgadsync ~]$ cd pg-ldap-sync
[postgres@pgadsync pg-ldap-sync]$ gem install bundler
[postgres@pgadsync pg-ldap-sync]$ bundle install
[postgres@pgadsync pg-ldap-sync]$ bundle exec rake install
[postgres@pgadsync pg-ldap-sync]$ which pg_ldap_sync 
~/bin/pg_ldap_sync
[postgres@pgadsync pg-ldap-sync]$ cd ..
[postgres@pgadsync ~]$ bin/pg_ldap_sync --help
Usage: bin/pg_ldap_sync [options]
    -v, --[no-]verbose               Increase verbose level
    -c, --config FILE                Config file [/etc/pg_ldap_sync.yaml]
    -t, --[no-]test                  Don't do any change in the database
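The `-t` switch listed in the help output is a convenient way to preview a synchronization before anything is changed in the database. The helper below is a hypothetical wrapper that only assembles the command line from the options shown above; the function name and the "preview" keyword are made up for this sketch:

```shell
# build_sync_cmd CONFIG MODE
# Prints the pg_ldap_sync command line for the given config file.
# MODE "preview" adds -t (no changes in the database); any other value
# produces the command that really applies the changes.
build_sync_cmd() {
  cmd="bin/pg_ldap_sync -c $1 -vv"
  if [ "$2" = "preview" ]; then
    cmd="$cmd -t"
  fi
  printf '%s\n' "$cmd"
}

build_sync_cmd etc/pg_ldap_sync.yaml preview
# prints: bin/pg_ldap_sync -c etc/pg_ldap_sync.yaml -vv -t
```

Running the preview first and reading the reported statements is a cheap safeguard, especially when the LDAP filters are still being tuned.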

And then, of course, you need something in the Active Directory for synchronization. In my test Active Directory I created a new “Organizational Unit” called “PostgreSQL”:

Inside this “Organizational Unit” there is a user which is used for authenticating against Active Directory:

Then we have two other “Organizational Units”, one for the PostgreSQL DBAs and one for the groups we’d like to sync:

There are three people in the pgadmins unit:

There is one group in the groups unit:

… and the group has two members:

This is what we want to synchronize to PostgreSQL. The final requirement is that two roles need to be there in PostgreSQL (you’ll notice later why that is important):

postgres@pgbox:/home/postgres/ [PG10] psql -X postgres
psql (10.3)
Type "help" for help.

postgres=# \du
                                   List of roles
 Role name |                         Attributes                         | Member of 
-----------+------------------------------------------------------------+-----------
 postgres  | Superuser, Create role, Create DB, Replication, Bypass RLS | {}

postgres=# create role ldap_users;
CREATE ROLE
postgres=# create role ldap_groups;
CREATE ROLE
postgres=# \du
                                    List of roles
  Role name  |                         Attributes                         | Member of 
-------------+------------------------------------------------------------+-----------
 ldap_groups | Cannot login                                               | {}
 ldap_users  | Cannot login                                               | {}
 postgres    | Superuser, Create role, Create DB, Replication, Bypass RLS | {}

postgres=#

With pg-ldap-sync each instance you want to have synchronized needs a separate yaml file like this one:

# With this sample config the distinction between LDAP-synchronized
# groups/users and local ones is done by the membership to ldap_users
# and ldap_groups. These two roles have to be defined manually before
# pg_ldap_sync can run.

# Connection parameters to LDAP server
# see also: http://net-ldap.rubyforge.org/Net/LDAP.html#method-c-new
ldap_connection:
  host: 172.22.30.1
  port: 389
  auth:
    method: :simple
    username: CN=pgadsync,OU=PostgreSQL,DC=test,DC=dbiservices,DC=com
    password: xxxxx
#  encryption:
#    method: :simple_tls

# Search parameters for LDAP users which should be synchronized
ldap_users:
  base: OU=pgadmins,OU=PostgreSQL,DC=test,DC=dbiservices,DC=com
  # LDAP filter (according to RFC 2254)
  # defines the users in LDAP to be synchronized
#  filter: (&(objectClass=person)(objectClass=organizationalPerson)(givenName=*)(sn=*)(sAMAccountName=*))
  filter: (sAMAccountName=*)
  # this attribute is used as PG role name
  name_attribute: sAMAccountName
  # lowercase name for use as PG role name
  lowercase_name: true

# Search parameters for LDAP groups which should be synchronized
ldap_groups:
  base: OU=pggroups,OU=PostgreSQL,DC=test,DC=dbiservices,DC=com
  filter: (cn=dbas)
  # this attribute is used as PG role name
  name_attribute: cn
  # lowercase name for use as PG role name
  lowercase_name: false
  # this attribute must reference to all member DN's of the given group
  member_attribute: member

# Connection parameters to PostgreSQL server
# see also: http://rubydoc.info/gems/pg/PG/Connection#initialize-instance_method
pg_connection:
  host: 192.168.22.99
  dbname: postgres
  user: postgres
  password: postgres

pg_users:
  # Filter for identifying LDAP generated users in the database.
  # It's the WHERE-condition to "SELECT rolname, oid FROM pg_roles"
  filter: oid IN (SELECT pam.member FROM pg_auth_members pam JOIN pg_roles pr ON pr.oid=pam.roleid WHERE pr.rolname='ldap_users')
  # Options for CREATE ROLE statements
  create_options: LOGIN IN ROLE ldap_users

pg_groups:
  # Filter for identifying LDAP generated groups in the database.
  # It's the WHERE-condition to "SELECT rolname, oid FROM pg_roles"
  filter: oid IN (SELECT pam.member FROM pg_auth_members pam JOIN pg_roles pr ON pr.oid=pam.roleid WHERE pr.rolname='ldap_groups')
  # Options for CREATE ROLE statements
  create_options: NOLOGIN IN ROLE ldap_groups
#grant_options:

When you have a look at the “pg_users” and “pg_groups” you will notice why the two PostgreSQL roles created above are required. They are used to distinguish the users and groups coming from the directory and those created locally.
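To see which roles a filter would treat as LDAP-managed, the WHERE condition from the config can be appended to the "SELECT rolname, oid FROM pg_roles" statement mentioned in the config comments. A small sketch follows; the function name is hypothetical and the role name check is deliberately strict to keep the generated SQL safe:

```shell
# ldap_managed_roles_sql MARKER_ROLE
# Prints the query that lists all roles which are members of MARKER_ROLE
# (for example ldap_users or ldap_groups), using the filter from the yaml.
ldap_managed_roles_sql() {
  case "$1" in
    ''|*[!a-z0-9_]*) echo "invalid role name: $1" >&2; return 1 ;;
  esac
  printf "SELECT rolname, oid FROM pg_roles WHERE oid IN (SELECT pam.member FROM pg_auth_members pam JOIN pg_roles pr ON pr.oid=pam.roleid WHERE pr.rolname='%s')\n" "$1"
}

# Possible usage (needs a running instance, so it is left commented out):
# psql -X -At -c "$(ldap_managed_roles_sql ldap_users)" postgres
```

Before the first synchronization this returns no rows, because nothing is a member of ldap_users or ldap_groups yet.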

Ready to sync:

[postgres@pgadsync ~]$ bin/pg_ldap_sync -c etc/pg_ldap_sync.yaml -vv 
I, [2018-07-23T14:23:46.350588 #29270]  INFO -- : found user-dn: CN=dba1,OU=pgadmins,OU=PostgreSQL,DC=test,DC=dbiservices,DC=com
I, [2018-07-23T14:23:46.360073 #29270]  INFO -- : found user-dn: CN=dba2,OU=pgadmins,OU=PostgreSQL,DC=test,DC=dbiservices,DC=com
I, [2018-07-23T14:23:46.363133 #29270]  INFO -- : found user-dn: CN=dba3,OU=pgadmins,OU=PostgreSQL,DC=test,DC=dbiservices,DC=com
I, [2018-07-23T14:23:46.474105 #29270]  INFO -- : found group-dn: CN=dbas,OU=pggroups,OU=PostgreSQL,DC=test,DC=dbiservices,DC=com
I, [2018-07-23T14:23:46.517468 #29270]  INFO -- : user stat: create: 3 drop: 0 keep: 0
I, [2018-07-23T14:23:46.517798 #29270]  INFO -- : group stat: create: 1 drop: 0 keep: 0
I, [2018-07-23T14:23:46.518047 #29270]  INFO -- : membership stat: grant: 2 revoke: 0 keep: 0
I, [2018-07-23T14:23:46.518201 #29270]  INFO -- : SQL: CREATE ROLE "dba1" LOGIN IN ROLE ldap_users
I, [2018-07-23T14:23:46.522229 #29270]  INFO -- : SQL: CREATE ROLE "dba2" LOGIN IN ROLE ldap_users
I, [2018-07-23T14:23:46.525156 #29270]  INFO -- : SQL: CREATE ROLE "dba3" LOGIN IN ROLE ldap_users
I, [2018-07-23T14:23:46.528058 #29270]  INFO -- : SQL: CREATE ROLE "dbas" NOLOGIN IN ROLE ldap_groups
I, [2018-07-23T14:23:46.531065 #29270]  INFO -- : SQL: GRANT "dbas" TO "dba3","dba1"

… and that’s it. Users and groups are now available in PostgreSQL:

postgres=# \du
                                        List of roles
  Role name  |                         Attributes                         |     Member of     
-------------+------------------------------------------------------------+-------------------
 dba1        |                                                            | {ldap_users,dbas}
 dba2        |                                                            | {ldap_users}
 dba3        |                                                            | {ldap_users,dbas}
 dbas        | Cannot login                                               | {ldap_groups}
 ldap_groups | Cannot login                                               | {}
 ldap_users  | Cannot login                                               | {}
 postgres    | Superuser, Create role, Create DB, Replication, Bypass RLS | {}

When you add another user to the directory:

… and run the sync again, the existing users will of course not be touched but the new one gets created (notice that I copied dba4 in the directory, which is why the user is a member of the dbas group):

[postgres@pgadsync ~]$ bin/pg_ldap_sync -c etc/pg_ldap_sync.yaml -vv 
I, [2018-07-23T14:27:26.314729 #29273]  INFO -- : found user-dn: CN=dba1,OU=pgadmins,OU=PostgreSQL,DC=test,DC=dbiservices,DC=com
I, [2018-07-23T14:27:26.323719 #29273]  INFO -- : found user-dn: CN=dba2,OU=pgadmins,OU=PostgreSQL,DC=test,DC=dbiservices,DC=com
I, [2018-07-23T14:27:26.326764 #29273]  INFO -- : found user-dn: CN=dba3,OU=pgadmins,OU=PostgreSQL,DC=test,DC=dbiservices,DC=com
I, [2018-07-23T14:27:26.328800 #29273]  INFO -- : found user-dn: CN=dba4,OU=pgadmins,OU=PostgreSQL,DC=test,DC=dbiservices,DC=com
I, [2018-07-23T14:27:26.394066 #29273]  INFO -- : found group-dn: CN=dbas,OU=pggroups,OU=PostgreSQL,DC=test,DC=dbiservices,DC=com
I, [2018-07-23T14:27:26.434236 #29273]  INFO -- : found pg-user: "dba1"
I, [2018-07-23T14:27:26.434443 #29273]  INFO -- : found pg-user: "dba2"
I, [2018-07-23T14:27:26.434531 #29273]  INFO -- : found pg-user: "dba3"
I, [2018-07-23T14:27:26.439065 #29273]  INFO -- : found pg-group: "dbas" with members: ["dba3", "dba1"]
I, [2018-07-23T14:27:26.439357 #29273]  INFO -- : user stat: create: 1 drop: 0 keep: 3
I, [2018-07-23T14:27:26.439468 #29273]  INFO -- : group stat: create: 0 drop: 0 keep: 1
I, [2018-07-23T14:27:26.439656 #29273]  INFO -- : membership stat: grant: 1 revoke: 0 keep: 2
I, [2018-07-23T14:27:26.439759 #29273]  INFO -- : SQL: CREATE ROLE "dba4" LOGIN IN ROLE ldap_users
I, [2018-07-23T14:27:26.441692 #29273]  INFO -- : SQL: GRANT "dbas" TO "dba4"

Two more tips: When you want the complete ldap path for a user you can do it like this:

It is advisable to test the filters you have in the yaml like:

[postgres@pgadsync ~]$ ldapsearch -x -h 172.22.30.1 -D "pgadsync@test.dbiservices.com" -W "(sAMAccountName=*)" -b "OU=pgadmins,OU=PostgreSQL,DC=test,DC=dbiservices,DC=com"  | grep sAMAccountName
Enter LDAP Password: 
# filter: (sAMAccountName=*)
sAMAccountName: dba1
sAMAccountName: dba2
sAMAccountName: dba3
sAMAccountName: dba4
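The same kind of ldapsearch call can also be used to print the complete DN of a single user from the command line. The sketch below only post-processes the LDIF output; the `dn_from_ldif` name is hypothetical, and base64-encoded DNs (lines starting with `dn::`) are not handled:

```shell
# dn_from_ldif
# Reads ldapsearch LDIF output on stdin and prints the value of every
# "dn:" line, one distinguished name per line.
dn_from_ldif() {
  sed -n 's/^dn: //p'
}

# Possible usage against the directory from this post (left commented out
# because it needs the LDAP server; -o ldif-wrap=no keeps long DNs on one line):
# ldapsearch -x -o ldif-wrap=no -h 172.22.30.1 -D "pgadsync@test.dbiservices.com" -W \
#   -b "OU=pgadmins,OU=PostgreSQL,DC=test,DC=dbiservices,DC=com" "(sAMAccountName=dba1)" dn | dn_from_ldif
```

The printed DN has the form used for `username` and `base` in the yaml file.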

You might wonder how you can assign the permissions then. Just pre-create the role and give the permissions you want:

postgres=# drop role dbas;
DROP ROLE
postgres=# create role dbas in role ldap_groups;
CREATE ROLE
postgres=# grant CONNECT ON DATABASE postgres to dbas;
GRANT

The assignments to that group will come from the directory once you run the next synchronization.

Hope that helps …

The article Syncing Active Directory users and groups to PostgreSQL first appeared on Blog dbi services.

When does PostgreSQL create the table and index files on disk?

August 5, 2018, 5:26 am

A question that pops up from time to time is: When we create a table or an index in PostgreSQL, are the files on disk created immediately or is this something that happens when the first row is inserted? The question mostly comes from Oracle DBAs because in Oracle you can have deferred segment creation. In PostgreSQL there is no parameter for that, so let’s do a quick test.

We start with a simple table:

postgres=# create table t1 ( a int );
CREATE TABLE

To get the real file name we can either use the pg_relation_filepath function:

postgres=# select pg_relation_filepath('t1');
 pg_relation_filepath 
----------------------
 base/33845/33933
(1 row)

… or we can use the oid2name utility:

postgres@pgbox:/home/postgres/ [PG10] oid2name -d postgres -t t1
From database "postgres":
  Filenode  Table Name
----------------------
     33933          t1

Now we can easily check if that file already exists:

postgres@pgbox:/home/postgres/ [PG10] ls -la $PGDATA/base/33845/33933
-rw-------. 1 postgres postgres 0 Jul 24 07:47 /u02/pgdata/10/PG103/base/33845/33933

It is already there but empty. The files for the visibility map and the free space map are not yet created:

postgres@pgbox:/home/postgres/ [PG10] ls -la $PGDATA/base/33845/33933*
-rw-------. 1 postgres postgres 0 Jul 24 07:47 /u02/pgdata/10/PG103/base/33845/33933

What happens when we create an index on that empty table?

postgres=# create index i1 on t1 (a);
CREATE INDEX
postgres=# select pg_relation_filepath('i1');
 pg_relation_filepath 
----------------------
 base/33845/33937
(1 row)
postgres=# \! ls -la $PGDATA/base/33845/33937
-rw-------. 1 postgres postgres 8192 Jul 24 08:06 /u02/pgdata/10/PG103/base/33845/33937

The file is created immediately as well but it is not empty. It is exactly one page (my blocksize is 8k). Using the pageinspect extension we can confirm that this page is just for metadata information:

postgres=# create extension pageinspect;
CREATE EXTENSION
postgres=# SELECT * FROM bt_metap('i1');
 magic  | version | root | level | fastroot | fastlevel 
--------+---------+------+-------+----------+-----------
 340322 |       2 |    0 |     0 |        0 |         0
(1 row)
postgres=# SELECT * FROM bt_page_stats('i1', 0);
ERROR:  block 0 is a meta page
postgres=#
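The "exactly one page" observation is just the file size divided by the block size. A small sketch of that arithmetic, assuming the default 8k block size unless another value is passed (the function name is hypothetical; the real block size can be checked with `show block_size`):

```shell
# relation_pages FILE [BLOCK_SIZE]
# Prints how many pages of BLOCK_SIZE bytes (default 8192) FILE contains.
relation_pages() {
  [ -f "$1" ] || return 1
  bytes=$(wc -c < "$1")
  echo $(( bytes / ${2:-8192} ))
}

# Possible usage for the index file from this post (left commented out):
# relation_pages "$PGDATA/base/33845/33937"
```

For the 8192 byte index file above this gives 1, which is the meta page that bt_metap reads.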

The remaining question is: When will the free space map and the visibility map be created? After or with the first insert?

postgres=# insert into t1 (a) values (1);
INSERT 0 1
postgres=# \! ls -la $PGDATA/base/33845/33933*
-rw-------. 1 postgres postgres 8192 Jul 24 08:19 /u02/pgdata/10/PG103/base/33845/33933

Neither: the insert only grew the table file to one page, and there is still no free space map or visibility map. The answer is vacuum:

postgres=# vacuum t1;
VACUUM
postgres=# \! ls -la $PGDATA/base/33845/33933*
-rw-------. 1 postgres postgres  8192 Jul 24 08:19 /u02/pgdata/10/PG103/base/33845/33933
-rw-------. 1 postgres postgres 24576 Jul 24 08:22 /u02/pgdata/10/PG103/base/33845/33933_fsm
-rw-------. 1 postgres postgres  8192 Jul 24 08:22 /u02/pgdata/10/PG103/base/33845/33933_vm
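The listings above all follow the same naming pattern: the main file carries the filenode as its name, and the free space map and visibility map use the `_fsm` and `_vm` suffixes. A small helper, with a hypothetical name, that reports which of these files exist for a given relation path:

```shell
# relation_forks PATH
# PATH is the full path of the main file, e.g. "$PGDATA/base/33845/33933".
# Prints "main", "fsm" and "vm" (one per line) for every file that exists.
relation_forks() {
  [ -e "$1" ] && echo "main"
  [ -e "${1}_fsm" ] && echo "fsm"
  [ -e "${1}_vm" ] && echo "vm"
  return 0
}

# Possible usage (needs a running instance, so it is left commented out):
# relation_forks "$PGDATA/$(psql -X -At -c "select pg_relation_filepath('t1')" postgres)"
```

Right after the create table this would print only "main"; after the vacuum it prints all three.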

Hope that helps …

The article When does PostgreSQL create the table and index files on disk? first appeared on Blog dbi services.

Backing up and restoring EDB containers in MiniShift/OpenShift

August 8, 2018, 7:38 am

The last blogs in the series are already some days old: Setting up MiniShift, Deploying EDB containers in MiniShift/OpenShift, Customizing PostgreSQL parameters in EDB containers in MiniShift/OpenShift, Scaling the EDB containers in MiniShift/OpenShift, EDB Failover Manager in EDB containers in Minishift/OpenShift and EDB Failover Manager in EDB containers in Minishift/OpenShift – Failovers. What is missing is how you can backup and restore instances running in this container deployment and that is the topic of this post.

What you usually use to back up and restore EDB Postgres is BART, and the container world is no exception to that. Let’s see how that works.

My current deployment looks like this:

Two pgpool containers are serving three database containers which you can also check on the command line:

dwe@dwe:~$ oc get pods -o wide -L role
NAME                 READY     STATUS    RESTARTS   AGE       IP           NODE        ROLE
edb-as10-0-1-b8lvj   1/1       Running   0          3m        172.17.0.9   localhost   masterdb
edb-as10-0-1-gj76h   1/1       Running   0          1m        172.17.0.5   localhost   standbydb
edb-as10-0-1-sb5lt   1/1       Running   0          2m        172.17.0.4   localhost   standbydb
edb-pgpool-1-qzk5v   1/1       Running   0          3m        172.17.0.7   localhost   queryrouter
edb-pgpool-1-rvtl6   1/1       Running   0          3m        172.17.0.6   localhost   queryrouter

What we want to do is to back up the database instances or at least one of them. What you need to prepare before deploying the BART container is shared storage between the database containers and the BART container. This is especially important for the restore case as the restore procedure needs to access the backup which is hosted in the BART container. Notice that this storage configuration has the “Read-Write-Many” attributes:
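Whether a claim really offers the “Read-Write-Many” access mode can also be checked from the command line. The sketch below only inspects the access mode string; the function name and the claim name in the usage comment are hypothetical:

```shell
# has_rwx ACCESS_MODES
# Succeeds when the given access mode list contains ReadWriteMany.
has_rwx() {
  case "$1" in
    *ReadWriteMany*) return 0 ;;
    *) return 1 ;;
  esac
}

# Possible usage (claim name is an assumption, needs a running cluster):
# has_rwx "$(oc get pvc edb-backup-claim -o jsonpath='{.spec.accessModes}')" \
#   && echo "claim can be shared between the database and BART containers"
```

A claim that only reports ReadWriteOnce cannot be mounted by the BART pod and the database pods at the same time when they run on different nodes.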

When I initially deployed the database containers I provided exactly this storage claim and volume as parameters, so I have that available in the database containers:

This means in any of the database containers I will be able to see the backup volume:

dwe@dwe:~$ oc rsh edb-as10-0-1-b8lvj
sh-4.2$ ls -la /edbbackup/
total 12
drwxrwx---  3 root    root 4096 Aug  6 11:49 .
drwxr-xr-x 86 root    root 4096 Aug  8 14:03 ..
drwxrwxr-x  4 edbuser root 4096 Aug  6 11:49 edb-bart-1-89k7s
sh-4.2$ ls -la /edbbackup/edb-bart-1-89k7s/
total 16
drwxrwxr-x 4 edbuser root 4096 Aug  6 11:49 .
drwxrwx--- 3 root    root 4096 Aug  6 11:49 ..
drwxrwxr-x 2 edbuser root 4096 Aug  6 11:49 bart_log
drwxrwxr-x 3 edbuser root 4096 Aug  6 11:49 pgbackup
sh-4.2$ ls -la /edbbackup/edb-bart-1-89k7s/pgbackup/
total 12
drwxrwxr-x 3 edbuser root 4096 Aug  6 11:49 .
drwxrwxr-x 4 edbuser root 4096 Aug  6 11:49 ..
drwxr-xr-x 4 edbuser root 4096 Aug  6 11:52 edb
sh-4.2$ ls -la /edbbackup/edb-bart-1-89k7s/pgbackup/edb/
total 16
drwxr-xr-x 4 edbuser root 4096 Aug  6 11:52 .
drwxrwxr-x 3 edbuser root 4096 Aug  6 11:49 ..
drwxr-xr-x 2 edbuser root 4096 Aug  6 11:52 1533556356576
drwxr-xr-x 2 edbuser root 4096 Aug  6 11:49 archived_wals

The same storage configuration then needs to be provided to the BART deployment. Here is the yaml file for the BART deployment:

apiVersion: v1
kind: Template
metadata:
   name: edb-as10-custom
   annotations:
    description: "Custom EDB Postgres Advanced Server 10.0 Deployment Config"
    tags: "database,epas,postgres,postgresql"
    iconClass: "icon-postgresql"
objects:
- apiVersion: v1 
  kind: Service
  metadata:
    name: ${DATABASE_NAME}-service 
    labels:
      role: loadbalancer
      cluster: ${DATABASE_NAME}
  spec:
    selector:                  
      lb: ${DATABASE_NAME}-pgpool
    ports:
    - name: lb 
      port: ${PGPORT}
      targetPort: 9999
    sessionAffinity: None
    type: LoadBalancer
- apiVersion: v1 
  kind: DeploymentConfig
  metadata:
    name: ${DATABASE_NAME}-pgpool
  spec:
    replicas: 2
    selector:
      lb: ${DATABASE_NAME}-pgpool
    strategy:
      resources: {}
      rollingParams:
        intervalSeconds: 1
        maxSurge: 25%
        maxUnavailable: 25%
        timeoutSeconds: 600
        updatePeriodSeconds: 1
      type: Rolling
    template:
      metadata:
        labels:
          lb: ${DATABASE_NAME}-pgpool
          role: queryrouter
          cluster: ${DATABASE_NAME}
      spec:
        containers:
        - name: edb-pgpool
          env:
          - name: DATABASE_NAME
            value: ${DATABASE_NAME} 
          - name: PGPORT
            value: ${PGPORT} 
          - name: REPL_USER
            value: ${REPL_USER} 
          - name: ENTERPRISEDB_PASSWORD
            value: 'postgres'
          - name: REPL_PASSWORD
            value: 'postgres'
          - name: ACCEPT_EULA
            value: ${ACCEPT_EULA}
          image: localhost:5000/edb/edb-pgpool:v3.5
          imagePullPolicy: IfNotPresent
          readinessProbe:
            exec:
              command:
              - /var/lib/edb/testIsReady.sh
            initialDelaySeconds: 60
            timeoutSeconds: 5
    triggers:
    - type: ConfigChange
- apiVersion: v1
  kind: DeploymentConfig
  metadata:
    name: ${DATABASE_NAME}-as10-0
  spec:
    replicas: 1
    selector:
      db: ${DATABASE_NAME}-as10-0 
    strategy:
      resources: {}
      rollingParams:
        intervalSeconds: 1
        maxSurge: 25%
        maxUnavailable: 25%
        timeoutSeconds: 600
        updatePeriodSeconds: 1
      type: Rolling
    template:
      metadata:
        creationTimestamp: null
        labels:
          db: ${DATABASE_NAME}-as10-0 
          cluster: ${DATABASE_NAME}
      spec:
        containers:
        - name: edb-as10 
          env:
          - name: DATABASE_NAME 
            value: ${DATABASE_NAME} 
          - name: DATABASE_USER 
            value: ${DATABASE_USER} 
          - name: DATABASE_USER_PASSWORD
            value: 'postgres'
          - name: ENTERPRISEDB_PASSWORD
            value: 'postgres'
          - name: REPL_USER
            value: ${REPL_USER} 
          - name: REPL_PASSWORD
            value: 'postgres'
          - name: PGPORT
            value: ${PGPORT} 
          - name: RESTORE_FILE
            value: ${RESTORE_FILE} 
          - name: LOCALEPARAMETER
            value: ${LOCALEPARAMETER}
          - name: CLEANUP_SCHEDULE
            value: ${CLEANUP_SCHEDULE}
          - name: EFM_EMAIL
            value: ${EFM_EMAIL}
          - name: NAMESERVER
            value: ${NAMESERVER}
          - name: POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
          - name: POD_NODE
            valueFrom:
              fieldRef:
                fieldPath: spec.nodeName 
          - name: POD_IP
            valueFrom:
              fieldRef:
                fieldPath: status.podIP 
          - name: ACCEPT_EULA
            value: ${ACCEPT_EULA}
          image: localhost:5000/edb/edb-as:v10.3
          imagePullPolicy: IfNotPresent 
          readinessProbe:
            exec:
              command:
              - /var/lib/edb/testIsReady.sh
            initialDelaySeconds: 60
            timeoutSeconds: 5 
          livenessProbe:
            exec:
              command:
              - /var/lib/edb/testIsHealthy.sh
            initialDelaySeconds: 600 
            timeoutSeconds: 60 
          ports:
          - containerPort: ${PGPORT} 
          volumeMounts:
          - name: ${PERSISTENT_VOLUME}
            mountPath: /edbvolume
          - name: ${BACKUP_PERSISTENT_VOLUME}
            mountPath: /edbbackup
          - name: pg-initconf
            mountPath: /initconf
        dnsPolicy: ClusterFirst
        restartPolicy: Always
        volumes:
        - name: ${PERSISTENT_VOLUME}
          persistentVolumeClaim:
            claimName: ${PERSISTENT_VOLUME_CLAIM}
        - name: ${BACKUP_PERSISTENT_VOLUME}
          persistentVolumeClaim:
            claimName: ${BACKUP_PERSISTENT_VOLUME_CLAIM}
        - name: pg-initconf
          configMap:
            name: postgres-map
              
    triggers:
    - type: ConfigChange
parameters:
- name: DATABASE_NAME
  displayName: Database Name
  description: Name of Postgres database (leave edb for default)
  value: 'edb'
- name: DATABASE_USER
  displayName: Default database user (leave enterprisedb for default)
  description: Default database user
  value: 'enterprisedb'
- name: REPL_USER
  displayName: Repl user
  description: repl database user
  value: 'repl'
- name: PGPORT
  displayName: Database Port
  description: Database Port (leave 5444 for default)
  value: "5444"
- name: LOCALEPARAMETER
  displayName: Locale
  description: Locale of database
  value: ''
- name: CLEANUP_SCHEDULE
  displayName: Host Cleanup Schedule
  description: Standard cron schedule - min (0 - 59), hour (0 - 23), day of month (1 - 31), month (1 - 12), day of week (0 - 6) (0 to 6 are Sunday to Saturday, or use names; 7 is Sunday, the same as 0). Leave it empty if you dont want to cleanup.
  value: '0:0:*:*:*'
- name: EFM_EMAIL
  displayName: Email
  description: Email for EFM
  value: 'none@none.com'
- name: NAMESERVER
  displayName: Name Server for Email
  description: Name Server for Email
  value: '8.8.8.8'
- name: PERSISTENT_VOLUME
  displayName: Persistent Volume
  description: Persistent volume name
  value: ''
  required: true
- name: PERSISTENT_VOLUME_CLAIM 
  displayName: Persistent Volume Claim
  description: Persistent volume claim name
  value: ''
  required: true
- name: BACKUP_PERSISTENT_VOLUME
  displayName: Backup Persistent Volume
  description: Backup Persistent volume name
  value: ''
  required: false
- name: BACKUP_PERSISTENT_VOLUME_CLAIM
  displayName: Backup Persistent Volume Claim
  description: Backup Persistent volume claim name
  value: ''
  required: false
- name: RESTORE_FILE
  displayName: Restore File
  description: Restore file location
  value: ''
- name: ACCEPT_EULA
  displayName: Accept end-user license agreement (leave 'Yes' for default)
  description: Indicates whether user accepts the end-user license agreement
  value: 'Yes'
  required: true

Once that is imported we can deploy the BART container:

Notice that this is actually the same storage configuration as was used to set up the database containers.

What I didn’t tell you is that one more step is required beforehand. As the BART container is supposed to back up all the instances in a project, we need to pass the BART configuration file to the container via a configMap. In this setup I only have one instance, so the configMap would look like this:

Here you would add all the instances you need to back up per project. Once the BART container is ready:

dwe@dwe:~$ oc get pods
NAME                 READY     STATUS    RESTARTS   AGE
edb-as10-0-1-b8lvj   1/1       Running   0          17m
edb-as10-0-1-gj76h   1/1       Running   0          14m
edb-as10-0-1-sb5lt   1/1       Running   0          16m
edb-bart-1-7cgfv     1/1       Running   0          19s
edb-pgpool-1-qzk5v   1/1       Running   0          17m
edb-pgpool-1-rvtl6   1/1       Running   0          17m

… you can connect to it and perform a manual backup:

dwe@dwe:~$ oc rsh edb-bart-1-7cgfv
sh-4.2$ bart backup -s edb
INFO:  creating backup for server 'edb'
INFO:  backup identifier: '1533738106320'
65043/65043 kB (100%), 1/1 tablespace

INFO:  backup completed successfully
INFO:  backup checksum: 16fba63925ac3e77d474a36496c2a902 of base.tar
INFO:  
BACKUP DETAILS:
BACKUP STATUS: active
BACKUP IDENTIFIER: 1533738106320
BACKUP NAME: none
BACKUP PARENT: none
BACKUP LOCATION: /edbbackup/edb-bart-1-7cgfv/pgbackup/edb/1533738106320
BACKUP SIZE: 63.52 MB
BACKUP FORMAT: tar
BACKUP TIMEZONE: UTC
XLOG METHOD: fetch
BACKUP CHECKSUM(s): 1
 ChkSum                             File      
 16fba63925ac3e77d474a36496c2a902   base.tar  

TABLESPACE(s): 0
START WAL LOCATION: 000000010000000000000008
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2018-08-08 14:21:46 UTC
STOP TIME: 2018-08-08 14:21:47 UTC
TOTAL DURATION: 1 sec(s)
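
A side note on the output above: the backup identifier looks like a Unix timestamp in milliseconds that matches the reported START TIME. This is only inferred from this single backup, so treat it as an assumption. The helper name `backup_id_to_utc` is made up for illustration, and the sketch relies on GNU date:

```shell
#!/bin/bash
# Hypothetical helper: interpret a BART backup identifier as epoch milliseconds.
# Assumption: the identifier is a Unix timestamp in ms (inferred from the output above).
backup_id_to_utc() {
  local id_ms=$1
  # Integer division drops the millisecond part; GNU date accepts "@<seconds>".
  date -u -d "@$((id_ms / 1000))" '+%Y-%m-%d %H:%M:%S UTC'
}
```

Calling `backup_id_to_utc 1533738106320` prints `2018-08-08 14:21:46 UTC`, which is the START TIME shown in the backup details.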

This backup is now available in the BART container, and in addition it is accessible from the database container:

dwe@dwe:~$ oc rsh edb-as10-0-1-b8lvj
sh-4.2$ ls -la /edbbackup/edb-bart-1-7cgfv/pgbackup/edb/1533738106320/
total 65060
drwxr-xr-x 2 edbuser root     4096 Aug  8 14:21 .
drwxr-xr-x 4 edbuser root     4096 Aug  8 14:21 ..
-rwxr-xr-x 1 edbuser root      664 Aug  8 14:21 backupinfo
-rwxr-xr-x 1 edbuser root 66605568 Aug  8 14:21 base.tar
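
As the backup is visible from the database container as well, the checksum that BART reported can be verified there. This is a minimal sketch, assuming the checksum is an MD5 digest (it is 32 hex digits long, but the output does not name the algorithm). The function name `verify_md5` is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: compare a file's MD5 digest with an expected value.
# Assumption: the checksum BART reports is an MD5 digest (32 hex digits).
verify_md5() {
  local file=$1 expected=$2 actual
  # md5sum prints "<digest>  <name>"; keep only the digest.
  actual=$(md5sum < "$file" | cut -d' ' -f1)
  [ "$actual" = "$expected" ]
}
```

Inside the container this could be used as `verify_md5 /edbbackup/edb-bart-1-7cgfv/pgbackup/edb/1533738106320/base.tar 16fba63925ac3e77d474a36496c2a902 && echo "checksum matches"`.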

In case you need to restore that backup, you would deploy a new database configuration specifying this backup as the “Restore file”:

One downside of the current versions of the containers: you cannot do point-in-time recovery. So far, only restores from full backups are supported. This will change in the next release, though.

Have fun with the containers …

The post Backing up and restoring EDB containers in MiniShift/OpenShift first appeared on Blog dbi services.

↧

Bringing up your customized PostgreSQL instance on Azure

August 13, 2018, 4:53 am

The Azure cloud is becoming more and more popular, so I gave it a try and started simple. The goal was to provision a VM, compile and install PostgreSQL and then connect to the instance. There is also a fully managed PostgreSQL service, but I wanted to do it on my own just to get a feeling for the command line tools. Here is how I’ve done it.

Obviously you need to log in, which is just a matter of this:

dwe@dwe:~$ cd /var/tmp
dwe@dwe:/var/tmp$ az login

For doing anything in Azure you’ll need to create a resource group, which is like a container holding your resources. As a resource group needs to be created in a specific location, the next step is to get a list of those:

dwe@dwe:/var/tmp$ az account list-locations
[
  {
    "displayName": "East Asia",
    "id": "/subscriptions/030698d5-42d6-41a1-8740-355649c409e7/locations/eastasia",
    "latitude": "22.267",
    "longitude": "114.188",
    "name": "eastasia",
    "subscriptionId": null
  },
  {
    "displayName": "Southeast Asia",
    "id": "/subscriptions/030698d5-42d6-41a1-8740-355649c409e7/locations/southeastasia",
    "latitude": "1.283",
    "longitude": "103.833",
    "name": "southeastasia",
    "subscriptionId": null
  },
...

Once you have selected a location the resource group can be created:

dwe@dwe:/var/tmp$ az group create --name PGTEST --location "westeurope"
{
  "id": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/PGTEST",
  "location": "westeurope",
  "managedBy": null,
  "name": "PGTEST",
  "properties": {
    "provisioningState": "Succeeded"
  },
  "tags": null
}

All you need to do for creating a CentOS VM is this simple command:

dwe@dwe:/var/tmp$ az vm create -n MyPg -g PGTEST --image centos --data-disk-sizes-gb 10 --size Standard_DS2_v2 --generate-ssh-keys
{
  "fqdns": "",
  "id": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/PGTEST/providers/Microsoft.Compute/virtualMachines/MyPg",
  "location": "westeurope",
  "macAddress": "xx-xx-xx-xx-xx-xx",
  "powerState": "VM running",
  "privateIpAddress": "x.x.x.x",
  "publicIpAddress": "x.x.x.x",
  "resourceGroup": "PGTEST",
  "zones": ""
}

While the VM is getting created you can watch the resources appearing in the portal:

As soon as the VM is ready connecting via ssh is possible (the keys have automatically been added, no password required):

dwe@dwe:/var/tmp$ ssh x.x.x.x
The authenticity of host 'xx.xx.x.x (xx.xx.x.x)' can't be established.
ECDSA key fingerprint is SHA256:YzNOzg30JH0A3U1R+6WzuJEd3+7N4GmwpSVkznhuTuE.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'xx.xx.x.x' (ECDSA) to the list of known hosts.
[dwe@MyPg ~]$ ls -la /etc/yum.repos.d/
total 44
drwxr-xr-x.  2 root root  209 Sep 25  2017 .
drwxr-xr-x. 86 root root 8192 Aug  2 08:05 ..
-rw-r--r--.  1 root root 1706 Sep 25  2017 CentOS-Base.repo
-rw-r--r--.  1 root root 1309 Nov 29  2016 CentOS-CR.repo
-rw-r--r--.  1 root root  649 Nov 29  2016 CentOS-Debuginfo.repo
-rw-r--r--.  1 root root  314 Nov 29  2016 CentOS-fasttrack.repo
-rw-r--r--.  1 root root  630 Nov 29  2016 CentOS-Media.repo
-rw-r--r--.  1 root root 1331 Nov 29  2016 CentOS-Sources.repo
-rw-r--r--.  1 root root 2893 Nov 29  2016 CentOS-Vault.repo
-rw-r--r--.  1 root root  282 Sep 25  2017 OpenLogic.repo
[dwe@MyPg ~]$ sudo su -
[root@MyPg ~]# cat /etc/centos-release
CentOS Linux release 7.3.1611 (Core) 
[root@MyPg ~]#

Of course we want to update all the operating system packages to the latest release before moving on. Be careful here to really exclude the WALinuxAgent because otherwise the agent will be upgraded as well (and restarted) and the script execution will fail as you lose connectivity:

dwe@dwe:/var/tmp$ az vm extension set --publisher Microsoft.Azure.Extensions --version 2.0 --name CustomScript --vm-name MyPg --resource-group PGTEST --settings '{"commandToExecute":"yum update -y --exclude=WALinuxAgent"}'
{
  "autoUpgradeMinorVersion": true,
  "forceUpdateTag": null,
  "id": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/PGTEST/providers/Microsoft.Compute/virtualMachines/MyPg/extensions/CustomScript",
  "instanceView": null,
  "location": "westeurope",
  "name": "CustomScript",
  "protectedSettings": null,
  "provisioningState": "Succeeded",
  "publisher": "Microsoft.Azure.Extensions",
  "resourceGroup": "PGTEST",
  "settings": {
    "commandToExecute": "yum update -y --exclude=WALinuxAgent"
  },
  "tags": null,
  "type": "Microsoft.Compute/virtualMachines/extensions",
  "typeHandlerVersion": "2.0",
  "virtualMachineExtensionType": "CustomScript"
}
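
Every remote step in this post repeats the same `az vm extension set` call with a different command string. As a sketch, the call could be wrapped once. The function names `custom_script_settings` and `run_on_vm` are made up here and are not part of the Azure CLI. Only backslashes and double quotes are escaped, so commands containing newlines or other control characters are not covered:

```shell
#!/bin/bash
# Hypothetical helpers; the function names are not part of the Azure CLI.

# Build the JSON passed to --settings. Backslashes and double quotes are
# escaped so that commands containing quotes stay valid JSON.
custom_script_settings() {
  local cmd=$1
  cmd=${cmd//\\/\\\\}   # escape backslashes first
  cmd=${cmd//\"/\\\"}   # then escape double quotes
  printf '{"commandToExecute":"%s"}\n' "$cmd"
}

# Run one command on the VM through the CustomScript extension,
# using the same options as the calls shown in this post.
run_on_vm() {
  local vm=$1 group=$2 cmd=$3
  az vm extension set \
    --publisher Microsoft.Azure.Extensions --version 2.0 --name CustomScript \
    --vm-name "$vm" --resource-group "$group" \
    --settings "$(custom_script_settings "$cmd")"
}
```

With that in place the update step would read `run_on_vm MyPg PGTEST 'yum update -y --exclude=WALinuxAgent'`, and commands with embedded double quotes no longer need manual escaping.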

To compile PostgreSQL we need some packages (not all of them are required for compiling PostgreSQL, but this is what we usually install):

dwe@dwe:/var/tmp$ az vm extension set --publisher Microsoft.Azure.Extensions --version 2.0 --name CustomScript --vm-name MyPg --resource-group PGTEST --settings '{"commandToExecute":"yum install -y gcc openldap-devel python-devel readline-devel redhat-lsb bison flex perl-ExtUtils-Embed zlib-devel crypto-utils openssl-devel pam-devel libxml2-devel libxslt-devel openssh-clients bzip2 net-tools wget screen unzip sysstat xorg-x11-xauth systemd-devel bash-completion"}'

{
  "autoUpgradeMinorVersion": true,
  "forceUpdateTag": null,
  "id": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/PGTEST/providers/Microsoft.Compute/virtualMachines/MyPg/extensions/CustomScript",
  "instanceView": null,
  "location": "westeurope",
  "name": "CustomScript",
  "protectedSettings": null,
  "provisioningState": "Succeeded",
  "publisher": "Microsoft.Azure.Extensions",
  "resourceGroup": "PGTEST",
  "settings": {
    "commandToExecute": "yum install -y gcc openldap-devel python-devel readline-devel redhat-lsb bison flex perl-ExtUtils-Embed zlib-devel crypto-utils openssl-devel pam-devel libxml2-devel libxslt-devel openssh-clients bzip2 net-tools wget screen unzip sysstat xorg-x11-xauth systemd-devel bash-completion"
  },
  "tags": null,
  "type": "Microsoft.Compute/virtualMachines/extensions",
  "typeHandlerVersion": "2.0",
  "virtualMachineExtensionType": "CustomScript"
}

Preparation work for the user, group and directories:

dwe@dwe:~$ az vm extension set --publisher Microsoft.Azure.Extensions --version 2.0 --name CustomScript --vm-name MyPg --resource-group PGTEST --settings '{"commandToExecute":"groupadd postgres; useradd -m -g postgres postgres; mkdir -p /u01/app; chown postgres:postgres /u01/app; mkdir -p /u02/pgdata; chown postgres:postgres /u02/pgdata"}'
{
  "autoUpgradeMinorVersion": true,
  "forceUpdateTag": null,
  "id": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/PGTEST/providers/Microsoft.Compute/virtualMachines/MyPg/extensions/CustomScript",
  "instanceView": null,
  "location": "westeurope",
  "name": "CustomScript",
  "protectedSettings": null,
  "provisioningState": "Succeeded",
  "publisher": "Microsoft.Azure.Extensions",
  "resourceGroup": "PGTEST",
  "settings": {
    "commandToExecute": "groupadd postgres; useradd -m -g postgres postgres; mkdir -p /u01/app; chown postgres:postgres /u01/app; mkdir -p /u02/pgdata; chown postgres:postgres /u02/pgdata"
  },
  "tags": null,
  "type": "Microsoft.Compute/virtualMachines/extensions",
  "typeHandlerVersion": "2.0",
  "virtualMachineExtensionType": "CustomScript"
}

For the next steps we will just copy over this script and then execute it:

dwe@dwe:~$ cat installPG.sh 
#!/bin/bash
cd /u01/app; wget https://ftp.postgresql.org/pub/source/v10.5/postgresql-10.5.tar.bz2
tar -axf postgresql-10.5.tar.bz2
rm -f postgresql-10.5.tar.bz2
cd postgresql-10.5
PGHOME=/u01/app/postgres/product/10/db_5/
SEGSIZE=2
BLOCKSIZE=8
WALSEGSIZE=16
./configure --prefix=${PGHOME} \
            --exec-prefix=${PGHOME} \
            --bindir=${PGHOME}/bin \
            --libdir=${PGHOME}/lib \
            --sysconfdir=${PGHOME}/etc \
            --includedir=${PGHOME}/include \
            --datarootdir=${PGHOME}/share \
            --datadir=${PGHOME}/share \
            --with-pgport=5432 \
            --with-perl \
            --with-python \
            --with-openssl \
            --with-pam \
            --with-ldap \
            --with-libxml \
            --with-libxslt \
            --with-segsize=${SEGSIZE} \
            --with-blocksize=${BLOCKSIZE} \
            --with-wal-segsize=${WALSEGSIZE} \
            --with-systemd
make -j 4 all
make install
cd contrib
make -j 4 install

dwe@dwe:~$ scp installPG.sh x.x.x.x:/var/tmp/
installPG.sh                                                                                                100% 1111     1.1KB/s   00:00

Of course you could also add the yum commands to the same script, but I wanted to show both ways: using the CustomScript feature directly and copying over a script for execution. Let’s execute that:

dwe@dwe:~$ az vm extension set --publisher Microsoft.Azure.Extensions --version 2.0 --name CustomScript --vm-name MyPg --resource-group PGTEST --settings '{"commandToExecute":"chmod +x /var/tmp/installPG.sh; sudo su - postgres -c /var/tmp/installPG.sh"}'
{
  "autoUpgradeMinorVersion": true,
  "forceUpdateTag": null,
  "id": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/PGTEST/providers/Microsoft.Compute/virtualMachines/MyPg/extensions/CustomScript",
  "instanceView": null,
  "location": "westeurope",
  "name": "CustomScript",
  "protectedSettings": null,
  "provisioningState": "Succeeded",
  "publisher": "Microsoft.Azure.Extensions",
  "resourceGroup": "PGTEST",
  "settings": {
    "commandToExecute": "chmod +x /var/tmp/installPG.sh; sudo su - postgres -c /var/tmp/installPG.sh"
  },
  "tags": null,
  "type": "Microsoft.Compute/virtualMachines/extensions",
  "typeHandlerVersion": "2.0",
  "virtualMachineExtensionType": "CustomScript"
}

Binaries ready. Initialize the cluster:

dwe@dwe:~$ az vm extension set --publisher Microsoft.Azure.Extensions --version 2.0 --name CustomScript --vm-name MyPg --resource-group PGTEST --settings '{"commandToExecute":"sudo su - postgres -c \"/u01/app/postgres/product/10/db_5/bin/initdb -D /u02/pgdata/PG1\""}'
{
  "autoUpgradeMinorVersion": true,
  "forceUpdateTag": null,
  "id": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/PGTEST/providers/Microsoft.Compute/virtualMachines/MyPg/extensions/CustomScript",
  "instanceView": null,
  "location": "westeurope",
  "name": "CustomScript",
  "protectedSettings": null,
  "provisioningState": "Succeeded",
  "publisher": "Microsoft.Azure.Extensions",
  "resourceGroup": "PGTEST",
  "settings": {
    "commandToExecute": "sudo su - postgres -c \"/u01/app/postgres/product/10/db_5/bin/initdb -D /u02/pgdata/PG1\""
  },
  "tags": null,
  "type": "Microsoft.Compute/virtualMachines/extensions",
  "typeHandlerVersion": "2.0",
  "virtualMachineExtensionType": "CustomScript"
}

Startup:

dwe@dwe:~$ az vm extension set --publisher Microsoft.Azure.Extensions --version 2.0 --name CustomScript --vm-name MyPg --resource-group PGTEST --settings '{"commandToExecute":"sudo su - postgres -c \"/u01/app/postgres/product/10/db_5/bin/pg_ctl -D /u02/pgdata/PG1 start\""}'
{
  "autoUpgradeMinorVersion": true,
  "forceUpdateTag": null,
  "id": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/PGTEST/providers/Microsoft.Compute/virtualMachines/MyPg/extensions/CustomScript",
  "instanceView": null,
  "location": "westeurope",
  "name": "CustomScript",
  "protectedSettings": null,
  "provisioningState": "Succeeded",
  "publisher": "Microsoft.Azure.Extensions",
  "resourceGroup": "PGTEST",
  "settings": {
    "commandToExecute": "sudo su - postgres -c \"/u01/app/postgres/product/10/db_5/bin/pg_ctl -D /u02/pgdata/PG1 start\""
  },
  "tags": null,
  "type": "Microsoft.Compute/virtualMachines/extensions",
  "typeHandlerVersion": "2.0",
  "virtualMachineExtensionType": "CustomScript"
}

… and the instance is up and running:

dwe@dwe:~$ ssh x.x.x.x
Last login: Mon Aug 13 10:43:53 2018 from ip-37-201-6-36.hsi13.unitymediagroup.de
[dwe@MyPg ~]$ sudo su - postgres
Last login: Mon Aug 13 11:33:52 UTC 2018 on pts/0
[postgres@MyPg ~]$ /u01/app/postgres/product/10/db_5/bin/psql -c 'select version()'
                                                 version                                                 
---------------------------------------------------------------------------------------------------------
 PostgreSQL 10.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28), 64-bit
(1 row)
[postgres@MyPg ~]$

When you want to access this instance from outside Azure you will need to open the port:

dwe@dwe:~$ az vm open-port --resource-group PGTEST --name MyPg --port 5432

Once you have configured PostgreSQL for accepting connections:

[postgres@MyPg ~]$ /u01/app/postgres/product/10/db_5/bin/psql
psql (10.5)
Type "help" for help.

postgres=# alter system set listen_addresses = '*';
ALTER SYSTEM
postgres=# alter user postgres password 'secret';
ALTER ROLE
postgres=# show port ;
 port 
------
 5432
(1 row)

postgres=# \q
[postgres@MyPg ~]$ echo "host    all             all             37.201.6.36/32   md5" >> /u02/pgdata/PG1/pg_hba.conf 
[postgres@MyPg ~]$ /u01/app/postgres/product/10/db_5/bin/pg_ctl -D /u02/pgdata/PG1/ restart

… you can access the instance from outside Azure:

dwe@dwe:~$ psql -h 137.117.157.183 -U postgres
Password for user postgres: 
psql (9.5.13, server 10.5)
WARNING: psql major version 9.5, server major version 10.
         Some psql features might not work.
Type "help" for help.

postgres=#
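
One detail of the pg_hba.conf step above: `echo ... >>` appends a new line on every run. If these steps end up in a script that is executed more than once, a small guard keeps the file free of duplicates. This is only a sketch, and the function name `append_once` is made up:

```shell
#!/bin/bash
# Hypothetical helper: append a line to a file only if the exact line is not there yet.
append_once() {
  local line=$1 file=$2
  # -q: quiet, -x: match whole lines only, -F: fixed string instead of a regex
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

For the entry used above that would be `append_once "host    all             all             37.201.6.36/32   md5" /u02/pgdata/PG1/pg_hba.conf`, followed by the same restart (or a reload) of the instance.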

Put all that into a well-written script and you can have your customized PostgreSQL instance ready in Azure in a couple of minutes. Now that I have a feeling for how that works in general, I’ll look into the managed PostgreSQL service in another post.

The post Bringing up your customized PostgreSQL instance on Azure first appeared on Blog dbi services.

↧

Using the managed PostgreSQL service in Azure

August 13, 2018, 10:59 pm

In the last post we had a look at how you can bring up a customized PostgreSQL instance in the Azure cloud. Now I want to check what you can do with the managed service. For the managed service I am expecting that I can bring up a PostgreSQL instance quite easily and fast, and that I can add replicas on demand. Let’s see what is there and how you can use it.

Of course we need to log in again:

dwe@dwe:~$ cd /var/tmp
dwe@dwe:/var/tmp$ az login

The az command for working with PostgreSQL is simply “postgres”:

dwe@dwe:~$ az postgres --help

Group
    az postgres : Manage Azure Database for PostgreSQL servers.

Subgroups:
    db          : Manage PostgreSQL databases on a server.
    server      : Manage PostgreSQL servers.
    server-logs : Manage server logs.

It does not look like we can do much, but you never know, so let’s bring up an instance. Again, we need a resource group first:

dwe@dwe:~$ az group create --name PGTEST --location "westeurope"
{
  "id": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/PGTEST",
  "location": "westeurope",
  "managedBy": null,
  "name": "PGTEST",
  "properties": {
    "provisioningState": "Succeeded"
  },
  "tags": null
}

Let’s try to bring up an instance with a little storage (512MB), SSL enabled and the standard postgres user:

dwe@dwe:~$ az postgres server create --name PG1 --resource-group PGTEST --sku-name B_Gen4_2 --ssl-enforcement Enabled --storage-size 512 --admin-user postgres --admin-password xxxxx --location westeurope
Deployment failed. Correlation ID: e3cd6d04-3557-4c2a-b70f-7c11a61c395d. Server name 'PG1' cannot be empty or null. It can only be made up of lowercase letters 'a'-'z', the numbers 0-9 and the hyphen. The hyphen may not lead or trail in the name.

Ok, seems upper case letters are not allowed, try again:

dwe@dwe:~$ az postgres server create --name mymanagedpg1 --resource-group PGTEST --sku-name B_Gen4_2 --ssl-enforcement Enabled --storage-size 512 --admin-user postgres --admin-password postgres --location westeurope
Deployment failed. Correlation ID: e50ca5d6-0e38-48b8-8015-786233c0d103. The storage size of 512 MB does not meet the minimum required storage of 5120 MB.

Ok, we need a minimum of 5120 MB of storage, again:

dwe@dwe:~$ az postgres server create --name mymanagedpg1 --resource-group PGTEST --sku-name B_Gen4_2 --ssl-enforcement Enabled --storage-size 5120 --admin-user postgres --admin-password postgres --location westeurope
Deployment failed. Correlation ID: 470975ce-1ee1-4531-8703-55947772fb51. Password validation failed. The password does not meet policy requirements because it is not complex enough.

This one is good as it at least denies the postgres/postgres combination. Again with a better password:

dwe@dwe:~$ az postgres server create --name mymanagedpg1 --resource-group PGTEST --sku-name B_Gen4_2 --ssl-enforcement Enabled --storage-size 5120 --admin-user postgres --admin-password "xxx" --location westeurope
{
  "administratorLogin": "postgres",
  "earliestRestoreDate": "2018-08-13T12:30:10.763000+00:00",
  "fullyQualifiedDomainName": "mymanagedpg1.postgres.database.azure.com",
  "id": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/PGTEST/providers/Microsoft.DBforPostgreSQL/servers/mymanagedpg1",
  "location": "westeurope",
  "name": "mymanagedpg1",
  "resourceGroup": "PGTEST",
  "sku": {
    "capacity": 2,
    "family": "Gen4",
    "name": "B_Gen4_2",
    "size": null,
    "tier": "Basic"
  },
  "sslEnforcement": "Enabled",
  "storageProfile": {
    "backupRetentionDays": 7,
    "geoRedundantBackup": "Disabled",
    "storageMb": 5120
  },
  "tags": null,
  "type": "Microsoft.DBforPostgreSQL/servers",
  "userVisibleState": "Ready",
  "version": "9.6"
}
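
The first two rejections above could be caught locally before calling az at all. This is a minimal sketch built only from the two error messages shown. Any further limits (for example on name length) and the password policy are not spelled out in the output, so they are not checked. Both function names are hypothetical:

```shell
#!/bin/bash
# Hypothetical pre-flight checks; both rules are taken from the error messages above.

# Lowercase letters, digits and hyphens only; no leading or trailing hyphen; not empty.
valid_server_name() {
  local LC_ALL=C   # make sure [a-z] means ASCII lowercase only
  [[ $1 =~ ^[a-z0-9]([a-z0-9-]*[a-z0-9])?$ ]]
}

# The service rejected 512 MB and asked for at least 5120 MB.
valid_storage_mb() {
  [[ $1 =~ ^[0-9]+$ ]] && (( 10#$1 >= 5120 ))
}
```

For example, `valid_server_name PG1` and `valid_storage_mb 512` both fail, while `valid_server_name mymanagedpg1` and `valid_storage_mb 5120` succeed.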

Better. What I am not happy with is that the default seems to be PostgreSQL 9.6. PostgreSQL 10 has been out for around a year now and should definitely be the default. In the portal it looks like this, and there you can also find the information required for connecting to the instance:

So let’s try to connect:

dwe@dwe:~$ psql -h mymanagedpg1.postgres.database.azure.com -U postgres@mymanagedpg1
psql: FATAL:  no pg_hba.conf entry for host "x.x.x.xx", user "postgres", database "postgres@mymanagedpg1", SSL on
FATAL:  SSL connection is required. Please specify SSL options and retry.

How do we manage that with the managed PostgreSQL service? There is no az command to modify pg_hba.conf; what we need to do instead is create a firewall rule:

dwe@dwe:~$ az postgres server firewall-rule create -g PGTEST -s mymanagedpg1 -n allowall --start-ip-address 0.0.0.0 --end-ip-address 255.255.255.255
{
  "endIpAddress": "255.255.255.255",
  "id": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/PGTEST/providers/Microsoft.DBforPostgreSQL/servers/mymanagedpg1/firewallRules/allowall",
  "name": "allowall",
  "resourceGroup": "PGTEST",
  "startIpAddress": "0.0.0.0",
  "type": "Microsoft.DBforPostgreSQL/servers/firewallRules"
}

Of course you should not open to the whole world as I am doing here. When the rule is in place connections do work:

dwe@dwe:~$ psql -h mymanagedpg1.postgres.database.azure.com -U postgres@mymanagedpg1 postgres
Password for user postgres@mymanagedpg1: 
psql (9.5.13, server 9.6.9)
WARNING: psql major version 9.5, server major version 9.6.
         Some psql features might not work.
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-SHA384, bits: 256, compression: off)
Type "help" for help.

postgres=>
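
The allowall rule created earlier spans the entire IPv4 range. A narrower rule for a single client address uses the same value for the start and the end of the range. The sketch below only prints the arguments for `az postgres server firewall-rule create` after a basic IPv4 sanity check. The function name `single_ip_rule_args` and the rule name are made up:

```shell
#!/bin/bash
# Hypothetical helper: print the firewall-rule arguments for one client address.
single_ip_rule_args() {
  local rule=$1 ip=$2 octet
  local IFS=.
  # Require exactly four dot-separated groups of one to three digits ...
  [[ $ip =~ ^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$ ]] || return 1
  # ... each of them in the range 0-255.
  for octet in $ip; do
    (( 10#$octet <= 255 )) || return 1
  done
  printf -- '-n %s --start-ip-address %s --end-ip-address %s\n' "$rule" "$ip" "$ip"
}
```

It could be used as `az postgres server firewall-rule create -g PGTEST -s mymanagedpg1 $(single_ip_rule_args allowme 37.201.6.36)`, where the address stands for whatever public IP the client connects from.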

There is an additional database called “azure_maintenance” and we are not allowed to connect there:

postgres=> \l
                                                               List of databases
       Name        |      Owner      | Encoding |          Collate           |           Ctype            |          Access privileges          
-------------------+-----------------+----------+----------------------------+----------------------------+-------------------------------------
 azure_maintenance | azure_superuser | UTF8     | English_United States.1252 | English_United States.1252 | azure_superuser=CTc/azure_superuser
 postgres          | azure_superuser | UTF8     | English_United States.1252 | English_United States.1252 | 
 template0         | azure_superuser | UTF8     | English_United States.1252 | English_United States.1252 | =c/azure_superuser                 +
                   |                 |          |                            |                            | azure_superuser=CTc/azure_superuser
 template1         | azure_superuser | UTF8     | English_United States.1252 | English_United States.1252 | =c/azure_superuser                 +
                   |                 |          |                            |                            | azure_superuser=CTc/azure_superuser
(4 rows)
postgres=> \c azure_maintenance
FATAL:  permission denied for database "azure_maintenance"
DETAIL:  User does not have CONNECT privilege.
Previous connection kept

The minor release is one release behind but as the latest minor release was released this week that seems to be fine:

postgres=> select version();
                           version                           
-------------------------------------------------------------
 PostgreSQL 9.6.9, compiled by Visual C++ build 1800, 64-bit
(1 row)

postgres=>

I would probably not compile PostgreSQL with “Visual C++” but given that we use a Microsoft product, surprise, we are running on Windows:

postgres=> select name,setting from pg_settings where name = 'archive_azure_location';
          name          |          setting          
------------------------+---------------------------
 archive_azure_location | c:\BackupShareDir\Archive
(1 row)

… and the PostgreSQL source code was modified as this parameter does not exist in the community version.

Access to the server logs is quite easy:

dwe@dwe:~$ az postgres server-logs list --resource-group PGTEST --server-name mymanagedpg1
[
  {
    "id": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/PGTEST/providers/Microsoft.DBforPostgreSQL/servers/mymanagedpg1/logFiles/postgresql-2018-08-13_122334.log",
    "lastModifiedTime": "2018-08-13T12:59:26+00:00",
    "logFileType": "text",
    "name": "postgresql-2018-08-13_122334.log",
    "resourceGroup": "PGTEST",
    "sizeInKb": 6,
    "type": "Microsoft.DBforPostgreSQL/servers/logFiles",
    "url": "https://wasd2prodweu1afse118.file.core.windows.net/74484e5541e04b5a8556eac6a9eb37c8/pg_log/postgresql-2018-08-13_122334.log?sv=2015-04-05&sr=f&sig=ojGG2km5NFrfQ8dJ0btz8bhmwNMe0F7oq0iTRum%2FjJ4%3D&se=2018-08-13T14%3A06%3A16Z&sp=r"
  },
  {
    "id": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/PGTEST/providers/Microsoft.DBforPostgreSQL/servers/mymanagedpg1/logFiles/postgresql-2018-08-13_130000.log",
    "lastModifiedTime": "2018-08-13T13:00:00+00:00",
    "logFileType": "text",
    "name": "postgresql-2018-08-13_130000.log",
    "resourceGroup": "PGTEST",
    "sizeInKb": 0,
    "type": "Microsoft.DBforPostgreSQL/servers/logFiles",
    "url": "https://wasd2prodweu1afse118.file.core.windows.net/74484e5541e04b5a8556eac6a9eb37c8/pg_log/postgresql-2018-08-13_130000.log?sv=2015-04-05&sr=f&sig=k8avZ62KyLN8RW0ZcIigyPZa40EKNBJvNvneViHjyeI%3D&se=2018-08-13T14%3A06%3A16Z&sp=r"
  }
]

We can just download the logs and have a look at them:

dwe@dwe:~$ wget "https://wasd2prodweu1afse118.file.core.windows.net/74484e5541e04b5a8556eac6a9eb37c8/pg_log/postgresql-2018-08-13_122334.log?sv=2015-04-05&sr=f&sig=Mzy2dQ%2BgRPY8lfkUAP5X%2FkXSxoxWSwrphy7BphaTjLk%3D&se=2018-08-13T14%3A07%3A29Z&sp=r" 
dwe@dwe:~$ more postgresql-2018-08-13_122334.log\?sv\=2015-04-05\&sr\=f\&sig\=Mzy2dQ%2BgRPY8lfkUAP5X%2FkXSxoxWSwrphy7BphaTjLk%3D\&se\=2018-08-13T14%3A07%3A29Z\&sp\=r
2018-08-13 12:23:34 UTC-5b717845.6c-LOG:  could not bind IPv6 socket: A socket operation was attempted to an unreachable host.
	
2018-08-13 12:23:34 UTC-5b717845.6c-HINT:  Is another postmaster already running on port 20686? If not, wait a few seconds and retry.
2018-08-13 12:23:34 UTC-5b717846.78-LOG:  database system was shut down at 2018-08-13 12:23:32 UTC
2018-08-13 12:23:35 UTC-5b717846.78-LOG:  database startup complete in 1 seconds, startup began 2 seconds after last stop
...
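
As the `more` call above shows, wget keeps the whole query string of the signed URL in the local file name. A small sketch for deriving a clean name from the URL (the function name `log_name_from_url` is made up):

```shell
#!/bin/bash
# Hypothetical helper: derive a clean local file name from a signed log URL.
log_name_from_url() {
  local url=$1
  url=${url%%\?*}              # drop the query string (everything from the first "?")
  printf '%s\n' "${url##*/}"   # keep only the last path component
}
```

Combined with wget’s `-O` option that would be `wget -O "$(log_name_from_url "$url")" "$url"`, which stores the file as postgresql-2018-08-13_122334.log for the first URL in the list.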

The PostgreSQL configuration is accessible quite easily:

dwe@dwe:~$ az postgres server configuration list --resource-group PGTEST --server-name mymanagedpg1 | head -20
[
  {
    "allowedValues": "on,off",
    "dataType": "Boolean",
    "defaultValue": "on",
    "description": "Enable input of NULL elements in arrays.",
    "id": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/PGTEST/providers/Microsoft.DBforPostgreSQL/servers/mymanagedpg1/configurations/array_nulls",
    "name": "array_nulls",
    "resourceGroup": "PGTEST",
    "source": "system-default",
    "type": "Microsoft.DBforPostgreSQL/servers/configurations",
    "value": "on"
  },
  {
    "allowedValues": "safe_encoding,on,off",
    "dataType": "Enumeration",
    "defaultValue": "safe_encoding",
    "description": "Sets whether \"\\'\" is allowed in string literals.",
    "id": "/subscriptions/030698d5-42d6-41a1-8740-355649c409e7/resourceGroups/PGTEST/providers/Microsoft.DBforPostgreSQL/servers/mymanagedpg1/configurations/backslash_quote",
    "name": "backslash_quote",

Setting a parameter is easy as well:

dwe@dwe:~$ az postgres server configuration set --name work_mem --value=32 --resource-group PGTEST --server-name mymanagedpg1
Deployment failed. Correlation ID: 634fd473-0c28-43a7-946e-ecbb26faf961. The value '32' for configuration 'work_mem' is not valid. The allowed values are '4096-2097151'.
dwe@dwe:~$ az postgres server configuration set --name work_mem --value=4096 --resource-group PGTEST --server-name mymanagedpg1
{
  "allowedValues": "4096-2097151",
  "dataType": "Integer",
  "defaultValue": "4096",
  "description": "Sets the amount of memory to be used by internal sort operations and hash tables before writing to temporary disk files.",
  "id": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/PGTEST/providers/Microsoft.DBforPostgreSQL/servers/mymanagedpg1/configurations/work_mem",
  "name": "work_mem",
  "resourceGroup": "PGTEST",
  "source": "system-default",
  "type": "Microsoft.DBforPostgreSQL/servers/configurations",
  "value": "4096"
}
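
The rejected value of 32 could have been checked against the `allowedValues` field first. This is a minimal sketch, assuming the range is always given as two non-negative integers separated by a hyphen, as it is for work_mem here. The function name `in_allowed_range` is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: check an integer against an "allowedValues" range such as "4096-2097151".
# Assumption: the range consists of two non-negative integers separated by a hyphen.
in_allowed_range() {
  local value=$1 range=$2
  local min=${range%%-*} max=${range##*-}
  [[ $value =~ ^[0-9]+$ ]] || return 1
  (( 10#$value >= 10#$min && 10#$value <= 10#$max ))
}
```

Here `in_allowed_range 32 "4096-2097151"` fails, while `in_allowed_range 4096 "4096-2097151"` succeeds.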

The interesting point is what happens when we change a parameter that requires a restart:

dwe@dwe:~$ az postgres server configuration set --name shared_buffers --value=4096 --resource-group PGTEST --server-name mymanagedpg1
Deployment failed. Correlation ID: d849b302-1c41-4b13-a2d5-6b24f144be89. The configuration 'shared_buffers' does not exist for PostgreSQL server version 9.6.
dwe@dwe:~$ psql -h mymanagedpg1.postgres.database.azure.com -U postgres@mymanagedpg1 postgres
Password for user postgres@mymanagedpg1: 
psql (9.5.13, server 9.6.9)
WARNING: psql major version 9.5, server major version 9.6.
         Some psql features might not work.
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-SHA384, bits: 256, compression: off)
Type "help" for help.

postgres=> show shared_buffers ;
 shared_buffers 
----------------
 512MB
(1 row)
postgres=>

So memory configuration depends on the pricing models, more information here. If you want to scale up or down “you can independently change the vCores, the hardware generation, the pricing tier (except to and from Basic), the amount of storage, and the backup retention period”.

Some final thoughts: Bringing an instance up is quick and simple. The default PostgreSQL version is 9.6.x, which is not a good choice in my opinion: version 10 has already received its fifth minor release, is stable and is the most recent version. Scaling up and down is a matter of changing basic settings such as cores, memory, storage and the pricing model. For many workloads this is probably fine; if you want more control you are better off provisioning VMs and managing PostgreSQL on your own. High availability is not implemented by adding replicas but by creating a new node, attaching the storage to that node and then bringing it up. This might be sufficient or it might not, depending on your requirements.

In an upcoming post we will build our own PostgreSQL HA solution on Azure.

The post Using the managed PostgreSQL service in Azure appeared first on Blog dbi services.

↧

The size of Oracle Home: from 9GB to 600MB – What about PostgreSQL?

August 17, 2018, 11:20 am

A recent blog post from Franck and a tweet around that topic are the inspiration for this blog post (thanks, Jan, for requesting it :) ). In short, it is about how small you can get the binaries. Is that important? At least when it comes to Docker images it might be, as you usually try to make the image as small as possible. Comparing PostgreSQL and Oracle in that field is unfair, though, as Oracle comes with a lot of stuff by default which PostgreSQL just does not ship (e.g. Apex, SQL Developer, …), so please treat this more as a fun post.

The way we usually compile PostgreSQL is this (not in /var/tmp in real life):

postgres@pgbox:/home/postgres/ [pg103] cd /var/tmp/
postgres@pgbox:/var/tmp/ [pg103] wget https://ftp.postgresql.org/pub/source/v10.5/postgresql-10.5.tar.bz2
postgres@pgbox:/var/tmp/ [pg103] tar -axf postgresql-10.5.tar.bz2
postgres@pgbox:/var/tmp/ [pg103] cd postgresql-10.5/
postgres@pgbox:/var/tmp/postgresql-10.5/ [pg103] PGHOME=/var/tmp/pg105
postgres@pgbox:/var/tmp/postgresql-10.5/ [pg103] SEGSIZE=2
postgres@pgbox:/var/tmp/postgresql-10.5/ [pg103] BLOCKSIZE=8
postgres@pgbox:/var/tmp/postgresql-10.5/ [pg103] WALSEGSIZE=16
postgres@pgbox:/var/tmp/postgresql-10.5/ [pg103] ./configure --prefix=${PGHOME} \
                                                             --exec-prefix=${PGHOME} \
                                                             --bindir=${PGHOME}/bin \
                                                             --libdir=${PGHOME}/lib \
                                                             --sysconfdir=${PGHOME}/etc \
                                                             --includedir=${PGHOME}/include \
                                                             --datarootdir=${PGHOME}/share \
                                                             --datadir=${PGHOME}/share \
                                                             --with-pgport=5432 \
                                                             --with-perl \
                                                             --with-python \
                                                             --with-openssl \
                                                             --with-pam \
                                                             --with-ldap \
                                                             --with-libxml \
                                                             --with-libxslt \
                                                             --with-segsize=${SEGSIZE} \
                                                             --with-blocksize=${BLOCKSIZE} \
                                                             --with-wal-segsize=${WALSEGSIZE}  \
                                                             --with-systemd
postgres@pgbox:/var/tmp/postgresql-10.5/ [pg103] make -j 4 all
postgres@pgbox:/var/tmp/postgresql-10.5/ [pg103] make install
postgres@pgbox:/var/tmp/postgresql-10.5/ [pg103] cd contrib
postgres@pgbox:/var/tmp/postgresql-10.5/contrib/ [pg103] make -j 4 install

When we do this against the PostgreSQL 10.5 source code the result is this (without the documentation, of course, but containing all the extensions):

postgres@pgbox:/var/tmp/postgresql-10.5/contrib/ [pg103] du -sh /var/tmp/pg105/
28M	/var/tmp/pg105/

Can we get that even smaller? Let’s try to skip the extensions:

postgres@pgbox:/var/tmp/postgresql-10.5/contrib/ [pg103] cd ..
postgres@pgbox:/var/tmp/postgresql-10.5/ [pg103] make clean
postgres@pgbox:/var/tmp/postgresql-10.5/ [pg103] ./configure --prefix=${PGHOME} \
                                                             --exec-prefix=${PGHOME} \
                                                             --bindir=${PGHOME}/bin \
                                                             --libdir=${PGHOME}/lib \
                                                             --sysconfdir=${PGHOME}/etc \
                                                             --includedir=${PGHOME}/include \
                                                             --datarootdir=${PGHOME}/share \
                                                             --datadir=${PGHOME}/share \
                                                             --with-pgport=5432 \
                                                             --with-perl \
                                                             --with-python \
                                                             --with-openssl \
                                                             --with-pam \
                                                             --with-ldap \
                                                             --with-libxml \
                                                             --with-libxslt \
                                                             --with-segsize=${SEGSIZE} \
                                                             --with-blocksize=${BLOCKSIZE} \
                                                             --with-wal-segsize=${WALSEGSIZE}  \
                                                             --with-systemd
postgres@pgbox:/var/tmp/postgresql-10.5/ [pg103] make -j 4 all
postgres@pgbox:/var/tmp/postgresql-10.5/ [pg103] rm -rf /var/tmp/pg105/
postgres@pgbox:/var/tmp/postgresql-10.5/ [pg103] make install

What do we have now?

postgres@pgbox:/var/tmp/postgresql-10.5/ [pg103] du -sh /var/tmp/pg105/
25M	/var/tmp/pg105/

We saved another 3MB. Can we do more? Let’s try to skip all the “--with” flags that enable perl and so on for the configure command:

postgres@pgbox:/var/tmp/postgresql-10.5/ [pg103] make clean
postgres@pgbox:/var/tmp/postgresql-10.5/ [pg103] ./configure --prefix=${PGHOME} \
                                                             --exec-prefix=${PGHOME} \
                                                             --bindir=${PGHOME}/bin \
                                                             --libdir=${PGHOME}/lib \
                                                             --sysconfdir=${PGHOME}/etc \
                                                             --includedir=${PGHOME}/include \
                                                             --datarootdir=${PGHOME}/share \
                                                             --datadir=${PGHOME}/share \
                                                             --with-pgport=5432 \
                                                             --with-segsize=${SEGSIZE} \
                                                             --with-blocksize=${BLOCKSIZE} \
                                                             --with-wal-segsize=${WALSEGSIZE}  \
                                                             --with-systemd
postgres@pgbox:/var/tmp/postgresql-10.5/ [pg103] make -j 4 all
postgres@pgbox:/var/tmp/postgresql-10.5/ [pg103] rm -rf /var/tmp/pg105/
postgres@pgbox:/var/tmp/postgresql-10.5/ [pg103] make install

Do we see a change?

postgres@pgbox:/var/tmp/postgresql-10.5/ [pg103] du -sh /var/tmp/pg105/
25M	/var/tmp/pg105/

No, that does not change anything. Franck stripped the Oracle binaries and libraries, so let’s try to do the same (although I am not sure right now if that is supported):

postgres@pgbox:/var/tmp/postgresql-10.5/ [pg103] du -sh /var/tmp/pg105/
25M	/var/tmp/pg105/
postgres@pgbox:/var/tmp/postgresql-10.5/ [pg103] strip /var/tmp/pg105/bin/*
postgres@pgbox:/var/tmp/postgresql-10.5/ [pg103] strip /var/tmp/pg105/lib/*
strip: Warning: '/var/tmp/pg105/lib/pkgconfig' is not an ordinary file
strip: Warning: '/var/tmp/pg105/lib/postgresql' is not an ordinary file
postgres@pgbox:/var/tmp/postgresql-10.5/ [pg103] du -sh /var/tmp/pg105/
24M	/var/tmp/pg105/
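
The two warnings appear because lib/* also matches the directories pkgconfig and postgresql. A sketch that selects only regular files first; strip_candidates is a hypothetical helper, and -maxdepth is available in GNU and BSD find but is not part of POSIX:

```shell
# Print the regular files directly inside the given directories.
# Sub-directories (and symlinks such as bin/postmaster) are skipped,
# so strip never receives a non-file argument.
strip_candidates() {
    for dir in "$@"; do
        find "$dir" -maxdepth 1 -type f
    done
}

# Assumed usage against the installation from this post:
#   strip_candidates /var/tmp/pg105/bin /var/tmp/pg105/lib |
#       while IFS= read -r f; do strip "$f"; done
```

Like the original command, this leaves the files below lib/postgresql untouched.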

So, another 1MB less. Can we still initialize and start PostgreSQL?

postgres@pgbox:/var/tmp/postgresql-10.5/ [pg103] /var/tmp/pg105/bin/initdb -D /var/tmp/testpg
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locales
COLLATE: en_US.utf8
CTYPE: en_US.utf8
MESSAGES: en_US.utf8
MONETARY: de_CH.UTF-8
NUMERIC: de_CH.UTF-8
TIME: en_US.UTF-8
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

creating directory /var/tmp/testpg ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

/var/tmp/pg105/bin/pg_ctl -D /var/tmp/testpg -l logfile start

postgres@pgbox:/var/tmp/postgresql-10.5/ [pg103] /var/tmp/pg105/bin/pg_ctl -D /var/tmp/testpg/ start
waiting for server to start....2018-08-17 18:57:50.329 CEST [8528] LOG: listening on IPv6 address "::1", port 5432
2018-08-17 18:57:50.329 CEST [8528] LOG: listening on IPv4 address "127.0.0.1", port 5432
2018-08-17 18:57:50.334 CEST [8528] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2018-08-17 18:57:50.354 CEST [8529] LOG: database system was shut down at 2018-08-17 18:57:31 CEST
2018-08-17 18:57:50.358 CEST [8528] LOG: database system is ready to accept connections
done
server started

Looks good and we are able to connect:

postgres@pgbox:/var/tmp/postgresql-10.5/ [pg103] /var/tmp/pg105/bin/psql -c "select version()" postgres
                                                 version                                                 
---------------------------------------------------------------------------------------------------------
 PostgreSQL 10.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28), 64-bit
(1 row)

Time: 1.428 ms

What else can we do? When you do not need the utilities on the server you could just remove them (as said, this is a fun post, don’t do this):

postgres@pgbox:/var/tmp/postgresql-10.5/ [pg103] cd /var/tmp/pg105/bin
postgres@pgbox:/var/tmp/pg105/bin/ [pg103] rm clusterdb createdb createuser dropdb dropuser pg_archivecleanup pg_basebackup pg_dump pg_dumpall pg_isready pg_receivewal pg_recvlogical pg_resetwal pg_restore pg_rewind pg_test_fsync pg_test_timing pg_upgrade pg_waldump reindexdb vacuumdb

We could probably even remove pgbench and psql, but I will need these two to show that the server is still working. What do we have now?

postgres@pgbox:/var/tmp/pg105/bin/ [pg103] du -sh /var/tmp/pg105/
21M	/var/tmp/pg105/

Another 3MB less. Can we still restart and connect?

postgres@pgbox:/var/tmp/pg105/bin/ [pg103] /var/tmp/pg105/bin/pg_ctl -D /var/tmp/testpg/ stop
waiting for server to shut down....2018-08-17 19:08:49.588 CEST [9144] LOG:  received fast shutdown request
2018-08-17 19:08:49.593 CEST [9144] LOG:  aborting any active transactions
2018-08-17 19:08:49.597 CEST [9144] LOG:  worker process: logical replication launcher (PID 9151) exited with exit code 1
2018-08-17 19:08:49.598 CEST [9146] LOG:  shutting down
2018-08-17 19:08:49.625 CEST [9144] LOG:  database system is shut down
 done
server stopped
postgres@pgbox:/var/tmp/pg105/bin/ [pg103] /var/tmp/pg105/bin/pg_ctl -D /var/tmp/testpg/ start
waiting for server to start....2018-08-17 19:08:51.949 CEST [9368] LOG:  listening on IPv6 address "::1", port 9999
2018-08-17 19:08:51.949 CEST [9368] LOG:  listening on IPv4 address "127.0.0.1", port 9999
2018-08-17 19:08:51.953 CEST [9368] LOG:  listening on Unix socket "/tmp/.s.PGSQL.9999"
2018-08-17 19:08:51.966 CEST [9369] LOG:  database system was shut down at 2018-08-17 19:08:49 CEST
2018-08-17 19:08:51.969 CEST [9368] LOG:  database system is ready to accept connections
 done
server started
postgres@pgbox:/var/tmp/pg105/bin/ [pg103] /var/tmp/pg105/bin/psql -c "select version()" postgres
                                                 version                                                 
---------------------------------------------------------------------------------------------------------
 PostgreSQL 10.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28), 64-bit
(1 row)

Time: 2.043 ms

Looks good. Now let’s do the final step and remove the rest which is not required for the server. Before that we do a fresh initdb, as we cannot do that afterwards:

postgres@pgbox:/var/tmp/pg105/bin/ [pg103] /var/tmp/pg105/bin/pg_ctl -D /var/tmp/testpg/ stop
waiting for server to shut down....2018-08-17 19:10:31.693 CEST [9368] LOG: received fast shutdown request
2018-08-17 19:10:31.696 CEST [9368] LOG: aborting any active transactions
2018-08-17 19:10:31.696 CEST [9368] LOG: worker process: logical replication launcher (PID 9375) exited with exit code 1
2018-08-17 19:10:31.697 CEST [9370] LOG: shutting down
2018-08-17 19:10:31.712 CEST [9368] LOG: database system is shut down
done
server stopped
postgres@pgbox:/var/tmp/pg105/bin/ [pg103] rm -rf /var/tmp/testpg/
postgres@pgbox:/var/tmp/pg105/bin/ [pg103] /var/tmp/pg105/bin/initdb -D /var/tmp/testpg
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

Data page checksums are disabled.

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

/var/tmp/pg105/bin/pg_ctl -D /var/tmp/testpg -l logfile start

So, remove the rest:

postgres@pgbox:/var/tmp/pg105/bin/ [pg103] rm pg_config pg_controldata psql pgbench initdb ecpg pg_ctl
postgres@pgbox:/var/tmp/pg105/bin/ [pg103] du -sh /var/tmp/pg105/
20M	/var/tmp/pg105/

We are down to 20MB but we can still start the instance:

postgres@pgbox:/var/tmp/pg105/bin/ [pg103] /var/tmp/pg105/bin/postgres -D /var/tmp/testpg/ &
[1] 9486
postgres@pgbox:/var/tmp/pg105/bin/ [pg103] 2018-08-17 19:13:54.917 CEST [9486] LOG:  listening on IPv6 address "::1", port 9999
2018-08-17 19:13:54.917 CEST [9486] LOG:  listening on IPv4 address "127.0.0.1", port 9999
2018-08-17 19:13:54.924 CEST [9486] LOG:  listening on Unix socket "/tmp/.s.PGSQL.9999"
2018-08-17 19:13:54.955 CEST [9487] LOG:  database system was shut down at 2018-08-17 19:10:56 CEST
2018-08-17 19:13:54.960 CEST [9486] LOG:  database system is ready to accept connections

postgres@pgbox:/var/tmp/pg105/bin/ [pg103] ps -ef | grep postgres
root      1061   941  0 18:26 ?        00:00:00 sshd: postgres [priv]
postgres  1064  1061  0 18:26 ?        00:00:02 sshd: postgres@pts/0
postgres  1065  1064  0 18:26 pts/0    00:00:01 -bash
postgres  9486  1065  0 19:13 pts/0    00:00:00 /var/tmp/pg105/bin/postgres -D /var/tmp/testpg/
postgres  9488  9486  0 19:13 ?        00:00:00 postgres: checkpointer process   
postgres  9489  9486  0 19:13 ?        00:00:00 postgres: writer process   
postgres  9490  9486  0 19:13 ?        00:00:00 postgres: wal writer process   
postgres  9491  9486  0 19:13 ?        00:00:00 postgres: autovacuum launcher process   
postgres  9492  9486  0 19:13 ?        00:00:00 postgres: stats collector process   
postgres  9493  9486  0 19:13 ?        00:00:00 postgres: bgworker: logical replication launcher  
postgres  9496  1065  0 19:14 pts/0    00:00:00 ps -ef
postgres  9497  1065  0 19:14 pts/0    00:00:00 grep --color=auto postgres

Using another psql on that box we can confirm that we can connect:

postgres@pgbox:/var/tmp/pg105/bin/ [pg103] /u01/app/postgres/product/10/db_4/bin/psql -c "select version()" postgres
                                                 version                                                 
---------------------------------------------------------------------------------------------------------
 PostgreSQL 10.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28), 64-bit
(1 row)

Still too much? What else can we do? Let’s see what is consuming the space:

postgres@pgbox:/var/tmp/pg105/bin/ [pg103] du -sh /var/tmp/pg105/*
6.6M	/var/tmp/pg105/bin
5.9M	/var/tmp/pg105/include
4.1M	/var/tmp/pg105/lib
2.9M	/var/tmp/pg105/share

We cannot do more in the “bin” directory, there is nothing left to delete:

postgres@pgbox:/var/tmp/pg105/bin/ [pg103] ls -l /var/tmp/pg105/bin
total 6660
-rwxr-xr-x. 1 postgres postgres 6817480 Aug 17 18:56 postgres
lrwxrwxrwx. 1 postgres postgres       8 Aug 17 18:54 postmaster -> postgres

Everything else, such as the sample files, will probably save us only a few bytes:

postgres@pgbox:/var/tmp/pg105/ [pg103] find . -name '*sample*'
./share/postgresql/tsearch_data/synonym_sample.syn
./share/postgresql/tsearch_data/thesaurus_sample.ths
./share/postgresql/tsearch_data/hunspell_sample.affix
./share/postgresql/tsearch_data/ispell_sample.affix
./share/postgresql/tsearch_data/ispell_sample.dict
./share/postgresql/tsearch_data/hunspell_sample_long.affix
./share/postgresql/tsearch_data/hunspell_sample_long.dict
./share/postgresql/tsearch_data/hunspell_sample_num.affix
./share/postgresql/tsearch_data/hunspell_sample_num.dict
./share/postgresql/pg_hba.conf.sample
./share/postgresql/pg_ident.conf.sample
./share/postgresql/postgresql.conf.sample
./share/postgresql/recovery.conf.sample
./share/postgresql/pg_service.conf.sample
./share/postgresql/psqlrc.sample
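
To put a number on "a few bytes", the sizes of those files can be summed. A minimal sketch; sample_bytes is a hypothetical helper name, and the pattern is quoted so that find, not the shell, expands it:

```shell
# Sum the size in bytes of all regular files below a directory
# whose name contains "sample".
sample_bytes() {
    find "$1" -type f -name '*sample*' -exec cat {} + | wc -c | tr -d ' '
}

# Assumed usage against the installation from this post:
#   sample_bytes /var/tmp/pg105
```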

So how much space do we consume for the PostgreSQL installation and the files which make up the instance?

postgres@pgbox:/var/tmp/pg105/bin/ [pg103] du -sh /var/tmp/pg105/
20M	/var/tmp/pg105/
postgres@pgbox:/var/tmp/pg105/bin/ [pg103] du -sh /var/tmp/testpg/
41M	/var/tmp/testpg/

… 61MB. When we add the WAL file Jan mentioned in his tweet we come to 77MB. Not much.
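
The arithmetic behind these two numbers, as a quick check. This assumes the extra WAL file is one 16MB segment, which matches the WALSEGSIZE=16 used in the configure call above; total_mb is just an illustrative helper:

```shell
# Sizes in MB as reported by du above; the 16 is the assumed size
# of one additional WAL segment (--with-wal-segsize=16).
total_mb() {
    echo $(( $1 + $2 + $3 ))
}

echo "installation + cluster:    $(total_mb 20 41 0)MB"
echo "plus one WAL segment:      $(total_mb 20 41 16)MB"
```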

The final question is if PostgreSQL is still working. Let’s use pgbench from another installation on the same server against this:

postgres@pgbox:/var/tmp/pg105/ [pg103] /u01/app/postgres/product/10/db_3/bin/pgbench -i -s 10 postgres
NOTICE:  table "pgbench_history" does not exist, skipping
NOTICE:  table "pgbench_tellers" does not exist, skipping
NOTICE:  table "pgbench_accounts" does not exist, skipping
NOTICE:  table "pgbench_branches" does not exist, skipping
creating tables...
100000 of 1000000 tuples (10%) done (elapsed 0.08 s, remaining 0.75 s)
200000 of 1000000 tuples (20%) done (elapsed 0.24 s, remaining 0.95 s)
300000 of 1000000 tuples (30%) done (elapsed 0.42 s, remaining 0.98 s)
400000 of 1000000 tuples (40%) done (elapsed 0.49 s, remaining 0.74 s)
500000 of 1000000 tuples (50%) done (elapsed 0.70 s, remaining 0.70 s)
600000 of 1000000 tuples (60%) done (elapsed 0.88 s, remaining 0.58 s)
700000 of 1000000 tuples (70%) done (elapsed 0.95 s, remaining 0.41 s)
800000 of 1000000 tuples (80%) done (elapsed 1.14 s, remaining 0.29 s)
900000 of 1000000 tuples (90%) done (elapsed 1.32 s, remaining 0.15 s)
1000000 of 1000000 tuples (100%) done (elapsed 1.41 s, remaining 0.00 s)
vacuum...
set primary keys...
done.
postgres@pgbox:/var/tmp/pg105/ [pg103] /u01/app/postgres/product/10/db_3/bin/pgbench -s 10 postgres
scale option ignored, using count from pgbench_branches table (10)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 10/10
latency average = 4.436 ms
tps = 225.435296 (including connections establishing)
tps = 285.860401 (excluding connections establishing)

Looks good. So you can come down to 20MB for the PostgreSQL installation and another 41MB for the files you need to start the instance. You could even drop the postgres database to save another 7MB. But remember: please don’t do that; you are still fine with around 30MB.
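
As a recap, the du results from the individual steps can be put side by side. A small sketch that prints each step with the saving compared to the previous one; the sizes are the ones shown in this post, the step labels and the function name are mine:

```shell
# Print each step with its du size (MB) and the saving
# compared to the previous step.
print_savings() {
    prev=""
    while read -r size step; do
        if [ -z "$prev" ]; then
            echo "${size}MB  $step"
        else
            echo "${size}MB  $step (saved $((prev - size))MB)"
        fi
        prev=$size
    done <<'EOF'
28 full build including contrib
25 without contrib
25 without the optional --with flags
24 after strip
21 client utilities removed
20 only the postgres binary left in bin
EOF
}

print_savings
```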

↧

When we do a pg_dump and right afterwards truncate a table which is in the dump, what happens?

August 20, 2018, 11:16 pm

Being at customers is always the best way to learn. Today, while discussing that pg_dump always produces a consistent dump because it uses the “repeatable read” isolation level, this question came up: What happens when we dump a database and, while the dump is running, we truncate a table in that database? Does that block? Well, the answer is in the documentation: “pg_dump is a utility for backing up a PostgreSQL database. It makes consistent backups even if the database is being used concurrently. pg_dump does not block other users accessing the database (readers or writers).”

What is not in the documentation is that pg_dump uses the “repeatable read” isolation level, but it is documented in the source code:

postgres@pgbox:/home/postgres/postgresql-10.4/ [PG10] vi src/bin/pg_dump/pg_dump.c
...
 *      Note that pg_dump runs in a transaction-snapshot mode transaction,
 *      so it sees a consistent snapshot of the database including system
 *      catalogs. However, it relies in part on various specialized backend
 *      functions like pg_get_indexdef(), and those things tend to look at
 *      the currently committed state.  So it is possible to get 'cache
 *      lookup failed' error if someone performs DDL changes while a dump is
 *      happening. The window for this sort of thing is from the acquisition
 *      of the transaction snapshot to getSchemaData() (when pg_dump acquires
 *      AccessShareLock on every table it intends to dump). It isn't very large,
 *      but it can happen.
...

For the moment let’s ignore the rest of that paragraph and focus on the original question. For that let’s create some sample data we can dump:

postgres=# create database dump;
CREATE DATABASE
postgres=# \c dump
You are now connected to database "dump" as user "postgres".
dump=# create table t_dump as 
       select a.*, md5(a::text) 
         from generate_series ( 1, 3000000 ) a;
SELECT 3000000

As we need two sessions for this demo we increase the time it takes for the dump by compressing at the highest level:

postgres@pgbox:/home/postgres/ [PG10] pg_dump --compress=9 dump > test.dump

In a second session, while the dump is running, we truncate the table:

dump=# truncate table t_dump;
TRUNCATE TABLE
Time: 9411.574 ms (00:09.412)

And surprise: Yes, the pg_dump operation is blocking the truncate (you can see that from the time it took; usually a truncate is instant). So the documentation is not quite accurate. Before going further: does the same happen when we modify the table while the dump is running? Same test as above for the dump, and in the second session:

dump=# alter table t_dump add c text;
ALTER TABLE
Time: 11093.535 ms (00:11.094)

Same here, blocking (otherwise the addition of a column would have been instant). So when you run DDL against a table while a dump is running, that DDL has to wait until the dump has completed.

Coming back to the remaining sentences of the paragraph from the source code. pg_dump acquires an AccessShareLock while it is running and we can verify this in the second session while the dump is running:

dump=# select database, relation::regclass, mode from pg_locks where relation = 't_dump'::regclass;
 database | relation |      mode       
----------+----------+-----------------
    33985 | t_dump   | AccessShareLock
(1 row)

This does not lock the table for reading or writing but it does lock the table for DDLs. We can confirm that as well when we do a select and an insert in the second session while the dump is running in the first session:

dump=# insert into t_dump (a,md5,c) values (-1,'aaa','bbb');
INSERT 0 1
Time: 8.131 ms
dump=# select * from t_dump where a = -1;
 a  | md5 |  c  
----+-----+-----
 -1 | aaa | bbb
(1 row)

No issues here. When we manually lock the table in “AccessShareLock” mode in the first session, we will not be able to alter it in the second session.
Session 1:

dump=# begin;
BEGIN
dump=# lock table t_dump IN ACCESS SHARE MODE;
LOCK TABLE
dump=#

… and in the second session try some DDL:

dump=# alter table t_dump alter COLUMN c set default 'a';
-- blocks

Creating an index on that table while it is locked in that mode works:

dump=# create index i1 on t_dump (c);
CREATE INDEX

… while dropping an index while the table is locked in that mode will block as well:

dump=# drop index i1;
-- blocks
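
What was observed above follows from PostgreSQL's table-level lock rules: ACCESS SHARE conflicts only with ACCESS EXCLUSIVE. The sketch below encodes the lock modes of the statements used in this post as a simple lookup. The function names are made up for illustration, and "ALTER TABLE" stands for the forms used above (some other ALTER TABLE forms take weaker locks):

```shell
# Table-level lock mode requested by each statement used in this post.
lock_mode_for() {
    case "$1" in
        "SELECT")        echo "ACCESS SHARE" ;;
        "INSERT")        echo "ROW EXCLUSIVE" ;;
        "CREATE INDEX")  echo "SHARE" ;;
        "TRUNCATE"|"ALTER TABLE"|"DROP INDEX")
                         echo "ACCESS EXCLUSIVE" ;;
        *)               echo "UNKNOWN" ;;
    esac
}

# pg_dump holds ACCESS SHARE, which conflicts only with ACCESS EXCLUSIVE.
waits_for_dump() {
    [ "$(lock_mode_for "$1")" = "ACCESS EXCLUSIVE" ]
}

for stmt in "SELECT" "INSERT" "CREATE INDEX" "TRUNCATE" "ALTER TABLE" "DROP INDEX"; do
    if waits_for_dump "$stmt"; then
        echo "$stmt: waits until the dump releases its lock"
    else
        echo "$stmt: runs concurrently"
    fi
done
```

This matches the tests: select, insert and create index went through, while truncate, alter table and drop index had to wait.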

So the final advice: Plan to do your dumps when there is no DDL activity.

↧

Masking Data With PostgreSQL

September 13, 2018, 8:01 am

I was searching for a tool for anonymizing data in a PostgreSQL database and tested the extension postgresql_anonymizer.
PostgreSQL_anonymizer is a set of SQL functions that remove personally identifiable values from a PostgreSQL table and replace them with random-but-plausible values. The goal is to avoid any identification from the data record while remaining suitable for testing, data analysis and data processing.
In this blog I am showing how this extension can be used. I am using a PostgreSQL 10 database.
The first step is to install the extension. In my case I did it with the pgxn client:

[postgres@pgserver2 ~]$ pgxn install postgresql_anonymizer --pg_config /u01/app/postgres/product/10/db_1/bin/pg_config
INFO: best version: postgresql_anonymizer 0.0.3
INFO: saving /tmp/tmpVf3psT/postgresql_anonymizer-0.0.3.zip
INFO: unpacking: /tmp/tmpVf3psT/postgresql_anonymizer-0.0.3.zip
INFO: building extension
gmake: Nothing to be done for `all'.
INFO: installing extension
/usr/bin/mkdir -p '/u01/app/postgres/product/10/db_1/share/extension'
/usr/bin/mkdir -p '/u01/app/postgres/product/10/db_1/share/extension/anon'
/usr/bin/install -c -m 644 .//anon.control '/u01/app/postgres/product/10/db_1/share/extension/'
/usr/bin/install -c -m 644 .//anon/anon--0.0.3.sql  '/u01/app/postgres/product/10/db_1/share/extension/anon/'
[postgres@pgserver2 ~]$

We can then verify that under /u01/app/postgres/product/10/db_1/share/extension we have a file anon.control and a directory named anon:

[postgres@pgserver2 extension]$ ls -ltra anon*
-rw-r--r--. 1 postgres postgres 167 Sep 13 10:54 anon.control

anon:
total 18552
drwxrwxr-x. 3 postgres postgres    12288 Sep 13 10:54 ..
drwxrwxr-x. 2 postgres postgres       28 Sep 13 10:54 .
-rw-r--r--. 1 postgres postgres 18980156 Sep 13 10:54 anon--0.0.3.sql

Let’s create a database named prod and create the required extensions. tsm_system_rows should be delivered by contrib.

prod=# \c prod
You are now connected to database "prod" as user "postgres".
prod=#
prod=# CREATE EXTENSION tsm_system_rows;
CREATE EXTENSION
prod=#

prod=# CREATE EXTENSION anon;
CREATE EXTENSION
prod=#


prod=# \dx
                                    List of installed extensions
      Name       | Version |   Schema   |                        Description
-----------------+---------+------------+------------------------------------------------------------
 anon            | 0.0.3   | anon       | Data anonymization tools
 plpgsql         | 1.0     | pg_catalog | PL/pgSQL procedural language
 tsm_system_rows | 1.0     | public     | TABLESAMPLE method which accepts number of rows as a limit
(3 rows)

prod=#

The extension will create the following functions in the schema anon. These functions can be used to mask some data.

prod=# set search_path=anon;
SET
prod=# \df
                                                               List of functions
 Schema |           Name           |     Result data type     |                          Argument data types                           |  Type
--------+--------------------------+--------------------------+------------------------------------------------------------------------+--------
 anon   | random_city              | text                     |                                                                        | normal
 anon   | random_city_in_country   | text                     | country_name text                                                      | normal
 anon   | random_company           | text                     |                                                                        | normal
 anon   | random_country           | text                     |                                                                        | normal
 anon   | random_date              | timestamp with time zone |                                                                        | normal
 anon   | random_date_between      | timestamp with time zone | date_start timestamp with time zone, date_end timestamp with time zone | normal
 anon   | random_email             | text                     |                                                                        | normal
 anon   | random_first_name        | text                     |                                                                        | normal
 anon   | random_iban              | text                     |                                                                        | normal
 anon   | random_int_between       | integer                  | int_start integer, int_stop integer                                    | normal
 anon   | random_last_name         | text                     |                                                                        | normal
 anon   | random_phone             | text                     | phone_prefix text DEFAULT '0'::text                                    | normal
 anon   | random_region            | text                     |                                                                        | normal
 anon   | random_region_in_country | text                     | country_name text                                                      | normal
 anon   | random_siren             | text                     |                                                                        | normal
 anon   | random_siret             | text                     |                                                                        | normal
 anon   | random_string            | text                     | l integer                                                              | normal
 anon   | random_zip               | text                     |                                                                        | normal
(18 rows)

prod=#

Now in the database prod let’s create a table with some data.

prod=# \d customers
                      Table "public.customers"
   Column   |         Type          | Collation | Nullable | Default
------------+-----------------------+-----------+----------+---------
 first_name | character varying(30) |           |          |
 last_name  | character varying(30) |           |          |
 email_add  | character varying(30) |           |          |
 country    | character varying(60) |           |          |
 iban       | character varying(60) |           |          |
 amount     | integer               |           |          |

prod=#

prod=# table customers;
 first_name | last_name |        email_add        |   country    |            iban            |   amount
------------+-----------+-------------------------+--------------+----------------------------+------------
 Michel     | Delco     | michel.delco@yaaa.fr    | FRANCE       | FR763000600001123456890189 |    5000000
 Denise     | Blanchot  | denise.blanchot@yaaa.de | GERMANY      | DE91100000000123456789     | 1000000000
 Farid      | Dim       | farid.dim@yaaa.sa       | Saudi Arabia | SA4420000001234567891234   |    2500000
(3 rows)

prod=#

Let’s say that I want some people to have access to all rows of this table, but I don’t want them to see the real email address, the real country and the real IBAN of the customers.
One solution is to create a view that replaces these columns with random but plausible values:

prod=# create view Customers_anon as select first_name as Firstname, last_name as Lastname, anon.random_email() as Email, anon.random_country() as Country, anon.random_iban() as Iban, amount as Amount from customers;
CREATE VIEW

And then grant access to the view to the people concerned. Querying it returns the masked values:

prod=# select * from customers_anon ;
 firstname | lastname  |             email             | country |            iban            |   amount
-----------+-----------+-------------------------------+---------+----------------------------+------------
 Michel    | Delco     | wlothean0@springer.com        | Spain   |  AD1111112222C3C3C3C3C3C3  |    5000000
 Denise    | Blanchot  | emoraloy@dropbox.com          | Myanmar |  AD1111112222C3C3C3C3C3C3  | 1000000000
 Farid     | Dim       | vbritlandkt@deliciousdays.com | India   |  AD1111112222C3C3C3C3C3C3  |    2500000
(3 rows)

prod=#
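
The grant itself is not shown in the session above. A minimal sketch of that step, wrapped in a small shell function: the function name grant_anon_access and the role name you pass in are made up for this example, and it assumes the view lives in the prod database as above.

```shell
#!/bin/sh
# Sketch: let one role read the masked view but not the base table.
# The role name is passed in; "reporting" in the usage note is hypothetical.
grant_anon_access() {
  # :"grantee" makes psql quote the value as an SQL identifier
  psql -X -v ON_ERROR_STOP=1 -v grantee="$1" prod <<'SQL'
revoke all on customers from :"grantee";
grant select on customers_anon to :"grantee";
SQL
}
```

Called as grant_anon_access reporting, this leaves the role with read access to the view only, unless it got access to the customers table some other way (for example through another role it is a member of).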

Thanks to Damien for this project. Just note that this project is in early development and should be used with care (see Damien’s comment below).

The post Masking Data With PostgreSQL first appeared on the dbi services blog.

↧

EDB containers for OpenShift 2.3 – PEM integration

September 17, 2018, 9:19 am

A few days ago EnterpriseDB announced the availability of version 2.3 of the EDB containers for OpenShift. The main new feature in this release is the integration of PEM (Postgres Enterprise Manager), so in this post we’ll look at how we can bring up a PEM server in OpenShift. If you did not follow the last posts about EDB containers in OpenShift here is the summary:

The first step is to download the updated container images. You’ll notice that there are two new containers which were not available before the 2.3 release:

edb-pemserver: Obviously this is the PEM server
admintool: a utility container for supporting database upgrades and launching PEM agents on the database containers

For downloading the latest release of the EDB container images for OpenShift, the procedure is the following:

docker run -d -p 5000:5000 --restart=always --name registry registry:2
docker login containers.enterprisedb.com

docker pull containers.enterprisedb.com/edb/edb-as:v10
docker tag containers.enterprisedb.com/edb/edb-as:v10 localhost:5000/edb/edb-as:v10
docker push localhost:5000/edb/edb-as:v10

docker pull containers.enterprisedb.com/edb/edb-pgpool:v3.6
docker tag containers.enterprisedb.com/edb/edb-pgpool:v3.6 localhost:5000/edb/edb-pgpool:v3.6
docker push localhost:5000/edb/edb-pgpool:v3.6

docker pull containers.enterprisedb.com/edb/edb-pemserver:v7.3
docker tag containers.enterprisedb.com/edb/edb-pemserver:v7.3 localhost:5000/edb/edb-pemserver:v7.3
docker push localhost:5000/edb/edb-pemserver:v7.3

docker pull containers.enterprisedb.com/edb/edb-admintool
docker tag containers.enterprisedb.com/edb/edb-admintool localhost:5000/edb/edb-admintool
docker push localhost:5000/edb/edb-admintool

docker pull containers.enterprisedb.com/edb/edb-bart:v2.1
docker tag containers.enterprisedb.com/edb/edb-bart:v2.1 localhost:5000/edb/edb-bart:v2.1
docker push localhost:5000/edb/edb-bart:v2.1
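
The pull, tag and push triplets above all follow the same pattern, so they can be put into a loop. This is only a sketch: the function names mirror_image and mirror_all are made up here, while the registries and image tags are the ones used above.

```shell
#!/bin/sh
# Sketch: mirror the EDB images into the local registry with one loop.
SRC_REGISTRY="containers.enterprisedb.com"
DST_REGISTRY="localhost:5000"

mirror_image() {
  # $1 is the image path including its tag, e.g. edb/edb-as:v10
  docker pull "$SRC_REGISTRY/$1" &&
  docker tag "$SRC_REGISTRY/$1" "$DST_REGISTRY/$1" &&
  docker push "$DST_REGISTRY/$1"
}

mirror_all() {
  # Stop at the first image that fails to pull, tag or push
  for image in edb/edb-as:v10 edb/edb-pgpool:v3.6 edb/edb-pemserver:v7.3 \
               edb/edb-admintool edb/edb-bart:v2.1; do
    mirror_image "$image" || return 1
  done
}
```

Start the local registry and run docker login containers.enterprisedb.com as shown above first, then call mirror_all.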

In my case I have quite a few EDB containers available now (…and I could go ahead and delete the old ones, of course):

docker@minishift:~$ docker images | grep edb
containers.enterprisedb.com/edb/edb-as          v10                 1d118c96529b        45 hours ago        1.804 GB
localhost:5000/edb/edb-as                       v10                 1d118c96529b        45 hours ago        1.804 GB
containers.enterprisedb.com/edb/edb-admintool   latest              07fda249cf5c        10 days ago         531.6 MB
localhost:5000/edb/edb-admintool                latest              07fda249cf5c        10 days ago         531.6 MB
containers.enterprisedb.com/edb/edb-pemserver   v7.3                78954c316ca9        10 days ago         1.592 GB
localhost:5000/edb/edb-pemserver                v7.3                78954c316ca9        10 days ago         1.592 GB
containers.enterprisedb.com/edb/edb-bart        v2.1                e2410ed4cf9b        10 days ago         571 MB
localhost:5000/edb/edb-bart                     v2.1                e2410ed4cf9b        10 days ago         571 MB
containers.enterprisedb.com/edb/edb-pgpool      v3.6                e8c600ab993a        10 days ago         561.1 MB
localhost:5000/edb/edb-pgpool                   v3.6                e8c600ab993a        10 days ago         561.1 MB
containers.enterprisedb.com/edb/edb-as                              00adaa0d4063        3 months ago        979.3 MB
localhost:5000/edb/edb-as                                           00adaa0d4063        3 months ago        979.3 MB
localhost:5000/edb/edb-pgpool                   v3.5                e7efdb0ae1be        4 months ago        564.1 MB
containers.enterprisedb.com/edb/edb-pgpool      v3.5                e7efdb0ae1be        4 months ago        564.1 MB
localhost:5000/edb/edb-as                       v10.3               90b79757b2f7        4 months ago        842.7 MB
containers.enterprisedb.com/edb/edb-bart        v2.0                48ee2c01db92        4 months ago        590.6 MB
localhost:5000/edb/edb-bart                     2.0                 48ee2c01db92        4 months ago        590.6 MB
localhost:5000/edb/edb-bart                     v2.0                48ee2c01db92        4 months ago        590.6 MB

The only bits I changed in the yaml file that describes my EDB AS deployment compared to the previous posts are these (there are only two changed lines, the references to the container images):

apiVersion: v1
kind: Template
metadata:
   name: edb-as10-custom
   annotations:
    description: "Custom EDB Postgres Advanced Server 10.0 Deployment Config"
    tags: "database,epas,postgres,postgresql"
    iconClass: "icon-postgresql"
objects:
- apiVersion: v1 
  kind: Service
  metadata:
    name: ${DATABASE_NAME}-service 
    labels:
      role: loadbalancer
      cluster: ${DATABASE_NAME}
  spec:
    selector:                  
      lb: ${DATABASE_NAME}-pgpool
    ports:
    - name: lb 
      port: ${PGPORT}
      targetPort: 9999
    sessionAffinity: None
    type: LoadBalancer
- apiVersion: v1 
  kind: DeploymentConfig
  metadata:
    name: ${DATABASE_NAME}-pgpool
  spec:
    replicas: 2
    selector:
      lb: ${DATABASE_NAME}-pgpool
    strategy:
      resources: {}
      rollingParams:
        intervalSeconds: 1
        maxSurge: 25%
        maxUnavailable: 25%
        timeoutSeconds: 600
        updatePeriodSeconds: 1
      type: Rolling
    template:
      metadata:
        labels:
          lb: ${DATABASE_NAME}-pgpool
          role: queryrouter
          cluster: ${DATABASE_NAME}
      spec:
        containers:
        - name: edb-pgpool
          env:
          - name: DATABASE_NAME
            value: ${DATABASE_NAME} 
          - name: PGPORT
            value: ${PGPORT} 
          - name: REPL_USER
            value: ${REPL_USER} 
          - name: ENTERPRISEDB_PASSWORD
            value: 'postgres'
          - name: REPL_PASSWORD
            value: 'postgres'
          - name: ACCEPT_EULA
            value: ${ACCEPT_EULA}
          image: localhost:5000/edb/edb-pgpool:v3.6
          imagePullPolicy: IfNotPresent
          readinessProbe:
            exec:
              command:
              - /var/lib/edb/testIsReady.sh
            initialDelaySeconds: 60
            timeoutSeconds: 5
    triggers:
    - type: ConfigChange
- apiVersion: v1
  kind: DeploymentConfig
  metadata:
    name: ${DATABASE_NAME}-as10-0
  spec:
    replicas: 1
    selector:
      db: ${DATABASE_NAME}-as10-0 
    strategy:
      resources: {}
      rollingParams:
        intervalSeconds: 1
        maxSurge: 25%
        maxUnavailable: 25%
        timeoutSeconds: 600
        updatePeriodSeconds: 1
      type: Rolling
    template:
      metadata:
        creationTimestamp: null
        labels:
          db: ${DATABASE_NAME}-as10-0 
          cluster: ${DATABASE_NAME}
      spec:
        containers:
        - name: edb-as10 
          env:
          - name: DATABASE_NAME 
            value: ${DATABASE_NAME} 
          - name: DATABASE_USER 
            value: ${DATABASE_USER} 
          - name: DATABASE_USER_PASSWORD
            value: 'postgres'
          - name: ENTERPRISEDB_PASSWORD
            value: 'postgres'
          - name: REPL_USER
            value: ${REPL_USER} 
          - name: REPL_PASSWORD
            value: 'postgres'
          - name: PGPORT
            value: ${PGPORT} 
          - name: RESTORE_FILE
            value: ${RESTORE_FILE} 
          - name: LOCALEPARAMETER
            value: ${LOCALEPARAMETER}
          - name: CLEANUP_SCHEDULE
            value: ${CLEANUP_SCHEDULE}
          - name: EFM_EMAIL
            value: ${EFM_EMAIL}
          - name: NAMESERVER
            value: ${NAMESERVER}
          - name: POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
          - name: POD_NODE
            valueFrom:
              fieldRef:
                fieldPath: spec.nodeName 
          - name: POD_IP
            valueFrom:
              fieldRef:
                fieldPath: status.podIP 
          - name: ACCEPT_EULA
            value: ${ACCEPT_EULA}
          image: localhost:5000/edb/edb-as:v10
          imagePullPolicy: IfNotPresent 
          readinessProbe:
            exec:
              command:
              - /var/lib/edb/testIsReady.sh
            initialDelaySeconds: 60
            timeoutSeconds: 5 
          livenessProbe:
            exec:
              command:
              - /var/lib/edb/testIsHealthy.sh
            initialDelaySeconds: 600 
            timeoutSeconds: 60 
          ports:
          - containerPort: ${PGPORT} 
          volumeMounts:
          - name: ${PERSISTENT_VOLUME}
            mountPath: /edbvolume
          - name: ${BACKUP_PERSISTENT_VOLUME}
            mountPath: /edbbackup
          - name: pg-initconf
            mountPath: /initconf
        dnsPolicy: ClusterFirst
        restartPolicy: Always
        volumes:
        - name: ${PERSISTENT_VOLUME}
          persistentVolumeClaim:
            claimName: ${PERSISTENT_VOLUME_CLAIM}
        - name: ${BACKUP_PERSISTENT_VOLUME}
          persistentVolumeClaim:
            claimName: ${BACKUP_PERSISTENT_VOLUME_CLAIM}
        - name: pg-initconf
          configMap:
            name: postgres-map
    triggers:
    - type: ConfigChange
parameters:
- name: DATABASE_NAME
  displayName: Database Name
  description: Name of Postgres database (leave edb for default)
  value: 'edb'
- name: DATABASE_USER
  displayName: Default database user (leave enterprisedb for default)
  description: Default database user
  value: 'enterprisedb'
- name: REPL_USER
  displayName: Repl user
  description: repl database user
  value: 'repl'
- name: PGPORT
  displayName: Database Port
  description: Database Port (leave 5444 for default)
  value: "5444"
- name: LOCALEPARAMETER
  displayName: Locale
  description: Locale of database
  value: ''
- name: CLEANUP_SCHEDULE
  displayName: Host Cleanup Schedule
  description: Standard cron schedule - min (0 - 59), hour (0 - 23), day of month (1 - 31), month (1 - 12), day of week (0 - 6) (0 to 6 are Sunday to Saturday, or use names; 7 is Sunday, the same as 0). Leave it empty if you dont want to cleanup.
  value: '0:0:*:*:*'
- name: EFM_EMAIL
  displayName: Email
  description: Email for EFM
  value: 'none@none.com'
- name: NAMESERVER
  displayName: Name Server for Email
  description: Name Server for Email
  value: '8.8.8.8'
- name: PERSISTENT_VOLUME
  displayName: Persistent Volume
  description: Persistent volume name
  value: ''
  required: true
- name: PERSISTENT_VOLUME_CLAIM 
  displayName: Persistent Volume Claim
  description: Persistent volume claim name
  value: ''
  required: true
- name: BACKUP_PERSISTENT_VOLUME
  displayName: Backup Persistent Volume
  description: Backup Persistent volume name
  value: ''
  required: false
- name: BACKUP_PERSISTENT_VOLUME_CLAIM
  displayName: Backup Persistent Volume Claim
  description: Backup Persistent volume claim name
  value: ''
  required: false
- name: RESTORE_FILE
  displayName: Restore File
  description: Restore file location
  value: ''
- name: ACCEPT_EULA
  displayName: Accept end-user license agreement (leave 'Yes' for default)
  description: Indicates whether user accepts the end-user license agreement
  value: 'Yes'
  required: true

As the template starts with one replica, I scaled it to three, so the setup we start with for PEM is this (one master and two replicas, which is the minimum you need for automated failover anyway):

dwe@dwe:~$ oc get pods -o wide -L role
edb-as10-0-1-4ptdr   1/1       Running   0          7m        172.17.0.5   localhost   standbydb
edb-as10-0-1-8mw7m   1/1       Running   0          5m        172.17.0.6   localhost   standbydb
edb-as10-0-1-krzpp   1/1       Running   0          8m        172.17.0.9   localhost   masterdb
edb-pgpool-1-665mp   1/1       Running   0          8m        172.17.0.8   localhost   queryrouter
edb-pgpool-1-mhgnq   1/1       Running   0          8m        172.17.0.7   localhost   queryrouter
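
The scaling step is not shown above. A minimal sketch of how it could be done from the command line, assuming the DeploymentConfig carries the default name edb-as10-0 produced by the template (the function name scale_db is made up for this example):

```shell
#!/bin/sh
# Sketch: scale a DeploymentConfig to the requested number of pods.
scale_db() {
  # $1 = DeploymentConfig name, $2 = desired number of pods
  oc scale "dc/$1" --replicas="$2"
}
```

Calling scale_db edb-as10-0 3 and then oc get pods -o wide -L role should end up with a listing like the one above once the new pods are ready.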

Nothing special happened so far except that we downloaded the new container images, pushed them to the local registry and adjusted the deployment yaml to reference the latest version of the containers. What we want to do now is to create the PEM repository container so that we can add the database to PEM, which will give us monitoring and alerting. As PEM requires persistent storage as well we need a new storage definition:

You can of course also get the storage definition using the “oc” command:

dwe@dwe:~$ oc get pvc
NAME                STATUS    VOLUME    CAPACITY   ACCESS MODES   STORAGECLASS   AGE
edb-bart-claim      Bound     pv0091    100Gi      RWO,ROX,RWX                   16h
edb-pem-claim       Bound     pv0056    100Gi      RWO,ROX,RWX                   50s
edb-storage-claim   Bound     pv0037    100Gi      RWO,ROX,RWX                   16h

The yaml file for the PEM server is this one (notice that the container image referenced is coming from the local registry):

apiVersion: v1
kind: Template
metadata:
   name: edb-pemserver
   annotations:
    description: "Standard EDB Postgres Enterprise Manager Server 7.3 Deployment Config"
    tags: "pemserver"
    iconClass: "icon-postgresql"
objects:
- apiVersion: v1
  kind: Service
  metadata:
    name: ${DATABASE_NAME}-webservice 
    labels:
      name: ${DATABASE_NAME}-webservice
  spec:
    selector:
      role: pemserver 
    ports:
    - name: https
      port: 30443
      nodePort: 30443
      protocol: TCP
      targetPort: 8443
    - name: http
      port: 30080
      nodePort: 30080
      protocol: TCP
      targetPort: 8080
    type: NodePort
- apiVersion: v1
  kind: DeploymentConfig
  metadata:
    name: edb-pemserver
  spec:
    replicas: 1
    selector:
      app: pemserver 
    strategy:
      resources: {}
      rollingParams:
        intervalSeconds: 1
        maxSurge: 25%
        maxUnavailable: 25%
        timeoutSeconds: 600
        updatePeriodSeconds: 1
      type: Rolling
    template:
      metadata:
        creationTimestamp: null
        labels:
          app: pemserver 
          cluster: ${DATABASE_NAME} 
      spec:
        containers:
        - name: pem-db
          env:
          - name: DATABASE_NAME
            value: ${DATABASE_NAME} 
          - name: DATABASE_USER
            value: ${DATABASE_USER}
          - name: ENTERPRISEDB_PASSWORD
            value: "postgres"
          - name: POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
          - name: POD_NODE
            valueFrom:
              fieldRef:
                fieldPath: spec.nodeName
          - name: POD_IP
            valueFrom:
              fieldRef:
                fieldPath: status.podIP
          - name: PGPORT
            value: ${PGPORT}
          - name: RESTORE_FILE
            value: ${RESTORE_FILE}
          - name: ENABLE_HA_MODE
            value: "No"
          - name: ACCEPT_EULA
            value: ${ACCEPT_EULA}
          image: localhost:5000/edb/edb-as:v10
          imagePullPolicy: Always 
          volumeMounts:
          - name: ${PERSISTENT_VOLUME}
            mountPath: /edbvolume
        - name: pem-webclient 
          image: localhost:5000/edb/edb-pemserver:v7.3
          imagePullPolicy: Always 
          env:
          - name: DATABASE_NAME 
            value: ${DATABASE_NAME} 
          - name: DATABASE_USER 
            value: ${DATABASE_USER} 
          - name: ENTERPRISEDB_PASSWORD
            value: "postgres"
          - name: POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
          - name: POD_NODE
            valueFrom:
              fieldRef:
                fieldPath: spec.nodeName 
          - name: POD_IP
            valueFrom:
              fieldRef:
                fieldPath: status.podIP 
          - name: PGPORT
            value: ${PGPORT}
          - name: CIDR_ADDR
            value: ${CIDR_ADDR}
          - name: ACCEPT_EULA
            value: ${ACCEPT_EULA}
          - name: DEBUG_MODE
            value: ${DEBUG_MODE}
          ports:
          - containerPort: ${PGPORT} 
          volumeMounts:
          - name: ${PERSISTENT_VOLUME}
            mountPath: /edbvolume
          - name: httpd-shm
            mountPath: /run/httpd
        volumes:
        - name: ${PERSISTENT_VOLUME}
          persistentVolumeClaim:
            claimName: ${PERSISTENT_VOLUME_CLAIM}
        - name: httpd-shm 
          emptyDir:
            medium: Memory 
        dnsPolicy: ClusterFirst
        restartPolicy: Always
    triggers:
    - type: ConfigChange
parameters:
- name: DATABASE_NAME
  displayName: Database Name
  description: Name of Postgres database (leave edb for default)
  value: 'pem'
  required: true
- name: DATABASE_USER
  displayName: Default database user (leave enterprisedb for default)
  description: Default database user
  value: 'enterprisedb'
- name: PGPORT
  displayName: Database Port
  description: Database Port (leave 5444 for default)
  value: '5444'
  required: true
- name: PERSISTENT_VOLUME
  displayName: Persistent Volume
  description: Persistent volume name
  value: 'edb-data-pv'
  required: true
- name: PERSISTENT_VOLUME_CLAIM 
  displayName: Persistent Volume Claim
  description: Persistent volume claim name
  value: 'edb-data-pvc'
  required: true
- name: RESTORE_FILE
  displayName: Restore File
  description: Restore file location
  value: ''
- name: CIDR_ADDR 
  displayName: CIDR address block for PEM 
  description: CIDR address block for PEM (leave '0.0.0.0/0' for default) 
  value: '0.0.0.0/0' 
- name: ACCEPT_EULA
  displayName: Accept end-user license agreement (leave 'Yes' for default)
  description: Indicates whether user accepts the end-user license agreement
  value: 'Yes'
  required: true

Again, don’t process the template right now, just save it as a template:

Once we have that available we can start to deploy the PEM server from the catalog:

Of course we need to reference the storage definition we created above:

Leave everything else at its defaults and create the deployment:
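
The steps above go through the web console. A rough command line equivalent, as a sketch only: the function names are made up, the file name you pass to save_template is whatever you saved the yaml as, and edb-pem-claim is the claim listed by oc get pvc above.

```shell
#!/bin/sh
# Sketch: store the template in the current project without processing it,
# then instantiate it with the PEM storage claim.
save_template() {
  # $1 = path to the yaml file that holds the template shown above
  oc create -f "$1"
}

deploy_pem() {
  # $1 = name of the persistent volume claim for PEM;
  # all other template parameters keep their defaults
  oc new-app --template=edb-pemserver -p PERSISTENT_VOLUME_CLAIM="$1"
}
```

For example: save_template pem-template.yaml followed by deploy_pem edb-pem-claim, where pem-template.yaml is a hypothetical file name.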

A few minutes later you should have PEM ready:

For connecting to PEM with your browser have a look at the service definition to get the port:
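
From the command line the port can be read from the service as well. A small sketch, assuming the service is called pem-webservice (the template names it after DATABASE_NAME, which defaults to pem); the function name pem_https_port is made up:

```shell
#!/bin/sh
# Sketch: print the node port of the https entry of the PEM web service.
pem_https_port() {
  # $1 = service name, <DATABASE_NAME>-webservice in the template
  oc get svc "$1" -o jsonpath='{.spec.ports[?(@.name=="https")].nodePort}'
}
```

With the template above, which fixes the https nodePort at 30443, pem_https_port pem-webservice should report that port.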

Once you have that you can connect to PEM:

In the next post we’ll look at how we can add our existing database deployment to our just created PEM server so we can monitor the instances and configure alerting.

The post EDB containers for OpenShift 2.3 – PEM integration first appeared on the dbi services blog.

↧

Inheriting super user privileges over a role automatically in PostgreSQL

October 16, 2018, 7:18 am

In a recent customer project where we synchronize users and groups from Active Directory we hit a little issue I was not aware of before. Suppose you have created a role in PostgreSQL, made that role a superuser and then granted that role to another role. What happens when you log in using the other role? Will you have the superuser privileges by default? Sounds confusing, I know, so let’s do a test.

To start with we create a simple role and make that role a super user:

postgres=# create role my_admin;
CREATE ROLE
postgres=# alter role my_admin superuser;
ALTER ROLE

Of course you could also do that in one step:

postgres=# create role my_admin superuser;
CREATE ROLE

As a second step let’s create a new user that is a member of the admin role and inherits the permissions of that role automatically:

postgres=# create user my_dba login password 'admin' in role my_admin inherit;
CREATE ROLE
postgres=# \du
                                    List of roles
 Role name |                         Attributes                         | Member of  
-----------+------------------------------------------------------------+------------
 my_admin  | Superuser, Cannot login                                    | {}
 my_dba    |                                                            | {my_admin}
 postgres  | Superuser, Create role, Create DB, Replication, Bypass RLS | {}

The question now is: When we log in using the my_dba user, are we superuser automatically?

postgres@pgbox:/home/postgres/ [PGDEV] psql -X -U my_dba postgres
psql (12devel)
Type "help" for help.

postgres=> \du
                                    List of roles
 Role name |                         Attributes                         | Member of  
-----------+------------------------------------------------------------+------------
 my_admin  | Superuser, Cannot login                                    | {}
 my_dba    |                                                            | {my_admin}
 postgres  | Superuser, Create role, Create DB, Replication, Bypass RLS | {}

postgres=> create database db1;
ERROR:  permission denied to create database
postgres=>

… and we are not. What we can do is:

postgres=> set role my_admin;
SET
postgres=# create database db1;
CREATE DATABASE

The reason is that the role attributes LOGIN, SUPERUSER, CREATEDB and CREATEROLE are never inherited automatically, unlike ordinary privileges on database objects. A member role only gets them after an explicit SET ROLE.
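
To see the difference between an attribute and a membership, the attributes of both roles can be put side by side. A minimal sketch using the role names from this post; the function name show_role_attrs is made up and any extra arguments (database, user) are passed on to psql:

```shell
#!/bin/sh
# Sketch: list the non-inherited attributes of my_admin and my_dba.
show_role_attrs() {
  psql -X "$@" <<'SQL'
select rolname, rolsuper, rolcreatedb, rolcreaterole, rolcanlogin
  from pg_roles
 where rolname in ('my_admin', 'my_dba')
 order by rolname;
SQL
}
```

Run it as show_role_attrs postgres: the rolsuper column is stored per role, so being a member of my_admin does not change the flag of my_dba.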

What you can do is put something like this into “.psqlrc”:

set role my_admin;

… or do it like that:

postgres=# alter user my_dba set role my_admin;
ALTER ROLE

This will explicitly set the role with each login and the superuser privileges will be there. When you have a slightly more complicated scenario where roles are assigned based on patterns in the username, you could do something like this and add it to .psqlrc as well (or put it into a file and then execute that file in .psqlrc):

DO $$
DECLARE
  lv_username pg_roles.rolname%TYPE := current_user;
BEGIN
  if ( substr(lv_username,1,2) = 'xx'
       and
       position ('yy' in lv_username) > 0
     )
  then
    execute 'set role my_admin';
  end if;
  perform 1;
END $$;

… or whatever checks you need to identify the correct user names. Hope that helps …

The post Inheriting super user privileges over a role automatically in PostgreSQL first appeared on the dbi services blog.

↧

pgconf.eu finally kicked off

October 24, 2018, 4:30 am

So, finally it started: Magnus kicked off the 10th annual PostgreSQL Conference Europe this morning in Lisbon. With 450 attendees the conference is even bigger this year than it was last year in Warsaw and it will probably be even bigger next year. One can really feel the increasing interest in PostgreSQL in Europe (and probably around the world as well). Even Tom Lane is attending this year.

Conferences are not only about technical content, social events are important as well. You can meet people, have great discussions, enjoy local food and drinks. And that is exactly what we did yesterday evening when the Swiss PostgreSQL community came together for lunch:

Conferences are not only about fun, sometimes you have to work on your queue. Working at conferences on the other hand gives you the possibility to choose nice working places:

… and of course you have to work hard on preparing the booth:

But once you’ve done all that you are ready for the conference:

… and then the mess starts: There is such an impressive line up of speakers, where do you go? Not an easy choice and you will obviously miss one or the other session. But hey, that’s the PostgreSQL community: Everybody is open for questions and discussions, just jump in.

One of the benefits of sponsoring is that you get a big thank you when the conference starts and that you can have your logo on the official t-shirt:

And that brings us to the final thoughts of this post: Why are we doing that? The answer is quite simple: Without sponsoring, organizing such a big community event is impossible. As you know PostgreSQL is a pure community project so it depends on the community not only on the technical but also on the financial level. When you make money with community projects you should give something back and sponsoring is one way of doing that.

Finally, we are committed to open source technologies. You can see that e.g. in the events we are organizing, on our blog and at events such as this one. Three days of great content, great discussions and fun ahead.

The post pgconf.eu finally kicked off first appeared on the dbi services blog.

↧

Some psql features you are maybe not aware of

October 25, 2018, 2:36 am

The 10th Annual PostgreSQL Conference Europe is under way, so this is the perfect time to blog about some psql tips and tricks you’ll love. psql is such a powerful tool that you really should use it every day. It saves you so much work and is packed with features that make your life so much easier. In this post we’ll look at some features you maybe didn’t know before.

Let’s start with something very simple: You probably know the “\l” shortcut to display all the databases:

postgres=# \l
                                 List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges   
-----------+----------+----------+------------+------------+-----------------------
 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 | 
 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres

Did you know you can also pass the shortcuts from your shell directly into psql?

postgres@pgbox:/home/postgres/ [PGDEV] psql -c '\l' postgres
                                 List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges   
-----------+----------+----------+------------+------------+-----------------------
 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 | 
 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres

But there is an even faster way of retrieving that information:

postgres@pgbox:/home/postgres/ [PGDEV] psql -l
                                 List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges   
-----------+----------+----------+------------+------------+-----------------------
 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 | 
 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres

Did you know you can log the complete psql session to a logfile?

postgres@pgbox:/home/postgres/ [PGDEV] psql -X -L /var/tmp/log postgres
psql (12devel)
Type "help" for help.

postgres=# select 1;
 ?column? 
----------
        1
(1 row)

postgres=# select 2;
 ?column? 
----------
        2
(1 row)

postgres=# \! cat /var/tmp/log
********* QUERY **********
select 1;
**************************

 ?column? 
----------
        1
(1 row)

********* QUERY **********
select 2;
**************************

postgres=#

You probably know that copy is the fastest way to get data into and out of PostgreSQL. Did you know you can copy from a program?

postgres=# create table lottery ( draw_date date, winning_numbers text, mega_ball integer, multiplier integer );
CREATE TABLE
postgres=# copy lottery from 
                program 'curl https://data.ny.gov/api/views/5xaw-6ayf/rows.csv?accessType=DOWNLOAD' 
                with (header true, delimiter ',', format csv);
COPY 1713
postgres=# select * from lottery limit 5;
 draw_date  | winning_numbers | mega_ball | multiplier 
------------+-----------------+-----------+------------
 2002-05-17 | 15 18 25 33 47  |        30 |           
 2002-05-21 | 04 28 39 41 44  |         9 |           
 2002-05-24 | 02 04 32 44 52  |        36 |           
 2002-05-28 | 06 21 22 29 32  |        24 |           
 2002-05-31 | 12 28 45 46 52  |        47 |           
(5 rows)

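That basically means: whatever “program” you use, as long as its output is something COPY understands you can load it. Keep in mind that COPY ... FROM PROGRAM runs the command on the database server as the operating system user that runs PostgreSQL, so it requires superuser privileges (or, from PostgreSQL 11 on, membership in the pg_execute_server_program role).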
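
When the program should run on the client instead of on the database server, psql offers \copy, which accepts "from program" as well. A sketch under the assumption that the lottery table from above already exists; the function name load_lottery is made up and its arguments are passed on to psql:

```shell
#!/bin/sh
# Sketch: client-side variant, curl runs where psql runs and psql
# streams the result to the server.
load_lottery() {
  psql -X -v ON_ERROR_STOP=1 "$@" <<'SQL'
\copy lottery from program 'curl -sS "https://data.ny.gov/api/views/5xaw-6ayf/rows.csv?accessType=DOWNLOAD"' with (format csv, header true)
SQL
}
```

Call it as load_lottery postgres. Because the command is executed by psql, no server-side privileges for running programs are needed, only insert privileges on the table.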

How often do you dynamically build SQL statements you want to execute right after? There is a quite effective solution for that in psql:

postgres=# select 'create table t'||i||'( a int )' from generate_series(1,10) i; \gexec
         ?column?          
---------------------------
 create table t1( a int )
 create table t2( a int )
 create table t3( a int )
 create table t4( a int )
 create table t5( a int )
 create table t6( a int )
 create table t7( a int )
 create table t8( a int )
 create table t9( a int )
 create table t10( a int )
(10 rows)

CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
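
The same "generate the statements, then execute each one" pattern can be sketched outside of psql. The following Python example is only an illustration under assumptions: the sqlite3 module from the standard library stands in for PostgreSQL so that it runs without a server, and range() replaces generate_series.

```python
import sqlite3

# Build the statements, like the SELECT over generate_series(1,10) does.
statements = [f"create table t{i}( a int )" for i in range(1, 11)]

# Execute every generated statement, which is what \gexec does with each row.
conn = sqlite3.connect(":memory:")
for stmt in statements:
    conn.execute(stmt)

# List the tables that now exist, shortest names first so t10 comes last.
tables = [
    row[0]
    for row in conn.execute(
        "select name from sqlite_master where type = 'table' "
        "order by length(name), name"
    )
]
print(tables)
```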

Did you know you can store the result of a query into a variable and use that later in other statements?

postgres=# select 3 as var; \gset
 var 
-----
   3
(1 row)

postgres=# \echo :var
3
postgres=# select * from lottery where multiplier = :var;
 draw_date  | winning_numbers | mega_ball | multiplier 
------------+-----------------+-----------+------------
 2011-02-18 | 05 06 07 30 45  |        42 |          3
 2011-03-01 | 01 12 19 20 47  |        25 |          3
 2011-04-01 | 13 14 35 36 53  |        19 |          3
 2011-04-08 | 06 40 45 50 56  |        11 |          3
 2011-04-15 | 22 23 33 39 48  |        29 |          3
 2011-04-22 | 03 18 46 51 53  |        17 |          3
 2011-04-26 | 19 29 32 38 55  |        15 |          3
 2011-05-06 | 06 18 26 37 41  |         9 |          3
 2011-05-24 | 09 12 21 42 43  |        42 |          3
 2011-05-31 | 28 30 31 37 55  |        13 |          3
 2011-06-03 | 20 23 41 49 53  |        31 |          3
 2011-06-10 | 18 21 27 37 38  |         7 |          3
...
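
If you script against the database instead of working in psql, the equivalent is to fetch the value into a variable of the host language. A minimal Python sketch, again assuming sqlite3 as a stand-in for PostgreSQL and using three rows copied from the output in this post. Note one difference: psql substitutes :var as text into the statement, while the sketch passes it as a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table lottery ( draw_date text, winning_numbers text, "
    "mega_ball integer, multiplier integer )"
)
# Three rows copied from the psql output in this post.
conn.executemany(
    "insert into lottery values (?, ?, ?, ?)",
    [
        ("2002-05-17", "15 18 25 33 47", 30, None),
        ("2011-02-18", "05 06 07 30 45", 42, 3),
        ("2011-03-01", "01 12 19 20 47", 25, 3),
    ],
)

# Like "select 3 as var; \gset": keep the single result value in a variable.
var = conn.execute("select 3 as var").fetchone()[0]

# Like "... where multiplier = :var", but passed as a bound parameter.
matches = conn.execute(
    "select draw_date from lottery where multiplier = ? order by draw_date",
    (var,),
).fetchall()
print(var, matches)
```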

The last one for today is one of my favorites: As with the Linux watch command you can watch in psql:

postgres=# select now();
              now              
-------------------------------
 2018-10-23 21:57:17.298083+02
(1 row)

postgres=# \watch
Tue 23 Oct 2018 09:57:19 PM CEST (every 2s)

              now              
-------------------------------
 2018-10-23 21:57:19.277413+02
(1 row)

Tue 23 Oct 2018 09:57:21 PM CEST (every 2s)

              now              
-------------------------------
 2018-10-23 21:57:21.364605+02
(1 row)

Btw: You can see that the PostgreSQL Conference Europe is a technical conference when you take a look at the exhibition area during the sessions: Almost empty

The post Some psql features you are maybe not aware of first appeared on Blog dbi services.

↧

Deep dive Postgres at the #pgconfeu conference

October 26, 2018, 1:37 am

Today I attended many good technical sessions at the European Postgres conference. The Postgres conferences are really technically oriented: you will find no marketing sessions there, and you learn a lot.
As promised yesterday, today I wrote my first blog about the new Postgres storage engine ZHEAP/UNDO, a very interesting feature with very interesting results.

Before you continue reading, if you didn’t read my blog from yesterday, read it first :-) link

First test: table creation

We create 2 tables, one with the default Postgres storage engine HEAP, and one with the new storage engine ZHEAP.

PSQL> create table heap2 as select a.*, md5(a::varchar), now() from generate_series(1,5000000) a;
 
SELECT 5000000
Time: 12819.369 ms (00:12.819)

PSQL> create table zheap2  with (storage_engine='zheap') as select a.*, md5(a::varchar), now() from generate_series(1,5000000) a;
SELECT 5000000
Time: 19155.004 ms (00:19.155)

As you noticed, with Postgres you can choose your storage engine at the table level :-). The table creation with ZHEAP is slower, but that is normal because now we also have to create the UNDO segment.

Second test: size of both tables

Before starting the tests we will check the size of the HEAP and ZHEAP tables. As announced yesterday, the ZHEAP table should be smaller, because it has less header information.

PSQL>  select pg_size_pretty(pg_relation_size('heap2'));
 pg_size_pretty 
----------------
 365 MB
PSQL> select pg_size_pretty(pg_relation_size('zheap2'));
 pg_size_pretty 
----------------
 289 MB

The ZHEAP table is smaller, which is exactly what Amit explained yesterday: the block header with ZHEAP is smaller. If you want to learn more, read his presentation from yesterday. Again, the link is in my blog from yesterday.

Third test: update on the table

To get the bloat effect on the HEAP table, we will now update the full table and see what happens.

PSQL> update heap2 set a=a+12222222;
UPDATE 5000000
Time: 19834.911 ms (00:19.835)

PSQL> update zheap2 set a=a+12222222;
UPDATE 5000000
Time: 26956.043 ms (00:26.956)

PSQL> select pg_size_pretty(pg_relation_size('zheap2'));
 pg_size_pretty 
----------------
 289 MB
PSQL> vacuum heap2;
PSQL> select pg_size_pretty(pg_relation_size('heap2'));
 pg_size_pretty 
----------------
 730 MB

As with the creation, the update takes a bit longer with ZHEAP, but the ZHEAP update writes a lot of information into the log file. We should repeat this update with the logging of undo segment creation disabled.
But as you can see, the most important information here is that the ZHEAP table doesn’t bloat like the HEAP table: the HEAP table is now 2 times bigger even though I executed a VACUUM.
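
To put numbers on the bloat, here is a small Python sketch that derives the growth factors from the sizes reported above. It is only arithmetic on the published figures, and the function name growth_factor is made up for illustration.

```python
# Sizes in MB, copied from the pg_relation_size() output above.
heap_before, heap_after = 365, 730
zheap_before, zheap_after = 289, 289


def growth_factor(before_mb, after_mb):
    """Return how many times larger the relation is after the update."""
    return after_mb / before_mb


heap_growth = growth_factor(heap_before, heap_after)
zheap_growth = growth_factor(zheap_before, zheap_after)
# Share of space zheap saves compared to heap right after table creation.
initial_saving_pct = (1 - zheap_before / heap_before) * 100

print(f"heap grew {heap_growth:.1f}x, zheap grew {zheap_growth:.1f}x")
print(f"zheap started {initial_saving_pct:.1f}% smaller than heap")
```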

Fourth test: ROLLBACK

To test the ROLLBACK we first have to open a transaction with BEGIN;

PSQL>  begin;
BEGIN
PSQL>* update heap2 set a=a+12222222;
UPDATE 5000000
Time: 22071.462 ms (00:22.071)
PSQL> * rollback;
ROLLBACK
Time: 1.437 ms

PSQL> begin;
BEGIN
PSQL> * update zheap2 set a=a+12222222;
UPDATE 5000000
Time: 28210.845 ms (00:28.211)
PSQL> * rollback;
ROLLBACK
Time: 0.567 ms

This is the part that surprised me the most: the ROLLBACK for ZHEAP is as fast as for HEAP, and I can’t explain that, because ZHEAP has to apply the undo blocks whereas HEAP only marks the transaction as aborted. I will leave it to my colleague Daniel Westermann to do deeper tests :-).

Fifth test: query performance

For this test we first have to flush the filesystem cache and restart the database, to be sure that nothing is cached.

postgres@dbi-pg-tun:/home/postgres/zheap/ [ZHEAP] pgstop 
waiting for server to shut down.... done
server stopped

postgres@dbi-pg-tun:/home/postgres/ [ZHEAP] sudo sync
postgres@dbi-pg-tun:/home/postgres/ [ZHEAP] sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'

postgres@dbi-pg-tun:/home/postgres/zheap/ [ZHEAP] pgstart
waiting for server to start.... done
server started

Now we are ready for the last test:

postgres@dbi-pg-tun:/home/postgres/zheap/ [ZHEAP] sqh
PSQL> select count(*) from heap2;
  count  
---------
 5000000
Time: 3444.869 ms (00:03.445)

PSQL> select count(*) from zheap2;
  count  
---------
 5000000
Time: 593.894 ms

As you can see, query performance is significantly improved for the full table scan :-), because the ZHEAP table didn’t bloat like the HEAP table. For your information, I additionally ran the full update 2 more times before restarting the database, and the HEAP table is now 3 times bigger.

PSQL> select pg_size_pretty(pg_relation_size('heap2'));
 pg_size_pretty 
----------------
 1095 MB

Time: 0.508 ms
PSQL> select pg_size_pretty(pg_relation_size('zheap2'));
 pg_size_pretty 
----------------
 289 MB
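
The scan timings and the final sizes can be set side by side with a few lines of Python. This is only arithmetic on the numbers shown above, and it is based on a single cold-cache run, so do not read too much into the exact ratios.

```python
# Timings (ms) and sizes (MB) copied from the output above.
heap_scan_ms, zheap_scan_ms = 3444.869, 593.894
heap_size_mb, zheap_size_mb = 1095, 289

# How much faster the full table scan was on the zheap table.
scan_speedup = heap_scan_ms / zheap_scan_ms
# How much bigger the heap table is after the three full updates.
size_ratio = heap_size_mb / zheap_size_mb

print(f"scan on zheap was {scan_speedup:.1f}x faster")
print(f"heap table is {size_ratio:.2f}x the size of the zheap table")
```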

Conclusion of these tests

Postgres allows you to decide at the table level whether to use UNDO or not
We are surprised by how fast the ROLLBACK is, but this must be tested again, as I don’t understand why
Select performance is significantly improved for full table scans
The storage will not bloat anymore with ZHEAP
Finally, only the updates are a little bit slower

It will be interesting to follow the discussions around this feature on the mailing list.

The post Deep dive Postgres at the #pgconfeu conference first appeared on Blog dbi services.

↧

Some more zheap testing

November 3, 2018, 6:59 am

Hervé already did some tests with zheap and documented his results yesterday. After some more discussions with Amit, who did the session about zheap at the conference here in Lisbon (you can find the slides here), I thought it might be a good idea to do some more testing on that probably upcoming feature. Let's go.

If you want to test it on your own, here is a simple script that clones the repository, compiles and installs from source, and then starts the PostgreSQL instance:

postgres@pgbox:/home/postgres/ [ZHEAP] cat refresh_zheap.sh 
#!/bin/bash

rm -rf zheap
git clone https://github.com/EnterpriseDB/zheap
cd zheap
PGHOME=/u01/app/postgres/product/zheap/db_1/
SEGSIZE=2
BLOCKSIZE=8
./configure --prefix=${PGHOME} \
            --exec-prefix=${PGHOME} \
            --bindir=${PGHOME}/bin \
            --libdir=${PGHOME}/lib \
            --sysconfdir=${PGHOME}/etc \
            --includedir=${PGHOME}/include \
            --datarootdir=${PGHOME}/share \
            --datadir=${PGHOME}/share \
            --with-pgport=5432 \
            --with-perl \
            --with-python \
            --with-openssl \
            --with-pam \
            --with-ldap \
            --with-libxml \
            --with-libxslt \
            --with-segsize=${SEGSIZE} \
            --with-blocksize=${BLOCKSIZE} \
            --with-systemd
make all
make install
cd contrib
make install
rm -rf /u02/pgdata/zheap
/u01/app/postgres/product/zheap/db_1/bin/initdb -D /u02/pgdata/zheap
pg_ctl -D /u02/pgdata/zheap start
psql -c "alter system set logging_collector='on'" postgres
psql -c "alter system set log_truncate_on_rotation='on'" postgres
psql -c "alter system set log_filename='postgresql-%a.log'" postgres
psql -c "alter system set log_line_prefix='%m - %l - %p - %h - %u@%d '" postgres
psql -c "alter system set log_directory='pg_log'" postgres
pg_ctl -D /u02/pgdata/zheap restart -m fast

First of all, when you start up PostgreSQL you’ll get two new background worker processes:

postgres@pgbox:/home/postgres/ [ZHEAP] ps -ef | egrep "discard|undo"
postgres  1483  1475  0 14:40 ?        00:00:00 postgres: discard worker   
postgres  1484  1475  0 14:40 ?        00:00:01 postgres: undo worker launcher   
postgres  1566  1070  0 14:51 pts/0    00:00:00 grep -E --color=auto discard|undo

The “discard worker” is responsible for getting rid of all the undo segments that are not required anymore and the “undo worker launcher” obviously is responsible for launching undo worker processes for doing the rollbacks.

There is a new parameter which controls the default storage engine (at least the parameter is there as of now, maybe that will change in the future), so let's change that to zheap before we populate a sample database (“heap” is the default value):

postgres@pgbox:/home/postgres/ [ZHEAP] psql -c "alter system set storage_engine='zheap'" postgres
ALTER SYSTEM
Time: 12.722 ms
postgres@pgbox:/home/postgres/ [ZHEAP] pg_ctl -D $PGDATA restart -m fast
postgres@pgbox:/home/postgres/ [ZHEAP] psql -c "show storage_engine" postgres
 storage_engine 
----------------
 zheap
(1 row)

Let's use pgbench to create the sample data:

postgres@pgbox:/home/postgres/ [ZHEAP] psql -c "create database zheap" postgres
CREATE DATABASE
Time: 763.284 ms
postgres@pgbox:/home/postgres/ [ZHEAP] time pgbench -i -s 100 zheap
...
done.

real	0m23.375s
user	0m2.293s
sys	0m0.772s

That should have created the tables using the zheap storage engine:

postgres@pgbox:/home/postgres/ [ZHEAP] psql -c "\d+ pgbench_accounts" zheap
                                  Table "public.pgbench_accounts"
  Column  |     Type      | Collation | Nullable | Default | Storage  | Stats target | Description 
----------+---------------+-----------+----------+---------+----------+--------------+-------------
 aid      | integer       |           | not null |         | plain    |              | 
 bid      | integer       |           |          |         | plain    |              | 
 abalance | integer       |           |          |         | plain    |              | 
 filler   | character(84) |           |          |         | extended |              | 
Indexes:
    "pgbench_accounts_pkey" PRIMARY KEY, btree (aid)
Options: storage_engine=zheap, fillfactor=100

How long does the same take when we use the “heap” storage format?

postgres@pgbox:/home/postgres/ [ZHEAP] psql -c "alter system set storage_engine='heap'" postgres
ALTER SYSTEM
Time: 8.790 ms
postgres@pgbox:/home/postgres/ [ZHEAP] pg_ctl -D $PGDATA restart -m fast
postgres@pgbox:/home/postgres/ [ZHEAP] psql -c "create database heap" postgres
CREATE DATABASE
Time: 889.847 ms
postgres@pgbox:/home/postgres/ [ZHEAP] time pgbench -i -s 100 heap
...

real	0m30.471s
user	0m2.355s
sys	0m0.419s
postgres@pgbox:/home/postgres/ [ZHEAP] psql -c "\d+ pgbench_accounts" heap
                                  Table "public.pgbench_accounts"
  Column  |     Type      | Collation | Nullable | Default | Storage  | Stats target | Description 
----------+---------------+-----------+----------+---------+----------+--------------+-------------
 aid      | integer       |           | not null |         | plain    |              | 
 bid      | integer       |           |          |         | plain    |              | 
 abalance | integer       |           |          |         | plain    |              | 
 filler   | character(84) |           |          |         | extended |              | 
Indexes:
    "pgbench_accounts_pkey" PRIMARY KEY, btree (aid)
Options: fillfactor=100

postgres@pgbox:/home/postgres/ [ZHEAP]

I ran that test several times and the difference of about 5 to 6 seconds is consistent. zheap is faster here, but that comes from vacuum. When you run the same test again but skip the vacuum at the end (the “-n” option of pgbench), heap is faster:

postgres@pgbox:/home/postgres/ [ZHEAP] psql -c "create database heap" postgres
CREATE DATABASE
Time: 562.155 ms
postgres@pgbox:/home/postgres/ [ZHEAP] time pgbench -i -n -s 100 heap
done.

real	0m21.650s
user	0m2.316s
sys	0m0.225s

But anyway: as zheap has to create undo segments, more needs to go to disk initially. heap needs to run vacuum, not immediately, but for sure some time later. When you compare a pure insert-only workload without vacuum, heap is faster. The great thing is that you can decide what you want to use at the table level. Some tables might be better created with the zheap storage engine, others may be better created with heap. The important bit is that you have full control.
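
The three load timings above can be compared with a bit of arithmetic. The Python sketch below only uses the "real" values printed by time. Be aware of one assumption: the zheap load was run without “-n”, so its figure still includes whatever the vacuum step costs on zheap, and each value comes from a single run.

```python
# Wall-clock ("real") seconds copied from the three pgbench -i runs above.
zheap_init_s = 23.375
heap_init_s = 30.471
heap_init_no_vacuum_s = 21.650

# Time the heap load spends on the final vacuum (difference of the two heap runs).
vacuum_cost_s = heap_init_s - heap_init_no_vacuum_s
# Extra time of the zheap load compared to a heap load without vacuum.
zheap_overhead_s = zheap_init_s - heap_init_no_vacuum_s

print(f"heap vacuum step: about {vacuum_cost_s:.3f} s")
print(f"zheap vs. heap without vacuum: about {zheap_overhead_s:.3f} s slower")
```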

Hervé already compared the size of his tables in the last post. Do we see the same here when we compare the size of the entire databases?

postgres@pgbox:/home/postgres/ [ZHEAP] vacuumdb heap
postgres@pgbox:/home/postgres/ [ZHEAP] psql -c "\l+" postgres
                                                                   List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges   |  Size   | Tablespace |                Description         
-----------+----------+----------+------------+------------+-----------------------+---------+------------+------------------------------------
 heap      | postgres | UTF8     | en_US.utf8 | en_US.utf8 |                       | 1503 MB | pg_default | 
 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 |                       | 7867 kB | pg_default | default administrative connection d
 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +| 7721 kB | pg_default | unmodifiable empty database
           |          |          |            |            | postgres=CTc/postgres |         |            | 
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +| 7721 kB | pg_default | default template for new databases
           |          |          |            |            | postgres=CTc/postgres |         |            | 
 zheap     | postgres | UTF8     | en_US.utf8 | en_US.utf8 |                       | 1250 MB | pg_default | 
(5 rows)

Yes, zheap is 253MB smaller. That difference should even get bigger once we populate the “filler” column of the pgbench_accounts table, which is currently NULL:

postgres@pgbox:/home/postgres/ [ZHEAP] psql -c "update pgbench_accounts set filler = 'aaaaaa'" zheap
UPDATE 10000000
Time: 55768.488 ms (00:55.768)
postgres@pgbox:/home/postgres/ [ZHEAP] psql -c "update pgbench_accounts set filler = 'aaaaaa'" heap
UPDATE 10000000
Time: 52598.782 ms (00:52.599)
postgres@pgbox:/home/postgres/ [ZHEAP] vacuumdb heap
vacuumdb: vacuuming database "heap"
postgres@pgbox:/home/postgres/ [ZHEAP] psql -c "\l+" postgres
                                                                   List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges   |  Size   | Tablespace |                Description              
-----------+----------+----------+------------+------------+-----------------------+---------+------------+-----------------------------------------
 heap      | postgres | UTF8     | en_US.utf8 | en_US.utf8 |                       | 3213 MB | pg_default | 
 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 |                       | 7867 kB | pg_default | default administrative connection databa
 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +| 7721 kB | pg_default | unmodifiable empty database
           |          |          |            |            | postgres=CTc/postgres |         |            | 
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +| 7721 kB | pg_default | default template for new databases
           |          |          |            |            | postgres=CTc/postgres |         |            | 
 zheap     | postgres | UTF8     | en_US.utf8 | en_US.utf8 |                       | 1250 MB | pg_default |

As expected and consistent with what Hervé has seen in his tests. The update against the heap table was a bit faster (around 3 seconds) but again: zheap has to create undo segments and that causes additional writes on disk. Three seconds against a 10 million row table is not that huge, by the way, and how often do you update the complete table?

Now let's run a standard pgbench workload against these databases and check what we can see there. For the zheap database with 1 connection for 60 seconds this is the best result I got after ten runs:
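
The sizes and timings around this update can be summarized with a short Python sketch. It only does arithmetic on the values from the two database listings and the two UPDATE timings above.

```python
# Database sizes in MB from the two database listings above.
sizes_before = {"heap": 1503, "zheap": 1250}
sizes_after = {"heap": 3213, "zheap": 1250}
# Elapsed time of the full-table UPDATE in milliseconds.
update_ms = {"heap": 52598.782, "zheap": 55768.488}

# Growth of each database caused by the update, in MB.
growth_mb = {name: sizes_after[name] - sizes_before[name] for name in sizes_before}
# How much longer the update took on zheap, absolute and relative.
update_gap_s = (update_ms["zheap"] - update_ms["heap"]) / 1000
update_gap_pct = (update_ms["zheap"] / update_ms["heap"] - 1) * 100

print(growth_mb)
print(f"zheap update took {update_gap_s:.2f} s ({update_gap_pct:.1f}%) longer")
```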

postgres@pgbox:/home/postgres/ [ZHEAP] pgbench -c 1 -T 60 zheap
starting vacuum...end.
transaction type: 
scaling factor: 100
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 29265
latency average = 2.050 ms
tps = 487.726916 (including connections establishing)
tps = 487.786025 (excluding connections establishing)

The same against the heap:

postgres@pgbox:/home/postgres/ [ZHEAP] pgbench -c 1 -T 60 heap
starting vacuum...end.
transaction type: 
scaling factor: 100
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 24992
latency average = 2.401 ms
tps = 416.485499 (including connections establishing)
tps = 416.516805 (excluding connections establishing)

The numbers changed a bit for every execution, but zheap was always better than heap (be aware that I am on a little VM here), so at least there is no regression in performance but rather an improvement for this workload.

For the select only workload (the “-S” option) this is the best result for heap:

postgres@pgbox:/home/postgres/ [ZHEAP] for i in {1..10}; do pgbench -c 1 -S -T 60 heap; done
...
starting vacuum...end.
transaction type: 
scaling factor: 100
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 64954
latency average = 0.924 ms
tps = 1082.514439 (including connections establishing)
tps = 1082.578288 (excluding connections establishing)
...

And this is the best result for zheap:

postgres@pgbox:/home/postgres/ [ZHEAP] for i in {1..10}; do pgbench -c 1 -S -T 60 zheap; done
...
starting vacuum...end.
transaction type: 
scaling factor: 100
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 109023
latency average = 0.550 ms
tps = 1816.787280 (including connections establishing)
tps = 1817.485717 (excluding connections establishing)
...

With this workload the difference is even more clear: zheap clearly wins.
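
For a quick comparison of the four runs, tps and average latency can be approximated from the number of processed transactions. The Python sketch below assumes exactly 60 seconds per run, so the results differ slightly from what pgbench prints, because pgbench uses the measured elapsed time. The function name summarize and the run labels are made up for illustration.

```python
def summarize(transactions, duration_s, clients=1):
    """Approximate tps and average latency (ms) from a pgbench summary."""
    tps = transactions / duration_s
    latency_ms = clients * 1000.0 / tps
    return tps, latency_ms


# "number of transactions actually processed" from the four runs above.
runs = {
    "default heap": 24992,
    "default zheap": 29265,
    "select-only heap": 64954,
    "select-only zheap": 109023,
}
for name, transactions in runs.items():
    tps, latency_ms = summarize(transactions, 60)
    print(f"{name}: {tps:.1f} tps, {latency_ms:.3f} ms")

# Throughput of zheap relative to heap for each workload.
default_gain = runs["default zheap"] / runs["default heap"]
select_gain = runs["select-only zheap"] / runs["select-only heap"]
print(f"default workload: {default_gain:.2f}x, select-only: {select_gain:.2f}x")
```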

As noted before: all these tests have been done locally on a little VM, so be careful with these numbers. We should have access to a great storage system with some good servers soon and once we have that I’ll do some more tests and publish the results.

For now it is reasonably clear that zheap is an improvement for several types of workloads while heap still is better for others. In the next post I’ll try to do some tests to help the developers, meaning: Can we break it?

The post Some more zheap testing first appeared on Blog dbi services.

↧