
Displaying the contents of a PostgreSQL data file with pg_filedump


Did you ever wonder what exactly is in a PostgreSQL data file? Usually you don’t care, I agree. But there might be situations where knowing how to look inside one can be a great help. Maybe your file is corrupted and you want to recover as much data as possible? Maybe you just want to do some research. There is a utility called pg_filedump which makes this pretty easy. Let’s go …

Before you try to install pg_filedump you’ll need to make sure that all the header files are there in your PostgreSQL installation. Once you have that the installation is as simple as:

postgres@pgbox:/home/postgres/ [PG10] tar -axf pg_filedump-REL_10_0-c0e4028.tar.gz 
postgres@pgbox:/home/postgres/ [PG10] cd pg_filedump-REL_10_0-c0e4028
postgres@pgbox:/home/postgres/pg_filedump-REL_10_0-c0e4028/ [PG10] make
postgres@pgbox:/home/postgres/pg_filedump-REL_10_0-c0e4028/ [PG10] make install
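If make complains about missing headers, a quick sanity check is to ask pg_config where the server include files live (just a hint; the reported paths depend on your installation):

postgres@pgbox:/home/postgres/ [PG10] which pg_config
postgres@pgbox:/home/postgres/ [PG10] pg_config --includedir-server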

If everything went fine the utility should be there:

postgres@pgbox:/u02/pgdata/PG10/ [PG10] pg_filedump -h

Version 10.0 (for PostgreSQL 10.x)
Copyright (c) 2002-2010 Red Hat, Inc.
Copyright (c) 2011-2017, PostgreSQL Global Development Group

Usage: pg_filedump [-abcdfhikxy] [-R startblock [endblock]] [-D attrlist] [-S blocksize] [-s segsize] [-n segnumber] file

Display formatted contents of a PostgreSQL heap/index/control file
Defaults are: relative addressing, range of the entire file, block
               size as listed on block 0 in the file

The following options are valid for heap and index files:
  -a  Display absolute addresses when formatting (Block header
      information is always block relative)
  -b  Display binary block images within a range (Option will turn
      off all formatting options)
  -d  Display formatted block content dump (Option will turn off
      all other formatting options)
  -D  Decode tuples using given comma separated list of types
      Supported types:
        bigint bigserial bool char charN date float float4 float8 int
        json macaddr name oid real serial smallint smallserial text
        time timestamp timetz uuid varchar varcharN xid xml
      ~ ignores all attributes left in a tuple
  -f  Display formatted block content dump along with interpretation
  -h  Display this information
  -i  Display interpreted item details
  -k  Verify block checksums
  -R  Display specific block ranges within the file (Blocks are
      indexed from 0)
        [startblock]: block to start at
        [endblock]: block to end at
      A startblock without an endblock will format the single block
  -s  Force segment size to [segsize]
  -n  Force segment number to [segnumber]
  -S  Force block size to [blocksize]
  -x  Force interpreted formatting of block items as index items
  -y  Force interpreted formatting of block items as heap items

The following options are valid for control files:
  -c  Interpret the file listed as a control file
  -f  Display formatted content dump along with interpretation
  -S  Force block size to [blocksize]

Report bugs to 

As we want to dump a file we obviously need a table with some data, so:

postgres=# create table t1 ( a int, b varchar(50));
CREATE TABLE
postgres=# insert into t1 (a,b) select a, md5(a::varchar) from generate_series(1,10) a;
INSERT 0 10

Get the name of the file:

postgres=# select * from pg_relation_filenode('t1');
 pg_relation_filenode 
----------------------
                24702
(1 row)
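As a side note, pg_relation_filepath() returns the path relative to PGDATA in one step, so you could skip the find shown below (the output matches the file we locate next):

postgres=# select pg_relation_filepath('t1');
 pg_relation_filepath 
----------------------
 base/13212/24702
(1 row)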

Look it up in PGDATA:

postgres@pgbox:/home/postgres/ [PG10] cd $PGDATA
postgres@pgbox:/u02/pgdata/PG10/ [PG10] find . -name 24702
./base/13212/24702

… and dump it:

postgres@pgbox:/u02/pgdata/PG10/ [PG10] pg_filedump ./base/13212/24702

*******************************************************************
* PostgreSQL File/Block Formatted Dump Utility - Version 10.0
*
* File: ./base/13212/24702
* Options used: None
*
* Dump created on: Wed Nov  8 10:39:33 2017
*******************************************************************
Error: Unable to read full page header from block 0.
  ===> Read 0 bytes

Hm, nothing in there. Why? The reason is simple: The data is there in PostgreSQL, but it is currently only WAL-logged and not yet written to the data file because no checkpoint has happened yet (in this case):

postgres=#  checkpoint;
CHECKPOINT
Time: 100.567 ms

Do it again:

postgres@pgbox:/u02/pgdata/PG10/ [PG10] pg_filedump ./base/13212/24702

*******************************************************************
* PostgreSQL File/Block Formatted Dump Utility - Version 10.0
*
* File: ./base/13212/24702
* Options used: None
*
* Dump created on: Wed Nov  8 10:40:45 2017
*******************************************************************

Block    0 ********************************************************
----- 
 Block Offset: 0x00000000         Offsets: Lower      64 (0x0040)
 Block: Size 8192  Version    4            Upper    7552 (0x1d80)
 LSN:  logid      0 recoff 0x478b2c48      Special  8192 (0x2000)
 Items:   10                      Free Space: 7488
 Checksum: 0x0000  Prune XID: 0x00000000  Flags: 0x0000 ()
 Length (including item array): 64

 ------ 
 Item   1 -- Length:   61  Offset: 8128 (0x1fc0)  Flags: NORMAL
 Item   2 -- Length:   61  Offset: 8064 (0x1f80)  Flags: NORMAL
 Item   3 -- Length:   61  Offset: 8000 (0x1f40)  Flags: NORMAL
 Item   4 -- Length:   61  Offset: 7936 (0x1f00)  Flags: NORMAL
 Item   5 -- Length:   61  Offset: 7872 (0x1ec0)  Flags: NORMAL
 Item   6 -- Length:   61  Offset: 7808 (0x1e80)  Flags: NORMAL
 Item   7 -- Length:   61  Offset: 7744 (0x1e40)  Flags: NORMAL
 Item   8 -- Length:   61  Offset: 7680 (0x1e00)  Flags: NORMAL
 Item   9 -- Length:   61  Offset: 7616 (0x1dc0)  Flags: NORMAL
 Item  10 -- Length:   61  Offset: 7552 (0x1d80)  Flags: NORMAL

 *** End of File Encountered. Last Block Read: 0 ***

Here we go. What can we learn from that output? It is not really human readable, but at least we can see that there are ten rows. We can also list the actual contents of the rows:

postgres@pgbox:/u02/pgdata/PG10/ [PG10] pg_filedump -f ./base/13212/24702

*******************************************************************
* PostgreSQL File/Block Formatted Dump Utility - Version 10.0
*
* File: ./base/13212/24702
* Options used: -f 
*
* Dump created on: Wed Nov  8 10:41:21 2017
*******************************************************************

Block    0 ********************************************************
----- 
 Block Offset: 0x00000000         Offsets: Lower      64 (0x0040)
 Block: Size 8192  Version    4            Upper    7552 (0x1d80)
 LSN:  logid      0 recoff 0x478b2c48      Special  8192 (0x2000)
 Items:   10                      Free Space: 7488
 Checksum: 0x0000  Prune XID: 0x00000000  Flags: 0x0000 ()
 Length (including item array): 64

  0000: 00000000 482c8b47 00000000 4000801d  ....H,.G....@...
  0010: 00200420 00000000 c09f7a00 809f7a00  . . ......z...z.
  0020: 409f7a00 009f7a00 c09e7a00 809e7a00  @.z...z...z...z.
  0030: 409e7a00 009e7a00 c09d7a00 809d7a00  @.z...z...z...z.

 ------ 
 Item   1 -- Length:   61  Offset: 8128 (0x1fc0)  Flags: NORMAL
  1fc0: 96020000 00000000 00000000 00000000  ................
  1fd0: 01000200 02081800 01000000 43633463  ............Cc4c
  1fe0: 61343233 38613062 39323338 32306463  a4238a0b923820dc
  1ff0: 63353039 61366637 35383439 62        c509a6f75849b

 Item   2 -- Length:   61  Offset: 8064 (0x1f80)  Flags: NORMAL
  1f80: 96020000 00000000 00000000 00000000  ................
  1f90: 02000200 02081800 02000000 43633831  ............Cc81
  1fa0: 65373238 64396434 63326636 33366630  e728d9d4c2f636f0
  1fb0: 36376638 39636331 34383632 63        67f89cc14862c

 Item   3 -- Length:   61  Offset: 8000 (0x1f40)  Flags: NORMAL
  1f40: 96020000 00000000 00000000 00000000  ................
  1f50: 03000200 02081800 03000000 43656363  ............Cecc
  1f60: 62633837 65346235 63653266 65323833  bc87e4b5ce2fe283
  1f70: 30386664 39663261 37626166 33        08fd9f2a7baf3

 Item   4 -- Length:   61  Offset: 7936 (0x1f00)  Flags: NORMAL
  1f00: 96020000 00000000 00000000 00000000  ................
  1f10: 04000200 02081800 04000000 43613837  ............Ca87
  1f20: 66663637 39613266 33653731 64393138  ff679a2f3e71d918
  1f30: 31613637 62373534 32313232 63        1a67b7542122c

 Item   5 -- Length:   61  Offset: 7872 (0x1ec0)  Flags: NORMAL
  1ec0: 96020000 00000000 00000000 00000000  ................
  1ed0: 05000200 02081800 05000000 43653464  ............Ce4d
  1ee0: 61336237 66626263 65323334 35643737  a3b7fbbce2345d77
  1ef0: 37326230 36373461 33313864 35        72b0674a318d5

 Item   6 -- Length:   61  Offset: 7808 (0x1e80)  Flags: NORMAL
  1e80: 96020000 00000000 00000000 00000000  ................
  1e90: 06000200 02081800 06000000 43313637  ............C167
  1ea0: 39303931 63356138 38306661 66366662  9091c5a880faf6fb
  1eb0: 35653630 38376562 31623264 63        5e6087eb1b2dc

 Item   7 -- Length:   61  Offset: 7744 (0x1e40)  Flags: NORMAL
  1e40: 96020000 00000000 00000000 00000000  ................
  1e50: 07000200 02081800 07000000 43386631  ............C8f1
  1e60: 34653435 66636565 61313637 61356133  4e45fceea167a5a3
  1e70: 36646564 64346265 61323534 33        6dedd4bea2543

 Item   8 -- Length:   61  Offset: 7680 (0x1e00)  Flags: NORMAL
  1e00: 96020000 00000000 00000000 00000000  ................
  1e10: 08000200 02081800 08000000 43633966  ............Cc9f
  1e20: 30663839 35666239 38616239 31353966  0f895fb98ab9159f
  1e30: 35316664 30323937 65323336 64        51fd0297e236d

 Item   9 -- Length:   61  Offset: 7616 (0x1dc0)  Flags: NORMAL
  1dc0: 96020000 00000000 00000000 00000000  ................
  1dd0: 09000200 02081800 09000000 43343563  ............C45c
  1de0: 34386363 65326532 64376662 64656131  48cce2e2d7fbdea1
  1df0: 61666335 31633763 36616432 36        afc51c7c6ad26

 Item  10 -- Length:   61  Offset: 7552 (0x1d80)  Flags: NORMAL
  1d80: 96020000 00000000 00000000 00000000  ................
  1d90: 0a000200 02081800 0a000000 43643364  ............Cd3d
  1da0: 39343436 38303261 34343235 39373535  9446802a44259755
  1db0: 64333865 36643136 33653832 30        d38e6d163e820

 *** End of File Encountered. Last Block Read: 0 ***

But this does not help much either. When you want to see the contents in human-readable format, use the “-D” switch and provide the list of data types you want to decode:

postgres@pgbox:/u02/pgdata/PG10/ [PG10] pg_filedump -D int,varchar ./base/13212/24702

*******************************************************************
* PostgreSQL File/Block Formatted Dump Utility - Version 10.0
*
* File: ./base/13212/24702
* Options used: -D int,varchar 
*
* Dump created on: Wed Nov  8 10:42:58 2017
*******************************************************************

Block    0 ********************************************************
----- 
 Block Offset: 0x00000000         Offsets: Lower      64 (0x0040)
 Block: Size 8192  Version    4            Upper    7552 (0x1d80)
 LSN:  logid      0 recoff 0x478b2c48      Special  8192 (0x2000)
 Items:   10                      Free Space: 7488
 Checksum: 0x0000  Prune XID: 0x00000000  Flags: 0x0000 ()
 Length (including item array): 64

 ------ 
 Item   1 -- Length:   61  Offset: 8128 (0x1fc0)  Flags: NORMAL
COPY: 1	c4ca4238a0b923820dcc509a6f75849b
 Item   2 -- Length:   61  Offset: 8064 (0x1f80)  Flags: NORMAL
COPY: 2	c81e728d9d4c2f636f067f89cc14862c
 Item   3 -- Length:   61  Offset: 8000 (0x1f40)  Flags: NORMAL
COPY: 3	eccbc87e4b5ce2fe28308fd9f2a7baf3
 Item   4 -- Length:   61  Offset: 7936 (0x1f00)  Flags: NORMAL
COPY: 4	a87ff679a2f3e71d9181a67b7542122c
 Item   5 -- Length:   61  Offset: 7872 (0x1ec0)  Flags: NORMAL
COPY: 5	e4da3b7fbbce2345d7772b0674a318d5
 Item   6 -- Length:   61  Offset: 7808 (0x1e80)  Flags: NORMAL
COPY: 6	1679091c5a880faf6fb5e6087eb1b2dc
 Item   7 -- Length:   61  Offset: 7744 (0x1e40)  Flags: NORMAL
COPY: 7	8f14e45fceea167a5a36dedd4bea2543
 Item   8 -- Length:   61  Offset: 7680 (0x1e00)  Flags: NORMAL
COPY: 8	c9f0f895fb98ab9159f51fd0297e236d
 Item   9 -- Length:   61  Offset: 7616 (0x1dc0)  Flags: NORMAL
COPY: 9	45c48cce2e2d7fbdea1afc51c7c6ad26
 Item  10 -- Length:   61  Offset: 7552 (0x1d80)  Flags: NORMAL
COPY: 10	d3d9446802a44259755d38e6d163e820

And now we can see it. This is the same data you would get with a select on the table:

postgres=# select * from  t1;
 a  |                b                 
----+----------------------------------
  1 | c4ca4238a0b923820dcc509a6f75849b
  2 | c81e728d9d4c2f636f067f89cc14862c
  3 | eccbc87e4b5ce2fe28308fd9f2a7baf3
  4 | a87ff679a2f3e71d9181a67b7542122c
  5 | e4da3b7fbbce2345d7772b0674a318d5
  6 | 1679091c5a880faf6fb5e6087eb1b2dc
  7 | 8f14e45fceea167a5a36dedd4bea2543
  8 | c9f0f895fb98ab9159f51fd0297e236d
  9 | 45c48cce2e2d7fbdea1afc51c7c6ad26
 10 | d3d9446802a44259755d38e6d163e820
(10 rows)

What happens when we do an update?

postgres=# update t1 set b = 'a' where a = 4;
UPDATE 1
postgres=# checkpoint ;
CHECKPOINT

How does it look in the file?

postgres@pgbox:/u02/pgdata/PG10/ [PG10] pg_filedump -D int,varchar ./base/13212/24702

*******************************************************************
* PostgreSQL File/Block Formatted Dump Utility - Version 10.0
*
* File: ./base/13212/24702
* Options used: -D int,varchar 
*
* Dump created on: Wed Nov  8 11:12:35 2017
*******************************************************************

Block    0 ********************************************************
----- 
 Block Offset: 0x00000000         Offsets: Lower      68 (0x0044)
 Block: Size 8192  Version    4            Upper    7520 (0x1d60)
 LSN:  logid      0 recoff 0x478c2998      Special  8192 (0x2000)
 Items:   11                      Free Space: 7452
 Checksum: 0x0000  Prune XID: 0x00000298  Flags: 0x0000 ()
 Length (including item array): 68

 ------ 
 Item   1 -- Length:   61  Offset: 8128 (0x1fc0)  Flags: NORMAL
COPY: 1	c4ca4238a0b923820dcc509a6f75849b
 Item   2 -- Length:   61  Offset: 8064 (0x1f80)  Flags: NORMAL
COPY: 2	c81e728d9d4c2f636f067f89cc14862c
 Item   3 -- Length:   61  Offset: 8000 (0x1f40)  Flags: NORMAL
COPY: 3	eccbc87e4b5ce2fe28308fd9f2a7baf3
 Item   4 -- Length:   61  Offset: 7936 (0x1f00)  Flags: NORMAL
COPY: 4	a87ff679a2f3e71d9181a67b7542122c
 Item   5 -- Length:   61  Offset: 7872 (0x1ec0)  Flags: NORMAL
COPY: 5	e4da3b7fbbce2345d7772b0674a318d5
 Item   6 -- Length:   61  Offset: 7808 (0x1e80)  Flags: NORMAL
COPY: 6	1679091c5a880faf6fb5e6087eb1b2dc
 Item   7 -- Length:   61  Offset: 7744 (0x1e40)  Flags: NORMAL
COPY: 7	8f14e45fceea167a5a36dedd4bea2543
 Item   8 -- Length:   61  Offset: 7680 (0x1e00)  Flags: NORMAL
COPY: 8	c9f0f895fb98ab9159f51fd0297e236d
 Item   9 -- Length:   61  Offset: 7616 (0x1dc0)  Flags: NORMAL
COPY: 9	45c48cce2e2d7fbdea1afc51c7c6ad26
 Item  10 -- Length:   61  Offset: 7552 (0x1d80)  Flags: NORMAL
COPY: 10	d3d9446802a44259755d38e6d163e820
 Item  11 -- Length:   30  Offset: 7520 (0x1d60)  Flags: NORMAL
COPY: 4	a

 *** End of File Encountered. Last Block Read: 0 ***

The a=4 row is still there, but we also got a new one (Item 11), which is our update. Remember that it is the job of vacuum to recycle the dead/old rows:

postgres=# vacuum t1;
VACUUM
postgres=# checkpoint ;
CHECKPOINT

Again (just displaying the data here):

 ------ 
 Item   1 -- Length:   61  Offset: 8128 (0x1fc0)  Flags: NORMAL
COPY: 1	c4ca4238a0b923820dcc509a6f75849b
 Item   2 -- Length:   61  Offset: 8064 (0x1f80)  Flags: NORMAL
COPY: 2	c81e728d9d4c2f636f067f89cc14862c
 Item   3 -- Length:   61  Offset: 8000 (0x1f40)  Flags: NORMAL
COPY: 3	eccbc87e4b5ce2fe28308fd9f2a7baf3
 Item   4 -- Length:    0  Offset:   11 (0x000b)  Flags: REDIRECT
 Item   5 -- Length:   61  Offset: 7936 (0x1f00)  Flags: NORMAL
COPY: 5	e4da3b7fbbce2345d7772b0674a318d5
 Item   6 -- Length:   61  Offset: 7872 (0x1ec0)  Flags: NORMAL
COPY: 6	1679091c5a880faf6fb5e6087eb1b2dc
 Item   7 -- Length:   61  Offset: 7808 (0x1e80)  Flags: NORMAL
COPY: 7	8f14e45fceea167a5a36dedd4bea2543
 Item   8 -- Length:   61  Offset: 7744 (0x1e40)  Flags: NORMAL
COPY: 8	c9f0f895fb98ab9159f51fd0297e236d
 Item   9 -- Length:   61  Offset: 7680 (0x1e00)  Flags: NORMAL
COPY: 9	45c48cce2e2d7fbdea1afc51c7c6ad26
 Item  10 -- Length:   61  Offset: 7616 (0x1dc0)  Flags: NORMAL
COPY: 10	d3d9446802a44259755d38e6d163e820
 Item  11 -- Length:   30  Offset: 7584 (0x1da0)  Flags: NORMAL
COPY: 4	a

… and “Item 4” no longer holds the row data: vacuum turned it into a redirect pointer to the new row version (Item 11). The same happens when you delete data:

postgres=# delete from t1 where a = 4;
DELETE 1
postgres=# vacuum t1;
VACUUM
postgres=# checkpoint;
CHECKPOINT

You’ll notice that both Items 4 and 11 are now gone (UNUSED):

 ------ 
 Item   1 -- Length:   61  Offset: 8128 (0x1fc0)  Flags: NORMAL
COPY: 1	c4ca4238a0b923820dcc509a6f75849b
 Item   2 -- Length:   61  Offset: 8064 (0x1f80)  Flags: NORMAL
COPY: 2	c81e728d9d4c2f636f067f89cc14862c
 Item   3 -- Length:   61  Offset: 8000 (0x1f40)  Flags: NORMAL
COPY: 3	eccbc87e4b5ce2fe28308fd9f2a7baf3
 Item   4 -- Length:    0  Offset:    0 (0x0000)  Flags: UNUSED
 Item   5 -- Length:   61  Offset: 7936 (0x1f00)  Flags: NORMAL
COPY: 5	e4da3b7fbbce2345d7772b0674a318d5
 Item   6 -- Length:   61  Offset: 7872 (0x1ec0)  Flags: NORMAL
COPY: 6	1679091c5a880faf6fb5e6087eb1b2dc
 Item   7 -- Length:   61  Offset: 7808 (0x1e80)  Flags: NORMAL
COPY: 7	8f14e45fceea167a5a36dedd4bea2543
 Item   8 -- Length:   61  Offset: 7744 (0x1e40)  Flags: NORMAL
COPY: 8	c9f0f895fb98ab9159f51fd0297e236d
 Item   9 -- Length:   61  Offset: 7680 (0x1e00)  Flags: NORMAL
COPY: 9	45c48cce2e2d7fbdea1afc51c7c6ad26
 Item  10 -- Length:   61  Offset: 7616 (0x1dc0)  Flags: NORMAL
COPY: 10	d3d9446802a44259755d38e6d163e820
 Item  11 -- Length:    0  Offset:    0 (0x0000)  Flags: UNUSED
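If you prefer to look at the same line pointers from within the database, the pageinspect extension offers heap_page_items() (just a cross-check, assuming the extension is available; lp_flags 0 = unused, 1 = normal, 2 = redirect, 3 = dead):

postgres=# create extension pageinspect;
CREATE EXTENSION
postgres=# select lp, lp_off, lp_flags, lp_len from heap_page_items(get_raw_page('t1', 0));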

So much for the introduction to pg_filedump; more to come in more detail.

 



Auto pre-warming in EDB Postgres Advanced Server 10


Some days ago EDB Postgres Advanced Server 10 was released, and one feature which might be handy is auto pre-warming. What this does is save all the buffers (or rather, a description of the buffers) which are currently loaded into shared_buffers to disk, and then re-read those buffers automatically when the instance is restarted. Let’s see how it works.

Before getting the feature to work we need to look at two parameters which control the behavior:

  • pg_prewarm.autoprewarm: Enables or disables the feature
  • pg_prewarm.autoprewarm_interval: The interval at which the current state is written to disk, or 0 to only write once when the instance shuts down

Another requirement is to load the library when the instance starts:

postgres=# alter system set shared_preload_libraries ='$libdir/dbms_pipe,$libdir/edb_gen,$libdir/dbms_aq,$libdir/pg_prewarm';
ALTER SYSTEM

Once the instance is restarted we can proceed with the configuration:

postgres=# alter system set pg_prewarm.autoprewarm=true;
ALTER SYSTEM
postgres=# alter system set pg_prewarm.autoprewarm_interval='10s';
ALTER SYSTEM

By doing this we told the server to write the current state of the buffers to disk every 10 seconds. You’ll also notice a new background worker process which is responsible for doing the work:

postgres=# \! ps -ef | grep prewarm | egrep -v "ps|grep"
postgres  3682  3675  0 12:05 ?        00:00:00 postgres: bgworker: autoprewarm   

Let’s load something into shared_buffers:

postgres=# insert into t1 select a, md5(a::varchar) from generate_series(1,1000) a;
INSERT 0 1000
postgres=# select count(*) from t1;
 count 
-------
  1000
(1 row)
postgres=# explain (analyze,buffers) select count(*) from t1;
                                               QUERY PLAN                                                
---------------------------------------------------------------------------------------------------------
 Aggregate  (cost=21.50..21.51 rows=1 width=8) (actual time=0.492..0.492 rows=1 loops=1)
   Buffers: shared hit=9
   ->  Seq Scan on t1  (cost=0.00..19.00 rows=1000 width=0) (actual time=0.019..0.254 rows=1000 loops=1)
         Buffers: shared hit=9
 Planning time: 0.070 ms
 Execution time: 0.538 ms
(6 rows)

The “shared hit” confirms that we read the buffers from shared_buffers and not from the OS/file system cache. Let’s restart and do the same check again:

postgres@centos7:/u02/pgdata/PG4/ [EDB10] pg_ctl -D . restart -m fast
postgres@centos7:/u02/pgdata/PG4/ [EDB10] psql -X postgres
psql.bin (10.1.5)
Type "help" for help.

postgres=# explain (analyze,buffers) select count(*) from t1;
                                               QUERY PLAN                                                
---------------------------------------------------------------------------------------------------------
 Aggregate  (cost=21.50..21.51 rows=1 width=8) (actual time=0.586..0.586 rows=1 loops=1)
   Buffers: shared hit=9
   ->  Seq Scan on t1  (cost=0.00..19.00 rows=1000 width=0) (actual time=0.024..0.295 rows=1000 loops=1)
         Buffers: shared hit=9
 Planning time: 0.451 ms
 Execution time: 0.766 ms
(6 rows)

postgres=# 

… here we go. How is this information stored? When you take a look at $PGDATA you’ll notice a file with the following format:

postgres@centos7:/u02/pgdata/PG4/ [EDB10] cat $PGDATA/autoprewarm.blocks | tail
<>
0,1664,1262,0,0
15471,1663,1259,0,0
15471,1663,1259,0,1
15471,1663,1259,0,2
15471,1663,1249,0,0
15471,1663,1249,0,1
15471,1663,1249,0,2
15471,1663,1249,0,3
15471,1663,1249,0,4

The first field is the OID of the database:

postgres=# select oid,datname from pg_database where oid=15471;
  oid  | datname  
-------+----------
 15471 | postgres
(1 row)

The second one is the tablespace:

postgres=# select oid,spcname from pg_tablespace where oid=1663;
 oid  |  spcname   
------+------------
 1663 | pg_default
(1 row)

The third one is the relation’s filenode (for a freshly created table this matches its OID in pg_class, here our table t1):

postgres=# select oid,relname from pg_class where oid = 16402;
  oid  | relname 
-------+---------
 16402 | t1
(1 row)

postgres=# \! grep 16402 $PGDATA/autoprewarm.blocks
15471,1663,16402,0,0
15471,1663,16402,0,1
15471,1663,16402,0,2
15471,1663,16402,0,3
15471,1663,16402,0,4
15471,1663,16402,0,5
15471,1663,16402,0,6
15471,1663,16402,0,7
15471,1663,16402,0,8
15471,1663,16402,1,0
15471,1663,16402,1,2

The fourth one is the fork/file (0 is the datafile, 1 is the free space map) and the last one is the actual block to load. This is also described in “./contrib/pg_prewarm/autoprewarm.c” in the PostgreSQL source code:

/* Metadata for each block we dump. */
typedef struct BlockInfoRecord
{
        Oid                     database;
        Oid                     tablespace;
        Oid                     filenode;
        ForkNumber      forknum;
        BlockNumber blocknum;
} BlockInfoRecord;

For community PostgreSQL there is the contrib module pg_prewarm you can use for that, check here.
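A minimal example of warming a single relation manually with that module (the function returns the number of blocks read, so the value depends on the table size):

postgres=# create extension pg_prewarm;
CREATE EXTENSION
postgres=# select pg_prewarm('t1');
 pg_prewarm 
------------
          9
(1 row)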

 


Can I do it with PostgreSQL? – 18 – Instead of triggers on views


It has been quite a while since the last post in this series, but today comes the next one. While I was at a customer this morning, this question popped up: Can we have INSTEAD OF triggers on a view in PostgreSQL as well? I couldn’t immediately answer (although I was quite sure you can), so here is the test. I took an example for Oracle from here and re-wrote it in PostgreSQL syntax.

I took the same tables and adjusted the data types:

CREATE TABLE CUSTOMER_DETAILS ( CUSTOMER_ID INT PRIMARY KEY
                              , CUSTOMER_NAME VARCHAR(20)
                              , COUNTRY VARCHAR(20)
                              );
CREATE TABLE PROJECTS_DETAILS ( PROJECT_ID INT PRIMARY KEY
                              , PROJECT_NAME VARCHAR(30)
                              , PROJECT_START_DATE DATE
                              , CUSTOMER_ID INT REFERENCES CUSTOMER_DETAILS(CUSTOMER_ID)
                              );

The same view definition:

CREATE OR REPLACE VIEW customer_projects_view AS
   SELECT cust.customer_id, cust.customer_name, cust.country,
          projectdtls.project_id, projectdtls.project_name, 
          projectdtls.project_start_Date
   FROM customer_details cust, projects_details projectdtls
   WHERE cust.customer_id = projectdtls.customer_id;

Try to insert:

postgres=# INSERT INTO customer_projects_view VALUES (1,'XYZ Enterprise','Japan',101,'Library management',now());
ERROR:  cannot insert into view "customer_projects_view"
DETAIL:  Views that do not select from a single table or view are not automatically updatable.
HINT:  To enable inserting into the view, provide an INSTEAD OF INSERT trigger or an unconditional ON INSERT DO INSTEAD rule.
Time: 2.135 ms

… and the answer is already in the error message. So obviously we should be able to do that. In PostgreSQL you need a trigger function:

CREATE OR REPLACE FUNCTION cust_proj_view_insert_proc() RETURNS trigger AS $$
BEGIN
        
   INSERT INTO customer_details (customer_id,customer_name,country)
          VALUES (NEW.customer_id, NEW.customer_name, NEW.country);

   INSERT INTO projects_details (project_id, project_name, project_start_Date, customer_id)
   VALUES (
     NEW.project_id,
     NEW.project_name,
     NEW.project_start_Date,
     NEW.customer_id);

   RETURN NEW;
     EXCEPTION WHEN unique_violation THEN
       RAISE EXCEPTION 'Duplicate customer or project id';
END;
$$ LANGUAGE plpgsql;

Then we need a trigger calling this function:

create trigger cust_proj_view_insert_trg 
    instead of insert on customer_projects_view for each row EXECUTE procedure cust_proj_view_insert_proc();

Try the insert again:

INSERT INTO customer_projects_view VALUES (1,'XYZ Enterprise','Japan',101,'Library management',now());
INSERT INTO customer_projects_view VALUES (2,'ABC Infotech','India',202,'HR management',now());

… and here we are:

postgres=# select * FROM customer_details;
 customer_id | customer_name  | country 
-------------+----------------+---------
           1 | XYZ Enterprise | Japan
           2 | ABC Infotech   | India

Definitely, you can :)
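To also exercise the exception branch of the trigger function, re-inserting an existing customer_id should hit the unique_violation handler and raise the custom error (a quick check with made-up values; the exact output may include a CONTEXT line):

postgres=# INSERT INTO customer_projects_view VALUES (1,'XYZ Enterprise','Japan',303,'Billing',now());
ERROR:  Duplicate customer or project id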

 


A response to: What makes a community?


A recent tweet of mine resulted in Martin Widlake writing a really great blog post about what makes a community. Please read it before you continue reading this. There was another response from Stefan Koehler which is worth mentioning as well.

Both Martin and Stefan speak about Oracle communities because that is where they are involved. At the beginning of Martin’s post he writes: “Daniel was not specific about if this was a work/user group community or a wider consideration of society, …” and this was intentional. I don’t think it really matters much whether we speak about a community around a product, a community that just comes together to drink beer and discuss the latest football results, or even a community as a family. At least the German translation “Gemeinschaft” applies to a family as well. This can be very few people (mother, father, kids) or more if we include brothers, sisters, grandmas and so on. But still the same rules that Martin outlines in his blog post apply: You’ll always have people driving the community, such as organizing dinners (when we speak about families), organizing conferences (when we speak about technical communities), organizing parties (when we talk about fun communities), or organizing whatever for whatever people make up the specific community. Then you’ll always have the people willing to help (the people Martin describes as those who share and/or talk), and you’ll always have the people that consume/attend, which is good as well, because without them you’d have nothing to share and to organize.

We at dbi services are a community as well. As we work with various products, the community is not focused on a specific product (well, in the area of a specific product it is, of course) but rather on building an environment we like to work in. The community here is tied to technology but detached from a single product. We share the same methodologies and the same passion, and we have fun attending great parties that are organized mostly by the non-technical people in our company. In this case you could say: The non-technical people are the drivers for the community of the company, even if the company is very technical by nature. And here we have the same situation again: Some organize, some attend/consume and some share, but all are required (as Martin outlined in his post as well).

Of course I have to say something about the PostgreSQL community: Because PostgreSQL is a real community project, the community around it is much more important than with other technical communities. I am not saying that you do not need a community for vendor-controlled products, because when the vendor fails to build a community around its product the product will fail as well. What I am saying is that the PostgreSQL community goes deeper, as the complete product is driven by the community. Of course there are companies that hire people who work for the community, but they are not able to influence the direction if there is no agreement about that direction in the community. Sometimes this can make it very hard to make progress and a lot of points need to be discussed, but in the end I believe it is better to have something the majority agrees on. In the PostgreSQL community I think there are several drivers: For sure all the developers are drivers, and the people who take care of all the infrastructure (mailing lists, commitfests, …) are drivers as well. Basically everybody you see on the mailing lists answering questions is a driver, because they keep the community active. Then we have all the people you see in other communities as well: those who share and those who consume/attend. I think you get the point: An open source community is by its nature far more active than what you usually see for non-open-source communities, for one reason: It already starts with the developers and not with a community around a final product. You can be part of such a community from the very beginning, which means writing new features and patches.

Coming back to the original question: What makes a community? Besides what Martin outlined, there are several other key points:

  • The direction of the community (no matter if technical or not) must be so that people want to be part of that
  • When we speak about a community around a product: You must identify yourself with the product. When the product goes in a direction you cannot support, for whatever reason, you’ll leave sooner or later. The more people leave, the weaker the community
  • It must be easy to participate and to get help
  • A lot of people are willing to spend (free) time doing stuff for the community
  • There must be a culture which respects you and everybody else
  • Maybe most important: A common goal and people that are able and willing to work together, even if this sometimes requires a lot of discussions

When you have all of these, the drivers, the people who share, and those that attend will come anyway, I believe.

 


Is it an index, a table or what?


A recent tweet from Kevin Closson outlined that in PostgreSQL it might be confusing whether something is an index or a table. Why is it like that? Let’s have a look and start by re-building the example from Kevin:

For getting into the same situation Kevin described we need something like this:

postgres=# create table base4(custid int, custname varchar(50));
CREATE TABLE
postgres=# create index base4_idx on base4(custid);
CREATE INDEX

Assuming that we forgot that we created such an index and come back later and try to create it again we have exactly the same behavior:

postgres=# create index base4_idx on base4(custid);
ERROR:  relation "base4_idx" already exists
postgres=# drop table base4_idx;
ERROR:  "base4_idx" is not a table
HINT:  Use DROP INDEX to remove an index.
postgres=# 

The keyword here is “relation”. In PostgreSQL a “relation” does not necessarily mean a table. What you need to know is that PostgreSQL stores everything that looks like a table/relation (e.g. has columns) in the pg_class catalog table. When we check our relations there:

postgres=# select relname from pg_class where relname in ('base4','base4_idx');
  relname  
-----------
 base4
 base4_idx
(2 rows)

… we can see that both the table and the index are somehow treated as a relation. The difference is here:

postgres=# \! cat a.sql
select a.relname 
     , b.typname
  from pg_class a
     , pg_type b 
 where a.relname in ('base4','base4_idx')
   and a.reltype = b.oid;
postgres=# \i a.sql
 relname | typname 
---------+---------
 base4   | base4
(1 row)

Indexes do not have an entry in pg_type, tables have. What is even more interesting is that the “base4” table is a type itself. This means for every table you create, a composite type is created as well that describes the structure of the table. You can even link back to pg_class:

postgres=# select typname,typrelid from pg_type where typname = 'base4';
 typname | typrelid 
---------+----------
 base4   |    32901
(1 row)

postgres=# select relname from pg_class where oid = 32901;
 relname 
---------
 base4
(1 row)
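As a small aside, you can use that automatically created composite type like any other type, e.g. for field selection (the literal values here are just made up):

postgres=# select ((1,'foo')::base4).custname;
 custname 
----------
 foo
(1 row)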

When you want to know what type of relation something is, the easiest way is to ask like this:

postgres=# select relname,relkind from pg_class where relname in ('base4','base4_idx');
  relname  | relkind 
-----------+---------
 base4     | r
 base4_idx | i
(2 rows)

… where:

  • r = ordinary table
  • i = index
  • S = sequence
  • t = TOAST table
  • m = materialized view
  • c = composite type
  • f = foreign table
  • p = partitioned table
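If you prefer a higher-level view than pg_class, pg_indexes also tells you which table an index belongs to (a quick cross-check; the schema shown assumes the default public schema):

postgres=# select schemaname, tablename, indexname from pg_indexes where indexname = 'base4_idx';
 schemaname | tablename | indexname 
------------+-----------+-----------
 public     | base4     | base4_idx
(1 row)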

Of course there are also catalog tables for tables and indexes, so you can also double check there. Knowing all this the message is pretty clear:

postgres=# create index base4_idx on base4(custid);
ERROR:  relation "base4_idx" already exists
postgres=# drop relation base4_idx;
ERROR:  syntax error at or near "relation"
LINE 1: drop relation base4_idx;
             ^
postgres=# drop table base4_idx;
ERROR:  "base4_idx" is not a table
HINT:  Use DROP INDEX to remove an index.
postgres=# 

PostgreSQL finally tells you that “base4_idx” is an index and not a table, which is fine. Of course you could think that PostgreSQL should do that on its own, but it is also true: When you want to drop something, you should be sure of what you really want to drop.

 


Create index CONCURRENTLY in PostgreSQL


In PostgreSQL, when you create an index on a table, sessions that want to write to the table must by default wait until the index build has completed. There is a way around that, though, and in this post we’ll look at how you can avoid it.

As usual we’ll start with a little table:

postgres=# \! cat a.sql
drop table if exists t1;
create table t1 ( a int, b varchar(50));
insert into t1
select a.*, md5(a::varchar) from generate_series(1,5000000) a;
postgres=# \i a.sql
DROP TABLE
CREATE TABLE
INSERT 0 5000000

When you now create an index on that table and try to write to the table at the same time from a different session, that session will wait until the index is there (the screenshot shows the first session creating the index on the left and the second session doing the update on the right, waiting for the left one):
(screenshot: the session on the left creates the index while the session on the right is blocked on its update)

For production environments this not something you want to happen as this can block a lot of other sessions especially when the table in question is heavily used. You can avoid that by using “create index concurrently”.

(screenshot: creating the index concurrently while a concurrent update succeeds)

Using that syntax, writes to the table from other sessions will succeed while the index is being built. But, as clearly written in the documentation: the downside is that the table needs to be scanned twice, so more work needs to be done, which means more resource usage on your server. Other points need to be considered as well. When, for whatever reason, your index build fails (e.g. by canceling the create index statement):

postgres=# create index concurrently i1 on t1(a);
^CCancel request sent
ERROR:  canceling statement due to user request

… you might expect the index not to be there at all, but this is not the case. When you try to create the index again right after the canceled statement you’ll hit this:

postgres=# create index concurrently i1 on t1(a);
ERROR:  relation "i1" already exists

This does not happen when you do not create the index concurrently:

postgres=# create index i1 on t1(a);
^CCancel request sent
ERROR:  canceling statement due to user request
postgres=# create index i1 on t1(a);
CREATE INDEX
postgres=# 

The question is why this happens in the concurrent case but not in the “normal” case. The reason is simple: When you create an index the “normal” way the whole build is done in one transaction. Because of this the index does not exist when the transaction is aborted (the create index statement is canceled). When you build the index concurrently there are multiple transactions involved: “In a concurrent index build, the index is actually entered into the system catalogs in one transaction, then two table scans occur in two more transactions”. So in this case:

postgres=# create index concurrently i1 on t1(a);
ERROR:  relation "i1" already exists

… the index is already stored in the catalog:

postgres=# create index concurrently i1 on t1(a);
^CCancel request sent
ERROR:  canceling statement due to user request
postgres=# select relname,relkind,relfilenode from pg_class where relname = 'i1';
 relname | relkind | relfilenode 
---------+---------+-------------
 i1      | i       |       32926
(1 row)

If you don’t take care of that you will have invalid indexes in your database:

postgres=# \d t1
                        Table "public.t1"
 Column |         Type          | Collation | Nullable | Default 
--------+-----------------------+-----------+----------+---------
 a      | integer               |           |          | 
 b      | character varying(50) |           |          | 
Indexes:
    "i1" btree (a) INVALID

You might think that this does no harm, but then consider this case:

-- in session one build a unique index
postgres=# create unique index concurrently i1 on t1(a);
-- then in session two violate the uniqueness after some seconds
postgres=# update t1 set a = 5 where a = 4000000;
UPDATE 1
-- the create index statement will fail in the first session
postgres=# create unique index concurrently i1 on t1(a);
ERROR:  duplicate key value violates unique constraint "i1"
DETAIL:  Key (a)=(5) already exists.

This is even worse as the index now really consumes space on disk:

postgres=# select relpages from pg_class where relname = 'i1';
 relpages 
----------
    13713
(1 row)

The index is invalid, of course, and will not be used by the planner:

postgres=# \d t1
                        Table "public.t1"
 Column |         Type          | Collation | Nullable | Default 
--------+-----------------------+-----------+----------+---------
 a      | integer               |           |          | 
 b      | character varying(50) |           |          | 
Indexes:
    "i1" UNIQUE, btree (a) INVALID

postgres=# explain select * from t1 where a = 12345;
                              QUERY PLAN                              
----------------------------------------------------------------------
 Gather  (cost=1000.00..82251.41 rows=1 width=37)
   Workers Planned: 2
   ->  Parallel Seq Scan on t1  (cost=0.00..81251.31 rows=1 width=37)
         Filter: (a = 12345)
(4 rows)

But the index is still maintained:

postgres=# select relpages from pg_class where relname = 'i1';
 relpages 
----------
    13713
(1 row)
postgres=# insert into t1 select a.*, md5(a::varchar) from generate_series(5000001,6000000) a;
INSERT 0 1000000

postgres=# select relpages from pg_class where relname = 'i1';
 relpages 
----------
    16454
(1 row)

So now you have an index which cannot be used to speed up queries (which is bad), but which is still maintained when you write to the table (which is even worse because you consume resources for nothing). The only way out of this is to drop and re-create the index:

postgres=# drop index i1;
DROP INDEX
-- potentially clean up any rows that violate the constraint and then
postgres=# create unique index concurrently i1 on t1(a);
CREATE INDEX
postgres=# \d t1
                        Table "public.t1"
 Column |         Type          | Collation | Nullable | Default 
--------+-----------------------+-----------+----------+---------
 a      | integer               |           |          | 
 b      | character varying(50) |           |          | 
Indexes:
    "i1" UNIQUE, btree (a)

postgres=# explain select * from t1 where a = 12345;
                          QUERY PLAN                           
---------------------------------------------------------------
 Index Scan using i1 on t1  (cost=0.43..8.45 rows=1 width=122)
   Index Cond: (a = 12345)
(2 rows)

Remember: When a create index operation fails in concurrent mode, make sure that you drop the index immediately.

One more thing to keep in mind: When you create an index concurrently and there is another session already modifying the data, the create index command waits until that other operation completes:

-- first session inserts data without completing the transaction
postgres=# begin;
BEGIN
Time: 0.579 ms
postgres=# insert into t1 select a.*, md5(a::varchar) from generate_series(6000001,7000000) a;
INSERT 0 1000000
-- second session tries to build the index
postgres=# create unique index concurrently i1 on t1(a);

The create index operation will wait until that completes:

postgres=# select query,state,wait_event,wait_event_type from pg_stat_activity where state ='active';
                                query                                 | state  | wait_event | wait_event_t
----------------------------------------------------------------------+--------+------------+-------------
 create unique index concurrently i1 on t1(a);                        | active | virtualxid | Lock
 select query,state,wait_event,wait_event_type from pg_stat_activity; | active |            | 

… meaning when someone forgets to end the transaction, the create index command will wait forever. The parameter idle_in_transaction_session_timeout gives you more control over that, but you still need to be aware of what is happening here.
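One way to enable that safety net instance-wide is shown below (the timeout value is only an example; sessions that stay idle in a transaction longer than this get terminated, so choose it carefully):

postgres=# alter system set idle_in_transaction_session_timeout = '10min';
ALTER SYSTEM
postgres=# select pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)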

Happy index creation :)

 


DBVISIT from Oracle to Postgres


As I regularly work on Oracle and PostgreSQL, I decided to test the replication from Oracle to PostgreSQL using the Dbvisit Replicate tool.

Dbvisit Replicate does not use Oracle LogMiner or triggers, but its own mining processes to capture the changes when they are written to the redo logs. When a change appears in the redo log, an external file called a PLOG is generated and transferred to the target.

(diagram: Dbvisit Replicate architecture with the MINE and APPLY processes)

The architecture is quite easy to understand, you have a MINE process on the source server, looking at the redo logs for changed data, and an APPLY process which applies SQL on the target database.

The configuration is easy to implement but must not be underestimated :-)

My configuration is the following:

  • Oracle server named cloud13c, with PSI database version 12.2.0.1
  • Postgres server named pg_essentials_p1 with Postgres version 9.6

At first, we create a user in the Postgres database:

postgres@pg_essentials_p1:/home/postgres/ [PG1] createuser -d -e -E -l -P -r -s dbvrep_admin
Enter password for new role: 
Enter it again: 
CREATE ROLE dbvrep_admin ENCRYPTED 
PASSWORD 'md5e3c4e8f1b4f8e388eef4fe890d6bdb36' SUPERUSER CREATEDB 
CREATEROLE INHERIT LOGIN;

We edit the configuration file /u02/pgdata/postgresql.conf in order to allow non-localhost connections:

postgres@pg1:/u02/pgdata/PG1/ [PG1] cp postgresql.conf postgresql.conf.save
postgres@pg1:/u02/pgdata/PG1/ [PG1] sed -i "s/^#\(listen_addresses = '\)localhost'/\1*'\t/" postgresql.conf

We also allow connections from non-localhost addresses in pg_hba.conf:

postgres@pg1:/u02/pgdata/PG1/ [PG1] cp pg_hba.conf pg_hba.conf.save
postgres@p1:/u02/pgdata/PG1/ [PG1] echo -e "host\tall\t\tall\t\t0.0.0.0/0\t\tmd5" >> pg_hba.conf

cat pg_hba.conf:

 # TYPE  DATABASE        USER            ADDRESS                 METHOD
# "local" is for Unix domain socket connections only
local   all             all                                     md5
# IPv4 local connections:
host    all             all             127.0.0.1/32            md5
host    all             all             192.168.22.201/24       md5
 # IPv6 local connections:
host    all             all             ::1/128                 md5
 host    all             barman          192.168.1.101/24       md5
host    replication     barman_streaming 192.168.1.101/24       md5
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     postgres                                md5
host    replication     postgres        127.0.0.1/32            md5
host    replication     postgres        ::1/128                 md5
host all       all       0.0.0.0/0      md5

We have to restart the postgres server:

postgres@pg1:/u02/pgdata/PG1/ [PG1] pgrestart
waiting for server to shut down.... done
server stopped
server starting
postgres@pg1:/u02/pgdata/PG1/ [PG1] 2017-07-17 13:52:52.350 CEST - 1 - 3106 -  - @ LOG:  redirecting log output to logging collector process
2017-07-17 13:52:52.350 CEST - 2 - 3106 -  - @ HINT:  Future log output will appear in directory "/u01/app/postgres/admin/PG1/pg_log".
 
postgres@pg_essentials_p1:/u02/pgdata/PG1/ [PG1] 
postgres@pg_essentials_p1:/u02/pgdata/PG1/ [PG1] alias | grep pgrestart
alias pgrestart='pg_ctl -D ${PGDATA} restart -m fast'

Then we install dbvisit replicate:

We download the dbvisit_replicate-2.9.00-el5.x86_64.rpm and we install it:

[root@localhost software]# rpm -ivh dbvisit_replicate-2.9.00-el5.x86_64.rpm 
Preparing...                       ################################# [100%]
Updating / installing...
   1:dbvisit_replicate-2.9.00-el5  ################################# [100%]

To make it work properly, I had to modify the sqlnet.ora file as follows in order to avoid the following error message:

ERR-11: Could not connect as dbvrep to database PSI, error is ORA-24327: need explicit attach before authenticating a user (DBD ERROR: OCISessionBegin)

sqlnet.ora:
SQLNET.SQLNET_ALLOWED_LOGON_VERSION=11
SQLNET.ALLOWED_LOGON_VERSION_CLIENT =11
SQLNET.ALLOWED_LOGON_VERSION_SERVER =11

Before running dbvrep, be sure you can connect with psql from the Oracle server to the PostgreSQL server! I needed to install a Postgres client on the Oracle host and define the PATH properly.
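A quick connectivity test from the Oracle host could look like this (host and user as configured above; you will be prompted for the password):

oracle@localhost:/home/oracle/ [PSI] psql -h pg1 -U dbvrep_admin -d postgres -c "select version();"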

Finally, by running dbvrep on the Oracle server, you start the setup wizard and enter your configuration settings; the menus are quite easy to understand. The setup wizard consists of 4 steps:

– Step 1: describe databases

– Step 2: Replicate Pairs

– Step 3: Replicated tables

– Step 4: Process Configuration

oracle@localhost:/home/oracle/ora2pg/ [PSI] dbvrep
Initializing......done
Dbvisit Replicate version 2.9.02
Copyright (C) Dbvisit Software Limited. All rights reserved.
No DDC file loaded.
Run "setup wizard" to start the configuration wizard or try "help" 
to see all commands available.
dbvrep> setup wizard                                                                              
This wizard configures Dbvisit Replicate.
 
The setup wizard creates configuration scripts, which need to be run 
after the wizard ends. Nochanges to the databases are made before that.
 
The progress is saved every time a list of databases, replications, etc. 
is shown. It will bere-read if wizard is restarted and the same DDC 
name and script path is selected.
Run the wizard now? [Yes]                                                   
Accept end-user license agreement? (View/Yes/No) [No] yes                  
Before starting the actual configuration, some basic information is needed. The DDC name and
script path determines where all files created by the wizard go 
(and where to reread them ifwizard is rerun) and the license key 
determines which options are available for this
configuration.
(DDC_NAME) - Please enter a name for this replication: [] ora2pg                                  
(LICENSE_KEY) - Please enter your license key: [(trial)]                                          
Which Replicate edition do you want to trial (LTD/XTD/MAX): [MAX]                                 
(SETUP_SCRIPT_PATH) - Please enter a directory for location of 
configuration scripts on this     
machine: [/home/oracle/Documents/ora2pg] /home/oracle/ora2pg                                 
 
Network configuration files were detected on this system in these locations:
/u00/app/oracle/network/admin
/u00/app/oracle/product/12.2.0.1/dbhome_1/network/admin
(TNS_ADMIN) - Please enter TNS configuration directory for this machine:                          [/u00/app/oracle/network/admin]                                          
Read 2 described databases from previous wizard run.
 
Step 1 - Describe databases
========================================
The first step is to describe databases used in the replication. 
There are usually two of them
(source and target).
 
Following databases are now configured:
1: Oracle PSI, SYS/***, SYSTEM/***, dbvrep/***, USERS/TEMP, dbvrep/, 
ASM:No, TZ: +02:00
2: Postgres postgres, dbvrep_admin/***, dbvrep_admin/***, dbvrep/***, 
/, dbvrep/, ASM:n/a, TZ: 
Enter the number of the database to modify it, or "add", or "done": [done]                        
Read 1 replication pairs from previous wizard run.
 
Step 2 - Replication pairs
========================================
The second step is to set source and targets for each replication pair.
 
Enter number of replication pair to modify it, or "add", or "done": [done] 1                  
Do you want to "edit" the replication pair or "delete" it? [edit] edit                        
Let's configure the replication pair, selecting source and target.
Following databases are described:
1: PSI#DBVREP (Oracle)
2: DBNAME=POSTGRES;HOST=PG1#DBVREP (Postgres) 
(cannot be source: not an Oracle database)
Select source database: [1]                                                                   
Select target database: [2]                                                                   
Will limited DDL replication be enabled? (Yes/No) [Yes]                                       
Use fetcher to offload the mining to a different server? (Yes/No) [No]                        
Should where clauses (and Event Streaming) include all columns, 
not just changed and PK?      (Yes/No) [No]                                                                            
Would you like to encrypt the data across the network? (Yes/No) [No]                          
Would you like to compress the data across the network? (Yes/No) [No]                         
How long do you want to set the network timeouts. 
Recommended range between 60-300 seconds    [60]                                                                                       
Lock and copy the data initially one-by-one or at a single SCN?
one-by-one : Lock tables one by one and capture SCN
single-scn : One SCN for all tables
ddl-only   : Only DDL script for target objects
resetlogs  : Use SCN from last resetlogs operation
(standby activation, rman incomplete
recovery)
no-lock    : Do not lock tables. Captures previous SCN of oldest active 
transaction. Requires pre-requisite running of pre-all.sh script            (one-by-one/single-scn/ddl-only/resetlogs/no-lock) [single-scn] 
 
What data instantiation script to create?
ddl_file       : DDL file created (APPLY.sql)
ddl_run        : DDL is automatically executed on target
load           : All replicated data is created and loaded automatically
none                                                                                          (ddl_file/ddl_run/load/none) [ddl_run] ddl-file
 
 
Following replication pairs are now configured:
1: PSI (Oracle) ==> postgres (Postgres), DDL: Yes, fetcher: No, 
process suffix: (no suffix),
compression: No, encryption: No, network timeout: 60, prepare type: 
single-scn,:
ddl-run
Enter number of replication pair to modify it, or "add", or "done": [done]                        
Read 1 replication pairs from previous wizard run.
 
Step 3 - Replicated tables
========================================
The third step is to choose the schemas and tables to be replicated. 
If the databases arereachable, the tables are checked for existence, 
datatype support, etc., schemas are queried for tables. 
Note that all messages are merely hints/warnings and may be ignored 
if issues are rectified before the scripts are actually executed.
 
Following tables are defined for replication pairs:
1: PSI (Oracle) ==> postgres (Postgres), DDL: Yes, suffix: (no suffix), 
prepare: single-scn
  PSI(tables)
Enter number of replication pair to modify it, or "done": [done]                                  
Read 2 replication pairs from previous wizard run.
 
Step 4 - Process configuration
========================================
The fourth step is to configure the replication processes for each 
replication.
 
Following processes are defined:
1: MINE on PSI
  Host: cloud13c, SMTP: No, SNMP: No
2: APPLY on postgres
  Host: pg1, SMTP: No, SNMP: No
Enter number of process to modify it, or "done": [done] 1                                         
Fully qualified name of the server for the process (usually co-located 
with the database, unless  mine is offloaded using fetcher): [cloud13c]                                                     
Server type (Windows/Linux/Unix): [Linux]                                                         
Enable email notifications about problems? (Yes/No) [No]                                          
Enable SNMP traps/notifications about problems? (Yes/No) [No]                                     
Directory with DDC file and default where to create log files etc. 
(recommended: same as global   setting, if possible)? [/home/oracle/ora2pg]                                                    
Following settings were pre-filled with defaults or your reloaded settings:
----------------------------------------
[MINE_REMOTE_INTERFACE]: Network remote interface: cloud13c:7901 
[MINE_DATABASE]: Database TNS: PSI 
[TNS_ADMIN]: tnsnames.ora path: /u00/app/oracle/network/admin 
[MINE_PLOG]: Filemask for generated plogs: /home/oracle/ora2pg/mine/%S.%E 
(%S is sequence, %T thread, %F original filename (stripped extension), 
%P process type, %N process name, %E default extension)
[LOG_FILE]: General log file: /home/oracle/ora2pg/log/dbvrep_%N_%D.%E 
[LOG_FILE_TRACE]: Error traces: 
/home/oracle/ora2pg/log/trace/dbvrep_%N_%D_%I_%U.%E 
 
Checking that these settings are valid...
Do you want to change any of the settings? [No]                                                   
Following processes are defined:
1: MINE on PSI
  Host: cloud13c, SMTP: No, SNMP: No
2: APPLY on postgres
  Host: pg1, SMTP: No, SNMP: No
Enter number of process to modify it, or "done": [done] 2                                         
Fully qualified name of the server for the process (usually co-located 
with the database, unless  mine is offloaded using fetcher): [pg1]                                                          
Server type (Windows/Linux/Unix): [Linux]                                                         
Enable email notifications about problems? (Yes/No) [No]                                          
Enable SNMP traps/notifications about problems? (Yes/No) [No]                                     
Directory with DDC file and default where to create log files etc. 
(recommended: same as global   setting, if possible)? [/home/oracle/ora2pg]                                                    
Following settings were pre-filled with defaults or your reloaded settings:
----------------------------------------
[APPLY_REMOTE_INTERFACE]: Network remote interface: pg1:7902 
[APPLY_DATABASE]: Database Postgres connection string: dbname=postgres;
host=pg1 
[TNS_ADMIN]: tnsnames.ora path: /u00/app/oracle/network/admin 
[APPLY_SCHEMA]: Dbvisit Replicate database (schema): dbvrep 
[APPLY_STAGING_DIR]: Directory for received plogs: /home/oracle/ora2pg/apply 
[LOG_FILE]: General log file: /home/oracle/ora2pg/log/dbvrep_%N_%D.%E 
[LOG_FILE_TRACE]: Error traces: 
/home/oracle/ora2pg/log/trace/dbvrep_%N_%D_%I_%U.%E 
 
Checking that these settings are valid...
Do you want to change any of the settings? [No]                                                   
Following processes are defined:
1: MINE on PSI
  Host: cloud13c, SMTP: No, SNMP: No
2: APPLY on postgres
  Host: pg1, SMTP: No, SNMP: No
Enter number of process to modify it, or "done": [done]                                           
Created file /home/oracle/ora2pg/ora2pg-APPLY.ddc.
Created file /home/oracle/ora2pg/ora2pg-MINE.ddc.
Created file /home/oracle/ora2pg/config/ora2pg-setup.dbvrep.
Created file /home/oracle/ora2pg/config/ora2pg-dbsetup_PSI_DBVREP.sql.
Created file /home/oracle/ora2pg/config/ora2pg-dbsetup_DBNAME_POSTGRES_HOST_PG1_DBVREP.sql.
Created file /home/oracle/ora2pg/config/ora2pg-grants_PSI_DBVREP.sql.
Created file /home/oracle/ora2pg/config/ora2pg-grants_DBNAME_POSTGRES_HOST_PG1_DBVREP.sql.
Created file /home/oracle/ora2pg/config/ora2pg-onetime.ddc.
Created file /home/oracle/ora2pg/start-console.sh.
Created file /home/oracle/ora2pg/ora2pg-run-cloud13c.sh.
Created file /home/oracle/ora2pg/scripts/ora2pg-cloud13c-start-MINE.sh.
Created file /home/oracle/ora2pg/scripts/ora2pg-cloud13c-stop-MINE.sh.
Created file /home/oracle/ora2pg/scripts/ora2pg-cloud13c-dbvrep-MINE.sh.
Created file /home/oracle/ora2pg/scripts/systemd-dbvrep-MINE_ora2pg.service.
Created file /home/oracle/ora2pg/scripts/upstart-dbvrep-MINE_ora2pg.conf.
Created file /home/oracle/ora2pg/ora2pg-run-pg1.sh.
Created file /home/oracle/ora2pg/scripts/ora2pg-pg1-start-APPLY.sh.
Created file /home/oracle/ora2pg/scripts/ora2pg-pg1-stop-APPLY.sh.
Created file /home/oracle/ora2pg/scripts/ora2pg-pg1-dbvrep-APPLY.sh.
Created file /home/oracle/ora2pg/scripts/systemd-dbvrep-APPLY_ora2pg.service.
Created file /home/oracle/ora2pg/scripts/upstart-dbvrep-APPLY_ora2pg.conf.
Created file /home/oracle/ora2pg/Nextsteps.txt.
Created file /home/oracle/ora2pg/ora2pg-all.sh.
============================================================================
Dbvisit Replicate wizard completed
Script /home/oracle/ora2pg/ora2pg-all.sh created. 
This runs all the above created scripts. Please exit out of dbvrep, 
review and run script as current user to setup and start Dbvisit Replicate.
============================================================================
Optionally, the script can be invoked now by this wizard.
Run this script now? (Yes/No) [No]                                          dbvrep> exit

As suggested at the end of the setup wizard, we run ora2pg-all.sh:

oracle@localhost:/home/oracle/ora2pg/ [PSI] . ora2pg-all.sh 
Setting up Dbvisit Replicate configuration
Configure database PSI...
This check fails if the DBID is not the expected one...
Ok, check passed.
Configure database dbname=postgres
Object grants for database PSI...
Object grants for database dbname=postgres
Setting up the configuration
Initializing......done
DDC loaded from database (0 variables).
Dbvisit Replicate version 2.9.02
Copyright (C) Dbvisit Software Limited. All rights reserved.
DDC file /home/oracle/ora2pg/config/ora2pg-onetime.ddc loaded.
MINE: Cannot determine Dbvisit Replicate dictionary version. (no
dictionary exists)
APPLY: Cannot determine Dbvisit Replicate dictionary version. (no
dictionary exists)
dbvrep> #clear the no-DDC-DB-available warning
dbvrep> process clear previous warnings
dbvrep> set ON_WARNING SKIP
Variable ON_WARNING set to SKIP for process *.
dbvrep> set ON_ERROR EXIT
Variable ON_ERROR set to EXIT for process *.
dbvrep> 
dbvrep> # Configuring default processes
dbvrep> choose process MINE
Process type MINE set to: MINE.
dbvrep> choose process APPLY
Process type APPLY set to: APPLY.
dbvrep> PROCESS SWITCH_REDOLOG
Redo log switch requested.
dbvrep> PROCESS SETUP MINE DROP DICTIONARY
0 dictionary objects dropped.
dbvrep> PROCESS SETUP MINE CREATE DICTIONARY
dbvrep> PROCESS SETUP MINE LOAD DICTIONARY
Oldest active transaction SCN: 2054212 (no active transaction)
Supplemental logging on database set.
dbvrep> PROCESS SETUP APPLY DROP DICTIONARY
0 dictionary objects dropped.
dbvrep> PROCESS SETUP APPLY CREATE DICTIONARY
dbvrep> PROCESS SETUP APPLY LOAD DICTIONARY
dbvrep> PROCESS SETUP PAIR MINE AND APPLY
Applier SCN set (start=2054228, current=2054228).
dbvrep> SET APPLY.INSTANTIATE_SCN NOW
Variable INSTANTIATE_SCN set to NOW for process APPLY.
dbvrep> SET MINE._PREPARE_SUPLOG_TYPE PK
Variable _PREPARE_SUPLOG_TYPE set to PK for process MINE.
dbvrep> EXCLUDE CREATE TABLE %.DBMS_TABCOMP_TEMP_UNCMP #Ignore tables
created by Compression Advisor
Exclude rule created.
dbvrep> EXCLUDE CREATE TABLE %.DBMS_TABCOMP_TEMP_CMP #Ignore tables
created by Compression Advisor
Exclude rule created.
dbvrep> EXCLUDE CREATE TABLE %.SCHEDULER$_% #Ignore tables created by
Oracle scheduler (also used by schema/full expdp/impdp)
Exclude rule created.
dbvrep> EXCLUDE CREATE TABLE %.CMP1$% #Ignore tables created by
Compression Advisor since 11.2.0.4
Exclude rule created.
dbvrep> EXCLUDE CREATE TABLE %.CMP2$% #Ignore tables created by
Compression Advisor since 11.2.0.4
Exclude rule created.
dbvrep> EXCLUDE CREATE TABLE %.CMP3$% #Ignore tables created by
Compression Advisor since 11.2.0.4
Exclude rule created.
dbvrep> EXCLUDE CREATE TABLE %.CMP4$% #Ignore tables created by
Compression Advisor since 11.2.0.4
Exclude rule created.
dbvrep> memory_set IGNORE_APPLY_DDL_DIFFERENCES Yes
Variable IGNORE_APPLY_DDL_DIFFERENCES set to YES for process *.
dbvrep> SET PREPARE_SCHEMA_EXCEPTIONS none
Variable PREPARE_SCHEMA_EXCEPTIONS set to none for process *.
dbvrep> PROCESS SUPPLEMENTAL LOGGING SCHEMA "PSI" ENABLE PRIMARY KEY
dbvrep> PROCESS SWITCH_REDOLOG
Redo log switch requested.
dbvrep> PROCESS WAIT_SCN_FLIP
Waited 1 seconds until scn_to_timestamp changed.
dbvrep> #single-scn instantiation: lock all tables and schemas
dbvrep> PROCESS LOCK SCHEMAS "PSI"
Locking all schemas.
...locked 2 of 2 tables from PSI schema.
Lock done.
dbvrep> #single-scn instantiation: unlock all tables and schemas, but
keep the SCN
dbvrep> PROCESS LOCK RELEASE LOCKS
Engine locks released.
dbvrep> 
dbvrep> #prepare the tables (we use OFFLINE as neither MINE nor APPLY
is running; with OFFLINE we won't wait on network timeout)
dbvrep> PREPARE OFFLINE SCHEMA "PSI"
Table PSI.EMPLOYE instantiated at SCN 2056800
Table PSI.OFFICE instantiated at SCN 2056800
dbvrep> 
dbvrep> #single-scn instantiation: unlock all tables and schemas,
forget the SCN (so it does not affect any further PREPARE statements)
dbvrep> PROCESS LOCK CLEAR SCN
dbvrep> PROCESS SWITCH_REDOLOG
Redo log switch requested.
dbvrep> #prepare script for instantiation
dbvrep> PROCESS PREPARE_DP WRITE DDL_FILE FILE
/home/oracle/ora2pg/APPLY.sql USERID SYSTEM/manager@PSI
File /home/oracle/ora2pg/APPLY.sql has been written successfully.
Created DDL script /home/oracle/ora2pg/APPLY.sql.
dbvrep> create ddcdb from ddcfile
DDC loaded into database (430 variables).
dbvrep> load ddcdb
DDC loaded from database (430 variables).
dbvrep> set ON_WARNING SKIP
Variable ON_WARNING set to SKIP for process *.
dbvrep> set ON_ERROR SKIP
Variable ON_ERROR set to SKIP for process *.
OK-0: Completed successfully.
WARN-1850: No DDC DB available, dictionary table does not exist.
These steps are required after the ora2pg-all.sh script runs:
 
1) Create the necessary directory(ies) on the servers:
cloud13c: /home/oracle/ora2pg
pg1: /home/oracle/ora2pg
 
2) Copy the DDC files to the server(s) where the processes will run:
pg1: /home/oracle/ora2pg/ora2pg-APPLY.ddc
cloud13c: /home/oracle/ora2pg/ora2pg-MINE.ddc
 
Ensure that the parameter TNS_ADMIN (in the ddc file) is pointing to the correct TNS_ADMIN path on each of the servers.
 
3) Review that path to dbvrep executable is correct in the run scripts:
/home/oracle/ora2pg/ora2pg-run-cloud13c.sh
/home/oracle/ora2pg/ora2pg-run-pg1.sh
 
4) Copy the run script to the server(s) where the processes will run:
cloud13c: /home/oracle/ora2pg/ora2pg-run-cloud13c.sh
pg1: /home/oracle/ora2pg/ora2pg-run-pg1.sh
 
5) Ensure firewall is open for listen interfaces 0.0.0.0:7902, 0.0.0.0:7901 used by the processes.
 
6) Make sure the data on apply are in sync as of time when setup was run.
Scripts for Data Pump/export/DDL were created as requested:
 
Create referenced database links (if any) before running the scripts.
/home/oracle/ora2pg/APPLY.sql
 
7) Start the replication processes on all servers:
cloud13c: /home/oracle/ora2pg/ora2pg-run-cloud13c.sh
pg1: /home/oracle/ora2pg/ora2pg-run-pg1.sh
 
8) Start the console to monitor the progress:
/home/oracle/ora2pg/start-console.sh

As explained, you have to copy two files to the PostgreSQL server: /home/oracle/ora2pg/ora2pg-APPLY.ddc and /home/oracle/ora2pg/ora2pg-run-pg1.sh

As I chose the ddl_only option, we first have to create the tables on the PostgreSQL server. To do this we can use the APPLY.sql file generated on the Oracle server.
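
For illustration only, here is a minimal sketch of that step, run on the PostgreSQL server (the column definitions for psi.employe are assumptions derived from the sample data shown further below; in practice you simply execute the DDL contained in the generated APPLY.sql):

# hypothetical sketch: create the target schema and one of the tables on the PostgreSQL side
# (column definitions are assumptions based on the sample rows shown later in this post)
psql -d postgres <<'SQL'
create schema if not exists psi;
create table if not exists psi.employe ( name varchar(20), salary integer );
SQL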

The next step consists of running the MINE process on the Oracle server:

oracle@localhost:/home/oracle/ora2pg/ [PSI] . ora2pg-run-cloud13c.sh 
Initializing......done
DDC loaded from database (430 variables).
Dbvisit Replicate version 2.9.02
Copyright (C) Dbvisit Software Limited. All rights reserved.
DDC file /home/oracle/ora2pg/ora2pg-MINE.ddc loaded.
Starting process MINE...started

And we launch the APPLY process on the PostgreSQL server:

postgres@pg_essentials_p1:/home/oracle/ora2pg/ [PG1] . ora2pg-run-pg1.sh 
Initializing......done
DDC loaded from database (431 variables).
Dbvisit Replicate version 2.9.02
Copyright (C) Dbvisit Software Limited. All rights reserved.
DDC file /home/oracle/ora2pg/ora2pg-APPLY.ddc loaded.
Starting process APPLY...Created directory /home/oracle/ora2pg/ddc_backup
Created directory /home/oracle/ora2pg/log/
Created directory /home/oracle/ora2pg/log/trace/
Created directory /home/oracle/ora2pg/apply
started

Initially I had two tables in my PSI Oracle database belonging to the psi schema: EMPLOYE and OFFICE. I used the APPLY.sql script to create the tables in the PostgreSQL environment.

To visualize the activity we run start-console.sh on the Oracle server:

oracle@localhost:/home/oracle/ora2pg/ [PSI] . start-console.sh 
Initializing......done
DDC loaded from database (431 variables).
Dbvisit Replicate version 2.9.02
Copyright (C) Dbvisit Software Limited. All rights reserved.
 
| Dbvisit Replicate 2.9.02(MAX edition) - Evaluation License expires in 30 days
MINE is running. Currently at plog 120 and SCN 2060066 (11/07/2017 15:27:57).
APPLY is running. Currently at plog 120 and SCN 2060021 (11/07/2017 15:27:45).
Progress of replication ora2pg:MINE->APPLY: total/this execution
-------------------------------------------------------------------------------------------------
PSI.EMPLOYE/psi.employe:      100%  Mine:1/1             Unrecov:0/0         Applied:1/1         Conflicts:0/0       Last:07/11/2017 15:20:06/OK
PSI.OFFICE/psi.office:        100%  Mine:1/1             Unrecov:0/0         Applied:1/1         Conflicts:0/0       Last:07/11/2017 15:21:36/OK
-------------------------------------------------------------------------------------------------
2 tables listed.

And we can validate that each insert into the employe or office table is replicated to the PostgreSQL server:

From the PostgreSQL database:

(postgres@[local]:5432) [postgres] > select * from psi.employe;
 name  | salary 
-------+--------
 Larry |  10000
 Bill  |   2000
(2 rows)

From the Oracle server:

SQL> insert into employe values ('John', 50000);
 
1 row created.
 
SQL> commit;
 
Commit complete.

The console gives us the correct information:

/ Dbvisit Replicate 2.9.02(MAX edition) - Evaluation License expires in 30 days
MINE is running. Currently at plog 120 and SCN 2075526 (11/07/2017 16:44:17).
APPLY is running. Currently at plog 120 and SCN 2075494 (11/07/2017 16:44:08).
Progress of replication ora2pg:MINE->APPLY: total/this execution
-------------------------------------------------------------------------------------------------
PSI.EMPLOYE/psi.employe:      100%  Mine:3/3             Unrecov:0/0         Applied:3/3         Conflicts:0/0       Last:07/11/2017 16:18:41/OK
PSI.OFFICE/psi.office:        100%  Mine:3/3             Unrecov:0/0         Applied:3/3         Conflicts:0/0       Last:07/11/2017 15:37:02/OK
-------------------------------------------------------------------------------------------------
2 tables listed.

And the result is applied on the PostgreSQL database:
(postgres@[local]:5432) [postgres] > select * from psi.employe;
 name  | salary 
-------+--------
 Larry |  10000
 Bill  |   2000
 John  |  50000
(3 rows)
 
As we previously chose the single-scn and ddl-run options, we had to run the APPLY.sql script from the Oracle server to create the tables on the PostgreSQL side. Alternatively, in step 2 of the configuration wizard you can choose the load option (all replicated data is created and loaded automatically):
Lock and copy the data initially one-by-one or at a single SCN?
one-by-one : Lock tables one by one and capture SCN
single-scn : One SCN for all tables
ddl-only   : Only DDL script for target objects
resetlogs  : Use SCN from last resetlogs operation (standby activation, rman incomplete
recovery)
no-lock    : Do not lock tables. Captures previous SCN of oldest active transaction. Requires
pre-requisite running of pre-all.sh script                                                    (one-by-one/single-scn/ddl-only/resetlogs/no-lock) [single-scn] 
 
What data instantiation script to create?
ddl_file       : DDL file created (APPLY.sql)
ddl_run        : DDL is automatically executed on target
load           : All replicated data is created and loaded automatically
none                                                                                          (ddl_file/ddl_run/load/none) [ddl_run] load
Do you want to (re-)create the tables on target or keep them (they are already created)?      (create/keep) [keep] create

In this case you can see that each Oracle table is replicated to the PostgreSQL server.

From the Oracle server:

SQL> create table salary (name varchar2(10)); 
 
Table created.
 
SQL> insert into salary values ('Larry');
 
1 row created.
 
SQL> commit;
 
Commit complete.

The Dbvisit console displays the correct information:

\ Dbvisit Replicate 2.9.02(MAX edition) - Evaluation License expires in 30 days
MINE is running. Currently at plog 135 and SCN 2246259 (11/27/2017 14:44:24).
APPLY is running. Currently at plog 135 and SCN 2246237 (11/27/2017 14:44:18).
Progress of replication replic:MINE->APPLY: total/this execution
--------------------------------------------------------------------------------------
REP.SALARY:                   100%  Mine:1/1             Unrecov:0/0         Applied:1/1         Conflicts:0/0       Last:27/11/2017 14:01:25/OK
---------------------------------------------------------------------------------------------
1 tables listed.

From the PostgreSQL server:

(postgres@[local]:5432) [postgres] > select * from rep.salary;
 name  
-------
 Larry
(1 row)

The plog files generated on the PostgreSQL server contain the strings we need. They are stored on the PostgreSQL server in the directory /home/oracle/replic/apply:

-bash-4.2$ ls
122.plog.gz  124.plog.gz  126.plog  128.plog.gz  130.plog.gz  132.plog.gz  134.plog
123.plog.gz  125.plog.gz  127.plog  129.plog.gz  131.plog.gz  133.plog.gz  135.plog
-bash-4.2$ strings 135.plog | grep -l larry
-bash-4.2$ strings 135.plog | grep -i larry
Larry
-bash-4.2$ strings 135.plog | grep -i salary
SALARY
create table salary (name varchar2(10))
SALARY
SALARY

Despite some problems at the beginning of my tests, the replication from Oracle to PostgreSQL is working fine and fast. There are many possibilities with Dbvisit Replicate that I will try to test in the following weeks.

This article DBVISIT from Oracle to Postgres appeared first on Blog dbi services.

Are statistics immediately available after creating a table or an index in PostgreSQL?


While giving the last PostgreSQL DBA Essentials workshop this question came up: when we create a table or an index, are the statistics available automatically? To be more precise: when we create and load a table in one step and create an index on that table afterwards, do we have the statistics available by default, or do we need to wait for autovacuum to kick in or analyze manually? Let's see …

First of all, let's disable autovacuum so it does not kick off analyze in the background:

postgres=# \! ps -ef | grep autov | grep -v grep
postgres  1641  1635  0 07:08 ?        00:00:00 postgres: MY_CLUSTER: autovacuum launcher process   
postgres=# alter system set autovacuum=off;
ALTER SYSTEM
postgres=# select * from pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)

postgres=# \! ps -ef | grep autov | grep -v grep

Create and populate the table:

postgres=# \! cat a.sql
drop table if exists t;
create table t
as select a.*, md5(a::varchar) from generate_series(1,5000000) a;
postgres=# \i a.sql
psql:a.sql:1: NOTICE:  table "t" does not exist, skipping
DROP TABLE
SELECT 5000000

Create an index:

postgres=# create index i1 on t(a);
CREATE INDEX
postgres=# \d+ t
                                     Table "public.t"
 Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description 
--------+---------+-----------+----------+---------+----------+--------------+-------------
 a      | integer |           |          |         | plain    |              | 
 md5    | text    |           |          |         | extended |              | 
Indexes:
    "i1" btree (a)

Do we already have statistics? Let's check:

postgres=# select stanullfrac,stawidth,stadistinct from pg_statistic where starelid = 't'::regclass;
 stanullfrac | stawidth | stadistinct 
-------------+----------+-------------
(0 rows)

No, at least not for the table. What about the index?

postgres=# select stanullfrac,stawidth,stadistinct from pg_statistic where starelid = 'i1'::regclass;
 stanullfrac | stawidth | stadistinct 
-------------+----------+-------------
(0 rows)

No. Let's analyze:

postgres=# analyze t;
ANALYZE
postgres=# analyze i1;
WARNING:  skipping "i1" --- cannot analyze non-tables or special system tables
ANALYZE

Apparently we cannot analyze an index. What do we see now?

postgres=# select stanullfrac,stawidth,stadistinct from pg_statistic where starelid = 't'::regclass;
 stanullfrac | stawidth | stadistinct 
-------------+----------+-------------
           0 |        4 |          -1
           0 |       33 |          -1
(2 rows)

postgres=# select stanullfrac,stawidth,stadistinct from pg_statistic where starelid = 'i1'::regclass;
 stanullfrac | stawidth | stadistinct 
-------------+----------+-------------
(0 rows)

We do see statistics for the table but not for the index. The reason is that “analyze” works on tables, but not on indexes. For regular indexes there will be nothing in pg_statistic because that information would be redundant with the underlying table columns. But there will be statistics for function-based indexes:

postgres=# create index i2 on t(lower(a::text));
CREATE INDEX
postgres=# select stanullfrac,stawidth,stadistinct from pg_statistic where starelid = 'i2'::regclass;
 stanullfrac | stawidth | stadistinct 
-------------+----------+-------------
(0 rows)

postgres=# analyze t;
ANALYZE
postgres=# select stanullfrac,stawidth,stadistinct from pg_statistic where starelid = 'i2'::regclass;
 stanullfrac | stawidth | stadistinct 
-------------+----------+-------------
           0 |       10 |          -1
(1 row)

So, when autovacuum is off we do not get statistics unless we kick off a manual analyze (which is not a surprise). What happens when autovacuum is on?

postgres=# alter system set autovacuum=on;
ALTER SYSTEM
postgres=# select * from pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)
postgres=# \i a.sql
DROP TABLE
SELECT 5000000
postgres=# select stanullfrac,stawidth,stadistinct from pg_statistic where starelid = 't'::regclass;
 stanullfrac | stawidth | stadistinct 
-------------+----------+-------------
(0 rows)

Nope, same picture here. But a few seconds later:

postgres=# select stanullfrac,stawidth,stadistinct from pg_statistic where starelid = 't'::regclass;
 stanullfrac | stawidth | stadistinct 
-------------+----------+-------------
           0 |        4 |          -1
           0 |       33 |          -1
(2 rows)

… statistics are there. Conclusion: When you require current statistics directly after loading a table you’d better kick off a manual analyze right after. Otherwise autovacuum will take care of that, but not immediately.
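
In other words, a minimal sketch of the recommended pattern, reusing the a.sql script from above:

postgres=# \i a.sql
postgres=# analyze t;   -- statistics are available immediately, no need to wait for autovacuum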

 

This article Are statistics immediately available after creating a table or an index in PostgreSQL? appeared first on Blog dbi services.


Does pg_upgrade in check mode raise a failure when the old cluster is running?


Today I had the pleasure of having Bruce Momjian in my session about PostgreSQL Upgrade Best Practices at the IT Tage 2017 in Frankfurt. While browsing through the various options you have for upgrading, there was one slide where I claimed that the old cluster needs to be down before you run pg_upgrade in check mode, as you will hit a (non-critical) failure message otherwise. Let's see if that really is the case or whether I did something wrong…

To start with, let's initialize a new 9.6.2 cluster:

postgres@pgbox:/home/postgres/ [PG962] initdb --version
initdb (PostgreSQL) 9.6.2 dbi services build
postgres@pgbox:/home/postgres/ [PG962] initdb -D /tmp/aaa
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locales
  COLLATE:  en_US.UTF-8
  CTYPE:    en_US.UTF-8
  MESSAGES: en_US.UTF-8
  MONETARY: de_CH.UTF-8
  NUMERIC:  de_CH.UTF-8
  TIME:     de_CH.UTF-8
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

creating directory /tmp/aaa ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    pg_ctl -D /tmp/aaa -l logfile start

Start that:

postgres@pgbox:/home/postgres/ [PG962] pg_ctl -D /tmp/aaa -l logfile start
postgres@pgbox:/home/postgres/ [PG962] psql -c "select version()" postgres
                                                           version                                                           
-----------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.6.2 dbi services build on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11), 64-bit
(1 row)

Time: 0.861 ms

To be able to upgrade we'll need a new cluster, so:

postgres@pgbox:/home/postgres/ [PG10] initdb --version
initdb (PostgreSQL) 10.0 dbi services build
postgres@pgbox:/home/postgres/ [PG10] initdb -D /tmp/bbb
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locales
  COLLATE:  en_US.UTF-8
  CTYPE:    en_US.UTF-8
  MESSAGES: en_US.UTF-8
  MONETARY: de_CH.UTF-8
  NUMERIC:  de_CH.UTF-8
  TIME:     de_CH.UTF-8
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

creating directory /tmp/bbb ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    pg_ctl -D /tmp/bbb -l logfile start

We’ll not start that one but will just run pg_upgrade in check mode from the new binaries:

postgres@pgbox:/home/postgres/ [PG10] pg_upgrade --version
pg_upgrade (PostgreSQL) 10.0 dbi services build
postgres@pgbox:/home/postgres/ [PG10] export PGDATAOLD=/tmp/aaa
postgres@pgbox:/home/postgres/ [PG10] export PGDATANEW=/tmp/bbb
postgres@pgbox:/home/postgres/ [PG10] export PGBINOLD=/u01/app/postgres/product/96/db_2/bin/
postgres@pgbox:/home/postgres/ [PG10] export PGBINNEW=/u01/app/postgres/product/10/db_0/bin/
postgres@pgbox:/home/postgres/ [PG10] pg_upgrade -c

*failure*
Consult the last few lines of "pg_upgrade_server.log" for
...

… and here we go. From the log:

postgres@pgbox:/home/postgres/ [PG10] cat pg_upgrade_server.log

-----------------------------------------------------------------
  pg_upgrade run on Tue Dec 12 21:23:43 2017
-----------------------------------------------------------------

command: "/u01/app/postgres/product/96/db_2/bin/pg_ctl" -w -l "pg_upgrade_server.log" -D "/tmp/aaa" -o "-p 50432 -c autovacuum=off -c autovacuum_freeze_max_age=2000000000  -c listen_addresses='' -c unix_socket_permissions=0700" start >> "pg_upgrade_server.log" 2>&1
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....FATAL:  lock file "postmaster.pid" already exists
HINT:  Is another postmaster (PID 2194) running in data directory "/tmp/aaa"?
 stopped waiting
pg_ctl: could not start server
Examine the log output.

So, @Bruce: Something to improve :)
Again: It was a pleasure to have you there and I hope we’ll meet again at one of the conferences in 2018.

 

This article Does pg_upgrade in check mode raise a failure when the old cluster is running? appeared first on Blog dbi services.

How we build our customized PostgreSQL Docker image


Docker is becoming more and more popular these days and a lot of companies are starting to really use it. On one project we decided to build our own customized Docker image instead of using the official PostgreSQL one. The main reason for that is that we wanted to compile from source so that we only get what is really required. Why have PostgreSQL compiled with Tcl support when nobody will ever use it? Here is how we did it …

To dig in right away, this is the simplified Dockerfile:

FROM debian

# make the "en_US.UTF-8" locale so postgres will be utf-8 enabled by default
ENV LANG en_US.utf8
ENV PG_MAJOR 10
ENV PG_VERSION 10.1
ENV PG_SHA256 3ccb4e25fe7a7ea6308dea103cac202963e6b746697366d72ec2900449a5e713
ENV PGDATA /u02/pgdata
ENV PGDATABASE="" \
    PGUSERNAME="" \
    PGPASSWORD=""

COPY docker-entrypoint.sh /

RUN set -ex \
        \
        && apt-get update && apt-get install -y \
           ca-certificates \
           curl \
           procps \
           sysstat \
           libldap2-dev \
           libpython-dev \
           libreadline-dev \
           libssl-dev \
           bison \
           flex \
           libghc-zlib-dev \
           libcrypto++-dev \
           libxml2-dev \
           libxslt1-dev \
           bzip2 \
           make \
           gcc \
           unzip \
           python \
           locales \
        \
        && rm -rf /var/lib/apt/lists/* \
        && localedef -i en_US -c -f UTF-8 en_US.UTF-8 \
        && mkdir /u01/ \
        \
        && groupadd -r postgres --gid=999 \
        && useradd -m -r -g postgres --uid=999 postgres \
        && chown postgres:postgres /u01/ \
        && mkdir -p "$PGDATA" \
        && chown -R postgres:postgres "$PGDATA" \
        && chmod 700 "$PGDATA" \
        \
        && curl -o /home/postgres/postgresql.tar.bz2 "https://ftp.postgresql.org/pub/source/v$PG_VERSION/postgresql-$PG_VERSION.tar.bz2" \
        && echo "$PG_SHA256 /home/postgres/postgresql.tar.bz2" | sha256sum -c - \
        && mkdir -p /home/postgres/src \
        && chown -R postgres:postgres /home/postgres \
        && su postgres -c "tar \
                --extract \
                --file /home/postgres/postgresql.tar.bz2 \
                --directory /home/postgres/src \
                --strip-components 1" \
        && rm /home/postgres/postgresql.tar.bz2 \
        \
        && cd /home/postgres/src \
        && su postgres -c "./configure \
                --enable-integer-datetimes \
                --enable-thread-safety \
                --with-pgport=5432 \
                --prefix=/u01/app/postgres/product/$PG_VERSION \
                --with-ldap \
                --with-python \
                --with-openssl \
                --with-libxml \
                --with-libxslt" \
        && su postgres -c "make -j 4 all" \
        && su postgres -c "make install" \
        && su postgres -c "make -C contrib install" \
        && rm -rf /home/postgres/src \
        \
        && apt-get update && apt-get purge --auto-remove -y \
           libldap2-dev \
           libpython-dev \
           libreadline-dev \
           libssl-dev \
           libghc-zlib-dev \
           libcrypto++-dev \
           libxml2-dev \
           libxslt1-dev \
           bzip2 \
           gcc \
           make \
           unzip \
        && apt-get install -y libxml2 \
        && rm -rf /var/lib/apt/lists/*

ENV LANG en_US.utf8
USER postgres
EXPOSE 5432
ENTRYPOINT ["/docker-entrypoint.sh"]

We based the image on the latest Debian image, that is line 1. The following lines define the PostgreSQL version we will use and some environment variables we will use later. What follows is basically installing all the packages required for building PostgreSQL from source, adding the operating system user and group, preparing the directories, fetching the PostgreSQL source code, then configure, make and make install. Pretty much straightforward. Finally, to shrink the image, we remove all the packages that are no longer required after PostgreSQL has been compiled and installed.
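
Building the image is then nothing special; a sketch (the image name and tag are arbitrary choices, and docker-entrypoint.sh must sit next to the Dockerfile):

# build the image from the directory containing the Dockerfile and docker-entrypoint.sh
docker build -t custom-postgres:10.1 .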

The final setup of the PostgreSQL instance happens in the docker-entrypoint.sh script which is referenced at the very end of the Dockerfile:

#!/bin/bash

# these are the environment variables which need to be set
PGDATA=${PGDATA}/${PG_MAJOR}
PGHOME="/u01/app/postgres/product/${PG_VERSION}"
PGAUTOCONF=${PGDATA}/postgresql.auto.conf
PGHBACONF=${PGDATA}/pg_hba.conf
PGDATABASENAME=${PGDATABASE}
PGUSERNAME=${PGUSERNAME}
PGPASSWD=${PGPASSWORD}

# create the database and the user
_pg_create_database_and_user()
{
    ${PGHOME}/bin/psql -c "create user ${PGUSERNAME} with login password '${PGPASSWD}'" postgres
    ${PGHOME}/bin/psql -c "create database ${PGDATABASENAME} with owner = ${PGUSERNAME}" postgres
}

# start the PostgreSQL instance
_pg_prestart()
{
    ${PGHOME}/bin/pg_ctl -D ${PGDATA} -w start
}

# start postgres and do not disconnect
# required for docker
_pg_start()
{
    ${PGHOME}/bin/postgres "-D" "${PGDATA}"
}

# stop the PostgreSQL instance
_pg_stop()
{
    ${PGHOME}/bin/pg_ctl -D ${PGDATA} stop -m fast
}

# initdb a new cluster
_pg_initdb()
{
    ${PGHOME}/bin/initdb -D ${PGDATA} --data-checksums
}


# adjust the postgresql parameters
_pg_adjust_config() {
    # PostgreSQL parameters
    echo "shared_buffers='128MB'" >> ${PGAUTOCONF}
    echo "effective_cache_size='128MB'" >> ${PGAUTOCONF}
    echo "listen_addresses = '*'" >> ${PGAUTOCONF}
    echo "logging_collector = 'on'" >> ${PGAUTOCONF}
    echo "log_truncate_on_rotation = 'on'" >> ${PGAUTOCONF}
    echo "log_filename = 'postgresql-%a.log'" >> ${PGAUTOCONF}
    echo "log_rotation_age = '1440'" >> ${PGAUTOCONF}
    echo "log_line_prefix = '%m - %l - %p - %h - %u@%d '" >> ${PGAUTOCONF}
    echo "log_directory = 'pg_log'" >> ${PGAUTOCONF}
    echo "log_min_messages = 'WARNING'" >> ${PGAUTOCONF}
    echo "log_autovacuum_min_duration = '60s'" >> ${PGAUTOCONF}
    echo "log_min_error_statement = 'NOTICE'" >> ${PGAUTOCONF}
    echo "log_min_duration_statement = '30s'" >> ${PGAUTOCONF}
    echo "log_checkpoints = 'on'" >> ${PGAUTOCONF}
    echo "log_statement = 'none'" >> ${PGAUTOCONF}
    echo "log_lock_waits = 'on'" >> ${PGAUTOCONF}
    echo "log_temp_files = '0'" >> ${PGAUTOCONF}
    echo "log_timezone = 'Europe/Zurich'" >> ${PGAUTOCONF}
    echo "log_connections=on" >> ${PGAUTOCONF}
    echo "log_disconnections=on" >> ${PGAUTOCONF}
    echo "log_duration=off" >> ${PGAUTOCONF}
    echo "client_min_messages = 'WARNING'" >> ${PGAUTOCONF}
    echo "wal_level = 'replica'" >> ${PGAUTOCONF}
    echo "hot_standby_feedback = 'on'" >> ${PGAUTOCONF}
    echo "max_wal_senders = '10'" >> ${PGAUTOCONF}
    echo "cluster_name = '${PGDATABASENAME}'" >> ${PGAUTOCONF}
    echo "max_replication_slots = '10'" >> ${PGAUTOCONF}
    echo "work_mem=8MB" >> ${PGAUTOCONF}
    echo "maintenance_work_mem=64MB" >> ${PGAUTOCONF}
    echo "wal_compression=on" >> ${PGAUTOCONF}
    echo "max_wal_senders=20" >> ${PGAUTOCONF}
    echo "shared_preload_libraries='pg_stat_statements'" >> ${PGAUTOCONF}
    echo "autovacuum_max_workers=6" >> ${PGAUTOCONF}
    echo "autovacuum_vacuum_scale_factor=0.1" >> ${PGAUTOCONF}
    echo "autovacuum_vacuum_threshold=50" >> ${PGAUTOCONF}
    # Authentication settings in pg_hba.conf
    echo "host    all             all             0.0.0.0/0            md5" >> ${PGHBACONF}
}

# initialize and start a new cluster
_pg_init_and_start()
{
    # initialize a new cluster
    _pg_initdb
    # set params and access permissions
    _pg_adjust_config
    # start the new cluster
    _pg_prestart
    # set username and password
    _pg_create_database_and_user
}

# check if $PGDATA exists
if [ -e ${PGDATA} ]; then
    # when $PGDATA exists we need to check if there are files
    # because when there are files we do not want to initdb
    if [ -e "${PGDATA}/base" ]; then
        # when there is the base directory this
        # probably is a valid PostgreSQL cluster
        # so we just start it
        _pg_prestart
    else
        # when there is no base directory then we
        # should be able to initialize a new cluster
        # and then start it
        _pg_init_and_start
    fi
else
    # initialze and start the new cluster
    _pg_init_and_start
    # create PGDATA
    mkdir -p ${PGDATA}
    # create the log directory
    mkdir -p ${PGDATA}/pg_log
fi
# restart and do not disconnect from the postgres daemon
_pg_stop
_pg_start

The important point here is: PGDATA is a persistent volume that is linked into the Docker container. When the container comes up we need to check if something that looks like a PostgreSQL data directory is already there. If yes, then we just start the instance with what is there. If nothing is there we create a new instance. Remember: This is just a template and you might need to do more checks in your case. The same is true for what we add to pg_hba.conf here: This is nothing you should do on real systems but can be handy for testing.
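
A possible way to run the image then looks like this (a sketch only: the container name, volume name, port mapping and environment values are assumptions and not part of our setup):

# run the container with a named volume mounted at /u02/pgdata so the cluster
# survives container restarts; the entrypoint runs initdb only on the first start
docker run -d --name pg10 \
  -e PGDATABASE=mydb -e PGUSERNAME=myuser -e PGPASSWORD=secret \
  -v pgdata:/u02/pgdata \
  -p 5432:5432 \
  custom-postgres:10.1

On the first start the entrypoint initializes a new cluster and creates the database and user from the environment variables; on any subsequent start it simply starts the existing cluster it finds in the volume.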

Hope this helps …

 

This article How we build our customized PostgreSQL Docker image appeared first on Blog dbi services.

Backup and Restore PostgreSQL with PgBackRest I


Many tools can be used to back up PostgreSQL databases. In this blog I will talk about PgBackRest, which is a simple tool that can be used to back up and restore a PostgreSQL database. Full, differential, and incremental backups are supported.
In this first blog I will present a basic configuration of PgBackRest. Our configuration is composed of only one cluster, and PgBackRest is installed on the server hosting the database. The goal is to explain a first use of PgBackRest.
Below is our configuration:
Server with Oracle Linux 7
PostgreSQL 10.1
PgBackRest 1.28
We assume that the Linux box and PostgreSQL 10.1 are already installed. So let's install PgBackRest.

[root@pgserver ~]# yum search pgbackrest
Loaded plugins: langpacks, ulninfo
=========================== N/S matched: pgbackrest ============================
pgbackrest.noarch : Reliable PostgreSQL Backup & Restore
pgbackrest.x86_64 : Reliable PostgreSQL Backup & Restore
Name and summary matches only, use "search all" for everything

And then we can install PgBackRest:

[root@pgserver ~]# yum install pgbackrest.x86_64

Afterwards we can check the installation using the pgbackrest command:

[postgres@pgserver ~]$ /usr/bin/pgbackrest
pgBackRest 1.28 - General help
Usage:
pgbackrest [options] [command]

Commands:
archive-get Get a WAL segment from the archive.
archive-push Push a WAL segment to the archive.
backup Backup a database cluster.
check Check the configuration.
expire Expire backups that exceed retention.
help Get help.
info Retrieve information about backups.
restore Restore a database cluster.
stanza-create Create the required stanza data.
stanza-delete Delete a stanza.
stanza-upgrade Upgrade a stanza.
start Allow pgBackRest processes to run.
stop Stop pgBackRest processes from running.
version Get version.
Use 'pgbackrest help [command]' for more information.

The configuration of PgBackRest is very easy: it consists of a pgbackrest.conf configuration file that must be edited. In my case the file is located in /etc. As mentioned, we will use a very basic configuration file.
Below are the contents of my configuration file:

[root@pgserver etc]# cat pgbackrest.conf
[global]
repo-path=/var/lib/pgbackrest

[clustpgserver]
db-path=/var/lib/pgsql/10/data
retention-full=2
[root@pgserver etc]#

In the file above:
• repo-path is where backups will be stored,
• clustpgserver is the name of my cluster stanza (you are free to choose the name). A stanza is the configuration for a PostgreSQL database cluster that defines where it is located, how it will be backed up, archiving options, etc.
• db-path is the path of my database files,
• retention-full: keep 2 full backups.
A complete list of options can be found here.
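
For reference, a slightly extended configuration sketch with a few more commonly used options (option names as documented for pgBackRest; the values are examples only, not recommendations):

[global]
repo-path=/var/lib/pgbackrest
log-level-file=detail
process-max=2
start-fast=y

[clustpgserver]
db-path=/var/lib/pgsql/10/data
retention-full=2
retention-diff=2
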
Once the configuration file is done, we can create the stanza with the stanza-create command. Note that my PostgreSQL cluster is using port 5435.

[postgres@pgserver ~]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-port=5435 stanza-create
2018-02-08 14:01:49.293 P00 INFO: stanza-create command begin 1.28: --db1-path=/var/lib/pgsql/10/data --db1-port=5435 --log-level-console=info --repo-path=/var/lib/pgbackrest --stanza=clustpgserver
2018-02-08 14:01:50.707 P00 INFO: stanza-create command end: completed successfully
[postgres@pgserver ~]$

After creating the stanza, we can verify that the configuration is fine using the check command

[postgres@pgserver ~]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-port=5435 check
2018-02-08 14:03:42.095 P00 INFO: check command begin 1.28: --db1-path=/var/lib/pgsql/10/data --db1-port=5435 --log-level-console=info --repo-path=/var/lib/pgbackrest --stanza=clustpgserver
2018-02-08 14:03:48.805 P00 INFO: WAL segment 00000001000000000000000C successfully stored in the archive at '/var/lib/pgbackrest/archive/clustpgserver/10-1/0000000100000000/00000001000000000000000C-c387b901a257bac304f27865478fd9f768de83d6.gz'
2018-02-08 14:03:48.808 P00 INFO: check command end: completed successfully
[postgres@pgserver ~]$

Since we have not yet taken any backup with PgBackRest, the info command returns an error

[postgres@pgserver ~]$ pgbackrest --stanza=clustpgserver --log-level-console=info info
stanza: clustpgserver
status: error (no valid backups)
db (current)
wal archive min/max (10-1): 00000001000000000000000C / 00000001000000000000000C
[postgres@pgserver ~]$

Now let’s take a backup

[postgres@pgserver ~]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-port=5435 backup
2018-02-08 14:06:52.706 P00 INFO: backup command begin 1.28: --db1-path=/var/lib/pgsql/10/data --db1-port=5435 --log-level-console=info --repo-path=/var/lib/pgbackrest --retention-full=2 --stanza=clustpgserver
WARN: no prior backup exists, incr backup has been changed to full
2018-02-08 14:06:54.734 P00 INFO: execute non-exclusive pg_start_backup() with label "pgBackRest backup started at 2018-02-08 14:06:53": backup begins after the next regular checkpoint completes
2018-02-08 14:06:55.159 P00 INFO: backup start archive = 00000001000000000000000E, lsn = 0/E000060
2018-02-08 14:07:09.867 P01 INFO: backup file /var/lib/pgsql/10/data/base/13805/1255 (592KB, 2%) checksum 61f284092cabf44a30d1442ef6dd075b2e346b7f


2018-02-08 14:08:34.709 P00 INFO: expire command begin 1.28: --log-level-console=info --repo-path=/var/lib/pgbackrest --retention-archive=2 --retention-full=2 --stanza=clustpgserver
2018-02-08 14:08:34.895 P00 INFO: full backup total < 2 - using oldest full backup for 10-1 archive retention
2018-02-08 14:08:34.932 P00 INFO: expire command end: completed successfully
[postgres@pgserver ~]$

We can see that by default PgBackRest will try to do an incremental backup. But as there is no full backup yet, a full backup is done. Once the full backup is done, all subsequent backups will be incremental unless we specify the backup type.

[postgres@pgserver ~]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-port=5435 backup
2018-02-08 14:26:25.590 P00 INFO: backup command begin 1.28: --db1-path=/var/lib/pgsql/10/data --db1-port=5435 --log-level-console=info --repo-path=/var/lib/pgbackrest --retention-full=2 --stanza=clustpgserver
2018-02-08 14:26:29.314 P00 INFO: last backup label = 20180208-140653F, version = 1.28
2018-02-08 14:26:30.135 P00 INFO: execute non-exclusive pg_start_backup() with label "pgBackRest backup started at 2018-02-08 14:26:26": backup begins after the next regular checkpoint completes
...
2018-02-08 14:27:01.408 P00 INFO: expire command begin 1.28: --log-level-console=info --repo-path=/var/lib/pgbackrest --retention-archive=2 --retention-full=2 --stanza=clustpgserver
2018-02-08 14:27:01.558 P00 INFO: full backup total < 2 - using oldest full backup for 10-1 archive retention
2018-02-08 14:27:01.589 P00 INFO: expire command end: completed successfully
[postgres@pgserver ~]$

If we want to perform another full backup we can specify the option --type=full

[postgres@pgserver ~]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-port=5435 --type=full backup
2018-02-08 14:30:05.961 P00 INFO: backup command begin 1.28: --db1-path=/var/lib/pgsql/10/data --db1-port=5435 --log-level-console=info --repo-path=/var/lib/pgbackrest --retention-full=2 --stanza=clustpgserver --type=full
2018-02-08 14:30:08.472 P00 INFO: execute non-exclusive pg_start_backup() with label "pgBackRest backup started at 2018-02-08 14:30:06": backup begins after the next regular checkpoint completes
2018-02-08 14:30:08.993 P00 INFO: backup start archive = 000000010000000000000012, lsn = 0/12000028
….
….

To get information about our backups
[postgres@pgserver ~]$ pgbackrest --stanza=clustpgserver info
stanza: clustpgserver
status: ok
db (current)
wal archive min/max (10-1): 00000001000000000000000E / 000000010000000000000012
full backup: 20180208-140653F
timestamp start/stop: 2018-02-08 14:06:53 / 2018-02-08 14:08:19
wal start/stop: 00000001000000000000000E / 00000001000000000000000E
database size: 23.2MB, backup size: 23.2MB
repository size: 2.7MB, repository backup size: 2.7MB
incr backup: 20180208-140653F_20180208-142626I
timestamp start/stop: 2018-02-08 14:26:26 / 2018-02-08 14:26:52
wal start/stop: 000000010000000000000010 / 000000010000000000000010
database size: 23.2MB, backup size: 8.2KB
repository size: 2.7MB, repository backup size: 472B
backup reference list: 20180208-140653F
full backup: 20180208-143006F
timestamp start/stop: 2018-02-08 14:30:06 / 2018-02-08 14:31:30
wal start/stop: 000000010000000000000012 / 000000010000000000000012
database size: 23.2MB, backup size: 23.2MB
repository size: 2.7MB, repository backup size: 2.7MB
[postgres@pgserver ~]$

Now that we have seen how to perform a backup with PgBackRest, let's see how to restore.
First let's identify the directory of our database files:

[postgres@pgserver ~]$ psql
psql (10.1)
Type "help" for help.
postgres=# show data_directory ;
data_directory
------------------------
/var/lib/pgsql/10/data
(1 row)
postgres=#

And let’s remove all files in the directory

[postgres@pgserver data]$ pwd
/var/lib/pgsql/10/data
[postgres@pgserver data]$ ls
base pg_dynshmem pg_notify pg_stat_tmp pg_wal postmaster.pid
current_logfiles pg_hba.conf pg_replslot pg_subtrans pg_xact
global pg_ident.conf pg_serial pg_tblspc postgresql.auto.conf
log pg_logical pg_snapshots pg_twophase postgresql.conf
pg_commit_ts pg_multixact pg_stat PG_VERSION postmaster.opts
[postgres@pgserver data]$ rm -rf *
[postgres@pgserver data]$

Now if we try to connect, of course we will get errors

[postgres@pgserver data]$ psql
psql: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5435"?
[postgres@pgserver data]$

So let's restore with PgBackRest using the restore command

[postgres@pgserver ~]$ pgbackrest --stanza=clustpgserver --log-level-console=info restore
2018-02-08 14:52:01.845 P00 INFO: restore command begin 1.28: --db1-path=/var/lib/pgsql/10/data --log-level-console=info --repo-path=/var/lib/pgbackrest --stanza=clustpgserver
2018-02-08 14:52:03.490 P00 INFO: restore backup set 20180208-143006F
2018-02-08 14:52:21.904 P01 INFO: restore file /var/lib/pgsql/10/data/base/13805/1255 (592KB, 2%) checksum 61f284092cabf44a30d1442ef6dd075b2e346b7f
….
….
2018-02-08 14:53:21.186 P00 INFO: write /var/lib/pgsql/10/data/recovery.conf
2018-02-08 14:53:23.948 P00 INFO: restore global/pg_control (performed last to ensure aborted restores cannot be started)
2018-02-08 14:53:28.258 P00 INFO: restore command end: completed successfully
[postgres@pgserver ~]$

At the end of the restore, a recovery.conf file is created in the data directory

[postgres@pgserver data]$ cat recovery.conf
restore_command = '/usr/bin/pgbackrest --log-level-console=info --stanza=clustpgserver archive-get %f "%p"'

Now we can restart the PostgreSQL cluster

[postgres@pgserver data]$ pg_ctl start
waiting for server to start....2018-02-08 14:57:06.519 CET [4742] LOG: listening on IPv4 address "0.0.0.0", port 5435
2018-02-08 14:57:06.522 CET [4742] LOG: listening on IPv6 address "::", port 5435
2018-02-08 14:57:06.533 CET [4742] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5435"
2018-02-08 14:57:06.551 CET [4742] LOG: listening on Unix socket "/tmp/.s.PGSQL.5435"
2018-02-08 14:57:06.645 CET [4742] LOG: redirecting log output to logging collector process
2018-02-08 14:57:06.645 CET [4742] HINT: Future log output will appear in directory "log".
...... done
server started

And then connect

[postgres@pgserver data]$ psql
psql (10.1)
Type "help" for help.
postgres=#

Conclusion
In this blog we showed, with a simple configuration, how to perform backups using PgBackRest. This basic configuration can help for a first use of PgBackRest. In future articles we will go further into more advanced uses of this tool.

 

This article Backup and Restore PostgreSQL with PgBackRest I appeared first on Blog dbi services.

Backup and Restore PostgreSQL with PgBackRest II


In a previous blog I showed a basic use of PgBackRest, which is a tool to back up and restore PostgreSQL databases. In this blog I am going to talk about some useful features of this tool. Through practical examples we will see some tasks we can do with it. Of course the official documentation remains the best source of knowledge.

Encryption
Nowadays encryption of backups is very critical and is mandatory for many companies. PgBackRest allows us to encrypt the repository where backups are stored. A passphrase is used to encrypt/decrypt the files of the repository. As you may already know, it is recommended to use a strong passphrase. In the following demonstration we use openssl to generate a passphrase.

[postgres@pgserver ~]$ openssl rand -base64 48
FhXg7oW2pZb9UICZ4iYZPn3X4I6fF0ni7IL6QjaB1IL8qz4LIrP+GW+XqCZqIi3w
[postgres@pgserver ~]$

Once the passphrase is generated, we can update the PgBackRest configuration file with two options: repo-cipher-pass and repo-cipher-type

[postgres@pgserver clustpgserver]$ cat /etc/pgbackrest.conf
[global]
repo-path=/var/lib/pgbackrest
repo-cipher-pass=FhXg7oW2pZb9UICZ4iYZPn3X4I6fF0ni7IL6QjaB1IL8qz4LIrP+GW+XqCZqIi3w
repo-cipher-type=aes-256-cbc

[clustpgserver]
db-path=/var/lib/pgsql/10/data
retention-full=2

The next step is to create the stanza

[postgres@pgserver ~]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-port=5435 stanza-create
2018-02-13 13:54:50.447 P00 INFO: stanza-create command begin 1.28: --db1-path=/var/lib/pgsql/10/data --db1-port=5435 --log-level-console=info --repo-cipher-pass= --repo-cipher-type=aes-256-cbc --repo-path=/var/lib/pgbackrest --stanza=clustpgserver
2018-02-13 13:55:04.520 P00 INFO: stanza-create command end: completed successfully
[postgres@pgserver ~]$

As we can see, the system automatically detects that the repository is encrypted and rewrites the command to include the --repo-cipher-pass and --repo-cipher-type options. After the creation of the stanza we can check its status:

[postgres@pgserver ~]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-port=5435 check
2018-02-13 13:56:08.999 P00 INFO: check command begin 1.28: --db1-path=/var/lib/pgsql/10/data --db1-port=5435 --log-level-console=info --repo-cipher-pass= --repo-cipher-type=aes-256-cbc --repo-path=/var/lib/pgbackrest --stanza=clustpgserver
2018-02-13 13:57:08.026 P00 INFO: WAL segment 00000002000000000000004C successfully stored in the archive at '/var/lib/pgbackrest/archive/clustpgserver/10-1/0000000200000000/00000002000000000000004C-f5ced60cd351d74a91c9ce2e913b761144165e28.gz'
2018-02-13 13:57:08.030 P00 INFO: check command end: completed successfully

Everything seems fine, so let's run a backup. Note that the output is truncated

[postgres@pgserver ~]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-port=5435 backup
2018-02-13 14:01:40.012 P00 INFO: backup command begin 1.28: --db1-path=/var/lib/pgsql/10/data --db1-port=5435 --log-level-console=info --repo-cipher-pass= --repo-cipher-type=aes-256-cbc --repo-path=/var/lib/pgbackrest --retention-full=2 --stanza=clustpgserver
WARN: no prior backup exists, incr backup has been changed to full
2018-02-13 14:01:54.118 P00 INFO: execute non-exclusive pg_start_backup() with label "pgBackRest backup started at 2018-02-13 14:01:52": backup begins after the next regular checkpoint completes
...
type=aes-256-cbc --repo-path=/var/lib/pgbackrest --retention-archive=2 --retention-full=2 --stanza=clustpgserver
2018-02-13 14:35:08.281 P00 INFO: full backup total < 2 - using oldest full backup for 10-1 archive retention
2018-02-13 14:35:08.801 P00 INFO: expire command end: completed successfully
[postgres@pgserver ~]$

In a non-encrypted repository the backup.info file can be read. Now, with encryption, if we try to read the backup.info file in the repository, we cannot.

[postgres@pgserver clustpgserver]$ less /var/lib/pgbackrest/backup/clustpgserver/backup.info
"/var/lib/pgbackrest/backup/clustpgserver/backup.info" may be a binary file. See it anyway?

And using the strings command, we can see that the file is encrypted.

[postgres@pgserver clustpgserver]$ strings /var/lib/pgbackrest/backup/clustpgserver/backup.info
Salted__Fx
.;Ru
cz4@
do:t
\pi3"E
VUSO
}a.R*
Wx5M
,?,W
3CXWB
[postgres@pgserver clustpgserver]$

From now on, backups cannot be used unless the passphrase is provided.

Restore in another location
PgBackRest allows restoring to another location. This can be useful if we want to duplicate our cluster on the same server or onto another server. In the following demonstration, let's duplicate on the same server.
The data directory of the source cluster is /var/lib/pgsql/10/data

postgres=# show data_directory;
data_directory
------------------------
/var/lib/pgsql/10/data
(1 row)
postgres=#

To duplicate to a new data directory, /u01/devdata for example, the --db-path option is used:

[postgres@pgserver log]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-path=/u01/devdata restore


2018-02-14 09:40:05.755 P01 INFO: restore file /u01/devdata/base/1/13657 (0B, 100%)
2018-02-14 09:40:05.773 P01 INFO: restore file /u01/devdata/base/1/13652 (0B, 100%)
2018-02-14 09:40:05.811 P01 INFO: restore file /u01/devdata/base/1/13647 (0B, 100%)
2018-02-14 09:40:05.983 P01 INFO: restore file /u01/devdata/base/1/13642 (0B, 100%)
2018-02-14 09:40:06.067 P00 INFO: write /u01/devdata/recovery.conf
2018-02-14 09:40:14.403 P00 INFO: restore global/pg_control (performed last to ensure aborted restores cannot be started)
2018-02-14 09:40:30.187 P00 INFO: restore command end: completed successfully

After the duplicate, don't forget to change the port (as we are on the same server) and then start your new cluster:
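
A sketch of these two steps (appending the port to postgresql.auto.conf is just one way to do it; 5436 matches the port used further below):

# give the duplicated cluster its own port and start it
echo "port=5436" >> /u01/devdata/postgresql.auto.conf
pg_ctl -D /u01/devdata start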

postgres=# show data_directory ;
data_directory
----------------
/u01/devdata
(1 row)
postgres=#

Restore specific databases
With PgBackRest, we can restore specific user databases. Note that built-in databases (template0, template1 and postgres) are always restored.
Let's show an example. In our source cluster we currently have two user databases, test and sandbox.

sandbox=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
sandbox | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
test | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
(5 rows)

In sandbox we have a table mytab with 2 rows

sandbox=# \c sandbox
You are now connected to database "sandbox" as user "postgres".
sandbox=# table mytab;
id
----
1
2
(2 rows)

Now let's restore the cluster with only the test database; the --db-include option will be used.

[postgres@pgserver log]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-path=/u01/devdata --db-include=test restore
2018-02-14 10:11:00.948 P00 INFO: restore command begin 1.28: --db-include=test=1 --db1-path=/u01/devdata --log-level-console=info --repo-cipher-pass= --repo-cipher-type=aes-256-cbc --repo-path=/var/lib/pgbackrest --stanza=clustpgserver
2018-02-14 10:11:05.137 P00 INFO: restore backup set 20180214-095439F_20180214-100446I
2018-02-14 10:11:25.110 P00 INFO: remap $PGDATA directory to /u01/devdata
...

After the restore completes, let's start the new cluster and check which databases are present.

[postgres@pgserver devdata]$ psql -p 5436
psql (10.1)
Type "help" for help.
postgres=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
sandbox | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
test | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
(5 rows)

What?! The sandbox database is still present despite the use of the --db-include=test option. But if we try to connect to the sandbox database, we get an error.

postgres=# \c sandbox
FATAL: relation mapping file "base/24581/pg_filenode.map" contains invalid data
Previous connection kept
postgres=#

And if we compare, at the OS level, the size of the database files in the source cluster and in the target:

[postgres@pgserver log]$ du -sh /var/lib/pgsql/10/data/base/24581
7.8M /var/lib/pgsql/10/data/base/24581
[postgres@pgserver log]$ du -sh /u01/devdata/base/24581
16K /u01/devdata/base/24581
[postgres@pgserver log]$

We can see that in the target cluster sandbox uses less disk space after the selective restore than it would have if the entire database had been restored. To finish the selective restore, we have to drop the sandbox database manually. Indeed, PgBackRest cannot drop the database automatically because the cluster is not accessible until the recovery process finishes.

postgres=# drop database sandbox;
DROP DATABASE
postgres=#

Automatic cleanup of expired backups
Another nice feature of PgBackRest is that expired backups are automatically removed.
If we check our pgbackrest.conf file, we see that retention-full is set to 2. This means that 2 full backups will be kept. So if we take a third full backup, the first full backup and all corresponding incremental and differential backups will be expired and removed

[postgres@pgserver log]$ cat /etc/pgbackrest.conf
[global]
repo-path=/var/lib/pgbackrest
repo-cipher-pass=FhXg7oW2pZb9UICZ4iYZPn3X4I6fF0ni7IL6QjaB1IL8qz4LIrP+GW+XqCZqIi3w
repo-cipher-type=aes-256-cbc

[clustpgserver]
db-path=/var/lib/pgsql/10/data
retention-full=2
[postgres@pgserver log]$

Let's do a quick demonstration. We currently have 2 full backups

[postgres@pgserver log]$ pgbackrest --stanza=clustpgserver info
stanza: clustpgserver
status: ok
db (current)
wal archive min/max (10-1): 00000002000000000000004E / 000000020000000000000056
full backup: 20180213-140152F
timestamp start/stop: 2018-02-13 14:01:52 / 2018-02-13 14:32:00
wal start/stop: 00000002000000000000004E / 00000002000000000000004E
database size: 577MB, backup size: 577MB
repository size: 28.8MB, repository backup size: 28.8MB
incr backup: 20180213-140152F_20180213-152509I
timestamp start/stop: 2018-02-14 09:31:03 / 2018-02-14 09:33:17
wal start/stop: 000000020000000000000052 / 000000020000000000000052
database size: 30.7MB, backup size: 285.3KB
repository size: 3.6MB, repository backup size: 24.3KB
backup reference list: 20180213-140152F
full backup: 20180214-095439F
timestamp start/stop: 2018-02-14 09:54:39 / 2018-02-14 09:58:53
wal start/stop: 000000020000000000000054 / 000000020000000000000054
database size: 30.7MB, backup size: 30.7MB
repository size: 3.6MB, repository backup size: 3.6MB
incr backup: 20180214-095439F_20180214-100446I
timestamp start/stop: 2018-02-14 10:04:46 / 2018-02-14 10:07:43
wal start/stop: 000000020000000000000056 / 000000020000000000000056
database size: 38.3MB, backup size: 7.6MB
repository size: 4.5MB, repository backup size: 928.5KB
backup reference list: 20180214-095439F
[postgres@pgserver log]$

And we can confirm this with a simple ls in the repository where the backups are stored:

[postgres@pgserver clustpgserver]$ ls -ld *
drwxr-x---. 3 postgres postgres 69 Feb 13 14:32 20180213-140152F
drwxr-x---. 3 postgres postgres 69 Feb 14 09:33 20180213-140152F_20180213-152509I
drwxr-x---. 3 postgres postgres 69 Feb 14 09:59 20180214-095439F
drwxr-x---. 3 postgres postgres 69 Feb 14 10:07 20180214-095439F_20180214-100446I
drwxr-x---. 3 postgres postgres 17 Feb 13 14:33 backup.history
-rw-r-----. 1 postgres postgres 2992 Feb 14 10:08 backup.info
-rw-r-----. 1 postgres postgres 2992 Feb 14 10:08 backup.info.copy
lrwxrwxrwx. 1 postgres postgres 33 Feb 14 10:08 latest -> 20180214-095439F_20180214-100446I
[postgres@pgserver clustpgserver]$ ls -ld
drwxr-x---. 7 postgres postgres 4096 Feb 14 10:08 .

Now let’s do a third full backup

[postgres@pgserver clustpgserver]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-port=5435 --type=full backup
2018-02-14 10:55:52.250 P00 INFO: backup command begin 1.28: --db1-path=/var/lib/pgsql/10/data --db1-port=5435 --log-level-console=info --repo-cipher-pass= --repo-cipher-type=aes-256-cbc --repo-path=/var/lib/pgbackrest --retention-full=2 --stanza=clustpgserver --type=full

2018-02-14 11:19:02.001 P00 INFO: backup command end: completed successfully
2018-02-14 11:19:02.107 P00 INFO: expire command begin 1.28: --log-level-console=info --repo-cipher-pass= --repo-cipher-type=aes-256-cbc --repo-path=/var/lib/pgbackrest --retention-archive=2 --retention-full=2 --stanza=clustpgserver
2018-02-14 11:19:02.928 P00 INFO: expire full backup set: 20180213-140152F, 20180213-140152F_20180213-152509I
2018-02-14 11:22:08.759 P00 INFO: remove expired backup 20180213-140152F_20180213-152509I
2018-02-14 11:22:09.000 P00 INFO: remove expired backup 20180213-140152F

2018-02-14 11:22:49.387 P00 INFO: expire command end: completed successfully
[postgres@pgserver clustpgserver]$

We can see that at the end of the backup the oldest full backup and its dependent incremental backup are expired and removed. We can also confirm this by listing the files in the repository:

[postgres@pgserver clustpgserver]$ ls -ld *
drwxr-x---. 3 postgres postgres 69 Feb 14 09:59 20180214-095439F
drwxr-x---. 3 postgres postgres 69 Feb 14 10:07 20180214-095439F_20180214-100446I
drwxr-x---. 3 postgres postgres 69 Feb 14 11:13 20180214-105603F
drwxr-x---. 3 postgres postgres 17 Feb 13 14:33 backup.history
-rw-r-----. 1 postgres postgres 2320 Feb 14 11:19 backup.info
-rw-r-----. 1 postgres postgres 2320 Feb 14 11:20 backup.info.copy
lrwxrwxrwx. 1 postgres postgres 16 Feb 14 11:14 latest -> 20180214-105603F
[postgres@pgserver clustpgserver]$
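
As a side note, the expire step you see in the backup log above is not tied to the backup command: it can also be triggered on its own, which is handy after lowering the retention settings. A quick sketch, using the same stanza as in the examples above:

[postgres@pgserver clustpgserver]$ pgbackrest --stanza=clustpgserver --log-level-console=info expire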

Point-in-Time Recovery
PgBackRest can also do a point-in-time recovery. Let’s drop the table article in the database test:

test=# table article;
nom
---------
printer
(1 row)
.
test=# select now();
now
-------------------------------
2018-02-14 11:39:28.024378+01
(1 row)
.
test=# drop table article;
DROP TABLE
.
test=# table article;
ERROR: relation "article" does not exist
LINE 1: table article;
^
test=#

And now let’s restore to just before the table was dropped, using 2018-02-14 11:39:28 as the target time.
As we have several backup sets, we have to restore from a backup that was taken before the table was dropped.
If we check our backups, that is the full backup 20180214-105603F.

[postgres@pgserver devdata]$ pgbackrest --stanza=clustpgserver --log-level-console=info info
stanza: clustpgserver
status: ok
db (current)
wal archive min/max (10-1): 000000020000000000000054 / 00000002000000000000005A
full backup: 20180214-095439F
timestamp start/stop: 2018-02-14 09:54:39 / 2018-02-14 09:58:53
wal start/stop: 000000020000000000000054 / 000000020000000000000054
database size: 30.7MB, backup size: 30.7MB
repository size: 3.6MB, repository backup size: 3.6MB
incr backup: 20180214-095439F_20180214-100446I
timestamp start/stop: 2018-02-14 10:04:46 / 2018-02-14 10:07:43
wal start/stop: 000000020000000000000056 / 000000020000000000000056
database size: 38.3MB, backup size: 7.6MB
repository size: 4.5MB, repository backup size: 928.5KB
backup reference list: 20180214-095439F
full backup: 20180214-105603F
timestamp start/stop: 2018-02-14 10:56:03 / 2018-02-14 11:12:26

wal start/stop: 000000020000000000000058 / 000000020000000000000058
database size: 38.3MB, backup size: 38.3MB
repository size: 4.5MB, repository backup size: 4.5MB
incr backup: 20180214-105603F_20180214-121044I
timestamp start/stop: 2018-02-14 12:10:44 / 2018-02-14 12:15:14
wal start/stop: 00000002000000000000005A / 00000002000000000000005A
database size: 38.3MB, backup size: 1.1MB
repository size: 4.5MB, repository backup size: 140.8KB
backup reference list: 20180214-105603F
[postgres@pgserver devdata]$

For the restore we use the option --set, which allows us to specify the backup set we want to restore from. Note also the use of --type=time and --target:

[postgres@pgserver log]$ pgbackrest --stanza=clustpgserver --log-level-console=info --type=time "--target=2018-02-14 11:39:28.024378+01" --db-path=/u01/devdata --set=20180214-105603F restore
2018-02-14 13:36:50.848 P00 INFO: restore command begin 1.28: --db1-path=/u01/devdata --log-level-console=info --repo-cipher-pass= --repo-cipher-type=aes-256-cbc --repo-path=/var/lib/pgbackrest --set=20180214-105603F --stanza=clustpgserver "--target=2018-02-14 11:39:28.024378+01" --type=time
2018-02-14 13:37:03.406 P00 INFO: restore backup set 20180214-105603F
...

At the end of the restore, let’s have a look at the contents of the generated recovery.conf file:

[postgres@pgserver devdata]$ cat recovery.conf
restore_command = '/usr/bin/pgbackrest --db1-path=/u01/devdata --log-level-console=info --stanza=clustpgserver archive-get %f "%p"'
recovery_target_time = '2018-02-14 11:39:28.024378+01'

If we start the new cluster, we can see in the log files that the point-in-time recovery is starting:
2018-02-14 13:54:23.824 CET [10049] LOG: starting point-in-time recovery to 2018-02-14 11:39:28.024378+01
And once the recovery has finished, we can verify that the table article is present:

postgres=# \c test
You are now connected to database "test" as user "postgres".
test=# \d article
Table "public.article"
Column | Type | Collation | Nullable | Default
--------+-----------------------+-----------+----------+---------
nom | character varying(50) | | |
.
test=# table article;
nom
---------
printer
(1 row)
test=#

Conclusion:
In this blog I presented some of PgBackRest’s features, but as already mentioned, there are many, many other options in this wonderful tool. The official documentation gives more information. In future blogs we will explore more advanced configurations.

 

The article Backup and Restore PostgreSQL with PgBackRest II appeared first on Blog dbi services.

New SHA-2 functions showing up in PostgreSQL 11


A recent commit added new SHA-2 functions that will show up in PostgreSQL 11. Until now you could use the md5 function to generate hashes for test data or whatever else you need; this commit adds more functions you can use for that. Let’s see how they work.

When you want to try what follows make sure you are on the development version of PostgreSQL. You can find a little howto here.

For generating test data in PostgreSQL I often use something like this:

postgres@pgbox:/home/postgres/ [PGDEV] psql
psql (11devel)
Type "help" for help.

postgres=# \! cat a.sql
drop table if exists t1;
create table t1 as
  select a.*
       , md5(a::varchar)
    from generate_series (1,1000000) a;
postgres=# \i a.sql
psql:a.sql:1: NOTICE:  table "t1" does not exist, skipping
DROP TABLE
SELECT 1000000
postgres=# select * from t1 limit 5;
 a |               md5                
---+----------------------------------
 1 | c4ca4238a0b923820dcc509a6f75849b
 2 | c81e728d9d4c2f636f067f89cc14862c
 3 | eccbc87e4b5ce2fe28308fd9f2a7baf3
 4 | a87ff679a2f3e71d9181a67b7542122c
 5 | e4da3b7fbbce2345d7772b0674a318d5
(5 rows)

Now we have more functions to choose from:

postgres=# \df *sha*
                                       List of functions
   Schema   |               Name               | Result data type | Argument data types | Type 
------------+----------------------------------+------------------+---------------------+------
 pg_catalog | pg_advisory_lock_shared          | void             | bigint              | func
 pg_catalog | pg_advisory_lock_shared          | void             | integer, integer    | func
 pg_catalog | pg_advisory_unlock_shared        | boolean          | bigint              | func
 pg_catalog | pg_advisory_unlock_shared        | boolean          | integer, integer    | func
 pg_catalog | pg_advisory_xact_lock_shared     | void             | bigint              | func
 pg_catalog | pg_advisory_xact_lock_shared     | void             | integer, integer    | func
 pg_catalog | pg_relation_is_publishable       | boolean          | regclass            | func
 pg_catalog | pg_stat_reset_shared             | void             | text                | func
 pg_catalog | pg_try_advisory_lock_shared      | boolean          | bigint              | func
 pg_catalog | pg_try_advisory_lock_shared      | boolean          | integer, integer    | func
 pg_catalog | pg_try_advisory_xact_lock_shared | boolean          | bigint              | func
 pg_catalog | pg_try_advisory_xact_lock_shared | boolean          | integer, integer    | func
 pg_catalog | sha224                           | bytea            | bytea               | func
 pg_catalog | sha256                           | bytea            | bytea               | func
 pg_catalog | sha384                           | bytea            | bytea               | func
 pg_catalog | sha512                           | bytea            | bytea               | func

Using the same test script as before but with the sha224 function:

postgres=# \! cat a.sql
drop table if exists t1;
create table t1 as
  select a.*
       , sha224(a::text::bytea)
    from generate_series (1,1000000) a;
postgres=# \i a.sql
DROP TABLE
SELECT 1000000
postgres=# select * from t1 limit 5;
 a |                           sha224                           
---+------------------------------------------------------------
 1 | \xe25388fde8290dc286a6164fa2d97e551b53498dcbf7bc378eb1f178
 2 | \x58b2aaa0bfae7acc021b3260e941117b529b2e69de878fd7d45c61a9
 3 | \x4cfc3a1811fe40afa401b25ef7fa0379f1f7c1930a04f8755d678474
 4 | \x271f93f45e9b4067327ed5c8cd30a034730aaace4382803c3e1d6c2f
 5 | \xb51d18b551043c1f145f22dbde6f8531faeaf68c54ed9dd79ce24d17
(5 rows)

You can use the other functions in the same way, of course.
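
If you prefer the familiar hex string over the bytea representation shown above, you can wrap the result in encode(). A small sketch, reusing the generate_series approach from the test script:

postgres=# select a, encode(sha224(a::text::bytea), 'hex') as sha224_hex from generate_series(1,5) a;

This returns the same hashes as in the output above, just as plain text and without the leading \x.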

 

The article New SHA-2 functions showing up in PostgreSQL 11 appeared first on Blog dbi services.

Configuring huge pages for your PostgreSQL instance, RedHat/CentOS version


Almost every PostgreSQL installation I get in touch with is not configured to use huge pages, which is quite a surprise as they can give you a performance boost. Actually it is not the PostgreSQL instance you need to configure but the operating system that has to provide them. PostgreSQL will use huge pages by default when they are configured and will fall back to normal pages otherwise. The parameter which controls this in PostgreSQL is huge_pages, which defaults to “try”, leading to the behavior just described: try to get them, otherwise use normal pages. Lets see how you can do that on RedHat and CentOS. I’ll write another post about how to do the same on Debian based distributions shortly.

What you need to know is that RedHat as well as CentOS come with tuned profiles by default. This means kernel parameters and other settings are managed dynamically through profiles and no longer by adjusting /etc/sysctl.conf (although that still works as well). When you are in a virtualized environment (VirtualBox in my case) you will probably see something like this:

postgres@pgbox:/home/postgres/ [PG10] tuned-adm active
Current active profile: virtual-guest

Virtual guest is maybe not the best profile for a database server as it comes with these settings (especially vm.dirty_ratio and vm.swappiness):

postgres@pgbox:/home/postgres/ [PG10] cat /usr/lib/tuned/virtual-guest/tuned.conf  | egrep -v "^$|^#"
[main]
summary=Optimize for running inside a virtual guest
include=throughput-performance
[sysctl]
vm.dirty_ratio = 30
vm.swappiness = 30

What we do at dbi services is to provide our own profile which adjusts the settings to values better suited for a database server:

postgres@pgbox:/home/postgres/ [PG10]  cat /etc/tuned/dbi-postgres/tuned.conf | egrep -v "^$|^#"
[main]
summary=dbi services tuned profile for PostgreSQL servers
[cpu]
governor=performance
energy_perf_bias=performance
min_perf_pct=100
[disk]
readahead=>4096
[sysctl]
vm.overcommit_memory=2
vm.swappiness=0
vm.dirty_ratio=2
vm.dirty_background_ratio=1

What has all this to do with huge pages, you might think. Well, tuning profiles can also be used to configure them, and for us this is the preferred method because we can do it all in one file. But before we do that lets look at the PostgreSQL instance:

postgres=# select version();
                                                          version                                                           
----------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 10.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16), 64-bit
(1 row)

postgres=# show huge_pages;
 huge_pages 
------------
 try
(1 row)

As said at the beginning of this post the default behavior of PostgreSQL is to use them if available. The question now is: How can you check if you have huge pages configured on the operating system level? The answer is in the virtual /proc/meminfo file:

postgres=# \! cat /proc/meminfo | grep -i huge
AnonHugePages:      6144 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

All “HugePages” statistics report a zero, so this system definitely is not configured to provide huge pages to PostgreSQL. AnonHugePages is for Transparent Huge Pages and it is a common recommendation to disable them for database servers. So we have two tasks to complete:

  • Disable transparent huge pages
  • Configure the system to provide enough huge pages for our PostgreSQL instance

For disabling transparent huge pages we just need to add the following lines to our tuning profile:

postgres@pgbox:/home/postgres/ [PG10] sudo echo "[vm]
> transparent_hugepages=never" >> /etc/tuned/dbi-postgres/tuned.conf

When transparent huge pages are enabled you can see that in the following file:

postgres@pgbox:/home/postgres/ [PG10] cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never

Once we switch the profile to our own profile:

postgres@pgbox:/home/postgres/ [PG10] sudo tuned-adm profile dbi-postgres
postgres@pgbox:/home/postgres/ [PG10] sudo tuned-adm active
Current active profile: dbi-postgres

… you’ll notice that it is disabled from now on:

postgres@pgbox:/home/postgres/ [PG10] cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]

Task one completed. For configuring the operating system to provide huge pages for our PostgreSQL instance we need to know how many huge pages we require. How do we do that? The procedure is documented in the PostgreSQL documentation: basically you start your instance and then check how many pages it would require. In my case, first get the PID of the postmaster process:

postgres@pgbox:/home/postgres/ [PG10] head -1 $PGDATA/postmaster.pid
1640

To get the VmPeak for that process:

postgres@pgbox:/home/postgres/ [PG10] grep ^VmPeak /proc/1640/status
VmPeak:	  344340 kB

As the huge page size is 2MB on my system (which should be default for most systems):

postgres@pgbox:/home/postgres/ [PG10] grep ^Hugepagesize /proc/meminfo
Hugepagesize:       2048 kB

… we will require at least 344340/2048 huge pages for this PostgreSQL instance:

postgres@pgbox:/home/postgres/ [PG10] echo "344340/2048" | bc
168
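
If you have to repeat this for several instances, the steps above are easy to wrap into a small helper script. A sketch, assuming $PGDATA points to a running instance (the script name is made up):

postgres@pgbox:/home/postgres/ [PG10] cat estimate_hugepages.sh
#!/bin/bash
# rough estimate of the number of 2MB huge pages the local instance needs
PMPID=$(head -1 "${PGDATA}/postmaster.pid")
VMPEAK_KB=$(awk '/^VmPeak/ {print $2}' /proc/${PMPID}/status)
HPSIZE_KB=$(awk '/^Hugepagesize/ {print $2}' /proc/meminfo)
# round up to the next full page
echo $(( (VMPEAK_KB + HPSIZE_KB - 1) / HPSIZE_KB ))

For the VmPeak above this prints 169, so configuring 170 as done in the next step leaves a little headroom.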

All we need to do is to add this to our tuning profile in the “[sysctl]” section:

postgres@pgbox:/home/postgres/ [PG10] grep nr_hugepages /etc/tuned/dbi-postgres/tuned.conf 
vm.nr_hugepages=170

Re-set the profile and we’re done:

postgres@pgbox:/home/postgres/ [PG10] sudo tuned-adm profile dbi-postgres
postgres@pgbox:/home/postgres/ [PG10] cat /proc/meminfo | grep -i huge
AnonHugePages:      4096 kB
HugePages_Total:     170
HugePages_Free:      170
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

This confirms that we now have 170 huge pages, all of which are free to be consumed. Now lets configure PostgreSQL to only start when it can get the required amount of huge pages by switching the “huge_pages” parameter to “on” and restarting the instance:

postgres@pgbox:/home/postgres/ [PG10] psql -c "alter system set huge_pages=on" postgres
ALTER SYSTEM
Time: 0.719 ms
postgres@pgbox:/home/postgres/ [PG10] pg_ctl -D $PGDATA restart -m fast
waiting for server to shut down.... done
server stopped
waiting for server to start....2018-02-25 11:21:29.107 CET - 1 - 3170 -  - @ LOG:  listening on IPv4 address "0.0.0.0", port 5441
2018-02-25 11:21:29.107 CET - 2 - 3170 -  - @ LOG:  listening on IPv6 address "::", port 5441
2018-02-25 11:21:29.110 CET - 3 - 3170 -  - @ LOG:  listening on Unix socket "/tmp/.s.PGSQL.5441"
2018-02-25 11:21:29.118 CET - 4 - 3170 -  - @ LOG:  redirecting log output to logging collector process
2018-02-25 11:21:29.118 CET - 5 - 3170 -  - @ HINT:  Future log output will appear in directory "pg_log".
 done
server started

As the instance started all should be fine and we can confirm that by looking at the statistics in /proc/meminfo:

postgres@pgbox:/home/postgres/ [PG10] cat /proc/meminfo | grep -i huge
AnonHugePages:      4096 kB
HugePages_Total:     170
HugePages_Free:      162
HugePages_Rsvd:       64
HugePages_Surp:        0
Hugepagesize:       2048 kB

You might be surprised that not all (actually only 8) huge pages are used right now but this will change as soon as you put some load on the system:

postgres=# create table t1 as select * from generate_series(1,1000000);
SELECT 1000000
postgres=# select count(*) from t1;
  count  
---------
 1000000
(1 row)

postgres=# \! cat /proc/meminfo | grep -i huge
AnonHugePages:      4096 kB
HugePages_Total:     170
HugePages_Free:      153
HugePages_Rsvd:       55
HugePages_Surp:        0
Hugepagesize:       2048 kB
postgres=# 

Hope this helps …

 

The article Configuring huge pages for your PostgreSQL instance, RedHat/CentOS version appeared first on Blog dbi services.

Configuring huge pages for your PostgreSQL instance, Debian version


In the last post we had a look at how you can configure huge pages on RedHat and CentOS systems. For Debian and Debian based systems the procedure is different as Debian does not come with tuned. Lets see how it works there.

Checking the basic system configuration works the same way in Debian as in RedHat based distributions: have a look at the /proc/meminfo file:

postgres@debianpg:/home/postgres/ [PG1] cat /proc/meminfo | grep -i huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

So nothing is configured for huge pages in the default configuration. Using the same procedure as in the last post, this is how you calculate the required huge pages for the PostgreSQL instance:

postgres@debianpg:/home/postgres/ [PG1] head -1 $PGDATA/postmaster.pid
6661
postgres@debianpg:/home/postgres/ [PG1] grep ^VmPeak /proc/6661/status
VmPeak:	  393836 kB
postgres@debianpg:/home/postgres/ [PG1] grep ^Hugepagesize /proc/meminfo
Hugepagesize:       2048 kB
postgres@debianpg:/home/postgres/ [PG1] echo "393836/2048" | bc
192

We’ll need at least 192 pages. Lets add that to /etc/sysctl.conf:

postgres@debianpg:/home/postgres/ [PG1] sudo bash
root@debianpg:/home/postgres$ echo "vm.nr_hugepages=200" >> /etc/sysctl.conf

Notify the system about that change:

root@debianpg:/home/postgres$ sysctl -p
vm.nr_hugepages = 200

… and we have 200 huge pages available:

postgres@debianpg:/home/postgres/ [PG1] cat /proc/meminfo | grep -i huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:     200
HugePages_Free:      200
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
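
If you prefer to keep such settings out of the central /etc/sysctl.conf, a drop-in file under /etc/sysctl.d/ works just as well on Debian. A sketch (the file name is an arbitrary choice):

root@debianpg:/home/postgres$ cat > /etc/sysctl.d/30-postgres-hugepages.conf <<'EOF'
# huge pages for the local PostgreSQL instance
vm.nr_hugepages = 200
EOF
root@debianpg:/home/postgres$ sysctl --system

sysctl --system reloads all configuration files, including those under /etc/sysctl.d/.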

Again, lets force PostgreSQL to use huge pages and restart the instance:

postgres@debianpg:/home/postgres/ [PG1] psql -c "alter system set huge_pages=on" postgres
ALTER SYSTEM
postgres@debianpg:/home/postgres/ [PG1] pg_ctl -D $PGDATA restart -m fast
waiting for server to shut down.... done
server stopped
waiting for server to start....2018-02-25 17:13:59.398 CET [6918] LOG:  listening on IPv6 address "::1", port 5432
2018-02-25 17:13:59.398 CET [6918] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2018-02-25 17:13:59.403 CET [6918] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2018-02-25 17:13:59.421 CET [6919] LOG:  database system was shut down at 2018-02-25 17:13:59 CET
2018-02-25 17:13:59.427 CET [6918] LOG:  database system is ready to accept connections
 done
server started

… and that’s it:

postgres@debianpg:/home/postgres/ [PG1] cat /proc/meminfo | grep -i huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:     200
HugePages_Free:      193
HugePages_Rsvd:       64
HugePages_Surp:        0
Hugepagesize:       2048 kB

We can do the same test as in the last post to verify that the number of used huge pages increases when you put some load on the system:

postgres=# create table t1 as select * from generate_series(1,1000000);
SELECT 1000000
postgres=# select count(*) from t1;
  count  
---------
 1000000
(1 row)

postgres=# \! cat /proc/meminfo | grep -i huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:     200
HugePages_Free:      184
HugePages_Rsvd:       55
HugePages_Surp:        0
Hugepagesize:       2048 kB

Btw: This is on Debian 9 (not sure if it is the same for older versions):

postgres@debianpg:/home/postgres/ [PG1] cat /etc/os-release 
PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
NAME="Debian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
 

The article Configuring huge pages for your PostgreSQL instance, Debian version appeared first on Blog dbi services.


Parallel pg_dump is slow by default?


Short answer: Yes, it is. Being at a customer the last few days, we wanted to run a parallel pg_dump of a 2TB database. We were quite surprised that it was rather slow, and it was not immediately clear why. Well, the answer is in the documentation: when you go for parallel dumps you need to use the directory format, and this comes with: “This format is compressed by default and also supports parallel dumps.”. Compression takes time, so the question was whether we could disable it, which is not entirely clear from that statement: does “compressed by default” mean it is always compressed and you cannot change that, or does it just mean compression is the default, but you can change it?

As always, lets set up a short test case:

postgres=# create table dmp1 as 
           select a,a::varchar b,now() c 
             from generate_series ( 1, 1000000) a;
SELECT 1000000
postgres=# create table dmp2 as select * from dmp1;
SELECT 1000000
postgres=# create table dmp3 as select * from dmp1;
SELECT 1000000
postgres=# create table dmp4 as select * from dmp1;
SELECT 1000000
postgres=# \d dmp*
                        Table "public.dmp1"
 Column |           Type           | Collation | Nullable | Default 
--------+--------------------------+-----------+----------+---------
 a      | integer                  |           |          | 
 b      | character varying        |           |          | 
 c      | timestamp with time zone |           |          | 

                        Table "public.dmp2"
 Column |           Type           | Collation | Nullable | Default 
--------+--------------------------+-----------+----------+---------
 a      | integer                  |           |          | 
 b      | character varying        |           |          | 
 c      | timestamp with time zone |           |          | 

                        Table "public.dmp3"
 Column |           Type           | Collation | Nullable | Default 
--------+--------------------------+-----------+----------+---------
 a      | integer                  |           |          | 
 b      | character varying        |           |          | 
 c      | timestamp with time zone |           |          | 

                        Table "public.dmp4"
 Column |           Type           | Collation | Nullable | Default 
--------+--------------------------+-----------+----------+---------
 a      | integer                  |           |          | 
 b      | character varying        |           |          | 
 c      | timestamp with time zone |           |          | 

We have four tables each containing 1’000’000 rows. When we use pg_dump in parallel with the default it looks like this:

postgres@pgbox:/home/postgres/ [PG10] mkdir /var/tmp/dmp
postgres@pgbox:/home/postgres/ [PG10] time pg_dump --format=d --jobs=4 --file=/var/tmp/dmp/ postgres

real	0m2.788s
user	0m2.459s
sys	0m0.597s
postgres@pgbox:/home/postgres/ [PG10] ls -la /var/tmp/dmp/
total 19528
drwxr-xr-x. 2 postgres postgres    4096 Mar  9 07:16 .
drwxrwxrwt. 4 root     root          51 Mar  9 07:15 ..
-rw-r--r--. 1 postgres postgres      25 Mar  9 07:16 3113.dat.gz
-rw-r--r--. 1 postgres postgres      25 Mar  9 07:16 3114.dat.gz
-rw-r--r--. 1 postgres postgres      25 Mar  9 07:16 3115.dat.gz
-rw-r--r--. 1 postgres postgres 4991138 Mar  9 07:16 3116.dat.gz
-rw-r--r--. 1 postgres postgres 4991138 Mar  9 07:16 3117.dat.gz
-rw-r--r--. 1 postgres postgres 4991138 Mar  9 07:16 3118.dat.gz
-rw-r--r--. 1 postgres postgres 4991138 Mar  9 07:16 3119.dat.gz
-rw-r--r--. 1 postgres postgres    5819 Mar  9 07:16 toc.dat

As stated in the documentation the result is compressed. When speed is more important than the size on disk you can however disable the compression:

postgres@pgbox:/home/postgres/ [PG10] rm -rf /var/tmp/dmp/*
postgres@pgbox:/home/postgres/ [PG10] time pg_dump --format=d --jobs=4 --file=/var/tmp/dmp/ --compress=0 postgres

real	0m5.357s
user	0m0.065s
sys	0m0.460s
postgres@pgbox:/home/postgres/ [PG10] ls -la /var/tmp/dmp/
total 171040
drwxr-xr-x. 2 postgres postgres     4096 Mar  9 07:18 .
drwxrwxrwt. 4 root     root           51 Mar  9 07:15 ..
-rw-r--r--. 1 postgres postgres        5 Mar  9 07:18 3113.dat
-rw-r--r--. 1 postgres postgres        5 Mar  9 07:18 3114.dat
-rw-r--r--. 1 postgres postgres        5 Mar  9 07:18 3115.dat
-rw-r--r--. 1 postgres postgres 43777797 Mar  9 07:18 3116.dat
-rw-r--r--. 1 postgres postgres 43777797 Mar  9 07:18 3117.dat
-rw-r--r--. 1 postgres postgres 43777797 Mar  9 07:18 3118.dat
-rw-r--r--. 1 postgres postgres 43777797 Mar  9 07:18 3119.dat
-rw-r--r--. 1 postgres postgres     5819 Mar  9 07:18 toc.dat

In my case it got slower than the compressed dump, but this is because I do not really have fast disks in my little VM. When you have a good storage solution, disabling compression should bring you more speed.
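
Also note that --compress is not an all-or-nothing switch: with the directory format it accepts the usual compression levels from 0 to 9, so you can trade dump time against size on disk. A sketch (timings and sizes will of course differ on your hardware):

postgres@pgbox:/home/postgres/ [PG10] rm -rf /var/tmp/dmp/*
postgres@pgbox:/home/postgres/ [PG10] time pg_dump --format=d --jobs=4 --compress=1 --file=/var/tmp/dmp/ postgres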

 

The article Parallel pg_dump is slow by default? appeared first on Blog dbi services.

Finally you will be able to use exit and quit in psql


When giving the PostgreSQL DBA Essentials workshop, one of the main issues people have is how they can exit psql. Even on stackoverflow this is a popular topic. The good news for people who still don’t like to use “\q”: here is the commit that will add additional ways to quit or exit from psql.

Up to PostgreSQL 10 what you can usually see is something like this:

postgres@pgbox:/home/postgres/ [PG10] psql -X postgres
psql (10.0)
Type "help" for help.

postgres=# exit
postgres-# exit
postgres-# quit
postgres-# let me out, what do I need to to?
postgres-# 

Starting with PostgreSQL 11 you can either use “quit”:

postgres@pgbox:/home/postgres/ [PGDEV] psql -X postgres
psql (11devel)
Type "help" for help.

postgres=# quit
postgres@pgbox:/home/postgres/ [PGDEV] 

… or “exit”:

postgres@pgbox:/home/postgres/ [PGDEV] psql -X postgres
psql (11devel)
Type "help" for help.

postgres=# exit
postgres@pgbox:/home/postgres/ [PGDEV] 

I am pretty sure MacBook users will love that :)

 

The article Finally you will be able to use exit and quit in psql appeared first on Blog dbi services.

pg_basebackup and redirecting progress messages to a file


Recently I came across that commit and wondered what it is about. The answer is quite simple, but I didn’t know that this issue existed. Basically it is about how progress messages are written to the screen versus how they are written to a file. Lets have a look.

When you run pg_basebackup with progress messages and in verbose mode the output looks like this:

postgres@pgbox:/home/postgres/ [PG10] pg_basebackup --pgdata=/var/tmp/aa --verbose --progress 
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 4/30000028 on timeline 1
pg_basebackup: starting background WAL receiver
593320/593320 kB (100%), 1/1 tablespace                                         
pg_basebackup: write-ahead log end point: 4/30000130
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: base backup completed

You’ll notice that the progress line (the one showing kB and percent) is constantly overwritten on screen until we reach one hundred percent. Looking at that line while pg_basebackup is running gives you an estimate of how long it will take and shows which file it is currently working on. When you do the same thing but kick it into the background like this:

postgres@pgbox:/home/postgres/ [PG10] pg_basebackup --version
pg_basebackup (PostgreSQL) 10.0 
postgres@pgbox:/home/postgres/ [PG10] mkdir /var/tmp/aa
postgres@pgbox:/home/postgres/ [PG10] nohup pg_basebackup --pgdata=/var/tmp/aa --verbose --progress  > /tmp/a.log 2>&1  &

… you will find only a single progress line in the log file, the last state that was written:

postgres@pgbox:/home/postgres/ [PG10] cat /tmp/a.log
nohup: ignoring input
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 4/28000028 on timeline 1
pg_basebackup: starting background WAL receiver
593315/593315 kB (100%), 1/1 tablespace                                         
pg_basebackup: write-ahead log end point: 4/28000130
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: base backup completed

Somehow that was not considered very useful so the commit mentioned above changed that:

postgres@pgbox:/home/postgres/ [PGDEV] pg_basebackup --version
pg_basebackup (PostgreSQL) 11devel
postgres@pgbox:/home/postgres/ [PGDEV] nohup pg_basebackup --pgdata=/var/tmp/aa --verbose --progress  > /tmp/a.log 2>&1  &
postgres@pgbox:/home/postgres/ [PGDEV] cat /tmp/a.log
nohup: ignoring input
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/E000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_29846"
     0/184659 kB (0%), 0/1 tablespace (/var/tmp/aa/backup_label           )
  1705/184659 kB (0%), 0/1 tablespace (/var/tmp/aa/base/1/1249            )
  4697/184659 kB (2%), 0/1 tablespace (/var/tmp/aa/base/1/2657            )
  8395/184659 kB (4%), 0/1 tablespace (/var/tmp/aa/base/13276/1255        )
 20601/184659 kB (11%), 0/1 tablespace (/var/tmp/aa/base/13277/2670        )
 30614/184659 kB (16%), 0/1 tablespace (/var/tmp/aa/base/16395/2607_vm     )
 45367/184659 kB (24%), 0/1 tablespace (/var/tmp/aa/base/16395/16424       )
 54743/184659 kB (29%), 0/1 tablespace (/var/tmp/aa/base/16395/16424       )
 74327/184659 kB (40%), 0/1 tablespace (/var/tmp/aa/base/16395/16424       )
118807/184659 kB (64%), 0/1 tablespace (/var/tmp/aa/base/16395/16424       )
146647/184659 kB (79%), 0/1 tablespace (/var/tmp/aa/base/16395/16424       )
175197/184659 kB (94%), 0/1 tablespace (/var/tmp/aa/base/16395/16432       )
184668/184668 kB (100%), 0/1 tablespace (/var/tmp/aa/global/pg_control      )
184668/184668 kB (100%), 0/1 tablespace (/var/tmp/aa/global/pg_control      )
184668/184668 kB (100%), 1/1 tablespace                                         
pg_basebackup: write-ahead log end point: 0/E000168
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: base backup completed

When you redirect the output into a file you now get a separate line for each progress update, compared to what we saw before (a single line which was overwritten all the time). Seems to be a good change.
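
If you find the many progress lines too chatty for a log file, simply dropping --progress gives you just the verbose messages. A sketch:

postgres@pgbox:/home/postgres/ [PGDEV] nohup pg_basebackup --pgdata=/var/tmp/aa --verbose > /tmp/a.log 2>&1 &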

 

The article pg_basebackup and redirecting progress messages to a file appeared first on Blog dbi services.

Local partitioned indexes in PostgreSQL 11


When declarative partitioning was introduced with PostgreSQL 10 this was a big step forward. But as always with big new features, some things that do not work in PostgreSQL 10 now get resolved in PostgreSQL 11. One of those is local partitioned indexes. To make it easier to understand, lets start with an example in PostgreSQL 10.

A very simple list partitioned table:

postgres=# select version();
                                                          version                                                           
----------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 10.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16), 64-bit
(1 row)

postgres=# create table part ( a int, list varchar(5) ) partition by list (list);
CREATE TABLE
postgres=# create table part_1 partition of part for values in ('beer');
CREATE TABLE
postgres=# create table part_2 partition of part for values in ('wine');
CREATE TABLE
postgres=# \d+ part
                                          Table "public.part"
 Column |         Type         | Collation | Nullable | Default | Storage  | Stats target | Description 
--------+----------------------+-----------+----------+---------+----------+--------------+-------------
 a      | integer              |           |          |         | plain    |              | 
 list   | character varying(5) |           |          |         | extended |              | 
Partition key: LIST (list)
Partitions: part_1 FOR VALUES IN ('beer'),
            part_2 FOR VALUES IN ('wine')

In PostgreSQL 10 what happens when we create an index on the partitioned table?

postgres=# create index i_test on part (a);
ERROR:  cannot create index on partitioned table "part"

You just cannot do it. But you can create indexes on the partitions directly:

postgres=# create index i_test_1 on part_1 (a);
CREATE INDEX
postgres=# create index i_test_2 on part_2 (a);
CREATE INDEX

Lets do the same test with PostgreSQL 11:

postgres=# select version();
                                                  version                                                   
------------------------------------------------------------------------------------------------------------
 PostgreSQL 11devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16), 64-bit
(1 row)

postgres=# create table part ( a int, list varchar(5) ) partition by list (list);
CREATE TABLE
postgres=# create table part_1 partition of part for values in ('beer');
CREATE TABLE
postgres=# create table part_2 partition of part for values in ('wine');
CREATE TABLE
postgres=# \d+ part
                                          Table "public.part"
 Column |         Type         | Collation | Nullable | Default | Storage  | Stats target | Description 
--------+----------------------+-----------+----------+---------+----------+--------------+-------------
 a      | integer              |           |          |         | plain    |              | 
 list   | character varying(5) |           |          |         | extended |              | 
Partition key: LIST (list)
Partitions: part_1 FOR VALUES IN ('beer'),
            part_2 FOR VALUES IN ('wine')

Try to create the index on the partitioned table:

postgres=# create index i_test on part (a);
CREATE INDEX
postgres=# \d+ part
                                          Table "public.part"
 Column |         Type         | Collation | Nullable | Default | Storage  | Stats target | Description 
--------+----------------------+-----------+----------+---------+----------+--------------+-------------
 a      | integer              |           |          |         | plain    |              | 
 list   | character varying(5) |           |          |         | extended |              | 
Partition key: LIST (list)
Indexes:
    "i_test" btree (a)
Partitions: part_1 FOR VALUES IN ('beer'),
            part_2 FOR VALUES IN ('wine')

postgres=# \d+ part_1
                                         Table "public.part_1"
 Column |         Type         | Collation | Nullable | Default | Storage  | Stats target | Description 
--------+----------------------+-----------+----------+---------+----------+--------------+-------------
 a      | integer              |           |          |         | plain    |              | 
 list   | character varying(5) |           |          |         | extended |              | 
Partition of: part FOR VALUES IN ('beer')
Partition constraint: ((list IS NOT NULL) AND ((list)::text = 'beer'::character varying(5)))
Indexes:
    "part_1_a_idx" btree (a)

postgres=# \d+ part_2
                                         Table "public.part_2"
 Column |         Type         | Collation | Nullable | Default | Storage  | Stats target | Description 
--------+----------------------+-----------+----------+---------+----------+--------------+-------------
 a      | integer              |           |          |         | plain    |              | 
 list   | character varying(5) |           |          |         | extended |              | 
Partition of: part FOR VALUES IN ('wine')
Partition constraint: ((list IS NOT NULL) AND ((list)::text = 'wine'::character varying(5)))
Indexes:
    "part_2_a_idx" btree (a)

The index is cascaded down to all the partitions in PostgreSQL 11 which is really nice. As a side effect of this, when you try this in PostgreSQL 10:

postgres=# alter table part add constraint part_pk primary key(a,list);
ERROR:  primary key constraints are not supported on partitioned tables
LINE 1: alter table part add constraint part_pk primary key(a,list);
                             ^

… you will get an error message telling you that primary keys are not supported on partitioned tables. The same applies here: you can do that on the partitions directly:

postgres=# alter table part_1 add constraint part1_pk primary key(a,list);
ALTER TABLE
postgres=# alter table part_2 add constraint part2_pk primary key(a,list);
ALTER TABLE

Now in PostgreSQL 11 this works as well:

postgres=# alter table part add constraint part_pk primary key(a,list);
ALTER TABLE
postgres=# \d+ part
                                          Table "public.part"
 Column |         Type         | Collation | Nullable | Default | Storage  | Stats target | Description 
--------+----------------------+-----------+----------+---------+----------+--------------+-------------
 a      | integer              |           | not null |         | plain    |              | 
 list   | character varying(5) |           | not null |         | extended |              | 
Partition key: LIST (list)
Indexes:
    "part_pk" PRIMARY KEY, btree (a, list)
    "i_test" btree (a)
Partitions: part_1 FOR VALUES IN ('beer'),
            part_2 FOR VALUES IN ('wine')

postgres=# \d+ part_1
                                         Table "public.part_1"
 Column |         Type         | Collation | Nullable | Default | Storage  | Stats target | Description 
--------+----------------------+-----------+----------+---------+----------+--------------+-------------
 a      | integer              |           | not null |         | plain    |              | 
 list   | character varying(5) |           | not null |         | extended |              | 
Partition of: part FOR VALUES IN ('beer')
Partition constraint: ((list IS NOT NULL) AND ((list)::text = 'beer'::character varying(5)))
Indexes:
    "part_1_pkey" PRIMARY KEY, btree (a, list)
    "part_1_a_idx" btree (a)

postgres=# \d+ part_2
                                         Table "public.part_2"
 Column |         Type         | Collation | Nullable | Default | Storage  | Stats target | Description 
--------+----------------------+-----------+----------+---------+----------+--------------+-------------
 a      | integer              |           | not null |         | plain    |              | 
 list   | character varying(5) |           | not null |         | extended |              | 
Partition of: part FOR VALUES IN ('wine')
Partition constraint: ((list IS NOT NULL) AND ((list)::text = 'wine'::character varying(5)))
Indexes:
    "part_2_pkey" PRIMARY KEY, btree (a, list)
    "part_2_a_idx" btree (a)

Quite some improvements to show up in PostgreSQL 11.

 

The article Local partitioned indexes in PostgreSQL 11 appeared first on Blog dbi services.

What is the maximum in list size in PostgreSQL?


Yesterday, while being at a customer, an interesting question popped up: what is the maximum number of values you can put into an in list in PostgreSQL? I couldn’t answer, although I have never read anywhere that there is a limit. The following is for fun only and I am not saying that creating huge in lists is a good idea. Lets go.

The version I tested is PostgreSQL 10:

postgres=# select version(), now();
                                                  version                                                   |              now              
------------------------------------------------------------------------------------------------------------+-------------------------------
 PostgreSQL 10.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16), 64-bit | 2018-03-21 18:29:50.269369+01
(1 row)

postgres=# create table t1 ( a varchar(10));
CREATE TABLE

We will use a very simple bash script to build the in list and execute the result in PostgreSQL:

postgres@pgbox:/home/postgres/ [PG10] cat t.sh 
#!/bin/bash
count=$1

statement='select * from t1 where a in ('

for (( i=1; i<=$count; i++ ))
do  
    if [ "${i}" -lt "${count}" ]; then
        statement="${statement} '${i}',"
    elif [ "${i}" == "${count}" ]; then
        statement="${statement} '${i}');"
    fi
done

psql -c "${statement}" postgres

Lets start with 100:

postgres@pgbox:/home/postgres/ [PG10] ./t.sh 100
 a 
---
(0 rows)

Time: 0.983 ms

1000:

postgres@pgbox:/home/postgres/ [PG10] ./t.sh 1000
 a 
---
(0 rows)

Time: 1.525 ms

10000:

postgres@pgbox:/home/postgres/ [PG10] ./t.sh 10000
 a 
---
(0 rows)

Time: 11.549 ms

… asking for even more takes much longer (because of the bash script, which fully occupies my virtual core) and eventually fails because the generated command line gets too long for the operating system:

./t.sh: line 15: /u01/app/postgres/product/10/db_0/bin/psql: Argument list too long

So there is at least a limit on what you can pass to psql on the command line. Lets try by writing the statement to a SQL script and executing that instead:

postgres@pgbox:/home/postgres/ [PG10] cat t.sh 
#!/bin/bash
count=$1

echo 'select * from t1 where a in (' > ttt.sql

for (( i=1; i<=$count; i++ ))
do  
    if [ "${i}" -lt "${count}" ]; then
        echo  "'${i}'," >> ttt.sql
    elif [ "${i}" == "${count}" ]; then
        echo "'${i}');" >> ttt.sql
    fi
done

psql -f ttt.sql postgres

This way of stringing together the statement is much more efficient than building the list by concatenating everything into one variable. Does it still work?

postgres@pgbox:/home/postgres/ [PG10] ./t.sh 100000
 a 
---
(0 rows)

Time: 155.986 ms

Not a problem, one more:

postgres@pgbox:/home/postgres/ [PG10] ./t.sh 1000000
 a 
---
(0 rows)

Time: 14211.766 ms (00:14.212)

Still works. So now we could say: lets stop here, who in the world would pass one million values into an in list? On the other hand, lets have fun and double it:

postgres@pgbox:/home/postgres/ [PG10] ./t.sh 2000000
 a 
---
(0 rows)

Time: 3993.091 ms (00:03.993)

One more:

postgres@pgbox:/home/postgres/ [PG10] ./t.sh 3000000
psql:ttt.sql:3000001: ERROR:  out of memory
DETAIL:  Failed on request of size 524288.
Time: 3026.925 ms (00:03.027)

Ok, now I am hitting some limits, but probably not those of PostgreSQL. I’ll test further when I have more time for that :)
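
As a side note: when an application really needs to hand over that many values, passing them as a single array and using = ANY() is usually the friendlier approach for both the parser and the statement size. A small sketch against the same table:

postgres=# select * from t1 where a = any (array['1','2','3','4','5']);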

 

The article What is the maximum in list size in PostgreSQL? appeared first on Blog dbi services.
