
Postgres vs. Oracle access paths II – IndexOnlyScan


In the previous post I explained a sequential scan that happened by accident: my query needed only one column, which was indexed, and I expected to read the index rather than the table. And I had to hint the Oracle example to get the same full table scan, because the Oracle optimizer chooses the index access in that case. Here is where I learned a big difference between Postgres and Oracle: they both use MVCC to query without locking, but Postgres MVCC is for table rows (tuples) only, whereas Oracle MVCC covers all blocks – tables as well as indexes.
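As a reminder, the demo table and index come from the first post of this series. A minimal sketch of that setup (the exact DDL is in the first post):

create table demo1 as
 select generate_series n, 1 a, lpad('x',1000,'x') x
 from generate_series(1,10000);
create unique index demo1_n on demo1(n);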

So this second post is about Index Only Scan and the second constant you find in the documentation for the query planner:
random_page_cost (floating point)
Sets the planner’s estimate of the cost of a non-sequentially-fetched disk page. The default is 4.0.
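Like the other planner cost constants, it can be inspected and changed at session level, which is handy for experimenting (an illustration only, not a recommendation):

show random_page_cost;       -- 4 by default
set random_page_cost=1.1;    -- e.g. when the database is on fast SSD storage
reset random_page_cost;      -- back to the configured value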


I start here in the situation left at the end of the previous post: the table and index are created, and I have run a query which did a sequential scan on the table:

explain (analyze,verbose,costs,buffers) select sum(n) from demo1 ;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1554.00..1554.01 rows=1 width=8) (actual time=17.430..17.430 rows=1 loops=1)
Output: sum(n)
Buffers: shared hit=1429
-> Seq Scan on public.demo1 (cost=0.00..1529.00 rows=10000 width=4) (actual time=0.031..13.011 rows=10000 loops=1)
Output: n, a, x
Buffers: shared hit=1429
Planning time: 1.791 ms
Execution time: 17.505 ms

Index Only Scan

I want to understand why the query planner did not choose the index-only access. This is where hints are useful: force a plan that is not chosen by the optimizer, in order to check whether this plan is possible, and then look at its cost:

/*+ IndexOnlyScan(demo1) */
explain (analyze,verbose,costs,buffers) select sum(n) from demo1 ;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1727.29..1727.30 rows=1 width=8) (actual time=5.424..5.425 rows=1 loops=1)
Output: sum(n)
Buffers: shared hit=1429 read=29
-> Index Only Scan using demo1_n on public.demo1 (cost=0.29..1702.29 rows=10000 width=4) (actual time=0.177..4.613 rows=10000 loops=1)
Output: n
Heap Fetches: 10000
Buffers: shared hit=1429 read=29
Planning time: 0.390 ms
Execution time: 5.448 ms
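Note that this hint syntax is not native PostgreSQL: the /*+ IndexOnlyScan(demo1) */ comment is interpreted by the pg_hint_plan extension. A minimal sketch to enable it in a session, assuming the extension is installed on the server:

load 'pg_hint_plan';                -- load the extension library for this session
set pg_hint_plan.enable_hint=on;    -- honor /*+ ... */ comments (this is the default)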

From there you see that an Index Only Scan is possible but more expensive. The estimated cost is higher than for the Seq Scan (cost=0.29..1702.29 instead of cost=0.00..1529.00). And the execution statistics show that I read the 1429 table pages in addition to the 29 pages of the index.

From the hit/read statistics we can note that the create table has left all the table pages in the buffer cache, but this is not the case for the create index. But that’s another story. My concern is why an index-only access reads all the table blocks in addition to the index ones, which brings the cost 1727.30-1554.01=173.29 higher than the sequential scan.

The clue is in this line, showing that all my rows were fetched from the heap pages, which are the table: Heap Fetches: 10000

Tuple visibility

In ACID databases, a modification must not be visible to others until the transaction completes (commit). There are two ways to achieve that. The first is to read the latest version of the data and lock what you read in share mode, so that no concurrent update can happen. The other is to query a previous version of the data (MVCC – Multi Version Concurrency Control), where uncommitted changes are not visible. Both Oracle and Postgres use MVCC, which is great because you can run transactions and queries on the same database. But they do the versioning at different levels.

Oracle MVCC is physical, at block level. Everything is versioned: tables as well as indexes, with their transaction information (ITL) which, with the help of the transaction table, gives all the visibility information: committed or not, and with which commit SCN. With this architecture, a modified block can be written to disk even with uncommitted changes, and there is no need to re-visit it later once the transaction is committed.

Postgres MVCC is logical, at row (‘tuple’) level: a new version is a new row, and committed changes set the visibility of the row. The table row is versioned but the index entry is not. If you access by index, you still need to go to the table to see whether the row is visible to you. This is why I had heap fetches here, and why the table blocks were read.
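This per-tuple versioning information is visible in Postgres as hidden system columns, which we can query directly (an illustration):

-- xmin is the transaction that created this tuple version,
-- xmax the one that deleted or updated it (0 when none)
select xmin, xmax, ctid, n from demo1 limit 3;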

This explains why the cost of the Index Only Scan is high here. In addition to about 30 index blocks, I read about 1429 table blocks. But it can be worse: for each index entry – and I have 10000 of them – we need to go to the table row, which is exactly what the 10000 heap fetches are. I am lucky because I have a very good clustering factor: I created the table with increasing values for the column N (generated by generate_series). With a bad clustering factor (physical order of rows in the table not correlated with the order of the index) you would see up to 10000 additional shared hits. Thankfully, the query planner estimates this and has switched to the table scan, which is cheaper in this case.
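In Postgres, the counterpart of the Oracle clustering factor is the correlation statistic, which we can check in the pg_stats view (assuming statistics have been gathered by analyze):

select tablename, attname, correlation
from pg_stats
where tablename='demo1' and attname='n';
-- a correlation close to 1 means the physical row order follows the index order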

Vacuum and Visibility Map

Going to the table rows to check whether they are committed would always be more expensive than a table scan. The Postgres vacuum process maintains a Visibility Map: a bitmap of pages that have been vacuumed and have no more tuples to vacuum, which means that all rows in those pages are visible to all transactions. When there is an update on a page, the flag is unset, and it remains unset until the modification is committed and vacuum has run on it again. This visibility flag is used by the Index Only Scan to know whether it needs to get to the page.
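If you want to look at the visibility map itself, the pg_visibility extension (PostgreSQL 9.6 and later) exposes it. A sketch:

create extension pg_visibility;
-- count the pages of demo1 flagged all-visible in the visibility map
select count(*) filter (where all_visible) as all_visible_pages,
       count(*) as total_pages
from pg_visibility_map('demo1'::regclass);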

Let’s run a vacuum and try the same query again:

vacuum demo1;
VACUUM
 
explain (analyze,verbose,costs,buffers) select sum(n) from demo1 ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=295.29..295.30 rows=1 width=8) (actual time=2.192..2.193 rows=1 loops=1)
Output: sum(n)
Buffers: shared hit=30
-> Index Only Scan using demo1_n on public.demo1 (cost=0.29..270.29 rows=10000 width=4) (actual time=0.150..1.277 rows=10000 loops=1)
Output: n
Heap Fetches: 0
Buffers: shared hit=30
Planning time: 0.450 ms
Execution time: 2.213 ms

Here, without any hint, the query planner has chosen the Index Only Scan, which is now less expensive than the Seq Scan: cost=0.29..270.29

Cost of Index Only Scan

The initial cost of 0.29 is calculated from cpu_operator_cost, which defaults to 0.0025, meaning that about 0.29/0.0025=116 operations were charged here. This cost is minimal and I will not go into its details.
Then, to get the rows, we have to:

  • read 30 blocks from the index; those are assumed to be random reads (random_page_cost=4), so the cost for all rows is 4*30=120
  • process the index entries (cpu_index_tuple_cost=0.005), so the cost for the 10000 rows is 0.005*10000=50
  • process the result rows (cpu_tuple_cost=0.01), so the cost for the 10000 rows is 0.01*10000=100

This brings the total cost to 270.29.

The operation above it, the SUM(N), is costed exactly as in the previous post on Seq Scan: cost=25 (cpu_operator_cost=0.0025 for 10000 rows). It appears as the startup cost of the Aggregate because the sum is available only once all rows have been processed, and there is an additional 0.01 for the result row.
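Putting the whole plan together, here is the arithmetic reconstructed from those constants:

-- Index Only Scan:   0.29 (startup) + 30*4 + 10000*0.005 + 10000*0.01 = 270.29
-- Aggregate startup: 270.29 + 10000*0.0025 (the sum itself)           = 295.29
-- Aggregate total:   295.29 + 0.01 (one result row)                   = 295.30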

Oracle

In the previous post I used the FULL() hint to compare the Oracle Full Table Scan to the Postgres Seq Scan, but by default Oracle chooses an index-only access here, because the index covers all the rows and columns we need.

All columns that we need:

In the previous post we have seen the column projection (from the +projection format of dbms_xplan):

Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=0) SUM("N")[22]
2 - (rowset=256) "N"[NUMBER,22]

I need only the column N from the table DEMO1, and this column is in the index DEMO1_N.

All rows that we need:

In Oracle, an index does not have an entry for every row, but only for rows where at least one of the indexed columns is not null. Because we have no WHERE clause predicate on N, and because we have not declared the column N as NOT NULL, an access by index might not return all rows. However, the SUM() function does not need to know about the null values, because they don’t change the sum, so the optimizer can safely choose an index-only access.

Here is the query without hints:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 6z194712fvcfu, child number 0
-------------------------------------
select /*+ */ sum(n) from demo1
--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 7 (100)| 1 |00:00:00.01 | 26 |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:00.01 | 26 |
| 2 | INDEX FAST FULL SCAN| DEMO1_N | 1 | 10000 | 7 (0)| 10000 |00:00:00.01 | 26 |
--------------------------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=0) SUM("N")[22]
2 - "N"[NUMBER,22]

This plan looks very similar to the Postgres one after the vacuum: 26 buffers, which is approximately the number of blocks in my index here. However, Oracle does not have the ‘vacuum’ requirement because MVCC applies to the index as well, and Oracle does not need to go to the table to undo uncommitted changes. But there is something else here. If you remember the previous post, the Oracle cost=1 is equivalent to the cost of a random read (single block), and the cost of reading one block through a larger I/O (multiblock read) is, with default statistics, about 0.278 times cheaper. Here, 7/26=0.2692, which shows that the cost is based on multiblock reads. Oracle can read indexes with INDEX FAST FULL SCAN in the same way it reads tables with FULL TABLE SCAN: with larger I/Os. We don’t need any ordering of rows here, because we just compute the sum, so we don’t need to follow the chain of leaf blocks scattered within the index segment: just read the blocks as they come, with fast I/O.

Index Fast Full Scan is possible in Oracle because MVCC is at block level, for indexes as well as tables. You can just read the blocks as of the point in time of the query, without being concerned by concurrent operations that update the index entries or split the blocks. The Postgres Index Only Scan is more limited because MVCC is on tables only: it must scan the index in the order of the leaves, and must read the visibility map and maybe the table pages.

In Oracle, an index can be used to vertically partition a table, as a redundant storage of a few columns, in order to avoid full table scans on large rows: queries can completely avoid reading the table when the index covers all required rows and columns. We will see more about the ‘all rows’ requirement in the next post.

 



Postgres vs. Oracle access paths III – Partial Index


In the previous post I said that an Index Only Access needs to find all rows in the index. Here is a case where, with similar data, Postgres can find all the rows in its index but Oracle needs additional considerations.

In the previous post I’ve executed:
select sum(n) from demo1
The execution plan was:

Aggregate (cost=295.29..295.30 rows=1 width=8) (actual time=2.192..2.193 rows=1 loops=1)
Output: sum(n)
Buffers: shared hit=30
-> Index Only Scan using demo1_n on public.demo1 (cost=0.29..270.29 rows=10000 width=4) (actual time=0.150..1.277 rows=10000 loops=1)
Output: n
Heap Fetches: 0
Buffers: shared hit=30

Basically, this reads all the values of the column N and then aggregates them into the sum.
If I remove the SUM() I have only the part that reads all values of N:

explain (analyze,verbose,costs,buffers) select n from demo1 ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using demo1_n on public.demo1 (cost=0.29..270.29 rows=10000 width=4) (actual time=0.150..1.284 rows=10000 loops=1)
Output: n
Heap Fetches: 0
Buffers: shared hit=30
Planning time: 0.440 ms
Execution time: 1.972 ms

Oracle

This sounds logical. Now let’s run the same query, a simple ‘select n from demo1’, in Oracle:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID ad4z7tpt0dkta, child number 0
-------------------------------------
select /*+ */ n from demo1
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 397 (100)| 10000 |00:00:00.01 | 1451 |
| 1 | TABLE ACCESS FULL| DEMO1 | 1 | 10000 | 397 (0)| 10000 |00:00:00.01 | 1451 |
--------------------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "N"[NUMBER,22]

Here the access path is different: a full table scan instead of an index-only access (Index Fast Full Scan). It is not a cost decision: if we try to force an index access with the INDEX_FFS() or INDEX() hints, the query will still do a full table scan. The reason is that an index-only access is possible only if all columns and all rows are present in the index. But Oracle does not always index all rows: an Oracle index has no entry for the rows where all the indexed columns are null.
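A quick way to see this (a sketch, rolled back so that the demo data is unchanged): insert a row with a null N and compare the counts.

insert into demo1(n,a,x) values (null,0,'x');
select count(*) from demo1;   -- 10001 rows: this needs the table, the index misses a row
select count(n) from demo1;   -- 10000 non-null values: this can come from the index
rollback;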

Where n is not null

If I run the same query with the purpose of showing only non-null values, with a ‘where n is not null’ predicate, then an index only access is possible:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 2gbjpw5u0v9cw, child number 0
-------------------------------------
select /*+ */ n from demo1 where n is not null
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 7 (100)| 10000 |00:00:00.01 | 28 |
| 1 | INDEX FAST FULL SCAN| DEMO1_N | 1 | 10000 | 7 (0)| 10000 |00:00:00.01 | 28 |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("N" IS NOT NULL)

Constraints

An alternative, if we know that we will never have null values here, is to give the information to the optimizer that there are no null values in the column N:
In Oracle:
alter table demo1 modify n not null;
This is the equivalent of the PostgreSQL
alter table demo1 alter column n set not null;
Then, in addition to ensuring that the constraint is verified, the constraint informs the optimizer that there are no null values, and that all rows can be found in the index:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID ad4z7tpt0dkta, child number 0
-------------------------------------
select /*+ */ n from demo1
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 7 (100)| 10000 |00:00:00.01 | 28 |
| 1 | INDEX FAST FULL SCAN| DEMO1_N | 1 | 10000 | 7 (0)| 10000 |00:00:00.01 | 28 |
-------------------------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "N"[NUMBER,22]

Additional columns

Even if the column can have some null values, it is easy to index null values in Oracle, simply by adding a non-null column or expression. And if you don’t need an additional column, you can even add a constant, as in the following index definition:

create unique index demo1_n on demo1(n,0);

This works because every index entry then has at least one non-null value. But looking at the buffers, you can see that this additional byte (the 0 is stored in 1 byte) has a little overhead: 31 blocks are read here instead of 28:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID ad4z7tpt0dkta, child number 0
-------------------------------------
select /*+ */ n from demo1
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 8 (100)| 10000 |00:00:00.01 | 31 |
| 1 | INDEX FAST FULL SCAN| DEMO1_N | 1 | 10000 | 8 (0)| 10000 |00:00:00.01 | 31 |
-------------------------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "N"[NUMBER,22]

Oracle Partial Indexes

In Oracle, all indexes that include a nullable column are partial indexes: not all rows are indexed, and an index access is possible only if the WHERE clause, or a constraint, guarantees that we don’t need the non-indexed rows. Combined with expressions, this can be a way to implement partial indexes: the expression returns null for the rows that should not be indexed. Oracle even provides computed columns (aka virtual columns), so that the expression does not have to be coded in the WHERE clause of the query.

As an example with expressions, the following index has entries only for the values not greater than 10:
create index demo1_n_top10 on demo1(case when n<=10 then n end);

However, to use it, we must mention the expression explicitly:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 863drbjwayrt7, child number 0
-------------------------------------
select /*+ */ (case when n<=10 then n end) from demo1 where (case when
n<=10 then n end)<=5
---------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 (100)| 4 |00:00:00.01 | 2 |
|* 1 | INDEX RANGE SCAN| DEMO1_N_TOP10 | 1 | 5 | 1 (0)| 4 |00:00:00.01 | 2 |
---------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("DEMO1"."SYS_NC00004$"<=5)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "DEMO1"."SYS_NC00004$"[NUMBER,22]

We can see that, internally, a virtual column (“SYS_NC00004$”) has been created for the indexed expression, and it is used for the predicate and for the projection, which uses the same expression. There is another possibility with the ‘partial index’ feature introduced in 12c, but it does not have the flexibility of a predicate: it is based on partitioning, where only some partitions are indexed.
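For completeness, here is a sketch of that 12c syntax on a hypothetical range-partitioned table (all names are made up for the example):

create table orders ( order_id number, order_date date, customer_id number )
partition by range (order_date) (
 partition p_old values less than (date'2017-01-01') indexing off,
 partition p_new values less than (maxvalue) indexing on
);
-- only the partitions with INDEXING ON get entries in this index:
create index orders_cust on orders(customer_id) local indexing partial;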

Postgres Partial Indexes

Postgres does not need those workarounds. An index indexes all rows, including null entries, and partial indexes can be defined with a WHERE clause:
create index demo1_n_top10 on demo1(n) where n<=10;

No need to change the query. As long as the result can come from the partial index, we can use the column without an expression on it:

explain (analyze,verbose,costs,buffers) select n from demo1 where n<=5 ;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using demo1_n_top10 on public.demo1 (cost=0.14..4.21 rows=4 width=4) (actual time=0.114..0.114 rows=5 loops=1)
Output: n
Index Cond: (demo1.n <= 5)
Heap Fetches: 0
Buffers: shared hit=2
Planning time: 0.557 ms
Execution time: 0.129 ms

Here the smaller partial index (demo1_n_top10) has been chosen by the query planner.

As you can see, I’ve not used exactly the same condition: the query planner understood that n<=5 (in the WHERE clause) is a subset of n<=10 (in the index definition). However, if the predicate is too different, it cannot use the index:

fpa=# explain (analyze,verbose,costs,buffers) select n from demo1 where 2*n<=10;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using demo1_n on public.demo1 (cost=0.29..320.29 rows=3333 width=4) (actual time=0.020..1.086 rows=5 loops=1)
Output: n
Filter: ((2 * demo1.n) <= 10)
Rows Removed by Filter: 9995
Heap Fetches: 0
Buffers: shared hit=30

Here, instead of “Index Cond” we have a simple “Filter”. The Index Only Scan has read all the rows, and they were filtered afterward (“Rows Removed by Filter”).

Index condition

With the VERBOSE option of EXPLAIN we see the condition used by the index access:
Index Cond: (demo1.n <= 5)
‘Index Cond’ is not a simple filter removing rows after an operation: it is the condition used for fast access to the index entries in the sorted index structure. We have the equivalent in Oracle with the ‘+predicate’ format of dbms_xplan:

Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("N"<=5)

Before going further on index access for WHERE clause predicates, the next post will show the major characteristic of indexes (besides the fact that they store a redundant subset of columns and rows): they are maintained sorted, and they may return the resulting rows in order.

 


A wonderful PostgreSQL feature: default privileges


Imagine this scenario (which is not so uncommon): you have a lot of objects in a user schema and you want to grant another user access to those tables. You can easily do this by granting select on the tables to that user, and you’re fine. Really? Maybe for now, but what will happen when the user who owns the objects creates new objects? Then you will need to grant those to the second user as well. In PostgreSQL there is an easier solution. Let’s go …

Again we start by creating two users each with its own schema:

postgres=# create user a with login password 'a';
CREATE ROLE
postgres=# create schema a authorization a;
CREATE SCHEMA
postgres=# alter user a set search_path=a;
ALTER ROLE
postgres=# create user b with login password 'b';
CREATE ROLE
postgres=# create schema b authorization b;
CREATE SCHEMA
postgres=# alter user b set search_path=b;
ALTER ROLE
postgres=# \du
                                   List of roles
 Role name |                         Attributes                         | Member of 
-----------+------------------------------------------------------------+-----------
 a         |                                                            | {}
 b         |                                                            | {}
 postgres  | Superuser, Create role, Create DB, Replication, Bypass RLS | {}

postgres=# \dn
  List of schemas
  Name  |  Owner   
--------+----------
 a      | a
 b      | b
 public | postgres
(3 rows)

User “a” shall be the one owning the objects:

postgres=# \c postgres a
You are now connected to database "postgres" as user "a".
postgres=> create table t1 ( a int );
CREATE TABLE
postgres=> create table t2 ( a int );
CREATE TABLE
postgres=> insert into t1 (a) values (1);
INSERT 0 1
postgres=> insert into t2 (a) values (2);
INSERT 0 1
postgres=> \d
       List of relations
 Schema | Name | Type  | Owner 
--------+------+-------+-------
 a      | t1   | table | a
 a      | t2   | table | a
(2 rows)

When you want to give user “b” access to these tables you could do:

postgres=> grant select on table t1 to b;
GRANT
postgres=> grant select on table t2 to b;
GRANT

From now on user “b” should be able to select from the two tables owned by user “a”, right?:

postgres=> \c postgres b
You are now connected to database "postgres" as user "b".
postgres=> select count(*) from a.t1;
ERROR:  permission denied for schema a
LINE 1: select count(*) from a.t1;

This is not how it works in PostgreSQL. What you need to do is this:

postgres=> \c postgres a
You are now connected to database "postgres" as user "a".
postgres=> grant usage on schema a to b;
GRANT

This allows user “b” access to the schema “a” (remember that a user and a schema are different things in PostgreSQL):

postgres=> \c postgres b
You are now connected to database "postgres" as user "b".
postgres=> select count(*) from a.t1;
 count 
-------
     1
(1 row)

postgres=> select count(*) from a.t2;
 count 
-------
     1
(1 row)

What happens now when user “a” creates another object:

postgres=> \c postgres a
You are now connected to database "postgres" as user "a".
postgres=> create table t3 as select * from t1;
SELECT 1
postgres=> \d
       List of relations
 Schema | Name | Type  | Owner 
--------+------+-------+-------
 a      | t1   | table | a
 a      | t2   | table | a
 a      | t3   | table | a
(3 rows)

Will user “b” be able to select data from it?

postgres=> \c postgres b
You are now connected to database "postgres" as user "b".
postgres=> select count(*) from a.t3;
ERROR:  permission denied for relation t3

Of course not. The “usage” privilege on a schema grants only access to the schema itself, not to the objects in the schema. When we want user “b” to be able to select from all tables in schema “a”, even when user “a” creates new objects, we can modify the default privileges:

postgres=# \c postgres postgres
You are now connected to database "postgres" as user "postgres".
postgres=# alter default privileges in schema a grant select on tables to b;
ALTER DEFAULT PRIVILEGES

Should user “b” now be able to select from the “t3” table in schema “a”?

postgres=> select current_user;
 current_user 
--------------
 b
(1 row)

postgres=> select count(*) from a.t3;
ERROR:  permission denied for relation t3
postgres=> 

No. When you modify the default privileges, this affects only objects created after your modification. Let’s create a new table with user “a” in schema “a”:

postgres=> \c postgres a
You are now connected to database "postgres" as user "a".
postgres=> create table t4 as select * from t1;
SELECT 1

As this table was created after the modification of the default privileges, user “b” is automatically allowed to select from it:

postgres=> \c postgres b
You are now connected to database "postgres" as user "b".
postgres=> select count(*) from a.t4;
 count 
-------
     1
(1 row)

When you check the documentation for ALTER DEFAULT PRIVILEGES you’ll notice that you can grant not only select on tables, but much more. Hope this helps …
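For example, the default privileges currently defined can be listed with the psql meta-command \ddp, and other object types can be covered as well. A few illustrative variations:

postgres=# alter default privileges in schema a grant usage on sequences to b;
ALTER DEFAULT PRIVILEGES
postgres=# alter default privileges in schema a grant execute on functions to b;
ALTER DEFAULT PRIVILEGES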

 


Developer GUI tools for PostgreSQL


There was a recent thread on the PostgreSQL general mailing list asking for GUI tools for PostgreSQL. This is a question we often get asked at customers, so I thought it might be a good idea to summarize some of the tools in a blog post. If you know tools other than the ones listed here which look promising, let me know so I can add them. There is a list of tools in the PostgreSQL Wiki as well.

 Name                           | Linux | Windows | MacOS | Free
--------------------------------+-------+---------+-------+------
 pgAdmin                        | Y     | Y       | Y     | Y
 DBeaver                        | Y     | Y       | Y     | Y
 EMS SQL Manager for PostgreSQL | N     | Y       | N     | N
 JetBrains DataGrip             | Y     | Y       | Y     | N
 PostgreSQL Studio              | Y     | Y       | Y     | Y
 Navicat for PostgreSQL         | Y     | Y       | Y     | N
 Execute Query                  | Y     | Y       | Y     | Y
 SQuirreL SQL Client            | Y     | Y       | Y     | Y
 pgModeler                      | Y     | Y       | Y     | Y
 DbSchema                       | Y     | Y       | Y     | N
 Oracle SQL Developer           | Y     | Y       | Y     | Y
 PostgreSQL Maestro             | N     | Y       | N     | N
 SQL Workbench                  | Y     | Y       | Y     | Y
 Nucleon Database Master        | N     | Y       | N     | N
 RazorSQL                       | Y     | Y       | Y     | N
 Database Workbench             | N     | Y       | N     | N
 


Postgres vs. Oracle access paths IV – Order By and Index


I realize that I’m talking about indexes in Oracle and Postgres but haven’t yet mentioned the best website you can find about indexes, with concepts and examples for all RDBMS: http://use-the-index-luke.com. You will probably learn a lot there about SQL design. Now let’s continue on execution plans with indexes.

As we have seen two posts ago, an index can be used even with 100% selectivity (all rows), when we don’t filter any rows. Oracle has the INDEX FAST FULL SCAN, which is the fastest: it reads blocks sequentially as they come. But it doesn’t follow the chain of B*Tree leaves and does not return the rows in the order of the index. However, there is also the possibility to read the leaf blocks in the index order, with INDEX FULL SCAN, using random reads instead of multiblock reads.
It is similar to the Index Only Scan of Postgres, except that there is no need to go to the table to filter out uncommitted changes: Oracle reads the transaction table to get the visibility information, and goes to the undo records if needed.

The previous post had a query with a ‘where n is not null’ predicate, to be sure to have all rows present in the Oracle index, and we will continue on this by adding an order by.

For this post, I’ve increased the size of the column N in the Oracle table by adding 1/3 to each number. I did this for this post only, and for the Oracle table only. The index on N is now 45 blocks instead of 20. The reason is to show what happens when the cost of the ‘order by’ is high. I didn’t change the Postgres table, because there is only one way to scan a Postgres index, and the result is always sorted.
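I did it with something like this (a sketch; the point is only to make each value wider, so that the index needs more blocks):

update demo1 set n=n+1/3;
commit;
exec dbms_stats.gather_table_stats(user,'DEMO1')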

Oracle Index Fast Full Scan vs. Index Full Scan


PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID dbck3rgnqbakg, child number 0
-------------------------------------
select /*+ */ n from demo1 where n is not null order by n
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 46 (100)| 10000 |00:00:00.01 | 48 |
| 1 | INDEX FULL SCAN | DEMO1_N | 1 | 10000 | 46 (0)| 10000 |00:00:00.01 | 48 |
---------------------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "N"[NUMBER,22]

Index Full Scan, the random-read version of the index read, is chosen here by the Oracle optimizer because we want the result ordered by the column N, and the index can provide this without an additional sort.

We can force the optimizer to do multiblock reads with the INDEX_FFS() hint:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID anqfbf5caat2a, child number 0
-------------------------------------
select /*+ index_ffs(demo1) */ n from demo1 where n is not null order
by n
-----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 82 (100)| 10000 |00:00:00.01 | 51 | | | |
| 1 | SORT ORDER BY | | 1 | 10000 | 82 (2)| 10000 |00:00:00.01 | 51 | 478K| 448K| 424K (0)|
| 2 | INDEX FAST FULL SCAN| DEMO1_N | 1 | 10000 | 14 (0)| 10000 |00:00:00.01 | 51 | | | |
-----------------------------------------------------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=1) "N"[NUMBER,22]
2 - "N"[NUMBER,22]

The estimated cost is higher: the index read is cheaper (cost=14 instead of 46), but the sort operation brings the total to 82. We can see additional columns in the execution plan here because the sorting operation needs a workarea in memory (estimated 478K, of which 424K was actually used during execution). Note that the multiblock read has a few blocks of overhead (51 blocks read instead of 48) because it has to read the segment header to identify the extents to scan.

Postgres Index Only Scan

In PostgreSQL there’s only one way to scan indexes: random reads by following the chain of leaf blocks. This returns the rows in the order of the index and does not require an additional sort:


explain (analyze,verbose,costs,buffers) select n from demo1 where n is not null order by n ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using demo1_n on public.demo1 (cost=0.29..295.29 rows=10000 width=4) (actual time=0.125..1.277 rows=10000 loops=1)
Output: n
Index Cond: (demo1.n IS NOT NULL)
Heap Fetches: 0
Buffers: shared hit=30
Planning time: 0.532 ms
Execution time: 1.852 ms

In the previous posts, we have seen a cost of 0.29..270.29 for the Index Only Scan. Here we have an additional cost of 25 for the cpu_operator_cost, because I’ve added the ‘where n is not null’ predicate. As the default constant is 0.0025, this is the query planner estimating its evaluation for 10000 rows.

First Rows

The Postgres cost always shows two values. The first one is the startup cost: the cost just before the first row can be returned. Some operations have a very small startup cost; others are blocking operations that must complete before sending their first result row. Here, as we have no sort operation, the first row retrieved from the index can be returned immediately, and the startup cost is small: 0.29.
In Oracle you can see the initial cost by optimizing the plan to retrieve the first row, with the FIRST_ROWS() hint:


PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 0fjk9vv4g1q1w, child number 0
-------------------------------------
select /*+ first_rows(1) */ n from demo1 where n is not null order by
n
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 2 (100)| 10000 |00:00:00.01 | 48 |
| 1 | INDEX FULL SCAN | DEMO1_N | 1 | 10000 | 2 (0)| 10000 |00:00:00.01 | 48 |
---------------------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "N"[NUMBER,22]

The actual number of blocks read (48) is the same as before because I finally fetched all rows, but the cost is small because it was estimated for two rows only. Of course, we can also tell Postgres or Oracle that we want only the first rows. This is for the next post.

Character strings

The previous example is an easy one because the column N is a number, and both Oracle and Postgres store numbers in a binary format that follows the same order as the numbers. But that’s different for character strings. If you are not in America, there is very little chance that the order you want to see follows the ASCII order. Here I’ve run a similar query using the column X instead of N, which is a text (VARCHAR2 in Oracle):

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID fsqk4fg1t47v5, child number 0
-------------------------------------
select /*+ */ x from demo1 where x is not null order by x
--------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 2493 (100)| 10000 |00:00:00.27 | 1644 | 18 | | | |
| 1 | SORT ORDER BY | | 1 | 10000 | 2493 (1)| 10000 |00:00:00.27 | 1644 | 18 | 32M| 2058K| 29M (0)|
|* 2 | INDEX FAST FULL SCAN| DEMO1_X | 1 | 10000 | 389 (0)| 10000 |00:00:00.01 | 1644 | 18 | | | |
--------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("X" IS NOT NULL)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=1) NLSSORT("X",'nls_sort=''FRENCH''')[2000], "X"[VARCHAR2,1000]
2 - "X"[VARCHAR2,1000]

I have created an index on X, and as you can see it can be used to get all X values, but with an Index Fast Full Scan: the multiblock index-only access which is fast but does not return rows in the order of the index. A sort operation is then applied. I can force an Index Full Scan with the INDEX() hint, but the sort would still have to be done.

The reason can be seen in the Column Projection note. My Oracle client application runs on a laptop where the OS is in French, and Oracle orders the rows according to what this end-user expects. This is National Language Support: an Oracle database can be accessed by users all around the world, and they will see ordered lists, date formats, decimal separators,… according to their country and language.

ORDER BY … COLLATE …

My databases have been created on a system which is in English. In Postgres we can get the results sorted in French with the COLLATE option of ORDER BY:


explain (analyze,verbose,costs,buffers) select x from demo1 where x is not null order by x collate "fr_FR" ;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=5594.17..5619.17 rows=10000 width=1036) (actual time=36.163..37.254 rows=10000 loops=1)
Output: x, ((x)::text)
Sort Key: demo1.x COLLATE "fr_FR"
Sort Method: quicksort Memory: 1166kB
Buffers: shared hit=59
-> Index Only Scan using demo1_x on public.demo1 (cost=0.29..383.29 rows=10000 width=1036) (actual time=0.156..1.559 rows=10000 loops=1)
Output: x, x
Index Cond: (demo1.x IS NOT NULL)
Heap Fetches: 0
Buffers: shared hit=52
Planning time: 0.792 ms
Execution time: 38.264 ms

Same idea here as in Oracle: there is an additional sort operation, a blocking operation that must complete before the first row can be returned.

The detail of the cost is the following (the components are summed up below):

  • The index on the column X has 52 blocks, which is estimated at cost=208 (random_page_cost=4)
  • We have 10000 index entries to process, estimated at cost=50 (cpu_index_tuple_cost=0.005)
  • We have 10000 result rows to process, estimated at cost=100 (cpu_tuple_cost=0.01)
  • We have evaluated 10000 ‘is not null’ conditions, estimated at cost=25 (cpu_operator_cost=0.0025)
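Summing up: 208+50+100+25=383 which, with the startup cost of 0.29, matches the cost=0.29..383.29 reported for the Index Only Scan.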

In Oracle we can use the same COLLATE syntax, but the name of the language is different: it is consistent across platforms rather than taken from the OS:


PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 82az4syppyndf, child number 0
-------------------------------------
select /*+ */ x from demo1 where x is not null order by x collate "French"
-----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 2493 (100)| 10000 |00:00:00.28 | 1644 | | | |
| 1 | SORT ORDER BY | | 1 | 10000 | 2493 (1)| 10000 |00:00:00.28 | 1644 | 32M| 2058K| 29M (0)|
|* 2 | INDEX FAST FULL SCAN| DEMO1_X | 1 | 10000 | 389 (0)| 10000 |00:00:00.01 | 1644 | | | |
-----------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("X" IS NOT NULL)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=1) NLSSORT("X" COLLATE "French",'nls_sort=''FRENCH''')[2000], "X"[VARCHAR2,1000]
2 - "X"[VARCHAR2,1000]

In Oracle, we do not need to use the COLLATE option. The language can be set for the session (NLS_LANGUAGE=’French’) or from the client environment (NLS_LANG=’=French_.’). Oracle can share cursors across sessions (to avoid wasting resources compiling and optimizing the same statements used by different sessions), but it will not share execution plans among different NLS environments because, as we have seen, the plan can be different. Postgres does not have to manage that, because each PREPARE statement does a full compilation and optimization: there is no cursor sharing in Postgres.
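As an illustration, a prepared statement in Postgres is private to its session (a sketch):

prepare topx as select x from demo1 where x is not null order by x collate "fr_FR";
execute topx;
deallocate topx;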

Indexing for different languages

We have seen in the Oracle execution plan’s Column Projection Information that an NLSSORT() operation is applied to the column, to get a value that follows the collation order of the language. And we have seen in the previous post that we can index a function of a column. So we can create an index per language. The following index will save French users from the sort operation:

create index demo1_x_fr on demo1(nlssort(x,'NLS_SORT=French'));

Since 12cR2 we can create the same index with the COLLATE syntax:

create index demo1_x_fr on demo1(x collate "French");

Both syntaxes create the same index, which can be used by queries with ORDER BY … COLLATE, or by sessions that set NLS_LANGUAGE:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 82az4syppyndf, child number 0
-------------------------------------
select /*+ */ x from demo1 where x is not null order by x collate "French"
-----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
-----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 4770 (100)| 10000 |00:00:00.02 | 4772 |
|* 1 | TABLE ACCESS BY INDEX ROWID| DEMO1 | 1 | 10000 | 4770 (1)| 10000 |00:00:00.02 | 4772 |
| 2 | INDEX FULL SCAN | DEMO1_X_FR | 1 | 10000 | 3341 (1)| 10000 |00:00:00.01 | 3341 |
-----------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("X" IS NOT NULL)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "X"[VARCHAR2,1000] 2 - "DEMO1".ROWID[ROWID,10], "DEMO1"."SYS_NC00004$"[RAW,2000]

There’s no sort operation here as the INDEX FULL SCAN returns the rows in order.

PostgreSQL has the same syntax:

create index demo1_x_fr on demo1(x collate "fr_FR");

and then the query can use this index and bypass the sort operation:

explain (analyze,verbose,costs,buffers) select x from demo1 where x is not null order by x collate "fr_FR" ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using demo1_x_fr on public.demo1 (cost=0.29..383.29 rows=10000 width=1036) (actual time=0.190..1.654 rows=10000 loops=1)
Output: x, x
Index Cond: (demo1.x IS NOT NULL)
Heap Fetches: 0
Buffers: shared hit=32 read=20
Planning time: 1.049 ms
Execution time: 2.304 ms

Avoiding a sort operation can really improve the performance of queries in two ways: it saves the resources required by the sort (which has to spill to disk when the workarea does not fit in memory), and it avoids a blocking operation, making it possible to return the first rows quickly.

We have seen how indexes can be used to access a subset of columns from a smaller structure, and how they can be used to access a sorted version of the rows. Future posts will show how index access is used to quickly filter a subset of rows. But for the moment I’ll continue on this blocking-operation topic. We have seen a lot of Postgres costs, and they have two values (startup cost and total cost). More on the startup cost in the next post.

 


Postgres vs. Oracle access paths V – FIRST ROWS and MIN/MAX


We have seen in the previous post how an index can avoid a sort operation, which is a blocking operation: without it, the startup cost is minimal and the first rows can be returned immediately. This is often desired when displaying rows on the user’s screen. Here is more about the Postgres startup cost, Oracle first_rows costing, and fetching the first rows only.

Here is the execution plan we had in Oracle to get the values of N sorted. The cost for Oracle is the cost to read the index leaves: estimated at 46 random reads:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID dbck3rgnqbakg, child number 0
-------------------------------------
select /*+ */ n from demo1 where n is not null order by n
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 46 (100)| 10000 |00:00:00.01 | 48 |
| 1 | INDEX FULL SCAN | DEMO1_N | 1 | 10000 | 46 (0)| 10000 |00:00:00.01 | 48 |
---------------------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "N"[NUMBER,22]

In PostgreSQL, we have two costs (cost=0.29..295.29):

explain (analyze,verbose,costs,buffers) select n from demo1 where n is not null order by n ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using demo1_n on public.demo1 (cost=0.29..295.29 rows=10000 width=4) (actual time=0.194..2.026 rows=10000 loops=1)
Output: n
Index Cond: (demo1.n IS NOT NULL)
Heap Fetches: 0
Buffers: shared hit=30
Planning time: 1.190 ms
Execution time: 2.966 ms

I explained where the total cost (295.29) comes from (the components are summed up below):

  • The index on the column N has 30 blocks, which is estimated at cost=120 (random_page_cost=4)
  • We have 10000 index entries to process, estimated at cost=50 (cpu_index_tuple_cost=0.005)
  • We have 10000 result rows to process, estimated at cost=100 (cpu_tuple_cost=0.01)
  • We have evaluated 10000 ‘is not null’ conditions, estimated at cost=25 (cpu_operator_cost=0.0025)
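Summing up: 120+50+100+25=295 which, with the startup cost of 0.29, gives the total cost of 295.29.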

But the Postgres EXPLAIN also shows the startup cost (0.29), which is the cost before returning the first row (only a few cpu_operator_cost units here).

From that, I can guess that fetching 1 row will have the following cost:

  • The startup cost of 0.29
  • Read the first index page, cost=4 (random_page_cost=4)
  • 1 index entry to process at cpu_index_tuple_cost=0.005
  • 1 result row to process, estimated at cpu_tuple_cost=0.01
  • 1 ‘is not null’ condition, estimated at cpu_operator_cost=0.0025

This should be approximately cost=4.3075 for one row: roughly the cost to read one index page. We will see later that the query planner does not count this first index page.

Oracle First Rows

In Oracle, we have only the total cost in the execution plan, but we can estimate the cost to retrieve 1 row with the FIRST_ROWS(1) hint:


PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 0fjk9vv4g1q1w, child number 0
-------------------------------------
select /*+ first_rows(1) */ n from demo1 where n is not null order by
n
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 2 (100)| 10000 |00:00:00.01 | 48 |
| 1 | INDEX FULL SCAN | DEMO1_N | 1 | 10000 | 2 (0)| 10000 |00:00:00.01 | 48 |
---------------------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "N"[NUMBER,22]

The cost here is small, estimated at 2 random reads (1 B*Tree branch and 1 leaf), which is sufficient to get the first row. Of course, I estimated it for one row but finally retrieved all of them (A-Rows=10000), reading all the blocks (Buffers=48). However, my execution plan is optimized for fetching one row.

Fetch first rows

I can run the previous query and finally fetch only one row, but I can also explicitly limit the result to one row. If you use older versions of Oracle, you may have used the ‘rownum’ way of limiting rows, which implicitly adds the first_rows hint. Here I’m using the FETCH FIRST syntax, and I need to explicitly add the FIRST_ROWS() hint to get the plan optimized for that:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 9bcm542sk64az, child number 0
-------------------------------------
select /*+ first_rows(1) */ n from demo1 where n is not null order by n fetch first 1 row only
---------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 2 (100)| 1 |00:00:00.01 | 3 |
|* 1 | VIEW | | 1 | 1 | 2 (0)| 1 |00:00:00.01 | 3 |
|* 2 | WINDOW NOSORT STOPKEY| | 1 | 1 | 2 (0)| 1 |00:00:00.01 | 3 |
| 3 | INDEX FULL SCAN | DEMO1_N | 1 | 10000 | 2 (0)| 2 |00:00:00.01 | 3 |
---------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=1)
2 - filter(ROW_NUMBER() OVER ( ORDER BY "N")<=1)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "from$_subquery$_002"."N"[NUMBER,22], "from$_subquery$_002"."rowlimit_$$_rownumber"[NUMBER,22] 2 - (#keys=1) "N"[NUMBER,22], "DEMO1".ROWID[ROWID,10], ROW_NUMBER() OVER ( ORDER BY "N")[22] 3 - "DEMO1".ROWID[ROWID,10], "N"[NUMBER,22]

The cost is the same, estimated at 2 random reads, but we see how Oracle implements FETCH FIRST: with window functions. Only one row has been fetched (A-Rows), reading 3 blocks (Buffers). Note that because the index is sorted, the window function is a NOSORT operation.

Postgres

I can run the same query on PostgreSQL and get the execution plan:

explain (analyze,verbose,costs,buffers) select n from demo1 where n is not null order by n fetch first 1 row only;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.29..0.31 rows=1 width=4) (actual time=0.124..0.124 rows=1 loops=1)
Output: n
Buffers: shared hit=3
-> Index Only Scan using demo1_n on public.demo1 (cost=0.29..295.29 rows=10000 width=4) (actual time=0.124..0.124 rows=1 loops=1)
Output: n
Index Cond: (demo1.n IS NOT NULL)
Heap Fetches: 0
Buffers: shared hit=3
Planning time: 0.576 ms
Execution time: 0.143 ms

Here, the total cost of the query is lower than the total cost of the Index Only Scan, because we know we will not read all index entries. The total cost of the query (0.31) is based on the startup cost (0.29) of the index access. I suppose there is 0.01 for the cpu_tuple_cost, but I expected to see the cost of getting the first page, because we cannot get a row without reading a whole page. My guess is that Postgres divides the total cost (295) by the number of rows (10000) and uses that as a per-row estimate. This makes sense for a large number of rows but underestimates the cost of getting the first page.
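With the numbers above, this linear interpolation gives (a reconstruction of the planner’s arithmetic):

-- startup + (total - startup) * rows_fetched / estimated_rows
-- 0.29 + (295.29 - 0.29) * 1/10000 = 0.29 + 0.0295 = 0.3195, displayed as 0.31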

In order to validate my guess, I force a Seq Scan, which has a higher cost, and fetch the first 5 rows:

explain (analyze,verbose,costs,buffers) select n from demo1 where n is not null fetch first 5 row only ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..0.76 rows=5 width=4) (actual time=0.026..0.029 rows=5 loops=1)
Output: n
Buffers: shared hit=1
-> Seq Scan on public.demo1 (cost=0.00..1529.00 rows=10000 width=4) (actual time=0.022..0.024 rows=5 loops=1)
Output: n
Filter: (demo1.n IS NOT NULL)
Buffers: shared hit=1
Planning time: 1.958 ms
Execution time: 0.057 ms

My guess is: ( 1529.00 / 10000 ) * 5 = 0.7645 which is exactly the cost estimated for the Limit operation. This approximation does not take the page granularity into account.

MIN/MAX

The “order by n fetch first 1 row only” above finally reads only one index entry, the first one, and returns the indexed value. We can get the same value with a “select min(n)”, and Oracle has a special operation for that:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 29bsqfg69nudp, child number 0
-------------------------------------
select /*+ */ min(n) from demo1
-------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 2 (100)| 1 |00:00:00.01 | 2 |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:00.01 | 2 |
| 2 | INDEX FULL SCAN (MIN/MAX)| DEMO1_N | 1 | 1 | 2 (0)| 1 |00:00:00.01 | 2 |
-------------------------------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=0) MIN("N")[22] 2 - "N"[NUMBER,22]

This goes through the index branches (blevel=1 here in this small index, so the root is the first and only branch level) to the first leaf, in order to get the value from the first entry. This has read 2 blocks here. The same can be done to get the last index entry in case we “select max(N)”.

Postgres does not show a special operation for it, but a plan which is very similar to the one we have seen above when fetching the first row: an Index Only Scan, with a Limit:


explain (analyze,verbose,costs,buffers) select min(n) from demo1 ;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=0.31..0.32 rows=1 width=4) (actual time=0.123..0.124 rows=1 loops=1)
Output: $0
Buffers: shared hit=3
InitPlan 1 (returns $0)
-> Limit (cost=0.29..0.31 rows=1 width=4) (actual time=0.121..0.121 rows=1 loops=1)
Output: demo1.n
Buffers: shared hit=3
-> Index Only Scan using demo1_n on public.demo1 (cost=0.29..295.29 rows=10000 width=4) (actual time=0.119..0.119 rows=1 loops=1)
Output: demo1.n
Index Cond: (demo1.n IS NOT NULL)
Heap Fetches: 0
Buffers: shared hit=3
Planning time: 0.415 ms
Execution time: 0.140 ms

If we look at the ‘Index Only Scan’ we see exactly what I had at the top of this post with “select n from demo1 where n is not null order by n”.

Above it, there’s the Limit, which is exactly the same as the one with “fetch first 1 row only”, because the query planner understands that getting MIN(N) is the same as getting the first value from the index ordered on N.

This is processed as a non-correlated subquery (query block), also called an InitPlan. Its result ($0) is consumed by the Result node, with an additional cost of 0.01 for the cpu_tuple_cost of this additional step. I don’t really know the reason for this extra step here, but anyway the cost is minimal. Basically, both Oracle and Postgres take advantage of the index structure to get the minimum – or first value – from the sorted index entries.
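Presumably the same mechanism works in the other direction: a MAX(N) should read the last index entry with a backward scan. A quick sketch to verify (my addition; the expected plan shape is an assumption based on how Postgres handles ordered index scans):

explain select max(n) from demo1 ;
-- expected: the same InitPlan and Limit, over an
-- Index Only Scan Backward using demo1_n, reading the last leaf first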

In this series, I’m running very simple queries in order to show how it works. In this post, we reached the minimum: one column and one row. The next post will finally select one additional column, which is not in the index.

 


Postgres vs. Oracle access paths VI – Index Scan


In the previous post my queries were still reading only the indexed column, from a table which had no modifications since the last vacuum, and then didn’t need to read table pages: it was an Index Only Scan. However, we often need more columns than the ones that are in the index. Here is the Index Scan access path.

I’m continuing on the table that I’ve created in the first post of the series. I’ve run VACUUM (the lazy one, not the full one) and did not do any modification after that, as we have seen that Index Only Access is efficient only when there are no modifications.

create table demo1 as select generate_series n , 1 a , lpad('x',1000,'x') x from generate_series(1,10000);
SELECT 10000
create unique index demo1_n on demo1(n);
CREATE INDEX
vacuum demo1;
VACUUM

I have 10000 rows, a unique column N with sequential numbers, which is indexed, and another column A which is not indexed.

Index Only Scan

I’ll now query one row, the one with N=1000.

explain (analyze,verbose,costs,buffers) select n from demo1 where n=1000 ;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Index Only Scan using demo1_n on public.demo1 (cost=0.29..4.30 rows=1 width=4) (actual time=0.123..0.124 rows=1 loops=1)
Output: n
Index Cond: (demo1.n = 1000)
Heap Fetches: 0
Buffers: shared hit=3
Planning time: 0.625 ms
Execution time: 0.137 ms

It seems that the query planner estimates this as one block to read (a quick check of the sum follows the list):

  • The startup cost of 0.29 as we have seen before
  • Read one index page, cost=4 (random_page_cost=4)
  • 1 result row to process, estimated at cpu_tuple_cost=0.01
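Adding these up: 0.29 + 4 + 0.01 = 4.30, which matches the total cost displayed (cost=0.29..4.30).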

As the index is a B*Tree with 30 pages, I expect to read at least one branch in addition to the leaf block. The execution has actually read 3 blocks (Buffers: shared hit=3). Here it seems that Postgres decides to ignore the branches and count only the leaf blocks.

In Oracle, the estimation is cost=1 and the execution has read 2 blocks:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID gusay436hpzck, child number 0
-------------------------------------
select /*+ */ n from demo1 where n=1000
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 (100)| 1 |00:00:00.01 | 2 |
|* 1 | INDEX UNIQUE SCAN| DEMO1_N | 1 | 1 | 1 (0)| 1 |00:00:00.01 | 2 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("N"=1000)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "N"[NUMBER,22]

Both Oracle and Postgres read only the index here. This is the fastest access to one indexed column: no need to read the table because the column is in the index. The use case is quite limited: just testing the existence of a value, as sketched below. After that, I will select another column than the one used in the where clause.
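For example, an existence test (my illustration, not from the original post) which should be answered from the index alone:

select exists (select 1 from demo1 where n = 1000) as n_exists;
-- expected: an Index Only Scan on demo1_n, without any heap access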

Select another column

I filter on N but now query the column A which is not in the index. The Index Only Scan changes to an Index Scan:

explain (analyze,verbose,costs,buffers) select a from demo1 where n=1000 ;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Index Scan using demo1_n on public.demo1 (cost=0.29..8.30 rows=1 width=4) (actual time=0.010..0.010 rows=1 loops=1)
Output: a
Index Cond: (demo1.n = 1000)
Buffers: shared hit=3
Planning time: 0.639 ms
Execution time: 0.030 ms

The cost is the same except that there is one additional page to read, which pushes it to cost=8.30:

  • The startup cost of 0.29 as we have seen before
  • Read one index page, and one table page: cost=8 (random_page_cost=4)
  • 1 result row to process, estimated at cpu_tuple_cost=0.01

In Oracle, it is not a different operation. We still have the INDEX UNIQUE SCAN, but with an additional operation to read the table: TABLE ACCESS BY INDEX ROWID. The index entry returns the ROWID (the physical address of the table row, equivalent to the Postgres TID). And then we have the detail of the cost and the execution buffer reads: one more block.

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 8q4tcxgk1n1vn, child number 0
-------------------------------------
select /*+ */ a from demo1 where n=1000
--------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
--------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 2 (100)| 1 |00:00:00.01 | 3 |
| 1 | TABLE ACCESS BY INDEX ROWID| DEMO1 | 1 | 1 | 2 (0)| 1 |00:00:00.01 | 3 |
|* 2 | INDEX UNIQUE SCAN | DEMO1_N | 1 | 1 | 1 (0)| 1 |00:00:00.01 | 2 |
--------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("N"=1000)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "A"[NUMBER,22] 2 - "DEMO1".ROWID[ROWID,10]

The important thing here is in the predicate information, where we see the part of the where clause which is not a filter applied after the scan, but which is used for direct access by the index. It is displayed as access() in the Oracle execution plan:

access("N"=1000)

In the PostgreSQL execution plan, the same information is displayed as ‘Index Cond':

Index Cond: (demo1.n = 1000)

Postgres Range Scan

That was retrieving only one row with an equality predicate on a unique index column. The index scan helps to get directly to the value because of the B*Tree structure. As the index is sorted, an inequality predicate can also use the index to find the rows in a range of values.

The Postgres plan looks the same, with Index Scan:

explain (analyze,verbose,costs,buffers) select a from demo1 where n<=1000 ;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Index Scan using demo1_n on public.demo1 (cost=0.29..175.78 rows=1000 width=4) (actual time=0.029..0.780 rows=1000 loops=1)
Output: a
Index Cond: (demo1.n <= 1000)
Buffers: shared hit=147
Planning time: 1.019 ms
Execution time: 0.884 ms

Same plan but of course we have more index blocks to scan, and more rows to fetch from the table, which is why the cost is higher.

In order to understand the cost, I’ve changed the query planner constants one by one (a way to reproduce this is sketched after the list). Here is what I got:

  • (cost=0.29..33.78 rows=1000 width=4) when seq_page_cost=0 instead of 1, which means that it estimates (175.78-33.78)/1=142 sequential reads
  • (cost=0.29..159.78 rows=1000 width=4) when random_page_cost=0 instead of 4, which means that it estimates (175.78-159.78)/4=4 random reads
  • (cost=0.29..165.78 rows=1000 width=4) when cpu_tuple_cost=0 instead of 0.01, which means that it estimates (175.78-165.78)/0.01=1000 rows
  • (cost=0.29..170.78 rows=1000 width=4) when cpu_index_tuple_cost=0 instead of 0.005, which means that it estimates (175.78-170.78)/0.005=1000 index entries
  • (cost=0.00..173.00 rows=1000 width=4) when cpu_operator_cost=0 instead of 0.0025, which means that it estimates (175.78-173.00)/0.0025=1112 cpu operations (116 for initial cost + 996 to get all rows)
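Here is a sketch of the method for one of these constants (session-level settings; my reconstruction, not the original commands):

set seq_page_cost = 0;
explain select a from demo1 where n<=1000 ;
-- the total cost should drop by the sequential-read component (about 142 here)
reset seq_page_cost;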

I understand the 4 random reads for the index pages. However, I expected random reads, and not sequential reads, to fetch the rows from the table. But this is a case where the clustering factor is very good: the rows have been inserted in the same order as the indexed column, which means that those reads from the table probably read consecutive pages.

In order to validate this guess, I’ve traced the system calls on Linux:

25734 open("base/12924/42427", O_RDWR) = 42
25734 lseek(42, 0, SEEK_END) = 11706368
25734 open("base/12924/42433", O_RDWR) = 43
25734 lseek(43, 0, SEEK_END) = 245760

The file descriptor 42 is my table (demo1) and the descriptor 43 is the index (demo1_n). The file name is in the open() call and it includes the file id:

select relname,relfilenode from pg_class where relname='demo1';
-[ RECORD 1 ]--+------
relname | demo1
relfilenode | 42427
 
select relname,relfilenode from pg_class where relname='demo1_n';
-[ RECORD 1 ]--+--------
relname | demo1_n
relfilenode | 42433

Then we see some random reads from the index (branches and first leaf):

25734 lseek(43, 0, SEEK_SET) = 0
25734 read(43, "100036037360374 b152"..., 8192) = 8192
25734 lseek(43, 24576, SEEK_SET) = 24576
25734 read(43, "121000836360374 35023720330237 "..., 8192) = 8192
25734 lseek(43, 8192, SEEK_SET) = 8192
25734 read(43, "13245t360374 211 340237 "..., 8192) = 8192

Then we see 53 reads from the table:

25734 lseek(42, 0, SEEK_SET) = 0
25734 read(42, "40042203 4 36023330103402273010"..., 8192) = 8192
25734 read(42, "40042203 4 36023330103402273010"..., 8192) = 8192
25734 read(42, "40042203 4 36023330103402273010"..., 8192) = 8192
...

Only one lseek. The other reads are all single-block (8k) I/O calls without a seek, which means that they are sequential. When relying on filesystem prefetching, this may avoid the latency of each I/O call.

Then the next leaf block from the index is read, and then 52 reads from the table (no lseek):

25734 read(43, "13245t360374 211 340237 "..., 8192) = 8192
25734 read(42, "40042203 4 36023330103402273010"..., 8192) = 8192
25734 read(42, "40042203 4 36023330103402273010"..., 8192) = 8192
25734 read(42, "40042203 4 36023330103402273010"..., 8192) = 8192
...

And again, one index block and 38 contiguous table blocks:

25734 lseek(43, 32768, SEEK_SET) = 32768
25734 read(43, "13245t360374 211 340237 "..., 8192) = 8192
25734 read(42, "40042203 4 36023330103402273010"..., 8192) = 8192
25734 read(42, "40042203 4 36023330103402273010"..., 8192) = 8192
25734 read(42, "40042203 4 36023330103402273010"..., 8192) = 8192
...

Here is the summary of the cost of 175.78 (with a check of the sum after the list):

  • The startup cost of 0.29 as we have seen before
  • Estimates 4 random reads (reading 1000 rows from a 30-page index which contains 10000 rows): cost=16 (random_page_cost=4)
  • Estimates 142 sequential reads: cost=142 (seq_page_cost=1)
  • 1000 index entries to process, estimated at cost=5 (cpu_index_tuple_cost=0.005)
  • 1000 result row to process, estimated at cost=10 (cpu_tuple_cost=0.01)
  • about 1000 operators or functions estimated at cpu_operator_cost=0.0025
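As a check (the 0.29 startup is included in the ~2.78 operator component, being the first 116 operations): 16 + 142 + 5 + 10 + 2.78 = 175.78, which matches the total cost.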

The very interesting thing here is that the query planner is totally aware of the clustering factor and uses sequential read estimation.

Oracle Range Scan

Here is the same query on the similar table on Oracle:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID a3gqx19xs9wxq, child number 0
-------------------------------------
select /*+ */ a from demo1 where n<=1000
----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 147 (100)| 1000 |00:00:00.01 | 148 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| DEMO1 | 1 | 1000 | 147 (0)| 1000 |00:00:00.01 | 148 |
|* 2 | INDEX RANGE SCAN | DEMO1_N | 1 | 1000 | 4 (0)| 1000 |00:00:00.01 | 4 |
----------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("N"<=1000)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "A"[NUMBER,22] 2 - "DEMO1".ROWID[ROWID,10]

The strace output shows calls to pread():

open("/u01/oradata/CDB1A/PDB/users01.dbf", O_RDWR|O_DSYNC) = 7
fcntl(7, F_SETFD, FD_CLOEXEC) = 0
fcntl(7, F_DUPFD, 256) = 258
fcntl(258, F_SETFD, FD_CLOEXEC) = 0
close(7) = 0
pread(258, "62422313G275"142532'!1?275""..., 8192, 2252800 ) = 8192
pread(258, "62422413C275"14x2432'!1?275""..., 8192, 2260992 ) = 8192
pread(258, "6242313v3362274"24b+1&!1354274""..., 8192, 24731648 ) = 8192
pread(258, "6242314v3362274"24e*1&!1354274""..., 8192, 24739840 ) = 8192
pread(258, "6242315v3362274"24d51&!1354274""..., 8192, 24748032 ) = 8192
pread(258, "6242316v3362274"24g41&!1354274""..., 8192, 24756224 ) = 8192
pread(258, "6242317v3362274"24f71&!1354274""..., 8192, 24764416 ) = 8192
pread(258, "6242320v3362274"24y71&!1354274""..., 8192, 24772608 ) = 8192

pread() is similar to lseek()+read() here and, as far as I know, Linux detects when there is no need to seek, which allows prefetching as well. Oracle also has its own prefetching but I will not go into detail here (read Timur Akhmadeev on the Pythian blog about this).

With Oracle, there is no need to run strace because all system calls are instrumented as ‘wait events’. Here is a trace:

PARSE #140375247563104:c=2000,e=1872,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=187737470,tim=53267437268
EXEC #140375247563104:c=0,e=147,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=187737470,tim=53267437481
WAIT #140375247563104: nam='SQL*Net message to client' ela= 4 driver id=1413697536 #bytes=1 p3=0 obj#=74022 tim=53267437532
WAIT #140375247563104: nam='db file sequential read' ela= 8 file#=12 block#=275 blocks=1 obj#=74023 tim=53267437679
WAIT #140375247563104: nam='db file sequential read' ela= 5 file#=12 block#=276 blocks=1 obj#=74023 tim=53267437785
WAIT #140375247563104: nam='db file sequential read' ela= 5 file#=12 block#=3019 blocks=1 obj#=74022 tim=53267437902
FETCH #140375247563104:c=0,e=368,p=3,cr=3,cu=0,mis=0,r=1,dep=0,og=1,plh=187737470,tim=53267437977
WAIT #140375247563104: nam='PGA memory operation' ela= 14 p1=0 p2=0 p3=0 obj#=74022 tim=53267438017
WAIT #140375247563104: nam='SQL*Net message from client' ela= 280 driver id=1413697536 #bytes=1 p3=0 obj#=74022 tim=53267438385
WAIT #140375247563104: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=74022 tim=53267438419
WAIT #140375247563104: nam='db file sequential read' ela= 3 file#=12 block#=3020 blocks=1 obj#=74022 tim=53267438443
WAIT #140375247563104: nam='PGA memory operation' ela= 7 p1=1114112 p2=2 p3=0 obj#=74022 tim=53267438475
WAIT #140375247563104: nam='db file sequential read' ela= 5 file#=12 block#=3021 blocks=1 obj#=74022 tim=53267438504
WAIT #140375247563104: nam='db file sequential read' ela= 3 file#=12 block#=3022 blocks=1 obj#=74022 tim=53267438532
WAIT #140375247563104: nam='db file sequential read' ela= 2 file#=12 block#=3023 blocks=1 obj#=74022 tim=53267438552
WAIT #140375247563104: nam='db file sequential read' ela= 3 file#=12 block#=3024 blocks=1 obj#=74022 tim=53267438576
WAIT #140375247563104: nam='db file sequential read' ela= 4 file#=12 block#=3025 blocks=1 obj#=74022 tim=53267438603
WAIT #140375247563104: nam='db file sequential read' ela= 26 file#=12 block#=3026 blocks=1 obj#=74022 tim=53267438647
WAIT #140375247563104: nam='db file sequential read' ela= 4 file#=12 block#=3027 blocks=1 obj#=74022 tim=53267438680
WAIT #140375247563104: nam='db file sequential read' ela= 2 file#=12 block#=3028 blocks=1 obj#=74022 tim=53267438699
WAIT #140375247563104: nam='db file sequential read' ela= 4 file#=12 block#=3029 blocks=1 obj#=74022 tim=53267438781
WAIT #140375247563104: nam='db file sequential read' ela= 3 file#=12 block#=3030 blocks=1 obj#=74022 tim=53267438807
WAIT #140375247563104: nam='db file sequential read' ela= 28 file#=12 block#=3031 blocks=1 obj#=74022 tim=53267438878
...

The name ‘sequential read’ does not mean the same as the Postgres ‘sequential read’. It only means single-block reads that are done one after the other, but they are actually random reads. However, looking at the block#, they appear to read contiguous blocks.

At the end, because I have an index with a good clustering factor, and because I’m using the defaults on Linux without direct read and asynchronous I/O, the execution is very similar to the Postgres one: read the few index blocks and follow the pointers to the 140 blocks of the table.

The cost estimation looks similar (same number) between Postgres and Oracle but it is not the same unit. Postgres estimates the cost with sequential reads, but Oracle estimates the cost as random reads. In addition to that, Postgres, with its default planner parameters, gives more importance than Oracle to the CPU usage.

This is the good case of Index Access where we have a good clustering/correlation factor between the physical order of the table and the logical order of the index. The random reads finally behave as sequential reads because there is no seek() between them. You can imagine that in the next post I’ll try the same with a very bad clustering factor.

 


Postgres vs. Oracle access paths VIII – Index Scan and Filter


In the previous post we have seen a nice optimization to lower the consequences of bad correlation between the index and the table physical order: a bitmap, which may include false positives and then requires a ‘recheck’ of the condition, but with the goal to read each page only once. Now we are back to the well-clustered table where we have seen two possible access paths: IndexOnlyScan when all columns we need are in the index, and IndexScan when we select additional columns. Here is a case in the middle: the index does not have all the columns required by the select, but can eliminate all rows.

The table created is:

create table demo1 as select generate_series n , 1 a , lpad('x',1000,'x') x from generate_series(1,10000);
SELECT 10000
create unique index demo1_n on demo1(n);
CREATE INDEX
vacuum demo1;
VACUUM

Index Only Scan and Filter

I use only one column (N), which is indexed, in the SELECT clause and the WHERE clause. And this WHERE clause is silly: in addition to the n<=1000 I’ve used in the previous post to focus on 10% of rows, I add a condition which is always false: mod(n,100)=1000

explain (analyze,verbose,costs,buffers) select n from demo1 where n<=1000 and mod(n,100)=1000 ;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Index Only Scan using demo1_n on public.demo1 (cost=0.29..38.78 rows=5 width=4) (actual time=0.276..0.276 rows=0 loops=1)
Output: n
Index Cond: (demo1.n <= 1000)
Filter: (mod(demo1.n, 100) = 1000)
Rows Removed by Filter: 1000
Heap Fetches: 0
Buffers: shared hit=5
Planning time: 0.454 ms
Execution time: 0.291 ms

Index Only Scan is used here because no other columns are needed. The n<=1000 is the access condition (Index Cond) doing a range scan on the index structure. The mod(n,100)=1000 is a filter predicate applied to the result of the index access (Filter), and we get the additional information that the 1000 rows selected by the access predicate have all been filtered out (Rows Removed by Filter). During the execution, 5 index buffers have been read for the range scan (branches + leaves). Because I vacuumed and made no changes since, the visibility map shows that all rows are visible, and no blocks have to be read from the table (Heap Fetches: 0).

Now I’ll select another column in order to see an Index Scan. We have seen in the previous post that the huge cost of index access is the access to the table. Filtering most of the rows from the index entries is the most common recommendation to optimize a query. And my example here is running the extreme case: a predicate on the indexed column removes all rows.

Index Scan and Filter

I’ve just changed the ‘select n’ to ‘select a':


explain (analyze,verbose,costs,buffers) select a from demo1 where n<=1000 and mod(n,100)=1000 ;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Index Scan using demo1_n on public.demo1 (cost=0.29..184.78 rows=5 width=4) (actual time=0.427..0.427 rows=0 loops=1)
Output: a
Index Cond: (demo1.n <= 1000)
Filter: (mod(demo1.n, 100) = 1000)
Rows Removed by Filter: 1000
Buffers: shared hit=147
Planning time: 0.434 ms
Execution time: 0.440 ms

I can understand that the cost is higher. The optimizer may not know that mod(n,100) will never be equal to 1000. Estimating 5 rows, as in the previous case, is ok for me. We see different Output (different SELECT clause) but same information about Index Cond, Filter, and Rows Removed (same WHERE clause). The estimation part looks good.

However, there’s something that I can’t understand. At execution time, we know that all rows can be eliminated before going to the table: we go to the table to get the value of A, but all rows were already filtered out from the index. At least that was the case with the Index Only Scan, and the filter condition uses only values that are present in the index.

However, 147 blocks were read here. We have seen that the index scan reads 5 index pages, so we can guess that 142 table pages have been read – exactly 10% of the pages of my correlated table. It seems that all rows have been read from the table before being filtered out. The Index Scan being one single operation, the filter seems to occur at the end only. This is only my guess and I hope to get comments about that.

Oracle

With Oracle, the first query, selecting only indexed columns, is an INDEX RANGE SCAN similar to the Postgres one.

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID fj36y2vph9u8f, child number 0
-------------------------------------
select /*+ */ n from demo1 where n<=1000 and mod(n,100)=1000
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 4 (100)| 0 |00:00:00.01 | 3 |
|* 1 | INDEX RANGE SCAN| DEMO1_N | 1 | 10 | 4 (0)| 0 |00:00:00.01 | 3 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("N"<=1000)
filter(MOD("N",100)=1000)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "N"[NUMBER,22]

Oracle does not know either that the filter predicate mod(n,100)=1000 eliminates all rows, and estimates this kind of predicate with a generic selectivity of 1% (giving E-Rows=10) applied after the access predicate returning 10% (this one is calculated from statistics). 3 blocks were read: index branch + leaves.

Selecting an additional column from the table does not change this INDEX RANGE SCAN operation, but just adds one step to go to the table:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 1rpmvq3jj8hgq, child number 0
-------------------------------------
select /*+ */ a from demo1 where n<=1000 and mod(n,100)=1000
----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 6 (100)| 0 |00:00:00.01 | 3 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| DEMO1 | 1 | 10 | 6 (0)| 0 |00:00:00.01 | 3 |
|* 2 | INDEX RANGE SCAN | DEMO1_N | 1 | 10 | 4 (0)| 0 |00:00:00.01 | 3 |
----------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("N"<=1000)
filter(MOD("N",100)=1000)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "A"[NUMBER,22] 2 - "DEMO1".ROWID[ROWID,10]

Having two operations, the filter removes rows from the output of the index range scan at line 2, and the table is visited only for the rows that remain: no additional buffer reads for step 1 when there are no rows. With Oracle, we build indexes to optimize the access predicates and we add columns to them to optimize the filter predicates. We can go further by adding all projected columns and avoid the access to the table completely, but that is not always needed. If we can apply all where-clause filters on the indexed columns, then the access to the table remains proportional to the result, and end-users usually accept a longer response time for a longer result. Index access response time is proportional to the result.
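For example, a hypothetical index for this query (the index name and the idea of adding the filtered expression are my illustration, not from the original post) would let the filter be evaluated on index columns before visiting the table:

create index demo1_n_mod on demo1(n, mod(n,100));
-- access on N, filter on MOD(N,100), and the table is visited only for surviving rows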

The decomposition into two operations is also convenient to see which column projection is done from the index result and which from the table result. Here, the only output of the index range scan at line 2 is the ROWID, and the output of the table access at line 1 is the column we select. So we have two operations here. We have seen that the INDEX RANGE SCAN can run alone. And we will see in the next post that the TABLE ACCESS BY INDEX ROWID can also run alone.

So what?

I hope that Postgres experts will comment about the need to read the table pages even when we can filter all rows from the index scan. We can do something similar by re-writing the query where we can see that the access to the table is never executed:

explain (analyze,verbose,costs,buffers) select a from demo1 where n in (select n from demo1 where n<=1000 and mod(n,100)=1000 ) ;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.57..76.35 rows=5 width=4) (actual time=0.285..0.285 rows=0 loops=1)
Output: demo1.a
Buffers: shared hit=5
-> Index Only Scan using demo1_n on public.demo1 demo1_1 (cost=0.29..34.78 rows=5 width=4) (actual time=0.284..0.284 rows=0 loops=1)
Output: demo1_1.n
Index Cond: (demo1_1.n <= 1000)
Filter: (mod(demo1_1.n, 100) = 1000)
Rows Removed by Filter: 1000
Heap Fetches: 0
Buffers: shared hit=5
-> Index Scan using demo1_n on public.demo1 (cost=0.29..8.30 rows=1 width=8) (never executed)
Output: demo1.n, demo1.a, demo1.x
Index Cond: (demo1.n = demo1_1.n)

But this involves a join, and join methods will deserve another series of blog posts. The next one on access paths will show the TABLE ACCESS BY INDEX ROWID equivalent, Tid Scan. Then I’ll have covered all access paths.

 



Postgres vs. Oracle access paths IX – Tid Scan


In the previous post we have seen how Postgres and Oracle find the table row from the index entry, using the TID / ROWID. I’ll focus on this access path, and then I will have covered all Postgres access paths to table data.

Oracle ACCESS BY ROWID

I start with Oracle because we have already seen the TABLE ACCESS BY ROWID. I’ll decompose an index access to the table. The first step is getting the ROWID from the index entry:

SQL> select /*+ */ rowid from demo1 where n=1000;
 
ROWID
------------------
AAASPkAAMAAABIaAAF

The ROWID contains the data object ID (to be able to identify the segment and then the tablespace), the relative file number within the tablespace, the block number within this file, and the row number within the block. This can be stored in 10 bytes. When in an index entry, except if this is a global index on a partitioned table, we don’t need the object ID (because there’s a one-to-one relationship between the table and the index objects) and only the remaining 6 bytes are stored in the index entry.

This is a simple index access and the output (projection) is the ROWID:
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 32tsqy19ctmd4, child number 0
-------------------------------------
select /*+ */ rowid from demo1 where n=1000
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 (100)| 1 |00:00:00.01 | 2 |
|* 1 | INDEX UNIQUE SCAN| DEMO1_N | 1 | 1 | 1 (0)| 1 |00:00:00.01 | 2 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("N"=1000)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - ROWID[ROWID,10]

Now with the ROWID, I query a column from the table:
SQL> select /*+ */ a from demo1 where rowid='AAASPkAAMAAABIaAAF';
 
A
----------
1

And the plan is exactly the ‘TABLE ACCESS’ part we have seen in previous posts on index scans:
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID c46nq5t0sru8q, child number 0
-------------------------------------
select /*+ */ a from demo1 where rowid='AAASPkAAMAAABIaAAF'
Plan hash value: 3196731035
-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 (100)| 1 |00:00:00.01 | 1 |
| 1 | TABLE ACCESS BY USER ROWID| DEMO1 | 1 | 1 | 1 (0)| 1 |00:00:00.01 | 1 |
-----------------------------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "A"[NUMBER,22]

There’s no Predicate section visible here, but the access is done on the ROWID which contains the file number, block number, and row number. This is the fastest way to get one row: reading only one buffer.

Postgres Tid Scan

Same idea in Postgres, where we can query the TID (Tuple ID):

select ctid from demo1 where n=1000 ;
ctid
---------
(142,6)
(1 row)

Because my table is stored in a file (no tablespace with multiple data files here) the TID contains only the block number and the row number within the block.
explain (analyze,verbose,costs,buffers) select ctid from demo1 where n=1000 ;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Index Scan using demo1_n on public.demo1 (cost=0.29..8.30 rows=1 width=6) (actual time=0.009..0.009 rows=1 loops=1)
Output: ctid
Index Cond: (demo1.n = 1000)
Buffers: shared hit=3
Planning time: 0.429 ms
Execution time: 0.023 ms

We already have seen the cost of this operation: 116 startup operations, 2 index pages read at random_page_cost=4 and 1 result row at cpu_tuple_cost=0.01 (note that the query planner does not count the cpu_index_tuple_cost here).

Then here is the query using this TID:
explain (analyze,verbose,costs,buffers) select a from demo1 where ctid='(142,6)' ;
QUERY PLAN
------------------------------------------------------------------------------------------------------
Tid Scan on public.demo1 (cost=0.00..4.01 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=1)
Output: a
TID Cond: (demo1.ctid = '(142,6)'::tid)
Buffers: shared hit=1
Planning time: 0.351 ms
Execution time: 0.017 ms

The cost estimation is very simple here: 1 seek()+read() at random_page_cost=4 and 1 result row at cpu_tuple_cost=0.01

Since the post on Index Only Scan, I’m working on a vacuumed table with no modifications. Now that I have the simplest access path, I’ll show the same after an update, in the next post.

 


Postgres vs. Oracle access paths X – Update


In the previous post we have seen the cheapest way to get one row, reading only one block from its physical location. But that’s the optimal case where the row has not moved. I’ll (nearly) conclude this series about access paths with an update.

ROWID in Oracle

Here is the ROWID of one row in Oracle:

select rowid from demo1 where n=1000;
ROWID
------------------
AAAR4WAAMAAAAEaAAF

There’s enough information here to get directly to the block with file_name and offset:
select file_name,dbms_rowid.rowid_block_number('AAAR4WAAMAAAAEaAAF')*block_size offset
from dba_data_files join dba_tablespaces using(tablespace_name)
where file_id=dbms_rowid.rowid_to_absolute_fno('AAAR4WAAMAAAAEaAAF','DEMO','DEMO1');
 
FILE_NAME OFFSET
---------------------------------------- ----------
/u01/oradata/CDB1A/PDB/users01.dbf 2310144

The ROWID also contains the index of the row within the block’s row directory:

select dbms_rowid.rowid_row_number('AAAR4WAAMAAAAEaAAF') from dual;
 
DBMS_ROWID.ROWID_ROW_NUMBER('AAAR4WAAMAAAAEAAAF')
-------------------------------------------------
5

TID in Postgres

And here is the TID of the similar row in Postgres:

select ctid from demo1 where n=1000;
ctid
---------
(142,6)

The file is known from the table, as there is only one file per table:

show data_directory;
data_directory
----------------------------
/usr/share/postgresql/data
 
select pg_relation_filepath('demo1');
pg_relation_filepath
----------------------
base/16437/125852

The blocksize is common for the whole database:

show block_size;
block_size
------------
8192

Then the block is at offset 142*8192=1163264 within the file.
Within the block, the row is at index 6.

SELECT

We have seen in the previous post that we can select using the ROWID/TID and Oracle and Postgres behave the same: only one block to read, cost estimation based on one random read:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 8mj3ms08x0sfh, child number 0
-------------------------------------
select /*+ */ a from demo1 where rowid='AAAR4WAAMAAAAEaAAF'
-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 (100)| 1 |00:00:00.01 | 1 |
| 1 | TABLE ACCESS BY USER ROWID| DEMO1 | 1 | 1 | 1 (0)| 1 |00:00:00.01 | 1 |
-----------------------------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "A"[NUMBER,22]

Different units but the same meaning: cost=1 for Oracle is one random read; for Postgres, cost=1 is one sequential read and random reads are estimated at cost=4:

explain (analyze,verbose,costs,buffers) select a from demo1 where ctid='(142,6)' ;
QUERY PLAN
------------------------------------------------------------------------------------------------------
Tid Scan on public.demo1 (cost=0.00..4.01 rows=1 width=4) (actual time=0.007..0.007 rows=1 loops=1)
Output: a
TID Cond: (demo1.ctid = '(142,6)'::tid)
Buffers: shared hit=1
Planning time: 0.358 ms
Execution time: 0.016 ms

Oracle UPDATE

Now I’m updating this row, changing the column X which contains 1000 ‘x’ characters to 1000 ‘y’ characters:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID gpqv4k6m1q20y, child number 0
-------------------------------------
update /*+ */ demo1 set x=lpad('y',1000,'y') where rowid='AAAR4WAAMAAAAEaAAF'
------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 1 | | 1 (100)| 0 |00:00:00.01 | 4 |
| 1 | UPDATE | DEMO1 | 1 | | | 0 |00:00:00.01 | 4 |
| 2 | TABLE ACCESS BY USER ROWID| DEMO1 | 1 | 1 | 1 (0)| 1 |00:00:00.01 | 1 |
------------------------------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
2 - (upd=2) ROWID[ROWID,10], "X"[VARCHAR2,1000]

In addition to the access to the block (1 buffer), the update had to read 3 additional buffers. There are no indexes on the updated column, so Oracle has no additional maintenance to do. One buffer is the table block to update (the TABLE ACCESS BY USER ROWID was a consistent get; the update needs the current version of the block).

The additional buffers come from the UNDO tablespace, for MVCC (Multi Version Concurrency Control). This is the first modification in my transaction, so it has to update the transaction table and the undo segment, which is why we see 2 additional buffers. Another update within the same transaction reads only two buffers in total:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID gpqv4k6m1q20y, child number 0
-------------------------------------
update /*+ */ demo1 set x=lpad('z',1000,'z') where rowid='AAAR4WAAMAAAAEaAAF'
------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 1 | | 1 (100)| 0 |00:00:00.01 | 2 |
| 1 | UPDATE | DEMO1 | 1 | | | 0 |00:00:00.01 | 2 |
| 2 | TABLE ACCESS BY USER ROWID| DEMO1 | 1 | 1 | 1 (0)| 1 |00:00:00.01 | 1 |
------------------------------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
2 - (upd=2) ROWID[ROWID,10], "X"[VARCHAR2,1000]

Only the table blocks are read: one consistent read as of the beginning of the query (or of the transaction if in serializable isolation level) and one for the current block. Oracle has an optimization called In-Memory UNDO to avoid frequent access to undo blocks.

No further re-visits are needed. Oracle may choose to come back at commit if it can be done quickly (a few blocks still in the buffer cache) but that’s not required: this is the delayed block cleanout. The block can stay like this for years without the need to read it again for cleanup. If another session has to read it, then the cleanup may be done by that session.

Postgres UPDATE

Here is the same update in Postgres:

explain (analyze,verbose,costs,buffers) update demo1 set x=lpad('y',1000,'y') where ctid='(142,6)' ;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Update on public.demo1 (cost=0.00..4.01 rows=1 width=46) (actual time=0.214..0.214 rows=0 loops=1)
Buffers: shared hit=6 dirtied=3
-> Tid Scan on public.demo1 (cost=0.00..4.01 rows=1 width=46) (actual time=0.009..0.009 rows=1 loops=1)
Output: n, a, 'yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy'::text, ctid
TID Cond: (demo1.ctid = '(142,6)'::tid)
Buffers: shared hit=1
Planning time: 0.405 ms
Execution time: 0.232 ms

The Tid Scan is the same as for the select. Then the update has read 5 blocks and modified 3 of them. The update in Postgres is processed as a delete+insert, and here is my guess about those numbers. The new version is inserted – in a new block if there is no free space in the same block – the old version is updated, and the index must be maintained: those are 3 blocks to modify. Here, the row was accessed directly through its TID, but we must still find the index entry to update. The row contains the index value, so an index scan is possible: two block reads for this small index, which has one branch level only.
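One way to verify this delete+insert behaviour is the pageinspect extension, which exposes the tuple headers of a heap page. A sketch (my addition; it requires superuser privileges, and the block and line numbers are those of this demo):

create extension if not exists pageinspect;
-- line pointer 6 in block 142: the old version should have t_xmax set
-- and its t_ctid pointing to the new version
select lp, t_xmin, t_xmax, t_ctid
from heap_page_items(get_raw_page('demo1', 142))
where lp = 6;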

SELECT again

I said that with Oracle the row is updated in-place and doesn’t need further cleanup. If I run the same SELECT as the one I did before the UPDATE, I still have only one block to read:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 8mj3ms08x0sfh, child number 0
-------------------------------------
select /*+ */ a from demo1 where rowid='AAAR4WAAMAAAAEaAAF'
-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 (100)| 1 |00:00:00.01 | 1 |
| 1 | TABLE ACCESS BY USER ROWID| DEMO1 | 1 | 1 | 1 (0)| 1 |00:00:00.01 | 1 |
-----------------------------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "A"[NUMBER,22]

In Postgres, because the update was processed as delete+insert, running the same query also reads only one block, but it returns no rows:

explain (analyze,verbose,costs,buffers) select a from demo1 where ctid='(142,6)' ;
QUERY PLAN
------------------------------------------------------------------------------------------------------
Tid Scan on public.demo1 (cost=0.00..4.01 rows=1 width=4) (actual time=0.014..0.014 rows=0 loops=1)
Output: a
TID Cond: (demo1.ctid = '(142,6)'::tid)
Buffers: shared hit=1
Planning time: 0.442 ms
Execution time: 0.028 ms

The new version is in another block, so the TID to find it is different:

select ctid from demo1 where n=1000;
ctid
----------
(1428,5)
(1 row)

There was not enough space for another version of the whole row within the same block. Free space was found in the last block (1428). Of course, this is why the index was updated even if the indexed column did not change: it had to address a different block.

Let’s query with the new TID:

explain (analyze,verbose,costs,buffers) select a from demo1 where ctid='(1428,5)' ;
QUERY PLAN
------------------------------------------------------------------------------------------------------
Tid Scan on public.demo1 (cost=0.00..4.01 rows=1 width=4) (actual time=0.008..0.008 rows=1 loops=1)
Output: a
TID Cond: (demo1.ctid = '(1428,5)'::tid)
Buffers: shared hit=1
Planning time: 0.449 ms
Execution time: 0.023 ms

Only one buffer read. However, as we have seen with the Index Only Scan, cleanup is needed to avoid Heap Fetches, and the old tuples must be removed at some point, or the updated tables and indexes would grow forever.
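A minimal sketch of that cleanup (my addition, assuming the same session):

vacuum demo1;
explain (analyze,buffers) select n from demo1 where n=1000 ;
-- expected: an Index Only Scan showing Heap Fetches: 0 again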

There’s only one Postgres access path remaining. That’s for the last post of this series, which will include the table of content.

 


Postgres vs. Oracle access paths XI – Sample Scan


I was going to end this series with the previous post because the last access path available in Postgres is a bit special: a Seq Scan that returns only a sample of the rows, at random. However, it is the occasion to come back to the difference between random and sequential reads.

I’m still working on the same table as in the previous posts, with 10000 rows in 1429 pages. 5% of rows is 500 rows and 5% of blocks is about 72 pages.

Rows

Sometimes you can answer your business question on a sample of rows, when you need an approximate result, trend, or pattern. Let’s say that you want to sum() on only 5 percent of rows:
explain (analyze,verbose,costs,buffers) select sum(a) from demo1 tablesample bernoulli(5) ;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1435.25..1435.26 rows=1 width=8) (actual time=1.940..1.940 rows=1 loops=1)
Output: sum(a)
Buffers: shared hit=1429
-> Sample Scan on public.demo1 (cost=0.00..1434.00 rows=500 width=4) (actual time=0.007..1.890 rows=509 loops=1)
Output: n, a, x
Sampling: bernoulli ('5'::real)
Buffers: shared hit=1429
Planning time: 0.373 ms
Execution time: 1.956 ms

This row sampling reads all rows and picks a sample of them at random. Unfortunately, it reads all blocks, because you cannot get a good sample if you don’t know how many rows you have in each block. Working on a sample can make sense if you want to apply complex operations on the result. Here the cost in the database is similar to a Seq Scan: 1429 blocks read at seq_page_cost=1, plus the sum() applied on 500 rows (cpu_operator_cost=0.0025), and 500 tuples from the scan plus 1 tuple for the result at cpu_tuple_cost=0.01.
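As a check of that arithmetic (my addition): the Sample Scan is 1429 + 500*0.01 = 1434.00, and the Aggregate adds 500*0.0025 = 1.25 plus 0.01 for the result row, giving the 1435.26 total shown in the plan.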

From the execution statistics, you can see that the result is close to what we asked for: 509 rows returned for the 500-row (5%) target.

Oracle has a different syntax and different algorithm:
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 1tsadjdd9ddam, child number 0
-------------------------------------
select /*+ */ sum(a) from demo1 sample(5)
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 397 (100)| 1 |00:00:00.01 | 581 |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:00.01 | 581 |
| 2 | TABLE ACCESS SAMPLE| DEMO1 | 1 | 500 | 397 (0)| 478 |00:00:00.01 | 581 |
-----------------------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=0) SUM("A")[22] 2 - (rowset=256) "A"[NUMBER,22]

Here we have not read all the blocks. Only 40% of them. This is faster than the Postgres approach, but the drawback is that the result is not exact: 478 rows were returned here.

Blocks

When we can afford an approximate sampling, we can sample on blocks rather than on rows:
explain (analyze,verbose,costs,buffers) select sum(a) from demo1 tablesample system(5) ;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Aggregate (cost=290.25..290.26 rows=1 width=8) (actual time=0.479..0.480 rows=1 loops=1)
Output: sum(a)
Buffers: shared hit=73
-> Sample Scan on public.demo1 (cost=0.00..289.00 rows=500 width=4) (actual time=0.016..0.377 rows=511 loops=1)
Output: n, a, x
Sampling: system ('5'::real)
Buffers: shared hit=73
Planning time: 0.698 ms
Execution time: 0.509 ms

The number of rows is still good here, but the result may depend on the blocks sampled. Only 73 blocks were read, which is about 5%, and of course the rows may be distributed differently within the blocks. However, the advantage is that it is faster as it reads fewer blocks. But those blocks being picked at random, they are by definition random reads: 71 pages read at random_page_cost=4 and, as in the previous case, 501 tuples at cpu_tuple_cost and 500 operations at cpu_operator_cost.
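Again as a check (my addition): the Sample Scan is 71*4 + 500*0.01 = 289.00, and the Aggregate adds 1.25 plus 0.01, giving the 290.26 total.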

With block sampling, Oracle reads a smaller number of blocks than with row sampling, but still more than 5%, and the number of rows is not exact: 798 rows here:
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID fqgbwqfavgdrn, child number 0
-------------------------------------
select /*+ */ sum(a) from demo1 sample block(5)
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 22 (100)| 1 |00:00:00.01 | 134 |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:00.01 | 134 |
| 2 | TABLE ACCESS SAMPLE| DEMO1 | 1 | 500 | 22 (0)| 798 |00:00:00.01 | 134 |
-----------------------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=0) SUM("A")[22] 2 - (rowset=256) "A"[NUMBER,22]

Again, as for the previous access paths: same concepts but a different implementation between Postgres and Oracle. Everything looks similar and easily portable from a high-level overview, but going into the details you see all those little differences which make it not so easy to be database agnostic or easily portable.

Summary

This is the end of this series comparing Postgres access path with Oracle ones. The goal is not to tell you that one is better than the other. They have a different approach, different targets, different price, different history and probably future. But understanding how they work and how they estimate the cost is a good way to learn. I’m myself learning a lot about Postgres while writing those posts, matching things I discover on Postgres with those I know for a while in Oracle.

Here is the list of posts on Access Path:

  1. Postgres vs. Oracle access paths – intro
  2. Postgres vs. Oracle access paths I – Seq Scan
  3. Postgres vs. Oracle access paths II – Index Only Scan
  4. Postgres vs. Oracle access paths III – Partial Index
  5. Postgres vs. Oracle access paths IV – Order By and Index
  6. Postgres vs. Oracle access paths V – FIRST ROWS and MIN/MAX
  7. Postgres vs. Oracle access paths VI – Index Scan
  8. Postgres vs. Oracle access paths VII – Bitmap Index Scan
  9. Postgres vs. Oracle access paths VIII – Index Scan and Filter
  10. Postgres vs. Oracle access paths IX – Tid Scan
  11. Postgres vs. Oracle access paths X – Update
  12. Postgres vs. Oracle access paths XI – Sample Scan

I think my next series will be on Join methods.

 

Cet article Postgres vs. Oracle access paths XI – Sample Scan est apparu en premier sur Blog dbi services.

Searching wikipedia from the command line


Wouldn’t it be nice if you could search Wikipedia from the command line? I often need to quickly look up a definition or want to know more about a specific topic when I am working on the command line. So here is how you can do it …

What you need is npm and wikit. On my Debian-based system I can install both with:

$ sudo apt-get install npm
$ sudo npm install wikit -g
$ sudo ln -s /usr/bin/nodejs /usr/bin/node

The symlink is needed to avoid the following issue:

$ wikit postgresql
/usr/bin/env: ‘node’: No such file or directory

For Fedora/RedHat/CentOS you should use yum:

$ sudo yum install npm -y
$ sudo npm install wikit -g

Once you have that you can use wikit to query wikipedia (summary):

$ wikit postgresql
 PostgreSQL, often simply Postgres, is an object-relational database management system
 (ORDBMS) with an emphasis on extensibility and standards compliance. As a database
 server, its primary functions are to store data securely and return that data in
 response to requests from other software applications. It can handle workloads ranging
 from small single-machine applications to large Internet-facing applications (or
 for data warehousing) with many concurrent users; on macOS Server, PostgreSQL is
 the default database; and it is also available for Microsoft Windows and Linux (supplied
 in most distributions). PostgreSQL is ACID-compliant and transactional. PostgreSQL
 has updatable views and materialized views, triggers, foreign keys; supports functions
 and stored procedures, and other expandability. PostgreSQL is developed by the PostgreSQL
 Global Development Group, a diverse group of many companies and individual contributors.
 It is free and open-source, released under the terms of the PostgreSQL License,
 a permissive software license.

Cool. When you want to read the output in your default browser instead of the console you can do this as well by adding the “-b” flag:

$ wikit postgresql -b

When you want to open the “disambiguation” page in your browser:

$ wikit postgresql -d

(screenshot: the disambiguation page opened in the browser)

Changing the language is possible as well with the “--lang” switch:

$ wikit --lang de postgresql 
 PostgreSQL (englisch [,pəʊstgɹɛs kjʊ'ɛl]), oft kurz Postgres genannt, ist ein freies,
 objektrelationales Datenbankmanagementsystem (ORDBMS). Seine Entwicklung begann
 in den 1980er Jahren, seit 1997 wird die Software von einer Open-Source-Community
 weiterentwickelt. PostgreSQL ist weitgehend konform mit dem SQL-Standard ANSI-SQL
 2008, d.h. der Großteil der Funktionen ist verfügbar und verhält sich wie definiert.
 PostgreSQL ist vollständig ACID-konform (inklusive der Data Definition Language),
 und unterstützt erweiterbare Datentypen, Operatoren, Funktionen und Aggregate. Obwohl
 sich die Entwicklergemeinde sehr eng an den SQL-Standard hält, gibt es dennoch eine
 Reihe von PostgreSQL-spezifischen Funktionalitäten, wobei in der Dokumentation bei
 jeder Eigenschaft ein Hinweis erfolgt, ob dies dem SQL-Standard entspricht, oder
 ob es sich um eine spezifische Erweiterung handelt. Darüber hinaus verfügt PostgreSQL
 über ein umfangreiches Angebot an Erweiterungen durch Dritthersteller, wie z.B.
 PostGIS zur Verwaltung von Geo-Daten. PostgreSQL ist in den meisten Linux-Distributionen
 enthalten. Apple liefert ab der Version Mac OS X Lion (10.7) PostgreSQL als Standarddatenbank

Quite helpful …
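If you end up using this a lot, a tiny shell alias saves some typing (a sketch; pick whatever language you look up most often):

$ echo "alias wikide='wikit --lang de'" >> ~/.bashrc
$ source ~/.bashrc
$ wikide postgresql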

 

Cet article Searching wikipedia from the command line est apparu en premier sur Blog dbi services.

Announcing the dbi OpenDB Appliance


As already announced on Twitter and LinkedIn, here is the blog post describing our OpenDB appliance in more detail. I am sure you wonder what this is about, so let me explain why we are doing this. What we see day by day at our customers is that more and more databases get consolidated onto a VMWare deployment. This is not only true for the smaller ones of those but also for the critical, potentially much bigger ones. What makes it complicated, especially for smaller companies that do not necessarily have the know-how for the specific database, is that you need to apply the best practices not only to the database deployment but also to the operating system and the VMWare deployment. But even if you have this already in place: Do you know how to deploy the PostgreSQL binaries, how to set up a PostgreSQL instance, how to monitor it and how to back up and restore all that? Do you know how to do this with MySQL/MariaDB, MongoDB, Cassandra? If your answer to this is no but you need to have a PostgreSQL/MySQL/MariaDB/MongoDB/Cassandra instance ready quite fast, then the dbi OpenDB Appliance might be the solution for you. Let’s dig into some details.


A typical use case: You are forced to support an application which runs on a specific database. What do you do? Quickly set up a Linux VM, download the installer, click next, next, next, hopefully make the application connect to what you just installed, and then cross your fingers and hope that nothing ever goes wrong? You laugh? There are deployments out there which were set up in exactly this way. Another option would be to hire someone who is experienced in that area. This alone will not help you either, as you would need at least two people (because people tend to want to go on holiday from time to time). The next option would be to work together with external consultants, which will probably work as long as you work with the right ones. Completely outsourcing the whole thing (or even going to the cloud) is another option, if you want to do that. With the dbi OpenDB Appliance you get another option: We deliver a fully pre-configured VMWare based virtual machine image which you can easily plug into your existing VMWare landscape. Can that work? Let me explain what you would get:

As said just before, you get an image which you can import into your VMWare ESX. I said this image is pre-configured, so what does that mean? Well, when you start it up it boots into a CentOS 7.3 x64 Linux operating system. No magic, I know :) Additionally you get the following pre-configured file systems:

/       15GB    The Linux operating system
/boot	1GB	The boot images (kernels)
/u01	50GB	All files belonging to the OpenDB itself
                All required DMK packages
                All source files (PostgreSQL, MariaDB, MongoDB, Cassandra)
                The Linux yum repositories
                The HOMEs of all product installations
                The admin directories for the initialized products
/u02	10GB	The data files belonging to the initialized products
/u03	10GB	The redo/wal files belonging to the initialized products
/u04	10GB	Backups

You are not supposed to touch the root, /boot and /u01 partitions, but of course you will be able to resize /u02 to /u04. The 10GB provided initially are just meant as a minimum setup. Resize your VMWare disk images (vmdks) and the dbi OpenDB command line utility lets you resize the file systems as well with just a single call. At this point you probably wonder what the dbi OpenDB command line utility is about. In short, it is a wrapper around our various DMK packages. Using one of the various DMK packages you can deploy and monitor databases even today; the command line utility builds on that and wraps around the various DMKs. The interface is menu driven to make it as easy as possible for you and helps you with initializing the appliance (setting the hostname, network configuration and disk resizing). In addition you can install the products we support and create database instances on top of that without knowing the details. We take care of implementing the best practices in the background (kernel settings, file system layout, initialization parameters, …). But that is not all: We go a step further and implement monitoring, alerting and backup procedures as well. The idea is that you really do not need to take care of such things: it just comes when you set up a product.

To give you an idea you’ll get something like this when you fire up the command line utility:

==============================================================================================
=                                                                                            =
=                                                                                            =
=       _ _    _    ___                 ___  ___     _             _ _                       =
=    __| | |__(_)  / _ \ _ __  ___ _ _ |   \| _ )   /_\  _ __ _ __| (_)__ _ _ _  __ ___      =
=   / _  | '_ \ | | (_) | '_ \/ -_) ' \| |) | _ \  / _ \| '_ \ '_ \ | / _  | ' \/ _/ -_)     =
=   \__,_|_.__/_|  \___/| .__/\___|_||_|___/|___/ /_/ \_\ .__/ .__/_|_\__,_|_||_\__\___|     =
=                       |_|                             |_|  |_|                             =
=                                                                                            =
=                                                                                            =
=      Please make a selection from the menu below (type 'q' to exit):                       =
=                                                                                            =
=      1. Deploy a database home                                                             =
=      2. List the deployed database homes                                                   =
=      3. Setup a database instance                                                          =
=      4. List the deployed database instances                                               =
=                                                                                            =
=     10. Stop and remove a database instance                                                =
=     11. Remove a database home                                                             =
=                                                                                            =
=                                                                                            =
=     99. Initialize the appliance                                                           =
=                                                                                            =
=                                                                                            =
==============================================================================================
 
 Your input please: 

You would start by “Initialize the appliance” to set your preferred host name, to initialize the network and to provide the monitoring credentials. Once done you can go on and start deploying product homes (e.g. PostgreSQL) and instances on top of that. Of course you can deploy multiple instances on the same home and you can install several homes of the same product version.

What do we mean by a “product”? A product is what we support with a specific release of the appliance. Initially this probably will be:

  • PostgreSQL 9.6.5
  • PostgreSQL 9.5.9

So the menu would offer you something like this for deploying the binaries:

==============================================================================================
=                                                                                            =
=                                                                                            =
=       _ _    _    ___                 ___  ___     _             _ _                       =
=    __| | |__(_)  / _ \ _ __  ___ _ _ |   \| _ )   /_\  _ __ _ __| (_)__ _ _ _  __ ___      =
=   / _  | '_ \ | | (_) | '_ \/ -_) ' \| |) | _ \  / _ \| '_ \ '_ \ | / _  | ' \/ _/ -_)     =
=   \__,_|_.__/_|  \___/| .__/\___|_||_|___/|___/ /_/ \_\ .__/ .__/_|_\__,_|_||_\__\___|     =
=                       |_|                             |_|  |_|                             =
=                                                                                            =
=                                                                                            =
=      Please make a selection from the menu below (type 'q' to exit, 'b' to go back):       =
=                                                                                            =
=                                                                                            =
=     000 - PostgreSQL 9.6.5                                                                 =
=     001 - PostgreSQL 9.5.9                                                                 =
=                                                                                            =
=                                                                                            =
==============================================================================================
 
 Your input please: 

Once you have deployed the homes you require you can list them:

==============================================================================================
=                                                                                            =
=                                                                                            =
=       _ _    _    ___                 ___  ___     _             _ _                       =
=    __| | |__(_)  / _ \ _ __  ___ _ _ |   \| _ )   /_\  _ __ _ __| (_)__ _ _ _  __ ___      =
=   / _  | '_ \ | | (_) | '_ \/ -_) ' \| |) | _ \  / _ \| '_ \ '_ \ | / _  | ' \/ _/ -_)     =
=   \__,_|_.__/_|  \___/| .__/\___|_||_|___/|___/ /_/ \_\ .__/ .__/_|_\__,_|_||_\__\___|     =
=                       |_|                             |_|  |_|                             =
=                                                                                            =
=                                                                                            =
=      Please make a selection from the menu below (type 'q' to exit, 'b' to go back):       =
=                                                                                            =
=                                                                                            =
=     The following homes are available for deploying instances on:                          =
=                                                                                            =
=                                                                                            =
=     pg965:/u01/app/opendb/product/PG96/db_5/:dummy:9999:D                                  =
=     PG959:/u01/app/opendb/product/PG95/db_9/:dummy:9999:D                                  =
=     PG959_1:/u01/app/opendb/product/PG95/db_9_0:dummy:9999:D                               =
=     PG965_1:/u01/app/opendb/product/PG96/db_5_0:dummy:9999:D                               =
=                                                                                            =
=                                                                                            =
==============================================================================================
 
 Your input please: 

Here you can see that you can have multiple homes of the same release (two for PostgreSQL 9.6.5 and two for PostgreSQL 9.5.9 in this case). The path and naming for a home follow our best practices and are generated automatically. Having the homes in place, you can start deploying your instances:

==============================================================================================
=                                                                                            =
=                                                                                            =
=       _ _    _    ___                 ___  ___     _             _ _                       =
=    __| | |__(_)  / _ \ _ __  ___ _ _ |   \| _ )   /_\  _ __ _ __| (_)__ _ _ _  __ ___      =
=   / _  | '_ \ | | (_) | '_ \/ -_) ' \| |) | _ \  / _ \| '_ \ '_ \ | / _  | ' \/ _/ -_)     =
=   \__,_|_.__/_|  \___/| .__/\___|_||_|___/|___/ /_/ \_\ .__/ .__/_|_\__,_|_||_\__\___|     =
=                       |_|                             |_|  |_|                             =
=                                                                                            =
=                                                                                            =
=      Please make a selection from the menu below (type 'q' to exit, 'b' to go back):       =
=                                                                                            =
=                                                                                            =
=     Please specify an alias for your new instance                                          =
=       The alias needs to be at least 4 characters                                          =
=       The alias needs to be at most  8 characters                                          =
=                                                                                            =
=                                                                                            =
=                                                                                            =
==============================================================================================
 
 Your input please: MYINST1 

What happens in the background then is that the PostgreSQL cluster is initialized, started and added to the autostart configuration (systemd) so that the instance properly shuts down when the appliance is stopped and comes up when the appliance is started. Listing the deployed instances is possible, too, of course:

==============================================================================================
=                                                                                            =
=                                                                                            =
=       _ _    _    ___                 ___  ___     _             _ _                       =
=    __| | |__(_)  / _ \ _ __  ___ _ _ |   \| _ )   /_\  _ __ _ __| (_)__ _ _ _  __ ___      =
=   / _  | '_ \ | | (_) | '_ \/ -_) ' \| |) | _ \  / _ \| '_ \ '_ \ | / _  | ' \/ _/ -_)     =
=   \__,_|_.__/_|  \___/| .__/\___|_||_|___/|___/ /_/ \_\ .__/ .__/_|_\__,_|_||_\__\___|     =
=                       |_|                             |_|  |_|                             =
=                                                                                            =
=                                                                                            =
=      Please make a selection from the menu below (type 'q' to exit, 'b' to go back):       =
=                                                                                            =
=                                                                                            =
=     The following instances are currently deployed:                                        =
=                                                                                            =
=                                                                                            =
=     MYINST1:/u01/app/opendb/product/PG96/db_5/:/u02/opendb/pgdata/MYINST1:5432:Y           =
=                                                                                            =
=                                                                                            =
==============================================================================================
 
 Your input please: 
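By the way, to give a rough idea of what the systemd autostart configuration mentioned above can look like, here is a generic sketch for a PostgreSQL instance (illustrative only; the unit the appliance actually generates may look different):

[Unit]
Description=PostgreSQL instance MYINST1
After=network.target

[Service]
Type=forking
User=opendb
Environment=PGDATA=/u02/opendb/pgdata/MYINST1
# pg_ctl forks the postmaster and waits (-w) until the instance is up
ExecStart=/u01/app/opendb/product/PG96/db_5/bin/pg_ctl start -D ${PGDATA} -w
ExecStop=/u01/app/opendb/product/PG96/db_5/bin/pg_ctl stop -D ${PGDATA} -m fast

[Install]
WantedBy=multi-user.target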

The cronjobs for monitoring, alerting and backup have been created as well:

[opendb@opendb ~]$ crontab -l
00 01 * * * /u01/app/opendb/local/dmk/dmk_postgres/bin/dmk-pg-dump.sh -s MYINST1 -t /u04/opendb/pgdata/MYINST1/dumps >/dev/null 2>&1
58 00 * * * /u01/app/opendb/local/dmk/dmk_postgres/bin/dmk-pg-badger-reports.sh -s MYINST1 >/dev/null 2>&1
*/10 * * * * /u01/app/opendb/local/dmk/dmk_postgres/bin/dmk-check-postgres.sh -s MYINST1 -m  >/dev/null 2>&1

With every new release/update of the appliance we plan to include more products such as MariaDB/MongoDB/Cassandra, provide patch sets for the existing ones, and update the Linux operating system. Updates will be delivered as tarballs and the command line utility will take care of the rest; you do not need to worry about that. You can expect updates twice a year.

To visualize this:
(diagram: the big picture of the OpenDB appliance)

/u02 will hold all the files that contain your user data, /u03 is there for redo/wal/binlog where required, and /u04 holds the backups. This layout is fixed and must not be changed. Independent of which product you choose to deploy, you get a combination of pcp (Performance Co-Pilot) and Vector for real-time performance monitoring of the appliance (configured automatically, of course).

Alerting will be done by a combination of third-party (open source) projects and DMK. The tools we will use for PostgreSQL will be check_postgres and pgBadger, for example. For the other products we will announce what we use when it is included in a future release.

In addition to the VMWare template you can also have the appliance in the Hidora Cloud as a pay-as-you-go service (although that is not fully ready yet).

If you have any questions just send us an email at: opendb[at]dbi-services[dot]com

 

Cet article Announcing the dbi OpenDB Appliance est apparu en premier sur Blog dbi services.

And finally it is there…PostgreSQL 10

PostgreSQL Index Suggestion With Powa


A while ago my colleague Daniel wrote a blog post about POWA. In a nice article he showed how this tool can be used to monitor PostgreSQL.
In this article I am going to show how this powerful tool can help by suggesting indexes which can optimize our queries.
I am using PostgreSQL 9.6.

[root@pgservertools extension]# yum install postgresql96-server.x86_64
[root@pgservertools extension]# yum install postgresql96-contrib.x86_64

And Then I initialize a cluster

[root@pgservertools extension]# /usr/pgsql-9.6/bin/postgresql96-setup initdb
Initializing database ... OK

POWA requires the following extensions:

  • pg_qualstats: gathers statistics on predicates found in WHERE statements and JOIN clauses
  • pg_stat_kcache: gathers statistics about real reads and writes done by the filesystem layer
  • hypopg: adds hypothetical indexes to PostgreSQL; it can be used to check whether PostgreSQL would use an index or not
  • btree_gist: provides GiST index operator classes that implement B-tree equivalent behavior for various data types
  • powa-web: provides access to POWA via a browser

Just note that the following packages were installed to resolve some dependencies during the installation of these extensions.

yum install python-backports-ssl_match_hostname.noarch
rpm -ivh python-tornado-2.2.1-8.el7.noarch.rpm

And then the extensions are installed using yum:

yum install powa_96.x86_64 pg_qualstats96.x86_64 pg_stat_kcache96.x86_64 hypopg_96.x86_64 powa_96-web.x86_64

After the installation, postgresql.conf is modified to load the extensions:

[root@pgservertools data]# grep shared_preload_libraries postgresql.conf | grep -v ^#
shared_preload_libraries = 'pg_stat_statements,powa,pg_stat_kcache,pg_qualstats' # (change requires restart)
[root@pgservertools data]#

And then PostgreSQL is restarted:

[root@pgservertools data]# systemctl restart postgresql-9.6.service

For the POWA configuration, the first step is to create a powa user:

postgres=# CREATE ROLE powa SUPERUSER LOGIN PASSWORD 'root';
CREATE ROLE

and the repository database we will use:

postgres=# create database powa;
CREATE DATABASE

The extensions must be created in the repository database and in all databases we want to monitor:

postgres=#\c powa
powa=# CREATE EXTENSION pg_stat_statements;
CREATE EXTENSION
powa=# CREATE EXTENSION btree_gist;
CREATE EXTENSION
powa=# CREATE EXTENSION powa;
CREATE EXTENSION
powa=# CREATE EXTENSION pg_qualstats;
CREATE EXTENSION
powa=# CREATE EXTENSION pg_stat_kcache;
CREATE EXTENSION
powa=# CREATE EXTENSION hypopg;
CREATE EXTENSION

We can verify that the extensions are loaded in the database using:

powa=# \dx
List of installed extensions
Name | Version | Schema | Description
--------------------+---------+------------+-----------------------------------------------------------
btree_gist | 1.2 | public | support for indexing common datatypes in GiST
hypopg | 1.1.0 | public | Hypothetical indexes for PostgreSQL
pg_qualstats | 1.0.2 | public | An extension collecting statistics about quals
pg_stat_kcache | 2.0.3 | public | Kernel statistics gathering
pg_stat_statements | 1.4 | public | track execution statistics of all SQL statements executed
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
powa | 3.1.1 | public | PostgreSQL Workload Analyser-core
(7 rows)
powa=#

Now let’s create a database named mydb for our tests and create all the extensions inside it.

[postgres@pgservertools ~]$ psql
psql (9.6.5)
Type "help" for help.
postgres=# create database mydb;
CREATE DATABASE
postgres=#

Let’s again verify the extensions in the mydb database:

mydb=# \dx
List of installed extensions
Name | Version | Schema | Description
--------------------+---------+------------+-----------------------------------------------------------
btree_gist | 1.2 | public | support for indexing common datatypes in GiST
hypopg | 1.1.0 | public | Hypothetical indexes for PostgreSQL
pg_qualstats | 1.0.2 | public | An extension collecting statistics about quals
pg_stat_kcache | 2.0.3 | public | Kernel statistics gathering
pg_stat_statements | 1.4 | public | track execution statistics of all SQL statements executed
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
powa | 3.1.1 | public | PostgreSQL Workload Analyser-core
(7 rows)
mydb=#

In the mydb database we create a table mytab and insert some rows into it:

mydb=# \d mytab
Table "public.mytab"
Column | Type | Modifiers
--------+---------+-----------
id | integer |
val | text |
mydb=# select count(*) from mytab;
count
-----------
100000000
(1 row)
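The exact statements used to create and load mytab are not shown here; something like this reproduces the same structure and data pattern (a sketch; loading 100 million rows takes a while):

mydb=# create table mytab ( id int, val text );
CREATE TABLE
mydb=# insert into mytab select i, 'line ' || i from generate_series(1,100000000) i;
INSERT 0 100000000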

The last step is to configure the powa-web configuration file. Below is ours:

[root@pgservertools etc]# pwd
/etc
[root@pgservertools etc]# cat powa-web.conf
servers={
'main': {
'host': 'localhost',
'port': '5432',
'database': 'powa',
'query': {'client_encoding': 'utf8'}
}
}
cookie_secret="secret"
[root@pgservertools etc]#

And then powa-web can be started with the following command:

[root@pgservertools etc]# powa-web &
[1] 5600
[root@pgservertools etc]# [I 171006 13:54:42 powa-web:12] Starting powa-web on http://0.0.0.0:8888

We can now log in at http://localhost:8888/ with the powa user we created:
(screenshot: the POWA login page)

And then we can choose the mydb database to monitor:
(screenshot: selecting the mydb database in POWA)

Now let’s run some queries. As my load is very low, I set pg_qualstats.sample_rate=1 in the postgresql.conf file so that every query gets sampled (thanks to Julien Rouhaud):

[postgres@pgservertools data]$ grep pg_qualstats.sample_rate postgresql.conf
pg_qualstats.sample_rate = 1


mydb=# select * from mytab where id in (75,25,2014,589);
id | val
------+-----------
25 | line 25
75 | line 75
589 | line 589
2014 | line 2014
(4 rows)

Time: 9472.525 ms
mydb=#

On the Index suggestions tab, we click on Optimize the database. We can see that the creation of an index is recommended, together with the potential gain:
(screenshots: the POWA index suggestion and the estimated gain)
Note that POWA uses the hypopg extension to check whether PostgreSQL would actually use the suggested index. Let's see how this extension works. Hypothetical indexes are useful to find out if a specific index could improve the performance of a query; they cost nothing to create because they are never physically built.
Let's create a virtual index in the mydb database:

mydb=# select * from hypopg_create_index('create index on mytab (id)');
indexrelid | indexname
------------+-----------------------
55799 | btree_mytab_id
(1 row)
mydb=#

We can verify the existence of the virtual index with:

mydb=# SELECT * FROM hypopg_list_indexes();
indexrelid | indexname | nspname | relname | amname
------------+-----------------------+---------+---------+--------
55799 | btree_mytab_id | public | mytab | btree
(1 row)

Using explain, we can see that PostgreSQL would use the index:

mydb=# explain select * from mytab where id in (75,25,2014,589);
QUERY PLAN
-------------------------------------------------------------------------------------
Index Scan using btree_mytab_id on mytab (cost=0.07..20.34 rows=4 width=17)
Index Cond: (id = ANY ('{75,25,2014,589}'::integer[]))
(2 rows)
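Once the test is done, the hypothetical index can be dropped again (a quick sketch; hypopg_reset() would remove all hypothetical indexes at once):

mydb=# select hypopg_drop_index(55799);
 hypopg_drop_index 
-------------------
 t
(1 row)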

Just note that EXPLAIN ANALYZE will not use the virtual index: since the query is actually executed, only real indexes can be used.

Conclusion

In this article we saw how POWA can help with optimizing a PostgreSQL database.

References: https://pgxn.org/dist/hypopg/; http://powa.readthedocs.io/en/latest/

 

Cet article PostgreSQL Index Suggestion With Powa est apparu en premier sur Blog dbi services.


2017.pgconf.eu started, and look who is there …


So, 2017.pgconf.eu finally started today, and while checking the schedule I noticed something that I could not even have imagined some years ago. There is a session from Microsoft: Latest update on Azure managed service for PostgreSQL. Personally I really like to see that Microsoft is more and more present at open source conferences and starts engaging in the community. But of course this is not the only session that will be interesting: there is an impressive list of sessions from well-known community members and hackers, full list here.


There are plenty of speaker interviews you might want to read as an introduction, here.

A final highlight this evening will be the EDB Postgres Rocks Cafe.


I am already curious who will be there and what discussions we’ll have. Not much more to tell right now, stay tuned.

 

Cet article 2017.pgconf.eu started, and look who is there … est apparu en premier sur Blog dbi services.

2017.pgconf.eu, some impressions


After we survived the EDB Postgres Rocks Cafe on Tuesday, Wednesday was packed with interesting sessions. Listening to Robert Haas in particular is always fun and interesting. Getting information directly from the people who work on the core code is one of the beauties of the PostgreSQL community. Several other core developers had sessions as well, and all of them were great. Thanks for all of that.


On Thursday morning, finally, Jan (EDB) and I had the pleasure to talk about “What happens when 30 years of Oracle experience hit PostgreSQL”. As far as I can tell the session was well received and we had interesting discussions afterwards. The main goal was to highlight that working in the PostgreSQL area can be very confusing at the very beginning when your only background is Oracle. It seems we hit the goal and the people attending had fun.


A very big thanks to the organizers of the event: everything, from the registration, the rooms, the food and the drinks to, of course, the sessions, was great. I do not doubt that next year will be great as well.

Another big thanks to the EDB people (especially Anja and Jan) who let me drop my jacket and notebook at their booth when required. And another big thanks to Devrim for announcing the rpm packages for SLES 12 during the lightning talks, which is exactly what I need for a customer project.

Once uploaded, all the slides should be available on the PostgreSQL wiki. Check them out, there is really great content.

Btw: There are interesting choices of beer in Poland:
(photo: a local beer)

 

Cet article 2017.pgconf.eu, some impressions est apparu en premier sur Blog dbi services.

Are temporary tables auto vacuumed in PostgreSQL?


While doing the EDB quiz at their booth last week at pgconf.eu, one of the questions was: Are temporary tables auto vacuumed? What do you think? My first thought was yes, but let's see. The first question we need to answer is: How can we check whether a table (no matter if temporary or not, for now) was auto vacuumed? PostgreSQL comes with many views that expose statistical information, and one of those is pg_stat_all_tables. Let's have a look …

When you describe that view there is a column named “last_autovacuum”:

postgres=# \d pg_stat_all_tables 
                      View "pg_catalog.pg_stat_all_tables"
       Column        |           Type           | Collation | Nullable | Default 
---------------------+--------------------------+-----------+----------+---------
 relid               | oid                      |           |          | 
 schemaname          | name                     |           |          | 
 relname             | name                     |           |          | 
 seq_scan            | bigint                   |           |          | 
 seq_tup_read        | bigint                   |           |          | 
 idx_scan            | bigint                   |           |          | 
 idx_tup_fetch       | bigint                   |           |          | 
 n_tup_ins           | bigint                   |           |          | 
 n_tup_upd           | bigint                   |           |          | 
 n_tup_del           | bigint                   |           |          | 
 n_tup_hot_upd       | bigint                   |           |          | 
 n_live_tup          | bigint                   |           |          | 
 n_dead_tup          | bigint                   |           |          | 
 n_mod_since_analyze | bigint                   |           |          | 
 last_vacuum         | timestamp with time zone |           |          | 
 last_autovacuum     | timestamp with time zone |           |          | 
 last_analyze        | timestamp with time zone |           |          | 
 last_autoanalyze    | timestamp with time zone |           |          | 
 vacuum_count        | bigint                   |           |          | 
 autovacuum_count    | bigint                   |           |          | 
 analyze_count       | bigint                   |           |          | 
 autoanalyze_count   | bigint                   |           |          | 

That should give us the time of the last autovacuum, right? Before we begin, here are my autovacuum settings which are all at their defaults:

postgres=# select name,setting from pg_settings where name like '%autovacuum%' order by 1;
                name                 |  setting  
-------------------------------------+-----------
 autovacuum                          | on
 autovacuum_analyze_scale_factor     | 0.1
 autovacuum_analyze_threshold        | 50
 autovacuum_freeze_max_age           | 200000000
 autovacuum_max_workers              | 3
 autovacuum_multixact_freeze_max_age | 400000000
 autovacuum_naptime                  | 60
 autovacuum_vacuum_cost_delay        | 20
 autovacuum_vacuum_cost_limit        | -1
 autovacuum_vacuum_scale_factor      | 0.2
 autovacuum_vacuum_threshold         | 50
 autovacuum_work_mem                 | -1
 log_autovacuum_min_duration         | -1
(13 rows)

Does that mean autovacuum should kick in as soon as we change 50 rows in a table, because autovacuum_vacuum_threshold is set to 50? Here is the table:

postgres=# create table t1 (a int, b varchar(50));
CREATE TABLE
postgres=# insert into t1 (a,b) select a, md5(a::varchar) from generate_series ( 1, 1000000 ) a;
INSERT 0 1000000
postgres=# select count(*) from t1;
  count  
---------
 1000000
(1 row)

As soon as we change 50 or more rows we should see the last_autovacuum column updated in pg_stat_all_tables, so let's check:

postgres=# update t1 set a = a + 1 where a < 1000;
UPDATE 999
postgres=# select pg_sleep(10);
 pg_sleep 
----------
 
(1 row)
postgres=# select relname,last_autovacuum from pg_stat_all_tables where relname = 't1';
 relname | last_autovacuum 
---------+-----------------
 t1      | 
(1 row)

Hm, not really what was expected. When you check the documentation, there is a formula we need to consider for our test:

vacuum threshold = autovacuum_vacuum_threshold +  autovacuum_vacuum_scale_factor * pg_class.reltuples

In our case that is:

postgres=# show autovacuum_vacuum_threshold;
 autovacuum_vacuum_threshold 
-----------------------------
 50
(1 row)

postgres=# show autovacuum_vacuum_scale_factor;
 autovacuum_vacuum_scale_factor 
--------------------------------
 0.2
(1 row)

postgres=# select reltuples::int from pg_class where relname = 't1';
 reltuples 
-----------
   1000000
(1 row)

postgres=# select 50 + 0.2 * 1000000;
 ?column? 
----------
 200050.0
(1 row)
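You can also let PostgreSQL do this computation for you (a sketch combining current_setting() with pg_class):

postgres=# select current_setting('autovacuum_vacuum_threshold')::int
postgres-#      + current_setting('autovacuum_vacuum_scale_factor')::float8 * reltuples as vacuum_threshold
postgres-#   from pg_class where relname = 't1';
 vacuum_threshold 
------------------
           200050
(1 row)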

This means we need to change at least 200050 rows to get autovacuum to kick in?

postgres=# update t1 set a = a + 1;
UPDATE 1000000

That should be fine as we updated all the rows in the table which is way more than 200050:

postgres=# select relname,last_autovacuum from pg_stat_all_tables where relname = 't1';
 relname |        last_autovacuum        
---------+-------------------------------
 t1      | 2017-10-31 07:40:56.553194+01
(1 row)

… and here we go. Now that we know how to check this on a regular table, we can do the same test on a temporary table:

postgres=# create temporary table tt1 as select * from t1;
SELECT 1000000
postgres=# update tt1 set a = a + 1;
UPDATE 1000000
postgres=# select relname,last_autovacuum from pg_stat_all_tables where relname = 'tt1';
 relname | last_autovacuum 
---------+-----------------
 tt1     | 
(1 row)

There is one point to consider: the parameter autovacuum_naptime defaults to one minute, so it can take some time until autovacuum has really done its work. But even when you wait for 10 minutes you will not see last_autovacuum updated in pg_stat_all_tables for a temporary table. So the answer is: No, there is no autovacuum on temporary tables. But of course you can still vacuum them manually:

postgres=# vacuum tt1;
VACUUM
postgres=# select relname,last_autovacuum, last_vacuum from pg_stat_all_tables where relname = 'tt1';
 relname | last_autovacuum |          last_vacuum          
---------+-----------------+-------------------------------
 tt1     |                 | 2017-10-31 07:50:58.041813+01
(1 row)

The same is true for the statistics used by the planner: you might need to analyze your temporary table manually:

postgres=# select last_analyze, last_autoanalyze from pg_stat_all_tables where relname = 'tt1';
 last_analyze | last_autoanalyze 
--------------+------------------
              | 
(1 row)

postgres=# analyze tt1;
ANALYZE
postgres=# select last_analyze, last_autoanalyze from pg_stat_all_tables where relname = 'tt1';
         last_analyze          | last_autoanalyze 
-------------------------------+------------------
 2017-10-31 07:52:27.690117+01 | 
(1 row)

Btw: This is clearly written in the documentation: “Temporary tables cannot be accessed by autovacuum. Therefore, appropriate vacuum and analyze operations should be performed via session SQL commands.”

Hope this helps …

 

Cet article Are temporary tables auto vacuumed in PostgreSQL? est apparu en premier sur Blog dbi services.

Are large objects supported in PostgreSQL 10 logical replication?


Another interesting topic that popped up last week during pgconf.eu: Are large objects supported with logical replication in PostgreSQL 10? The only truth is a test, isn't it? Let's go …

Obviously we need a table containing some large objects to start with:

postgres=# create table t1 ( a int, b oid);
CREATE TABLE

Before inserting some data, let's create a publication for that table right now:

postgres=# create publication my_pub for table t1;
CREATE PUBLICATION

Ok, that works. Now we need a subscription for that, so on a second instance:

postgres=# create table t1 ( a int, b oid);
CREATE TABLE
postgres=# create subscription my_sub connection 'host=localhost port=6000 dbname=postgres user=postgres' publication my_pub;
CREATE SUBSCRIPTION

So far, so good. Let's insert some data on the publishing instance and see what happens:

postgres=# \! which cp
/usr/bin/cp
postgres=# insert into t1 (a,b) values (1, lo_import('/usr/bin/cp'));
INSERT 0 1

That worked. What do we see on the subscription side?

postgres=# select * from t1;
 a |   b   
---+-------
 1 | 16418
(1 row)

postgres=# select * from pg_size_pretty ( pg_relation_size ( 't1' ) );
 pg_size_pretty 
----------------
 8192 bytes
(1 row)

So, at least “something” is there. Let's prove it:

postgres=# select lo_export(b,'/tmp/cp') from t1;
ERROR:  large object 16418 does not exist
postgres=# 

Hm, this is not what was expected, right? Doing the same on the publishing side works:

postgres=# select lo_export(b,'/tmp/cp') from t1;
 lo_export 
-----------
         1
(1 row)

postgres=# \! chmod +x /tmp/cp
postgres=# \! /tmp/cp --help | head -1
Usage: /tmp/cp [OPTION]... [-T] SOURCE DEST

This means the OID is replicated but not the large object itself: large objects live in the pg_largeobject system catalog, and logical replication does not decode changes to system catalogs. So the answer is: No, large objects cannot be used with PostgreSQL 10 logical replication.
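If you need the binary content on the subscriber, one workaround is to store it in a bytea column instead, as regular column data replicates just fine (a sketch; lo_get() converts an existing large object to bytea):

postgres=# create table t2 ( a int, b bytea );
CREATE TABLE
postgres=# insert into t2 select a, lo_get(b) from t1;
INSERT 0 1

After adding t2 to the publication (and creating the same table on the subscriber), the content itself arrives on the other side.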

 

Cet article Are large objects supported in PostgreSQL 10 logical replication? est apparu en premier sur Blog dbi services.

Can I have the same table published and subscribed (bi-directional) in PostgreSQL 10 logical replication?


When you start using PostgreSQL 10 logical replication you might think it is a good idea to set up bi-directional replication, so that you end up with two or more masters that are all writable. I will not go into the details of multi-master replication here (conflict resolution, …) but will show what happens when you try to do that. Let's go …

My two instances run on the same host, one on port 6000, the other on 6001. To start, I'll create the same table in both instances:

postgres=# create table t1 ( a int primary key, b varchar(50) );
CREATE TABLE
postgres=# alter table t1 replica identity using INDEX t1_pkey;
ALTER TABLE
postgres=# \d+ t1
                                            Table "public.t1"
 Column |         Type          | Collation | Nullable | Default | Storage  | Stats target | Description 
--------+-----------------------+-----------+----------+---------+----------+--------------+-------------
 a      | integer               |           | not null |         | plain    |              | 
 b      | character varying(50) |           |          |         | extended |              | 
Indexes:
    "t1_pkey" PRIMARY KEY, btree (a) REPLICA IDENTITY

Create the same publication on both sides:

postgres=# create publication my_pub for table t1;
CREATE PUBLICATION
postgres=# select * from pg_publication;
 pubname | pubowner | puballtables | pubinsert | pubupdate | pubdelete 
---------+----------+--------------+-----------+-----------+-----------
 my_pub  |       10 | f            | t         | t         | t
(1 row)
postgres=# select * from pg_publication_tables;
 pubname | schemaname | tablename 
---------+------------+-----------
 my_pub  | public     | t1
(1 row)

Create the same subscription on both sides (except for the port, of course):

postgres=# show port;
 port 
------
 6000
(1 row)
postgres=# create subscription my_sub connection 'host=localhost port=6001 dbname=postgres user=postgres' publication my_pub;
CREATE SUBSCRIPTION
postgres=# select * from pg_subscription;
 subdbid | subname | subowner | subenabled |                      subconninfo                       | subslotname | 
---------+---------+----------+------------+--------------------------------------------------------+-------------+-
   13212 | my_sub  |       10 | t          | host=localhost port=6001 dbname=postgres user=postgres | my_sub      | 
(1 row)


### second instance

postgres=# show port;
 port 
------
 6001
(1 row)

postgres=# create subscription my_sub connection 'host=localhost port=6000 dbname=postgres user=postgres' publication my_pub;
CREATE SUBSCRIPTION
postgres=# select * from pg_subscription;
 subdbid | subname | subowner | subenabled |                      subconninfo                       | subslotname | 
---------+---------+----------+------------+--------------------------------------------------------+-------------+-
   13212 | my_sub  |       10 | t          | host=localhost port=6000 dbname=postgres user=postgres | my_sub      | 
(1 row)

So far, so good: everything has worked until now. Now let's insert a row into the first instance:

postgres=# insert into t1 (a,b) values (1,'a');
INSERT 0 1
postgres=# select * from t1;
 a | b 
---+---
 1 | a
(1 row)

That seemed to work as well, as the row is there on the second instance too:

postgres=# show port;
 port 
------
 6001
(1 row)

postgres=# select * from t1;
 a | b 
---+---
 1 | a
(1 row)

But: When you take a look at the log file of the first instance you’ll see something like this (which is repeated over and over again):

2017-11-03 09:56:29.176 CET - 2 - 10687 -  - @ ERROR:  duplicate key value violates unique constraint "t1_pkey"
2017-11-03 09:56:29.176 CET - 3 - 10687 -  - @ DETAIL:  Key (a)=(1) already exists.
2017-11-03 09:56:29.178 CET - 29 - 10027 -  - @ LOG:  worker process: logical replication worker for subscription 16437 (PID 10687) exited with exit code 1
2017-11-03 09:56:34.198 CET - 1 - 10693 -  - @ LOG:  logical replication apply worker for subscription "my_sub" has started

Now the second instance is constantly trying to insert the same row back into the first instance, and that obviously cannot work as the row is already there. So the answer to the original question: Do not try to do that, it will not work anyway.
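If you ended up in this situation, you can stop the apply worker from retrying forever by disabling or dropping the subscription on one side (a sketch; dropping also removes the replication slot on the publisher):

postgres=# alter subscription my_sub disable;
ALTER SUBSCRIPTION
postgres=# drop subscription my_sub;
DROP SUBSCRIPTION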

 

Cet article Can I have the same table published and subscribed (bi-directional) in PostgreSQL 10 logical replication? est apparu en premier sur Blog dbi services.
