Yann Neuhaus

dbi services technical blog

How SQL Server MVCC compares to Oracle and PostgreSQL

Sun, 2020-02-09 13:42
By Franck Pachot

Microsoft SQL Server implemented MVCC in 2005, and MVCC has proven to be the best approach for transaction isolation (the I in ACID) in OLTP. But are you sure that writers do not block readers with READ_COMMITTED_SNAPSHOT? I’ll show here that some reads are still blocked by locked rows, contrary to the precursors of MVCC like PostgreSQL and Oracle.

For this demo, I run the SQL Server 2019 RHEL image on Docker in an Oracle Cloud compute instance running OEL 7.7, as explained in the previous post. If you don’t have the memory limit mentioned there, you can simply run:

docker run -d -e "ACCEPT_EULA=Y" -e 'MSSQL_PID=Express' -p 1433:1433 -e 'SA_PASSWORD=**P455w0rd**' --name mssql mcr.microsoft.com/mssql/rhel/server:2019-latest
time until docker logs mssql | grep -C10 "Recovery is complete." ; do sleep 1 ; done

Test scenario description

Here is what I’ll run in a first session:

  1. create a DEMO database
  2. (optional) set MVCC with Read Committed Snapshot isolation level
  3. create a DEMO table with two rows. One with “a”=1 and one with “a”=2
  4. (optional) build an index on column “a”
  5. update the first line where “a”=1


cat > session1.sql <<'SQL'
drop database if exists DEMO;
create database DEMO;
go
use DEMO;
go
-- set MVCC to read snapshot rather than locked current --
-- alter database DEMO set READ_COMMITTED_SNAPSHOT on;
go
drop table if exists DEMO;
create table DEMO(id int primary key, a int not null, b int);
begin transaction;
insert into DEMO values(1,1,1);
insert into DEMO values(2,2,2);
commit;
go
select * from DEMO;
go
-- index to read only rows that we want to modify --
-- create index DEMO_A on DEMO(a);
go
begin transaction;
update DEMO set b=b+1 where a=1;
go
SQL

I’ll run it in the background (you can also run it in another terminal) where it waits 60 seconds before quitting:

( cat session1.sql ; sleep 60 ) | docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e | ts &

[root@instance-20200208-1719 ~]# ( cat session1.sql ; sleep 60 ) | docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e | ts &
[1] 27759
[root@instance-20200208-1719 ~]# Feb 09 17:05:43 drop database if exists DEMO;
Feb 09 17:05:43 create database DEMO;
Feb 09 17:05:43
Feb 09 17:05:43 use DEMO;
Feb 09 17:05:43
Feb 09 17:05:43 Changed database context to 'DEMO'.
Feb 09 17:05:43 -- set MVCC to read snapshot rather than locked current --
Feb 09 17:05:43 -- alter database DEMO set READ_COMMITTED_SNAPSHOT on;
Feb 09 17:05:43
Feb 09 17:05:43 drop table if exists DEMO;
Feb 09 17:05:43 create table DEMO(id int primary key, a int not null, b int);
Feb 09 17:05:43 begin transaction;
Feb 09 17:05:43 insert into DEMO values(1,1,1);
Feb 09 17:05:43 insert into DEMO values(2,2,2);
Feb 09 17:05:43 commit;
Feb 09 17:05:43
Feb 09 17:05:43
Feb 09 17:05:43 (1 rows affected)
Feb 09 17:05:43
Feb 09 17:05:43 (1 rows affected)
Feb 09 17:05:43 select * from DEMO;
Feb 09 17:05:43
Feb 09 17:05:43 id          a           b
Feb 09 17:05:43 ----------- ----------- -----------
Feb 09 17:05:43           1           1           1
Feb 09 17:05:43           2           2           2
Feb 09 17:05:43
Feb 09 17:05:43 (2 rows affected)
Feb 09 17:05:43 -- index to read only rows that we want to modify --
Feb 09 17:05:43 -- create index DEMO_A on DEMO(a);
Feb 09 17:05:43
Feb 09 17:05:43 begin transaction;
Feb 09 17:05:43 update DEMO set b=b+1 where a=1;
Feb 09 17:05:43
Feb 09 17:05:43
Feb 09 17:05:43 (1 rows affected)
SQL Server default

While this session holds a lock on the first row, I’ll run the following, reading a different row than the one currently locked by the other transaction:

docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e -d DEMO | ts
-- read access the row that is not locked
select * from DEMO where a=2;
go

This hangs until the first transaction is canceled:

[root@instance-20200208-1719 ~]# docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e -d DEMO | ts
-- read access the row that is not locked
 select * from DEMO where a=2;
go
Feb 09 17:06:42
Feb 09 17:06:42
Feb 09 17:06:42
Feb 09 17:06:42 Sqlcmd: Warning: The last operation was terminated because the user pressed CTRL+C.
Feb 09 17:06:42
Feb 09 17:06:42 -- read access the row that is not locked
Feb 09 17:06:42  select * from DEMO where a=2;
Feb 09 17:06:42
Feb 09 17:06:42 id          a           b
Feb 09 17:06:42 ----------- ----------- -----------
Feb 09 17:06:42           2           2           2
Feb 09 17:06:42
Feb 09 17:06:42 (1 rows affected)

The “Sqlcmd: Warning: The last operation was terminated because the user pressed CTRL+C” message is from the first session, and only then was my foreground session able to continue. This is the worst you can encounter with the default isolation level in SQL Server, where writes and reads block each other even when not touching the same row (I read the a=2 row and only the a=1 one was locked). The reason for this is that I have no index for this predicate and I have to read all rows in order to find mine:

set showplan_text on ;
go
select * from DEMO where a=2;
go

go
Feb 09 17:07:24 set showplan_text on ;
Feb 09 17:07:24
select * from DEMO where a=2;
go
Feb 09 17:07:30 select * from DEMO where a=2;
Feb 09 17:07:30
Feb 09 17:07:30 StmtText
Feb 09 17:07:30 -------------------------------
Feb 09 17:07:30 select * from DEMO where a=2;
Feb 09 17:07:30
Feb 09 17:07:30 (1 rows affected)
Feb 09 17:07:30 StmtText
Feb 09 17:07:30 ---------------------------------------------------------------------------------------------------------------------------------------------------
Feb 09 17:07:30   |--Clustered Index Scan(OBJECT:([DEMO].[dbo].[DEMO].[PK__DEMO__3213E83F2AD8547F]), WHERE:([DEMO].[dbo].[DEMO].[a]=CONVERT_IMPLICIT(int,[@1],0)))
Feb 09 17:07:30
Feb 09 17:07:30 (1 rows affected)
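While that SELECT is hanging, a third session can show who blocks whom. This is a small sketch I add here (not part of the original demo), using the standard dynamic management views:

select r.session_id, r.blocking_session_id, r.wait_type, r.wait_resource, r.command
from sys.dm_exec_requests r
where r.blocking_session_id <> 0;
go

The blocked SELECT should show up with the session id of the updating transaction in blocking_session_id and an LCK_M_* wait type.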

Now, in order to avoid this situation, I’ll run the same but with an index on column “a”.
It was commented out in the session1.sql script, so I just re-ran everything without those comments:

( sed -e '/create index/s/--//' session1.sql ; sleep 60 ) | docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e | ts &

I’m running the same, now with a 3-second timeout so that I don’t have to wait for my background session to terminate:

docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e -d DEMO -t 3 | ts
-- read access the row that is not locked
select * from DEMO where a=2;
go

[root@instance-20200208-1719 ~]# docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e -d DEMO -t 3 | ts
-- read access the row that is not locked
 select * from DEMO where a=2;
 go
Feb 09 17:29:25 -- read access the row that is not locked
Feb 09 17:29:25  select * from DEMO where a=2;
Feb 09 17:29:25
Feb 09 17:29:25 Timeout expired

Here I’m blocked again, like in the previous scenario, because the index was not used.
I can force the index access with a hint:

-- read access the row that is not locked forcing index access
select * from DEMO WITH (INDEX(DEMO_A)) where a=2;
go

-- read access the row that is not locked forcing index access
 select * from DEMO WITH (INDEX(DEMO_A)) where a=2;
 go
Feb 09 17:29:30 -- read access the row that is not locked forcing index access
Feb 09 17:29:30  select * from DEMO WITH (INDEX(DEMO_A)) where a=2;
Feb 09 17:29:30
Feb 09 17:29:30 id          a           b
Feb 09 17:29:30 ----------- ----------- -----------
Feb 09 17:29:30           2           2           2
Feb 09 17:29:30
Feb 09 17:29:30 (1 rows affected)

This didn’t wait because the index access didn’t have to go to the locked row.

However, when I read the same row that is concurrently locked, I have to wait:

-- read access the row that is locked
select * from DEMO where a=1;
go

 -- read access the row that is locked
 select * from DEMO where a=1;
 go
Feb 09 17:29:34  -- read access the row that is locked
Feb 09 17:29:34  select * from DEMO where a=1;
Feb 09 17:29:34
Feb 09 17:29:34 Timeout expired

Here is the confirmation that the index was used only with the hint:

set showplan_text on ;
go
select * from DEMO where a=2;
go
select * from DEMO WITH (INDEX(DEMO_A)) where a=2;
go

Feb 09 17:29:50 set showplan_text on ;
Feb 09 17:29:50
 select * from DEMO where a=2;
 go
Feb 09 17:29:50  select * from DEMO where a=2;
Feb 09 17:29:50
Feb 09 17:29:50 StmtText
Feb 09 17:29:50 --------------------------------
Feb 09 17:29:50  select * from DEMO where a=2;
Feb 09 17:29:50
Feb 09 17:29:50 (1 rows affected)
Feb 09 17:29:50 StmtText
Feb 09 17:29:50 --------------------------------------------------------------------------------------------------------------------------
Feb 09 17:29:50   |--Clustered Index Scan(OBJECT:([DEMO].[dbo].[DEMO].[PK__DEMO__3213E83F102B4054]), WHERE:([DEMO].[dbo].[DEMO].[a]=(2)))
Feb 09 17:29:50
Feb 09 17:29:50 (1 rows affected)
 select * from DEMO WITH (INDEX(DEMO_A)) where a=2;
 go
Feb 09 17:29:52  select * from DEMO WITH (INDEX(DEMO_A)) where a=2;
Feb 09 17:29:52
Feb 09 17:29:52 StmtText
Feb 09 17:29:52 -----------------------------------------------------
Feb 09 17:29:52  select * from DEMO WITH (INDEX(DEMO_A)) where a=2;
Feb 09 17:29:52
Feb 09 17:29:52 (1 rows affected)
Feb 09 17:29:52 StmtText                                                                                                                                                
Feb 09 17:29:52 ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Feb 09 17:29:52   |--Nested Loops(Inner Join, OUTER REFERENCES:([DEMO].[dbo].[DEMO].[id]))                                                                              
Feb 09 17:29:52        |--Index Seek(OBJECT:([DEMO].[dbo].[DEMO].[DEMO_A]), SEEK:([DEMO].[dbo].[DEMO].[a]=(2)) ORDERED FORWARD)                                         
Feb 09 17:29:52        |--Clustered Index Seek(OBJECT:([DEMO].[dbo].[DEMO].[PK__DEMO__3213E83F102B4054]), SEEK:([DEMO].[dbo].[DEMO].[id]=[DEMO].[dbo].[DEMO].[id]) LOOKUP ORDERED FORWARD)
Feb 09 17:29:52
Feb 09 17:29:52 (3 rows affected)

So, with the default isolation level and index access, we can read a row that is not locked. The last query was blocked for the SELECT * FROM DEMO WHERE A=1 because we are in the legacy, and default, mode where readers are blocked by writers.

SQL Server MVCC

In order to improve this situation, Microsoft has implemented MVCC. With it, we do not need to read the current version of the rows (which requires waiting when they are concurrently modified) because past versions of the rows are stored in TEMPDB and we can read a past snapshot of them. Typically, with the READ COMMITTED SNAPSHOT isolation level, we read a snapshot as of the point in time our query began.
In general, we need to read all rows from a consistent point in time. This can be the one where our query started, in which case, while the query is running, a past version may be reconstructed to remove concurrent changes. Or, when there is no MVCC to rebuild this snapshot, this consistent point can only be the one when our query completes. This means that while we read rows, we must lock them to be sure that they stay the same until the end of our query. Of course, even with MVCC there are cases where we want to read the latest value, and then we will lock with something like a SELECT FOR UPDATE. But that’s not the topic here.
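As a side note (my addition, not part of the original scenario), you can check whether a database actually runs with row versioning, and see that the versions go to the TEMPDB version store, with queries like:

select name, is_read_committed_snapshot_on, snapshot_isolation_state_desc
from sys.databases where name = 'DEMO';
go
select count(*) as version_records from sys.dm_tran_version_store;
go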

I’ll run the same test as the first one, but now have the database with READ_COMMITTED_SNAPSHOT on:

( sed -e '/READ_COMMITTED/s/--//' session1.sql ; sleep 120 ) | docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e | ts &

[root@instance-20200208-1719 ~]# ( sed -e '/READ_COMMITTED/s/--//' session1.sql ; sleep 120 ) | docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e | ts &
[1] 38943
[root@instance-20200208-1719 ~]# Feb 09 18:21:19 drop database if exists DEMO;
Feb 09 18:21:19 create database DEMO;
Feb 09 18:21:19
Feb 09 18:21:19 use DEMO;
Feb 09 18:21:19
Feb 09 18:21:19 Changed database context to 'DEMO'.
Feb 09 18:21:19 -- set MVCC to read snapshot rather than locked current --
Feb 09 18:21:19  alter database DEMO set READ_COMMITTED_SNAPSHOT on;
Feb 09 18:21:19
Feb 09 18:21:19 drop table if exists DEMO;
Feb 09 18:21:19 create table DEMO(id int primary key, a int not null, b int);
Feb 09 18:21:19 begin transaction;
Feb 09 18:21:19 insert into DEMO values(1,1,1);
Feb 09 18:21:19 insert into DEMO values(2,2,2);
Feb 09 18:21:19 commit;
Feb 09 18:21:19
Feb 09 18:21:19
Feb 09 18:21:19 (1 rows affected)
Feb 09 18:21:19
Feb 09 18:21:19 (1 rows affected)
Feb 09 18:21:19 select * from DEMO;
Feb 09 18:21:19
Feb 09 18:21:19 id          a           b
Feb 09 18:21:19 ----------- ----------- -----------
Feb 09 18:21:19           1           1           1
Feb 09 18:21:19           2           2           2
Feb 09 18:21:19
Feb 09 18:21:19 (2 rows affected)
Feb 09 18:21:19 -- index to read only rows that we want to modify --
Feb 09 18:21:19 -- create index DEMO_A on DEMO(a);
Feb 09 18:21:19
Feb 09 18:21:19 begin transaction;
Feb 09 18:21:19 update DEMO set b=b+1 where a=1;
Feb 09 18:21:19
Feb 09 18:21:19
Feb 09 18:21:19 (1 rows affected)

And then running the same scenario:

docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e -d DEMO -t 3 | ts
-- read access the row that is not locked
select * from DEMO where a=2;
go
-- read access the row that is locked
select * from DEMO where a=1;
go
-- write access on the row that is not locked
delete from DEMO where a=2;
go

[root@instance-20200208-1719 ~]# docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e -d DEMO -t 3 | ts

-- read access the row that is not locked
select * from DEMO where a=2;
go
Feb 09 18:21:36 -- read access the row that is not locked
Feb 09 18:21:36 select * from DEMO where a=2;
Feb 09 18:21:36
Feb 09 18:21:36 id          a           b
Feb 09 18:21:36 ----------- ----------- -----------
Feb 09 18:21:36           2           2           2
Feb 09 18:21:36
Feb 09 18:21:36 (1 rows affected)

-- read access the row that is locked
select * from DEMO where a=1;
go
Feb 09 18:21:47 -- read access the row that is locked
Feb 09 18:21:47 select * from DEMO where a=1;
Feb 09 18:21:47
Feb 09 18:21:47 id          a           b
Feb 09 18:21:47 ----------- ----------- -----------
Feb 09 18:21:47           1           1           1
Feb 09 18:21:47
Feb 09 18:21:47 (1 rows affected)

-- write access on the row that is not locked
delete from DEMO where a=2;
go
Feb 09 18:22:01 -- write access on the row that is not locked
Feb 09 18:22:01 delete from DEMO where a=2;
Feb 09 18:22:01
Feb 09 18:22:01 Timeout expired

Ok, that’s better. I confirm that readers are not blocked by writers. But the modification on “A”=2 was blocked. This is not a writer-writer situation, because we are not modifying the row that is locked by the other session. Here, I have no index on “A”, so the delete statement must first scan the table, and that scan has to read the locked row. And obviously, this read is blocked. It seems that DML must read the current version of the row even when MVCC is available. That means that reads can be blocked by writes when those reads are done on behalf of a writing statement.
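To confirm that it is the scan phase of the DELETE that waits on the locked row, a quick look at the lock requests from another session can help. This is a sketch of such a query (my addition), based on sys.dm_tran_locks:

select request_session_id, resource_type, request_mode, request_status
from sys.dm_tran_locks
where resource_database_id = db_id('DEMO')
order by request_session_id;
go

The session running the DELETE should appear with a lock request in request_status = 'WAIT' on the key locked by the concurrent update.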

Last test on SQL Server: the same, with MVCC, and the index on “A”

( sed -e '/READ_COMMITTED/s/--//' -e '/create index/s/--//' session1.sql ; sleep 120 ) | docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e | ts &

[root@instance-20200208-1719 ~]# ( sed -e '/READ_COMMITTED/s/--//' -e '/create index/s/--//' session1.sql ; sleep 120 ) | docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e | ts &
[1] 40320
[root@instance-20200208-1719 ~]#
[root@instance-20200208-1719 ~]# Feb 09 18:30:15 drop database if exists DEMO;
Feb 09 18:30:15 create database DEMO;
Feb 09 18:30:15
Feb 09 18:30:15 use DEMO;
Feb 09 18:30:15
Feb 09 18:30:15 Changed database context to 'DEMO'.
Feb 09 18:30:15 -- set MVCC to read snapshot rather than locked current --
Feb 09 18:30:15  alter database DEMO set READ_COMMITTED_SNAPSHOT on;
Feb 09 18:30:15
Feb 09 18:30:15 drop table if exists DEMO;
Feb 09 18:30:15 create table DEMO(id int primary key, a int not null, b int);
Feb 09 18:30:15 begin transaction;
Feb 09 18:30:15 insert into DEMO values(1,1,1);
Feb 09 18:30:15 insert into DEMO values(2,2,2);
Feb 09 18:30:15 commit;
Feb 09 18:30:15
Feb 09 18:30:15
Feb 09 18:30:15 (1 rows affected)
Feb 09 18:30:15
Feb 09 18:30:15 (1 rows affected)
Feb 09 18:30:15 select * from DEMO;
Feb 09 18:30:15
Feb 09 18:30:15 id          a           b
Feb 09 18:30:15 ----------- ----------- -----------
Feb 09 18:30:15           1           1           1
Feb 09 18:30:15           2           2           2
Feb 09 18:30:15
Feb 09 18:30:15 (2 rows affected)
Feb 09 18:30:15 -- index to read only rows that we want to modify --
Feb 09 18:30:15  create index DEMO_A on DEMO(a);
Feb 09 18:30:15
Feb 09 18:30:15 begin transaction;
Feb 09 18:30:15 update DEMO set b=b+1 where a=1;
Feb 09 18:30:15
Feb 09 18:30:15
Feb 09 18:30:15 (1 rows affected)

Here is my full scenario to see where it blocks:

docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e -d DEMO -t 3
-- read access the row that is not locked
select * from DEMO where a=2;
go
-- read access the row that is locked
select * from DEMO where a=1;
go
-- write access on the row that is not locked
delete from DEMO where a=2;
go
-- write access on the row that is locked
delete from DEMO where a=1;
go

docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e -d DEMO -t 3
-- read access the row that is not locked
select * from DEMO where a=2;
go
-- read access the row that is not locked
select * from DEMO where a=2;

id          a           b
----------- ----------- -----------
          2           2           2

(1 rows affected)

-- read access the row that is locked
select * from DEMO where a=1;
go
-- read access the row that is locked
select * from DEMO where a=1;

id          a           b
----------- ----------- -----------
          1           1           1

(1 rows affected)

-- write access on the row that is not locked
delete from DEMO where a=2;
go
-- write access on the row that is not locked
delete from DEMO where a=2;


(1 rows affected)

-- write access on the row that is locked
delete from DEMO where a=1;
go
-- write access on the row that is locked
delete from DEMO where a=1;

Timeout expired

Finally, the only blocking situation here is when I want to write to the same row. The index access reduces the risk of being blocked.

In summary, we can achieve the best concurrency with the READ_COMMITTED_SNAPSHOT isolation level and by ensuring that we read only the rows we will update, with proper indexing and maybe hinting. This is, in my opinion, very important to know because we rarely cover those situations during integration tests. But they can happen quickly in production under high load.

PostgreSQL

Let’s do the same with PostgreSQL which is natively MVCC:

cat > session1.sql <<'SQL'
drop database if exists DEMO;
create database DEMO;
\c demo
drop table if exists DEMO;
create table DEMO(id int primary key, a int not null, b int);
begin transaction;
insert into DEMO values(1,1,1);
insert into DEMO values(2,2,2);
commit;
select * from DEMO;
begin transaction;
update DEMO set b=b+1 where a=1;
SQL

No specific settings, and no index created here.

( cat session1.sql ; sleep 120 ; echo "commit;") | psql -e | ts &

-bash-4.2$ ( cat session1.sql ; sleep 120 ; echo "commit;") | psql -e | ts &
[1] 31125
-bash-4.2$
-bash-4.2$ Feb 09 18:42:48 drop database if exists DEMO;
Feb 09 18:42:48 DROP DATABASE
Feb 09 18:42:48 create database DEMO;
Feb 09 18:42:49 CREATE DATABASE
Feb 09 18:42:49 You are now connected to database "demo" as user "postgres".
Feb 09 18:42:49 drop table if exists DEMO;
NOTICE:  table "demo" does not exist, skipping
Feb 09 18:42:49 DROP TABLE
Feb 09 18:42:49 create table DEMO(id int primary key, a int not null, b int);
Feb 09 18:42:49 CREATE TABLE
Feb 09 18:42:49 begin transaction;
Feb 09 18:42:49 BEGIN
Feb 09 18:42:49 insert into DEMO values(1,1,1);
Feb 09 18:42:49 INSERT 0 1
Feb 09 18:42:49 insert into DEMO values(2,2,2);
Feb 09 18:42:49 INSERT 0 1
Feb 09 18:42:49 commit;
Feb 09 18:42:49 COMMIT
Feb 09 18:42:49 select * from DEMO;
Feb 09 18:42:49  id | a | b
Feb 09 18:42:49 ----+---+---
Feb 09 18:42:49   1 | 1 | 1
Feb 09 18:42:49   2 | 2 | 2
Feb 09 18:42:49 (2 rows)
Feb 09 18:42:49
Feb 09 18:42:49 begin transaction;
Feb 09 18:42:49 BEGIN
Feb 09 18:42:49 update DEMO set b=b+1 where a=1;
Feb 09 18:42:49 UPDATE 1

While the transaction updating the first row is in the background, I run the following readers and writers:

psql demo | ts
set statement_timeout=3000;
-- read access the row that is not locked
select * from DEMO where a=2;
-- read access the row that is locked
select * from DEMO where a=1;
-- write access on the row that is not locked
delete from DEMO where a=2;
-- write access on the row that is locked
delete from DEMO where a=1;

-bash-4.2$ psql demo | ts
set statement_timeout=3000;
Feb 09 18:43:00 SET
-- read access the row that is not locked
select * from DEMO where a=2;
Feb 09 18:43:08  id | a | b
Feb 09 18:43:08 ----+---+---
Feb 09 18:43:08   2 | 2 | 2
Feb 09 18:43:08 (1 row)
Feb 09 18:43:08
-- read access the row that is locked
select * from DEMO where a=1;
Feb 09 18:43:16  id | a | b
Feb 09 18:43:16 ----+---+---
Feb 09 18:43:16   1 | 1 | 1
Feb 09 18:43:16 (1 row)
Feb 09 18:43:16
-- write access on the row that is not locked
delete from DEMO where a=2;
Feb 09 18:43:24 DELETE 1
-- write access on the row that is locked
delete from DEMO where a=1;

ERROR:  canceling statement due to statement timeout
CONTEXT:  while deleting tuple (0,1) in relation "demo"

Nothing is blocked except, of course, when modifying the row that is locked.
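While the last DELETE is waiting, a third psql session can show the blocking relationship. This is a small sketch I add here (not from the original demo), using pg_stat_activity and pg_blocking_pids():

select pid, pg_blocking_pids(pid) as blocked_by, state, query
from pg_stat_activity
where cardinality(pg_blocking_pids(pid)) > 0;

The waiting DELETE should be listed with the pid of the session holding the row lock in blocked_by.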

Oracle Database

One of the many things I’ve learned from Tom Kyte when I was reading AskTom regularly is how to build the simplest test cases. And with Oracle there is no need to run multiple sessions to observe multi-transaction concurrency. I can do it with an autonomous transaction in one session, and one advantage is that I can share a dbfiddle example:
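The dbfiddle itself is not reproduced here, but the idea is roughly the following (a sketch of the approach, so the line numbers will not match the “line 14” mentioned below). The main transaction locks the “a”=1 row, and an autonomous transaction, which behaves like an independent transaction in the same session, then reads any row and writes the other rows before touching the locked one:

create table DEMO(id int primary key, a int not null, b int);
insert into DEMO values(1,1,1);
insert into DEMO values(2,2,2);
commit;
update DEMO set b=b+1 where a=1;      -- the main transaction keeps the a=1 row locked
declare
 pragma autonomous_transaction;       -- independent transaction in the same session
begin
 for r in (select * from DEMO) loop null; end loop;  -- reads any row: never blocked
 delete from DEMO where a=2;          -- writes another row: not blocked either
 delete from DEMO where a=1;          -- needs the locked row: ORA-00060 deadlock detected
 commit;
end;
/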

Here, deadlock at line 14 means that only the “delete where a=1” encountered a blocking situation with “update where a=1”. All previous statements, select on any row and update of other rows, were executed without conflict.

A DML statement has two phases: one to find the rows and a second one to modify them. A DELETE or UPDATE in Oracle and Postgres runs the first one in snapshot mode: non-blocking MVCC. The second one must, of course, modify the current version. This is a very complex mechanism because it may require a retry (restart) when the current version does not match the consistent snapshot that was used for filtering. Both PostgreSQL and Oracle can ensure this write consistency without the need to block the reads. SQL Server implemented MVCC more recently and provides non-blocking reads only for SELECT queries. But a read can still be in a blocking situation during the query phase of an UPDATE or DELETE statement.
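To make the restart idea more concrete, here is a small two-session illustration (my addition, in PostgreSQL syntax) of what the modify phase has to re-check:

-- session 1: change the row so that it no longer matches a=1, without committing
begin;
update DEMO set a=3 where a=1;
-- session 2: this statement filters on a snapshot where the row still has a=1,
-- finds it, and then waits on session 1's row lock
delete from DEMO where a=1;
-- session 1:
commit;
-- session 2 now re-evaluates the predicate on the current version of the row:
-- a is 3, the row no longer qualifies, and the result is "DELETE 0"

Oracle goes further and may restart the whole statement from a new snapshot, but in both cases no read had to be blocked to get a consistent result.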

The article How SQL Server MVCC compares to Oracle and PostgreSQL appeared first on Blog dbi services.

Running SQL Server on the Oracle Free tier

Sat, 2020-02-08 16:25
By Franck Pachot

The Oracle Cloud is not only for Oracle Database. You can create a VM running Oracle Linux with full root access to it, even in the free tier: a free VM that will always be up, never expires, with full ssh connectivity to a sudoer user, and where you are able to tunnel any port. Of course, there are some limits that I’ve detailed in a previous post. But that is sufficient to run a database, given that you configure low memory usage. For Oracle Database XE, Kamil Stawiarski mentions that you can just hack the memory test in the RPM shell script.
But for Microsoft SQL Server, that’s a bit more complex because this test is hardcoded in the sqlservr binary, and the solution I propose here is to intercept the call to the sysinfo() system call.

Creating a VM in the Oracle Cloud is very easy, here are the steps in one picture:

I’m connecting to the public IP address with ssh (the public key is uploaded when creating the VM) and I’ll run everything as root:

ssh opc@129.213.138.34
sudo su -
cat /etc/oracle-release

I install the Docker engine (version 19.3 here):
yum install -y docker-engine

I start docker

systemctl start docker
docker info


I’ll use the latest SQL Server 2019 image built on RHEL
docker pull mcr.microsoft.com/mssql/rhel/server:2019-latest
docker images

5 minutes to download a 1.5GB image. Now trying to start it.
The nice thing (compared to Oracle) is that we don’t have to manually accept the license terms with a click-through process. I just mention that I have read and accepted them with: ACCEPT_EULA=Y

I try to run it:
docker run \
-e "ACCEPT_EULA=Y" \
-e 'MSSQL_PID=Express' \
-p 1433:1433 \
-e 'SA_PASSWORD=**P455w0rd**' \
--name mssql \
mcr.microsoft.com/mssql/rhel/server:2019-latest

There’s a hardcoded prerequisite verification to check that the system has at least 2000 MB of RAM. And I have less than one GB here in this free tier:


awk '/^Mem/{print $0,$2/1024" MB"}' /proc/meminfo

Fortunately, there’s always a nice geek on the internet with an awesome solution: hack the sysinfo() system call with an LD_PRELOAD’ed wrapper: A Slightly Liberated Microsoft SQL Server Docker image

Let’s get it:
git clone https://github.com/justin2004/mssql_server_tiny.git
cd mssql_server_tiny

I changed the FROM to build from the 2019 RHEL image, and I preferred to use /etc/ld.so.preload rather than overriding the CMD command with an LD_PRELOAD variable:


FROM oraclelinux:7-slim AS build0
WORKDIR /root
RUN yum update -y && yum install -y binutils gcc
ADD wrapper.c /root/
RUN gcc -shared -ldl -fPIC -o wrapper.so wrapper.c
FROM mcr.microsoft.com/mssql/rhel/server:2019-latest
COPY --from=build0 /root/wrapper.so /root/
ADD wrapper.c /root/
USER root
RUN echo "/root/wrapper.so" > /etc/ld.so.preload
USER mssql

I didn’t change the wrapper for the sysinfo function:

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/sysinfo.h>
#include <dlfcn.h>
int sysinfo(struct sysinfo *info){
// clear it
//dlerror();
void *pt=NULL;
typedef int (*real_sysinfo)(struct sysinfo *info);
// we need the real sysinfo function address
pt = dlsym(RTLD_NEXT,"sysinfo");
//printf("pt: %x\n", *(char *)pt);
// call the real sysinfo system call
int real_return_val=((real_sysinfo)pt)(info);
// but then modify its returned totalram field if necessary
// because sqlserver needs to believe it has "2000 megabytes"
// physical memory
if( info->totalram < 1000l * 1000l * 1000l * 2l ){
    info->totalram = 1000l * 1000l * 1000l * 2l ;
}
return real_return_val;
}

I build the image from there:

docker build -t mssql .


I run it:

docker run -d \
-e "ACCEPT_EULA=Y" \
-e 'MSSQL_PID=Express' \
-p 1433:1433 \
-e 'SA_PASSWORD=**P455w0rd**' \
--name mssql \
mssql

I wait until it is ready:

until docker logs mssql | grep -C10 "Recovery is complete." ; do sleep 1 ; done

All is ok and I connect and check the version:

Well… as you can see, my first attempt failed. I am running with very low memory here, so many memory allocation problems can be expected. If you look at the logs after a while, many automatic system tasks fail. But that’s sufficient for a minimal lab, and you can tweak some Linux and SQL Server parameters if you need to. Comments are welcome here for feedback and ideas…

The port 1433 is exposed here locally and it can be tunneled through ssh. This is a free lab environment, always accessible from everywhere, to do small tests in MS SQL, running on the Oracle free tier. Here is how I connect with DBeaver from my laptop, just mentioning the public IP address, the private ssh key and the connection information:

The article Running SQL Server on the Oracle Free tier appeared first on Blog dbi services.

Should I go for ODA 19.5 or should I wait until 19.6?

Fri, 2020-02-07 05:25
Introduction

As you may know, Oracle Database 19c has been available for new (X8-2) and older Oracle Database Appliances for several weeks. The current version is 19.5. But when you go to the official ODA documentation, it still proposes version 18.7 first, which is not compatible with 19c databases. Here is why.

19c database is the final 12.2

First of all, 19c is an important release because it’s the terminal release of 12.2, as 11.2.0.4 was for 11.2. Please refer to my other blog to understand the new Oracle versioning. ODA always supports new releases a few months after they become available on Linux, and that’s why it’s only available now.

Drawbacks of 19.5

19.5 is available on your ODA, but you will not be able to patch to this version. The reason is quite simple: it’s not a complete patch; you can only download the ISO for reimaging and the 19c grid and database software, and that’s it. The reason for not yet having a patch resides in the difficulty of updating the OS part. 19.5 runs on Linux 7.7, and all previous releases are stuck with Linux 6.10, meaning that the patch would have to include the OS upgrade, and this jump is not so easy. That’s the first drawback.

The second drawback is that you cannot run another database version. If you still need 18c, 12.2, 12.1 or 11.2, this 19.5 is not for you.

The third drawback is that you will not be able to patch from 19.5 to 19.6 or a newer version, simply because 19.5 is an out-of-the-way release.

Another drawback concerns the documentation, which is not yet complete: many parts are copied and pasted from 18.7. For example, the documented initctl command to restart the dcs agent is not a command that actually exists on Linux 7.

Moreover, my first tests on this version show annoying bugs related to database creation, which are under investigation by Oracle.

When will 19.6 be ready?

19.6 is planned for 2020, yes, but which month? There is no official date; it could come in March or during the summer, nobody knows. As a result, you will have to wait for this patch to be released to start your migration to 19c on ODA.

So, what to do?

3 solutions are possible:

  • You can deal with your old databases until the patch is released: buy extended support for 11gR2/12cR1. Premier support is still OK for 12.2.0.1 and 18c
  • Migrate your old 11gR2 and 12cR1 to 18c to be prepared for 19c and avoid buying extended support; the differences between 18c and 19c should be minimal
  • Deploy 19.5 for testing purpose on a test ODA and start your migration project to get prepared for 19.6. Once available, patch or redeploy your ODAs and migrate all your databases
Conclusion

Not having 19.6 now is really annoying. After all, we choose ODA because it’s easier to get updates. But you can still prepare everything for the 19c migration, by first migrating to 18c or by giving 19c a try with this 19.5 release.

The article Should I go for ODA 19.5 or should I wait until 19.6? appeared first on Blog dbi services.

ROLLBACK TO SAVEPOINT;

Tue, 2020-02-04 14:07
By Franck Pachot

I love databases and, rather than trying to compare and rank them, I like to understand their differences. Sometimes, you make a mistake and encounter an error. Let’s take the following example:
create table DEMO (n int);
begin transaction;
insert into DEMO values (0);
select n "after insert" from DEMO;
update DEMO set n=1/n;
select n "after error" from DEMO;
commit;
select n "after commit" from DEMO;

The “begin transaction” is not valid syntax in all databases because transactions may be started implicitly, but the other statements are valid syntax in all the common SQL databases. They all raise an error during the update execution because there’s one row with N=0, and we cannot calculate 1/N as it is a math error. But what about the result of the last select?

If I run this with Oracle, DB2, MS SQL Server, MySQL (links go to examples in db<>fiddle), the row added by the insert is always visible to my session: after the insert, of course, after the update error, and after the commit (then visible to everybody).

The same statements run with PostgreSQL have a different result. You cannot do anything after the error, only roll back the transaction. Even if you “commit”, it will rollback.
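With psql, the sequence typically looks like this (abridged, and the exact message wording may vary between versions):

update DEMO set n=1/n;
ERROR:  division by zero
select n "after error" from DEMO;
ERROR:  current transaction is aborted, commands ignored until end of transaction block
commit;
ROLLBACK
select n "after commit" from DEMO;
 n
---
(0 rows)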

Yes, no rows are remaining there! Same code but different result.

You can have the same behavior as the other databases by defining a savepoint before the statement and rolling back to the savepoint after the error. Here is the db<>fiddle. With PostgreSQL you have to define an explicit savepoint if you want to continue your transaction after the error. Other databases take an implicit savepoint. By the way, I said “statement”, but here is Tanel Poder showing that in Oracle this is actually related not to the statement but to the user call: Oracle State Objects and Reading System State Dumps Hacking Session Video – Tanel Poder’s blog
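For PostgreSQL, that explicit savepoint looks roughly like this (a sketch along the lines of what the db<>fiddle demonstrates):

begin transaction;
insert into DEMO values (0);
savepoint before_update;
update DEMO set n=1/n;                 -- fails with division by zero
rollback to savepoint before_update;   -- undoes only the failed statement
select n "after error" from DEMO;      -- the inserted row is still there
commit;                                -- and it is now visible to everybody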

In Oracle, you can run multiple statements in a user call with a PL/SQL block. With PostgreSQL, you can group multiple statements in one command, but you can also run a PL/pgSQL block. And with both, you can catch errors in the exception block. And then, it is PostgreSQL that now takes an implicit savepoint, as I explained in a previous post: PostgreSQL subtransactions, savepoints, and exception blocks
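A minimal PL/pgSQL sketch of that implicit savepoint behaviour (my example, not taken from the linked post):

do $$
begin
  update DEMO set n=1/n;               -- raises division_by_zero
exception
  when division_by_zero then
    -- the implicit savepoint taken at block entry lets the transaction continue
    raise notice 'error caught, the transaction is still usable';
end
$$;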

This previous post was on Medium (you can read https://www.linkedin.com/pulse/technology-advocacy-why-i-am-still-nomad-blogger-franck-pachot/ where I explain my blog “nomadism”), but as you can see I’m back on the dbi-services blog for my 500th post there.

My last post here was called “COMMIT” (https://blog.dbi-services.com/commit/), where I explained that I was quitting consulting for CERN to start something new. But even if I decided to change, I was really happy at dbi-services (as I mentioned in a LinkedIn post about great places to work). And when people like to work together, it creates an implicit SAVEPOINT where you can come back if you encounter some unexpected results. Yes… this far-fetched analogy is just to mention that I’m happy to come back to dbi services and that this is where I’ll blog again.

As with many analogies, it reaches the limits of the comparison very quickly. You do not ROLLBACK a COMMIT, and it is not a real rollback because this year at CERN was a good experience. I’ve met great people there, learned interesting things about matter and anti-matter, and went out of my comfort zone, like co-organizing a PostgreSQL meetup and inviting external people (https://www.linkedin.com/pulse/working-consultants-only-externalization-franck-pachot/) for visits and conferences.

This “rollback” is actually a step further, but back in the context I like: solving customer problems in a company that cares about its employees and customers. And I’m not exactly coming back to the same “savepoint”. I was mostly focused on Oracle and I’m now covering more technologies in the database ecosystem. Of course, consulting on Oracle Database will still be a major activity. But today, many other databases are rising: NoSQL, NewSQL… Open Source is more and more relevant. And in this jungle, replication and federation technologies are rising. I’ll continue to share on these areas and you can follow this blog, the RSS feed, and/or my twitter account.

The article ROLLBACK TO SAVEPOINT; appeared first on Blog dbi services.

EDB PEM – Monitor your Postgres cluster

Tue, 2020-02-04 01:00

In my last post, I explained how to set up an HA Postgres cluster using EDB Postgres Advanced Server and Failover Manager. As a next step, we want to install the EDB Postgres Enterprise Manager to monitor what we set up before.

Introduction

There are, of course, many good tools to monitor your Postgres cluster, but in case you run your Postgres cluster using EDB tools, you should really think about using EDB Postgres Enterprise Manager (PEM). It allows you to monitor EDB Postgres Advanced Server clusters as well as open source Postgres clusters. But PEM is not only a nice GUI: it alerts you, and you can tune your clusters from one single point.

Installation

Before you start with the Installation of PEM, you need a PostgreSQL cluster on a host.
Once you have it, you can go on with the installation of edb-pem-server.

$ sudo yum install wxBase mod_wsgi mod_ssl edb-pem-server -y

Afterwards you need to set a password for the enterprisedb user in your cluster

$ psql
postgres=# alter user enterprisedb with password '******';
ALTER ROLE
postgres=# \q

That’s it. So let’s go on with the configuration.

Configuration

EDB delivers the PEM Server with a nice script to configure the server. This makes it really easy!

$ sudo /usr/edb/pem/bin/configure-pem-server.sh
------------------------------------------------------
 EDB Postgres Enterprise Manager
 -----------------------------------------------------
Install type: 1:Web Services and Database, 2:Web Services 3: Database [ ] :1
Enter local database server installation path (i.e. /usr/edb/as10 , or /usr/pgsql-10, etc.) [ ] :/usr/edb/as11
Enter database super user name [ ] :enterprisedb
Enter database server port number [ ] :5444
Enter database super user password [ ] :
Please enter CIDR formatted network address range that agents will connect to the server from, to be added to the server's pg_hba.conf file. For example, 192.168.1.0/24 [ 0.0.0.0/0 ] :192.168.22.53/32
Enter database systemd unit file or init script name (i.e. edb-as-10 or postgresql-10, etc.) [ ] :edb-as-11
Please specify agent certificate path (Script will attempt to create this directory, if it does not exists) [ ~/.pem/ ] :
CREATE EXTENSION
[Info] Configuring database server.
CREATE DATABASE
..
..
[Info] Configured the webservice for EDB Postgres Enterprise Manager (PEM) Server on port '8443'.
[Info] PEM server can be accessed at https://192.168.22.53:8443/pem at your browser

Now you can try to access PEM with your web browser.

Agent installation and configuration

Once the PEM server is set up, you need to install the edb-pem-agent on all hosts that you want to monitor.

$ sudo yum install edb-pem-agent -y

Be sure to have the correct pg_hba.conf entries on the agent and PEM nodes.
As soon as the installation is finished, you can register the agent with the PEM server. For this you need the IP, the port and the user for PEM.

$ sudo bash
$ export PEM_MONITORED_SERVER_PASSWORD=edb
$ export PEM_SERVER_PASSWORD=edb
$ /usr/edb/pem/agent/bin/pemworker --register-agent --pem-server 192.168.22.53 --pem-port 5400 --pem-user enterprisedb --allow_server_restart true --allow-batch-probes true --batch-script-user enterprisedb 

As a last step, you need to add the configuration to agent.cfg and enable/start the pemagent service.

$ echo "allow_streaming_replication=true" >> /usr/edb/pem/agent/etc/agent.cfg
$ echo "ca_file=/usr/libexec/libcurl-pem/share/certs/ca-bundle.crt" >> /usr/edb/pem/agent/etc/agent.cfg
$ systemctl enable pemagent
$ systemctl start pemagent

Now you can have a look at the PEM Dashboard and you will see the registered agents.

As a next step you can add the PostgreSQL clusters to PEM; as I already explained that here, I won’t go into it again in this post.

Conclusion

Now you can enjoy the full monitoring experience using PEM, as well as all the nice metrics put into graphs.

The article EDB PEM – Monitor your Postgres cluster appeared first on Blog dbi services.

Setup a two node Postgres HA Cluster using EDB Advanced Server and EFM

Mon, 2020-02-03 01:00

Some time ago I had a customer looking for a two-node PostgreSQL Master/Replica solution. As we need Oracle compatibility in a later step, we decided to go with the EnterpriseDB tools. This article should give you an introduction on how to set up the environment.

Prerequisites

There are just a few things that you need to prepare.
You need (at least) three servers with:

  • EPEL repository available
  • Subscription for EDB
  • EDB repository available

To make everything work with our DMK, some folders and links are needed:

mkdir -p /u01/app/postgres/product/as11
mkdir -p /u01as11
mkdir -p /usr/edb
mkdir -p /u02/pgdata/11/PG1
ln -s /u02/pgdata/11/PG1/ /u01as11/data
ln -s /u01/app/postgres/product/as11/ /usr/edb/as11
yum install -y unzip xorg-x11-xauth screen
EDB Advanced Server Installation

Let’s start with the installation of the EDB Advanced Server. This is really straightforward:

$ yum install edb-as11-server
$ chown enterprisedb:enterprisedb /u02/pgdata/11/epg1/
$ chown -R enterprisedb:enterprisedb /u01/app/
$ rm -rf /u01as11/backups/
$ passwd enterprisedb

Now you can install and configure our DMK. Make sure to adjust var::PGUSER::=::nooption::”enterprisedb” in the dmk.conf.

As soon as the installation is done, you can initialize a new primary cluster.

enterprisedb@edb1:/var/lib/edb/ [PG1] cat /etc/pgtab
PG1:/u01/app/postgres/product/as11/:/u02/pgdata/11/PG1/:5444:Y
enterprisedb@edb1:/var/lib/edb/ [PG1] dmk
enterprisedb@edb1:/var/lib/edb/ [pg950] PG1

********* dbi services Ltd. ****************

STATUS           : CLOSED

********************************************
enterprisedb@ad1:/var/lib/edb/ [PG1] sudo mkdir -p /u02/pgdata/PG1
enterprisedb@ad1:/var/lib/edb/ [PG1] sudo chown enterprisedb:enterprisedb /u02/pgdata/PG1
enterprisedb@ad1:/var/lib/edb/ [PG1] /u01/app/postgres/product/as11/bin/initdb --pgdata=/u02/pgdata/PG1/ --pwprompt --data-checksums --auth=md5
The files belonging to this database system will be owned by user "enterprisedb".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are enabled.

Enter new superuser password:
Enter it again:

fixing permissions on existing directory /u02/pgdata/PG1 ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default timezone ... Europe/Berlin
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
creating edb sys ... ok
loading edb contrib modules ...
edb_redwood_bytea.sql ok
edb_redwood_date.sql ok
dbms_alert_public.sql ok
dbms_alert.plb ok
dbms_job_public.sql ok
dbms_job.plb ok
dbms_lob_public.sql ok
dbms_lob.plb ok
dbms_output_public.sql ok
dbms_output.plb ok
dbms_pipe_public.sql ok
dbms_pipe.plb ok
dbms_rls_public.sql ok
dbms_rls.plb ok
dbms_sql_public.sql ok
dbms_sql.plb ok
dbms_utility_public.sql ok
dbms_utility.plb ok
dbms_aqadm_public.sql ok
dbms_aqadm.plb ok
dbms_aq_public.sql ok
dbms_aq.plb ok
dbms_profiler_public.sql ok
dbms_profiler.plb ok
dbms_random_public.sql ok
dbms_random.plb ok
dbms_redact_public.sql ok
dbms_redact.plb ok
dbms_lock_public.sql ok
dbms_lock.plb ok
dbms_scheduler_public.sql ok
dbms_scheduler.plb ok
dbms_crypto_public.sql ok
dbms_crypto.plb ok
dbms_mview_public.sql ok
dbms_mview.plb ok
dbms_session_public.sql ok
dbms_session.plb ok
edb_bulkload.sql ok
edb_gen.sql ok
edb_objects.sql ok
edb_redwood_casts.sql ok
edb_redwood_strings.sql ok
edb_redwood_views.sql ok
utl_encode_public.sql ok
utl_encode.plb ok
utl_http_public.sql ok
utl_http.plb ok
utl_file.plb ok
utl_tcp_public.sql ok
utl_tcp.plb ok
utl_smtp_public.sql ok
utl_smtp.plb ok
utl_mail_public.sql ok
utl_mail.plb ok
utl_url_public.sql ok
utl_url.plb ok
utl_raw_public.sql ok
utl_raw.plb ok
commoncriteria.sql ok
waitstates.sql ok
installing extension edb_dblink_libpq ... ok
installing extension edb_dblink_oci ... ok
installing extension pldbgapi ... ok
snap_tables.sql ok
snap_functions.sql ok
dblink_ora.sql ok
sys_stats.sql ok
finalizing initial databases ... ok
syncing data to disk ... ok

Success. You can now start the database server using:

    /u01/app/postgres/product/as11/bin/pg_ctl -D /u02/pgdata/PG1/ -l logfile start

enterprisedb@ad1:/var/lib/edb/ [PG1]

Then adjust wal_keep_segments, and afterwards the edb-as service can be enabled and started.

$ echo "wal_keep_segments=100" >> $PGDATA/postgresql.auto.conf
$ sudo systemctl enable edb-as-11.service
$ sudo systemctl start edb-as-11

To be sure everything works as expected, reboot the server (if possible).

All of the above steps should also be done on your additional nodes, but without the systemctl start.

Configuration

First, on Node 1 (Master) you need to create the replication role.

postgres=# create role replication with REPLICATioN PASSWORD 'replication' login;
CREATE ROLE

Second, you need to add replication to pg_hba.conf.

local   replication    all             127.0.0.1/32            trust
host    replication    all             192.168.22.53/32        trust
host    replication    all             192.168.22.51/32        trust
host    replication    all             192.168.22.52/32        trust
host    replication    all             ::1/128                 trust

And last but not least, you should exchange the ssh keys of all nodes:

enterprisedb@edb1:/u01 [PG1] ssh-keygen
enterprisedb@edb1:/u01 [PG1] ssh-copy-id enterprisedb@edb1
enterprisedb@edb1:/u01 [PG1] ssh-copy-id enterprisedb@edb2
enterprisedb@edb1:/u01 [PG1] ssh-copy-id enterprisedb@edb3
Create the replica

As already mentioned, you need almost all of these steps done on Node 2 as well, but without starting the service.
Make sure to have all hosts in the pg_hba.conf of the Master.

To create the replica, run a pg_basebackup onto Node 2:

enterprisedb@edb2:/u01 [PG1] pg_basebackup -h 192.168.22.51 -U replication -p 5432 -D $PGDATA -Fp -Xs -P -R
49414/49414 kB (100%), 1/1 tablespace

Once finished, check that the recovery.conf is available and add the following lines:

enterprisedb@edb2:/u01 [PG1] echo "recovery_target_timeline = 'latest'" >> /u02/pgdata/11/PG1/recovery.conf
enterprisedb@edb2:/u01 [PG1] echo "trigger_file='/u02/pgdata/11/PG1/trigger_failover'" >> /u02/pgdata/11/PG1/recovery.conf

To test if the recovery is working, start the cluster and check the recovery status.

enterprisedb@edb2:/u01 [PG1] pgstart
enterprisedb@edb2:/u01 [PG1] psql -U enterprisedb -c "select pg_is_in_recovery()" postgres
 pg_is_in_recovery
-------------------
 t
(1 row)

enterprisedb@edb2:/u01 [PG1] sudo systemctl enable edb-as-11.service
enterprisedb@edb2:/u01 [PG1] pgstop
enterprisedb@edb2:/u01 [PG1] systemctl start edb-as-11
EDB Postgres Failover Manager (EFM)

To make our two-node setup highly available, we need to install the EDB Postgres Failover Manager on three nodes: on both nodes running the Master/Replica and on a third one acting as a witness server.

Installation

The installation of EFM is straightforward as well; you have to do the following steps on all three nodes. To use EFM together with our DMK, you need to create some links.

$ sudo yum install edb-efm37
$ sudo yum install java-1.8.0-openjdk
$ sudo chown -R enterprisedb:enterprisedb /etc/edb/efm-3.7/
$ cat /etc/edb/efm-3.7/efm.nodes
$ sudo ln -s /usr/edb/efm-3.7 /usr/edb/efm
$ sudo ln -s /etc/edb/efm-3.7 /etc/edb/efm
Configuration

On the master you have to set a password for the enterprisedb user and encrypt the password using EFM.

$ psql -U enterprisedb -c "alter user enterprisedb with password '*******'" postgres
$ /usr/edb/efm/bin/efm encrypt efm

The encrypted password generated by efm encrypt will be needed in the efm.properties files.

As a next step, we need an efm.properties file on Nodes 1 and 2 (the parameters to adjust are listed below).

$ cp /u01/app/postgres/local/dmk/templates/postgres/efm.properties.in /etc/edb/efm-3.7/efm.properties
$ vi /etc/edb/efm-3.7/efm.properties
  db.user=enterprisedb
  db.password.encrypted=f17db6033ef1be48ec1955d38b4c9c46
  db.port=5400
  db.database=postgres
  db.bin=/u01/app/postgres/product/as11/bin
  db.recovery.dir=/u02/pgdata/11/EPAS
  bind.address=192.168.22.51:7800
  admin.port=7809
  is.witness=false 
  virtualIp=192.168.22.55
  virtualIp.interface=enp0s8
  virtualIp.prefix=24
  virtualIp.single=true
$ sudo chown enterprisedb:enterprisedb /etc/edb/efm/efm.properties

We also need an efm.nodes file listing all nodes of the cluster.

$ cat /etc/edb/efm/efm.nodes
# List of node address:port combinations separated by whitespace.
# The list should include at least the membership coordinator's address.
192.168.22.51:7800 192.168.22.52:7800 192.168.22.53:7800
$ chown efm:efm efm.nodes
$ chmod 660 /etc/edb/efm/efm.nodes

To conclude, enable and start the efm-3.7.service.

sudo systemctl enable efm-3.7.service
sudo systemctl start efm-3.7.service

On node 3 we need to create an efm.properties file as well, but using the efm.properties_witness file of DMK as a draft.

$ cp /u01/app/postgres/local/dmk/templates/postgres/efm.properties_witness /etc/edb/efm-3.7/efm.properties

Adjust the parameters as shown in the step for nodes 1 and 2, but be careful to have:

is.witness=true

Afterwards start the efm-3.7 service on node 3 as well.

$ sudo systemctl start efm-3.7.service

In the end, you can check if everything is running as expected using EFM.

$ efm cluster-status efm
Cluster Status: efm

        Agent Type  Address              Agent  DB       VIP
        -----------------------------------------------------------------------
        Standby     192.168.22.51        UP     UP       192.168.22.55
        Master      192.168.22.52        UP     UP       192.168.22.55*
        Witness     192.168.22.53        UP     N/A      192.168.22.55

Allowed node host list:
        192.168.22.52 192.168.22.51 192.168.22.53

Membership coordinator: 192.168.22.52

Standby priority host list:
        192.168.22.51

Promote Status:

        DB Type     Address              WAL Received LSN   WAL Replayed LSN   Info
        ---------------------------------------------------------------------------
        Master      192.168.22.52                           0/110007B8
        Standby     192.168.22.51        0/110007B8         0/110007B8

        Standby database(s) in sync with master. It is safe to promote.

That’s it, now you have a Master/Replica system using EDB tools.
In a next step we will have a look at the setup of the cluster monitoring using EDB Enterprise Manager.

The article Setup a two node Postgres HA Cluster using EDB Advanced Server and EFM appeared first on Blog dbi services.

Tracking Logs Inside a Documentum Container (part II)

Wed, 2020-01-29 18:45
Testing the log watcher

This is part II of the article. Part I is here.
All the above code has to be included in the entrypoint script so it gets executed at container start-up time, but it can also be tested more simply in a traditional repository installation.
First, we’ll move the code into an executable script, e.g. entrypoint.sh, and run it in the background in a first terminal. Soon, we will notice that lots of log messages get displayed, e.g. from jobs executing in the repository:

---------------------------------------
Job Arguments:
(StandardJobArgs: docbase_name: dmtest.dmtest user_name: dmadmin job_id: 0800c35080007042 method_trace_level: 0 )
window_interval=120
queueperson=null
max_job_threads=3
 
---------------------------------------
2020-01-02T07:03:15.807617 1600[1600] 0100c3508000ab80 [DM_SESSION_I_ASSUME_USER]info: "Session 0100c3508000ab80 is owned by user dmadmin now."
Executing query to collect XCP automatic tasks to be processed:
select r_object_id from dmi_workitem where a_wq_name = 'to be processed by job' and r_auto_method_id != '0000000000000000' and (r_runtime_state = 0 or r_runtime_state = 1) order by r_creation_date asc
Report End 2020/01/02 07:03:15
2020-01-02T07:03:16.314586 5888[5888] 0100c3508000ab90 [DM_SESSION_I_SESSION_QUIT]info: "Session 0100c3508000ab90 quit."
Thu Jan 02 07:04:44 2020 [INFORMATION] [LAUNCHER 6311] Detected during program initialization: Version: 16.4.0080.0129 Linux64
2020-01-02T07:04:44.736720 6341[6341] 0100c3508000ab91 [DM_SESSION_I_SESSION_START]info: "Session 0100c3508000ab91 started for user dmadmin."
Thu Jan 02 07:04:45 2020 [INFORMATION] [LAUNCHER 6311] Detected while preparing job dm_Initialize_WQ for execution: Agent Exec connected to server dmtest: [DM_SESSION_I_SESSION_START]info: "Session 0100c3508000ab91 started for user dmadmin."
 
 
2020-01-02T07:04:45.686337 1600[1600] 0100c3508000ab80 [DM_SESSION_I_ASSUME_USER]info: "Session 0100c3508000ab80 is owned by user dmadmin now."
2020-01-02T07:04:45.698970 1597[1597] 0100c3508000ab7f [DM_SESSION_I_ASSUME_USER]info: "Session 0100c3508000ab7f is owned by user dmadmin now."
Initialize_WQ Report For DocBase dmtest.dmtest As Of 2020/01/02 07:04:45
 
---------------------------------------
Job Arguments:
(StandardJobArgs: docbase_name: dmtest.dmtest user_name: dmadmin job_id: 0800c3508000218b method_trace_level: 0 )
window_interval=120
queueperson=null
max_job_threads=3
 
---------------------------------------
Starting WQInitialisation job:
2020-01-02T07:04:45.756339 1600[1600] 0100c3508000ab80 [DM_SESSION_I_ASSUME_USER]info: "Session 0100c3508000ab80 is owned by user dmadmin now."
Executing query to collect Unassigned worktiems to be processed:
select r_object_id, r_act_def_id from dmi_workitem where a_held_by = 'dm_system' and r_runtime_state = 0 order by r_creation_date
Total no. of workqueue tasks initialized 0
Report End 2020/01/02 07:04:45
2020-01-02T07:04:46.222728 6341[6341] 0100c3508000ab91 [DM_SESSION_I_SESSION_QUIT]info: "Session 0100c3508000ab91 quit."
Thu Jan 02 07:05:14 2020 [INFORMATION] [LAUNCHER 6522] Detected during program initialization: Version: 16.4.0080.0129 Linux64
2020-01-02T07:05:14.828073 6552[6552] 0100c3508000ab92 [DM_SESSION_I_SESSION_START]info: "Session 0100c3508000ab92 started for user dmadmin."
Thu Jan 02 07:05:15 2020 [INFORMATION] [LAUNCHER 6522] Detected while preparing job dm_bpm_XCPAutoTaskMgmt for execution: Agent Exec connected to server dmtest: [DM_SESSION_I_SESSION_START]info: "Session 0100c3508000ab92 started for user dmadmin."
 
 
2020-01-02T07:05:15.714803 1600[1600] 0100c3508000ab80 [DM_SESSION_I_ASSUME_USER]info: "Session 0100c3508000ab80 is owned by user dmadmin now."
2020-01-02T07:05:15.726601 1597[1597] 0100c3508000ab7f [DM_SESSION_I_ASSUME_USER]info: "Session 0100c3508000ab7f is owned by user dmadmin now."
bpm_XCPAutoTaskMgmt Report For DocBase dmtest.dmtest As Of 2020/01/02 07:05:15
 
---------------------------------------
 

Then, from a second terminal, we’ll start and stop several idql sessions and observe the resulting output. We will notice the familiar lines *_START and *_QUIT from the session’s logs:

---------------------------------------
2020-01-02T07:09:16.076945 1600[1600] 0100c3508000ab80 [DM_SESSION_I_ASSUME_USER]info: "Session 0100c3508000ab80 is owned by user dmadmin now."
Executing query to collect XCP automatic tasks to be processed:
select r_object_id from dmi_workitem where a_wq_name = 'to be processed by job' and r_auto_method_id != '0000000000000000' and (r_runtime_state = 0 or r_runtime_state = 1) order by r_creation_date asc
Report End 2020/01/02 07:09:16
2020-01-02T07:09:16.584776 7907[7907] 0100c3508000ab97 [DM_SESSION_I_SESSION_QUIT]info: "Session 0100c3508000ab97 quit."
2020-01-02T07:09:44.969770 8080[8080] 0100c3508000ab98 [DM_SESSION_I_SESSION_START]info: "Session 0100c3508000ab98 started for user dmadmin."
2020-01-02T07:09:47.329406 8080[8080] 0100c3508000ab98 [DM_SESSION_I_SESSION_QUIT]info: "Session 0100c3508000ab98 quit."
...

So, inotifywatch is pretty effective as a file watcher.

Let’s see how many tail processes are currently running:

$ psg tail | grep -v gawk
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
1 4818 24411 9328 pts/1 8723 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/agentexec/agentexec.log
1 4846 24411 9328 pts/1 8723 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab80
1 4850 24411 9328 pts/1 8723 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab7f
1 8375 24411 9328 pts/1 8723 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/agentexec/job_0800c3508000218b
1 8389 24411 9328 pts/1 8723 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab99
1 8407 24411 9328 pts/1 8723 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/sysadmin/Initialize_WQDoc.txt
1 8599 24411 9328 pts/1 8723 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/agentexec/job_0800c35080000386
1 8614 24411 9328 pts/1 8723 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab9a
1 8657 24411 9328 pts/1 8723 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab9b
1 8673 24411 9328 pts/1 8723 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/sysadmin/DataDictionaryPublisherDoc.txt

And after a while:

$ psg tail | grep -v gawk
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
1 4818 24411 9328 pts/1 9132 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/agentexec/agentexec.log
1 4846 24411 9328 pts/1 9132 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab80
1 4850 24411 9328 pts/1 9132 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab7f
1 8599 24411 9328 pts/1 9132 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/agentexec/job_0800c35080000386
1 8614 24411 9328 pts/1 9132 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab9a
1 8657 24411 9328 pts/1 9132 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab9b
1 8673 24411 9328 pts/1 9132 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/sysadmin/DataDictionaryPublisherDoc.txt
1 8824 24411 9328 pts/1 9132 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/agentexec/job_0800c35080007042
1 8834 24411 9328 pts/1 9132 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab9c
1 8852 24411 9328 pts/1 9132 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/sysadmin/bpm_XCPAutoTaskMgmtDoc.txt

Again:

$ psg tail | grep -v gawk
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
1 10058 24411 9328 pts/1 10252 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/agentexec/agentexec.log
1 10078 24411 9328 pts/1 10252 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/agentexec/job_0800c3508000218b
1 10111 24411 9328 pts/1 10252 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab9f
1 10131 24411 9328 pts/1 10252 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab80
1 10135 24411 9328 pts/1 10252 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab7f
1 10139 24411 9328 pts/1 10252 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/sysadmin/Initialize_WQDoc.txt

And again:

$ psg tail | grep -v gawk
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
1 10892 24411 9328 pts/1 11022 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/agentexec/agentexec.log
1 10896 24411 9328 pts/1 11022 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/agentexec/job_0800c3508000218b
1 10907 24411 9328 pts/1 11022 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000aba1
1 10921 24411 9328 pts/1 11022 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab80
1 10925 24411 9328 pts/1 11022 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab7f
1 10930 24411 9328 pts/1 11022 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/sysadmin/Initialize_WQDoc.txt

And, after disabling the aggregation of new logs:

$ echo 0 > $tail_on_off
$ psg tail | grep -v gawk
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
1 24676 24277 9328 pts/1 26096 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/agentexec/agentexec.log
1 24710 24277 9328 pts/1 26096 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab80
1 24714 24277 9328 pts/1 26096 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab7f

And eventually:

$ psg tail | grep -v gawk
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
<nothing>

From now on, until the aggregation is turned back on, this list will be empty as no new tail commands will be started.
The number of tail commands grows and shrinks as expected, so the process clean up is working.
The agentexec.log file is often present because jobs are started frequently and this file therefore regularly triggers the MODIFY event. dmadmin- and sysadmin-owned session logs come and go for each started job. DataDictionaryPublisherDoc.txt, bpm_XCPAutoTaskMgmtDoc.txt and Initialize_WQDoc.txt are a few of the followed jobs' logs.
On the test system used here, an out-of-the-box docbase without any running applications, the output is obviously very quiet, with long pauses in between, but it can become extremely dense on a busy system. When investigating problems in such systems, it can be useful to redirect the whole output to a text file and search it at leisure with one's favorite editor. It is also possible to restrict it to as narrow a time window as desired (if docker logs is used) in order to reduce the noise, to exclude files from being watched (see the next paragraph), and even to stop aggregating new logs via the $tail_on_off file so that only the current ones keep being followed as long as they are active (i.e. written to).
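
For example, one hour of aggregated output can be captured into a file for later searching like this:

docker logs --timestamps --since "2020-01-02T07:00:00" --until "2020-01-02T08:00:00" <container> > /tmp/dctm_logs.txt 2>&1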

Dynamically reconfiguring inotifywait via a parameter file

In the preceding examples, the inotifywait command has been passed a fixed, hard-coded subtree name to watch. In certain cases, it could be useful to watch a different, arbitrary sub-directory, or a list of arbitrary files in unrelated sub-directories, with a possible list of excluded files; for even more flexibility, the other inotifywait parameters could also be changed dynamically. In such cases, the running inotifywait command has to be stopped and restarted with the new command-line parameters. One could imagine checking a communication file (like the $tail_on_off file above) inside the entrypoint's main loop, as shown below. Here, the diff is relative to the preceding static-parameters version:

10a11,13
> # file that contains the watcher's parameters;
> export new_inotify_parameters_file=${watcher_workdir}/inotify_parameters_file
> 
15a19,20
>    inotify_params="$1"
>  
19c24
    eval $private_inotify ${inotify_params} $watcher_workdir | gawk -v tail_timeout=$((tail_timeout * 60)) -v excluded_files="$excluded_files" -v bMust_tail=1 -v tail_on_off=$tail_on_off -v heartbeat_file=$heartbeat_file -v env_private_tail=private_tail -v FS="|" -v Apo="'" 'BEGIN {
122a128,142
> process_new_inotify_parameters() {
>    # read in the new parameters from file $new_inotify_parameters_file;
>    # first line of the $new_inotify_parameters_file must contains the non-target parameters, e.g. -m -e create,modify,attrib -r --timefmt "%Y/%m/%d-%H:%M:%S" --format "%T %e %w%f";
>    new_params=$(cat $new_inotify_parameters_file)
>    rm $new_inotify_parameters_file
>  
>    # kill current watcher;
>    pkill -f $private_inotify
> 
>    # kill the current private tail commands too;
>    pkill -f $private_tail
> 
>    # restart inotify with new command-line parameters taken from $new_inotify_parameters_file;
>    follow_logs "$new_params" &
> }
138,139c158,162
< # start the watcher;
< follow_logs &
---
> # default inotifywait parameters;
> # the creation of this file will trigger a restart of the watcher using the new parameters;
> cat - <<-eot > $new_inotify_parameters_file
>    --quiet --monitor --event create,modify,attrib --recursive --timefmt "%Y/%m/%d-%H:%M:%S" --format '%T|%e|%w%f' \${DOCUMENTUM}/dba/log --exclude \$new_inotify_parameters_file
> eot
143a167
>    [[ -f $new_inotify_parameters_file ]] && process_new_inotify_parameters

To apply the patch, save the first listing into a file, e.g. inotify_static_parameters.sh, the above diff output into a file, e.g. static2dynamic.diff, and use the command below to create the script inotify_dynamic_parameters.sh:

patch inotify_static_parameters.sh static2dynamic.diff -o inotify_dynamic_parameters.sh

In order not to trigger events when it is created, the $new_inotify_parameters_file must be excluded from inotifywait’s watched directories.
If, for instance, we later want to exclude jobs' logs in order to focus on more relevant ones, we could use the following inotifywait parameters:

# cat - <<-eot > $new_inotify_parameters_file
   --quiet --monitor --event create,modify,attrib --recursive --timefmt "%Y/%m/%d-%H:%M:%S" --format "%T|%e|%w%f" \${DOCUMENTUM}/dba/log --exclude '${DOCUMENTUM}/dba/log/dmtest/agentexec/job_.*' --exclude $new_inotify_parameters_file
eot

From the outside of a container:

docker exec <container> /bin/bash -c 'cat - <<-eot > $new_inotify_parameters_file
   --quiet --monitor --event create,modify,attrib --recursive --timefmt "%Y/%m/%d-%H:%M:%S" --format "%T|%e|%w%f" \${DOCUMENTUM}/dba/log --exclude "\${DOCUMENTUM}/dba/log/dmtest/agentexec/job_.*" --exclude \$new_inotify_parameters_file
eot
'

where the new option −−exclude specifies a POSIX-compliant regular expression; it is quoted so the shell does not perform its variable expansions.
After at most $pause_duration seconds, this file is detected (line 34), read (line 14), deleted (line 15) and the current watcher gets restarted in the background with the new parameters (line 24).
This approach amounts to polling the parameter file and looks a bit unsophisticated. We already use events to control the tailing through the $tail_on_off file and for the heartbeat, so why not for the parameter file too? The next paragraph will show just that.

Dynamically reconfiguring inotifywait via an event

A better way would be to use the watcher itself to check whether the parameter file has changed! After all, we should put our money where our mouth is, right?
In order to do this, a dedicated watcher is set up to watch the parameter file. For technical reasons (in order to watch a file, that file must already exist; also, once the watched file is erased, inotifywait no longer reacts to any events on it and just sits there idly), instead of watching $new_inotify_parameters_file itself, it watches its parent directory. To avoid scattering files in too many locations, the parameter file is stored along with the technical files in ${watcher_workdir} and, in order not to impact performance, it is excluded from the main watcher (parameter −−exclude \$new_inotify_parameters_file; do not forget to append it, otherwise the main watcher will raise events for nothing; this does not prevent the parameter file from being processed, though).
When a MODIFY event occurs on this file (which includes a CREATE event if the file did not pre-exist), it is processed as shown previously.
Here the diff compared to the preceding polled parameter file version:

143a144,151
> # the parameter file's watcher;
> param_watcher() {
>    IFS="|"
>    inotifywait --quiet --monitor --event create,modify,attrib --timefmt "%Y/%m/%d-%H:%M:%S" --format '%T|%e|%w%f' $watcher_workdir --exclude \$tail_on_off \$heartbeat_file | while read timestamp event target_file; do
>       [[ $target_file == $new_inotify_parameters_file ]] && process_new_inotify_parameters
>    done
> }
> 
162a171,173
> 
> # starts the parameter file's watcher;
> param_watcher &

The delta is quite short: just the function param_watcher() to start the watcher on the parameter file and to process its events by invoking the same function process_new_inotify_parameters() introduced in the polling version.
To apply the patch, save the above diff output into a file, e.g. dynamic2event.diff, and use the command below to create the script inotify_dynamic_parameters_event.sh from the polling version’s script inotify_dynamic_parameters.sh created above:

patch inotify_dynamic_parameters.sh dynamic2event.diff -o inotify_dynamic_parameters_event.sh

The dedicated watcher runs in the background and restarts the main watcher whenever the latter receives new parameters. Quite simple.

Dynamically reconfiguring inotifywait via signals

Yet another way to interact with inotifywait's script (e.g. the container's entrypoint) could be through signals. To this effect, we first need to choose suitable signals, say SIGUSR[12], define a trap handler and implement it. And while we are at it, let's also add switching the watcher on and off, as illustrated below:

2a3,5
> trap 'bounce_watcher' SIGUSR1
> trap 'toggle_watcher' SIGUSR2
> 
7a11,14
> # new global variable to hold the current tail on/off status;
> # used to toggle the watcher;
> export bMust_tail=1
> 
144,149c151,162
< # the parameter file's watcher;
< param_watcher() {
<    IFS="|"
<    inotifywait --quiet --monitor --event create,modify,attrib --timefmt "%Y/%m/%d-%H:%M:%S" --format '%T|%e|%w%f' $watcher_workdir --exclude \$tail_on_off \$heartbeat_file | while read timestamp event target_file; do
<       [[ $target_file == $new_inotify_parameters_file ]] && process_new_inotify_parameters
<    done
---
> # restart the watcher with new parameters;
> bounce_watcher() {
>    [[ -f $new_inotify_parameters_file ]] && process_new_inotify_parameters
>    echo "parameters reloaded"
> }
> 
> # flip the watcher on/off;
> # as file $tail_on_off is watched, modifying it will trigger the processing of the boolean bMust_tail in the gawk script;
> toggle_watcher() {
>    (( bMust_tail = (bMust_tail + 1) % 2 ))
>    echo $bMust_tail > $tail_on_off
>    echo "toggled the watcher, it is $bMust_tail now"
172,173c185,187
< # starts the parameter file's watcher;
< param_watcher &
---
> process_new_inotify_parameters
> 
> echo "process $$ started"
178d191
<    [[ -f $new_inotify_parameters_file ]] && process_new_inotify_parameters

To apply the patch, save the above diff output into a file, e.g. event2signals.diff, and use the command below to create the script inotify_dynamic_parameters_signals.sh from the event version’s script inotify_dynamic_parameters_event.sh created above:

patch inotify_dynamic_parameters_event.sh event2signals.diff -o inotify_dynamic_parameters_signals.sh

The loop is simpler now as the functions are directly invoked by outside signals.
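
To make this more concrete, here is a minimal standalone sketch of what the signal handling boils down to once the patch is applied; it assumes the function process_new_inotify_parameters() and the files $new_inotify_parameters_file and $tail_on_off from the previous listings:

# global variable holding the current tail on/off status;
export bMust_tail=1

# restart the watcher with new parameters, if a parameter file was provided;
bounce_watcher() {
   [[ -f $new_inotify_parameters_file ]] && process_new_inotify_parameters
   echo "parameters reloaded"
}

# flip the watcher on/off; as $tail_on_off is watched, writing into it triggers the gawk script;
toggle_watcher() {
   (( bMust_tail = (bMust_tail + 1) % 2 ))
   echo $bMust_tail > $tail_on_off
   echo "toggled the watcher, it is $bMust_tail now"
}

trap 'bounce_watcher' SIGUSR1
trap 'toggle_watcher' SIGUSR2
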
To use it, just send the SIGUSR[12] signals to the container, as shown below:

# write a new configuration file:
...
# and restart the watcher:
$ docker kill --signal SIGUSR1 <container>
 
# toggle the watcher;
$ docker kill --signal SIGUSR2 <container>

If testing outside a container, the commands would be:

/bin/kill --signal SIGUSR1 pid
# or:
/bin/kill --signal SIGUSR2 pid

where pid is the pid displayed when the script starts.
We won’t indulge more on signals, see How to stop Documentum processes in a docker container, and more (part I) for many more examples of using signals to talk to containers.

Conclusion

inotifywait is simple to configure and extremely fast, practically instantaneous, at sending notifications. Although the aggregated output looks a bit confusing when the system is loaded, there are several ways to reduce its volume and make it easier to use. It is an interesting addition to an administrator's toolbox for troubleshooting repositories and their client applications. With suitable parameters, it can be used in any containerized or traditional installation. It is one of those no-frills tools that just work (mostly) as expected. If you ever happen to need a file watcher, consider it; you won't be disappointed and it is fun to use.

Cet article Tracking Logs Inside a Documentum Container (part II) est apparu en premier sur Blog dbi services.

Tracking Logs Inside a Documentum Container (part I)

Wed, 2020-01-29 18:44

Containers running under docker can have their stdout observed from the outside through the “docker logs” command; here is an excerpt of its usage:

docker logs --help
Usage:	docker logs [OPTIONS] CONTAINER

Fetch the logs of a container

Options:
      --details        Show extra details provided to logs
  -f, --follow         Follow log output
      --since string   Show logs since timestamp (e.g. 2013-01-02T13:23:37) or relative (e.g. 42m
                       for 42 minutes)
      --tail string    Number of lines to show from the end of the logs (default "all")
  -t, --timestamps     Show timestamps
      --until string   Show logs before a timestamp (e.g. 2013-01-02T13:23:37) or relative (e.g.
                       42m for 42 minutes)

Example of output:

docker logs --follow --timestamps container05bis
...
2019-07-10T03:50:38.624862914Z ==> /app/dctm/dba/log/docbroker.container05bis.1489.log <==
2019-07-10T03:50:38.624888183Z OpenText Documentum Connection Broker (version 16.4.0000.0248 Linux64)
2019-07-10T03:50:38.624893936Z Copyright (c) 2018. OpenText Corporation
2019-07-10T03:50:38.624898034Z 2019-07-10T05:50:38.519721 [DM_DOCBROKER_I_START]info: "Docbroker has started. Process id: 35"
2019-07-10T03:50:38.624902047Z 2019-07-10T05:50:38.521502 [DM_DOCBROKER_I_REGISTERED_PORT]info: "The Docbroker registered using port (1489)."
2019-07-10T03:50:38.624906087Z 2019-07-10T05:50:38.521544 [DM_DOCBROKER_I_LISTENING]info: "The Docbroker is listening on network address: (INET_ADDR: family: 2, port: 1489, host: container05bis (192.168.33.7, 0721a8c0))"
2019-07-10T03:50:38.624911984Z
2019-07-10T03:50:38.624915369Z ==> /app/dctm/dba/log/dmtest05bis.log <==
2019-07-10T03:50:38.625040316Z
2019-07-10T03:50:38.625050474Z ==> /app/dctm/dba/log/dmtest05bis/agentexec/agentexec.log <==
2019-07-10T03:50:38.625055299Z Wed Jul 10 03:33:48 2019 [INFORMATION] [LAUNCHER 4251] Detected during program initialization: Version: 16.4.0000.0248 Linux64
2019-07-10T03:50:38.625059065Z Wed Jul 10 03:34:18 2019 [INFORMATION] [LAUNCHER 4442] Detected during program initialization: Version: 16.4.0000.0248 Linux64
2019-07-10T03:50:38.625071866Z Wed Jul 10 03:34:48 2019 [INFORMATION] [LAUNCHER 4504] Detected during program initialization: Version: 16.4.0000.0248 Linux64
2019-07-10T03:50:38.625075268Z Wed Jul 10 03:36:18 2019 [INFORMATION] [LAUNCHER 4891] Detected during program initialization: Version: 16.4.0000.0248 Linux64
2019-07-10T03:50:38.625078688Z Wed Jul 10 03:36:49 2019 [INFORMATION] [LAUNCHER 4971] Detected during program initialization: Version: 16.4.0000.0248 Linux64
2019-07-10T03:50:38.625082182Z Wed Jul 10 03:48:18 2019 [INFORMATION] [LAUNCHER 6916] Detected during program initialization: Version: 16.4.0000.0248 Linux64
2019-07-10T03:50:38.625096886Z
2019-07-10T03:50:38.625101275Z ==> /app/dctm/wildfly9.0.1/server/DctmServer_MethodServer/logs/ServerApps.log <==
2019-07-10T03:50:38.625105098Z at io.undertow.servlet.core.DeploymentManagerImpl.start(DeploymentManagerImpl.java:511)
2019-07-10T03:50:38.625108575Z at org.wildfly.extension.undertow.deployment.UndertowDeploymentService.startContext(UndertowDeploymentService.java:101)
2019-07-10T03:50:38.625112342Z at org.wildfly.extension.undertow.deployment.UndertowDeploymentService$1.run(UndertowDeploymentService.java:82)
2019-07-10T03:50:38.625116110Z at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
2019-07-10T03:50:38.625120084Z at java.util.concurrent.FutureTask.run(FutureTask.java:266)
2019-07-10T03:50:38.625123672Z at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
2019-07-10T03:50:38.625127341Z at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
2019-07-10T03:50:38.625131122Z at java.lang.Thread.run(Thread.java:748)
2019-07-10T03:50:38.625134828Z at org.jboss.threads.JBossThread.run(JBossThread.java:320)
2019-07-10T03:50:38.625139133Z 05:34:58,050 INFO [ServerService Thread Pool -- 96] com.documentum.cs.otds.DfSessionHandler - DFC Client Successfully initialized
2019-07-10T03:50:38.625143291Z
2019-07-10T03:50:38.625146939Z ==> /app/dctm/wildfly9.0.1/server/DctmServer_MethodServer/logs/ServerApps_trace.log <==
2019-07-10T03:50:38.625150991Z
2019-07-10T03:50:38.625154528Z ==> /app/dctm/wildfly9.0.1/server/DctmServer_MethodServer/logs/dmotdsrest.log <==
2019-07-10T03:50:38.625159563Z 03:33:16,505 [ServerService Thread Pool -- 66] DFC Client Successfully initialized
2019-07-10T03:50:38.625163445Z 05:34:58,050 [ServerService Thread Pool -- 96] DFC Client Successfully initialized
2019-07-10T03:50:38.625955045Z Setting up watches. Beware: since -r was given, this may take a while!
2019-07-10T03:50:38.627044196Z Watches established.
2019-07-10T03:50:38.934648542Z
2019-07-10T03:50:38.934668467Z ==> /app/dctm/dba/log/dmtest05bis.log <==
2019-07-10T03:50:38.934673076Z Wed Jul 10 05:50:38 2019[DM_STARTUP_W_DOCBASE_OWNER_NOT_FOUND] *** warning *** : The database user, dmtest05bisc as specified by your server.ini is not a valid user as determined using the system password check api. This will likely severly impair the operation of your docbase.
2019-07-10T03:50:39.001793811Z
2019-07-10T03:50:39.001816266Z
2019-07-10T03:50:39.001821414Z OpenText Documentum Content Server (version 16.4.0000.0248 Linux64.Oracle)
2019-07-10T03:50:39.001825146Z Copyright (c) 2018. OpenText Corporation
2019-07-10T03:50:39.001828983Z All rights reserved.
2019-07-10T03:50:39.001832816Z
2019-07-10T03:50:39.005318448Z 2019-07-10T05:50:39.005068 193[193] 0000000000000000 [DM_SERVER_I_START_SERVER]info: "Docbase dmtest05bis attempting to open"
2019-07-10T03:50:39.005337158Z
2019-07-10T03:50:39.005342472Z 2019-07-10T05:50:39.005172 193[193] 0000000000000000 [DM_SERVER_I_START_KEY_STORAGE_MODE]info: "Docbase dmtest05bis is using database for cryptographic key storage"
...

If the information is plethoric, the above command can be narrowed to a time window by adding −−since and −−until restrictions, e.g.:

docker logs --timestamps --follow --since 5m <container>

Thus, everything sent to the container’s stdout can be aggregated and viewed very simply from one place, yielding a cheap console log. In particular, Documentum containers could expose their well-known log files to the outside world, e.g. the docbroker log, the content server(s) log(s) and the method server logs. To this effect, it would be enough to just start a “tail -F ” on those files from within the entrypoint as illustrated below:

tail -F ${DOCUMENTUM}/dba/log/docbroker.log ${DOCUMENTUM}/dba/log/dmtest.log ${DOCUMENTUM}/wildfly9.0.1/server/DctmServer_MethodServer/logs/ServerApps.log ...

The -F option guarantees that the logs continue being followed even after a possible rotation.
Admittedly, this output can be a bit hard to read because the logs are interlaced, i.e. lines or block of lines from different logs are displayed sorted by the time they were produced and not by their origin. Actually, this is a benefit because it makes it easier to find correlations between distinct, apparently unrelated events.
Viewing a particular log is still possible from outside the container, e.g.:

docker exec <container_name> /bin/bash -c "tail -f \${DOCUMENTUM}/dba/log/dmtest05bis.log"

provided the tail command exists in the container, which is not a given as there is a definite will to make images as stripped down as possible.
As those files are statically known (i.e. at build time), such a command could be defined as early as in the buildfile and invoked in its entrypoint script.
Unfortunately, the content server logs are not very verbose and the most useful messages are directed to session or ad hoc logs. The session logs are dynamically created for each new session with the session id as their name, which makes them unpredictable. Since those names are only known at run-time, the above buildfile's “tail -F” command cannot include them and consequently they are not displayed by the “docker logs” command. The same applies to on-demand trace files with variable names, e.g. with a timestamp suffix.
So, is there a way to follow those dynamic session logs (or any dynamically named files for that matter) anyway? An obvious way is to use a file or directory watcher to be notified of any created or modified file. If a watcher process running inside the container could wait for such conditions and signal any occurrence thereof, a listener process, also running inside the container, could receive the notification and dynamically fork a “tail -F” command to follow that file. Externally, “docker logs” would keep displaying whatever is sent to the container's stdout, including the content of the newly discovered files.
Under Linux, we can use inotifywait as the file watcher. Let’s see how to set it up.

Installing inotifywait

Under a Debian Linux derivative such as Ubuntu, inotifywait can be easily installed from the inotify-tools package through the usual command:

sudo apt-get install inotify-tools

Under a Red Hat Linux derivative such as Centos, a two-step method is to first grab the rpm package from its on-line repository and then install it; the latest release as of this writing is the 3.14-9:

  curl https://download-ib01.fedoraproject.org/pub/epel/7/x86_64/Packages/i/inotify-tools-3.14-9.el7.x86_64.rpm -o inotify-tools-3.14-9.el7.x86_64.rpm
  sudo rpm -Uvh inotify-tools-3.14-9.el7.x86_64.rpm 

In order to verify that the installation was successful, just try to launch “inotifywait”:

$ inotifywait
No files specified to watch!

Good, the command exists; let’s now test it.

Testing inotifywait

The command inotifywait has the following invocation syntax:

inotifywait [-hcmrq] [-e <event>] [-t <seconds>] [--format <fmt>] [--timefmt <fmt>] <file> [ ... ]

The man page explains very well each of the parameters so, please, refer there for details.
As the whole point is to detect new or changed files whose name we don’t know at image build time, but whose location is known (e.g. a path such as ${DOCUMENTUM}/dba/log), we will be watching directories. In such a case, one parameter will be a list of directories to recursively watch for any new or modified files.
That command was designed to output to stdout any event whose occurrence it was configured to wait for.
The default output format is the following:

watched_filename EVENT_NAMES event_filename

one line per file.
EVENT_NAMES is the event that occurred, in our case mainly one of CREATE or MODIFY, as requested through the -e command-line parameter.
As we watch directories, watched_filename is the name of the watched directory where the event occurred and event_filename is the created/modified file.
Here is an example of use as a test with custom formatting for a better presentation:

$ inotifywait --quiet --monitor --event create,modify,attrib --recursive --timefmt "%Y/%m/%d-%H:%M:%S" --format "%T %e %w%f" /tmp &

The command will recursively (option −−recursive) watch the /tmp directory for any created or modified (option −−event …) file or a change of file attributes such as the timestamp or permissions. It will run in the background (option −−monitor) and issue events to stdout prefixed with a properly formatted time stamp (options −−timefmt and −−format).
With inotifywait running in the background, create now a dummy file in /tmp:

$ touch /tmp/xx
# the following lines get displayed:
2019/12/31-17:43:45 CREATE /tmp/xx
2019/12/31-17:43:45 ATTRIB /tmp/xx

Actually, /tmp is a busy directory and a lot of file activity occurs in there very quickly:

2019/12/31-17:43:52 CREATE /tmp/hsperfdata_dmadmin/471
2019/12/31-17:43:52 MODIFY /tmp/hsperfdata_dmadmin/471
2019/12/31-17:43:52 MODIFY /tmp/hsperfdata_dmadmin/471
2019/12/31-17:44:06 MODIFY /tmp/nohup.log
2019/12/31-17:44:06 MODIFY /tmp/nohup.log
2019/12/31-17:44:06 MODIFY /tmp/nohup.log
2019/12/31-17:45:06 MODIFY /tmp/nohup.log
2019/12/31-17:45:06 MODIFY /tmp/nohup.log
2019/12/31-17:45:06 MODIFY /tmp/nohup.log
2019/12/31-17:45:22 CREATE /tmp/hsperfdata_dmadmin/510
2019/12/31-17:45:22 MODIFY /tmp/hsperfdata_dmadmin/510
2019/12/31-17:45:22 MODIFY /tmp/hsperfdata_dmadmin/510
2019/12/31-17:45:50 CREATE /tmp/runc-process785366568
2019/12/31-17:45:50 MODIFY /tmp/runc-process785366568
...

Let’s see if the directory is really watched recursively:

$ mkdir -p /tmp/dir1/dir2
> 2019/12/31-17:49:51 CREATE,ISDIR /tmp/dir1
> 2019/12/31-17:49:51 CREATE,ISDIR /tmp/dir1/dir2

We notice that the directory creation is trapped too.

$ touch /tmp/dir1/dir2/xx
> 2019/12/31-17:49:59 CREATE /tmp/dir1/dir2/xx
> 2019/12/31-17:49:59 ATTRIB /tmp/dir1/dir2/xx

It works as advertised. Moreover, two events are raised here, CREATE for the file creation as it didn’t previously exist, and ATTRIB for the change of timestamp. Let’s verify this:

$ touch /tmp/dir1/dir2/xx
> 2019/12/31-17:50:01 ATTRIB /tmp/dir1/dir2/xx
$ touch /tmp/dir1/dir2/xx
> 2019/12/31-17:50:03 ATTRIB /tmp/dir1/dir2/xx

Indeed.
Let’s see if a change of attribute is also noticed:

$ chmod -r /tmp/dir1/dir2/xx
2019/12/31-18:03:50 ATTRIB /tmp/dir1/dir2/xx
$ chmod +r /tmp/dir1/dir2/xx
2019/12/31-18:03:55 ATTRIB /tmp/dir1/dir2/xx

It is, fine.

Using inotify in a [containerized] Documentum installation

inotifywait will be used with the syntax shown above against the Documentum log directory ${DOCUMENTUM}/dba/log. Its output will be piped into a gawk script that will spawn “tail -F” commands when needed and keep a list of such processes so they can be killed after a timeout of inactivity, i.e. when they are following a file that did not get updated within a given time interval (let's jokingly name this value “tail time to live”, or TTTL). This is to prevent a potentially unlimited number of such processes from hogging the system's resources; not that they consume many CPU cycles, but it is pointless to leave hundreds of idle processes sitting in memory. So the script will start a tail command on a file upon its CREATE event, or upon a MODIFY event if the file is not already followed, and will clean up existing idle tail commands. As said before, the code below could be included in a container's entrypoint to make it run constantly in the background.

#!/bin/bash

export watcher_workdir=/tmp/watcher
export tail_on_off=${watcher_workdir}/tail_on_off
export heartbeat_file=${watcher_workdir}/heartbeat_file
export pause_duration=3

# comma-separated list of files watched but excluded from tailing;
export excluded_files="$tail_on_off,$heartbeat_file"

# make private copies of inotifywait and tail so they can be easily identified and killed from the list of running processes;
export private_inotify=~/dctm-inotifywait
export private_tail=~/dctm-tail

follow_logs() {
   # tail time to live, maximum duration in minutes an unmodified tailed file will stay tailed before the tail is killed, i.e. TTTL ;-);
   tail_timeout=1
 
   $private_inotify --quiet --monitor --event create,modify,attrib --recursive --timefmt "%Y/%m/%d-%H:%M:%S" --format "%T|%e|%w%f" ${DOCUMENTUM}/dba/log $watcher_workdir | gawk -v tail_timeout=$((tail_timeout * 60)) -v excluded_files="$excluded_files" -v tail_on_off=$tail_on_off -v heartbeat_file=$heartbeat_file -v env_private_tail=private_tail -v FS="|" -v Apo="'" 'BEGIN {
      # get the dedicated tail command from the environment;
      private_tail=ENVIRON[env_private_tail]

      # get current time;
      cmd_now = "date +\"%Y/%m/%d-%H:%M:%S\""
 
      # get the time of the next check, i.e. now + the timeout delay;
      cmd_future = "date +\"%Y/%m/%d-%H:%M:%S\" --date=\"" tail_timeout  " seconds\""
      cmd_future | getline next_check_date; close(cmd_future)
 
      # find the running private tail commands;
      # use the same FS defined in the outer gawk scope to guarantee that $0 can be split smoothly;
      cmd_tail = "pgrep -fa " private_tail " | gawk " Apo "{print $1 \"" FS "\" $NF}" Apo
 
      # get the most recent time a running idle tail command is allowed to live;
      cmd_oldest = "date +\"%Y/%m/%d-%H:%M:%S\" --date=\"" tail_timeout  " seconds ago\""
 
      # command to get the modifiy date of a tailed file;
      cmd_file_stamp = "date +\"%Y/%m/%d-%H:%M:%S\" --reference="
 
      # files can be excluded from tailing if specified in parameter exclude_files;
      # convert the comma-separated list of excluded files to an associative array, which is easier to search as it is indexed by file names;
      nb_files = split(excluded_files, tmp_excluded_files, ",")
      for (;nb_files > 0; nb_files--)
         tab_excluded_files[tmp_excluded_files[nb_files]] = 0

      bMust_tail = 1
   }
   {
      # skip directories (they have a trailing /);
      if (match($3, /\/$/)) next
 
      # check if logs must be tailed;
      if (tail_on_off == $3 && ("CREATE" == $2 || "MODIFY" == $2)) {
         if ((getline bMust_tail < tail_on_off) >  0) {
            close(tail_on_off)
            system("rm " tail_on_off " 2>/dev/null")
         }
      } else 
         # tailing on ?
         if (bMust_tail && !($3 in current_tails))
            # CREATE event ?
            if ("CREATE" == $2 && !($3 in tab_excluded_files)) {
               # newly created file not in exclusion list: tail it !
               system(private_tail " --follow=name --retry " $3 " 2>/dev/null &")
               # ... and keep track of it;
               current_tails[$3] = 0
            }
            # MODIFY event ?
            else if ("MODIFY" == $2 && !($3 in tab_excluded_files)) {
               # modified file not in exclusion list nor already tailed: tail it !
               system(private_tail " --follow=name --retry " $3 " 2>/dev/null &")
               # ... and keep track of it;
               current_tails[$3] = 0
            }
 
      # clean up logic starts here;
      # only upon a heartbeat event;
      if ("ATTRIB" == $2 && $3 == heartbeat_file) {
         # time to check ?
         cmd_now | getline now; close(cmd_now)
         if (now >= next_check_date) {
            # maximum time to live for idle tails;
            cmd_oldest | getline oldest_date; close(cmd_oldest)
            # loop though all the running tail commands;
            while ((cmd_tail | getline) > 0) {
               # cannot explain this spurious entry, ignoring it explicitly even though it would be anyway;
               if ("$NF}" Apo == $2) continue
               tail_pid = $1
               tailed_file = $2
               # is it one of the watched files ?
               if (tailed_file in current_tails) {
                  # get the current tailed file last modification date;
                  (cmd_file_stamp tailed_file " 2>/dev/null") | getline last_modif_date; close(cmd_file_stamp tailed_file " 2>/dev/null")
                  # tailed file not updated during time to live period ?
                  if (last_modif_date <= oldest_date) {
                     # yes, kill the tailing process;
                     system("kill -9 " tail_pid " 2> /dev/null")
                     # ... and update the list of tailed files;
                     delete current_tails[tailed_file]
                  }
                  else current_tails[tailed_file] = 1
               }
               # else it should not happen because private tail commands are only issued from here and get tracked;
            }
            close(cmd_tail)

            # resynchronize internal list with actual tailed files;
            for (f in current_tails)
               if (0 == current_tails[f])
                  # tracked file not tailed any more (this should not happen because private tail commands are killed from here only), untrack it;
                  delete current_tails[f]
               else
                  # tracked file was checked and is still alive;
                  current_tails[f] = 0
 
            # compute the next check time;
            cmd_future | getline next_check_date; close(cmd_future)
         }
      }
   }'
}
 
 
# -----------
# main;
# -----------

# create an inotifywait alias so it is easily identified to be stopped later;
ln -s /usr/bin/inotifywait $private_inotify 2> /dev/null
ln -s /usr/bin/tail $private_tail           2> /dev/null

# create the watcher's workdir;
mkdir -p $watcher_workdir 2> /dev/null

# enable following new or modified logs;
echo 1 > $tail_on_off
 
# start the watcher;
follow_logs &

while [ true ]; do
   # send heartbeat to inotify;
   touch $heartbeat_file
   # do something else here;
   # ...
   sleep $pause_duration
done

Admittedly, the fact that gawk relies a lot on external commands and pipes (because of its lack of libraries, but this can be arranged by extending it, see other articles in this blog such as this one) somewhat obfuscates the statements' purpose. Also, all the logic is contained in the not-so-obvious automatic loop which is executed each time an event is notified. Anyway, as usual, the code is only provided as an illustration.
On line 3, a “technical” sub-directory is defined; it is created later on line 134. It will contain the heartbeat file (file heartbeat_file, see the next paragraph for more details) and a file to switch tailing on and off (file tail_on_off). This directory is watched too (see the call to inotifywait on line 19).
On lines 12 and 13, private versions of the commands inotifywait and tail are defined; they are created later on lines 130 and 131 as symlinks. This makes them easy to identify in the list of running processes in order to kill them if needed.
This script is encapsulated in the function follow_logs() starting on line 15. It is launched in the background on line 140.
On line 46, the boolean bMust_tail is initialized; it gets updated on line 53 upon a CREATE event on the file $tail_on_off; after it has been read into bMust_tail, the file is removed. By writing 0 or a positive number into it, the aggregation of new logs is respectively disabled or enabled:

# disable following new logs;
# from within a container or in a traditional installation:
$ echo 0 > $tail_on_off
 
# or, for a container, supposing $tail_on_off is defined in the container's current user's environment:
$ docker exec container_name /bin/bash -c "echo 0 > \$tail_on_off"
 
# enable following new logs;
# from within a container or in a traditional installation:
$ echo 1 > $tail_on_off
 
# or, for a container, supposing $tail_on_off is defined in the container's current user's environment:
$ docker exec container_name /bin/bash -c "echo 1 > \$tail_on_off"

Currently running tail commands are not impacted.
Note how the $ in $tail_on_off is escaped so it is not consumed by the docker host's shell and is passed as-is to the container.
On lines 63 and 70, private tail commands are started when new files appear in the watched sub-directories, or when modified files are not already tailed.
On line 79, a clean-up of idle (private) tail commands is performed and the internal associative array that keeps track of them is refreshed to make sure it reflects the currently running tails.
On line 142, the well-known container entrypoint's never-ending loop is entered.
The above script can also be run stand-alone in a classic, non containerized installation.
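
For instance, assuming the listing above was saved as follow_dctm_logs.sh (a name picked just for this example), it could be started in the background of a traditional installation like this:

chmod +x follow_dctm_logs.sh
nohup ./follow_dctm_logs.sh > /tmp/follow_dctm_logs.out 2>&1 &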

Keeping the gawk script busy

A problem to solve is that the gawk script runs as a coroutine of inotifywait, i.e. it is synchronized with its output. If there is none, because no events were detected, the script blocks waiting for input, so no cleaning of idle tail commands gets done. This is not such a big deal because no new tail commands are started either, so the status quo is maintained. However, an easy work-around is possible: in order to force the script to enter the next iteration and give it a chance to perform the cleaning, we introduce a heartbeat with one watched file, e.g.:

...
while [ true ]; do
   # send heartbeat to inotify;
   touch $heartbeat_file
   # do something else here;
   # ...
   sleep $pause_duration
done


This explains the presence of the file $heartbeat_file in the list of inotifywait's targets (see line 19 in the code above).
Now, because its timestamp is updated, at least one event is always raised periodically: an ATTRIB on the heartbeat file $heartbeat_file, as shown before. Although ATTRIB events are listened to, generally no action is taken upon them, except when they occur on $heartbeat_file, in which case their sole purpose is to trigger the script's execution, more precisely the clean-up of inactive tail commands.

Let’s test it now.

Cet article Tracking Logs Inside a Documentum Container (part I) est apparu en premier sur Blog dbi services.

SQL Server Tips: Path of the default trace file is null

Tue, 2020-01-28 03:01

In addition to my previous blog on this subject, “SQL Server Tips: Default trace enabled but no file is active…”, here is a new case where the default path of the trace file was empty.

The first step was to verify if the default trace is enabled with the command:

SELECT * FROM sys.configurations WHERE name='default trace enabled'

It is enabled, so I then check the currently running traces with the view sys.traces:

SELECT * FROM sys.traces


As you can see, this time I have a trace, but with NULL in the path column for the trace file…

To correct this issue, the only way is to stop and reactivate the default trace in the configuration:

EXEC sp_configure 'show advanced options',1;
GO
RECONFIGURE WITH OVERRIDE;
GO
EXEC sp_configure 'default trace enabled',0;
GO
RECONFIGURE WITH OVERRIDE;
GO
EXEC sp_configure 'default trace enabled',1;
GO
RECONFIGURE WITH OVERRIDE;
GO
EXEC sp_configure 'show advanced options',0;
GO
RECONFIGURE WITH OVERRIDE;
GO

Et voila, I have a trace file now…
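
To double-check, query sys.traces again; the path column should now point to a .trc file under the instance's Log directory:

SELECT id, path, start_time, is_default FROM sys.traces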

Cet article SQL Server Tips: Path of the default trace file is null est apparu en premier sur Blog dbi services.

PostgreSQL Meetup at CERN, Geneva, January 17, 2020

Thu, 2020-01-23 10:37
Registration

Last Friday (17/01/20) I had the opportunity to go to CERN in Geneva to attend a PostgreSQL meetup.
I'm really happy to write a blog about it, for a lot of reasons: first of all, because it was about PostgreSQL (essential these days), then because of the contributors, such as Oleg Bartunov, and finally because of the organizers: Franck Pachot (former & soon-to-be-again colleague at dbi services) & Laetitia Avrot, whom I met at the SOUG Day in Lausanne.


After the registration process, where a lot of people were waiting to get their badge, I followed the instructions given to reach Building 503. I rapidly got lost in this labyrinth and had to ask several times to find the conference room.

Unfortunately I was late and missed the “Welcome & opening speech” at 2pm. Worse, I also missed the first minutes of Gülçin Yıldırım's session.
I discreetly took a seat close to the door in the conference room. I was immediately impressed as more than 100 people were already in.
It seems there was a huge interest in this event.

So the first talk was about the evolution of fault tolerance in PostgreSQL.
Gülçin mainly focused on the evolution of the fault tolerance capabilities of PostgreSQL throughout its versions.
Robustness, replication methods, failover & switchover were the main topics.

The second talk was from Pavlo Golub, a PostgreSQL expert and developer at Cybertec (software & database partner in Austria).
After some words about the existing scheduling tools in the Postgres community, he introduced a new open source tool from Cybertec, called pg_timetable.
He went very deep into the details & discussed some advanced topics like transaction support and cross-platform tasks.
At the end we had a successful demo.

Break

After this session, we had a break as it was 4pm. A super king's cake party (Galette des Rois), accompanied by excellent ciders, was offered.
We could even choose between the one from Normandy and the one from Brittany.
I tasted both and appreciated them.
By the way, I found the bean twice. I have to admit that this was a really good idea as the atmosphere was relaxed and friendly.

At around 4:45pm, Anastasia Lubennikova started her session, a comparison between various backup/recovery solutions for PostgreSQL.
The goal was to help us choose the most appropriate tool. She covered several advanced topics in that area, such as incremental backups, archive management, backup validation and retention policies.

The next session from Vik Fearing was a short one – Advent of Code Using PostgreSQL
During his talk, Vik showed some techniques to solve general-purpose programming challenges using just a single query with lateral joins.

Romuald Thion, a PostgreSQL teacher at the university of Lyon (France), followed: his talk was about some lessons learned
since he progressively introduced PostgreSQL into his courses in 2018.
Since then, 400 students enrolled from the second to the fifth year have used PostgreSQL in databases, web, security or system administration courses.

Finally, and as an apotheosis for the last session, Oleg Bartunov, a PostgreSQL major contributor & developer, came on stage.
“All You Need Is Postgres” was the title of his session. This could soon become a hymn.
Oleg talked a lot about his long professional Postgres experience since Postgres95, about all the projects he participated in, such as GiST, GIN and SP-GiST, and also full text search and the NoSQL features (hstore and jsonb). He also talked about astronomy and supernovae.
This was really fascinating and we could have listened to him for hours.

Conclusion

A drinks & food party concluded this meetup. It was a nice opportunity to meet all the speakers, to discuss with them and also with some customers or former colleagues (Pierre Boizot).
Learn and share: this is part of the dbi services spirit and matches our values!
One last thing: I would like to address special thanks to Franck Pachot & Laetitia Avrot for the good organization & above all the cool atmosphere during this meetup.
Hoping for the next one…

Cet article PostgreSQL Meetup at CERN, Geneva, January 17, 2020 est apparu en premier sur Blog dbi services.

Adding PostgreSQL extensions without being super user?

Wed, 2020-01-22 09:33

Usually, when you need to install a PostgreSQL extension you do this as superuser (or at least that is how I am doing it). The downside of that is, of course, that a superuser must be available whenever a new extension is required, or that all the extensions are installed by default (e.g. in template1). Recently the question popped up internally whether you can install extensions without being superuser. The answer is: yes and no. It depends on the objects that get created by the extension and on the permissions of the user that wants to install it. As always, let's do some demos as this is the easiest way to understand how things work.

Currently there are no extensions installed in my database except for the standard PL/pgSQLl extension:

postgres=# \dx
                 List of installed extensions
  Name   | Version |   Schema   |         Description          
---------+---------+------------+------------------------------
 plpgsql | 1.0     | pg_catalog | PL/pgSQL procedural language
(1 row)

For the purpose of this post lets create a new user without any special permissions other than login:

postgres=# create user a with login password 'a';
CREATE ROLE
postgres=# \du a
           List of roles
 Role name | Attributes | Member of 
-----------+------------+-----------
 a         |            | {}

This user, even if not granted anything, by default has permissions to create objects in the public schema (you should definitely avoid that, check here):

postgres=# \c postgres a
You are now connected to database "postgres" as user "a".
postgres=> select current_database();
 current_database 
------------------
 postgres
(1 row)
postgres=> create table tab1 ( a int );
CREATE TABLE
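
As a side note, removing that default privilege is a one-liner to be run by a superuser in each database (shown here only for completeness):

revoke create on schema public from public;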

What this user is not able to do is to use “create extension” to install a new extension into the database:

postgres=> create extension lo;
ERROR:  permission denied to create extension "lo"
HINT:  Must be superuser to create this extension.
postgres=> 

Why is that? If we take a look at an extension's SQL file (here, ltree), the first statement is this:

CREATE FUNCTION ltree_in(cstring)
RETURNS ltree
AS 'MODULE_PATHNAME'
LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;

“CREATE FUNCTION” does work as we are able to create objects in the public schema. The issue is this:

postgres=> CREATE FUNCTION ltree_in(cstring)
postgres-> RETURNS ltree
postgres-> AS 'MODULE_PATHNAME'
postgres-> LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
ERROR:  permission denied for language c
postgres=> 

We do not have access to the language. Let's try to grant the required privilege for using the language:

postgres=# grant USAGE on LANGUAGE c to a;
ERROR:  language "c" is not trusted
DETAIL:  GRANT and REVOKE are not allowed on untrusted languages, because only superusers can use untrusted languages.

That does not work either, as C is an untrusted language. The same is true for “language internal”, as in intagg--1.1.sql:

postgres=> CREATE FUNCTION int_agg_state (internal, int4)
postgres-> RETURNS internal
postgres-> AS 'array_agg_transfn'
postgres-> PARALLEL SAFE
postgres-> LANGUAGE INTERNAL;
ERROR:  permission denied for language internal

As almost all the extensions shipped with community PostgreSQL reference either "c" or "internal" somewhere, we have close to no chance to get an extension installed as user "a". Let's do another test and create a new database with user "a" as its owner:

postgres=> \c postgres postgres
You are now connected to database "postgres" as user "postgres".
postgres=# create database dba with owner a;
CREATE DATABASE
postgres=# \c dba a
You are now connected to database "dba" as user "a".
dba=> 

Can we install extensions now?

dba=> create extension lo;
ERROR:  permission denied to create extension "lo"
HINT:  Must be superuser to create this extension.
dba=> 

Another error message but it still does not work. We can, however, install PL/Perl as this is a trusted language:

dba=> create extension plperl;
CREATE EXTENSION
dba=> 

Actually PL/Perl is the only extension that can be installed in this scenario, which can be confirmed by a simple PL/pgSQL code block:

dba=> drop extension plperl;
DROP EXTENSION
dba=> do $$
dba$> declare
dba$>   ext record;
dba$>   exception_text text;
dba$> begin
dba$>   for ext in
dba$>       select name
dba$>         from pg_available_extensions
dba$>        order by name
dba$>   loop
dba$>     begin
dba$>        execute 'create extension '||ext.name;
dba$>        raise notice 'SUCCESS for %', ext.name;
dba$>     exception
dba$>        when others
dba$>          then get stacked diagnostics exception_text = MESSAGE_TEXT;
dba$>          raise notice '% failed with %', ext.name, exception_text;
dba$>     end;
dba$>   end loop;
dba$> end $$;
NOTICE:  adminpack failed with permission denied to create extension "adminpack"
NOTICE:  amcheck failed with permission denied to create extension "amcheck"
NOTICE:  autoinc failed with permission denied to create extension "autoinc"
NOTICE:  bloom failed with permission denied to create extension "bloom"
NOTICE:  btree_gin failed with permission denied to create extension "btree_gin"
NOTICE:  btree_gist failed with permission denied to create extension "btree_gist"
NOTICE:  citext failed with permission denied to create extension "citext"
NOTICE:  cube failed with permission denied to create extension "cube"
NOTICE:  dblink failed with permission denied to create extension "dblink"
NOTICE:  dict_int failed with permission denied to create extension "dict_int"
NOTICE:  dict_xsyn failed with permission denied to create extension "dict_xsyn"
NOTICE:  earthdistance failed with required extension "cube" is not installed
NOTICE:  file_fdw failed with permission denied to create extension "file_fdw"
NOTICE:  fuzzystrmatch failed with permission denied to create extension "fuzzystrmatch"
NOTICE:  hstore failed with permission denied to create extension "hstore"
NOTICE:  hstore_plperl failed with required extension "hstore" is not installed
NOTICE:  hstore_plperlu failed with required extension "hstore" is not installed
NOTICE:  hstore_plpython2u failed with required extension "hstore" is not installed
NOTICE:  hstore_plpython3u failed with required extension "hstore" is not installed
NOTICE:  hstore_plpythonu failed with required extension "hstore" is not installed
NOTICE:  insert_username failed with permission denied to create extension "insert_username"
NOTICE:  intagg failed with permission denied to create extension "intagg"
NOTICE:  intarray failed with permission denied to create extension "intarray"
NOTICE:  isn failed with permission denied to create extension "isn"
NOTICE:  jsonb_plperl failed with required extension "plperl" is not installed
NOTICE:  jsonb_plperlu failed with required extension "plperlu" is not installed
NOTICE:  jsonb_plpython2u failed with required extension "plpython2u" is not installed
NOTICE:  jsonb_plpython3u failed with required extension "plpython3u" is not installed
NOTICE:  jsonb_plpythonu failed with required extension "plpythonu" is not installed
NOTICE:  lo failed with permission denied to create extension "lo"
NOTICE:  ltree failed with permission denied to create extension "ltree"
NOTICE:  ltree_plpython2u failed with required extension "ltree" is not installed
NOTICE:  ltree_plpython3u failed with required extension "ltree" is not installed
NOTICE:  ltree_plpythonu failed with required extension "ltree" is not installed
NOTICE:  moddatetime failed with permission denied to create extension "moddatetime"
NOTICE:  pageinspect failed with permission denied to create extension "pageinspect"
NOTICE:  pg_buffercache failed with permission denied to create extension "pg_buffercache"
NOTICE:  pg_freespacemap failed with permission denied to create extension "pg_freespacemap"
NOTICE:  pg_prewarm failed with permission denied to create extension "pg_prewarm"
NOTICE:  pg_stat_statements failed with permission denied to create extension "pg_stat_statements"
NOTICE:  pg_trgm failed with permission denied to create extension "pg_trgm"
NOTICE:  pg_visibility failed with permission denied to create extension "pg_visibility"
NOTICE:  pgcrypto failed with permission denied to create extension "pgcrypto"
NOTICE:  pgrowlocks failed with permission denied to create extension "pgrowlocks"
NOTICE:  pgstattuple failed with permission denied to create extension "pgstattuple"
NOTICE:  SUCCESS for plperl
NOTICE:  plperlu failed with permission denied to create extension "plperlu"
NOTICE:  plpgsql failed with extension "plpgsql" already exists
NOTICE:  plpython3u failed with permission denied to create extension "plpython3u"
NOTICE:  postgres_fdw failed with permission denied to create extension "postgres_fdw"
NOTICE:  refint failed with permission denied to create extension "refint"
NOTICE:  seg failed with permission denied to create extension "seg"
NOTICE:  sslinfo failed with permission denied to create extension "sslinfo"
NOTICE:  tablefunc failed with permission denied to create extension "tablefunc"
NOTICE:  tcn failed with permission denied to create extension "tcn"
NOTICE:  tsm_system_rows failed with permission denied to create extension "tsm_system_rows"
NOTICE:  tsm_system_time failed with permission denied to create extension "tsm_system_time"
NOTICE:  unaccent failed with permission denied to create extension "unaccent"
NOTICE:  xml2 failed with permission denied to create extension "xml2"
DO

If you want to install an extension that only creates objects you are allowed to create anyway, that would succeed. Probably the best way to handle extensions is to install all the required ones by default and then grant permissions to the users who need them.
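
For example, a minimal sketch of that approach, run as a superuser (pg_trgm is just an arbitrary choice here):

-- as superuser, once per database (or in template1 so that new databases inherit it):
create extension if not exists pg_trgm;
-- verify that the extension is now available to regular users such as "a":
\dx pg_trgm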

Cet article Adding PostgreSQL extensions without being super user? est apparu en premier sur Blog dbi services.

Make Oracle database simple again!

Wed, 2020-01-22 09:30
Introduction

Let’s have a look at how to make Oracle database as simple as it was before.

Oracle database is a great piece of software: yes, it's quite expensive, but it's still the reference and most companies can find a configuration that fits their needs and budget. Another complaint about Oracle is the complexity: nothing is really simple, and you'll need skillful DBA(s) to deploy, manage, upgrade and troubleshoot your databases. But complexity is sometimes caused by wrong decisions you make without having the knowledge, mainly because some choices add significant complexity compared to others.

The goal

Why do things need to be simple?

Obviously, simplification is:

  • easier troubleshooting
  • more “understandable by the others”
  • reinstallation made possible in case of big troubles
  • avoiding bugs related to the mix of multiple components
  • less work, because you probably have enough work with migration, patching, performance, …
  • more reliability, because fewer components means fewer problems
On the hardware side

Rules for simplifying on the hardware side are:

  • Choose the same hardware for all your environments (DEV/TEST/QUAL/PROD/…): same server models, same CPU family, same revision. Make only slight variations on memory amount, number of disks and processor cores configuration if needed. Order all the servers at the same time. If a problem is related to hardware, you will be able to test the fix on a less important environment before going on production
  • Don’t use SAN: SAN is very nice, but SAN is not the performance guarantee you expect. Adopt local SSD disks: NVMe type SSDs have amazing speed, they are a true game changer in today’s database performance. Getting rid of the SAN is also getting rid of multipathing, resource sharing, complex troubleshooting, external dependencies and so on
  • Provision very large volumes for data: dealing with space pressure is not the most interesting part of your job, and it's time consuming. You need 4TB of disks? Order 12TB and you'll be ready for every situation, even those not planned. For sure it's more expensive, but adding disks afterwards is not always that simple. It makes me think about a customer case where trying to add a single disk led to a nightmare (production down for several days)
  • Consider ODA appliances (Small or Medium): even if it’s not simplifying everything, at least hardware is all that you need and is dedicated to Oracle database software
  • Think consolidation: Oracle database has strong database isolation, which makes consolidation easy. Consolidating to limit the number of servers you need to manage also simplifies your environment
  • Avoid virtualization: without talking about the license, virtualization adds an underlying layer of complexity for sure
On the system side

Some rules are also good to know regarding the system:

  • Go for Redhat or Oracle Linux: mainly because it’s the most common OS for Oracle databases. Releases and patches are always available for Linux first. UNIX and Windows are decreasing in popularity for Oracle Databases these past 10 years
  • Same OS: please keep your operating systems strictly identical from development to production. If you decide to upgrade the OS, do that first on TEST/DEV and finish with the production servers without waiting months. And never update directly from the internet, as packages can be different each time you update a system
  • Limit the number of filesystems for your databases: 1 big oradata and 1 big FRA are enough on SSD; you don't need to slice everything as we did before, and slicing always wastes space
On the software side

A lot of things should be done, or not done regarding software:

  • Install the same Oracle version (release + patch) and use the same tree everywhere. Use OFA (/u01/…) or not but be consistent
  • Limit the Oracle versions in use: inform your software vendors that your platform cannot support too old versions, especially non-terminal releases like 11.2.0.3. 19c, 12.1.0.2 and possibly 11.2.0.4 are the only recommended versions to deploy
  • Don't use ASM: because ext4 is fine and SSDs now bring you maximum performance even on a classic filesystem. ASM will always be linked to Grid Infrastructure, creating dependencies between the DB homes and the Grid stack and making patching much more complex
  • Don’t use RAC: because most of your applications cannot correctly manage high availability. RAC is much more complex compared to single instance databases. Not choosing RAC is getting rid of interconnect, cluster repository, fusion cache for SGA sharing, SAN or NAS technologies, split brains, scan listeners and complex troubleshooting. Replacing RAC with Data Guard or Dbvisit standby is the new way of doing sort of high availability without high complexity
Simplify backup configuration

For sure you will use RMAN, but how to simplify backups with RMAN?

  • Use the same backup script for all the databases
  • Use the same backup frequency for each database because when you need to restore, you’d better have a fresh backup
  • Configure only different retention on each database
  • Backup to disk (the most convenient being on a big nfs share) and without any specific library (backup your /backup filesystem later with your enterprise backup tool if needed)
  • Provision a filesystem large enough to never need to delete backups that are still within the retention period

Using the same backup strategy means being able to use the same restore procedure on all databases because you always need a quick restore of a broken database.

Always back up the controlfile and spfile on the standby databases: the resulting backupset has a very small footprint and makes it easier to restore the standby using database backupsets from the primary, without the need for duplication.

Consider an RMAN catalog only if you have enough databases to manage.
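To illustrate the idea of one script for all databases, here is a hedged sketch of such a generic RMAN disk backup; the /backup path, OS authentication and the default 7-day retention are assumptions, not a definitive implementation:

#!/bin/bash
# Generic disk backup sketch: the same script everywhere, only SID and retention vary.
export ORACLE_SID=$1                # e.g. DB1
RETENTION=${2:-7}                   # recovery window in days, per database
rman target / <<EOF
configure retention policy to recovery window of ${RETENTION} days;
configure controlfile autobackup on;
configure channel device type disk format '/backup/${ORACLE_SID}/%d_%T_%U.bkp';
backup as compressed backupset database plus archivelog delete input;
delete noprompt obsolete;
EOF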

Simplify database management and configuration
  • Create scripts for database configuration and tablespace creation (for example: configure_SID.sql and tablespaces_SID.sql) to be able to reconfigure the same database elsewhere
  • Don't create separate grid and oracle users if you plan to use Grid Infrastructure/ASM: as a DBA you probably manage both ASM and database instances. Instead of losing time switching between these 2 users, configure only one oracle user for both
  • Never use graphical tools to create a database, deploy an appliance or configure something: screenshots are far less convenient than plain text commands, which are easily repeatable and scriptable
  • Use OMF: configure only db_create_file_dest and db_recovery_file_dest and Oracle will multiplex the controlfile and the redologs in these areas. OMF also names the datafiles for you: there is no need for manual naming, and who really cares about the name of the datafiles?
  • Don't use multitenant: multitenant is fine, but we have been living with non-CDB databases for years and it works like a charm. You can still use the non-CDB architecture in 19c, so multitenant is not mandatory even on this latest version. Migration from non-CDB to pluggable databases is quite easy, so you will still be able to adopt multitenant later
  • Keep your spfile clean: don't specify unused parameters or parameters that are already set to their default value. Remove these parameters from the spfile using ALTER SYSTEM RESET parameter SCOPE=spfile SID='*'; (a short sketch follows after this list)
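As referenced in the list above, here is a hedged sketch of the OMF configuration and of the spfile cleanup; the directories and the parameter being reset are only examples, adapt them to your environment:

sqlplus -s / as sysdba <<'SQL'
-- OMF: two destinations are enough, Oracle names and multiplexes the files
alter system set db_create_file_dest='/u02/oradata' scope=both sid='*';
alter system set db_recovery_file_dest_size=500G scope=both sid='*';
alter system set db_recovery_file_dest='/u03/fast_recovery_area' scope=both sid='*';
-- spfile cleanup: remove a parameter that only repeats its default value
alter system reset optimizer_index_cost_adj scope=spfile sid='*';
SQL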
Simplify patching

Patching can also be simplified:

  • Patch once a year, because you need to patch, but you don’t need to spend all your time applying each PSU every 3 months
  • Start with test/dev databases and take the time to test from the application
  • Don't wait too long to patch the other environments: production should be patched a few weeks after the first patched environment
Simplify Oracle*Net configuration

Simplifying also concerns Oracle*Net configuration:

  • Avoid configuring multiple listeners on a system because one is enough for all your databases
  • Put your Oracle*Net configuration files in /etc because you don't want multiple files in multiple homes (see the sketch after this list)
  • Keep your Oracle*Net configuration files clean and organized for increased readability
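The sketch referenced in the list above shows one possible way to centralize the Oracle*Net files; the files you move and the profile you edit are assumptions, adjust them to your setup:

# Keep a single set of Oracle*Net files in /etc and point every environment at them.
sudo mv $ORACLE_HOME/network/admin/tnsnames.ora /etc/
sudo mv $ORACLE_HOME/network/admin/sqlnet.ora   /etc/
echo 'export TNS_ADMIN=/etc' >> ~/.bash_profile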
Make your database movable

Last but not least, one of the biggest mistakes is to create a strong dependency between a database and a system. How do you make your database easily movable? By configuring a standby database and using Data Guard or Dbvisit Standby. Moving your database to another server is then done within a few minutes with a single SWITCHOVER operation.
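With the Data Guard broker, for example, such a move boils down to a single command. This is only a sketch: it assumes a working broker configuration, and the password placeholder and the db_unique_names PROD_SITE1/PROD_SITE2 are made up:

# Move the primary role to the other site (broker configuration assumed).
dgmgrl sys/"<password>"@PROD_SITE1 "switchover to PROD_SITE2"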

Using standby databases makes your life easier for all of these purposes:

  • you need to move your server to another datacenter
  • a power outage happened on one site
  • you need to update hardware or software
  • you suspect a hardware problem impacting the database

Don’t only create standbys for production databases: even development databases are some kind of production for developers. If a database cannot be down for 1 day, you need a standby database.

The finest configuration is not to dedicate one server to the primaries and another to the standbys, but to dispatch the primaries between 2 identical servers on 2 sites. Each database has a preferred server for its primary, with the standby on the opposite server.

Conclusion

It’s so easy to increase complexity without any good reason. Simplifying is the power of saying NO. No to interesting features and configurations that are not absolutely necessary. All you need for your databases is reliability, safety, availability, performance. Simplicity helps you in that way.

The article Make Oracle database simple again! first appeared on Blog dbi services.

PostgreSQL 13: parallel vacuum for indexes

Mon, 2020-01-20 01:06

Because of its implementation of MVCC, PostgreSQL needs a way to clean up old/dead rows, and this is the responsibility of vacuum. Up to PostgreSQL 12 this is done table by table and index by index. There are a lot of parameters to fine tune autovacuum, but none of them allowed vacuum to run in parallel against a relation. The only option you had to let autovacuum do more work in parallel was to increase autovacuum_max_workers so that more relations can be worked on at the same time. Working against multiple indexes of the same table was not possible. Once PostgreSQL 13 is released this will change.
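As a reminder, this is what the pre-13 tuning knob looks like; a hedged sketch, the value 6 is just an example:

# More autovacuum workers means more tables vacuumed at the same time,
# but still only one worker per table and its indexes (up to PostgreSQL 12).
psql -c "alter system set autovacuum_max_workers = 6;"
pg_ctl restart -D "$PGDATA"   # this parameter only changes at server start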

When you have a look at the help of vacuum you will notice a new option:

postgres=# \h vacuum
Command:     VACUUM
Description: garbage-collect and optionally analyze a database
Syntax:
VACUUM [ ( option [, ...] ) ] [ table_and_columns [, ...] ]
VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ table_and_columns [, ...] ]

where option can be one of:

    FULL [ boolean ]
    FREEZE [ boolean ]
    VERBOSE [ boolean ]
    ANALYZE [ boolean ]
    DISABLE_PAGE_SKIPPING [ boolean ]
    SKIP_LOCKED [ boolean ]
    INDEX_CLEANUP [ boolean ]
    TRUNCATE [ boolean ]
    PARALLEL integer

and table_and_columns is:

    table_name [ ( column_name [, ...] ) ]

URL: https://www.postgresql.org/docs/devel/sql-vacuum.html

By providing a positive integer to the "PARALLEL" option you tell vacuum how many background workers should be used to vacuum indexes for a given table (0 will disable parallel processing). Let's do a small test setup to demonstrate this:

postgres=# create table t1 as select i as a, i::text as b, now() as c from generate_series(1,3000000) i;
SELECT 3000000
postgres=# create index i1 on t1(a);
CREATE INDEX
postgres=# 
postgres=# create index i2 on t1(b);
CREATE INDEX
postgres=# create index i3 on t1(c);
CREATE INDEX

One table, four indexes. If we go for parallel 4 we should see four background workers doing the work against the indexes:

postgres=# update t1 set a=5,b='ccc',c=now() where mod(a,5)=0;
UPDATE 600000
postgres=# vacuum (parallel 4) t1;
VACUUM

As the table and the indexes are quite small the vacuum is fast, but at least 2 parallel workers show up in the process list for the vacuum operation:

postgres 16688 15925 13 07:30 ?        00:01:07 postgres: postgres postgres [local] VACUUM
postgres 19184 15925  0 07:39 ?        00:00:00 postgres: parallel worker for PID 16688   
postgres 19185 15925  0 07:39 ?        00:00:00 postgres: parallel worker for PID 16688   

Nice. Please note that indexes are only considered for parallel vacuum when they meet the min_parallel_index_scan_size criteria. For a FULL vacuum there is no parallel processing either.
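If you want to check which of your indexes meet that threshold, here is a quick sketch against the t1 table and its indexes from above:

psql <<'SQL'
show min_parallel_index_scan_size;
select indexrelid::regclass as index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) as index_size
  from pg_index
 where indrelid = 't1'::regclass;
SQL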

Btw: The current maximal value is 1024:

postgres=# vacuum (parallel -4) t1;
ERROR:  parallel vacuum degree must be between 0 and 1024
LINE 1: vacuum (parallel -4) t1;

You can also see the parallel stuff on the verbose output:

postgres=# vacuum (parallel 4, verbose true) t1;
INFO:  vacuuming "public.t1"
INFO:  launched 2 parallel vacuum workers for index vacuuming (planned: 2)
INFO:  scanned index "i2" to remove 600000 row versions by parallel vacuum worker
DETAIL:  CPU: user: 0.24 s, system: 0.06 s, elapsed: 0.89 s
INFO:  scanned index "i1" to remove 600000 row versions
DETAIL:  CPU: user: 0.17 s, system: 0.10 s, elapsed: 1.83 s
INFO:  scanned index "i3" to remove 600000 row versions by parallel vacuum worker
DETAIL:  CPU: user: 0.16 s, system: 0.14 s, elapsed: 1.69 s
INFO:  scanned index "i4" to remove 600000 row versions by parallel vacuum worker
DETAIL:  CPU: user: 0.25 s, system: 0.09 s, elapsed: 1.17 s
INFO:  "t1": removed 600000 row versions in 20452 pages
DETAIL:  CPU: user: 0.17 s, system: 0.16 s, elapsed: 1.43 s
INFO:  index "i1" now contains 3000000 row versions in 14308 pages
DETAIL:  600000 index row versions were removed.
1852 index pages have been deleted, 640 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "i2" now contains 3000000 row versions in 14305 pages
DETAIL:  600000 index row versions were removed.
1851 index pages have been deleted, 640 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "i3" now contains 3000000 row versions in 14326 pages
DETAIL:  600000 index row versions were removed.
3941 index pages have been deleted, 1603 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "i4" now contains 3000000 row versions in 23391 pages
DETAIL:  600000 index row versions were removed.
5527 index pages have been deleted, 2246 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  "t1": found 600000 removable, 3000000 nonremovable row versions in 21835 out of 22072 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 499
There were 132 unused item identifiers.
Skipped 0 pages due to buffer pins, 237 frozen pages.
0 pages are entirely empty.
CPU: user: 0.75 s, system: 0.36 s, elapsed: 5.07 s.
INFO:  vacuuming "pg_toast.pg_toast_16392"
INFO:  index "pg_toast_16392_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  "pg_toast_16392": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 499
There were 0 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM

The article PostgreSQL 13: parallel vacuum for indexes first appeared on Blog dbi services.

Deploying your own PostgreSQL image on Nutanix Era – 2 – Deploying a new PostgreSQL VM

Sat, 2020-01-18 00:00

In the last post I described how you can create your own PostgreSQL image in Nutanix Era. In Nutanix wording this is a "Software profile". This profile can now be used to deploy PostgreSQL VMs with just a few clicks or by using the API. In this post we'll look at how this can be done and whether it is really as easy as Nutanix promises. We'll be doing it by using the graphical console first and then by using the API. Let's go.

For deploying new databases we obviously need to go to the database section of Era:

For now we are going to deploy a single instance:

Our new software profile is available and this is what we use. Additionally a public ssh key should be provided for accessing the node:

Provide the details for the new instance:

Time machine allows you to go back in time and creates automatic storage snapshots according to the schedule you define here:

SLAs define retention policies for the snapshots. I will not cover that here.

Kick it off and wait for the task to finish:

A few minutes later it is done:

Our new database is ready:

From a deployment point of view this is super simple: A few clicks and a new PostgreSQL server is ready to use in minutes. The API call to do the same on the command line would be this:

curl -k -X POST \
	https://10.38.11.9/era/v0.8/databases/provision \
	-H 'Content-Type: application/json' \
	-H 'Authorization: Basic YWRtaW46bngyVGVjaDAwNyE=' \
	-d \
	'{"databaseType":"postgres_database","databaseName":"db2","databaseDescription":"db2","clusterId":"ccdab636-2a11-477b-a358-2b8db0382421","softwareProfileId":"aa099964-e17c-4264-b047-00881dc5e2f8","softwareProfileVersionId":"7ae7a44c-d7c5-46bc-bf0f-f9fe7665f537","computeProfileId":"fb1d20bb-38e4-4964-852f-6209990402c6","networkProfileId":"8ed5214c-4e2a-4ce5-a899-a07103bc60cc","dbParameterProfileId":"3679ab5b-7688-41c4-bcf3-9fb4491544ea","newDbServerTimeZone":"Europe/Zurich","timeMachineInfo":{"name":"db2_TM","description":"","slaId":"4d9dcd6d-b6f8-47f0-8015-9e691c1d3cf4","schedule":{"snapshotTimeOfDay":{"hours":1,"minutes":0,"seconds":0},"continuousSchedule":{"enabled":true,"logBackupInterval":30,"snapshotsPerDay":1},"weeklySchedule":{"enabled":true,"dayOfWeek":"FRIDAY"},"monthlySchedule":{"enabled":true,"dayOfMonth":"17"},"quartelySchedule":{"enabled":true,"startMonth":"JANUARY","dayOfMonth":"17"},"yearlySchedule":{"enabled":false,"dayOfMonth":31,"month":"DECEMBER"}},"tags":[],"autoTuneLogDrive":true},"provisionInfo":[{"name":"application_type","value":"postgres_database"},{"name":"nodes","value":"1"},{"name":"listener_port","value":"5432"},{"name":"proxy_read_port","value":"5001"},{"name":"proxy_write_port","value":"5000"},{"name":"database_size","value":10},{"name":"working_dir","value":"/tmp"},{"name":"auto_tune_staging_drive","value":true},{"name":"enable_synchronous_mode","value":false},{"name":"dbserver_name","value":"postgres-2"},{"name":"dbserver_description","value":"postgres-2"},{"name":"ssh_public_key","value":"ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDgJb4co+aaRuyAJv8F6fsz2YgO0ZHldX6qS22c813iDeUGWP/1bGhF598uODlgK5nx29sW2WTlcmorZuCgHC2BJfzhL1xsL5Ca94uAEg48MbA+1+Woe49mF6bPDJWLXqC0YUpCDBZI9VHnrij4QhX2r3G87W0WNnNww1POwIJN3xpVq/DnDWye+9ZL3s+eGR8d5f71iKnBo0BkcvY5gliOtM/DuLT7Cs7KkjPgY+S6l5Cztvcj6hGO96zwlOVN3kjB18RH/4q6B7YkhSxgTDMXCdoOf9F6BIhAjZga4lodHGbkMEBW+BJOnYnFTl+jVB/aY1dALECnh4V+rr0Pd8EL daw@ltdaw"},{"name":"db_password","value":"postgres"}]}'

As said, provisioning is easy. Now lets see how the instance got created. Using the IP Address that was assigned, the postgres user and the public ssh key we provided we can connect to the VM:

ssh postgres@10.38.11.40
The authenticity of host '10.38.11.40 (10.38.11.40)' can't be established.
ECDSA key fingerprint is SHA256:rN4QzdL2y55Lv8oALnW6pBQzBKOInP+X4obiJrqvK8o.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.38.11.40' (ECDSA) to the list of known hosts.
Last login: Fri Jan 17 18:48:26 2020 from 10.38.11.9

The first thing I would try is to connect to PostgreSQL:

-bash-4.2$ psql postgres
psql (11.6 dbi services build)
Type "help" for help.

postgres=# select version();
                                                          version
----------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 11.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
(1 row)

postgres=#

Ok, that works. What I did before I created the Software profile in the last post: I installed our DMK and that gets started automatically as the required stuff is added to “.bash_profile” (you can see that in the last post because PS1 is not the default). But here I did not get our environment because Era has overwritten “.bash_profile”:

-bash-4.2$ cat .bash_profile
export PATH=/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/u01/app/postgres/product/11/db_6/bin:/u01/app/postgres/product/11/db_6/bin
export PGDATA=/pg_data_544474d1_822c_4912_8db4_7c1c7f45df54
export PGDATABASE=postgres
export PGUSER=postgres
export PGPORT=5432
export ANSIBLE_LIBRARY=/opt/era_base/era_engine/drivers/nutanix_era/era_drivers/AnsibleModules/bin

It would be great if the original ".bash_profile" were still there so that custom adjustments are not lost.

Looking at the mount points:

-bash-4.2$ df -h
Filesystem                                                                                                                          Size  Used Avail Use% Mounted on
devtmpfs                                                                                                                            7.8G     0  7.8G   0% /dev
tmpfs                                                                                                                               7.8G  8.0K  7.8G   1% /dev/shm
tmpfs                                                                                                                               7.8G  9.7M  7.8G   1% /run
tmpfs                                                                                                                               7.8G     0  7.8G   0% /sys/fs/cgroup
/dev/mapper/centos_centos7postgres12-root                                                                                            26G  2.4G   24G   9% /
/dev/sda1                                                                                                                          1014M  149M  866M  15% /boot
/dev/sdb                                                                                                                             27G   74M   26G   1% /u01/app/postgres/product/11/db_6
tmpfs                                                                                                                               1.6G     0  1.6G   0% /run/user/1000
/dev/mapper/VG_PG_DATA_544474d1_822c_4912_8db4_7c1c7f45df54-LV_PG_DATA_544474d1_822c_4912_8db4_7c1c7f45df54                          50G  108M   47G   1% /pg_data_544474d1_822c_4912_8db4_7c1c7f45df54
/dev/mapper/VG_PG_DATA_TBLSPC_544474d1_822c_4912_8db4_7c1c7f45df54-LV_PG_DATA_TBLSPC_544474d1_822c_4912_8db4_7c1c7f45df54           9.8G   44M  9.2G   1% /pg_dbidb1_544474d1_822c_4912_8db4_7c1c7f45df54
/dev/mapper/ntnx_era_agent_vg_1700f784395111eab53f506b8d241a39-ntnx_era_agent_lv_era_software_1700f784395111eab53f506b8d241a39      1.5G  534M  821M  40% /opt/era_base/era_engine
/dev/mapper/ntnx_era_agent_vg_1700f784395111eab53f506b8d241a39-ntnx_era_agent_lv_era_logs_1700f784395111eab53f506b8d241a39          3.9G   18M  3.6G   1% /opt/era_base/logs
/dev/mapper/ntnx_era_agent_vg_1700f784395111eab53f506b8d241a39-ntnx_era_agent_lv_era_config_1700f784395111eab53f506b8d241a39         47M  1.1M   42M   3% /opt/era_base/cfg
/dev/mapper/ntnx_era_agent_vg_5c3a7120261b4354a3b38ee168726298-ntnx_era_agent_lv_db_stagging_logs_5c3a7120261b4354a3b38ee168726298   99G   93M   94G   1% /opt/era_base/db_logs

The PostgreSQL binaries got their own mountpoint. PGDATA is on its own mountpoint and it seems we have a tablespace in “/pg_dbidb1_544474d1_822c_4912_8db4_7c1c7f45df54”:

-bash-4.2$ psql -c "\db" postgres
                            List of tablespaces
     Name      |  Owner   |                    Location
---------------+----------+-------------------------------------------------
 pg_default    | postgres |
 pg_global     | postgres |
 tblspc_dbidb1 | postgres | /pg_dbidb1_544474d1_822c_4912_8db4_7c1c7f45df54
(3 rows)

-bash-4.2$ psql -c "\l+" postgres
                                                                     List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   |  Size   |  Tablespace   |                Description
-----------+----------+----------+-------------+-------------+-----------------------+---------+---------------+--------------------------------------------
 dbidb1    | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |                       | 7717 kB | tblspc_dbidb1 |
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |                       | 7717 kB | pg_default    | default administrative connection database
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +| 7577 kB | pg_default    | unmodifiable empty database
           |          |          |             |             | postgres=CTc/postgres |         |               |
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +| 7577 kB | pg_default    | default template for new databases
           |          |          |             |             | postgres=CTc/postgres |         |               |
(4 rows)

This confirms that our database “dbidb1” was placed into a separate tablespace. There is another mountpoint which seems to be there for the archived WAL files, and we can confirm that as well:

-bash-4.2$ psql
psql (11.6 dbi services build)
Type "help" for help.

postgres=# show archive_command ;
                      archive_command
-----------------------------------------------------------
  sh /opt/era_base/cfg/postgres/archive_command.sh %p  %f
(1 row)

postgres=# \! cat /opt/era_base/cfg/postgres/archive_command.sh
test ! -f /opt/era_base/db_logs/dbidb1/$2 &&  cp -p $1 /opt/era_base/db_logs/dbidb1//$2
postgres=#

So archiving is enabled and this is what I expected. The costing parameters seem to be the default:

postgres=# select name,setting from pg_settings where name like '%cost%';
             name             | setting
------------------------------+---------
 autovacuum_vacuum_cost_delay | 20
 autovacuum_vacuum_cost_limit | -1
 cpu_index_tuple_cost         | 0.005
 cpu_operator_cost            | 0.0025
 cpu_tuple_cost               | 0.01
 jit_above_cost               | 100000
 jit_inline_above_cost        | 500000
 jit_optimize_above_cost      | 500000
 parallel_setup_cost          | 1000
 parallel_tuple_cost          | 0.1
 random_page_cost             | 4
 seq_page_cost                | 1
 vacuum_cost_delay            | 0
 vacuum_cost_limit            | 200
 vacuum_cost_page_dirty       | 20
 vacuum_cost_page_hit         | 1
 vacuum_cost_page_miss        | 10
(17 rows)

Memory parameters seem to be the default as well:

postgres=# show shared_buffers ;
 shared_buffers
----------------
 128MB
(1 row)

postgres=# show work_mem;
 work_mem
----------
 4MB
(1 row)

postgres=# show maintenance_work_mem ;
 maintenance_work_mem
----------------------
 64MB
(1 row)

postgres=# show effective_cache_size ;
 effective_cache_size
----------------------
 4GB
(1 row)

There are “Parameter profiles” you can use and create but currently there are not that many parameters that can be adjusted:

That's it for now. Deploying new PostgreSQL instances is really easy. I am currently not sure why there needs to be a tablespace, but maybe that will become clear in a future post when we look at backup and restore. From a PostgreSQL configuration perspective you will for sure need to make some adjustments once the databases become bigger and the load increases.

The article Deploying your own PostgreSQL image on Nutanix Era – 2 – Deploying a new PostgreSQL VM first appeared on Blog dbi services.

Dbvisit 9: Adding datafiles and or tempfiles

Fri, 2020-01-17 11:57

One question I was asking myself is whether the standby_file_management parameter is relevant in a Dbvisit environment with Oracle Standard Edition. I did some tests and I show here what I did.
We assume that Dbvisit is already set up and that the replication is working fine.

[oracle@dbvisit1 trace]$ /u01/app/dbvisit/standby/dbvctl -d dbstd -i
=============================================================
Dbvisit Standby Database Technology (9.0.08_0_g99a272b) (pid 19567)
dbvctl started on dbvisit1: Fri Jan 17 16:48:16 2020
=============================================================

Dbvisit Standby log gap report for dbstd at 202001171648:
-------------------------------------------------------------
Description       | SCN          | Timestamp
-------------------------------------------------------------
Source              2041731        2020-01-17:16:48:18 +01:00
Destination         2041718        2020-01-17:16:48:01 +01:00

Standby database time lag (DAYS-HH:MI:SS): +00:00:17

Report for Thread 1
-------------------
SOURCE
Current Sequence 8
Last Archived Sequence 7
Last Transferred Sequence 7
Last Transferred Timestamp 2020-01-17 16:48:07

DESTINATION
Recovery Sequence 8

Transfer Log Gap 0
Apply Log Gap 0

=============================================================
dbvctl ended on dbvisit1: Fri Jan 17 16:48:23 2020
=============================================================

[oracle@dbvisit1 trace]$

At the moment standby_file_management is set to MANUAL on both servers

[oracle@dbvisit1 trace]$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Fri Jan 17 16:50:50 2020
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Standard Edition 2 Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL> select open_mode from v$database;

OPEN_MODE
--------------------
READ WRITE

SQL> show parameter standby_file_management;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
standby_file_management              string      MANUAL
SQL>


[oracle@dbvisit2 back_dbvisit]$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Fri Jan 17 16:51:15 2020
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Standard Edition 2 Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL> select open_mode from v$database;

OPEN_MODE
--------------------
MOUNTED

SQL>  show parameter standby_file_management;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
standby_file_management              string      MANUAL
SQL>

Let’s create a tablespace MYTAB on the primary database

SQL> create tablespace mytab datafile '/u01/app/oracle/oradata/DBSTD/mytab01.dbf' size 10M;

Tablespace created.

SQL>

SQL> alter system switch logfile;

System altered.

SQL> select name from v$tablespace;

NAME
------------------------------
SYSAUX
SYSTEM
UNDOTBS1
USERS
TEMP
MYTAB

6 rows selected.

SQL> select name from v$datafile;

NAME
--------------------------------------------------------------------------------
/u01/app/oracle/oradata/DBSTD/system01.dbf
/u01/app/oracle/oradata/DBSTD/sysaux01.dbf
/u01/app/oracle/oradata/DBSTD/undotbs01.dbf
/u01/app/oracle/oradata/DBSTD/mytab01.dbf
/u01/app/oracle/oradata/DBSTD/users01.dbf

SQL>

A few moments later we can see that the new datafile has been replicated to the standby

SQL> select name from v$tablespace;

NAME
------------------------------
SYSAUX
SYSTEM
UNDOTBS1
USERS
TEMP
MYTAB

6 rows selected.

SQL>  select name from v$datafile
  2  ;

NAME
--------------------------------------------------------------------------------
/u01/app/oracle/oradata/DBSTD/system01.dbf
/u01/app/oracle/oradata/DBSTD/sysaux01.dbf
/u01/app/oracle/oradata/DBSTD/undotbs01.dbf
/u01/app/oracle/oradata/DBSTD/mytab01.dbf
/u01/app/oracle/oradata/DBSTD/users01.dbf

SQL>

Now let's repeat the tablespace creation while the parameter is set to AUTO on both sides

SQL> show parameter standby_file_management;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
standby_file_management              string      AUTO
SQL>  create tablespace mytab2 datafile '/u01/app/oracle/oradata/DBSTD/mytab201.dbf' size 10M;

Tablespace created.

SQL>

SQL> alter system switch logfile;

System altered.

SQL>

A few moments later the tablespace MYTAB2 was also replicated to the standby

SQL> select name from v$tablespace;

NAME
------------------------------
SYSAUX
SYSTEM
UNDOTBS1
USERS
TEMP
MYTAB
MYTAB2

7 rows selected.

SQL> select name from v$datafile;

NAME
--------------------------------------------------------------------------------
/u01/app/oracle/oradata/DBSTD/system01.dbf
/u01/app/oracle/oradata/DBSTD/mytab201.dbf
/u01/app/oracle/oradata/DBSTD/sysaux01.dbf
/u01/app/oracle/oradata/DBSTD/undotbs01.dbf
/u01/app/oracle/oradata/DBSTD/mytab01.dbf
/u01/app/oracle/oradata/DBSTD/users01.dbf

6 rows selected.

In the Dbvisit documentation we can find this:
Note 2: This feature is independent of the Oracle parameter STANDBY_FILE_MANAGEMENT. Dbvisit Standby will detect if STANDBY_FILE_MANAGEMENT has added the datafile to the standby database, and if so, Dbvisit Standby will not add the datafile.
Note 3: STANDBY_FILE_MANAGEMENT can only be used in Enterprise Edition and should not be set in Standard Edition.

Dbvisit does not use STANDBY_FILE_MANAGEMENT for datafile replication. So I decided to set this parameter back to its default value, which is MANUAL.
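Setting it back is a one-liner; a minimal sketch, to be run on the primary and on the standby (assuming an spfile is in use):

sqlplus -s / as sysdba <<'SQL'
alter system set standby_file_management='MANUAL' scope=both;
show parameter standby_file_management
SQL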

What about adding a tempfile in a Dbvisit environment? On the primary I create a new temporary tablespace

SQL> show parameter standby_file_management;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
standby_file_management              string      MANUAL
SQL> create temporary tablespace temp2 tempfile '/u01/app/oracle/oradata/DBSTD/temp2_01.dbf' size 10M;

Tablespace created.

SQL> alter system switch logfile;

System altered.

SQL>

We can see on the primary that we now have two tempfiles.

SQL> select name from v$tablespace;

NAME
------------------------------
SYSAUX
SYSTEM
UNDOTBS1
USERS
TEMP
MYTAB
MYTAB2
TEMP2

8 rows selected.

SQL>  select name from v$tempfile;

NAME
--------------------------------------------------------------------------------
/u01/app/oracle/oradata/DBSTD/temp01.dbf
/u01/app/oracle/oradata/DBSTD/temp2_01.dbf

SQL>

On the standby side, the new temporary tablespace was replicated.

SQL> select name from v$tablespace;

NAME
------------------------------
SYSAUX
SYSTEM
UNDOTBS1
USERS
TEMP
MYTAB
MYTAB2
TEMP2

8 rows selected.

But the new tempfile is not listed on the standby

SQL>  select name from v$tempfile;

NAME
--------------------------------------------------------------------------------
/u01/app/oracle/oradata/DBSTD/temp01.dbf

SQL>

In fact this is the expected behavior. In the documentation we can find the following:
If your preference is to have exactly the same number of temp files referenced in the standby control file as your current primary database, then once a new temp file has been added on the primary, you need to recreate a standby control file by running the following command from the primary server:
dbvctl -f create_standby_ctl -d DDC

So let’s recreate the standby control file

[oracle@dbvisit1 ~]$ /u01/app/dbvisit/standby/dbvctl -f create_standby_ctl -d dbstd
=>Replace current standby controfiles on dbvisit2 with new standby control
file?  [No]: yes

>>> Create standby control file... done

>>> Copy standby control file to dbvisit2... done

>>> Recreate standby control file... done

>>> Standby controfile(s) on dbvisit2 recreated. To complete please run dbvctl on the
    primary, then on the standby.
[oracle@dbvisit1 ~]$

Afterwards we can verify that the new tempfile is now visible on the standby side

SQL> select name from v$tempfile;

NAME
--------------------------------------------------------------------------------
/u01/app/oracle/oradata/DBSTD/temp01.dbf
/u01/app/oracle/oradata/DBSTD/temp2_01.dbf

SQL>

The article Dbvisit 9: Adding datafiles and or tempfiles first appeared on Blog dbi services.

Deploying your own PostgreSQL image on Nutanix Era

Fri, 2020-01-17 11:03

Some days ago we had a very good training on Nutanix. Nutanix is a hyper-converged infrastructure, which means that everything is software driven and the system can be deployed on many hardware configurations. I will not go into the details of the system itself but rather look at one component/module which is called Era. Era promises to simplify database deployments by providing a clean and simple user interface (and an API) with deployment procedures for PostgreSQL, MS SQL, MySQL, MariaDB and Oracle. There are predefined templates you can use, but in this post I'll look at how you can use Era to deploy your own PostgreSQL image.

Before you can register a software profile with Era there needs to be a VM up and running which already has PostgreSQL installed. For that I’ll import the latest CentOS 7 ISO with Prism (CentOS 8 is not yet supported).

Importing images is done in the “Images Configuration” section under “Settings” of Prism:


Once you start the upload a new task is generated which can be monitored in the tasks section:

Now that the image is ready we need to deploy a new virtual machine which will use the image as installation source:








As the virtual machine is now defined we need to power it on and then launch the console:


Follow your preferred way of doing the CentOS installation and once it is done you need to power off the virtual machine to remove the ISO. Otherwise you will always land in the installation procedure when the virtual machine is started:


After you have powered on the virtual machine again you should be able to connect with ssh:

The next step is to install PostgreSQL however you prefer to do it; a hedged sketch of building from source code follows after the listing below. We will not create a PostgreSQL instance, the binaries are enough. In my case everything was installed here:

 postgres@centos7postgres12:/home/postgres/ [pg121] echo $PGHOME
/u01/app/postgres/product/12/db_1/
postgres@centos7postgres12:/home/postgres/ [pg121] ls $PGHOME
bin  include  lib  share
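As mentioned above, here is a hedged sketch of a from-source build that would end up in such a location; the package list and the exact steps are assumptions, not copied from the original installation:

# Build PostgreSQL 12.1 from source into the directory shown above (sketch only).
sudo yum install -y gcc make readline-devel zlib-devel
curl -O https://ftp.postgresql.org/pub/source/v12.1/postgresql-12.1.tar.bz2
tar xjf postgresql-12.1.tar.bz2 && cd postgresql-12.1
./configure --prefix=/u01/app/postgres/product/12/db_1
make -j 2 && sudo make install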

Now that we have our PostgreSQL server we need to register it in Era. Before doing that you should download and execute the pre-check script on the new database server:

postgres@centos7postgres12:/home/postgres/ [pg121] sudo ./era_linux_prechecks.sh

----------------------------------------------------------------------------------
   Error: Database type not specified
   Syntax: $ ./era_linux_prechecks.sh -t|--database_type  [-c|--cluster_ip ] [-p|--cluster_port] [-d|--detailed]
   Database type can be: oracle_database, postgres_database, mariadb_database, mysql_database
----------------------------------------------------------------------------------

postgres@centos7postgres12:/home/postgres/ [pg121] sudo ./era_linux_prechecks.sh -t postgres_database
libselinux-python-2.5-14.1.el7.x86_64


--------------------------------------------------------------------
|              Era Pre-requirements Validation Report              |
--------------------------------------------------------------------

     General Checks:
     ---------------
         1] Username           : root
         2] Package manager    : yum
         2] Database type      : postgres_database

     Era Configuration Dependencies:
     -------------------------------
         1] User has sudo access                         : YES
         2] User has sudo with NOPASS access             : YES
         3] Crontab configured for user                  : YES
         4] Secure paths configured in /etc/sudoers file : YES

     Era Software Dependencies:
     --------------------------
          1] GCC                  : N/A
          2] readline             : YES
          3] libselinux-python    : YES
          4] crontab              : YES
          5] lvcreate             : YES
          6] lvscan               : YES
          7] lvdisplay            : YES
          8] vgcreate             : YES
          9] vgscan               : YES
         10] vgdisplay            : YES
         11] pvcreate             : YES
         12] pvscan               : YES
         13] pvdisplay            : YES
         14] zip                  : NO
         15] unzip                : YES
         16] rsync                : NO

     Summary:
     --------
         This machine does not satisfy all of the dependencies required by Era.
         It can not be onboarded to Era unless all of these are satified.

     **WARNING: Cluster API was not provided. Couldn't go ahead with the Prism API connectivity check.
     Please ensure Prism APIs are callable from the host.
====================================================================
1postgres@centos7postgres12:/home/postgres/ [pg121]

In my case only “zip” and “rsync” are missing which of course is easy to fix:

postgres@centos7postgres12:/home/postgres/ [pg121] sudo yum install -y zip rsync
...
postgres@centos7postgres12:/home/postgres/ [pg121] sudo ./era_linux_prechecks.sh -t postgres_database
libselinux-python-2.5-14.1.el7.x86_64


--------------------------------------------------------------------
|              Era Pre-requirements Validation Report              |
--------------------------------------------------------------------

     General Checks:
     ---------------
         1] Username           : root
         2] Package manager    : yum
         2] Database type      : postgres_database

     Era Configuration Dependencies:
     -------------------------------
         1] User has sudo access                         : YES
         2] User has sudo with NOPASS access             : YES
         3] Crontab configured for user                  : YES
         4] Secure paths configured in /etc/sudoers file : YES

     Era Software Dependencies:
     --------------------------
          1] GCC                  : N/A
          2] readline             : YES
          3] libselinux-python    : YES
          4] crontab              : YES
          5] lvcreate             : YES
          6] lvscan               : YES
          7] lvdisplay            : YES
          8] vgcreate             : YES
          9] vgscan               : YES
         10] vgdisplay            : YES
         11] pvcreate             : YES
         12] pvscan               : YES
         13] pvdisplay            : YES
         14] zip                  : YES
         15] unzip                : YES
         16] rsync                : YES

     Summary:
     --------
         This machine satisfies dependencies required by Era, it can be onboarded.

Looks good and the database server can now be registered:




Era also has a task list which can be monitored:

… and then it fails because PostgreSQL 12.1 is not supported. That is fine but I would have expected the pre-check script to tell me that. Same procedure again, this time with PostgreSQL 11.6 and that succeeds:

This database server is now the source for a new “Software profile”:




And that’s it: Our new PostgreSQL software profile is ready to use. In the next post we’ll try to deploy a new virtual machine from that profile.

The article Deploying your own PostgreSQL image on Nutanix Era first appeared on Blog dbi services.

Dbvisit Standby 9 : Do you know the new snapshot feature?

Thu, 2020-01-16 10:23

The Dbvisit snapshot option is a new feature available starting from version 9.0.06. I have tested this option and in this blog I describe the tasks I did.
The configuration I am using is the following:
dbvisit1 : primary server
dbvisit2 : standby server
orcl : Oracle 19c database
We assume that the Dbvisit environment is already set up and that the replication is working fine. See the previous blog for setting up Dbvisit Standby.
First, there are two options:
-Snapshot Groups
-Single Snapshots

Snapshot Groups
This option is ideal for companies that are using Oracle Standard Edition (SE, SE1, SE2) and would like to have a "database that is read-only, but also being kept up to date". It allows you to create 2 or more read-only snapshots of the standby at a given time interval. Each snapshot is opened in read-only mode after its creation. A new service pointing to the latest snapshot is created and is automatically managed by the listener.
The use of the feature is very easy. As it requires managing filesystems (mount and umount), the user who creates the snapshots needs some admin privileges.
In our case we have configured the user with full sudo privileges, but you can find all required privileges in the documentation.

oracle@dbvisit2:/home/oracle/ [orcl (CDB$ROOT)] sudo grep oracle /etc/sudoers
oracle ALL=(ALL) NOPASSWD:ALL
oracle@dbvisit2:/home/oracle/ [orcl (CDB$ROOT)]

The option can be used from the command line or from the graphical dbvserver console, but it is highly recommended to use the graphical tool.
When the snapshot option is available with your license, you will see the following tab in the dbvserver console

To create a snapshot group we just have to choose the option NEW SNAPSHOT GROUP

And then fill in the required information. In this example:
-We will create a service named snap_service
-We will generate a snapshot every 15 minutes (900 seconds)
-There will be a maximum of 2 snapshots. When the limit is reached, the oldest snapshot is deleted. Just note that the latest snapshot is only deleted once the new snapshot has been created successfully. That's why we can sometimes have 3 snapshots
-The snapshots will be prefixed by MySna

After the snapshot group is created without errors, we have to start the snapshot generation by clicking on the green button

And then we can see the status

At OS level we can verify that the first snapshot was created and that the corresponding instance was started

oracle@dbvisit2:/home/oracle/ [orcl (CDB$ROOT)] ps -ef | grep pmon
oracle    2300     1  0 09:05 ?        00:00:00 ora_pmon_orcl
oracle    6765     1  0 09:57 ?        00:00:00 ora_pmon_MySna001
oracle    7794  1892  0 10:00 pts/0    00:00:00 grep --color=auto pmon
oracle@dbvisit2:/home/oracle/ [orcl (CDB$ROOT)]

We can also verify that the service snap_service is registered in the listener. This service will be automatically pointed to the latest snapshot of the group.

oracle@dbvisit2:/home/oracle/ [orcl (CDB$ROOT)] lsnrctl status

LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 16-JAN-2020 10:01:49

Copyright (c) 1991, 2019, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=dbvisit2)(PORT=1521)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 19.0.0.0.0 - Production
Start Date                16-JAN-2020 09:05:07
Uptime                    0 days 0 hr. 56 min. 42 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/oracle/network/admin/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/dbvisit2/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=dbvisit2)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
…
…
…
Service "snap_service" has 1 instance(s).
  Instance "MySna001", status READY, has 1 handler(s) for this service...
The command completed successfully
oracle@dbvisit2:/home/oracle/ [orcl (CDB$ROOT)]

To connect to this service, we just have to create an alias like

snapgroup =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = dbvisit2)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = snap_service)
    )
  )

15 minutes later we can see that a new snapshot was generated

oracle@dbvisit2:/home/oracle/ [orcl (CDB$ROOT)] ps -ef | grep pmon
oracle    2300     1  0 09:05 ?        00:00:00 ora_pmon_orcl
oracle    6765     1  0 09:57 ?        00:00:00 ora_pmon_MySna001
oracle   11355     1  0 10:11 ?        00:00:00 ora_pmon_MySna002
oracle   11866  1892  0 10:13 pts/0    00:00:00 grep --color=auto pmon
oracle@dbvisit2:/home/oracle/ [orcl (CDB$ROOT)]

Note that we can only open the snapshot in a read only mode

oracle@dbvisit1:/home/oracle/ [orcl (CDB$ROOT)] sqlplus sys/*****@snapgroup as sysdba

SQL> show pdbs

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDB1                           MOUNTED
         4 PDB2                           MOUNTED
SQL> alter pluggable database all open;
alter pluggable database all open
*
ERROR at line 1:
ORA-65054: Cannot open a pluggable database in the desired mode.


SQL>  alter pluggable database all open read only;

Pluggable database altered.

Of course you have probably seen that you can stop, start, pause and remove snapshots from the GUI.

Single Snapshot
This option allows you to create read-only as well as read-write snapshots of the database.
To create a single snapshot, we just have to choose NEW SINGLE SNAPSHOT
In the following example the snapshot will be opened in a read write mode

At the end of the creation we can see the status

We can verify that a service SingleSn was also created

oracle@dbvisit2:/home/oracle/ [orcl (CDB$ROOT)] lsnrctl status | grep Singl
  Instance "SingleSn", status READY, has 1 handler(s) for this service...
  Instance "SingleSn", status READY, has 1 handler(s) for this service...
Service "SingleSn" has 1 instance(s).
  Instance "SingleSn", status READY, has 1 handler(s) for this service...
  Instance "SingleSn", status READY, has 1 handler(s) for this service...
  Instance "SingleSn", status READY, has 1 handler(s) for this service...
  Instance "SingleSn", status READY, has 1 handler(s) for this service

And that the instance SingleSn is started

oracle@dbvisit2:/home/oracle/ [orcl (CDB$ROOT)] ps -ef | grep pmon
oracle    3294     1  0 16:04 ?        00:00:00 ora_pmon_SingleSn
oracle    3966  1748  0 16:08 pts/0    00:00:00 grep --color=auto pmon
oracle   14349     1  0 13:57 ?        00:00:00 ora_pmon_orcl
oracle@dbvisit2:/home/oracle/ [orcl (CDB$ROOT)]

We can also see that at OS level a filesystem is mounted for the snapshot. So there must be enough free space in the LVM that hosts the database.

oracle@dbvisit2:/home/oracle/ [MySna004 (CDB$ROOT)] df -h | grep snap
/dev/mapper/ora_data-SingleSn   25G   18G  6.3G  74% /u01/app/dbvisit/standby/snap/orcl/SingleSn
oracle@dbvisit2:/home/oracle/ [MySna004 (CDB$ROOT)]

Using the alias

singlesnap =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = dbvisit2)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = SingleSn)
    )
  )

We can see that the new snapshot is opened in read write mode

SQL> select name,open_mode from v$database;

NAME      OPEN_MODE
--------- --------------------
SINGLESN  READ WRITE

SQL>
Conclusion

What we will retain is that a snapshot group is a group of snapshots taken at a regular time interval. These kinds of snapshots can only be opened in read-only mode.
A single snapshot can be created in read-only or read-write mode. When opened in read-write mode, it can perhaps be compared to an Oracle snapshot standby.
The Dbvisit snapshot option requires a license and is only supported on Linux.

The article Dbvisit Standby 9 : Do you know the new snapshot feature? first appeared on Blog dbi services.

ARRAYSIZE or ROWPREFETCH in sqlplus?

Wed, 2020-01-08 12:15

What is the difference between the well-known sqlplus setting arraysize and the new sqlplus 12.2 feature rowprefetch? In the blog
https://blog.dbi-services.com/oracle-fasttrue-in-sqlplus-some-thoughts-about-rowprefetch/ I showed a case where rowprefetch helps to reduce the logical IOs.

Here are the definitions of arraysize and rowprefetch according to the documentation:

arraysize:

SET System Variable Summary: Sets the number of rows, called a batch, that SQL*Plus will fetch from the database at one time. Valid values are 1 to 5000. A large value increases the efficiency of queries and subqueries that fetch many rows, but requires more memory. Values over approximately 100 provide little added performance. ARRAYSIZE has no effect on the results of SQL*Plus operations other than increasing efficiency

About SQL*Plus Script Tuning: The effectiveness of setting ARRAYSIZE depends on how well Oracle Database fills network packets and your network latency and throughput. In recent versions of SQL*Plus and Oracle Database, ARRAYSIZE may have little effect. Overlarge sizes can easily take more SQL*Plus memory which may decrease overall performance.

REMARK: The arraysize setting also has an impact on the COPY-command with the COPYCOMMIT-setting (commits every n arraysize batches of records).

rowprefetch:

SET System Variable Summary: Sets the number of rows that SQL*Plus will prefetch from the database at one time. The default value is 1 (max is 32767).
Note: The amount of data contained in the prefetched rows should not exceed the maximum value of 2147483648 bytes (2 Gigabytes). The setting in the oraaccess.xml file can override the SET ROWPREFETCH setting in SQL*Plus.

Differences between ARRAYSIZE and ROWPREFETCH

When doing my tests, one of the important differences between ARRAYSIZE and ROWPREFETCH was that ROWPREFETCH allows Oracle to transfer query results already on return from its internal OCI execute call. I.e. in a 10046 trace the first FETCH shows ROWPREFETCH rows fetched, regardless of the ARRAYSIZE setting. E.g. with the default setting of ROWPREFETCH 1, ARRAYSIZE 15 I can see the following number of rows fetched (see the r= in the trace):

FETCH #139623638001936:c=448,e=1120,p=0,cr=8,cu=0,mis=0,r=1,dep=0,og=1,plh=3403427028,tim=110487525476
...
FETCH #139623638001936:c=66,e=66,p=0,cr=1,cu=0,mis=0,r=15,dep=0,og=1,plh=3403427028,tim=110487525830
...
FETCH #139623638001936:c=15,e=15,p=0,cr=1,cu=0,mis=0,r=15,dep=0,og=1,plh=3403427028,tim=110487526093
...

I.e. 1, 15, 15,…

With ROWPREFETCH 3, ARRAYSIZE 15 the rows fetched are 3, 15, 15, …

The following table shows the number of rows fetched with different settings of ROWPREFETCH and ARRAYSIZE from a query, which returns 70 rows:


ROWPREFETCH ARRAYSIZE ROWS_FETCH1 ROWS_FETCH2 ROWS_FETCH3 ROWS_FETCH4 ROWS_FETCH5 ROWS_FETCH6 ROWS_FETCH7 ROWS_FETCH8 
 1          15         1          15          15          15          15          9
 2          15         2          15          15          15          15          8
20          15        20          30          20
16          15        16          30          24
 6           5         6          10          10          10          10          10          10          4
 9           5         9          10          10          10          10          10          10          1
10          10        10          20          20          20          0
10           5        10          15          15          15          15          0
16           3        16          18          18          18          0

We can see 3 things here:
- The first FETCH (from the internal OCI execute) contains always the number of rows as defined in the ROWPREFETCH setting
- The second FETCH (and all subsequent fetches) contains a number of rows that is a multiple of the ARRAYSIZE setting. The following code fragment should show the logic (a small bash sketch of the same arithmetic follows after this list):

2nd_Fetch_Rows = if ROWPREFETCH < ARRAYSIZE 
                 then ARRAYSIZE 
                 else (TRUNC(ROWPREFETCH/ARRAYSIZE)+1)*ARRAYSIZE


- If a fetch does not detect the end of the data in the cursor then an additional fetch is necessary. In 3 cases above a last fetch fetched 0 rows.
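To make the second point concrete, here is a tiny bash sketch of the same arithmetic; it only reproduces the pattern observed in the table above, it is not how SQL*Plus computes it internally:

# Number of rows of the 2nd (and subsequent) fetches, derived from the table above.
second_fetch_rows() {
  local rowprefetch=$1 arraysize=$2
  if (( rowprefetch < arraysize )); then
    echo "$arraysize"
  else
    echo $(( (rowprefetch / arraysize + 1) * arraysize ))
  fi
}
second_fetch_rows 20 15   # -> 30
second_fetch_rows 10  5   # -> 15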

Memory required by the client

With the Linux pmap command I checked how much memory the client requires for different ROWPREFETCH and ARRAYSIZE settings.

Testcase:


SQL> connect cbleile/cbleile@orclpdb1
Connected.
SQL> create table big_type (a varchar2(2000), b varchar2(2000), c varchar2(2000), d varchar2(2000), e varchar2(2000));
 
Table created.
 
SQL> insert into big_type select 
  2  rpad('X',2000,'Y'),
  3  rpad('X',2000,'Y'),
  4  rpad('X',2000,'Y'),
  5  rpad('X',2000,'Y'),
  6  rpad('X',2000,'Y') from xmltable('1 to 4100');
 
4100 rows created.
 
SQL> commit;
 
Commit complete.
 
SQL> exec dbms_stats.gather_table_stats(user,'BIG_TYPE');
SQL> select avg_row_len from tabs where table_name='BIG_TYPE';
 
AVG_ROW_LEN
-----------
      10005

Before the test:


oracle@oracle-19c-vagrant:/home/oracle/Blogs/rowprefetch_test/ [ORCLCDB (CDB$ROOT)] ps -ef | grep sqlplus
oracle    31537  31636  3 17:49 pts/2    00:01:20 sqlplus   as sysdba
oracle@oracle-19c-vagrant:/home/oracle/Blogs/rowprefetch_test/ [ORCLCDB (CDB$ROOT)] pmap 31537 | grep anon
0000000000be6000   1580K rw---   [ anon ]
00007f7efcda6000    516K rw---   [ anon ]
...
 
SQL> show rowprefetch arraysize
rowprefetch 1
arraysize 15
SQL> set arraysize 1000 pages 2 pause on lines 20000
SQL> select * from big_type;
 
--> 2 times <RETURN>
 
oracle@oracle-19c-vagrant:/home/oracle/Blogs/rowprefetch_test/ [ORCLCDB (CDB$ROOT)] pmap 31537 | grep anon
0000000000be6000   1580K rw---   [ anon ]
00007f7efc40f000  10336K rw---   [ anon ]
...
 
Ctrl-C <RETURN>
 
oracle@oracle-19c-vagrant:/home/oracle/Blogs/rowprefetch_test/ [ORCLCDB (CDB$ROOT)] pmap 31537 | grep anon
0000000000be6000   1580K rw---   [ anon ]
00007f7efcda6000    516K rw---   [ anon ]
 
SQL> set arraysize 1
SQL> set rowprefetch 1000
SQL> select * from big_type;
 
--> 2 times <RETURN>
 
oracle@oracle-19c-vagrant:/home/oracle/Blogs/rowprefetch_test/ [ORCLCDB (CDB$ROOT)] pmap 31537 | grep anon
0000000000be6000  12664K rw---   [ anon ]
00007f7efcda6000    516K rw---   [ anon ]
 
Ctrl-C <RETURN>
 
oracle@oracle-19c-vagrant:/home/oracle/Blogs/rowprefetch_test/ [ORCLCDB (CDB$ROOT)] pmap 31537 | grep anon
0000000000be6000  12664K rw---   [ anon ]
00007f7efcda6000    516K rw---   [ anon ]
 
SQL> set rowprefetch 1
SQL> set arraysize 1000
SQL> select * from big_type;
 
--> 2 times <RETURN>
 
oracle@oracle-19c-vagrant:/home/oracle/Blogs/rowprefetch_test/ [ORCLCDB (CDB$ROOT)] pmap 31537 | grep anon
0000000000be6000  22472K rw---   [ anon ]
00007f7efcda6000    516K rw---   [ anon ]
 
Ctrl-C <RETURN>
 
oracle@oracle-19c-vagrant:/home/oracle/Blogs/rowprefetch_test/ [ORCLCDB (CDB$ROOT)] pmap 31537 | grep anon
0000000000be6000  12660K rw---   [ anon ]
00007f7efcda6000    516K rw---   [ anon ]
 
SQL> set rowprefetch 501 arraysize 500 pages 502
SQL> select * from big_type;
 
--> 2 times <RETURN>
 
oracle@oracle-19c-vagrant:/home/oracle/Blogs/rowprefetch_test/ [ORCLCDB (CDB$ROOT)] pmap 31537 | grep anon
0000000000be6000  17568K rw---   [ anon ]
00007f7efcda6000    516K rw---   [ anon ]

New table with just 1 byte per column:


SQL> create table big_type_small_data as select * from big_type where 1=2;
 
Table created.
 
SQL> insert into  big_type_small_data select 'X','X','X','X','X' from big_type;
 
4100 rows created.
 
SQL> commit;
 
Commit complete.
 
SQL> exec dbms_stats.gather_table_stats(user,'BIG_TYPE_SMALL_DATA');
 
PL/SQL procedure successfully completed.
 
SQL> select avg_row_len from tabs where table_name='BIG_TYPE_SMALL_DATA';
 
AVG_ROW_LEN
-----------
         10
 
Client memory before the test:
 
oracle@oracle-19c-vagrant:/home/oracle/Blogs/rowprefetch_test/ [ORCLCDB (CDB$ROOT)] pmap 14193 | grep anon
00000000014d1000   1580K rw---   [ anon ]
00007f3c3b8d2000    516K rw---   [ anon ]
 
SQL> show rowprefetch
rowprefetch 1
SQL> show array
arraysize 15
SQL> set arraysize 1000 rowprefetch 1 pages 2 pause on lines 20000
SQL> select * from big_type_small_data;
 
--> 2 times <RETURN>
 
oracle@oracle-19c-vagrant:/home/oracle/Blogs/rowprefetch_test/ [ORCLCDB (CDB$ROOT)] pmap 14193 | grep anon
00000000014d1000   1580K rw---   [ anon ]
00007f3c3af3b000  10336K rw---   [ anon ]
 
--> 9.6MB allocated. 
 
Ctrl-C <RETURN>
 
oracle@oracle-19c-vagrant:/home/oracle/Blogs/rowprefetch_test/ [ORCLCDB (CDB$ROOT)] pmap 14193 | grep anon
00000000014d1000   1580K rw---   [ anon ]
00007f3c3b8d2000    516K rw---   [ anon ]
 
--> All memory released.
 
SQL> set arraysize 1 rowprefetch 1000
SQL> select * from big_type_small_data;
 
--> 2 times <RETURN>
 
oracle@oracle-19c-vagrant:/home/oracle/Blogs/rowprefetch_test/ [ORCLCDB (CDB$ROOT)] pmap 14193 | grep anon
00000000014d1000   1852K rw---   [ anon ]
00007f3c3b8d2000    516K rw---   [ anon ]
 
--> Only 272K allocated.
 
Ctrl-C <RETURN>
 
oracle@oracle-19c-vagrant:/home/oracle/Blogs/rowprefetch_test/ [ORCLCDB (CDB$ROOT)] pmap 15227 | grep anon
0000000001fba000   1852K rw---   [ anon ]
00007f38c5ef9000    516K rw---   [ anon ]
 
--> Memory not released.
 
Back to previous setting:
SQL> set arraysize 1000 rowprefetch 1
SQL> select * from big_type_small_data;
 
--> 2 times <RETURN>
 
oracle@oracle-19c-vagrant:/home/oracle/Blogs/rowprefetch_test/ [ORCLCDB (CDB$ROOT)] pmap 15227 | grep anon
0000000001fba000  11644K rw---   [ anon ]
00007f38c5ef9000    516K rw---   [ anon ]
 
--> 9.6MB of additional memory allocated.
 
Ctrl-C <RETURN>
 
oracle@oracle-19c-vagrant:/home/oracle/Blogs/rowprefetch_test/ [ORCLCDB (CDB$ROOT)] pmap 15227 | grep anon
0000000001fba000   1832K rw---   [ anon ]
00007f38c5ef9000    516K rw---   [ anon ]
 
--> Memory released, but not to the initial value. I.e. it seems the memory for the rowprefetch is still allocated.
 
Back to the settings with rowprefetch:
 
SQL> set arraysize 1 rowprefetch 1000
SQL> select * from big_type_small_data;
 
--> 2 times <RETURN>
 
oracle@oracle-19c-vagrant:/home/oracle/Blogs/rowprefetch_test/ [ORCLCDB (CDB$ROOT)] pmap 15227 | grep anon
0000000001fba000   1832K rw---   [ anon ]
00007f38c5ef9000    516K rw---   [ anon ]
 
--> It obviously reused the previous memory.
 
SQL> set arraysize 500 rowprefetch 501 pages 503
SQL> select * from big_type_small_data;
 
--> 2 times <RETURN>
 
oracle@oracle-19c-vagrant:/home/oracle/Blogs/rowprefetch_test/ [ORCLCDB (CDB$ROOT)] pmap 15227 | grep anon
0000000001fba000   6752K rw---   [ anon ]
00007f38c5ef9000    516K rw---   [ anon ]
 
--> Relogin
 
oracle@oracle-19c-vagrant:/home/oracle/Blogs/rowprefetch_test/ [ORCLCDB (CDB$ROOT)] pmap 16334 | grep anon
00000000010b0000   1580K rw---   [ anon ]
00007ff8cc272000    516K rw---   [ anon ]
 
SQL> set arraysize 500 rowprefetch 501 pages 503 pause on lines 20000
SQL> select * from big_type_small_data;
 
--> 2 times <RETURN>
 
oracle@oracle-19c-vagrant:/home/oracle/Blogs/rowprefetch_test/ [ORCLCDB (CDB$ROOT)] pmap 16334 | grep anon
00000000010b0000   1720K rw---   [ anon ]
00007ff8cbda4000   5436K rw---   [ anon ]
 
Ctrl-C <RETURN>
 
oracle@oracle-19c-vagrant:/home/oracle/Blogs/rowprefetch_test/ [ORCLCDB (CDB$ROOT)] pmap 16334 | grep anon
00000000010b0000   1720K rw---   [ anon ]
00007ff8cc272000    516K rw---   [ anon ]
 
SQL> select * from big_type_small_data;
 
--> 2 times <RETURN>
 
oracle@oracle-19c-vagrant:/home/oracle/Blogs/rowprefetch_test/ [ORCLCDB (CDB$ROOT)] pmap 16334 | grep anon
00000000010b0000   6608K rw---   [ anon ]
00007ff8cc272000    516K rw---   [ anon ]
 
Ctrl-C <RETURN>
 
oracle@oracle-19c-vagrant:/home/oracle/Blogs/rowprefetch_test/ [ORCLCDB (CDB$ROOT)] pmap 16334 | grep anon
00000000010b0000   6608K rw---   [ anon ]
00007ff8cc272000    516K rw---   [ anon ]
 
--> This time the memory for the arraysize has not been released.
 
--> Relogin
 
oracle@oracle-19c-vagrant:/home/oracle/Blogs/rowprefetch_test/ [ORCLCDB (CDB$ROOT)] pmap 17005 | grep anon
0000000001c40000   1580K rw---   [ anon ]
00007f90ea747000    516K rw---   [ anon ]
 
SQL> set arraysize 1 rowprefetch 32767 pages 3 pause on lines 20000
SQL> select * from big_type_small_data;
 
--> 2 times <RETURN>
 
oracle@oracle-19c-vagrant:/home/oracle/Blogs/rowprefetch_test/ [ORCLCDB (CDB$ROOT)] pmap 17005 | grep anon
0000000001c40000   8312K rw---   [ anon ]
00007f90ea6a2000   1176K rw---   [ anon ]
 
Ctrl-C <RETURN>
 
oracle@oracle-19c-vagrant:/home/oracle/Blogs/rowprefetch_test/ [ORCLCDB (CDB$ROOT)] pmap 17005 | grep anon
0000000001c40000   8308K rw---   [ anon ]
00007f90ea6a2000   1176K rw---   [ anon ]
 
--> almost nothing released.

So the tests showed that ARRAYSIZE allocates more memory than ROWPREFETCH (i.e. it allocates according to the declared data-type size and not according to the real data in the columns), but in contrast to ROWPREFETCH the memory is (often) released once the SQL has finished fetching.
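
A back-of-the-envelope calculation supports this (assuming the fetch buffer is sized per declared column width): 5 columns of VARCHAR2(2000) with ARRAYSIZE 1000 means roughly 5 * 2000 * 1000 bytes, i.e. about 9.5MB, which matches the ~9.6MB of additional anon memory observed above.

select round(5 * 2000 * 1000 / 1024 / 1024, 1) as approx_mb from dual;
-- => 9.5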

Summary

So when should ROWPREFETCH and ARRAYSIZE be used? As with all fetch-size settings (e.g. for the JDBC driver), both can be used to reduce the number of network roundtrips and logical IOs on the DB when lots of data has to be transferred between the server and the client. According to my tests ROWPREFETCH requires less memory on the client, but does not release that memory after the query has finished. ARRAYSIZE requires more memory, but often releases it once the query has finished. ROWPREFETCH = 2 is very useful when a query returns only 1 row, because the row is returned with the internal OCI execute call (the first fetch in the 10046 trace) and no subsequent fetch is needed to realize that all data has already been fetched, i.e. it saves 1 network roundtrip.

A good compromise is the use of

ROWPREFETCH = 2
ARRAYSIZE = 100

That setting is actually also used when starting sqlplus with -F(AST). If lots of data has to be transferred to the client, then higher ROWPREFETCH or ARRAYSIZE settings can be used to reduce the number of logical IOs and network roundtrips. But the best setting also depends on the amount of data to transfer per row, and the client memory footprint may differ with higher ROWPREFETCH or ARRAYSIZE settings depending on whether sqlplus runs a batch job with many queries or only a few. As usual, the best setting when transferring lots of data through sqlplus has to be found by testing the queries and scripts of your environment with different settings.
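
To apply that compromise in practice, the two SET commands can simply go at the top of a batch script or into a login.sql/glogin.sql so that every session picks them up (a minimal sketch):

set rowprefetch 2
set arraysize 100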

The article ARRAYSIZE or ROWPREFETCH in sqlplus? appeared first on Blog dbi services.

Oracle FAST=TRUE in sqlplus? Some thoughts about rowprefetch

Sun, 2020-01-05 16:56

During my time as a consultant working on tuning tasks I had the feeling that many people think there is an Oracle parameter "FAST=TRUE" to speed up the performance and throughput of database calls. Unfortunately such a parameter is not available, but since version 12cR2 Oracle has provided the option "-F" or "-FAST" for sqlplus, which looks like a "FAST=TRUE" setting. Here is an excerpt from the documentation:


The FAST option improves general performance. This command line option changes the values of the following default settings:
 
- ARRAYSIZE = 100
- LOBPREFETCH = 16384
- PAGESIZE = 50000
- ROWPREFETCH = 2
- STATEMENTCACHE = 20
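
Based on that list, the new defaults can be checked right after starting sqlplus with the FAST option (a sketch; the connect string is a placeholder and the output is what I would expect from the documented values):

$ sqlplus -F user/password@orclpdb1
...
SQL> show rowprefetch arraysize
rowprefetch 2
arraysize 100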

I was interested in where the ROWPREFETCH setting could result in an improvement.

The documentation about rowprefetch is as follows:


SET ROWPREFETCH {1 | n}
 
Sets the number of rows that SQL*Plus will prefetch from the database at one time. The default value is 1.
 
Example
 
To set the number of prefetched rows to 200, enter
 
SET ROWPREFETCH 200
 
If you do not specify a value for n, the default is 1 row. This means that rowprefetching is off.
 
Note: The amount of data contained in the prefetched rows should not exceed the maximum value of 2147483648 bytes (2 Gigabytes). The prefetch setting in the oraaccess.xml file can override the SET ROWPREFETCH setting in SQL*Plus. For more information about oraaccess.xml, see the Oracle Call Interface Programmer's Guide.

A simple test where rowprefetch can make a difference is the use of hash clusters (see the Buffers column in the execution plan below). E.g.


SQL> create cluster DEMO_CLUSTER(CUST_ID number) size 4096 single table hashkeys 1000 ;
 
Cluster created.
 
SQL> create table DEMO cluster DEMO_CLUSTER(CUST_ID) as select * from CUSTOMERS;
 
Table created.
 
SQL> exec dbms_stats.gather_table_stats(user,'DEMO');
 
PL/SQL procedure successfully completed.
 
SQL> select num_rows,blocks from user_tables where table_name='DEMO';
 
  NUM_ROWS     BLOCKS
---------- ----------
     55500       1035
 
SQL> show rowprefetch
rowprefetch 1
SQL> select /*+ gather_plan_statistics */ rowid,cust_id from DEMO where cust_id=101;
 
ROWID                 CUST_ID
------------------ ----------
AAAR4qAAMAAAAedAAA        101
 
SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));
 
PLAN_TABLE_OUTPUT
-----------------------------------------
SQL_ID	9g2nyr9h2ytk4, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ rowid,cust_id from DEMO where
cust_id=101
 
Plan hash value: 3286081706
 
------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |      1 |00:00:00.01 |       2 |
|*  1 |  TABLE ACCESS HASH| DEMO |      1 |      1 |      1 |00:00:00.01 |       2 |
------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - access("CUST_ID"=101)
 
SQL> set rowprefetch 2
SQL> select /*+ gather_plan_statistics */ rowid,cust_id from DEMO where cust_id=101;
 
ROWID                 CUST_ID
------------------ ----------
AAAR4qAAMAAAAedAAA        101
 
SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));
 
PLAN_TABLE_OUTPUT
-----------------------------------------
SQL_ID	9g2nyr9h2ytk4, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ rowid,cust_id from DEMO where
cust_id=101
 
Plan hash value: 3286081706
 
------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |      1 |00:00:00.01 |       1 |
|*  1 |  TABLE ACCESS HASH| DEMO |      1 |      1 |      1 |00:00:00.01 |       1 |
------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - access("CUST_ID"=101)

Due to the prefetch of 2 rows, Oracle detects that there is actually only 1 row and avoids the second logical IO (a second fetch).
If cust_id were unique then I would have created a unique (or primary) key constraint here, which would avoid the second fetch as well (because Oracle knows from the constraint that there can be at most 1 row per cust_id), but in that case I would have to maintain the created index.
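
For completeness, the constraint alternative would look like this (a sketch I did not run in this test; the constraint name is arbitrary, it only works if cust_id really is unique, and it creates a unique index that then has to be maintained):

SQL> alter table DEMO add constraint DEMO_CUST_ID_UK unique (cust_id);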

I made a couple of tests comparing the behaviour of different rowprefetch and arraysize settings in sqlplus (what is actually the difference between the 2 settings?). That will be the subject of a future blog post.

The article Oracle FAST=TRUE in sqlplus? Some thoughts about rowprefetch appeared first on Blog dbi services.

Documentum – Java exception stack on iAPI/iDQL login

Sun, 2020-01-05 02:00

Recently, I was doing some sanity checks on a Documentum Server and I saw a Java exception stack while logging in to a Repository using iAPI/iDQL. It was reproducible for all Repositories. I had never seen anything like that before (or at least I don't remember it), so I was a little bit surprised. Whenever there are errors upon login, it is usually Documentum error messages that are printed, without any exception stack. Since it took me some effort to find the root cause, I thought about sharing it.

The exception stack displayed was the following one:

[dmadmin@cs-0 ~]$ echo "quit" | iapi gr_repo -Udmadmin -Pxxx

        EMC Documentum iapi - Interactive API interface
        (c) Copyright EMC Corp., 1992 - 2016
        All rights reserved.
        Client Library Release 7.3.0040.0025

Connecting to Server using docbase gr_repo
DfException:: THREAD: main; MSG: [DM_SESSION_I_SESSION_START]info:  "Session 0112d687800c9214 started for user dmadmin."; ERRORCODE: 100; NEXT: null
        at com.documentum.fc.client.impl.docbase.DocbaseExceptionMapper.newException(DocbaseExceptionMapper.java:57)
        at com.documentum.fc.client.impl.connection.docbase.MessageEntry.getException(MessageEntry.java:39)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseMessageManager.getExceptionForAllMessages(DocbaseMessageManager.java:176)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.getExceptionForAllMessages(DocbaseConnection.java:1518)
        at com.documentum.fc.client.impl.session.Session.getExceptionsForAllMessages(Session.java:1603)
        at com.documentum.fc.client.impl.session.SessionHandle.getExceptionsForAllMessages(SessionHandle.java:1301)
        at com.documentum.dmcl.impl.ApiContext.addMessages(ApiContext.java:423)
        at com.documentum.dmcl.impl.ApiContext.collectExceptionsForReporting(ApiContext.java:370)
        at com.documentum.dmcl.impl.GetMessageHandler.get(GetMessageHandler.java:23)
        at com.documentum.dmcl.impl.DmclApi.get(DmclApi.java:49)
        at com.documentum.dmcl.impl.DmclApiNativeAdapter.get(DmclApiNativeAdapter.java:145)
        at com.documentum.dmcl.impl.DmclApiNativeAdapter.get(DmclApiNativeAdapter.java:130)


Connected to Documentum Server running Release 7.3.0050.0039  Linux64.Oracle
Session id is s0
API> Bye
[dmadmin@cs-0 ~]$

 

The login was successful but still, a strange exception stack appeared. The first thing I did was to check the Repository log file, but there was nothing out of the ordinary in it except for one thing:

[dmadmin@cs-0 ~]$ cd $DOCUMENTUM/dba/log
[dmadmin@cs-0 log]$
[dmadmin@cs-0 log]$ grep -A3 "Agent Exec" gr_repo.log
Wed Sep 11 10:38:29 2019 [INFORMATION] [AGENTEXEC 1477] Detected during program initialization: Agent Exec connected to server gr_repo:  DfException:: THREAD: main; MSG: [DM_SESSION_I_SESSION_START]info:  "Session 0112d687800c8904 started for user dmadmin."; ERRORCODE: 100; NEXT: null
        at com.documentum.fc.client.impl.docbase.DocbaseExceptionMapper.newException(DocbaseExceptionMapper.java:57)
        at com.documentum.fc.client.impl.connection.docbase.MessageEntry.getException(MessageEntry.java:39)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseMessageManager.getExceptionForAllMessages(DocbaseMessageManager.java:176)
        at com.documentu
[dmadmin@cs-0 log]$

 

So while starting, the Agent Exec was facing the same behavior with the exact same stack (it is cut after the 4th line, but it is identical up to that point, so it is safe to assume it is the same one). To dig deeper and find out exactly when the issue started, I checked the logs of the agentexec jobs, since these are kept until the log purge cleans them up and since the agentexec does log in to the Repository:

[dmadmin@cs-0 log]$ cd $DOCUMENTUM/dba/log/gr_repo/agentexec
[dmadmin@cs-0 agentexec]$
[dmadmin@cs-0 agentexec]$ # Check the last file
[dmadmin@cs-0 agentexec]$ cat $(ls -tr job_* | tail -1)
Wed Sep 11 18:00:21 2019 [INFORMATION] [LAUNCHER 3184] Detected while preparing job dm_ConsistencyChecker for execution: Agent Exec connected to server gr_repo:  DfException:: THREAD: main; MSG: [DM_SESSION_I_SESSION_START]info:  "Session 0112d687800c8974 started for user dmadmin."; ERRORCODE: 100; NEXT: null
        at com.documentum.fc.client.impl.docbase.DocbaseExceptionMapper.newException(DocbaseExceptionMapper.java:57)
        at com.documentum.fc.client.impl.connection.docbase.MessageEntry.getException(MessageEntry.java:39)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseMessageManager.getExceptionForAllMessages(DocbaseMessageManager.java:176)
        at com.documentu
[dmadmin@cs-0 agentexec]$
[dmadmin@cs-0 agentexec]$ # Finding the first file with the error
[dmadmin@cs-0 agentexec]$ for f in $(ls -tr); do r=$(grep "_I_SESSION_START.*ERRORCODE" "${f}"); if [[ "${r}" != "" ]]; then echo "${r}"; break; fi; done
Tue Sep 10 18:00:06 2019 [INFORMATION] [LAUNCHER 31113] Detected while preparing job dm_ConsistencyChecker for execution: Agent Exec connected to server gr_repo:  DfException:: THREAD: main; MSG: [DM_SESSION_I_SESSION_START]info:  "Session 0112d687800c8827 started for user dmadmin."; ERRORCODE: 100; NEXT: null
[dmadmin@cs-0 agentexec]$

 

All the job session files contained the same stack (or rather a piece of it). At first, I didn't understand where this was coming from; all I knew was that it was somehow linked to the login into the Repository and that it had appeared for the first time on the date returned by my last command above. It was not really an error message, since it didn't show any "_E_" messages, but it was still printing an exception.

Knowing when it appeared for the first time, I looked at all the files that had been modified on that day and, besides log files (which are expected and can be ignored), there was the dfc.properties file. This gave me the reason for the message: it was actually due to enabling the diagnostic mode in the dfc.properties of the Documentum Server. To be exact, it was due to the "dfc.diagnostics.exception.include_stack=true" entry:

[dmadmin@cs-0 agentexec]$ tail -5 $DOCUMENTUM_SHARED/config/dfc.properties
dfc.session.secure_connect_default=secure
dfc.time_zone=UTC
dfc.diagnostics.resources.enable=true
dfc.diagnostics.exception.include_stack=true
dfc.tracing.print_exception_stack=true
[dmadmin@cs-0 agentexec]$
[dmadmin@cs-0 agentexec]$ echo "quit" | iapi gr_repo -Udmadmin -Pxxx

        EMC Documentum iapi - Interactive API interface
        (c) Copyright EMC Corp., 1992 - 2016
        All rights reserved.
        Client Library Release 7.3.0040.0025

Connecting to Server using docbase gr_repo
DfException:: THREAD: main; MSG: [DM_SESSION_I_SESSION_START]info:  "Session 0112d687800c9235 started for user dmadmin."; ERRORCODE: 100; NEXT: null
        at com.documentum.fc.client.impl.docbase.DocbaseExceptionMapper.newException(DocbaseExceptionMapper.java:57)
        at com.documentum.fc.client.impl.connection.docbase.MessageEntry.getException(MessageEntry.java:39)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseMessageManager.getExceptionForAllMessages(DocbaseMessageManager.java:176)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.getExceptionForAllMessages(DocbaseConnection.java:1518)
        at com.documentum.fc.client.impl.session.Session.getExceptionsForAllMessages(Session.java:1603)
        at com.documentum.fc.client.impl.session.SessionHandle.getExceptionsForAllMessages(SessionHandle.java:1301)
        at com.documentum.dmcl.impl.ApiContext.addMessages(ApiContext.java:423)
        at com.documentum.dmcl.impl.ApiContext.collectExceptionsForReporting(ApiContext.java:370)
        at com.documentum.dmcl.impl.GetMessageHandler.get(GetMessageHandler.java:23)
        at com.documentum.dmcl.impl.DmclApi.get(DmclApi.java:49)
        at com.documentum.dmcl.impl.DmclApiNativeAdapter.get(DmclApiNativeAdapter.java:145)
        at com.documentum.dmcl.impl.DmclApiNativeAdapter.get(DmclApiNativeAdapter.java:130)


Connected to Documentum Server running Release 7.3.0050.0039  Linux64.Oracle
Session id is s0
API> Bye
[dmadmin@cs-0 agentexec]$
[dmadmin@cs-0 agentexec]$ sed -i 's,^dfc.diagnostics.exception.include_stack,#&,' $DOCUMENTUM_SHARED/config/dfc.properties
[dmadmin@cs-0 agentexec]$
[dmadmin@cs-0 agentexec]$ echo "quit" | iapi gr_repo -Udmadmin -Pxxx

        EMC Documentum iapi - Interactive API interface
        (c) Copyright EMC Corp., 1992 - 2016
        All rights reserved.
        Client Library Release 7.3.0040.0025

Connecting to Server using docbase gr_repo
[DM_SESSION_I_SESSION_START]info:  "Session 0112d687800c9237 started for user dmadmin."

Connected to Documentum Server running Release 7.3.0050.0039  Linux64.Oracle
Session id is s0
API> Bye
[dmadmin@cs-0 agentexec]$

 

As you can see above, commenting out the line "dfc.diagnostics.exception.include_stack=true" (which means setting it back to false, the default value) made the exception stack disappear. Since I was curious about this stack and wanted confirmation that it is "expected", I opened a case with OpenText Support (#4331438) and after a few days they confirmed that it isn't considered an "ERROR", it is more of an "INFO" message. It's a strange way to display informative messages but hey, who am I to judge!

 

The article Documentum – Java exception stack on iAPI/iDQL login appeared first on Blog dbi services.
