
Index design I


Questions about the best position for a column within a composite index come up very often in forums and Oracle discussions. The last question I contributed to was raised in a French forum, where the original poster was wondering whether it is a good idea to place a frequently repeated column (one that contains many duplicates) at the leading edge of the index.

First of all, in contrast to my usual blogging style, I am not going to provide a SQL model on which to build. For those who want to learn and master indexes, I would encourage them to read the world expert in this field, Richard Foote. He has an excellent blog with articles covering almost everything one needs to know about indexes, and not only the widely used B-tree indexes but all other types as well, including bitmap indexes, function-based indexes, partitioned indexes, Exadata storage indexes, etc.

The second reference is, as always, Jonathan Lewis's blog, where you can find several articles about index design, index efficiency and index maintenance. It is not sufficient to know how to design a precise index; you also need to know how your index will evolve with the delete, insert and update operations its underlying table will undergo during the lifecycle of the application it serves.

The third reference is the book Relational Database Index Design and the Optimizers, which extends index design to several databases including DB2, Oracle and SQL Server.

From time to time I also read very interesting articles about indexes on this web site, which I follow via Twitter. It contains valuable index design information which, based on what I have read so far, is pertinent, correct and backs up what I have learned from Jonathan Lewis, Richard Foote and my own professional experience.

That said, I will post below a few of my answers and articles (and Jonathan Lewis's articles) about index design, as a small answer to many recurring questions on the subject:

  1. On the importance of the leading index columns that should be the ones on which an equality predicate is applied
  2. Indexing Foreign keys
  3. Redundant Indexes
  4. Global or Local Partitioned Index
  5. Compressing indexes basic part and cost of index compression

I am planning to write several other articles on indexes and will keep completing the above list as I go along with this publishing task.

My answer to the original poster's question about the importance of the number of distinct values of an index candidate column is that the choice of the leading index column is not driven by its number of distinct values. It is instead driven by:

  • The nature of the query where clause the index has to cover
  • The nature of the predicate (equality, inequality, etc.) applied to the leading column
  • The constant desire to cover one or several other queries with the same index
  • The constant desire to cover a foreign key locking threat with the same index: sometimes just by reversing the column order we manage to cover both the current query and an existing foreign key
  • The constant desire to avoid redundant indexes

And finally comes the one reason for which one has to consider placing the column with the small number of distinct values at the leading edge of the index: compression. If you start your index with the most duplicated column you make it a good candidate for index compression, efficiently reducing its size, which means it will be read into the buffer cache quickly and is more likely to stay there thanks to its smaller footprint.
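As an illustration only, here is a minimal sketch of that idea; the table and column names are hypothetical and not taken from the forum discussion:

-- status has very few distinct values, id is highly selective; leading with the
-- repetitive column makes prefix compression effective
create index t1_status_id_i on t1(status, id) compress 1;

-- check the resulting footprint (compare leaf_blocks with the uncompressed version)
select index_name, leaf_blocks, compression
from   user_indexes
where  index_name = 'T1_STATUS_ID_I';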



TABLE ACCESS BY INDEX ROWID BATCHED


I was writing an article for AllThings Oracle about an indexing strategy (discard and sort) and testing the model supporting that article in different Oracle Database releases (10gR2, 11gR2 and 12cR1) when my attention was caught by an interesting detail in 12cR1.

Observe the following execution plans, taken from 12cR1, for this query:

select * from t1 where id2 = 42 order by id1 desc;
------------------------------------------------------------------------
| Id  | Operation                   | Name  | Starts | E-Rows | A-Rows |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |      1 |        |   1000 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T1    |      1 |    499K|   1000 |
|   2 |   INDEX FULL SCAN DESCENDING| T1_PK |      1 |    998K|   1000K|
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("ID2"=42)

---------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Starts | E-Rows | A-Rows |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |      1 |        |   1000 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1             |      1 |    499K|   1000 |
|*  2 |   INDEX RANGE SCAN          | T1_IND_ID1_FBI |      1 |    499K|   1000 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID2"=42)

------------------------------------------------------------------------------
| Id | Operation                   | Name          | Starts | E-Rows | A-Rows |
------------------------------------------------------------------------------
| 0  | SELECT STATEMENT            |               |      1 |        | 1000   |
| 1  | TABLE ACCESS BY INDEX ROWID | T1            |      1 |    499K| 1000   |
|* 2 | INDEX RANGE SCAN DESCENDING | T1_IND_ID1_NI |      1 |    499K| 1000   |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
--------------------------------------------------
2 - access("ID2"=42)

Question: What have you noticed in the above execution plans?

Answer: I managed to design the indexes so that Oracle avoids the order by operation for each of the above query executions (there is no SORT ORDER BY operation in any of the plans).

But this is not what paved the way to this article. Wait a minute and you will see what motivated it.

In my incessant desire to help the CBO make good estimations, I created a virtual column (derived_id2), indexed it on its own with a b-tree index and collected statistics for it. The original definitions are not shown here, but the preparation steps might have looked like the following sketch (the expression behind derived_id2 is an assumption):
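-- hedged sketch: the actual virtual column expression is not shown here;
-- (id2 + 0) is only a stand-in for a simple derivation of id2
alter table t1 add (derived_id2 number generated always as (id2 + 0) virtual);

create index t1_derived_id2_ind_bis on t1(derived_id2);

exec dbms_stats.gather_table_stats(user, 't1', method_opt => 'for columns derived_id2 size 1');

I then executed a new but equivalent query: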

SQL> select * from t1 where derived_id2 = 42 order by id1 desc;

This query was honored via the following execution plan:

--------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name                   | Starts | E-Rows | A-Rows |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                        |      1 |        |   1000 |
|   1 |  SORT ORDER BY                       |                        |      1 |    511 |   1000 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1                     |      1 |    511 |   1000 |
|*  3 |    INDEX RANGE SCAN                  | T1_DERIVED_ID2_IND_BIS |      1 |    511 |   1000 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("DERIVED_ID2"=42)

Question: Have you noticed something?

Answer: The appearance of a SORT ORDER BY operation on top of the TABLE ACCESS BY INDEX ROWID BATCHED operation.

It seems that the new 12c TABLE ACCESS BY INDEX ROWID BATCHED operation cannot take place when Oracle uses the index access child operation to avoid the order by. In the first three execution plans above, Oracle uses an index access path to avoid the order by, and in those cases the table is visited via the classical TABLE ACCESS BY INDEX ROWID. When Oracle is unable to eliminate the order by, the table is accessed via the new 12c TABLE ACCESS BY INDEX ROWID BATCHED, followed by what seems to be an inevitable SORT ORDER BY operation.

Below is a simple model you can play with to check this inseparable couple (SORT ORDER BY, TABLE ACCESS BY INDEX ROWID BATCHED):

 create table t1 as
 select rownum n1
      ,trunc((rownum-1)/3) n2
      ,rpad('x',100) v1
 from dual
connect by level <= 1e4; 

-- composite index covering the equality predicate on n2 and the order by on n1:
-- the sort is avoided and the table is accessed via the classical rowid operation
create index t1_ind1 on t1(n2, n1);

select * from t1 where n2 = 3329 order by n1 desc;

select * from table(dbms_xplan.display_cursor);

alter index t1_ind1 invisible;

-- index on n2 only: the order by can no longer be avoided; per the observation above,
-- the table access then shows up as TABLE ACCESS BY INDEX ROWID BATCHED plus a SORT ORDER BY
create index t1_ind2 on t1(n2);

select * from t1 where n2 = 3329 order by n1 desc;

select * from table(dbms_xplan.display_cursor);

Partition by virtual column


Almost a year ago I troubleshot a query performance issue which was mainly due to the high number of logical I/Os consumed by a local non-prefixed index accessing a range partitioned table without eliminating partitions (PARTITION RANGE ALL). That troubleshooting exercise paved the way for the partitioned index: global or local article, from which the designer of the “culprit” index learnt the lesson perfectly.

Everyone was back to their daily work until last week, when I was handed a complex query performing badly. Below, reduced to its problematic part only, is the corresponding execution plan taken from memory:

----------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name                        | Rows  | Pstart| Pstop |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                             |       |       |       |
|   1 |  TEMP TABLE TRANSFORMATION           |                             |       |       |       |
|   2 |   LOAD AS SELECT                     |                             |       |       |       |
|   3 |    PARTITION LIST ALL                |                             |     1 |     1 |   367 |
|*  4 |     TABLE ACCESS BY LOCAL INDEX ROWID| TAB_LOG_ENTRY               |     1 |     1 |   367 |
|*  5 |      INDEX RANGE SCAN                | LOCAL_N_PREFIX_INDEX        |     2 |     1 |   367 |
----------------------------------------------------------------------------------------------------
 
 Predicate Information (identified by operation id):
---------------------------------------------------
 4 - filter((COL1='T' AND INTERNAL_FUNCTION(COL2)))
 5 - access(COL3>=TIMESTAMP' 2014-09-25 00:00:00' AND COL3 <=TIMESTAMP' 2014-09-26 00:00:00')

Don’t be disturbed by the INTERNAL_FUNCTION which pops up because of a particular use of an ‘or’ clause in the query (that is not the subject of this article). However, spot the number of partitions scanned by Oracle when visiting the local_n_prefix_index index: 367 partitions, i.e. all of them. The Real Time SQL Monitoring report of the corresponding sql_id also shows that 80% of the wait activity was due to the operation with id n° 4 above.

This is exactly the same issue as last year's, isn't it?

I hurried to the developer's desk and had the following Q&A with him:

Me: Do you remember our discussion last year about a non-prefixed index driving a query that was not eliminating partitions?

Developer: Of course I remember, and this is why the query uses the partition key and the index is a local prefixed one (it contains the partition key).

Me: So where is this PARTITION LIST ALL operation coming from?

A careful look at the table and index definitions reveals that the table is indeed partitioned, but this time the developer decided to use a virtual column (derived from col3) as the partition key. He also created a local index on the col3 column, believing it to be a prefixed one.

At this stage of the investigation I remembered that Jonathan Lewis had blogged about partitioning by a virtual column, where I had already presented (see comment n°3) a case similar to the one I was, coincidentally, asked to look at last week.

The positive answer given in Jonathan Lewis's article to the following question:

Quiz: if you create a virtual column as trunc(date_col,’W’) and partition on it – will a query on date_col result in partition elimination?

is of course valid for the trunc SQL function, but it does not seem to extend to other SQL functions.
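For reference, the trunc() scenario from the quiz would look something like the following sketch (this is my own illustration, not code taken from Jonathan Lewis's article):

-- a virtual column defined with trunc(date_col,'W') used as the partition key;
-- in this case a predicate on date_col can still lead to partition pruning
create table t_trunc_part (
   date_col date not null,
   week_col date generated always as (trunc(date_col, 'W')) virtual
)
partition by range (week_col)
(
   partition p1   values less than (date '2014-01-08'),
   partition p2   values less than (date '2014-01-15'),
   partition pmax values less than (maxvalue)
);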

It is now time to set up the model. I will create a simple list-partitioned table whose partition key is a virtual column, with 365 partitions (one per day). Because of the high number of partitions, I will create this table using dynamic SQL:

declare
       v_sql_statment   varchar2(32000);
begin
   v_sql_statment := 'create table mho_log_entry (ln_id  number primary key,ln_date date not null,ln_type_code  number not null,';
   v_sql_statment := v_sql_statment||'ln_event_typ varchar2(32 char) not null,day_in_year number(3,0) generated always';
   v_sql_statment := v_sql_statment||q'# as (to_number(to_char(ln_date,'DDD'))) virtual) partition by list (day_in_year)(#';   
   for n in 1..365 loop
     if n != 365 then
         v_sql_statment   := v_sql_statment||' partition y_dd_'||n||' values ('||n||'),';
     else
         v_sql_statment   := v_sql_statment||' partition y_dd_'||n||' values ('||n||'))';
    end if;
   end loop;     
      execute immediate v_sql_statment;
end;
/

Execute the above script and you will get the following table description:

SQL> desc mho_log_entry
          Name            Null?    Type
          -------------- --------- -----------------
   1      LN_ID           NOT NULL NUMBER
   2      LN_DATE         NOT NULL DATE              -- used to derive the virtual column
   3      LN_TYPE_CODE    NOT NULL NUMBER
   4      LN_EVENT_TYP    NOT NULL VARCHAR2(32 CHAR)
   5      DAY_IN_YEAR              NUMBER(3)         -- partition key

The partition key (day_in_year) is a virtual column defined by the following formula:

  to_number(to_char(ln_date,'DDD'))

And the number of partitions I have generated is given by the following select:

 SQL> select count(1)
    from user_tab_partitions
    where table_name = 'MHO_LOG_ENTRY';

  COUNT(1)
----------
       365

I still have to create a local “prefixed” index, as the developer did, in order to reproduce the exact situation of the query I have been asked to tune:

create index mho_ln_date on mho_log_entry (ln_date) local;

If you want to push data into this table, here is an example of how to proceed:

SQL> insert into mho_log_entry
          (ln_id
          ,ln_date
          ,ln_type_code
          ,ln_event_typ
          )
     select
          *
     from
        (
       select
          rownum
         ,date '2014-01-01' + ((level - 1) * 2) dd
         ,trunc(rownum-1)
         ,case
           when mod(rownum,100) = 0 then 'I'
           when mod(rownum,10) = 0 then 'A'
           when mod (rownum,1000) = 0 then 'B'
           when mod(rownum,1e4) = 0 then 'R'
          else 'L'
          end aa
       from dual
       connect by level <=3e2
       )
     where dd <= to_date('31122014','ddmmyyyy') ;

SQL> exec dbms_stats.gather_table_stats(user, 'mho_log_entry', cascade => true);

And finally this is the query and its corresponding execution plan in 11.2.0.3.0

SQL> select *
        from
          mho_log_entry
        where
           ln_date between to_date('20082014','ddmmyyyy')
           and to_date('22082014','ddmmyyyy') ;

SQL> select * from table(dbms_xplan.display_cursor);

------------------------------------------------------------------------------------
| Id  | Operation                          | Name          | Rows  | Pstart| Pstop |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |               |       |       |       |
|   1 |  PARTITION LIST ALL                |               |     3 |     1 |365    |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID| MHO_LOG_ENTRY |     3 |     1 |365    |
|*  3 |    INDEX RANGE SCAN                | MHO_LN_DATE   |     3 |     1 |365    |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("LN_DATE">=TO_DATE(' 2014-08-20 00:00:00', 'syyyy-mm-dd hh24:mi:ss')
       AND "LN_DATE"<=TO_DATE('2014-08-22 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

No partition elimination and 365 index range scans.

However, if I change the query to use the virtual column (day_in_year) instead of its underlying real column (ln_date), partition pruning occurs:

 SQL> select *
        from
           mho_log_entry
        where
           day_in_year between to_number(to_char(to_date('20082014','ddmmyyyy'),'DDD'))
           and to_number(to_char(to_date('22082014','ddmmyyyy'),'DDD')) ;

SQL> select * from table(dbms_xplan.display_cursor);

-------------------------------------------------------------------------
| Id  | Operation               | Name          | Rows  | Pstart| Pstop |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |               |       |       |       |
|   1 |  PARTITION LIST ITERATOR|               |     3 |   232 |   234 |
|   2 |   TABLE ACCESS FULL     | MHO_LOG_ENTRY |     3 |   232 |   234 |
-------------------------------------------------------------------------

Only three partitions (Pstart 232 to Pstop 234) have been scanned. Note in passing the absence of the predicate section in this case. The predicate section is also absent when using the explain plan for command:

 SQL> explain plan for
     select *
        from
           mho_log_entry
        where
           day_in_year between to_number(to_char(to_date('20082014','ddmmyyyy'),'DDD'))
           and to_number(to_char(to_date('22082014','ddmmyyyy'),'DDD')) ;

SQL> select * from table(dbms_xplan.display);

-------------------------------------------------------------------------
| Id  | Operation               | Name          | Rows  | Pstart| Pstop |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |               |     3 |       |       |
|   1 |  PARTITION LIST ITERATOR|               |     3 |   232 |   234 |
|   2 |   TABLE ACCESS FULL     | MHO_LOG_ENTRY |     3 |   232 |   234 |
------------------------------------------------------------------------- 

Since Oracle is iteratively eliminating partitions, it might be reasonable not to expect to see a predicate applied on the mho_log_entry table. However, see what happens when I create an index (first global, then local) on this table using the virtual column:

SQL> create index mho_day_in_year on mho_log_entry (day_in_year);

SQL> select *
     from
      mho_log_entry
     where
       day_in_year between to_number(to_char(to_date('20082014','ddmmyyyy'),'DDD'))
       and to_number(to_char(to_date('22082014','ddmmyyyy'),'DDD')) ;

-------------------------------------------------------------------------------------
| Id  | Operation                          | Name            | Rows  | Pstart|Pstop |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                 |       |       |      |
|   1 |  TABLE ACCESS BY GLOBAL INDEX ROWID| MHO_LOG_ENTRY   |     3 | ROWID |ROWID |
|*  2 |   INDEX RANGE SCAN                 | MHO_DAY_IN_YEAR |     3 |       |      |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("DAY_IN_YEAR">=232 AND "DAY_IN_YEAR"<=234

And this is the corresponding execution plan for a local index

 drop index mho_day_in_year ;
 create index mho_day_in_year on mho_log_entry (day_in_year) local;

 select /*+ index(mho_log_entry) */ -- I don't know why I am obliged to hint the index
   *                                -- I will be back to it later
    from
      mho_log_entry
    where
       day_in_year between to_number(to_char(to_date('20082014','ddmmyyyy'),'DDD'))
       and to_number(to_char(to_date('22082014','ddmmyyyy'),'DDD')) ;

-------------------------------------------------------------------------------------
| Id  | Operation                          | Name            | Rows  | Pstart|Pstop |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                 |       |       |      |
|   1 |  PARTITION LIST ITERATOR           |                 |     3 |   232 |234   |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID| MHO_LOG_ENTRY   |     3 |   232 |234   |
|*  3 |    INDEX RANGE SCAN                | MHO_DAY_IN_YEAR |     1 |   232 |234   |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("DAY_IN_YEAR">=232 AND "DAY_IN_YEAR"<=234)

It is interesting to note that Oracle shows a predicate section when using an index, while it doesn't when full scanning the mho_log_entry table. Both explain plan and the actual execution show the same behaviour.

That said, let's get back to what motivated this article: partitioning by a virtual column cannot guarantee partition pruning when you query the partitioned table using the physical column your partition key is based on.

PS: rewriting the original query to use the virtual column might not always be correct. It strongly depends on the data and on the virtual column definition. In the above model I have only a single year of data, which is what makes the refactoring of the query possible.
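One possible workaround, sketched below and only safe under the single-year assumption of this model, is to keep the ln_date predicate for correctness and add a redundant predicate on the virtual column so that the optimizer can prune partitions:

select *
from   mho_log_entry
where  ln_date between to_date('20082014','ddmmyyyy')
                   and to_date('22082014','ddmmyyyy')
and    day_in_year between to_number(to_char(to_date('20082014','ddmmyyyy'),'DDD'))
                       and to_number(to_char(to_date('22082014','ddmmyyyy'),'DDD'));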


Publications


Here is a list of recent articles I wrote for RedGate Software, Dell Software and Oracle OTN. I have also included a reminder for French readers that a free French translation of a chapter of Jonathan Lewis's always excellent book Cost-Based Oracle Fundamentals is available for download.

Allthings Oracle

Dell Software Solution

Oracle Otn

 Oracle French Developer group

 Books Translation

 

I will be very happy to receive your criticism and corrections.


100


During the last week I reached the number 100: as of October the 24th, 100 people are following my blogging activity. It was, and still is, the purpose of personal documentation that paved the way for this writing activity. My initial motivation had nothing to do with that number, but it has now become, I have to confess, a magnificent boost.

Unfortunately, between the desire to learn and the desire to write in general, and to blog in particular, there is a gap that I have to fill very quickly if I want to reach a Jonathan Lewis rate of publication (number of blog articles per month) :-)


Algeria and Oracle


As you might have already guessed, I am putting a lot of effort into helping Algerian universities and companies that use Oracle technology, in the same way that I have been helped by learning from world Oracle experts like Jonathan Lewis. In this context, I spent the last couple of weeks in Algeria giving a one-day seminar to the Oracle DBA team of Ooredoo Algeria, a private mobile telephone group and one of the PSG football team sponsors, about interpreting execution plans, adaptive cursor sharing and gathering adequate statistics. I also managed to introduce students of the “Mathématique-Informatique” department of Université de Khemis-Miliana to the Oracle world via a brief and very quick training session on how Oracle internally manages to answer a simple select * from an employee table (library cache, buffer cache, logical reads, physical reads, bind variables, soft parse, hard parse, parent cursor, child cursor).

I was very happy to see the students interested in this presentation, which had never been given to them in the way I managed to do it. I made a big effort to use words close to the students' actual knowledge of Oracle Database technology. It is strange that academic teachers cannot give attractive and elegant training of this kind, with which I am sure they would launch so many excellent Oracle careers. Am I insinuating that to be a good Oracle teacher you should have good Oracle professional experience? I am afraid the answer is YES.



Index organized table: high logical I/O during delete


I was troubleshooting an application-wide performance issue via a 60-minute AWR report, and below is what caught my eye in the SQL ordered by Gets section:

SQL ordered by Gets

Buffer Gets  Executions  Gets per Exec  %Total  Elapsed Time (s)  %CPU   %IO  SQL Id         SQL Module      SQL Text
24,131,799          784      30,780.36   11.36            273.33  54.5    45  6m881uq1j6yr6  DBMS_SCHEDULER  INSERT INTO TABLE_IOT(…
17,339,802          784      22,117.09    8.16            557.84  13.8  86.4  2abf33xkq9ypd  DBMS_SCHEDULER  DELETE FROM TABLE_IOT…

That is an insert into and a delete from an index organized table, consuming 30,780 and 22,117 buffer gets per execution respectively.

You are going to ask why there is an IOT table subject to such a high number of delete and insert operations. That caught my attention as well, but let's suppose the application has to keep this table as it is. I started troubleshooting the delete part first, and the next step was to get its corresponding execution plan from AWR using its sql_id.
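Pulling that plan can be done along the following lines (a sketch; dbms_xplan.display_awr simply takes the sql_id reported in the AWR extract above):

-- retrieve the historical plan of the delete statement from AWR
select * from table(dbms_xplan.display_awr('2abf33xkq9ypd'));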

DELETE FROM TABLE_IOT WHERE IOT_ID = :B1

---------------------------------------------------------------------------------------
| Id  | Operation         | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT  |                   |       |       |     7 (100)|          |
|   1 |  DELETE           | TABLE_IOT         |       |       |            |          |
|   2 |   INDEX RANGE SCAN| CHILD_IOT_FK_I    |   808 | 21008 |     7   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

I wished, at this point, that Oracle had already managed, in its latest release, to include the predicate section in execution plans taken from AWR (and in Real Time SQL Monitoring reports). But it still doesn't include it.

The TABLE_IOT table has the following columns:

SQL> desc TABLE_IOT

           Name                            Null?    Type
           ------------------------------- -------- --------------------
    1      IOT_ID                           NOT NULL NUMBER(10)
    2      IOT_DATE                        NOT NULL DATE
    3      IOT_IDB                         NOT NULL NUMBER(10)
    4      IOT_PUBL_DATE                   NOT NULL DATE
    5      IOT_CRE_DATE                    NOT NULL DATE
    6      IOT_ID_TYPE                              VARCHAR2(3 CHAR)

The primary key is (iot_publ_date, iot_id, iot_date, iot_idb) and there is an index, child_iot_fk_i(iot_id), on the foreign key.

When deleting from TABLE_IOT, Oracle is not using the primary key. It is instead visiting the secondary index, which was created to cover the locking threat when deleting from the parent table that this index organized table references.

Well, just by looking at the above index definitions, I tend to think that if the developer had read my Indexing Strategy – Part I article he would certainly not have created that index on the foreign key; he would instead have reordered the primary key columns by putting IOT_ID at the leading edge of the primary key index. That way he would have covered the FK locking threat while saving disk space and DML overhead on the underlying table.
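Here is a minimal sketch of that alternative design; the parent table and constraint names are hypothetical since they are not shown in this post:

-- hedged sketch: the IOT primary key starts with IOT_ID, which also covers the
-- foreign key locking threat, so the extra CHILD_IOT_FK_I index becomes redundant
create table table_iot_alt (
   iot_id        number(10)  not null,
   iot_date      date        not null,
   iot_idb       number(10)  not null,
   iot_publ_date date        not null,
   iot_cre_date  date        not null,
   iot_id_type   varchar2(3 char),
   constraint iot_alt_fk foreign key (iot_id) references parent_iot (iot_id),
   constraint iot_alt_pk primary key (iot_id, iot_publ_date, iot_date, iot_idb)
)
organization index;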

To troubleshoot this issue I started by replaying the delete in an equivalent TEST environment:


SQL> delete from TABLE_IOT where IOT_id = 94149;

251 rows deleted.

---------------------------------------------------------------------------------------
| Id  | Operation         | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT  |                   |   431 | 11206 |     7   (0)| 00:00:01 |
|   1 |  DELETE           | TABLE_IOT         |       |       |            |          |
|*  2 |   INDEX RANGE SCAN| CHILD_IOT_FK_I    |   431 | 11206 |     7   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("IOT_ID"=94149)

Statistics
----------------------------------------------------------
         54  recursive calls
       1033  db block gets
        101  consistent gets
          8  physical reads
     131468  redo size
        567  bytes sent via SQL*Net to client
        493  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          9  sorts (memory)
          0  sorts (disk)
        251  rows processed

Then I made the foreign key index invisible (don't do that in production without reading this article to the end):

SQL> alter index CHILD_IOT_FK_I invisible;

SQL> delete from TABLE_IOT where IOT_id = 94149;

251 rows deleted.

--------------------------------------------------------------------------------------
| Id  | Operation        | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT |                   |   431 | 11206 |  1467   (1)| 00:00:03 |
|   1 |  DELETE          | TABLE_IOT         |       |       |            |          |
|*  2 |   INDEX SKIP SCAN| IOT_PK            |   431 | 11206 |  1467   (1)| 00:00:03 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("IOT_ID"=94149)
       filter("IOT_ID"=94149)

Statistics
----------------------------------------------------------
         54  recursive calls
       1782  db block gets
        126  consistent gets
         24  physical reads
     178716  redo size
        567  bytes sent via SQL*Net to client
        493  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          8  sorts (memory)
          0  sorts (disk)
        251  rows processed

Scanning the primary key index required more consistent gets than the previous secondary index child_iot_fk_i did, because of the skip scan path. Very often, when I see an index skip scan access I am pretty sure there is room for a better-designed index. Particularly in this case: had I been the developer of this application at design time, I would have started the primary key index with the IOT_ID column and therefore would not have created the redundant child_iot_fk_i.

SQL> create unique index mho_iot_pk on TABLE_IOT (IOT_ID, IOT_PUBL_DATE, IOT_DATE, IOT_IDB);

SQL> delete from TABLE_IOT where IOT_id = 94149;

---------------------------------------------------------------------------------------
| Id  | Operation         | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT  |                   |   431 | 11206 |     5   (0)| 00:00:01 |
|   1 |  DELETE           | TABLE_IOT         |       |       |            |          |
|*  2 |   INDEX RANGE SCAN| MHO_IOT_PK        |   431 | 11206 |     5   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("IOT_ID"=94149)

Statistics
----------------------------------------------------------
          1  recursive calls
       1046  db block gets
          5  consistent gets
          4  physical reads
     141512  redo size
        575  bytes sent via SQL*Net to client
        493  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
        251  rows processed                  

The number of consistent gets and the number of recursive calls have been drastically reduced by the new, appropriately designed unique index.

However, in this particular case I would have preferred a solution in which the table was converted from an index organized table into a heap table, a proposal that was not accepted by the customer.

I tried to model this customer case using the table below but did not end up with exactly the same situation.

 create table 
  index_org_tab
  ( n1  number
   ,d1  date
   ,n2  number
   ,n3  number
   ,n4  number
   ,svc varchar2(10)
   ,constraint iot_pk primary key (d1,n1,n2,n3)
   )
  organization index;

insert into index_org_tab
 select
      rownum
     ,date '2013-01-01' + ((Level - 1) * 2)
     ,trunc((rownum-1)/5)
     ,mod(rownum,10)
     ,dbms_random.value(1,50)
     ,lpad('x',10)
 from dual
 connect by level <= 1e6;

 create index iot_fk_i on index_org_tab(n3);

 exec dbms_stats.gather_table_stats(user, 'index_org_tab', cascade => true, method_opt => 'for all columns size 1');

 delete from
 index_org_tab
 where n3 = 6;

100000 rows deleted.

-----------------------------------------------------------------------------------
| Id  | Operation         | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | DELETE STATEMENT  |               |   100K|  2050K|   458   (2)| 00:00:01 |
|   1 |  DELETE           | INDEX_ORG_TAB |       |       |            |          |
|*  2 |   INDEX RANGE SCAN| IOT_FK_I      |   100K|  2050K|   458   (2)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N3"=6)

Statistics
--------------------------------------------------------
        340  recursive calls
     106296  db block gets
        905  consistent gets
          0  physical reads
   29930420  redo size
        563  bytes sent via SQL*Net to client
        482  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
     100000  rows processed

SQL> alter index IOT_FK_I invisible;

SQL>  delete from
           index_org_tab
           where n3 = 6;

100000 rows deleted.

---------------------------------------------------------------------------------------
| Id  | Operation             | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT      |               |   100K|  2050K|  2897   (3)| 00:00:05 |
|   1 |  DELETE               | INDEX_ORG_TAB |       |       |            |          |
|*  2 |   INDEX FAST FULL SCAN| IOT_PK        |   100K|  2050K|  2897   (3)| 00:00:05 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("N3"=6)

Statistics
--------------------------------------------------------
        518  recursive calls
     406700  db block gets
       8397  consistent gets
          0  physical reads
   49543928  redo size
        569  bytes sent via SQL*Net to client
        497  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          8  sorts (memory)
          0  sorts (disk)
     100000  rows processed

create unique index mho_iot_uk on index_org_tab(n3,d1,n1,n2);

SQL> delete from
     index_org_tab
     where n3 = 6;

100000 rows deleted.
-----------------------------------------------------------------------------------
| Id  | Operation         | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | DELETE STATEMENT  |               |   100K|  2050K|   468   (2)| 00:00:01 |
|   1 |  DELETE           | INDEX_ORG_TAB |       |       |            |          |
|*  2 |   INDEX RANGE SCAN| MHO_IOT_UK    |   100K|  2050K|   468   (2)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N3"=6)

Statistics

----------------------------------------------------------
        400  recursive calls
     109095  db block gets
        983  consistent gets
          0  physical reads
   33591824  redo size
        569  bytes sent via SQL*Net to client
        504  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
     100000  rows processed

Bottom line: when you intend to create an index organized table, think carefully about how you are going to access it. If you visit this table without its primary key you might encounter high logical I/O consumption, particularly when you visit the IOT via a secondary index (as is the case in the current issue).
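If you want to check whether your own schema contains such a pattern, a dictionary query along these lines (a hedged helper, not part of the original investigation) lists secondary indexes created on index organized tables:

select i.table_name, i.index_name, i.index_type
from   user_indexes i
join   user_tables  t
on     t.table_name = i.table_name
where  t.iot_type = 'IOT'
and    i.index_type <> 'IOT - TOP';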


Bind Aware – Part I


If you don't want to use the /*+ bind_aware */ hint to make your cursor immediately bind aware, you may find that your initial cursor needs a certain number of executions to reach bind-aware status. How many such initial executions does your cursor need during its warm-up period? And how is this number of executions monitored? Those are the two questions I aim to answer in this two-part series.

To answer them, I will start by creating the following t_acs table with a special data pattern:

create table t_acs(n1  number, vc2  varchar2(10));

BEGIN
  for j in 1..1200150 loop
    if j = 1 then
      insert into t_acs values (j, 'j1');
    elsif j > 1 and j <= 101 then
      insert into t_acs values (j, 'j100');
    elsif j > 101 and j <= 1101 then
      insert into t_acs values (j, 'j1000');
    elsif j > 10001 and j <= 110001 then
      insert into t_acs values (j, 'j10000');
    else
      insert into t_acs values (j, 'j>million');
    end if;
  end loop;
  commit;
END;
/

create index t_acs_i1 on t_acs(vc2);

select vc2, count(1) from t_acs group by vc2 order by 2;

VC2          COUNT(1)
---------- ----------
j1                  1
j100              100
j1000            1000
j10000         100000
j>million     1099049

As you can see above, the values of the vc2 column reflect (roughly, in the case of 'j10000') the number of rows in which they occur in the t_acs table:

	'j1'        : the predicate vc2 = 'j1'        identifies 1 row
	'j100'      : the predicate vc2 = 'j100'      identifies 100 rows
	'j1000'     : the predicate vc2 = 'j1000'     identifies 1,000 rows
	'j10000'    : the predicate vc2 = 'j10000'    identifies 100,000 rows
	'j>million' : the predicate vc2 = 'j>million' identifies more than a million rows

I will explain later why I built such a skewed data set and why I named the vc2 column values this way.
There are three views used to monitor a cursor that is a candidate for the adaptive cursor sharing feature:

SQL> desc v$sql_cs_statistics

           Name                            Null?    Type
           ------------------------------- -------- --------------------
    1      ADDRESS                                  RAW(8)
    2      HASH_VALUE                               NUMBER
    3      SQL_ID                                   VARCHAR2(13)
    4      CHILD_NUMBER                             NUMBER        --> emphasize this
    5      BIND_SET_HASH_VALUE                      NUMBER
    6      PEEKED                                   VARCHAR2(1)
    7      EXECUTIONS                               NUMBER
    8      ROWS_PROCESSED                           NUMBER        --> emphasize this
    9      BUFFER_GETS                              NUMBER
   10      CPU_TIME                                 NUMBER

SQL> desc v$sql_cs_histogram

           Name                            Null?    Type
           ------------------------------- -------- --------------------
    1      ADDRESS                                  RAW(8)
    2      HASH_VALUE                               NUMBER
    3      SQL_ID                                   VARCHAR2(13)
    4      CHILD_NUMBER                             NUMBER        --> emphasize this
    5      BUCKET_ID                                NUMBER        --> emphasize this
    6      COUNT                                    NUMBER        --> emphasize this

SQL> desc v$sql_cs_selectivity

           Name                            Null?    Type
           ------------------------------- -------- --------------------
    1      ADDRESS                                  RAW(8)
    2      HASH_VALUE                               NUMBER
    3      SQL_ID                                   VARCHAR2(13)
    4      CHILD_NUMBER                             NUMBER
    5      PREDICATE                                VARCHAR2(40)
    6      RANGE_ID                                 NUMBER
    7      LOW                                      VARCHAR2(10)
    8      HIGH                                     VARCHAR2(10)

The third view starts to be useful only when the cursor becomes bind aware. At this stage of the article, let's concentrate only on the rows_processed column of the first view and on the (bucket_id, count) pair of columns of the second one. They go hand in hand in the monitoring of the same child cursor.

Let's investigate how these two views interact with each other. Throughout this article I will be using the following query:

SQL> select count(1) from t_acs where vc2 = :lvc;

Where :lvc is a bind variable which I will declare and set (the first time) as follows:

SQL> var lvc varchar2(10);
SQL> exec :lvc := 'j100';

To become bind aware, a cursor requires the column used in its predicate to have a histogram:

BEGIN
       dbms_stats.gather_table_stats
                   (user
                   ,'t_acs'
                   ,method_opt       => 'for all columns size skewonly'
                   ,estimate_percent => dbms_stats.auto_sample_size
                   ,cascade          => true
                   ,no_invalidate    => false
                   );
END;
/

I used 'size skewonly' for the method_opt parameter in order to collect a histogram on the vc2 column. Since no query had yet used the vc2 column in a where clause, 'size auto' would not have produced this histogram; otherwise 'size auto' would have been the more appropriate choice.
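A quick dictionary check (a small sketch, not part of the original test flow) confirms that the vc2 column now has a histogram:

select column_name, num_distinct, histogram
from   user_tab_col_statistics
where  table_name  = 'T_ACS'
and    column_name = 'VC2';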

As mentioned above, 'j100' means that my initial cursor will process about 100 rows. This is what the two monitoring views show after executing the above query:

SQL> select
         child_number
         ,executions
         ,rows_processed
    from v$sql_cs_statistics
    where sql_id = '6f6wzmu3yzm43' ;

SQL> select
        child_number
        ,bucket_id
        ,count
    from
         v$sql_cs_histogram
    where  sql_id = '6f6wzmu3yzm43' ;

CHILD_NUMBER EXECUTIONS ROWS_PROCESSED
------------ ---------- --------------
           0          1            101

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           0          0          1 --> incremented
           0          1          0
           0          2          0

The child cursor n° 0 processed 101 rows and Oracle incremented the count of its bucket_id n° 0.

What happens now if I execute the same query using a different bind variable value (‘j10000’)?

SQL> exec :lvc := 'j10000';

CHILD_NUMBER EXECUTIONS ROWS_PROCESSED
------------ ---------- --------------
           0          1         101

I know that the 'j10000' bind variable value identifies 100,000 rows; so why is my shared cursor n°0 showing only 101 processed rows?

The view is not telling the whole truth. We will see below that, in contrast to the v$sql_cs_histogram view, the rows_processed column of the v$sql_cs_statistics view is updated only when a new execution plan is compiled. This means that the rows_processed column is not updated when a child cursor is shared. At this point we are still sharing child cursor n°0, and this is why the number of rows processed by the last execution is not correct.

Fortunately, we have the v$sql_cs_histogram view:

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           0          0          1
           0          1          1 --> incremented
           0          2          0

Via this view, Oracle shows that we are still sharing the same child cursor n°0, but in contrast to the first execution, where it was bucket_id n°0 that had its count incremented to 1, for the second execution it is bucket_id n°1 that has been incremented.

What does this mean?

Oracle is telling us that the second execution actually processed many more rows than the 101 indicated. We will see the proof of that claim at the end of this article.

Now, let's use a new bind variable value ('j>million') and re-execute the same query:

SQL> exec :lvc := 'j>million';

CHILD_NUMBER EXECUTIONS ROWS_PROCESSED
----------- ---------- --------------
           1          1        1099050
           0          1         101

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           1          0          0
           1          1          0
           1          2          1 --> incremented
           0          0          1
           0          1          1
           0          2          0

Notice this time that Oracle produces a new child cursor n°1, which is a sign that our cursor is now bind aware. It has processed 1,099,050 rows and it is the bucket_id number 2 of this new compiled child cursor n° 1 that has seen its count incremented to 1.

In order to show you that the 'j10000' bind variable value actually identifies far more than the 101 rows reported earlier, let's re-execute the initial query using this value and observe what the monitoring views show:

SQL> exec :lvc := 'j10000';

CHILD_NUMBER EXECUTIONS ROWS_PROCESSED
------------ ---------- --------------
           2          1         100001
           1          1        1099050
           0          1            101

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           2          0          0
           2          1          1 --> incremented
           2          2          0
           1          0          0
           1          1          0
           1          2          1
           0          0          1
           0          1          1
           0          2          0

A new child cursor n°2 has been compiled for the 'j10000' bind variable value, and this is why we now see the correct number of rows_processed (100,001) in the v$sql_cs_statistics view. Note also that for this child cursor n°2 it is bucket_id n°1 that has been incremented to 1.

Set aside, for the moment, the fact that child cursors n°1 and n°2 are bind aware and that child cursor n°0 has become obsolete (no longer shareable), as shown below:

select  sql_id
       ,child_number
       ,is_bind_aware
       ,is_bind_sensitive
       ,is_shareable
       ,executions
from   v$sql
where  sql_id ='6f6wzmu3yzm43';

SQL_ID        CHILD_NUMBER I I I EXECUTIONS
------------- ------------ - - - ----------
6f6wzmu3yzm43            0 N Y N          2
6f6wzmu3yzm43            1 Y Y Y          1
6f6wzmu3yzm43            2 Y Y Y          1

With the above, I hope I have demonstrated the relationship between a child cursor, the number of rows it has processed and the corresponding incremented bucket_id. This relationship obeys the following algorithm:

      0    < ROWS_PROCESSED <= 1000  --> COUNT of BUCKET_ID 0 is incremented
      1000 < ROWS_PROCESSED <= 1e6   --> COUNT of BUCKET_ID 1 is incremented
             ROWS_PROCESSED >  1e6   --> COUNT of BUCKET_ID 2 is incremented
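Expressed as a SQL CASE expression purely for illustration (this is of course not Oracle's internal code), the bucket selection looks like this:

-- which bucket_id gets its count incremented for a given number of processed rows
select rows_processed,
       case
         when rows_processed <= 1000 then 0
         when rows_processed <= 1e6  then 1
         else 2
       end as bucket_id
from  (select 101 rows_processed from dual union all
       select 100001             from dual union all
       select 1099050            from dual);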

In part II of this series we will see how the (bucket_id, count) pair influences the bind-aware status of the initial child cursor. This pair of columns ceases to play its initial role once the cursor becomes bind aware; at that point the third view (v$sql_cs_selectivity) enters the scene and the selectivity of the bind variable is used to decide whether to share an existing child cursor or optimize a new one.



Bind Aware – Part II


In part I we saw how the count of each bucket_id in the v$sql_cs_histogram view is incremented according to the number of rows the corresponding child cursor execution has processed, which in turn depends on the bind variable value. We saw that this count/bucket_id relationship follows this algorithm:

  • If the number of processed rows is <= 1,000, the count of bucket_id n°0 is incremented
  • If the number of processed rows is between 1,000 and 1,000,000, the count of bucket_id n°1 is incremented
  • If the number of processed rows is > 1,000,000, the count of bucket_id n°2 is incremented

In part II we will see how the decision to mark a cursor bind aware is made. I have found three situations in which the CBO is triggered to compile a new execution plan:

    • incrementing the counts of two adjacent bucket_ids (0-1 or 1-2)
    • incrementing the counts of the two distant bucket_ids (0-2)
    • incrementing the counts of all three bucket_ids (0-1-2)

1. Two neighbour bucket_ids

Let's start with the first situation. For this I will be using the 'j100' and 'j10000' bind variable values, since these two values increment two adjacent bucket_ids (0 and 1). I will be using the same table and the same query as in part I.

SQL> exec :lvc := 'j100'

SQL> select count(1) from t_acs where vc2 = :lvc;

SQL> @cntbukt 6f6wzmu3yzm43 --> see script at the end

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           0          0          1
           0          1          0
           0          2          0

Next, I will execute the same query 6 extra times so that I will get the following cursor monitoring picture:

SQL> @cntbukt 6f6wzmu3yzm43

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           0          0          7
           0          1          0
           0          2          0

7 executions using the same bind variable value ('j100') have incremented the count of bucket_id n°0, as expected.
Now, I will change the bind variable value so that it is bucket_id n°1 that gets incremented, and execute the same query 7 times:

SQL> exec :lvc := 'j10000'

SQL> select count(1) from t_acs where vc2 = :lvc; -- repeat this 7 times

SQL> @cntbukt 6f6wzmu3yzm43

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           0          0          7
           0          1          7
           0          2          0

After 7 executions using the new bind variable value we are still sharing the same child cursor n°0. However, Oracle has, again as expected, incremented bucket_id n°1 seven times. Let's add an extra run of the same query and see what happens:

SQL> select count(1) from t_acs where vc2 = :lvc;

SQL> @cntbukt 6f6wzmu3yzm43

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           1          0          0
           1          1          1
           1          2          0
           0          0          7 --> equals its neighbour
           0          1          7 --> equals its neighbour
           0          2          0

Finally, at the 8th execution, Oracle decided that it was time to compile a new plan (new child cursor n°1). This allows me to write the first observation:

For the same child cursor, when the count of a bucket_id reaches the count of its neighbour bucket_id, the next execution will mark the original cursor as bind aware and a new child cursor is compiled.

This conclusion is, of course, only valid before the appearance of the first bind-aware cursor in the cursor cache.

2. Two distant bucket_ids

In order to investigate the second case, I will start by flushing the shared pool and repeating the same experiment with two bind variable values that map to distant bucket_ids, 'j100' and 'j>million', starting with 6 executions using the first value.

SQL> alter system flush shared_pool;
SQL> exec :lvc := 'j100';
SQL> select count(1) from t_acs where vc2 = :lvc;

SQL> @cntbukt 6f6wzmu3yzm43

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           0          0          6
           0          1          0
           0          2          0

As expected, the count of bucket_id n°0 has been incremented 6 times. Now, I will execute the same query twice using the 'j>million' bind variable value, which maps to the non-adjacent bucket_id n°2:

SQL> exec :lvc := 'j>million'
SQL> select count(1) from t_acs where vc2 = :lvc;

SQL> @cntbukt 6f6wzmu3yzm43
CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           0          0          6
           0          1          0
           0          2          2

As expected too, the count of bucket_id n°2 has been incremented twice, but I am still sharing the same child cursor n°0. Let's try an extra execution with the same bind variable value:

SQL> select count(1) from t_acs where vc2 = :lvc;

SQL> @cntbukt 6f6wzmu3yzm43

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           1          0          0
           1          1          0
           1          2          1
           0          0          6
           0          1          0
           0          2          2

And finally a new child cursor n°1 has been created indicating that the cursor has been marked bind aware.

At this stage I can draw the second conclusion:

For the same child cursor, when the count of a bucket_id becomes greater than or equal to (trunc(count/2) - 1) of the count of its distant, non-adjacent bucket_id, the next execution will mark the original cursor as bind aware and a new child cursor is compiled.

We had 6 executions counted in bucket_id n°0 and 2 executions counted in bucket_id n°2 for the same child cursor n°0. We reached the situation where 2 >= trunc(6/2) - 1 = 2, which indicates that the next execution (the 3rd one with 'j>million') compiles a new execution plan (child cursor n°1).

What happens when the number of executions is odd?

By flushing the shared pool, setting the bind variable back to 'j100' and executing the same query 11 times, I obtained the following picture:

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           0          0         11
           0          1          0
           0          2          0

Then, by setting the bind variable value to 'j>million' and executing the same query 4 times, I arrived at the following picture:

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           0          0         11
           0          1          0
           0          2          4

The above picture shows that 4 >= trunc(11/2) - 1 = 4. This means that the next execution (the 5th one) will compile a new plan:

SQL> select count(1) from t_acs where vc2 = :lvc;

SQL> @cntbukt 6f6wzmu3yzm43

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           1          0          0
           1          1          0
           1          2          1
           0          0         11
           0          1          0
           0          2          4

The two observations seem to be valid in both directions, i.e. the initially compiled cursor can belong to bucket_id n°0, 1 or 2. I have run the above tests starting with the bind variable value 'j>million' (or 'j10000') and came up with the same conclusions.

In my constant desire to keep my articles short and simple, I've decided to treat the last case (a mixture of bucket_ids) in Part III.

PS:  cntbukt script

select
        child_number
        ,bucket_id
        ,count
    from
         v$sql_cs_histogram
    where sql_id = '&1' ;

Dynamic sampling: unlearning


Here is an execution plan that served as the path for the Oracle SQL engine to execute a simple two-table join:

 SQL> SELECT
          count(t1.small_vc)
         ,count(t2.padding)
    FROM
          t1, t2
    WHERE
          t1.id1 = t2.id
        ;

SQL> start xstat

SQL_ID  b5yc4wnt59bxn, child number 0
-------------------------------------
--------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | A-Rows |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |      1 |
|   1 |  SORT AGGREGATE               |       |      1 |      1 |      1 |
|   2 |   NESTED LOOPS                |       |      1 |        |     10 |
|   3 |    NESTED LOOPS               |       |      1 |     10 |     10 |
|   4 |     TABLE ACCESS FULL         | T2    |      1 |     10 |     10 |
|*  5 |     INDEX UNIQUE SCAN         | T1_PK |     10 |      1 |     10 |
|   6 |    TABLE ACCESS BY INDEX ROWID| T1    |     10 |      1 |     10 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("T1"."ID1"="T2"."ID")

Note
-----
   - dynamic sampling used for this statement (level=2)

What does this last Note about dynamic sampling imply?

That the involved tables have stale statistics?
No, it doesn't imply such a conclusion.

That the involved tables have no statistics at all?
No, it doesn't exactly imply such a conclusion either.

The above Note indicates that at least one of the tables involved in the above query has no statistics at all (this is particularly what the default level 2 means). It doesn't mean that all tables in the join query have no statistics. It means that at least one of the two tables is NOT ANALYZED, to use the 10053 trace file terminology, as shown below:


***************************************
BASE STATISTICAL INFORMATION
***************************************
Table Stats::
  Table: T2  Alias: T2  (NOT ANALYZED) --- spot this
    #Rows: 327  #Blks:  4  AvgRowLen:  100.00  ChainCnt:  0.00
  Column (#1): ID(  NO STATISTICS (using defaults)
    AvgLen: 13 NDV: 10 Nulls: 0 Density: 0.097859

Index Stats::
  Index: T2_I_FK  Col#: 1
    LVLS: 0  #LB: 1  #DK: 10  LB/K: 1.00  DB/K: 1.00  CLUF: 1.00
***********************

In fact, what I did not show you is the model I've used, which is:

create table t1
    as select
              rownum                     id1,
              trunc(dbms_random.value(1,1000)) id2,
              lpad(rownum,10,'0')              small_vc,
              rpad('x',1000)                   padding
              from dual
    connect by level <= 1e4;

alter table t1 add constraint t1_pk primary key (id1);

create index t1_i1 on t1(id2);

create table t2
   as select *
      from ( select
               rownum                              as id
              ,mod(t1.id1,5) + mod(rownum,10)* 10  as id1
              ,lpad(rownum,10,'0')                 as small_vc
              ,rpad('x',70)                        as padding
            from t1
            where rownum <= 10
           )
            order by id1;

alter table t2 add constraint t2_fk foreign key(id) references t1(id1);

create index t2_i_fk on t2(id);

And, most importantly, the statistics I have collected for table t1 only:

 BEGIN
  dbms_stats.gather_table_stats(USER
                           ,'t1'
                           ,method_opt => 'FOR ALL COLUMNS SIZE 1'
                           ,estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE
                           ,CASCADE => true);
END;                          
/

For table t1, in contrast to table t2 on which I haven't collected statistics, the CBO 10053 trace file echoes the following lines, which indicate that the CBO used the t1 dictionary statistics when compiling the execution plan:

***********************
Table Stats::
  Table: T1  Alias: T1
    #Rows: 10000  #Blks:  1460  AvgRowLen:  1020.00  ChainCnt:  0.00
  Column (#1): ID1(
    AvgLen: 4 NDV: 10000 Nulls: 0 Density: 0.000100 Min: 1 Max: 10000
Index Stats::
  Index: T1_I1  Col#: 2
    LVLS: 1  #LB: 21  #DK: 999  LB/K: 1.00  DB/K: 9.00  CLUF: 9968.00
  Index: T1_PK  Col#: 1
    LVLS: 1  #LB: 20  #DK: 10000  LB/K: 1.00  DB/K: 1.00  CLUF: 1429.00
Access path analysis for T1
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for T1[T1]
  Table: T1  Alias: T1
    Card: Original: 10000.000000  Rounded: 10000  Computed: 10000.00  Non Adjusted: 10000.00
  Access Path: TableScan
    Cost:  447.51  Resp: 447.51  Degree: 0
      Cost_io: 439.00  Cost_cpu: 12297302
      Resp_io: 439.00  Resp_cpu: 12297302
  Best:: AccessPath: TableScan
         Cost: 447.51  Degree: 1  Resp: 447.51  Card: 10000.00  Bytes: 0

Bottom line: when you see a Note about dynamic sampling at level 2 in an execution plan, bear in mind that this Note doesn't imply that all tables involved in the underlying query have no statistics. It indicates that at least one table has no statistics.
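As a quick sanity check (a hedged sketch, assuming the tables live in the current schema), the dictionary tells you immediately which of the two tables is missing statistics:

select table_name, num_rows, last_analyzed
from   user_tables
where  table_name in ('T1','T2');
-- in the model above, T1 should show a non-null LAST_ANALYZED
-- while T2, deliberately left unanalyzed, should show a null one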

I’ve also attached to this blog the corresponding documentation about dynamic (statistics) sampling


Bind Aware – Part III


After having explained

  • In Part I, how Oracle is monitoring a cursor candidate for a bind aware status via the number of processed rows each bind variable value is generating and how this is linked with the count of the three different bucket_id
  • In Part II, how Oracle is exploiting the tandem (bucket_id, count) to decide when the time has come to make a cursor bind aware. In that part I think I have succeeded in figuring out the secret sauce the Adaptive Cursor Sharing layer code is using to mark a cursor bind aware in two cases: (a) when only two adjacent bucket_id are concerned (the count of the third bucket_id = 0) and (b) when only two distant bucket_id are concerned (that is the count of bucket_id n°1 = 0).

I left the third case which concerns all bucket_id  (that is the count of all bucket_id != 0) for the current blog article.

I will immediately temper your enthusiasm and say that unfortunately I didn't succeed in figuring out the secret sauce Oracle is using to mark a cursor bind aware in this third case. Nevertheless, this failure didn't prevent me from sharing with you all the observations and tests I have done, hoping that someone will take over the task and come up with a nice conclusion.

You will have noticed that I've also managed the exploit of devoting two articles to Adaptive Cursor Sharing without showing a single execution plan! This is because, through this series of articles, I aimed to show how Oracle is internally managing and monitoring this feature rather than to show its goal, which is to produce several optimal execution plans for a SQL statement using bind variables and fulfilling particular pre-requisites.

Finally, you might also have noticed that, despite 3 articles, I still have not exposed nor explained what happens when the cursor is finally made bind aware. In fact, once a cursor is made bind aware, a new piece of internal Oracle code takes over the monitoring job. This is the Extended Cursor Sharing (ECS) feature, which monitors the selectivity (or a range of selectivities) for each child cursor that has previously been made bind aware by the ACS internal code. While the ACS feature uses the tandem (bucket_id, count) for its internal secret sauce (half secret from now on), the ECS feature is based on the v$sql_cs_selectivity view.

Back now to the content of this third and last part of the series, which exposes the tests I ran while trying to decipher the ACS working mechanism when all bucket_id are concerned.

I have decomposed my tests into 3 categories (all bucket_id have a count != 0):

  • The first case corresponds to a cursor marked bind aware following an increment of the count of bucket_id n°0
  • The second case is when a cursor is marked bind aware following an increment of the count of bucket_id n°1
  • The third category corresponds to a cursor marked bind aware following an increment of the count of bucket_id n°2

Based on those different observations, I was aiming to derive an empirical algorithm that might explain how Oracle takes its decision to mark a cursor bind aware when all bucket_ids are concerned. Unfortunately, in contrast to the case exposed in Part II of this series, I have been unable to derive that algorithm. Nevertheless, I've decided to share all my tests with you.

1.    First category of tests

As exposed above the first category of my tests concerns a cursor marked bind aware following an execution done using a bind variable that increments the count of bucket_id n°0. For example I have the below situation:

SQL> @cntbukt 6f6wzmu3yzm43

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           0          0          5 ---> +1 execution
           0          1          8
           0          2          3

Looking at the above picture, you can understand that I’ve managed to execute my query in the following execution order

  • bind variable value ‘j10000’ 8 times  (increment count of bucket_id 1 to 8)
  • bind variable value ‘j100’ 5 times    (increment count of bucket_id 0 to 5)
  • bind variable value ‘j>million’ 3 times (increment count of bucket_id 2 to 3)

Of course these executions have been done in a way that avoids two adjacent bucket_id reaching the same count while the remaining bucket_id has a count of 0. This is how I managed to have all bucket_id incremented while avoiding the bind aware situations exposed in Part II. To accomplish this you can, for example, start by executing your query using the 'j10000' bind variable 8 times, then jump to the adjacent bucket_id n°0 using bind variable 'j100' and execute it 5 times, then go to the other bind variable value 'j>million' and execute it 3 times, as sketched below. Proceeding as such you will arrive exactly at the above situation exposed in v$sql_cs_histogram for the 6f6wzmu3yzm43 sql_id.
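For completeness, here is a sketch of that execution sequence in sqlplus (the t_acs table, the vc2 column and the bind values come from the model used since Part I of this series; the varchar2(30) declaration is an assumption):

variable lvc varchar2(30)

exec :lvc := 'j10000'
select count(1) from t_acs where vc2 = :lvc;  -- repeat 8 times (feeds bucket_id n°1)

exec :lvc := 'j100'
select count(1) from t_acs where vc2 = :lvc;  -- repeat 5 times (feeds bucket_id n°0)

exec :lvc := 'j>million'
select count(1) from t_acs where vc2 = :lvc;  -- repeat 3 times (feeds bucket_id n°2)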

At this stage of the experiment, if you execute the same query using the first bind variable value 'j100', you will see that Oracle has made your cursor bind aware, as shown below (a new child cursor n°1 appears):

SQL> @cntbukt 6f6wzmu3yzm43

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           1          0          1
           1          1          0
           1          2          0
           0          0          5
           0          1          8
           0          2          3

2.    Second category of tests

The second category of my tests is the one that corresponds to a cursor becoming bind aware following an execution at bucket_id n° 1

SQL> @cntbukt 6f6wzmu3yzm43

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           0          0          8
           0          1          3 ---> +1 execution
           0          2          2

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           1          0          0
           1          1          1
           1          2          0
           0          0          8
           0          1          3
           0          2          2

The same precautions have been taken so that the bind aware situations exposed in Part II of the series are avoided before the counts of all bucket_id have been incremented.

3.    Third category of tests

Finally the third category corresponds to a cursor becoming bind aware following an execution at bucket_id n° 2 as shown below:

SQL> @cntbukt 6f6wzmu3yzm43

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           0          0          8
           0          1          3
           0          2          2 ---> +1 execution

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           1          0          0
           1          1          0
           1          2          1
           0          0          8
           0          1          3
           0          2          2

Looking at the above situation there is no apparent clue indicating what secret sauce ACS is using to mark a cursor bind aware when all bucket_id are concerned.

I have attached the 'tests with 3 buckets' document, where you can find several similar cases I have run that have not put me on the right track either.

I hope that these 3 articles help someone.


Adaptive plan


Have you ever seen this kind of execution plan?

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name     | Starts | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |      1 |      5 |00:00:00.04 |       4 |
|   1 |  MAT_VIEW REWRITE ACCESS FULL| T1_T2_MV |      1 |      5 |00:00:00.04 |       4 |
------------------------------------------------------------------------------------------

Note
-----
   - this is an adaptive plan

Or this one:

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last +adaptive'));

-----------------------------------------------------------------------------
| Id  | Operation                    | Name     | Starts | A-Rows | Buffers |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |      1 |      5 |       4 |
|   1 |  MAT_VIEW REWRITE ACCESS FULL| T1_T2_MV |      1 |      5 |       4 |
-----------------------------------------------------------------------------

Note
-----
   - this is an adaptive plan (rows marked '-' are inactive)

Where are those inactive rows marked ‘-‘ in the plan?

I was writing an article on tuning disjunctive subqueries when I was prompted to check something with materialized views. Coincidentally, at the time the materialized view question kicked in, I was finishing the model for the disjunctive subquery article. Therefore, I decided to create a materialized view using this model and here is what happened.

select /*+ qb_name(parent) */
     id,
     n1
from
     t1
where
     n1 = 100
and  exists
      (select /*+ qb_name(child) */
            null
       from t2
       where
            t2.id = t1.id
       and  t2.x1 = 100
       );
-------------------------------------------------------------------------------------------
|   Id  | Operation                                | Name     | Starts | A-Rows | Buffers |
-------------------------------------------------------------------------------------------
|     0 | SELECT STATEMENT                         |          |      1 |      5 |      13 |
|- *  1 |  HASH JOIN                               |          |      1 |      5 |      13 |
|     2 |   NESTED LOOPS                           |          |      1 |      5 |      13 |
|     3 |    NESTED LOOPS                          |          |      1 |      5 |      11 |
|-    4 |     STATISTICS COLLECTOR                 |          |      1 |      5 |       3 |
|     5 |      SORT UNIQUE                         |          |      1 |      5 |       3 |
|     6 |       TABLE ACCESS BY INDEX ROWID BATCHED| T2       |      1 |      5 |       3 |
|  *  7 |        INDEX RANGE SCAN                  | T2_X1_I1 |      1 |      5 |       2 |
|  *  8 |     INDEX RANGE SCAN                     | T1_ID_I1 |      5 |      5 |       8 |
|  *  9 |    TABLE ACCESS BY INDEX ROWID           | T1       |      5 |      5 |       2 |
|- * 10 |   TABLE ACCESS FULL                      | T1       |      0 |      0 |       0 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."ID"="T1"."ID")
   7 - access("T2"."X1"=100)
   8 - access("T2"."ID"="T1"."ID")
   9 - filter("N1"=100)
  10 - filter("N1"=100)

Note
-----
   - this is an adaptive plan (rows marked '-' are inactive)

The materialized view on top of the above query looks like this:

create materialized view t1_t2_mv as
select /*+ qb_name(parent) */
     id,
     n1
from
     t1
where
     n1 = 100
and  exists
      (select /*+ qb_name(child) */
            null
       from t2
       where
            t2.id = t1.id
       and  t2.x1 = 100
       );
Materialized view created.

And I have finished the setup by enabling query rewrite on the materialized view:

alter materialized view t1_t2_mv enable query rewrite;
Materialized view altered.

Finally I re-executed the initial query and got the corresponding execution plan as usual:

select /*+ qb_name(parent) */
     id,
     n1
from
     t1
where
     n1 = 100
and  exists
      (select /*+ qb_name(child) */
            null
       from t2
       where
            t2.id = t1.id
       and  t2.x1 = 100
       );
------------------------------------------------------------------------------------------
| Id  | Operation                    | Name     | Starts | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |      1 |      5 |00:00:00.04 |       4 |
|   1 |  MAT_VIEW REWRITE ACCESS FULL| T1_T2_MV |      1 |      5 |00:00:00.04 |       4 |
------------------------------------------------------------------------------------------

Note
-----
   - this is an adaptive plan

And the corresponding outline is:

Outline Data:
  /*+
    BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.1.0.1')
      DB_VERSION('12.1.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$35C14E55")
      REWRITE(@"SEL$F743C7BF" "T1_T2_MV")
      OUTLINE(@"SEL$F743C7BF")
      REWRITE(@"PARENT" "T1_T2_MV")
      OUTLINE(@"PARENT")
      FULL(@"SEL$35C14E55" "T1_T2_MV"@"SEL$518C1272")
    END_OUTLINE_DATA
  */

Notice in passing how the number of Buffers drops from 13 to 4 when the CBO decides to use the materialized view instead of the initial query.

Bottom line: the Note about an adaptive plan you might see when materialized views are used relates to the SQL behind the materialized view and not to the materialized view itself.
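If you want to confirm this directly from the library cache, here is a minimal sketch for 12c, where v$sql exposes the IS_RESOLVED_ADAPTIVE_PLAN column (the &sql_id substitution variable is illustrative):

select sql_id, child_number, is_resolved_adaptive_plan
from   v$sql
where  sql_id = '&sql_id';
-- 'Y' once the adaptive part of the plan has been resolved at the first execution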


Subquery Coalescing


It’s fairly clear that to improve the performance of a SQL query we sometimes need to rewrite it so that we end up either reducing the number of scanned tables or giving the CBO an alternative path the original query was preventing it from taking. In a previous article, I’ve looked at ways of turning a disjunctive subquery into a conjunctive subquery so that it can be unnested and merged with its parent query block. In this article I aim to explain how to improve performance by coalescing two distinct subqueries. This tuning strategy will be preceded by a definition of the conjunction, disjunction and containment concepts for subqueries. Finally, simple examples of subquery coalescing will be presented and explained. I also aim to show that sometimes de-coalescing can turn out to be a good tuning strategy, as it might open a new path towards an optimal execution plan. This article is based on the Oracle published paper entitled Enhanced Subquery Optimization in Oracle.

Conjunction, Disjunction and Containment property

Two subqueries are eligible to be coalesced provided they verify the containment property. A subquery subq1 is said to contain a second subquery subq2 if the result of subq2 is a subset of the subq1 result. In this case subq1 is called the container query block while subq2 is called the contained query block. The same two subqueries will verify the containment property when subq2 contains a conjunctive predicate which, if suppressed, makes subq1 and subq2 equivalent.

Let’s picture the containment property to make it crystal clear. Below I have the following three distinct query blocks

select
     t1.*
from
     t1,t2
where t2.id1 = t1.id1;

9000 rows selected.

select
     t1.*
from
     t1,t2
where t2.id1 = t1.id1
and   t2.status = 'COM';

8900 rows selected.

select
     t1.*
from
     t1,t2
where t2.id1 = t1.id1
and   t2.id1 = t2.id;

4 rows selected.

If we get rid of the predicate on the status column in the second query block, the first and the second queries become equivalent. They then verify the containment property and can therefore be coalesced. If we look at the second and the third query blocks, they also verify the containment property since the result set of the third query is a subset of the result set of the second one. That is the containment property definition.

A conjunction is the action of linking two subqueries with an AND operator. The conjunction of two subqueries is true if the two subqueries are simultaneously true; otherwise it is false.

A disjunction is the action of linking two subqueries with an OR operator. The disjunction of two subqueries is true when at least one of the two subqueries is true, and is false only when both are simultaneously false.
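Using the model tables of this article, the two notions can be illustrated as follows (an illustrative sketch only; the subqueries are the ones used throughout this article):

-- conjunction: the row must satisfy both subqueries (AND)
select t1.*
from   t1
where  exists (select null from t2 where t2.id1 = t1.id1 and t2.status = 'COM')
and    exists (select null from t2 where t2.id1 = t1.id1 and t2.id1 = t2.id);

-- disjunction: the row must satisfy at least one of the two subqueries (OR)
select t1.*
from   t1
where  exists (select null from t2 where t2.id1 = t1.id1 and t2.status = 'COM')
or     exists (select null from t2 where t2.id1 = t1.id1 and t2.id1 = t2.id);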

Coalescing two subqueries of the same type (exists)

Two conjunctive subqueries satisfying the containment property can be coalesced. Consider the following reproducible model (11.2.0.3)

create table t1
   as select
    rownum                id1,
    trunc((rownum-1/3))   n1,
    date '2012-06-07' + mod((level-1)*2,5) start_date,
    lpad(rownum,10,'0')   small_vc,
    rpad('x',1000)        padding
from dual
connect by level <= 1e4;

create table t2
as select
    rownum id
    ,mod(rownum,5) + mod(rownum,10)* 10  as id1
    ,case
       when mod(rownum, 1000) = 7 then 'ERR'
       when rownum <= 9900 then 'COM'
       when mod(rownum,10) between 1 and 5 then 'PRP'
     else
       'UNK'
     end status
     ,lpad(rownum,10,'0')    as small_vc
     ,rpad('x',70)           as padding
from dual
connect by level <= 1e4;

And the following query with two exists subqueries of the same type (exists)

select
     start_date
    ,count(1)
from t1
where
     start_date >= to_date('10062012','ddmmyyyy')
and (exists (select null
             from t2
             where t2.id1 = t1.id1
             and t2.status = 'COM'
             )
  or exists (select null
             from t2
             where t2.id1 = t1.id1
             and   t2.id1 = t2.id
             )
     )
group by start_date;

START_DATE          COUNT(1)
----------------- ----------
20120611 00:00:00          2
20120610 00:00:00          1

--------------------------------------------------------------------------------------
| Id  | Operation             | Name    | Starts | E-Rows | A-Rows | Buffers | Reads  |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |         |      1 |        |      2 |    1950 |   1665 |
|   1 |  HASH GROUP BY        |         |      1 |      1 |      2 |    1950 |   1665 |
|*  2 |   HASH JOIN SEMI      |         |      1 |      1 |      3 |    1950 |   1665 |
|*  3 |    TABLE ACCESS FULL  | T1      |      1 |   4908 |   4000 |    1670 |   1665 |
|   4 |    VIEW               | VW_SQ_1 |      1 |  11843 |   9894 |     280 |      0 |
|   5 |     UNION-ALL         |         |      1 |        |   9894 |     280 |      0 |
|*  6 |      TABLE ACCESS FULL| T2      |      1 |      1 |      4 |     140 |      0 |
|*  7 |      TABLE ACCESS FULL| T2      |      1 |  11842 |   9890 |     140 |      0 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("VW_COL_1"="T1"."ID1")
   3 - filter("START_DATE">=TO_DATE(' 2012-06-10 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   6 - filter("T2"."ID1"="T2"."ID")
   7 - filter("T2"."STATUS"='COM')

Note
-----
   - dynamic sampling used for this statement (level=2)

Since the above two exists subqueries verify the containment property, let's coalesce them and observe the performance consequences of this transformation.

select
     start_date
    ,count(1)
from t1
where
     start_date >= to_date('10062012','ddmmyyyy')
and (exists (select null
            from t2
            where t2.id1 = t1.id1
            and(t2.status = 'COM'
                or t2.id1 = t2.id)
            )
          )
group by start_date;

----------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows | Buffers | Reads  |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      2 |    1808 |   1665 |
|   1 |  HASH GROUP BY      |      |      1 |      1 |      2 |    1808 |   1665 |
|*  2 |   HASH JOIN SEMI    |      |      1 |     10 |      3 |    1808 |   1665 |
|*  3 |    TABLE ACCESS FULL| T1   |      1 |   4000 |   4000 |    1669 |   1665 |
|*  4 |    TABLE ACCESS FULL| T2   |      1 |   9890 |   9890 |     139 |      0 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T2"."ID1"="T1"."ID1")
   3 - filter("START_DATE">=TO_DATE(' 2012-06-10 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   4 - filter(("T2"."STATUS"='COM' OR "T2"."ID1"="T2"."ID"))

Thanks to this coalescing operation we are now doing only a single full scan of table t2 instead of the initial two full table scans of the same table. As such, we saved around 140 logical I/Os, dropping the total number of buffers from 1950 to 1808.

Coalescing two subqueries of different types (exists and not exists)

Real-life examples can show cases of subqueries verifying the containment property but having different types. An example of such a situation is presented in the query below:

select
     start_date,
     id1,
     n1
from t1
where
     start_date >= to_date('10062012','ddmmyyyy')
and exists     (select null
                from t2 a
                where a.id1 = t1.id1
                )
and not exists (select null
                from t2 b
                where b.id1 = t1.id1
                and   b.id1 != 83
                );

START_DATE               ID1         N1
----------------- ---------- ----------
20120611 00:00:00         83         82

----------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows | Buffers | Reads  |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |    1947 |   1665 |
|*  1 |  HASH JOIN ANTI     |      |      1 |      7 |      1 |    1947 |   1665 |
|*  2 |   HASH JOIN SEMI    |      |      1 |     10 |      3 |    1808 |   1665 |
|*  3 |    TABLE ACCESS FULL| T1   |      1 |   4000 |   4000 |    1669 |   1665 |
|   4 |    TABLE ACCESS FULL| T2   |      1 |  10000 |  10000 |     139 |      0 |
|*  5 |   TABLE ACCESS FULL | T2   |      1 |   9000 |   9000 |     139 |      0 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("B"."ID1"="T1"."ID1")
   2 - access("A"."ID1"="T1"."ID1")
   3 - filter("START_DATE">=TO_DATE(' 2012-06-10 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   5 - filter("B"."ID1"<>83)

The above query contains two distinct subqueries:

exists (select null
        from t2 a
        where a.id1 = t1.id1
        )

not exists (select null
            from t2 b
            where b.id1 = t1.id1
            and   b.id1 != 83
            );

They verify the containment property because if we get rid of the predicate (b.id1 != 83) from the second one it becomes equivalent to the first one. The sole difference is that they are of different types: the first one is an exists subquery while the second one is a not exists subquery. Coalescing such subqueries yields the following query:

select
     start_date,
     id1,
     n1
from t1
where
     start_date >= to_date('10062012','ddmmyyyy')
and exists     (select null
                from t2 a
                where a.id1 = t1.id1
                having sum (case when  a.id1 != 83
                            then 1 else 0 end) = 0
               );
START_DATE               ID1         N1
----------------- ---------- ----------
20120611 00:00:00         83         82

-----------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows | Buffers | Reads  |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |      1 |     557K|   1665 |
|*  1 |  FILTER              |      |      1 |        |      1 |     557K|   1665 |
|*  2 |   TABLE ACCESS FULL  | T1   |      1 |   4000 |   4000 |    1670 |   1665 |
|*  3 |   FILTER             |      |   4000 |        |      1 |     556K|      0 |
|   4 |    SORT AGGREGATE    |      |   4000 |      1 |   4000 |     556K|      0 |
|*  5 |     TABLE ACCESS FULL| T2   |   4000 |   1000 |   3000 |     556K|      0 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( IS NOT NULL)
   2 - filter("START_DATE">=TO_DATE(' 2012-06-10 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   3 - filter(SUM(CASE  WHEN "A"."ID1"<>83 THEN 1 ELSE 0 END )=0)
   5 - filter("A"."ID1"=:B1)

One of the immediate and obvious observations is that the coalescing process produces a dramatic performance degradation, as we went from a clean 1947 logical I/Os to a terrific 557 thousand buffers. This is a clear lesson that eliminating a full table access from the picture does not necessarily give a performance improvement. Sometimes visiting a table twice allows the CBO to generate an optimal execution plan. The collateral effect of the above coalescing process is that the CBO has been unable to unnest the new exists subquery and therefore executes it as a filter operation, even though the most resource-expensive operation is operation 5. The important information in the new suboptimal plan is the terrific number (4000) of executions shown by the Starts column. The operation at line 2 produces 4000 rows, and this dictates the number of times the operation at line 3 (and its child operations at lines 4 and 5) will be executed.

The CBO has been unable to unnest the new coalesced exists subquery because of the presence of the aggregate function (sum), as shown by the corresponding 10053 trace file:

*****************************
Cost-Based Subquery Unnesting
*****************************
SU: Unnesting query blocks in query block SEL$1 (#1) that are valid to unnest.
Subquery Unnesting on query block SEL$1 (#1)SU: Performing unnesting that
does not require costing.
SU: Considering subquery unnest on query block SEL$1 (#1).
SU:   Checking validity of unnesting subquery SEL$2 (#2)
SU:     SU bypassed: Failed aggregate validity checks.
SU:     SU bypassed: Failed basic validity checks.
SU:   Validity checks failed.

Since unnesting is impossible, de-coalescing might represent a good strategy in this case. And since, thanks to the containment property, we can go back from the coalesced query to the original one, we can re-open the unnesting path leading to the original optimal plan we thought we were going to improve by coalescing the two subqueries.

In this particular case where subquery unnesting is impossible, I was thinking of taking advantage of a virtual column to hide the aggregation function used in the coalesced exists subquery. Unfortunately it is impossible to create a virtual column using an aggregate function, as shown below:

alter table t2 add aggr_virt number generated always
                    as (sum(case when id1 != 83 then 1 else 0 end)) virtual;
alter table t2 add aggr_virt number generated always as (sum(case when id1 != 83 then 1 else 0 end)) virtual
ERROR at line 1:
ORA-00934: group function is not allowed here

The Oracle documentation says the following about creating virtual columns

When you define a virtual column, the defining column_expr must refer only to columns of the subject table that have already been defined, in the current statement or in a prior statement.
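Note that the restriction applies only to the aggregate: a virtual column built on a plain row-level expression is perfectly legal, as in this sketch (the column name flag_virt is illustrative):

alter table t2 add flag_virt number generated always
                   as (case when id1 != 83 then 1 else 0 end) virtual;

Such a column, however, does not help here, since the coalesced subquery needs an aggregation over all matching rows and not a per-row flag.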

Conclusion

Coalescing two subqueries of the same type might drastically reduce the number of logical I/Os, as it can eliminate an entire table access. Coalescing two subqueries of different types might prevent the CBO from taking advantage of the unnesting transformation. Fortunately, if you know how to coalesce two different subqueries you will know how to de-coalesce them to allow the CBO to take advantage of unnesting the subqueries into their main query blocks.

Always keep an open eye on the concepts of conjunction and disjunction with subqueries. The CBO is unable to unnest a subquery when it appears in a disjunction, leading therefore to several executions of the subquery as a filter operation applied on the result set of the main query block. The CBO is however able to transform a conjunction of two subqueries of the same type into a single conjunctive subquery and unnest it with its parent query block.


2014 in review


In my 2013 annual blogging review I wished to publish 100 blog articles in 2014. Instead of those hoped-for 100 posts, there have been only 36. In my defence I can point to the list of articles I have published for Oracle OTN, AllThingsOracle and Toad World, summarized here. Anyway, my 2014 blogging activity was not what I expected it to be. I hope that the next year will be more fruitful and stimulating, as there are several challenging issues in front of me to investigate, and all I need is:

  • a reproducible model
  • few tests
  • few tests again with different Oracle releases
  • observations
  • consolidations

And to end up writing a document and submitting it for criticism and review.

Here below is my 2014 blogging activity summary. I wish you a happy new year

Here’s an excerpt:

The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about 31,000 times in 2014. If it were a concert at Sydney Opera House, it would take about 11 sold-out performances for that many people to see it.

Click here to see the complete report.


Approximate_ndv


A couple of months ago I was asked to do a pro-active tuning task on an application running Oracle 11.2.0.3.0 under Linux x86 64-bit, which is going live in production within the next couple of months. I was supplied a 60-minute pre-production AWR report covering a period during which a critical batch job was running. To my surprise the following SQL statement popped up at the top of several SQL Ordered by sections of the AWR report:

select
  /*+ no_parallel(t)
      no_parallel_index(t)
      dbms_stats cursor_sharing_exact
      use_weak_name_resl
      dynamic_sampling(0)
      no_monitoring
      no_substrb_pad
   */
    count(*),
    count(distinct "SABR_ID"),
    sum(sys_op_opnsize("SABR_ID")),
    substrb(dump(min("SABR_ID"), 16, 0, 32), 1, 120),
    substrb(dump(max("SABR_ID"), 16, 0, 32), 1, 120),
    count(distinct "TYPE_ID"),
    sum(sys_op_opnsize("TYPE_ID")),
    substrb(dump(min(substrb("TYPE_ID", 1, 32)), 16, 0, 32), 1, 120),
    substrb(dump(max(substrb("TYPE_ID", 1, 32)), 16, 0, 32), 1, 120)
from
    "XXX"."SOM_ET_ORDER_KS" sample (33.0000000000) t

This is a call from the dbms_stats package collecting statistics for the SOM_ET_ORDER_KS table.
So far so good, right?
Hmmm……
Trained and experienced eyes would already have been caught by two things when observing the above SQL statement:

 count(distinct)
 sample(33.0000000000)

Do you know what those two points mean in an 11gR2 database?
The count(distinct) indicates that the application is collecting statistics with the global preference approximate_ndv set to false, which I immediately checked via this command:

SQL> select dbms_stats.get_prefs ('approximate_ndv') ndv from dual;
NDV
------
FALSE

While the sample(33.0000000000) clause indicates that the estimate_percent parameter used by this application when collecting statistics is set to 33%:

     (estimate_percent => 33)

Is this the best way of collecting statistics?
I don’t think so.
Let’s build a proof of concept. Below is the table on which I will be collecting statistics with and without approximate_ndv

create table t_ndv
as select
      rownum n1
     ,trunc((rownum-1)/3) n2
     ,trunc(dbms_random.value(1,100)) n3
     ,dbms_random.string('L',dbms_random.value(1,5))||rownum v4
  from dual
  connect by level <=1e6;

There are 1 million rows in this table, with 1 million distinct n1 values and 1 million distinct v4 values. There are however 333,334 distinct n2 values and only 99 distinct n3 values.
Let's check the Oracle behaviour when approximate_ndv is disabled. All the following tests have been done on an Oracle 12c database instance:


BANNER
---------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
PL/SQL Release 12.1.0.1.0 - Production
CORE    12.1.0.1.0      Production
TNS for 64-bit Windows: Version 12.1.0.1.0 - Production
NLSRTL Version 12.1.0.1.0 - Production
SQL> select dbms_stats.get_prefs ('approximate_ndv') ndv from dual;

NDV
-----
FALSE

SQL> alter session set events '10046 trace name context forever, level 12';

SQL> begin
  2      dbms_stats.gather_table_stats
  3      (user
  4      ,'t_ndv'
  5      ,method_opt => 'FOR ALL COLUMNS SIZE 1'
  6      ,estimate_percent => 57);
  7  end;
 8  /

SQL> alter session set events '10046 trace name context off';

The corresponding tkprof'ed trace file shows a piece of code which is, as expected, similar to the SQL statement I found at the top of the SQL Ordered by sections of the AWR report I was asked to analyse:

select /*+  no_parallel(t)
            no_parallel_index(t)
            dbms_stats
            cursor_sharing_exact
            use_weak_name_resl
            dynamic_sampling(0)
            no_monitoring
            xmlindex_sel_idx_tbl
           no_substrb_pad
       */
  count(*)
, count("N1")
, count(distinct "N1")
, sum(sys_op_opnsize("N1"))
, substrb(dump(min("N1"),16,0,64),1,240)
, substrb(dump(max("N1"),16,0,64),1,240)
, count("N2")
, count(distinct "N2"), sum(sys_op_opnsize("N2"))
 …etc...
from
 "C##MHOURI"."T_NDV" sample (57.0000000000)  t

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      8.03       8.03          0       3861          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      8.03       8.04          0       3861          0           1

And the collected statistics information on t_ndv table is

SQL> select num_rows, blocks, avg_row_len, sample_size
     from user_tables
     where table_name = 'T_NDV';

  NUM_ROWS     BLOCKS AVG_ROW_LEN SAMPLE_SIZE
---------- ---------- ----------- -----------
   1000586       3924          23      570334

SQL> select
         column_name
        ,num_distinct
        ,sample_size
    from user_tab_columns
    where table_name = 'T_NDV';

COLUMN_NAME          NUM_DISTINCT SAMPLE_SIZE
-------------------- ------------ -----------
V4                        1001467      570836
N3                             99      570836
N2                         333484      570836
N1                        1001467      570836

When I did the same experiment but with approximate_ndv set to true, this is what I obtained:

SQL> exec dbms_stats.set_global_prefs('approximate_ndv','TRUE');

SQL> exec dbms_stats.delete_table_stats(user ,'t_ndv');

SQL> alter session set events '10046 trace name context forever, level 12';

SQL> begin
  2      dbms_stats.gather_table_stats
  3      (user
  4      ,'t_ndv'
  5      ,method_opt       => 'FOR ALL COLUMNS SIZE 1'
         ,estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE
  6     );
  7  end;
  8  /

SQL> alter session set events '10046 trace name context off';

And finally the corresponding trace file, particularly the part that corresponds to gathering the number of distinct values:

select /*+  full(t)
            no_parallel(t)
            no_parallel_index(t)
            dbms_stats
            cursor_sharing_exact
            use_weak_name_resl
            dynamic_sampling(0)
            no_monitoring
            xmlindex_sel_idx_tbl
            no_substrb_pad  */
  to_char(count("N1")),
  to_char(substrb(dump(min("N1"),16,0,64),1,240)),
  to_char(substrb(dump(max("N1"),16,0,64),1,240)),
  to_char(count("N2")),
  to_char(substrb(dump(min("N2"),16,0,64),1,240)),
  to_char(substrb(dump(max("N2"),16,0,64),1,240)),
  to_char(count("N3")),
  to_char(substrb(dump(min("N3"),16,0,64),1,240)),
  to_char(substrb(dump(max("N3"),16,0,64),1,240)),
  to_char(count("V4")),
  to_char(substrb(dump(min("V4"),16,0,64),1,240)),
  to_char(substrb(dump(max("V4"),16,0,64),1,240))
from
 "C##MHOURI"."T_NDV" t  /* NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL*/

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      2.85       2.88          0       3861          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Total        3      2.87       2.88          0       3861          0           1

SQL> select num_rows, blocks, avg_row_len, sample_size
     from user_tables
    where table_name = 'T_NDV';

  NUM_ROWS     BLOCKS AVG_ROW_LEN SAMPLE_SIZE
---------- ---------- ----------- -----------
   1000000       3924          23     1000000

SQL> select
        column_name
       ,num_distinct
       ,sample_size
    from user_tab_columns
    where table_name = 'T_NDV';

COLUMN_NAME          NUM_DISTINCT SAMPLE_SIZE
-------------------- ------------ -----------
V4                         990208     1000000
N3                             99     1000000
N2                         337344     1000000
N1                        1000000     1000000

Several important remarks can be emphasized here. First, the elapsed time (which is almost entirely CPU time) has been reduced from 8 seconds to less than 3 seconds. Second, the sample size used automatically by Oracle is 100% (in response to the dbms_stats.auto_sample_size parameter I set during my second statistics gathering attempt), which has the consequence of computing the exact number of rows (num_rows) actually present in the t_ndv table (1,000,000).

With less time and less CPU consumption, Oracle, with the approximate_ndv preference set to true and estimate_percent set to dbms_stats.auto_sample_size, produced a perfect estimation when compared to the time and resources it needed when the same preference was set to false.

When approximate_ndv is enabled Oracle is fast and accurate. If you compare the SQL code used when this preference is disabled to the corresponding SQL when it is enabled, you will notice that in the latter case (enabled) there is no call anymore to the costly count(distinct) function. There is instead a strange set of NDV, NIL, NIL markers, as shown below:

from
 "C##MHOURI"."T_NDV" t  /* NDV,NIL,NIL
                          ,NDV,NIL,NIL
                          ,NDV,NIL,NIL
                          ,NDV,NIL,NIL
                         */

which I have intentionally arranged to show you that the number of NDV, NIL, NIL entries coincides with the number of columns (4) of the t_ndv table:

SQL> desc t_ndv
     Name      Type
-------------- ------------------
N1             NUMBER
N2             NUMBER
N3             NUMBER
V4             VARCHAR2(4000)

And immediately after this piece of code the trace file shows the following SQL statement:

SELECT /*+ parallel */ TO_NUMBER(EXTRACTVALUE(VALUE(T),
  '/select_list_item/pos') + 1) POS, EXTRACTVALUE(VALUE(T),
  '/select_list_item/value') VAL, TO_NUMBER(EXTRACTVALUE(VALUE(T),
  '/select_list_item/nonnulls')) NONNULLS, TO_NUMBER(EXTRACTVALUE(VALUE(T),
  '/select_list_item/ndv')) NDV, TO_NUMBER(EXTRACTVALUE(VALUE(T),
  '/select_list_item/split')) SPLIT, TO_NUMBER(EXTRACTVALUE(VALUE(T),
  '/select_list_item/rsize')) RSIZE, TO_NUMBER(EXTRACTVALUE(VALUE(T),
  '/select_list_item/rowcnt')) ROWCNT, TO_NUMBER(EXTRACTVALUE(VALUE(T),
  '/select_list_item/topncnt')) TOPNCNT, EXTRACT(VALUE(T),
  '/select_list_item/topn_values').GETCLOBVAL() TOPN, NULL MINFREQ, NULL
  MAXFREQ, NULL AVGFREQ, NULL STDDEVFREQ
FROM
 TABLE(XMLSEQUENCE(EXTRACT(:B1 , '/process_result/select_list_item'))) T
  ORDER BY TOPNCNT DESC

which means that Oracle internally uses an in-memory XML structure when calculating the number of distinct values of each table column, making this process fast and accurate.

Bottom line: I was not asked to look at the way the local DBAs collect statistics but, being curious about everything I look at when diagnosing performance issues, my attention was caught by this top SQL statement. One would have considered it a pure coincidence and would not have analysed it, while it permitted me to discuss with the local DBAs and to change the way they are collecting statistics.
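For reference, the kind of change discussed with the DBAs boils down to something like the following sketch (schema and table names as reported by the AWR; the exact gathering parameters may of course differ per site):

exec dbms_stats.set_global_prefs('APPROXIMATE_NDV','TRUE');

begin
   dbms_stats.gather_table_stats
        (ownname          => 'XXX'
        ,tabname          => 'SOM_ET_ORDER_KS'
        ,method_opt       => 'FOR ALL COLUMNS SIZE 1'
        ,estimate_percent => dbms_stats.auto_sample_size  -- instead of the hard-coded 33%
        ,cascade          => true);
end;
/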



Resumable time out


I was trying to create a very big table through a create table as select from a join between two huge tables, forcing both parallel DDL and parallel query, and was mainly concerned about the amount of TEMP space the hash join between those two big tables would require to get my table smoothly created. I said hash join because my first attempt to create this table used a nested loop join which had been driving a dramatic 100 million starts of table access by index rowid for more than 10 hours when I decided to kill the underlying session. The create table being a one-shot process, I decided to hint the optimizer so that it would opt for a hash join operation instead of the initial nested loop. But my concern moved from a long running, never finishing SQL statement to how much TEMP space my create table would need in order to complete successfully.
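Something along the lines of the following sketch forces both parallel DDL and a parallel hash join (table and column names are purely illustrative, not the real ones):

alter session enable parallel ddl;

create table big_target
parallel 4 nologging
as
select /*+ full(a) full(b) use_hash(a b) parallel(a 4) parallel(b 4) */
       a.*
      ,b.some_col
from   huge_table_a a
      ,huge_table_b b
where  a.join_key = b.join_key;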

I launched the create table and started monitoring it with Tanel Poder's snapper script, v$active_session_history and the Real Time SQL Monitoring feature (RTSM). This is what the monitoring was showing:

SQL> select
          event
	 ,count(1)
      from
          v$active_session_history
      where sql_id = '0xhm7700rq6tt'
      group by event
      order by 2 desc;


EVENT                                                              COUNT(1)
---------------------------------------------------------------- ----------
direct path read                                                       3202
direct path write temp                                                 1629
                                                                        979
db file scattered read                                                    8
CSS initialization                                                        7
CSS operation: query                                                      1

However, a few minutes later Tanel Poder's snapper started showing the following dominant wait event:

SQL> @snapper ash 5 1 all

----------------------------------------------------------------------------------------------------
Active% | INST | SQL_ID          | SQL_CHILD | EVENT                               | WAIT_CLASS
----------------------------------------------------------------------------------------------------
   800% |    1 | 0xhm7700rq6tt   | 0         | statement suspended, wait error to  | Configuration

--  End of ASH snap 1, end=2015-02-12 09:12:57, seconds=5, samples_taken=45

An event which kept incrementing in v$active_session_history, as shown below:

EVENT                                                              COUNT(1)
---------------------------------------------------------------- ----------
statement suspended, wait error to be cleared                          7400
direct path read                                                       3202
direct path write temp                                                 1629
                                                                        979
db file scattered read                                                    8
CSS initialization                                                        7
CSS operation: query                                                      1

7 rows selected.

SQL> /

EVENT                                                              COUNT(1)
---------------------------------------------------------------- ----------
statement suspended, wait error to be cleared                          7440
direct path read                                                       3202
direct path write temp                                                 1629
                                                                        979
db file scattered read                                                    8
CSS initialization                                                        7
CSS operation: query                                                      1

7 rows selected.

SQL> /

EVENT                                                              COUNT(1)
---------------------------------------------------------------- ----------
statement suspended, wait error to be cleared                          7480
direct path read                                                       3202
direct path write temp                                                 1629
                                                                        979
db file scattered read                                                    8
CSS initialization                                                        7
CSS operation: query                                                      1

EVENT                                                              COUNT(1)
---------------------------------------------------------------- ----------
statement suspended, wait error to be cleared                         11943
direct path read                                                       3202
direct path write temp                                                 1629
                                                                        980
db file scattered read                                                    8
CSS initialization                                                        7
CSS operation: query                                                      1

The only wait event that kept incrementing was that bizarre statement suspended, wait error to be cleared. The session from which I launched the create table statement was hanging without reporting any error.

Since I was initially concerned about the TEMP space, I checked the available space there:

SQL> select
       tablespace_name,
       total_blocks,
       used_blocks,
       free_blocks
     from v$sort_segment;

TABLESPACE_NAME                 TOTAL_BLOCKS USED_BLOCKS FREE_BLOCKS
------------------------------- ------------ ----------- -----------
TEMP                                 8191744     8191744           0

No free space left in the TEMP tablespace.
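A hedged sketch to see which sessions and SQL statements are consuming that TEMP space (the MB arithmetic assumes an 8K block size):

select
     s.sid
    ,u.sql_id
    ,u.tablespace
    ,round(u.blocks * 8192 / 1024 / 1024) as mb_used   -- 8K block size assumed
from
     v$tempseg_usage u
    ,v$session       s
where u.session_addr = s.saddr
order by u.blocks desc;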

The RTSM of the corresponding sql_id was showing the following

=====================================================================================
| Id    |              Operation              |        Name        |  Rows   | Temp |
|       |                                     |                    |(Actual) |      |
=====================================================================================
|     0 | CREATE TABLE STATEMENT              |                    |       0 |      |
|     1 |   PX COORDINATOR                    |                    |         |      |
|     2 |    PX SEND QC (RANDOM)              | :TQ10003           |         |      |
|     3 |     LOAD AS SELECT                  |                    |         |      |
|     4 |      HASH GROUP BY                  |                    |         |      |
|     5 |       PX RECEIVE                    |                    |         |      |
|     6 |        PX SEND HASH                 | :TQ10002           |         |      |
|  -> 7 |         HASH JOIN                   |                    |       0 |  66G | --> spot this
|       |                                     |                    |         |      |
|       |                                     |                    |         |      |
|       |                                     |                    |         |      |
|     8 |          PX RECEIVE                 |                    |    626M |      |
|     9 |           PX SEND BROADCAST         | :TQ10001           |    626M |      |
|    10 |            HASH JOIN                |                    |     78M |      |
|    11 |             PX RECEIVE              |                    |    372K |      |
|    12 |              PX SEND BROADCAST      | :TQ10000           |    372K |      |
|    13 |               PX BLOCK ITERATOR     |                    |   46481 |      |
|    14 |                INDEX FAST FULL SCAN | INDX_12453656_TABL |   46481 |      |
|    15 |             PX BLOCK ITERATOR       |                    |     88M |      |
|    16 |              TABLE ACCESS FULL      | TABL1              |     88M |      |
|       |                                     |                    |         |      |
| -> 17 |          PX BLOCK ITERATOR          |                    |    682M |      |
| -> 18 |           TABLE ACCESS FULL         | TABLE123456        |    682M |      |
|       |                                     |                    |         |      |
=====================================================================================

The HASH JOIN operation at line 7 has already gone above the available TEMP tablespace size, which is 64G. I was wondering then why my session had not been stopped with an ORA-01652 error instead of hanging there and letting the statement suspended, wait error to be cleared wait event pop up in my different monitoring tools.

After a few minutes of googling I ended up executing the following query:

SQL>select
        coord_session_id
       ,substr(sql_text, 1,10)
       ,error_msg
    from dba_resumable;

SESSION_ID  SQL        ERROR_MSG
----------- ---------  -------------------------------------------------------------------
146         create /*+  ORA-01652: unable to extend temp segment by 128 in tablespace TEMP

This SID 146 is my create table session! It is in error but hanging, waiting on that statement suspended wait event.

After a few more minutes of googling I ended up checking the following parameter:

SQL> sho parameter resumable

NAME                           TYPE                             VALUE
------------------------------ -------------------------------- ------
resumable_timeout              integer                          9000

Notice how I have finally made the link between the title and the purpose of this article: resumable_timeout. The Oracle documentation states that when this parameter is set to a nonzero value, resumable space allocation kicks in.

Put simply, what happened in my case is that, since this parameter had been set to 9000 (seconds) and since my create table as select encountered an ORA-01652 temp tablespace error, Oracle decided to put my session in a resumable state until someone noticed the space shortage, corrected the situation and allowed the session to continue its normal processing.

The next step I did was to abort my session's resumable state:

SQL> exec dbms_resumable.abort(146);

PL/SQL procedure successfully completed.

Unfortunately I waited a couple of minutes in front of my sqlplus session waiting for it to resume, without success, so I decided to use brute force:

ERROR:
ORA-03113: end-of-file on communication channel
Process ID: 690975
Session ID: 146 Serial number: 3837

Summary
I would suggest not setting the resumable_timeout parameter to a nonzero value. Otherwise you will be hanging on a 'statement suspended, wait error to be cleared' event for the amount of time indicated by this parameter, looking around for what is making your SQL statement hang.

Footnote
Here below are a few extra words from the Oracle documentation about resumable space allocation:

“Resumable space allocation avoids headaches and saves time by suspending, instead
of terminating, a large database operation requiring more disk space than is currently
available. While the operation is suspended, you can allocate more disk space on
the destination tablespace or increase the quota for the user. Once the low space
condition is addressed, the large database operation automatically picks up where it
left off.
As you might expect, the statements that resume are known as resumable statements.
The suspended statement, if it is part of a transaction, also suspends the transaction.
When disk space becomes available and the suspended statement resumes, the
transaction can be committed or rolled back whether or not any statements in
the transactions were suspended. The following conditions can trigger resumable
space allocation:
■ Out of disk space in a permanent or temporary tablespace
■ Maximum extents reached on a tablespace
■ User space quota exceeded
You can also control how long a statement can be suspended. The default time
interval is two hours, at which point the statement fails and returns an error message
to the user or application as if the statement was not suspended at all.
There are four general categories of commands that can be resumable:
(1) SELECT statements, (2) DML commands, (3) SQL*Loader operations, and
(4) DDL statements that allocate disk space.”


Parallel Query, Broadcast distribution and TEMP space


When it comes to parallel operations there are fundamentally two data distribution methods between individual parallel (PX) server sets: hash-hash and broadcast distribution. Suppose that you are going to operate a parallel HASH JOIN between two full table scans using a Degree of Parallelism (DOP) of 2. Since the HASH JOIN operation needs two distinct sets of PX servers, there will actually always be 2*DOP = 4 slaves acting in parallel, that is two slaves per PX set: (PX1, PX2) for the first set and (PX3, PX4) for the second set. Each table in the join is read in parallel by one of the PX server sets. When a PX server set has finished collecting its data set it needs to distribute it to the subsequent parallel HASH JOIN operation. This data distribution is commonly done using either a hash-hash or a broadcast distribution. With the former method (hash-hash), the result sets gathered by the parallel scans of the build and the probe tables in the join are both distributed, by hashing the join key, to the parallel server set responsible for the hash join operation. With a BROADCAST distribution, instead of distributing rows from both result sets, Oracle sends the smaller result set in its entirety to every parallel server slave of the set responsible for the subsequent parallel hash join operation.

I bolded the words smaller result set to emphasize that smaller is relative to the other row source in the join. Do you consider a build table of 78 million rows to be a small data set? It might be considered smaller in the eyes of the Oracle optimizer when the probe table is estimated to generate 770 million rows. This is exactly what happened to me.

If this concept of parallel data distribution is still not clear to you and you want to understand what I encountered and how I managed to pull myself out of this nightmare, then continuing to read this article might be worth the effort.

I was asked to create a big table based on a join of two other big tables, which was practically impossible to complete serially in an acceptable execution time. I decided, therefore, to create it in parallel. After having enabled parallel DML and forced parallel DDL, I launched the following create table statement with a DOP of 8 for both the DML and the DDL parts:

SQL> create /*+ parallel(8) */
       table table_of_dates
       tablespace dtbs
       pctfree 0
    as
    select
        /*+ parallel(8)
          full(t)
          full(tr)
      */
       t.tr_tabl_id
     , t.table1_date_time
     , t.writing_date
     , min(tr.arrivl_date)
     , max(tr.arrivl_date)
   from
      table1 t
   left outer join table2 tr
   on t.tr_tabl_id = tr.tr_tabl_id
   join table3
   on t.order_id = table3.order_id
   and tr.status not in ('CANCELED')
   where t.writing_date <= to_date('17.06.2011', 'dd.mm.yyyy')
   and table3.order_type = 'Broadcast'
   group by
         t.tr_tabl_id
       , t.table1_date_time
       , t.writing_date;

create /*+ parallel(8) */
*
ERROR at line 1:
ORA-12801: error signaled in parallel query server P013
ORA-01652: unable to extend temp segment by 128 in tablespace TEMP

As you can notice, it failed with an ORA-01652 error. Below is the corresponding Real Time SQL Monitoring report showing that the HASH JOIN operation at line 7 reached 67G, which is beyond the size of the current physical TEMP tablespace (64G), hence the ORA-01652 error.

Parallel Execution Details (DOP=8, Servers Allocated=16)
SQL Plan Monitoring Details (Plan Hash Value=3645515647)
=====================================================================================
| Id |              Operation              |        Name | Execs |   Rows   | Temp  |
|    |                                     |             |       | (Actual) | (Max) |
=====================================================================================
|  0 | CREATE TABLE STATEMENT              |             |    17 |          |       |
|  1 |   PX COORDINATOR                    |             |    17 |          |       |
|  2 |    PX SEND QC (RANDOM)              | :TQ10003    |       |          |       |
|  3 |     LOAD AS SELECT                  |             |       |          |       |
|  4 |      HASH GROUP BY                  |             |       |          |       |
|  5 |       PX RECEIVE                    |             |       |          |       |
|  6 |        PX SEND HASH                 | :TQ10002    |     8 |          |       |
|  7 |         HASH JOIN                   |             |     8 |        0 |   67G |
|  8 |          PX RECEIVE                 |             |     8 |     626M |       |
|  9 |           PX SEND BROADCAST         | :TQ10001    |     8 |     626M |       |
| 10 |            HASH JOIN                |             |     8 |      78M |       |
| 11 |             PX RECEIVE              |             |     8 |     372K |       |
| 12 |              PX SEND BROADCAST      | :TQ10000    |     8 |     372K |       |
| 13 |               PX BLOCK ITERATOR     |             |     8 |    46481 |       |
| 14 |                INDEX FAST FULL SCAN | IDX_TABLE3_3|   182 |    46481 |       |
| 15 |             PX BLOCK ITERATOR       |             |     8 |      88M |       |
| 16 |              TABLE ACCESS FULL      | TABLE1      |   120 |      88M |       |
| 17 |          PX BLOCK ITERATOR          |             |     8 |     717M |       |
| 18 |           TABLE ACCESS FULL         | TABLE2      |   233 |     717M |       |
=====================================================================================
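While such a statement is running, its TEMP consumption can also be tracked outside the monitoring report. Here is a minimal, purely illustrative sketch (the views and columns are standard, but adapt the filters to your environment):

-- TEMP space currently allocated, per session and temporary tablespace
select s.sid, s.sql_id, u.tablespace,
       round(sum(u.blocks) * max(t.block_size) / 1024 / 1024 / 1024, 1) as temp_gb
from   v$tempseg_usage u
join   v$session       s on s.saddr = u.session_addr
join   dba_tablespaces t on t.tablespace_name = u.tablespace
group  by s.sid, s.sql_id, u.tablespace
order  by temp_gb desc;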

I spent a few minutes looking at the above execution plan and finally decided to try a second create table with a reduced degree of parallelism (4 instead of 8); here is what I got:

SQL> create /*+ parallel(4) */
       table table_of_dates
       tablespace dtbs
       pctfree 0
    as
    select
        /*+ parallel(4)
         full(t)
         full(tr)
      */
       t.tr_tabl_id
     , t.table1_date_time
     , t.writing_date
     , min(tr.arrivl_date)
     , max(tr.arrivl_date)
   from
      table1 t
   left outer join table2 tr
   on t.tr_tabl_id = tr.tr_tabl_id
   join table3
   on t.order_id = table3.order_id
   and tr.status not in ('CANCELED')
   where t.writing_date <= to_date('17.06.2011', 'dd.mm.yyyy')
   and table3.order_type = 'Broadcast'
   group by
         t.tr_tabl_id
       , t.table1_date_time
       , t.writing_date;

Table created.
Elapsed: 00:31:42.29

This time the table has been successfully created, in approximately 32 minutes.

Before going to the next issue in the pipe, I wanted to understand why reducing the Degree Of Parallelism (DOP) from 8 to 4 made the create statement succeed. The obvious thing to attempt was to compare the DOP 8 execution plan with the DOP 4 one. The first plan has already been shown above. The second one is presented below (reduced to the information that is vital to the aim of this article):

Parallel Execution Details (DOP=4 , Servers Allocated=8)
SQL Plan Monitoring Details (Plan Hash Value=326881411)
=============================================================================================
| Id |                 Operation                  |       Name     | Execs |   Rows   |Temp |
|    |                                            |                |       | (Actual) |(Max)|
=============================================================================================
|  0 | CREATE TABLE STATEMENT                     |                |     9 |        4 |     |
|  1 |   PX COORDINATOR                           |                |     9 |        4 |     |
|  2 |    PX SEND QC (RANDOM)                     | :TQ10003       |     4 |        4 |     |
|  3 |     LOAD AS SELECT                         |                |     4 |        4 |     |
|  4 |      HASH GROUP BY                         |                |     4 |      75M |     |
|  5 |       PX RECEIVE                           |                |     4 |     168M |     |
|  6 |        PX SEND HASH                        | :TQ10002       |     4 |     168M |     |
|  7 |         HASH JOIN                          |                |     4 |     168M | 34G |
|  8 |          PX RECEIVE                        |                |     4 |     313M |     |
|  9 |           PX SEND BROADCAST                | :TQ10001       |     4 |     313M |     |
| 10 |            HASH JOIN                       |                |     4 |      78M |     |
| 11 |             BUFFER SORT                    |                |     4 |     186K |     |
| 12 |              PX RECEIVE                    |                |     4 |     186K |     |
| 13 |               PX SEND BROADCAST            | :TQ10000       |     1 |     186K |     |
| 14 |                TABLE ACCESS BY INDEX ROWID | TABLE3         |     1 |    46481 |     |
| 15 |                 INDEX RANGE SCAN           | IDX_ORDER_TYPE |     1 |    46481 |     |
| 16 |             PX BLOCK ITERATOR              |                |     4 |      88M |     |
| 17 |              TABLE ACCESS FULL             | TABLE1         |   115 |      88M |     |
| 18 |          PX BLOCK ITERATOR                 |                |     4 |     770M |     |
| 19 |           TABLE ACCESS FULL                | TABLE2         |   256 |     770M |     |
=============================================================================================

They don't share exactly the same execution plan (they have two different plan hash values). The irritating question was: why, by halving the degree of parallelism from 8 to 4, did the same SQL statement need almost half the amount of TEMP space (34G) and complete successfully?

The answer resides in the parallel distribution (PQ Distrib) method used by the parallel server sets to distribute their collected rows to the subsequent parallel server set (the one doing the hash join).

In the DOP 4 execution plan above we can see that we have 4 PX parallel server sets, each one responsible for filling up one of the virtual TQ tables: TQ10000, TQ10001, TQ10002 and TQ10003. Here is how to read the above execution plan:

  1.  The first PX1 set of slaves reads TABLE3 and broadcasts its data set to the second PX2 set of slaves through the TQ10000 virtual table.
  2.  The PX2 set reads TABLE1, hash joins it with the data set it has received (TQ10000) from the PX1 set and broadcasts its result to the next
      parallel server set, PX3, via the second virtual table TQ10001.
  3.  The PX3 set of parallel slaves probes TABLE2 by parallel full scanning it and hash joins it with the build result set (TQ10001) it has
      received from the PX2 set. This operation fills the third virtual TQ table, TQ10002, and sends it to the next and last PX server set, PX4,
      using a hash distribution.
  4.  Finally, the PX4 set of slaves receives the TQ10002 data, hash groups it, fills the last virtual table (TQ10003) and sends it to the query
      coordinator (QC), which ends up creating the table_of_dates table.

That is a simple way of reading a parallel execution plan. By listing the above parallel operations I aimed to emphasize that the data produced by the PX2 set of slaves (HASH JOIN operation at line 10) is broadcast (PX SEND BROADCAST operation at line 9) to the next parallel set, PX3. This means that each slave of the PX2 set passes every row it has produced to every slave of the PX3 set. A typical reason for using such a distribution method is that the data set of the first row source (78M rows in this case) is "smaller" than that of the second row source it is joined with (770M rows).
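If you want to verify how many rows each PX slave actually produced and received through each table queue, v$pq_tqstat can be queried right after the parallel statement completes; a minimal sketch (it must be run from the very session that executed the parallel statement):

-- Rows produced and consumed per table queue (TQ) and per PX slave
select dfo_number, tq_id, server_type, process, num_rows, bytes
from   v$pq_tqstat
order  by dfo_number, tq_id, server_type desc, process;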

In order to make the picture clear, let's zoom in on the hash join operation in the DOP 4 plan:

Parallel Execution Details (DOP=4 , Servers Allocated=8)
======================================================================
| Id |        Operation    |       Name     | Execs |   Rows   |Temp |
|    |                     |                |       | (Actual) |(Max)|
======================================================================
|  7 |HASH JOIN            |                |     4 |     168M | 34G |
|  8 | PX RECEIVE          |                |     4 |     313M |     |
|  9 |  PX SEND BROADCAST  | :TQ10001       |     4 |     313M |     |
| 10 |   HASH JOIN         |                |     4 |      78M |     |
| 18 | PX BLOCK ITERATOR   |                |     4 |     770M |     |
| 19 |  TABLE ACCESS FULL  | TABLE2         |   256 |     770M |     |
======================================================================

The "smaller" result set produced by the PX2 set of slaves at line 10 is 78M rows. Since Oracle decided to broadcast those 78M rows towards the PX3 set, the data set has been duplicated as many times as there are slaves in the PX3 set receiving the broadcast data. Since we have a DOP of 4, this means that the HASH JOIN operation at line 7 has received 4*78M ≈ 313M rows (operation at line 8), which have to be built and hashed in memory before probing the result set produced by the second parallel server set via operations 18-19.

Multiply the DOP by 2, keep the same parallel distribution method, and the same HASH JOIN operation will have to build and hash a table of 8*78M ≈ 626M rows, which ultimately required more than 67GB of TEMP space (instead of the 34GB of the DOP 4 run), as shown below:

Parallel Execution Details (DOP=8, Servers Allocated=16)
===================================================================
| Id |     Operation     |        Name | Execs |   Rows   | Temp  |
|    |                   |             |       | (Actual) | (Max) |
===================================================================
|  7 |HASH JOIN          |             |     8 |        0 |   67G |
|  8 | PX RECEIVE        |             |     8 |     626M |       |
|  9 |  PX SEND BROADCAST| :TQ10001    |     8 |     626M |       |
| 10 |   HASH JOIN       |             |     8 |      78M |       |
| 17 | PX BLOCK ITERATOR |             |     8 |     717M |       |
| 18 |  TABLE ACCESS FULL| TABLE2      |   233 |     717M |       |
===================================================================

Now that I know where the initial ORA-01652 error comes from, changing the parallel distribution method from BROADCAST to HASH might do the job without requiring more than 64GB of TEMP space. A simple way to accomplish this is to hint the parallel select with the desired distribution method. From the outline of the successful DOP 4 execution plan I took the appropriate hint (pq_distribute(alias hash hash)), adapted it and issued the following create table:

SQL> create /*+ parallel(8) */
       table table_of_dates
       tablespace dtbs
       pctfree 0
    as
    select
        /*+ parallel(8)
          full(t)
          full(tr)
	  pq_distribute(@"SEL$E07F6F7C" "T"@"SEL$2" hash hash)
          px_join_filter(@"SEL$E07F6F7C" "T"@"SEL$2")
          pq_distribute(@"SEL$E07F6F7C" "TR"@"SEL$1" hash hash)
          px_join_filter(@"SEL$E07F6F7C" "TR"@"SEL$1")
      */
       t.tr_tabl_id
     , t.table1_date_time
     , t.writing_date
     , min(tr.arrivl_date)
     , max(tr.arrivl_date)
   from
      table1 t
   left outer join table2 tr
   on t.tr_tabl_id = tr.tr_tabl_id
   join table3
   on t.order_id = table3.order_id
   and tr.status not in ('CANCELED')
   where t.writing_date <= to_date('17.06.2011', 'dd.mm.yyyy')
   and table3.order_type = 'Broadcast'
   group by
         t.tr_tabl_id
       , t.table1_date_time
       , t.writing_date;

Table created.
Elapsed: 00:12:46.33

And the job has been done in less than 13 minutes instead of the roughly 32 minutes taken with DOP 4.
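As a side note, the outline from which the pq_distribute and px_join_filter hints above were taken can be pulled out of the cursor cache with dbms_xplan; a minimal sketch, where &sql_id stands for the sql_id of the DOP 4 statement:

-- Display the plan of a cached cursor together with its outline, i.e. the
-- full set of hints that reproduces that plan
select *
from   table(dbms_xplan.display_cursor(
               sql_id          => '&sql_id',
               cursor_child_no => 0,
               format          => 'TYPICAL +OUTLINE'));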

The new execution plan is:

SQL Plan Monitoring Details (Plan Hash Value=5257928)
====================================================================================
| Id |              Operation              |        Name |Execs |   Rows   | Temp  |
|    |                                     |             |      | (Actual) | (Max) |
====================================================================================
|  0 | CREATE TABLE STATEMENT              |             |   17 |        8 |       |
|  1 |   PX COORDINATOR                    |             |   17 |        8 |       |
|  2 |    PX SEND QC (RANDOM)              | :TQ10004    |    8 |        8 |       |
|  3 |     LOAD AS SELECT                  |             |    8 |        8 |       |
|  4 |      HASH GROUP BY                  |             |    8 |      75M |   10G |
|  5 |       HASH JOIN                     |             |    8 |     168M |       |
|  6 |        JOIN FILTER CREATE           | :BF0000     |    8 |      78M |       |
|  7 |         PX RECEIVE                  |             |    8 |      78M |       |
|  8 |          PX SEND HASH               | :TQ10002    |    8 |      78M |       |
|  9 |           HASH JOIN BUFFERED        |             |    8 |      78M |    3G |
| 10 |            JOIN FILTER CREATE       | :BF0001     |    8 |    46481 |       |
| 11 |             PX RECEIVE              |             |    8 |    46481 |       |
| 12 |              PX SEND HASH           | :TQ10000    |    8 |    46481 |       |
| 13 |               PX BLOCK ITERATOR     |             |    8 |    46481 |       |
| 14 |                INDEX FAST FULL SCAN | IDX_TABLE3_3|  181 |    46481 |       |
| 15 |            PX RECEIVE               |             |    8 |      88M |       |
| 16 |             PX SEND HASH            | :TQ10001    |    8 |      88M |       |
| 17 |              JOIN FILTER USE        | :BF0001     |    8 |      88M |       |
| 18 |               PX BLOCK ITERATOR     |             |    8 |      88M |       |
| 19 |                TABLE ACCESS FULL    | TABLE1      |  121 |      88M |       |
| 20 |        PX RECEIVE                   |             |    8 |     770M |       |
| 21 |         PX SEND HASH                | :TQ10003    |    8 |     770M |       |
| 22 |          JOIN FILTER USE            | :BF0000     |    8 |     770M |       |
| 23 |           PX BLOCK ITERATOR         |             |    8 |     770M |       |
| 24 |            TABLE ACCESS FULL        | TABLE2      |  245 |     770M |       |
====================================================================================

Changing the parallel distribution method from broadcast to hash-hash has not only reduced the TEMP space usage but has halved the execution time of the create table.

Conclusion
Watch carefully the degree of parallelism you are going to use. With a broadcast distribution of a relatively big "smaller result set" you might end up devouring a huge amount of TEMP space and failing with ORA-01652: unable to extend temp segment by 128 in tablespace TEMP. Fortunately, with the arrival of the 12c adaptive HYBRID HASH parallel distribution method, the STATISTICS COLLECTOR operation placed below the distribution step operates a dynamic switch from a BROADCAST distribution to a HASH distribution whenever the number of rows to be distributed exceeds a threshold of 2*DOP. With my initial DOP of 8 I would have had a threshold of 16 rows, which is far below the 78 million rows to be distributed further up the plan.


Index compression: two birds with one stone


Have you ever been asked to troubleshoot a performance issue of a complex query under the following restrictions:

  • You are not allowed to change the code of the query because it belongs to third-party software
  • You are not allowed to create a new index because of disk space stress
  • You are supposed to solve the issue without using either an SPM baseline or a SQL Profile

What do you think you still have on your hands to tackle this issue?

I was quite confident that the performance issue was coming, as it almost always does, from poor or non-representative statistics which ultimately biased the optimizer into choosing a wrong execution path. The row source execution plan, taken this time from the Real Time SQL Monitoring report, confirms my initial feeling about non-representative statistics, as shown by the several cardinality estimates of 1 below (Rows Estim):

SQL Plan Monitoring Details (Plan Hash Value=2278065992)
==============================================================================================
| Id |                Operation                 |          Name |  Rows   | Execs |   Rows   |
|    |                                          |               | (Estim) |       | (Actual) |
==============================================================================================
|  0 | SELECT STATEMENT                         |               |         |     1 |     1059 |
|  1 |   HASH GROUP BY                          |               |       1 |     1 |     1059 |
|  2 |    FILTER                                |               |         |     1 |     135K |
|  3 |     NESTED LOOPS                         |               |       1 |     1 |     135K |
|  4 |      NESTED LOOPS                        |               |       1 |     1 |     135K |
|  5 |       NESTED LOOPS OUTER                 |               |       1 |     1 |     135K |
|  6 |        NESTED LOOPS                      |               |       1 |     1 |     135K |
|  7 |         HASH JOIN OUTER                  |               |       1 |     1 |     145K |
|  8 |          NESTED LOOPS                    |               |         |     1 |     145K |
|  9 |           NESTED LOOPS                   |               |       1 |     1 |     145K |
| 10 |            NESTED LOOPS                  |               |       1 |     1 |     146K |
| 11 |             NESTED LOOPS                 |               |       1 |     1 |     146K |
| 12 |              NESTED LOOPS                |               |       1 |     1 |        1 |
| 13 |               NESTED LOOPS               |               |       1 |     1 |        1 |
| 14 |                FAST DUAL                 |               |       1 |     1 |        1 |
| 15 |                FAST DUAL                 |               |       1 |     1 |        1 |
| 16 |               FAST DUAL                  |               |       1 |     1 |        1 |
| 17 |              TABLE ACCESS BY INDEX ROWID | TABLE_XY      |       1 |     1 |     146K |
| 18 |               INDEX RANGE SCAN           | IDX_TABLE_XY23|       1 |     1 |      12M |
==============================================================================================
17 - filter(("AU"."NEW_VALUE"=:SYS_B_119 AND "AU"."ATTRIBUTE_NAME"=:SYS_B_118))
18 - access("AU"."UPDATED_DATE">=TO_DATE(:SYS_B_001||TO_CHAR(EXTRACT(MONTH FROM
              ADD_MONTHS(CURRENT_DATE,(-:SYS_B_000))))||:SYS_B_002||TO_CHAR(EXTRACT(YEAR FROM
              ADD_MONTHS(CURRENT_DATE,(-:SYS_B_000))))||:SYS_B_003,:SYS_B_004)
              AND "AU"."COURSE_NAME"=:SYS_B_117
              AND "AU"."UPDATED_DATE"<=TO_DATE(TO_CHAR(LAST_DAY(ADD_MONTHS(CURRENT_DATE,(-
             :SYS_B_000))),:SYS_B_005)||:SYS_B_006,:SYS_B_007))
     filter("AU"."COURSE_NAME"=:SYS_B_117)

Collecting statistics adequately was of course the right path to follow; however, looking at the above execution plan I realized that the most consuming operation was an index range scan followed by a table access by index rowid. The index operation at line 18 was supplying its parent operation at line 17 with 12 million rows, of which only 146K (barely more than 1%) survived the filter at that line. This is a classical problem of an imprecise index wasting a lot of time and energy throwing away rows that should have been eliminated earlier. There is another indication of the imprecision of the index used at line 18: its predicate part contains both an access and a filter operation. To make the picture clear, here below are the index definition and the predicate part used in the problematic query:

IDX_TABLE_XY23 (UPDATED_DATE, COURSE_NAME)

FROM TABLE_XY     
WHERE AU.COURSE_NAME = 'Point'
AND AU.UPDATED_DATE  >= PidDates.END_DATE

The index has been defined to start with the column on which an inequality is applied. We should instead always place at the leading edge of the index the column on which we intend to apply an equality predicate. One solution would be to reverse the above index columns. But I was not going to do that without checking the whole application for queries using the UPDATED_DATE column in an equality predicate, queries that reversing the index columns would harm (a rough way to get a first list of candidates is sketched below). Fortunately there were no such queries, and the way was paved for me to proceed with the column-reversing proposition.
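For that check, a rough and purely illustrative first pass is to scan the cursor cache for statements referencing the column and then review their predicates by hand (dba_hist_sqltext can widen the search to what AWR has captured):

-- Cursors currently in the shared pool that mention UPDATED_DATE; each
-- candidate still has to be reviewed manually for an equality predicate
select sql_id, substr(sql_text, 1, 80) as sql_text
from   v$sql
where  upper(sql_text) like '%UPDATED_DATE%'
and    parsing_schema_name <> 'SYS';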

A few minutes before going ahead I remembered that this application is suffering from a disk space shortage, so compressing the index would certainly help. Moreover, the "new" leading index column (COURSE_NAME) is by far the more repetitive one, which should give a better level of compressibility:

SQL> select column_name, num_distinct
     from all_tab_col_statistics
     where table_name = 'TABLE_XY'
     and column_name in ('UPDATED_DATE','COURSE_NAME');

COLUMN_NAME                    NUM_DISTINCT
------------------------------ ------------
UPDATED_DATE                   1309016064
COURSE_NAME                    63
SQL> create index idx_TABLE_XYmh on TABLE_XY(COURSE_NAME, UPDATED_DATE) compress 1;
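As a side note, when unsure which compression prefix length to use, a hedged way to let Oracle suggest it is to validate the index structure (preferably on a quiet system, since analyze ... validate structure blocks DML on the underlying table) and read the figures it computes in index_stats:

-- index_stats holds one row, for the last index validated in this session
analyze index idx_TABLE_XYmh validate structure;

select name,
       opt_cmpr_count   as optimal_prefix_length,
       opt_cmpr_pctsave as pct_space_saved
from   index_stats;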

And here are the results

Before the new index

1059 rows selected.
Elapsed: 00:32:37.34

Global Stats
================================================================
| Elapsed |   Cpu   |    IO    | Fetch | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  | Reqs | Bytes |
================================================================
|    2020 |     154 |     1867 |    72 |    10M |   1M |   9GB |
================================================================

After the new index

1059 rows selected.
Elapsed: 00:19:56.08

Global Stats
========================================================
| Elapsed |   Cpu   |    IO    | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) |  Gets  | Reqs | Bytes |
========================================================
|    1204 |      70 |     1134 |     9M | 463K |   4GB |
========================================================

The query execution time dropped from about 33 minutes to about 20 minutes, and so did the logical and physical I/O.

Of course I am still throwing away the same number of rows at the table level, as shown below:

SQL Plan Monitoring Details (Plan Hash Value=2577971998)
===================================================================
| Id    |  Operation                 |          Name   |   Rows   |
|       |                            |                 | (Actual) |
===================================================================
|    17 |TABLE ACCESS BY INDEX ROWID | TABLE_XY        |     146K |
|    18 | INDEX RANGE SCAN           | IDX_TABLE_XYMH  |      12M |
===================================================================
17 - filter(("AU"."NEW_VALUE"=:SYS_B_119 AND "AU"."ATTRIBUTE_NAME"=:SYS_B_118))
18 - access("AU"."COURSE_NAME"=:SYS_B_117 AND
           "AU"."UPDATED_DATE">=TO_DATE(:SYS_B_001||TO_CHAR(EXTRACT(MONTH FROM
            ADD_MONTHS(CURRENT_DATE,(-:SYS_B_000))))||:SYS_B_002||TO_CHAR(EXTRACT(YEAR FROM
            ADD_MONTHS(CURRENT_DATE,(-:SYS_B_000))))||:SYS_B_003,:SYS_B_004) AND
           "AU"."UPDATED_DATE"<=TO_DATE(TO_CHAR(LAST_DAY(ADD_MONTHS(CURRENT_DATE,(-:SYS_B_000)))
           ,:SYS_B_005)||:SYS_B_006,:SYS_B_007))

But I have drastically reduced the part of the index I have to traverse compared with the original column order, and I have got rid of the filter in the index predicate part. And finally, because it is the point of this article's title, by compressing the index I have gained 67GB:

SQL> select segment_name, trunc(bytes/1024/1024/1024) GB
    from dba_segments
    where segment_type = 'INDEX'
    and segment_name in ('IDX_TABLE_XYMH','IDX_TABLE_XY23');

SEGMENT_NAME                           GB
------------------------------ ----------
IDX_TABLE_XYMH                        103
IDX_TABLE_XY23                        170

 


Diagnosing the past


This is a simple note about diagnosing a situation that happened in the past.

A running application suffered delays in many of its multi-user insert statements, which were blocked on an enq: TM - contention wait event.


The on-call DBA had just killed the culprit blocking session when I received an e-mail asking me to investigate the root cause of this lock. Since this was a matter of diagnosing a very recent past, using v$active_session_history imposed itself:

select sql_id,event, count(1)
from v$active_session_history
where sample_time > to_date('05032015 10:00:00','ddmmrrrr hh24:mi:ss')
and   sample_time < to_date('05032015 11:00:00','ddmmrrrr hh24:mi:ss')
and event like '%TM%'
group by sql_id,event
order by 3 desc;

3fbwp7qdqxk9v     enq: TM - contention    1

Surprisingly there was only 1 recorded enq: TM - contention wait event during the exact period of the blocked insert statements; and the corresponding sql_id had nothing to do with the blocked inserts.

I spent a few minutes staring at the above select and started thinking about the basics, which say:

  • v$active_session_history is a sample of all active sessions taken every second.
  • dba_hist_active_sess_history is a one-in-ten sample of v$active_session_history

I knew as well that, v$active_session_history being an in-memory buffer, its retention period depends on the size of the buffer and the volume of active sessions. But I was diagnosing an almost real-time situation; so why were there no "functional" enq: TM - contention waits in v$active_session_history?

Finally, I decided to use the less precise one-in-ten sample table and this is what I got:

select sql_id,event, count(1)
from dba_hist_active_sess_history
where sample_time > to_date('05032015 10:00:00','ddmmrrrr hh24:mi:ss')
and   sample_time < to_date('05032015 11:00:00','ddmmrrrr hh24:mi:ss')
and event like '%TM%'
group by sql_id,event
order by 3 desc;

53xthsbv8d7yk     enq: TM - contention    4878
1w95zpw2fy021     enq: TM - contention    340
35ghv3bugv22a     enq: TM - contention    264
8b9nqpzs24n0t     enq: TM - contention    163
aqdaq2ybqkrpa     enq: TM - contention    156
50gygyqsha3nr     enq: TM - contention    103
fzfvzhjg0p6y0     enq: TM - contention    82
bs359cfsq4fvc     enq: TM - contention    80
15xpc3um0c3a2     enq: TM - contention    58
d0rndrymh0b18     enq: TM - contention    49
864jbgkbpvcnf     enq: TM - contention    40
9cn21y7hbya46     enq: TM - contention    36
8419w8jnhfa3m     enq: TM - contention    33
f71jbkdy94pph     enq: TM - contention    5
2zpyy8wbnp5d0     enq: TM - contention    3
0d6gq7b9j522p     enq: TM - contention    2

Normally, whatever we can see in dba_hist_active_sess_history has necessarily travelled through v$active_session_history; and the more recent the situation, the greater the chance of finding it in v$active_session_history. Why then had I not found what I was expecting? Before answering this simple question, let me say a few words about how I explained this TM lock and the solution I proposed to get rid of it.

A TM enqueue is almost always related to an unindexed foreign key. The session killed by the DBA was deleting from a parent table (parent_tab). The underlying child table (child_tab) references this parent table via an unindexed foreign key, so the delete takes a table-level lock on child_tab. In the meantime, the application was concurrently inserting into other tables. Those inserts reference values coming from the locked child_tab table (which becomes a parent table in the eyes of the insert statements), and Oracle has to check their existence in the locked child_tab, hence the lock sensation reported by the end users during their inserts. The solution was simply to index the foreign key in child_tab, along the lines sketched below.
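For illustration only (parent_tab, child_tab and the column names are assumptions taken from the description above), the fix boils down to finding the foreign key columns that are not covered by any index and indexing them:

-- Foreign key constraints on CHILD_TAB whose leading column is not covered
-- by any index (simplified single-column check, purely illustrative)
select c.constraint_name, cc.column_name
from   user_constraints  c
join   user_cons_columns cc
       on cc.constraint_name = c.constraint_name
where  c.table_name      = 'CHILD_TAB'
and    c.constraint_type = 'R'
and    not exists (select null
                   from   user_ind_columns ic
                   where  ic.table_name      = c.table_name
                   and    ic.column_name     = cc.column_name
                   and    ic.column_position = 1);

-- Index the foreign key so that the delete on PARENT_TAB no longer blocks
-- the concurrent inserts (the column name is hypothetical)
create index child_tab_fk_i on child_tab (parent_tab_id);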

Back to the reason that prompted me to write this note: why was v$active_session_history not showing the same very recent TM lock as dba_hist_active_sess_history? The answer is simply that I had forgotten that the application runs on a RAC database and I was connected to the other node, so when I issued the same select against the gv$ view the discrepancy between the two views disappeared immediately, as shown below:

SQL> select sql_id,event, count(1)
    from gv$active_session_history
    where sample_time > to_date('05032015 10:00:00','ddmmrrrr hh24:mi:ss')
    and   sample_time < to_date('05032015 11:00:00','ddmmrrrr hh24:mi:ss')
    and event like '%TM%'
    group by sql_id,event
    order by 3 desc;

SQL_ID        EVENT                       COUNT(1)
------------- --------------------------- ---------
53xthsbv8d7yk enq: TM - contention        48483
1w95zpw2fy021 enq: TM - contention        3370
35ghv3bugv22a enq: TM - contention        2635
8b9nqpzs24n0t enq: TM - contention        1660
aqdaq2ybqkrpa enq: TM - contention        1548
50gygyqsha3nr enq: TM - contention        1035
fzfvzhjg0p6y0 enq: TM - contention        821
bs359cfsq4fvc enq: TM - contention        801
15xpc3um0c3a2 enq: TM - contention        585
d0rndrymh0b18 enq: TM - contention        491
864jbgkbpvcnf enq: TM - contention        378
9cn21y7hbya46 enq: TM - contention        366
8419w8jnhfa3m enq: TM - contention        331
f71jbkdy94pph enq: TM - contention        46
2zpyy8wbnp5d0 enq: TM - contention        33
0d6gq7b9j522p enq: TM - contention        15
dmpafdd7anvrw enq: TM - contention        1
3fbwp7qdqxk9v enq: TM - contention        1

select sql_id,event, count(1)
from dba_hist_active_sess_history
where sample_time > to_date('05032015 10:00:00','ddmmrrrr hh24:mi:ss')
and   sample_time < to_date('05032015 11:00:00','ddmmrrrr hh24:mi:ss')
and event like '%TM%'
group by sql_id,event
order by 3 desc;

53xthsbv8d7yk     enq: TM - contention    4878
1w95zpw2fy021     enq: TM - contention    340
35ghv3bugv22a     enq: TM - contention    264
8b9nqpzs24n0t     enq: TM - contention    163
aqdaq2ybqkrpa     enq: TM - contention    156
50gygyqsha3nr     enq: TM - contention    103
fzfvzhjg0p6y0     enq: TM - contention    82
bs359cfsq4fvc     enq: TM - contention    80
15xpc3um0c3a2     enq: TM - contention    58
d0rndrymh0b18     enq: TM - contention    49
864jbgkbpvcnf     enq: TM - contention    40
9cn21y7hbya46     enq: TM - contention    36
8419w8jnhfa3m     enq: TM - contention    33
f71jbkdy94pph     enq: TM - contention    5
2zpyy8wbnp5d0     enq: TM - contention    3
0d6gq7b9j522p     enq: TM - contention    2

Spot, by the way, the neat illustration of the basics mentioned earlier in this article, i.e. dba_hist_active_sess_history being a one-in-ten sample of v$active_session_history:

48483/10 ~ 4878 
3370/10  ~  340
2635/10  ~ 264
1660/10  ~ 163
1548/10  ~ 156

A precise index


Below are a select statement not performing very well and its corresponding row source execution plan:

SQL> select
       {list of columns}
    from
      tx_tables bot
    inner join ty_tables_tmp tmp
     on account_id      = tmp.account_id
     and trade_id       = tmp.trd_id
     where transferred <> 1;
----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                        | Starts | E-Rows | A-Rows | 
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                             |      1 |        |    301K|
|   1 |  NESTED LOOPS                |                             |      1 |        |    301K|
|   2 |   NESTED LOOPS               |                             |      1 |     75 |    301K|
|   3 |    TABLE ACCESS FULL         | TY_TABLES_TMP               |      1 |      2 |      2 |        
|*  4 |    INDEX RANGE SCAN          | TX_TABLES_IDX1              |      2 |  43025 |    301K|
|*  5 |   TABLE ACCESS BY INDEX ROWID| TX_TABLES                   |    301K|     38 |    301K|
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("TRADE_ID"="TMP"."TRD_ID")
   5 - filter(("BOT"."TRANSFERRED"<>1 AND "ACCOUNT_ID"="TMP"."ACCOUNT_ID"))

Statistiques
----------------------------------------------------------
          0  recursive calls
          0  db block gets
     278595  consistent gets
          0  physical reads
          0  redo size
   10597671  bytes sent via SQL*Net to client
     221895  bytes received via SQL*Net from client
      20131  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     301944  rows processed

Let's put aside the inadequate statistics (the CBO estimates 43K rows for the operation at line 4 while it actually generated 301K) and let's try to figure out whether there is a way to avoid starting the TX_TABLES TABLE ACCESS BY INDEX ROWID operation 301K times. The double NESTED LOOPS (known as 11g NLJ batching) is driving here an outer row source of 301K rows (NESTED LOOPS operation at line 2), which therefore starts the inner TABLE ACCESS BY INDEX ROWID operation 301K times (see the Starts column at line 5). If we get rid of the NESTED LOOPS at line 1 we might be able to reduce the number of times the operation at line 5 is started. And maybe, as a consequence of this reduction in starts, we will also decrease the number of corresponding logical I/Os. Disabling the nlj_batching feature can be achieved by using the no_nlj_batching hint as shown below:

SQL> select
       /*+ no_nlj_batching(bot) */
       {list of columns}
     from
      tx_tables bot
    inner join ty_tables_tmp tmp
     on account_id     = tmp.account_id
     and trade_id       = tmp.trd_id
     where transferred <> 1;
----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                        | Starts | E-Rows | A-Rows |  
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                             |      0 |        |      0 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| TX_TABLES                   |      1 |     38 |    301K|
|   2 |   NESTED LOOPS              |                             |      1 |     75 |    301K|
|   3 |    TABLE ACCESS FULL        | TY_TABLES_TMP               |      1 |      2 |      2 |
|*  4 |    INDEX RANGE SCAN         | TX_TABLES_IDX1              |      2 |  43025 |    301K|
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(("BOT"."TRANSFERRED"<>1 AND
       "ACCOUNT_ID"="TMP"."ACCOUNT_ID"))
   4 - access("TRADE_ID"="TMP"."TRD_ID")
Note
-----
   - dynamic sampling used for this statement (level=2)

Statistiques
----------------------------------------------------------
          0  recursive calls
          0  db block gets
     278595  consistent gets
          0  physical reads
          0  redo size
   10597671  bytes sent via SQL*Net to client
     221895  bytes received via SQL*Net from client
      20131  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     301944  rows processed

Although the Starts column is no longer showing those 301K executions, the number of logical I/Os is still exactly the same. The performance issue, in contrast to what I initially thought, is not coming from the nlj_batching feature. As a next step I decided that it was time to look carefully at this query from the index point of view. The above two execution plans have both made use of the TX_TABLES_IDX1 index, defined as shown below:

TX_TABLES_IDX1(TRADE_ID, EXT_TRD_ID)       

There is still room to create a precise index which might help in this case. This index might look like the following one:

SQL> create index TX_TABLES_FBI_IDX2
       on tx_tables
                 (TRADE_ID
                 ,ACCOUNT_ID
                 ,CASE WHEN TRANSFERRED <> 1 THEN -1 ELSE NULL END
                 );

Once created, it allowed the initial query (without any hint) to be honored with the following execution plan:

-----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                         | Starts | E-Rows | A-Rows |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                              |      1 |        |    301K|
|   1 |  NESTED LOOPS                |                              |      1 |     75 |    301K|
|   2 |   TABLE ACCESS FULL          | TY_TABLES_TMP                |      1 |      2 |      2 |
|*  3 |   TABLE ACCESS BY INDEX ROWID| TX_TABLES                    |      2 |     38 |    301K|
|*  4 |    INDEX RANGE SCAN          | TX_TABLES_FBI_IDX2           |      2 |   4141 |    301K|
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("BOT"."TRANSFERRED"<>1)
   4 - access("TRADE_ID"="TMP"."TRD_ID" AND
       "ACCOUNT_ID"="TMP"."ACCOUNT_ID")
Note
-----
   - dynamic sampling used for this statement (level=2)

Statistiques
----------------------------------------------------------
        199  recursive calls
          0  db block gets
     108661  consistent gets
        229  physical reads
          0  redo size
    8394791  bytes sent via SQL*Net to client
     221895  bytes received via SQL*Net from client
      20131  SQL*Net roundtrips to/from client
          7  sorts (memory)
          0  sorts (disk)
     301944  rows processed

Spot how the new index has not only got rid of the nlj_batching double nested loop and reduced the number of operations Oracle has to start, but has also reduced the logical I/O consumption to 108K instead of the initial 278K. However, we still have not changed the predicate part of the query to match exactly the function-based part of the new index (CASE WHEN TRANSFERRED <> 1 THEN -1 ELSE NULL END), which explains why we still have a filter on the TX_TABLES operation at line 3. As always with function-based indexes, the predicate part of the query needs to match the definition of the function-based index expression, which translates into this new query (look at its last line):

SQL> select
       {list of columns}
     from
       tx_tables bot
       inner join ty_tables_tmp tmp
     on account_id     = tmp.account_id
     and trade_id       = tmp.trd_id
     where (case  when transferred <> 1 then -1 else null end)  = -1;

Here is the new resulting execution plan:

-----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                         | Starts | E-Rows | A-Rows |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                              |      1 |        |    301K|
|   1 |  NESTED LOOPS                |                              |      1 |      1 |    301K|
|   2 |   TABLE ACCESS FULL          | TY_TABLES_TMP                |      1 |      1 |      2 |
|   3 |   TABLE ACCESS BY INDEX ROWID| TX_TABLES                    |      2 |     75 |    301K|
|*  4 |    INDEX RANGE SCAN          | TX_TABLES_FBI_IDX2           |      2 |   4141 |    301K|
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("TRADE_ID"="TMP"."TRD_ID"
               AND "ACCOUNT_ID"="TMP"."ACCOUNT_ID"
               AND "BOT"."SYS_NC00060$"=(-1))
Statistiques
----------------------------------------------------
          0  recursive calls
          0  db block gets
     108454  consistent gets
          0  physical reads
          0  redo size
    8394791  bytes sent via SQL*Net to client
     221895  bytes received via SQL*Net from client
      20131  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     301944  rows processed

Notice now that, having matched the predicate part of the query with the function-based index definition, there is no filter anymore on the TX_TABLES table. Although this has not had a noticeable effect this time, it might turn out to be a drastic enhancement in other cases.
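If you need to double-check which expression a function-based index protects (and the hidden virtual column, such as SYS_NC00060$ above, that it is built on), user_ind_expressions gives it directly; a minimal sketch, assuming the index name used above:

-- Expression behind the function-based index
select index_name, column_expression
from   user_ind_expressions
where  index_name = 'TX_TABLES_FBI_IDX2';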

Bottom line: a precise index can help the CBO follow a better path.

