Very recently I was handed the following sub-optimal execution plan of a classic MERGE statement between two tables, t1 and t2:
--------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------
| 0 | MERGE STATEMENT | | 379 | 24635 | 1107 (19)| 00:00:05 | | |
| 1 | MERGE | T1 | | | | | | |
| 2 | VIEW | | | | | | | |
| 3 | SEQUENCE | SEQ_SEQ | | | | | | |
|* 4 | HASH JOIN OUTER | | 379 | 23119 | 1107 (19)| 00:00:05 | | |
| 5 | TABLE ACCESS BY GLOBAL INDEX ROWID| T2 | 372 | 8184 | 5 (0)| 00:00:01 | ROWID | ROWID |
|* 6 | INDEX RANGE SCAN | AUDIT_IND | 383 | | 3 (0)| 00:00:01 | | |
| 7 | PARTITION RANGE ALL | | 5637K| 209M| 1046 (15)| 00:00:05 | 1 | 2 |
| 8 | TABLE ACCESS FULL | T1 | 5637K| 209M| 1046 (15)| 00:00:05 | 1 | 2 |
--------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("L"."COL_SK"(+)="COL_SK")
6 - access("AUDIT_ID"=246395)
According to this execution plan there are only 379 rows to be merged, yet the client was complaining about the 12 seconds this MERGE statement was taking and asked for an improvement. It is quite clear from the supplied information that the CBO is wrongly hash joining 5,637,000 rows from the probed table t1 (operation 8) with 372 rows from the build table t2 (operation 5 and predicate 6), burning a lot of CPU at operation 4 and finally throwing away the majority of the 5,637,000 rows from table t1 that do not fulfil the join condition (predicate 4).
Here is the MERGE statement:
MERGE INTO T1 L USING
(SELECT COL_SK,LOAD_DTS
FROM T2
WHERE AUDIT_ID = 246395) H
ON (L.COL_SK = H.COL_SK)
WHEN NOT MATCHED THEN
INSERT
(COL1,COL_SK,COL2,LOAD_DTS,ORIG,AUDIT_ID)
VALUES
(SEQ_SEQ.NEXTVAL,H.COL_SK,-1,H.LOAD_DTS,'MHO',246395);
Looking carefully at the above statement, I realized that the 372 rows come from this subquery:
(SELECT
COL_SK,
LOAD_DTS
FROM T2
WHERE AUDIT_ID = 246395
) H
There are 372 values of the join column, COL_SK, to be merged. So why has the CBO not started by getting those 372 values from table t2 and then looked up the second table t1 via the join column to satisfy the join condition
ON (L.COL_SK = H.COL_SK)
This kind of execution path looks all the more appropriate when you know that there is, in addition, a unique index on table t1 starting with the join column:
create unique index IND_T1_UK on t1 (col_sk, other_column);
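If you want to double-check which column an index starts with, a quick query against all_ind_columns (adapt the owner filtering to your environment) does the job:

select column_name, column_position
from   all_ind_columns
where  table_name = 'T1'
and    index_name = 'IND_T1_UK'
order  by column_position;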
When I hinted the MERGE statement to use this unique index:
MERGE /*+ index(L IND_T1_UK) */ INTO T1 L USING
(SELECT COL_SK,LOAD_DTS
FROM T2
WHERE AUDIT_ID = 246395) H
ON (L.COL_SK = H.COL_SK)
WHEN NOT MATCHED THEN
INSERT
(COL1,COL_SK,COL2,LOAD_DTS,ORIG,AUDIT_ID)
VALUES
(SEQ_SEQ.NEXTVAL,H.COL_SK,-1,H.LOAD_DTS,'MHO',246395);
I got the optimal plan, the one I wanted the CBO to use without any hint:
----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------
| 0 | MERGE STATEMENT | | 379 | 24635 | 1116 (1)| 00:00:05 | | |
| 1 | MERGE | T1 | | | | | | |
| 2 | VIEW | | | | | | | |
| 3 | SEQUENCE | SEQ_SEQ | | | | | | |
| 4 | NESTED LOOPS OUTER | | 379 | 23119 | 1116 (1)| 00:00:05 | | |
| 5 | TABLE ACCESS BY GLOBAL INDEX ROWID| T2 | 372 | 8184 | 5 (0)| 00:00:01 | ROWID | ROWID |
|* 6 | INDEX RANGE SCAN | AUDIT_IND | 383 | | 3 (0)| 00:00:01 | | |
| 7 | TABLE ACCESS BY GLOBAL INDEX ROWID| T1 | 1 | 39 | 3 (0)| 00:00:01 | ROWID | ROWID |
|* 8 | INDEX RANGE SCAN | IND_T1_UK | 1 | | 2 (0)| 00:00:01 | | |
----------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
6 - access("AUDIT_ID"=246395)
8 - access("L"."COL_SK"(+)="COL_SK")
You can easily see from the above two execution plans that the cost of the NESTED LOOPS join (1116) is greater than the cost of the HASH JOIN (1107), and this is the main reason why the CBO is preferring the sub-optimal HASH JOIN plan. Since a HASH JOIN is unable to perform index lookups on the probed table t1 using the join column col_sk, the CBO penalized the MERGE statement with a costly full table scan.
Finally, solving my problem comes down to understanding why the cost of the NESTED LOOPS join (NL) is greater than the cost of the HASH JOIN.
The cost of a nested loop join is given by
Cost of acquiring data from first table +
Cardinality of result from first table * Cost of single visit to second table
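As a rough sanity check, plugging the figures from the hinted plan into this formula (outer cost 5 at operation 5, outer cardinality 372, and a cost of 3 per indexed visit to t1 at operation 7) gives

5 + 372 * 3 = 1121

which is in the same ballpark as the 1116 the CBO reported, the small difference coming from rounding and internal adjustments.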
So, in your opinion, what statistic could play an important role in the cost the CBO computes for the NL join method? Think a little before scrolling down the page.
–
–
–
–
–
–
–
–
–
–
–
–
–
–
–
–
–
–
–
–
It is the density of the join column (col_sk), which the CBO uses to estimate how many rows each visit to the inner table will return. Here are the column statistics:
SQL> select
num_distinct
,density
,num_nulls
,last_analyzed
,sample_size
,global_stats
,histogram
from all_tab_col_statistics
where table_name ='T1'
and column_name='COL_SK';
NUM_DISTINCT DENSITY NUM_NULLS LAST_ANALYZED SAMPLE_SIZE GLO HISTOGRAM
------------ ---------- ---------- ------------------- ----------- --- -------
5,546,458 0.000000180 0 2013/11/07 22:00:31 5,546,458 YES NONE --> spot the join column last_analyzed date
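Note that, in the absence of a histogram, the density is simply 1/num_distinct:

1/5,546,458 ≈ 0.000000180

which is exactly the DENSITY reported above. Now compare this with the last_analyzed date of the table itself: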
select
table_name
,last_analyzed
from all_tables
where table_name = 'T1';
TABLE_NAME LAST_ANALYZED
------------- ---------------------
T1 2013/11/21 05:00:31 --> spot the table last_analyzed date
select count(1) from (select distinct col_sk from t1); --> 5,447,251
Two important points are to be emphasized here:
1. Table t1 was last analyzed on 21/11/2013, while its join column was last analyzed on 07/11/2013.
2. There is a difference of about 100K between the real count of distinct col_sk values and the num_distinct of col_sk taken from all_tab_col_statistics.
A deeper look at the parameters used to gather the table statistics shows that they were collected with two inadequate settings:

method_opt => null
estimate_percent => a fixed value

The first setting translates to: collect statistics on the table and ignore statistics on the columns. This explains the difference between the last_analyzed date of the table and that of its join column; it also explains the discrepancy between num_distinct of the join column and the real count taken directly from table t1. My preferred value for the method_opt parameter is

method_opt => 'for all columns size 1'

which collects statistics for all columns without gathering any histograms.

The second parameter, estimate_percent, determines the sample size (and hence the precision) with which statistics are gathered. From 11g onwards, my preferred value for this parameter is

estimate_percent => dbms_stats.auto_sample_size

particularly when the approximate_ndv preference is set to TRUE:
SELECT DBMS_STATS.get_prefs('approximate_ndv') FROM dual;
DBMS_STATS.GET_PREFS('APPROXIMATE_NDV')
--------------------------------------------------------------------------------
TRUE
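As a side note, rather than relying on whoever runs the next gather to pass the right parameters, these choices can also be pinned as table-level preferences. A minimal sketch, assuming you are connected as the table owner:

BEGIN
   dbms_stats.set_table_prefs(user, 'T1', 'METHOD_OPT', 'FOR ALL COLUMNS SIZE 1');
   dbms_stats.set_table_prefs(user, 'T1', 'ESTIMATE_PERCENT', 'DBMS_STATS.AUTO_SAMPLE_SIZE');
END;
/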
Back to my MERGE problem. When the statistics were gathered again on table t1 using the above adequate dbms_stats parameters
BEGIN
dbms_stats.gather_table_stats (ownname => user,
tabname => 'T1',
estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
cascade => true,
method_opt => 'FOR ALL COLUMNS SIZE 1'
);
END;
/
the CBO automatically started selecting the optimal plan using the NESTED LOOPS join, as shown by the following execution plan taken from memory:
----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------
| 0 | MERGE STATEMENT | | | | 1078 (100)| | | |
| 1 | MERGE | T1 | | | | | | |
| 2 | VIEW | | | | | | | |
| 3 | SEQUENCE | SEQ_SEQ | | | | | | |
| 4 | NESTED LOOPS OUTER | | 357 | 21777 | 1078 (1)| 00:00:05 | | |
| 5 | TABLE ACCESS BY GLOBAL INDEX ROWID| T2 | 357 | 7854 | 5 (0)| 00:00:01 | ROWID | ROWID |
|* 6 | INDEX RANGE SCAN | AUDIT_IND | 367 | | 3 (0)| 00:00:01 | | |
| 7 | TABLE ACCESS BY GLOBAL INDEX ROWID| T1 | 1 | 39 | 3 (0)| 00:00:01 | ROWID | ROWID |
|* 8 | INDEX RANGE SCAN | IND_T1_UK | 1 | | 2 (0)| 00:00:01 | | |
----------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
6 - access("AUDIT_ID"=:B1)
8 - access("L"."COL_SK"="COL_SK")
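For reference, execution plans "taken from memory" like the one above can be pulled with dbms_xplan.display_cursor; a minimal sketch, where the &sql_id placeholder has to be replaced by the real sql_id found in v$sql:

select * from table(dbms_xplan.display_cursor('&sql_id', null, 'TYPICAL'));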
Now, before ending this article, I would like to show two things side by side. First, the predicate part of the sub-optimal plan:
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("L"."COL_SK"(+)="COL_SK")
6 - access("AUDIT_ID"=:B1)
And second, the predicate part of the optimal plan:
Predicate Information (identified by operation id):
---------------------------------------------------
6 - access("AUDIT_ID"=:B1)
8 - access("L"."COL_SK"="COL_SK")
In the optimal plan, we start by taking the 357 (or 379) rows to be merged from the outer table using predicate 6, and we then scan the inner table with those 357 join column values via a precise index range scan (predicate 8). The majority of the rows are eliminated very early.
In the sub-optimal plan, it is the last operation, the HASH JOIN OUTER, that eliminates the majority of the rows. In other words, we started big and finished small. But to have a performant query we need to start small and stay small.
Bottom line: when you meet a sub-optimal execution plan using a HASH JOIN driven by a costly full table scan on the probed table, and you know that this probed table could instead become the inner table of a NESTED LOOPS join accessed via a precise index on the join column, verify the statistics of the join column, because they play an important role in the cost of the NESTED LOOPS join.
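And when you do verify them, a pair of checks along the lines of the queries used earlier in this article (adapt the table and column names to your case) is enough to spot stale join column statistics:

select num_distinct, density, num_nulls, last_analyzed
from   all_tab_col_statistics
where  table_name  = 'T1'
and    column_name = 'COL_SK';

select count(distinct col_sk) from t1;

If the column's last_analyzed lags behind the table's own last_analyzed date, or if num_distinct drifts away from the real distinct count, it is time to re-gather with method_opt => 'for all columns size 1'.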