If you are a regular reader of Jonathan Lewis's blog, you have probably come across this article, in which the author explains why an “OR subquery” prevents the optimizer from unnesting the subquery and merging it with its parent query, which could otherwise lead to a better join path. Because it cannot be unnested, the “OR subquery” is executed as a FILTER operation. When that FILTER is applied to a large row set, it can dramatically degrade the performance of the whole query. In the same article, you have hopefully also learned how rewriting the query with a UNION ALL (and taking care of the ever-threatening NULL via the LNNVL() function) opens a new path for the CBO and allows the subquery to be unnested.
Unfortunately, third-party software in which the SQL code cannot be changed is now massively widespread. I therefore hoped the optimizer would be able to refactor a disjunctive subquery automatically and consider unnesting it using the UNION ALL workaround.
I believed the optimizer would never fulfil this hope. Last week, however, my friend Ahmed Aangour sent me an e-mail showing a particular disjunctive subquery that the optimizer had unnested without the developer rewriting the original query. I found the case very interesting, so I decided to model it and share it with you. Take a look at the query and its execution plan, first under 11.2.0.2 (the table script is supplied at the end of the article):
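To make the UNION ALL rewrite concrete, here is a minimal sketch that uses the data model given at the end of this article. It checks that the rewritten form returns the same rows as the OR form. The sketch assumes SQLite, through Python's standard sqlite3 module, as a stand-in for Oracle, so it says nothing about execution plans, only about results. SQLite has no LNNVL(), so the predicate `(condition) is not 1` is used here as a substitute for it. That predicate is true when the condition is false or unknown.

```python
import sqlite3


def build_model() -> sqlite3.Connection:
    """Rebuild the article's t1/t2 model in an in-memory SQLite database.

    Assumption: only the columns used by the predicates are kept; status
    values follow the CASE expression of the original Oracle script.
    """
    con = sqlite3.connect(":memory:")
    con.execute("create table t1 (id1 integer primary key)")
    con.execute("create table t2 (id integer, id1 integer, status text)")
    con.executemany("insert into t1 values (?)", [(r,) for r in range(1, 10001)])
    rows = []
    for r in range(1, 10001):
        if r % 1000 == 7:
            status = "ERR"
        elif r <= 9900:
            status = "COM"
        elif 1 <= r % 10 <= 5:
            status = "PRP"
        else:
            status = "UNK"
        rows.append((r, r % 5 + (r % 10) * 10, status))
    con.executemany("insert into t2 values (?, ?, ?)", rows)
    return con


con = build_model()

# Original disjunctive ("OR") subquery form.
or_query = """
select a.id1 from t1 a
where a.id1 in (select b.id from t2 b where b.status = 'COM')
   or a.id1 in (select c.id1 from t2 c where c.status = 'ERR')
"""

# UNION ALL rewrite: the second branch must skip rows already returned by
# the first one. "(condition) is not 1" stands in for Oracle's LNNVL().
rewrite_query = """
select a.id1 from t1 a
where a.id1 in (select b.id from t2 b where b.status = 'COM')
union all
select a.id1 from t1 a
where a.id1 in (select c.id1 from t2 c where c.status = 'ERR')
  and (a.id1 in (select b.id from t2 b where b.status = 'COM')) is not 1
"""

# Same rewrite without the exclusion predicate, to show why it is needed.
naive_query = """
select a.id1 from t1 a
where a.id1 in (select b.id from t2 b where b.status = 'COM')
union all
select a.id1 from t1 a
where a.id1 in (select c.id1 from t2 c where c.status = 'ERR')
"""

or_rows = sorted(r[0] for r in con.execute(or_query))
rewrite_rows = sorted(r[0] for r in con.execute(rewrite_query))
naive_rows = sorted(r[0] for r in con.execute(naive_query))
print(len(or_rows), len(rewrite_rows), len(naive_rows))  # 9890 9890 9891
```

The naive UNION ALL returns one extra row: id1 = 72 satisfies both subqueries. The LNNVL()-style predicate is there to prevent exactly that duplicate.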
SQL> alter session set statistics_level=all;
SQL> alter session set optimizer_features_enable='11.2.0.2';

SQL> select
        a.id1
       ,a.n1
       ,a.start_date
     from t1 a
     where (a.id1 in (select b.id
                      from t2 b
                      where b.status = 'COM'
                      )
            OR
            a.id1 in (select c.id1
                      from t2 c
                      where c.status = 'ERR'
                      )
            );

SQL> select * from table(dbms_xplan.display_cursor(null,null, 'allstats last'));

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |   9890 |00:00:02.23 |    742K |<--
|*  1 |  FILTER            |      |      1 |        |   9890 |00:00:02.23 |    742K |
|   2 |   TABLE ACCESS FULL| T1   |      1 |  10000 |  10000 |00:00:00.01 |    1686 |
|*  3 |   TABLE ACCESS FULL| T2   |  10000 |      1 |   9890 |00:00:02.16 |    725K |
|*  4 |   TABLE ACCESS FULL| T2   |    110 |      1 |      0 |00:00:00.05 |   15400 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(( IS NOT NULL OR  IS NOT NULL))
   3 - filter(("B"."ID"=:B1 AND "B"."STATUS"='COM'))
   4 - filter(("C"."ID1"=:B1 AND "C"."STATUS"='ERR'))
The double full access to table t2 plus the FILTER operation clearly indicate that the OR clause has not been merged with the parent query. If you want to know what is behind filter predicate n°1 above, the “not so famous” EXPLAIN PLAN FOR command will help here:
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   975 | 15600 |   462   (0)| 00:00:01 |
|*  1 |  FILTER            |      |       |       |            |          |
|   2 |   TABLE ACCESS FULL| T1   | 10000 |   156K|   462   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| T2   |     1 |     8 |    42   (0)| 00:00:01 |
|*  4 |   TABLE ACCESS FULL| T2   |     1 |     7 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( EXISTS (SELECT 0 FROM "T2" "B" WHERE "B"."ID"=:B1 AND
              "B"."STATUS"='COM') OR  EXISTS (SELECT 0 FROM "T2" "C" WHERE
              "C"."ID1"=:B2 AND "C"."STATUS"='ERR'))
   3 - filter("B"."ID"=:B1 AND "B"."STATUS"='COM')
   4 - filter("C"."ID1"=:B1 AND "C"."STATUS"='ERR')
Notice how the subqueries have been executed as a FILTER operation. A FILTER of this kind sometimes (if not often) represents a real performance threat.
However, when I executed the same query with optimizer_features_enable set to 11.2.0.3, I got the following interesting execution plan:
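The Starts column of the first plan can be explained by the way FILTER evaluates the OR. The first subquery (operation 3) is started for each of the 10,000 rows of t1. The second one (operation 4) is started only for the 110 rows where the first subquery found no match. The following Python sketch is a simplified, hypothetical model of that row-by-row, short-circuit evaluation on the article's data. Set lookups replace the full scans of T2, so the sketch models only the number of starts, not the cost of each start.

```python
def build_t2():
    """(id, id1, status) rows following the article's t2 script.

    Assumption: small_vc and padding are left out; no predicate uses them.
    """
    rows = []
    for rownum in range(1, 10001):
        if rownum % 1000 == 7:
            status = "ERR"
        elif rownum <= 9900:
            status = "COM"
        elif 1 <= rownum % 10 <= 5:
            status = "PRP"
        else:
            status = "UNK"
        rows.append((rownum, rownum % 5 + (rownum % 10) * 10, status))
    return rows


def run_filter(t1_ids, t2_rows):
    """Hypothetical model of FILTER evaluating
    EXISTS (... 'COM') OR EXISTS (... 'ERR') once per t1 row, left to right,
    stopping at the first subquery that returns TRUE."""
    # Set lookups stand in for the full scans of T2: only the number of
    # starts of each subquery is modelled, not the work each start does.
    com_ids = {id_ for id_, _, status in t2_rows if status == "COM"}
    err_id1s = {id1 for _, id1, status in t2_rows if status == "ERR"}
    starts = {3: 0, 4: 0}  # keyed by plan operation id
    result = []
    for id1 in t1_ids:
        starts[3] += 1          # operation 3: b.id = :B1 and status 'COM'
        if id1 in com_ids:
            result.append(id1)
            continue            # OR already TRUE: operation 4 is skipped
        starts[4] += 1          # operation 4: c.id1 = :B1 and status 'ERR'
        if id1 in err_id1s:
            result.append(id1)
    return result, starts


rows, starts = run_filter(range(1, 10001), build_t2())
print(len(rows), starts)  # 9890 {3: 10000, 4: 110}
```

The 110 second-subquery starts correspond to the t1 ids with no COM match in t2: the ten ids whose t2 row is ERR (7, 1007 and so on up to 9007) and the hundred ids above 9900.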
SQL> alter session set optimizer_features_enable='11.2.0.3';

----------------------------------------------------------------------------------------------
| Id  | Operation                 | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |          |      1 |        |   9890 |00:00:00.03 |    1953 |<--
|*  1 |  HASH JOIN                |          |      1 |   5000 |   9890 |00:00:00.03 |    1953 |
|   2 |   VIEW                    | VW_NSO_1 |      1 |   5000 |   9890 |00:00:00.01 |     282 |
|   3 |    HASH UNIQUE            |          |      1 |   5000 |   9890 |00:00:00.01 |     282 |
|   4 |     UNION-ALL             |          |      1 |        |   9900 |00:00:00.01 |     282 |
|*  5 |      TABLE ACCESS FULL    | T2       |      1 |   2500 |     10 |00:00:00.01 |     141 |
|*  6 |      TABLE ACCESS FULL    | T2       |      1 |   2500 |   9890 |00:00:00.01 |     141 |
|   7 |   TABLE ACCESS FULL       | T1       |      1 |  10000 |  10000 |00:00:00.01 |    1671 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("A"."ID1"="ID1")
   5 - filter("C"."STATUS"='ERR')
   6 - filter("B"."STATUS"='COM')
Notice that the new plan now shows a HASH JOIN operation between an internal view (VW_NSO_1) and table t1 from the parent query block. Notice as well the HASH JOIN condition (access("A"."ID1"="ID1")) that appears in predicate n°1. The optimizer has performed a double transformation:
- it created an internal view, VW_NSO_1, representing a UNION ALL of the two subqueries present in the WHERE clause (followed by a HASH UNIQUE that removes duplicate ids)
- it joined this newly created view with table t1 from the parent query block
Looking at the corresponding 10053 trace file, I found how the CBO had transformed the initial query:
select a.id1 id1, a.n1 n1, a.start_date start_date
from  ( (select c.id1 id1 from c##mhouri.t2 c where c.status='ERR')
        union
        (select b.id id from c##mhouri.t2 b where b.status='COM')
      ) vw_nso_1,
      c##mhouri.t1 a
where a.id1 = vw_nso_1.id1;
In fact, the optimizer first combined the two subqueries into a view and then unnested that view into the parent query. The optimizer seems to name this transformation SUBQ INTO VIEW FOR COMPLEX UNNEST.
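The following sketch checks that the VW_NSO_1 form found in the 10053 trace returns the same rows as the original OR query. It again uses SQLite as a stand-in for Oracle, which is an assumption: it checks results, not plans. SQLite does not accept parenthesized members in a compound SELECT, so those parentheses and the c##mhouri schema prefix are dropped. The sketch also shows why the view needs duplicate elimination. The UNION ALL of the two subqueries produces 9,900 rows, as operation 4 of the 11.2.0.3 plan does, while the distinct set has 9,890.

```python
import sqlite3


def build_model() -> sqlite3.Connection:
    """In-memory copy of the article's t1/t2 model (predicate columns only)."""
    con = sqlite3.connect(":memory:")
    con.execute("create table t1 (id1 integer primary key)")
    con.execute("create table t2 (id integer, id1 integer, status text)")
    con.executemany("insert into t1 values (?)", [(r,) for r in range(1, 10001)])
    rows = []
    for r in range(1, 10001):
        if r % 1000 == 7:
            status = "ERR"
        elif r <= 9900:
            status = "COM"
        elif 1 <= r % 10 <= 5:
            status = "PRP"
        else:
            status = "UNK"
        rows.append((r, r % 5 + (r % 10) * 10, status))
    con.executemany("insert into t2 values (?, ?, ?)", rows)
    return con


con = build_model()

or_query = """
select a.id1 from t1 a
where a.id1 in (select b.id from t2 b where b.status = 'COM')
   or a.id1 in (select c.id1 from t2 c where c.status = 'ERR')
"""

# Shape of the unparsed query from the 10053 trace. UNION (not UNION ALL)
# removes duplicate ids, like HASH UNIQUE in the plan, so the join cannot
# return the same t1 row twice.
vw_nso_query = """
select a.id1
from (select c.id1 id1 from t2 c where c.status = 'ERR'
      union
      select b.id id from t2 b where b.status = 'COM') vw_nso_1,
     t1 a
where a.id1 = vw_nso_1.id1
"""

# Row count before duplicate elimination (operation 4, UNION-ALL).
union_all_count = con.execute("""
select count(*) from (select c.id1 from t2 c where c.status = 'ERR'
                      union all
                      select b.id from t2 b where b.status = 'COM')
""").fetchone()[0]

or_rows = sorted(r[0] for r in con.execute(or_query))
view_rows = sorted(r[0] for r in con.execute(vw_nso_query))
print(union_all_count, len(view_rows), or_rows == view_rows)  # 9900 9890 True
```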
In the same 10053 trace file we can spot the following lines:
*****************************
Cost-Based Subquery Unnesting
*****************************
Query after disj subq unnesting:******* UNPARSED QUERY IS *******
SU: Transform an ANY subquery to semi-join or distinct.
Registered qb: SET$7FD77EFD 0x15b5d4d0 (SUBQ INTO VIEW FOR COMPLEX UNNEST SET$E74BECDC)
SU: Will unnest subquery SEL$3 (#2)
SU: Will unnest subquery SEL$2 (#3)
SU: Reconstructing original query from best state.
SU: Considering subquery unnest on query block SEL$1 (#1).
SU: Checking validity of unnesting subquery SEL$2 (#3)
SU: Checking validity of unnesting subquery SEL$3 (#2)
Query after disj subq unnesting:******* UNPARSED QUERY IS *******
SU: Checking validity of unnesting subquery SET$E74BECDC (#6)
SU: Passed validity checks.
This is a clear enhancement to the optimizer's query transformations. It improves the performance of disjunctive subqueries automatically, without any external intervention.
I was about to end this article when I realized something. Although I was testing this case on a 12.1.0.1.0 database, I had not yet executed the same query under optimizer features 12.1.0.1.0:
SQL> alter session set optimizer_features_enable='12.1.0.1.1';
SQL> -- execute the same query again

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |   9890 |00:00:03.84 |    716K |
|*  1 |  FILTER            |      |      1 |        |   9890 |00:00:03.84 |    716K |
|   2 |   TABLE ACCESS FULL| T1   |      1 |  10000 |  10000 |00:00:00.01 |    1686 |
|*  3 |   TABLE ACCESS FULL| T2   |  10000 |      2 |   9890 |00:00:03.81 |    715K |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( IS NOT NULL)
   3 - filter((("B"."ID1"=:B1 AND "B"."STATUS"='ERR') OR ("B"."ID"=:B2 AND
              "B"."STATUS"='COM')))
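The automatic unnesting of the disjunctive subquery is gone under the 12.1.0.1.1 optimizer model. The two subqueries have been merged into a single subquery on T2 (operation 3), but that subquery is still executed as a FILTER and started 10,000 times, once per row of t1.
If you want to reproduce and test this case, here is the model. I would be interested to know whether the disjunctive subquery is unnested in the 12.1.0.1.2 release.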
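In the 12.1.0.1.1 plan, the predicate of operation 3 shows that the two subqueries have been merged into a single correlated subquery on T2 whose predicates are OR-ed together. The minimal sketch below confirms that this merged form returns the same rows as the original query. It again runs on SQLite and compares results only, not plans. The two indexes are added only by this sketch, to keep it fast, and are not part of the article's model.

```python
import sqlite3


def build_model() -> sqlite3.Connection:
    """In-memory copy of the article's t1/t2 model (predicate columns only)."""
    con = sqlite3.connect(":memory:")
    con.execute("create table t1 (id1 integer primary key)")
    con.execute("create table t2 (id integer, id1 integer, status text)")
    con.executemany("insert into t1 values (?)", [(r,) for r in range(1, 10001)])
    rows = []
    for r in range(1, 10001):
        if r % 1000 == 7:
            status = "ERR"
        elif r <= 9900:
            status = "COM"
        elif 1 <= r % 10 <= 5:
            status = "PRP"
        else:
            status = "UNK"
        rows.append((r, r % 5 + (r % 10) * 10, status))
    con.executemany("insert into t2 values (?, ?, ?)", rows)
    return con


con = build_model()
# Indexes added only to keep this SQLite demo fast (not in the article's model).
con.execute("create index t2_id_ix on t2 (id)")
con.execute("create index t2_id1_ix on t2 (id1)")

or_query = """
select a.id1 from t1 a
where a.id1 in (select b.id from t2 b where b.status = 'COM')
   or a.id1 in (select c.id1 from t2 c where c.status = 'ERR')
"""

# Single correlated subquery whose WHERE clause mirrors the predicate of
# operation 3 in the 12.1.0.1.1 plan.
merged_query = """
select a.id1 from t1 a
where exists (select 0 from t2 b
              where (b.id1 = a.id1 and b.status = 'ERR')
                 or (b.id  = a.id1 and b.status = 'COM'))
"""

or_rows = sorted(r[0] for r in con.execute(or_query))
merged_rows = sorted(r[0] for r in con.execute(merged_query))
print(len(merged_rows), merged_rows == or_rows)  # 9890 True
```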
create table t1
as select
    rownum                                  id1,
    trunc((rownum-1/3))                     n1,
    date '2012-06-07' + mod((level-1)*2,5)  start_date,
    lpad(rownum,10,'0')                     small_vc,
    rpad('x',1000)                          padding
from dual
connect by level <= 1e4;

create table t2
as select
     rownum id
    ,mod(rownum,5) + mod(rownum,10)* 10 as id1
    ,case
        when mod(rownum, 1000) = 7 then 'ERR'
        when rownum <= 9900 then 'COM'
        when mod(rownum,10) between 1 and 5 then 'PRP'
        else 'UNK'
     end status
    ,lpad(rownum,10,'0') as small_vc
    ,rpad('x',70) as padding
from dual
connect by level <= 1e4;

alter table t1 add constraint t1_pk primary key (id1);
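The following Python transcription of the t2 part of the script is an illustrative sketch, not the author's code. It reproduces the row counts seen in the plans without an Oracle instance: 10 ERR rows and 9,890 COM rows. Every ERR row carries id1 = 72, a value that the COM subquery already returns. This is why the second subquery of the 11.2.0.2 plan returns no rows.

```python
from collections import Counter


def build_t2_rows():
    """(id, id1, status) tuples transcribed from the article's t2 script;
    small_vc and padding are omitted because no predicate uses them."""
    rows = []
    for rownum in range(1, 10001):              # connect by level <= 1e4
        if rownum % 1000 == 7:
            status = "ERR"
        elif rownum <= 9900:
            status = "COM"
        elif 1 <= rownum % 10 <= 5:             # between 1 and 5
            status = "PRP"
        else:
            status = "UNK"
        id1 = rownum % 5 + (rownum % 10) * 10   # mod(rownum,5) + mod(rownum,10)*10
        rows.append((rownum, id1, status))
    return rows


t2_rows = build_t2_rows()
status_counts = Counter(status for _, _, status in t2_rows)
err_id1s = {id1 for _, id1, status in t2_rows if status == "ERR"}
com_ids = {id_ for id_, _, status in t2_rows if status == "COM"}

print(status_counts["ERR"], status_counts["COM"])  # 10 9890
print(err_id1s, err_id1s <= com_ids)               # {72} True
```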
