
Or expansion: Two-pass, Linear or Greedy

Abstract

The OR Expansion is the ability of the CBO to transform an initial query whose predicates are linked together by an OR operator into a UNION ALL operator applied on two or more query blocks, known as the UNION ALL branches. This transformation allows the CBO to consider new index access paths and join methods that would have been impossible had the initial query not been transformed. But the higher the number of ORs in the original query, the higher the number of semantically equivalent OR-expansion transformations. Letting the CBO evaluate the cost-benefit of such a large number of OR-expansion states would be very expensive. This is why Oracle has implemented three different best-cost evaluation techniques for the OR Expansion transformation: Two-pass, Linear, and Greedy. This article aims to explain when and how these techniques are used.

Warning

This article will not help you in your daily performance troubleshooting work, since it doesn’t really matter which strategy (Two-pass, Linear, or Greedy) Oracle has used to find the best-costed OR-expansion state. What matters, from a performance point of view, is whether Oracle has used the OR expansion or not. It even seems that an alter session to switch from one strategy to another is ignored. So don’t be surprised if the content of this article adds nothing to your diagnostic and performance tuning skills.

Terminology

   a) Disjunctive Normal Form

Before the CBO chooses how to evaluate the best cost of one or many OR-Expansion states, it must first transform the initial query into a form called Disjunctive Normal Form (DNF). Let’s consider the following query:

select 
    *
from
   t1
where
   (n1 =1)                  -- conjunct n°1
 and ( n2 = 42 or vc1 ='20')-- conjunct n°2
;

The predicate part of the above query is not in DNF because one of its conjuncts (n°2) is a disjunction (it contains an or). Before applying one of the three OR expansion techniques, Oracle transforms the initial query into its DNF by distributing the conjunction (and) over the disjunction (or), as shown below:

select 
    *
from
   t1
where
     (n1 =1 and n2=42)     -- conjunct n°1
 or  (n1 =1 and vc1 ='20') -- conjunct n°2
;
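
The distribution step can be sketched outside the database. The following Python snippet is only an illustration of the algebra, not Oracle's implementation; the function name to_dnf and the list-of-lists representation of the where clause are assumptions made for this example.

```python
from itertools import product

def to_dnf(cnf):
    """Distribute AND over OR.

    `cnf` is a list of AND-ed clauses; each clause is a list of OR-ed
    predicate strings (hypothetical representation for this sketch).
    Returns the list of DNF conjuncts, each a tuple of predicates.
    """
    # Picking one predicate from every clause yields one conjunct.
    return [tuple(choice) for choice in product(*cnf)]

# (n1 = 1) and (n2 = 42 or vc1 = '20')
where_clause = [["n1 = 1"], ["n2 = 42", "vc1 = '20'"]]

for i, conjunct in enumerate(to_dnf(where_clause), start=1):
    print(f"conjunct {i}: " + " and ".join(conjunct))
```

Each printed conjunct corresponds to one of the two and-ed groups of the rewritten query above.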

The corresponding 10053 trace file contains a part related to this normalization which is labeled DNF Matrix as shown below:

ORE: # conjunction chain - 2
ORE: Checking validity of disjunct chain

DNF Matrix (Before sorting OR branches)
            P1  P2  P3
CNJ (#1) :   1   1   0
CNJ (#2) :   1   0   1

ORE:  Predicate list
P1 : "T1"."N1"=1
P2 : "T1"."N2"=42
P3 : "T1"."VC1"='20'

DNF Matrix (After OR branch sorting)
            P1  P2  P3
CNJ (#1) :   1   1   0
CNJ (#2) :   1   0   1

It is thus very clear that Oracle has built a chain of conjunctions with two elements on which it will apply a disjunction. For more details please read this Nenad Noveljic blog post.
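
To make the reading of the DNF Matrix explicit, here is a small Python sketch that rebuilds the same 0/1 layout from the two conjuncts. It is a reading aid only: the helper name dnf_matrix is hypothetical, and the assumption is that each row flags which predicates of the predicate list appear in the conjunct.

```python
def dnf_matrix(conjuncts):
    """Return (predicates, rows): the distinct predicates in order of first
    appearance, and one 0/1 row per conjunct."""
    predicates = []
    for conjunct in conjuncts:
        for pred in conjunct:
            if pred not in predicates:
                predicates.append(pred)
    rows = [[1 if p in conjunct else 0 for p in predicates]
            for conjunct in conjuncts]
    return predicates, rows

# The two conjuncts of the DNF shown earlier, written as in the trace.
conjuncts = [
    ('"T1"."N1"=1', '"T1"."N2"=42'),
    ('"T1"."N1"=1', '"T1"."VC1"=\'20\''),
]

predicates, rows = dnf_matrix(conjuncts)
for i, pred in enumerate(predicates, start=1):
    print(f"P{i} : {pred}")
for i, row in enumerate(rows, start=1):
    print(f"CNJ (#{i}) :   " + "   ".join(str(v) for v in row))
```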

   b) NT and FORE

NT represents the Non Transformed initial query when the OR expansion transformation is not applied to the query. FORE refers to the Full OR Expansion where all the disjunctive predicates are transformed into UNION ALL branches. Concretely the following query is in an NT state:

select 
    *
from
   t1
where
     (n1 =1 and n2=42)     -- conjunct n°1
 or  (n1 =1 and vc1 ='20') -- conjunct n°2
;

But the following query is in a FORE state:

select 
    *
from
   t1
where
  (n1 =1 and n2=42)    
union all
select 
    *
from
   t1
where
  (n1 =1 and vc1 ='20')
and (lnnvl(n1 =1) or lnnvl(n2=42));

   c) Stirling Number of the second kind

There are two types of Stirling numbers (SN): SN of the first kind and SN of the second kind. The SN of the second kind, denoted S(n,k), counts the number of ways to partition n distinct objects into k non-empty, indistinguishable boxes. In the context of the OR expansion, the total number of expansion states costed by Oracle follows the Stirling number S(n,k), where n is the number of disjunctive predicates (i.e. the number of conjunction chains) and k (which could very well represent the number of OR predicates) ranges from 1 to n. To calculate the number of possible OR-expansion states we will use the following recurrence:

S(n,k) = S(n-1,k-1)+ k*S(n-1,k)

With the following fixed values known in advance:

S(0,0) = 1
S(n,0) = 0 for n>=1 
S(n,n) = 1

Following the above recursive formula, S(4,3) will be found to equal 6 as shown below:

S(4,3) = S(3,2) + 3*S(3,3)
S(4,3) = S(3,2) + 3*1 
S(4,3) = S(3,2) + 3 
 
-- where
S(3,2) = S(2,1) + 2*S(2,2) = 1 + 2 = 3
-- Hence
S(4,3) = 3+3 = 6
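
The recurrence is easy to check mechanically. The Python sketch below implements exactly the formula and base cases given above (the function name stirling2 is mine). Summing S(n,k) for k from 1 to n gives the total number of partitions of n conjunctions; if the OR-expansion states do follow S(n,k) as described, that sum hints at how quickly an exhaustive costing would grow.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Stirling number of the second kind, computed with the recurrence
    S(n,k) = S(n-1,k-1) + k*S(n-1,k)."""
    if n == k:
        return 1   # S(0,0) = 1 and S(n,n) = 1
    if k == 0 or k > n:
        return 0   # S(n,0) = 0 for n >= 1; more boxes than objects is impossible
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)

print(stirling2(4, 3))  # 6, as in the worked example

# Total number of partitions for n conjunctions (sum over k = 1..n).
for n in range(2, 7):
    print(n, sum(stirling2(n, k) for k in range(1, n + 1)))
```

The totals for n = 2 to 6 come out as 2, 5, 15, 52 and 203.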

Two-pass technique

There are five possible values for the _optimizer_cbqt_or_expansion parameter:

SQL> @pvalid _optimizer_cbqt_or_expansion
Display valid values for multioption parameters matching "_optimizer_cbqt_or_expansion"...

  PAR# PARAMETER                     ORD VALUE    
------ ----------------------------- --- ---------
  4675 _optimizer_cbqt_or_expansion    1 OFF
       _optimizer_cbqt_or_expansion    2 ON
       _optimizer_cbqt_or_expansion    3 LINEAR
       _optimizer_cbqt_or_expansion    4 GREEDY
       _optimizer_cbqt_or_expansion    5 TWO_PASS

In the two-pass search strategy, Oracle will evaluate the cost of only two states, the initial NT state, and the FORE state. My investigations have shown that the two-pass technique is systematically used when one of the following conditions is met:

  • either the number of conjunct chains = 2
  • or the number of conjunct chains >= 5
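
Expressed as code, the observed rule looks like the Python sketch below. It only encodes what the traces in this article show and is in no way Oracle's actual logic. The function name is hypothetical, and returning the configured search type for 3 or 4 conjunctions is an assumption, since those cases are not traced here.

```python
def observed_search_strategy(n_conjunctions, configured="linear"):
    """Search strategy suggested by the 10053 traces in this article.

    Assumption: outside the two observed conditions, the configured
    search type is kept (not demonstrated in this article).
    """
    if n_conjunctions == 2 or n_conjunctions >= 5:
        return "two pass"
    return configured

for n in (2, 3, 4, 5, 6):
    print(n, observed_search_strategy(n))
```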

Here is the demonstration, starting with 2 conjuncts, then 5, and then 6:

alter session set tracefile_identifier='1Ored';
@53on
-- query n°1
select 
   *
from
   t1
where
   (n1    = 1  -- conjunct 1
    or n2 = 42 -- conjunct 2       
    );
@53off

egrep "ORE: Using search type|conjunction chain" ORCLCDB_ora_9365_1Ored.trc
ORE: Using search type: linear
ORE: # conjunction chain - 2

As you can see, Oracle started by planning to evaluate the cost of the different OR expansion states using the Linear technique. It also recognized that it has to deal with a DNF of two conjunctions.

However, a few lines further down in the same trace file we can see that Oracle has changed its mind and decided to switch to the two-pass technique:

egrep "ORE: Switching to|state space|Updated best state|Not update best state|conjunction chain" ORCLCDB_ora_9365_1Ored.trc
ORE: # conjunction chain - 2
ORE: Switching to two pass because of # conjunction: 2 -----> switch occurs here
ORE: Starting iteration 1, state space = [{ 1 2 }]
ORE: Updated best state, Cost = 581.063707
ORE: Starting iteration 2, state space = [{ 1 }]
ORE: Updated best state, Cost = 1.000055
ORE: Starting iteration 2, state space = [{ 2 }]
ORE: Updated best state, Cost = 2.000095
ORE:   Transferring best state space to preserved query.
ORE:   Transferring best state space to original query.
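
For longer traces it can be handy to pair each state space with the cost reported right after it. The Python sketch below does that for the lines produced by the egrep command above. The regular expressions are written against the line format visible in this excerpt only and may need adjusting for other versions; the function name parse_ore_lines is hypothetical.

```python
import re

STATE_RE = re.compile(r"ORE: Starting iteration (\d+), state space = \[(.*)\]")
COST_RE = re.compile(r"ORE: Updated best state, Cost = ([\d.]+)")

def parse_ore_lines(lines):
    """Pair each 'Starting iteration' line with the next reported cost.

    Returns a list of (iteration, state_space, cost) tuples. States that
    are not followed by an 'Updated best state' line are skipped.
    """
    results, current = [], None
    for line in lines:
        m = STATE_RE.search(line)
        if m:
            current = (int(m.group(1)), m.group(2).strip())
            continue
        m = COST_RE.search(line)
        if m and current is not None:
            results.append((current[0], current[1], float(m.group(1))))
            current = None
    return results

# Lines copied from the trace excerpt above.
sample = """\
ORE: Starting iteration 1, state space = [{ 1 2 }]
ORE: Updated best state, Cost = 581.063707
ORE: Starting iteration 2, state space = [{ 1 }]
ORE: Updated best state, Cost = 1.000055
ORE: Starting iteration 2, state space = [{ 2 }]
ORE: Updated best state, Cost = 2.000095
""".splitlines()

for iteration, state, cost in parse_ore_lines(sample):
    print(iteration, state, cost)
```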

As you can read, Oracle decided to switch to two-pass because the DNF form to be processed contains 2 conjunctions.

There is no point in re-explaining here what a state space such as [{ 1 2 }] represents, since Nenad Noveljic has already explained it brilliantly. I invite you to read his article to get a clear picture. Let me just say the following:

state space = [{ 1 2 }] = original non transformed NT query 
state space = [{ 1 }]   = select * from t1 where (n1 =1)
state space = [{ 2 }]   = select * from t1 where (n2 =42) and lnnvl (n1=1)    
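
The mapping above generalizes to any number of single-predicate conjuncts: each branch keeps its own predicate and adds an lnnvl() guard for every earlier one, so that a row is not returned twice. The Python sketch below generates that text. It is an illustration under the assumption that every conjunct is one simple predicate (as in the queries of this article), and the helper name fore_branches is hypothetical.

```python
def fore_branches(predicates, table="t1"):
    """Build the text of each UNION ALL branch of the FORE state.

    Assumes one simple predicate per conjunct; branch i is guarded by
    lnnvl() of the predicates of branches 1..i-1.
    """
    branches = []
    for i, pred in enumerate(predicates):
        guards = [f"lnnvl({p})" for p in predicates[:i]]
        where = " and ".join([f"({pred})"] + guards)
        branches.append(f"select * from {table} where {where}")
    return branches

for sql in fore_branches(["n1 = 1", "n2 = 42"]):
    print(sql)
```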

Let’s now repeat the same experiment for a DNF of 5 and 6 conjuncts respectively:

alter session set tracefile_identifier='5Ored';
@53on
select 
   *
from
   t1
where
   (n1 =1
      or n2  = 42
      or vc1 = '20'
      or n3  = 9
      or vc2 = '10'
    );
@53off
egrep "ORE: Using search type|conjunction chain" ORCLCDB_ora_9365_5Ored.trc
ORE: Using search type: linear
ORE: # conjunction chain - 5

egrep "ORE: Switching to|state space|Updated best state|Not update best state|conjunction chain" ORCLCDB_ora_9365_5Ored.trc
ORE: # conjunction chain - 5
ORE: Switching to two pass because of # conjunction: 5   -----> switch occurs here
ORE: Starting iteration 1, state space = [{ 1 2 3 4 5 }]
ORE: Updated best state, Cost = 581.744310
ORE: Starting iteration 2, state space = [{ 1 }]
ORE: Updated best state, Cost = 1.000055
ORE: Starting iteration 2, state space = [{ 2 }]
ORE: Updated best state, Cost = 2.000095
ORE: Starting iteration 2, state space = [{ 3 }]
ORE: Updated best state, Cost = 3.000134
ORE: Starting iteration 2, state space = [{ 4 }]
ORE: Updated best state, Cost = 4.000174
ORE: Starting iteration 2, state space = [{ 5 }]
ORE: Updated best state, Cost = 5.000247
ORE:   Transferring best state space to preserved query.
ORE:   Transferring best state space to original query.
ORE: # conjunction chain - 5

alter session set tracefile_identifier='6Ored';
@53on
select 
   *
from
   t1
where
   (n1 =1
      or n2  = 42
      or vc1 = '20'
      or n3  = 9
      or vc2 = '10'
      or n5  = 36
    );
@53off
egrep "ORE: Using search type|conjunction chain" ORCLCDB_ora_9365_6Ored.trc
ORE: Using search type: linear
ORE: # conjunction chain - 6

egrep "ORE: Switching to|state space|Updated best state|Not update best state|conjunction chain" ORCLCDB_ora_9365_6Ored.trc
ORE: # conjunction chain - 6
ORE: Switching to two pass because of # conjunction: 6  -----> switch occurs here
ORE: Starting iteration 1, state space = [{ 1 2 3 4 5 6 }]
ORE: Updated best state, Cost = 581.870344
ORE: Starting iteration 2, state space = [{ 1 }]
ORE: Updated best state, Cost = 1.000055
ORE: Starting iteration 2, state space = [{ 2 }]
ORE: Updated best state, Cost = 2.000095
ORE: Starting iteration 2, state space = [{ 3 }]
ORE: Updated best state, Cost = 3.000134
ORE: Starting iteration 2, state space = [{ 4 }]
ORE: Updated best state, Cost = 4.000174
ORE: Starting iteration 2, state space = [{ 5 }]
ORE: Updated best state, Cost = 5.000247
ORE: Starting iteration 2, state space = [{ 6 }]
ORE: Updated best state, Cost = 6.000287
ORE:   Transferring best state space to preserved query.
ORE:   Transferring best state space to original query.

It is now clear that, in every case tested here, if the DNF has exactly 2 conjunctions or at least 5, Oracle uses the two-pass technique when considering the cost-based OR Expansion. This technique consists of evaluating the cost of the original query (NT) and that of the fully transformed UNION ALL query (FORE). The Stirling number plays no role in this case.

Moreover, as already mentioned in the Warning section, forcing the linear technique seems to have no effect. It is fairly likely that the condition on the number of conjunctions (2, or >= 5) that governs the two-pass technique is hard-coded in the CBO, as the following tends to show:

alter session set tracefile_identifier='Linear';

***************************************
PARAMETERS USED BY THE OPTIMIZER
********************************
  *************************************
  PARAMETERS WITH ALTERED VALUES
  ******************************
Compilation Environment Dump
optimizer_index_cost_adj            = 10
_optimizer_cbqt_or_expansion        = linear
_swat_ver_mv_knob                   = 0

egrep "ORE: Using search type|conjunction chain" ORCLCDB_ora_16232_Linear.trc
ORE: Using search type: linear
ORE: # conjunction chain - 5

egrep "ORE: Switching to|state space|Updated best state|Not update best state|conjunction chain" ORCLCDB_ora_16232_Linear.trc
ORE: # conjunction chain - 5
ORE: Switching to two pass because of # conjunction: 5
ORE: Starting iteration 1, state space = [{ 1 2 3 4 5 }]

In order not to make this blog post too heavy I will try to devote a separate article to the Linear technique as soon as possible.

Model

Here is the model I used to conduct my experiments:

CREATE TABLE t1 (
    n1   NUMBER,
    n2   NUMBER,
    n3   NUMBER,
    n4   NUMBER,
    n5   NUMBER,
    n6   NUMBER,
    vc1  VARCHAR2(10),
    vc2  VARCHAR2(100),
    d1   DATE
);

INSERT INTO t1
    SELECT
        rownum,
        mod(rownum, 10),
        trunc((rownum - 1 / 3)),
        trunc((rownum - 1 / 5)),
        trunc((rownum - 1 / 7)),
        mod(rownum,10),
        lpad('x', 10),
        lpad('y',100),
        date '2022-01-01' + (level-1) * interval '15' minute
    FROM
        dual
    CONNECT BY
        level <= 1e5;
		
create index t1_idx1 on t1(n1,n2);
create index t1_idx2 on t1(n2);
create index t1_idx3 on t1(n3);
create index t1_idx4 on t1(n4);
create index t1_idx5 on t1(n5);
create index t1_idx6 on t1(n6);
create index t1_idx7 on t1(vc1);
create index t1_idx8 on t1(vc2);
create index t1_idx9 on t1(d1);

exec dbms_stats.gather_table_stats(user, 't1');
alter session set optimizer_index_cost_adj =10;
