Channel: Mohamed Houri’s Oracle Notes
TEMP Table transformation and PQ_SLAVE_mismatch

A query with a SQL Profile opted for a different execution plan even though the Note at the bottom of its execution plan indicated that the SQL Profile had been used. The new plan made the query fail with the classic parallel query error caused by a lack of TEMP space following a massive parallel broadcast distribution:

ORA-12801: error signaled in parallel query server P013
ORA-01652: unable to extend temp segment by 128 in tablespace TEMP

Initially this query failed when it reached 32GB of TEMP space. Once woken up, the on-call DBA enlarged the TEMP tablespace but, unfortunately, the next run failed as well when it reached 137GB of TEMP space.
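Before enlarging TEMP again, it can help to see which statements are holding it. The following is only a minimal sketch, assuming access to gv$tempseg_usage and dba_tablespaces. It was not part of the original investigation, and the sql_id reported by this view should be treated as an indication rather than a proof, since for parallel queries the segments are held by the PX slave sessions:

```sql
-- Sketch: TEMP space currently held, grouped by sql_id and segment type.
-- Assumption: the connected user can read gv$tempseg_usage and dba_tablespaces.
select
     u.inst_id
    ,u.sql_id
    ,u.tablespace
    ,u.segtype
    ,round(sum(u.blocks * t.block_size)/1024/1024/1024, 2) temp_gb
from
     gv$tempseg_usage u
    ,dba_tablespaces  t
where
     t.tablespace_name = u.tablespace
group by
     u.inst_id
    ,u.sql_id
    ,u.tablespace
    ,u.segtype
order by
     temp_gb desc;
```

Run it while the query is executing: the view only shows segments that are currently allocated.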

When it was my turn to investigate this issue, instead of continuing to enlarge the TEMP tablespace or changing the parallel distribution method, I decided to figure out why Oracle was refusing to use the SQL Profile and was compiling a new plan at practically every execution. Applying Tanel Poder's nonshared script to the sql_id of the real-world query gives this:

SQL> @nonshared gx28sa7z20btn
Show why existing SQL child cursors were not reused (V$SQL_SHARED_CURSOR)...

INST_ID                       : 1
SQL_ID                        : gx28sa7z20btn
ADDRESS                       : 00007FF92115E830
CHILD_ADDRESS                 : 00007FF92115E6B0
CHILD_NUMBER                  : 1
REASON                        : <reason>PQ Slave mismatch(5)</reason>

-----------------
INST_ID                       : 1
SQL_ID                        : gx28sa7z20btn
ADDRESS                       : 00007FF92115E830
CHILD_ADDRESS                 : 00007FF92115E6B0
CHILD_NUMBER                  : 2
REASON                        : <reason>PQ Slave mismatch(5)</reason>
PQ_SLAVE_MISMATCH             :  Y

-----------------
INST_ID                       : 1
SQL_ID                        : gx28sa7z20btn
ADDRESS                       : 00007FF92115E830
CHILD_ADDRESS                 : 00007FF92115E6B0
CHILD_NUMBER                  : 3
REASON                        : <reason>PQ Slave mismatch(5)</reason>
PQ_SLAVE_MISMATCH             :  Y

-----------------
INST_ID                       : 1
SQL_ID                        : gx28sa7z20btn
ADDRESS                       : 00007FF92115E830
CHILD_ADDRESS                 : 00007FF92115E6B0
CHILD_NUMBER                  : 4
REASON                        : <reason>PQ Slave mismatch(5)</reason>
PQ_SLAVE_MISMATCH             :  Y
-----------------
INST_ID                       : 1
SQL_ID                        : gx28sa7z20btn
ADDRESS                       : 00007FF92115E830
CHILD_ADDRESS                 : 00007FF92115E6B0
CHILD_NUMBER                  : 5
REASON                        : <reason>PQ Slave mismatch(5)</reason>
PQ_SLAVE_MISMATCH             :  Y
-----------------

What does this PQ_SLAVE_MISMATCH non-sharing reason mean?
Oracle defines it as:

(Y|N) Top-level slave decides not to share cursor

This very short definition seems to indicate that a parallel slave refused to share the execution plan of its Query Coordinator and decided to hard parse its own execution plan, even though both (the QC and the PX slaves) were running in the same instance of an 11.2.0.4 release.
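For readers who don't have the nonshared script at hand, the same flag can be read directly from the dictionary. This is a minimal sketch, not Tanel Poder's script: it assumes a release where gv$sql_shared_cursor exposes the REASON column (as the output above suggests), and &sql_id is a SQL*Plus substitution variable to be supplied at run time:

```sql
-- Sketch: show the PQ_SLAVE_MISMATCH flag and the reason text per child cursor.
-- Assumption: REASON is available in this release of gv$sql_shared_cursor.
set long 2000

select
     inst_id
    ,child_number
    ,pq_slave_mismatch
    ,reason
from
     gv$sql_shared_cursor
where
     sql_id = '&sql_id'
order by
     inst_id
    ,child_number;
```

Unlike the script, this query prints the flag for every child cursor, including those where it is N.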

I spent a couple of minutes looking at the query, trying to simplify it, until I found the part of it causing the execution plan mismatch. Having got a clue about what was happening in this real-life query, I engineered the following model, with which I reproduced the same behaviour in 11.2.0.4 and 12.1.0.1.0. Look at the following setup (where you will recognize one of Jonathan Lewis's table scripts):

select banner from gv$version where rownum=1;

BANNER
-----------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production

create table t1
as
 with generator as (
      select  --+ materialize
          rownum id
      from dual
      connect by level <=1000
)
select
    trunc((rownum -1)/2) product_id
   ,mod(rownum-1,2)      flag1
   ,mod(rownum-1,3)      flag2
   ,rownum               n1
   ,lpad(rownum,30)      v1
from
    generator v1
   ,generator v2
 where rownum <= 1e6;

create table t2
as
select
     level id
     ,date  '2012-06-07' + mod((level-1)*5,10)+ interval '5' minute start_date
     ,rpad( 'xx',10) padding
from
   dual
connect by level <=1e6;

begin
 dbms_stats.gather_table_stats(user, 't1', method_opt => 'for all columns size 1');
 dbms_stats.gather_table_stats(user, 't2', method_opt => 'for all columns size 1');
 end;
 /

And here below is the query I will use to reproduce the non-sharing reason:

with got_my_dates
 as (select /*+ materialize */
          id
		 ,padding
		 ,start_date
	 from
	      t2
	 where
	   start_date   > to_date('07/06/2012','dd/mm/yyyy')
	 and start_date <= to_date('10/06/2012','dd/mm/yyyy')
	 )
select
     /*+ parallel(4) */
     t1.*
	,cte.padding
	,cte.start_date
from
     t1
	,got_my_dates cte
where
    t1.product_id = cte.id
and t1.product_id <= 1e3;

A simple remark before starting the experiment: in the real-life query the “with subquery” was automatically materialized by Oracle because it is referenced twice in the main query. This is why the materialize hint I used above might not be strictly necessary for the behaviour I am about to explain to occur.
Let’s also confirm, before starting the experiment, that this query has not been parsed previously and as such is completely unknown:

SQL> select sql_id
    from gv$sql
    where sql_text like '%got_my_dates%'
    and sql_text not like '%v$sql%';

no rows selected

In the following I will execute the above query, get its execution plan, and show how many child cursors it used during this very first execution:

SQL> -- run query

SQL_ID  1b7g55gx40k79, child number 0
-------------------------------------

Plan hash value: 2708956082
-------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name                       | Rows  | Bytes |    TQ  |IN-OUT|
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                            |       |       |        |      |
|   1 |  TEMP TABLE TRANSFORMATION |                            |       |       |        |      |
|   2 |   PX COORDINATOR           |                            |       |       |        |      |
|   3 |    PX SEND QC (RANDOM)     | :TQ10000                   |   994K|    22M|  Q1,00 | P->S |
|   4 |     LOAD AS SELECT         |                            |       |       |  Q1,00 | PCWP |
|   5 |      PX BLOCK ITERATOR     |                            |   994K|    22M|  Q1,00 | PCWC |
|*  6 |       TABLE ACCESS FULL    | T2                         |   994K|    22M|  Q1,00 | PCWP |
|   7 |   PX COORDINATOR           |                            |       |       |        |      |
|   8 |    PX SEND QC (RANDOM)     | :TQ20001                   |  1830 |   134K|  Q2,01 | P->S |
|*  9 |     HASH JOIN              |                            |  1830 |   134K|  Q2,01 | PCWP |
|  10 |      JOIN FILTER CREATE    | :BF0000                    |  1830 | 84180 |  Q2,01 | PCWP |
|  11 |       PX RECEIVE           |                            |  1830 | 84180 |  Q2,01 | PCWP |
|  12 |        PX SEND BROADCAST   | :TQ20000                   |  1830 | 84180 |  Q2,00 | P->P |
|  13 |         PX BLOCK ITERATOR  |                            |  1830 | 84180 |  Q2,00 | PCWC |
|* 14 |          TABLE ACCESS FULL | T1                         |  1830 | 84180 |  Q2,00 | PCWP |
|* 15 |      VIEW                  |                            |   994K|    27M|  Q2,01 | PCWP |
|  16 |       JOIN FILTER USE      | :BF0000                    |   994K|    22M|  Q2,01 | PCWP |
|  17 |        PX BLOCK ITERATOR   |                            |   994K|    22M|  Q2,01 | PCWC |
|* 18 |         TABLE ACCESS FULL  | SYS_TEMP_0FD9D6633_140F243 |   994K|    22M|  Q2,01 | PCWP |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access(:Z>=:Z AND :Z<=:Z)
       filter(("START_DATE">TO_DATE(' 2012-06-07 00:00:00', 'syyyy-mm-dd hh24:mi:ss')
              AND "START_DATE"<=TO_DATE(' 2012-06-10 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))
   9 - access("T1"."PRODUCT_ID"="CTE"."ID")
  14 - access(:Z>=:Z AND :Z<=:Z)
       filter("T1"."PRODUCT_ID"<=1000)
  15 - filter("CTE"."ID"<=1000)
  18 - access(:Z>=:Z AND :Z<=:Z)
       filter(SYS_OP_BLOOM_FILTER(:BF0000,"C0"))

Note
-----
- Degree of Parallelism is 4 because of hint


SQL> @gv$sql
Enter value for sql_id: 1b7g55gx40k79

SQL_ID        PLAN_HASH_VALUE CHILD_NUMBER LAST_LOAD_TIME      EXECUTIONS
------------- --------------- ------------ ------------------- ----------
1b7g55gx40k79      2708956082            0 2016-09-26/07:30:23          1
1b7g55gx40k79      2708956082            1 2016-09-26/07:30:24          0

Notice how a completely new query produces two child cursors during its very first execution. Interestingly, the number of executions seems to indicate that Oracle used child cursor 0, while the number of executions of child cursor n°1 has not been incremented.
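The gv$sql script used above is not reproduced in this post. A query along the following lines would return the same columns. It is a guess at the script's content rather than the script itself, with &sql_id as a SQL*Plus substitution variable:

```sql
-- Sketch of a gv$sql-style helper script (assumed content, not the original script).
select
     sql_id
    ,plan_hash_value
    ,child_number
    ,last_load_time
    ,executions
from
     gv$sql
where
     sql_id = '&sql_id'
order by
     child_number;
```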

The reason for these two child cursors is:

SQL> @nonshared 1b7g55gx40k79
Show why existing SQL child cursors were not reused (V$SQL_SHARED_CURSOR)...

SQL_ID                        : 1b7g55gx40k79
ADDRESS                       : 00007FF92115E830
CHILD_ADDRESS                 : 00007FF92115E6B0
CHILD_NUMBER                  : 0
REASON                        : <reason>PQ Slave mismatch(5)</reason>
CON_ID                        : 1

-----------------

SQL_ID                        : 1b7g55gx40k79
ADDRESS                       : 00007FF92115E830
CHILD_ADDRESS                 : 00007FF9214A40E8
CHILD_NUMBER                  : 1
PQ_SLAVE_MISMATCH             : Y
REASON                        : <reason>PQ Slave mismatch(5)</reason>
CON_ID                        : 1
-----------------

So here we are, in exactly the same situation as the real-world query.

Another remark worth mentioning here is that the execution plan of child cursor n°1 is identical to that of child cursor n°0 shown above, except for this bizarre Note at the bottom:

SQL_ID  1b7g55gx40k79, child number 1
-------------------------------------
with got_my_dates  as (select /*+ materialize */           id
,padding    ,start_date   from        t2   where     start_date   >
to_date('07/06/2012','dd/mm/yyyy')   and start_date <=
to_date('10/06/2012','dd/mm/yyyy')   ) select      /*+ parallel(4) */
   t1.*  ,cte.padding  ,cte.start_date from      t1  ,got_my_dates cte
where     t1.product_id = cte.id and t1.product_id <= 1e3

Note
-----
- automatic DOP: Computed Degree of Parallelism is 4

I don’t clearly see where this automatic DOP is coming from, since auto DOP is enabled neither on my personal laptop nor in the real-world application:

SQL> show parameter parallel_degree_policy

PARAMETER_NAME                  TYPE        VALUE
------------------------------- ----------- -------
parallel_degree_policy          string      MANUAL
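The other usual sources of a computed DOP can be ruled out in the same way. The sketch below is an addition to the original test, and the choice of parameters is my assumption about what is worth checking. It lists the related parallel parameters and the degree stored at table level for the two model tables. It will not by itself explain the Note:

```sql
-- Sketch: parallel-related parameters that influence the DOP computation.
select
     name
    ,value
from
     v$parameter
where
     name in ('parallel_degree_policy'
             ,'parallel_degree_limit'
             ,'parallel_min_time_threshold'
             ,'parallel_max_servers')
order by
     name;

-- Sketch: degree declared on the tables of the model (T1 and T2 from the setup above).
select
     table_name
    ,degree
from
     user_tables
where
     table_name in ('T1','T2');
```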

Last but not least, while the query was always requesting 2*DOP (8) parallel servers, Oracle systematically managed to give it 12 (and sometimes 16) parallel servers:

SQL> select
         sql_id
         ,process_name px_slave
     from gv$sql_monitor
     where sql_id = '1b7g55gx40k79'
     and sql_exec_id =16777216
     and  trunc(sql_exec_start) =to_date('27092016','ddmmyyyy')
     and sql_text is null
     order by 2 ;

SQL_ID        PX_SL
------------- -----
1b7g55gx40k79 p000
1b7g55gx40k79 p000
1b7g55gx40k79 p001
1b7g55gx40k79 p001
1b7g55gx40k79 p002
1b7g55gx40k79 p002
1b7g55gx40k79 p003
1b7g55gx40k79 p003
1b7g55gx40k79 p004
1b7g55gx40k79 p005
1b7g55gx40k79 p006
1b7g55gx40k79 p007

12 rows selected.


Global Information
------------------------------
 Status              :  DONE (ALL ROWS)
 Instance ID         :  1
 Session             :  C##MHOURI (7:4495)
 SQL ID              :  1b7g55gx40k79
 SQL Execution ID    :  16777216
 Execution Started   :  09/27/2016 07:29:34
 First Refresh Time  :  09/27/2016 07:29:34
 Last Refresh Time   :  09/27/2016 07:29:45
 Duration            :  11s
 Module/Action       :  SQL*Plus/-
 Service             :  orcl
 Program             :  sqlplus.exe
 Fetch Calls         :  3

Parallel Execution Details (DOP=4 , Servers Allocated=12)

A number of Servers Allocated greater than the number of requested servers is not an innocent piece of information and should always catch your attention.
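The same figures can be read from gv$sql_monitor without generating the full report. This is a minimal sketch under two assumptions: the Tuning Pack is licensed, and the query coordinator row is the one whose sql_text is not null (the mirror image of the filter used above to list the PX slaves). The expected_2xdop column is just 2*DOP computed for comparison:

```sql
-- Sketch: compare allocated PX servers with the 2*DOP usually expected.
-- Assumption: the QC row is the one carrying the sql_text.
select
     sql_id
    ,sql_exec_id
    ,sql_exec_start
    ,px_maxdop
    ,2 * px_maxdop          expected_2xdop
    ,px_servers_requested
    ,px_servers_allocated
from
     gv$sql_monitor
where
     sql_id = '&sql_id'
and  sql_text is not null
order by
     sql_exec_start;
```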

That said, as you might have already guessed from the title of this post, the simplification of the real-world query shows that the parallel slave refuses to share the QC execution plan because of the materialisation of the Common Table Expression. This is why, when I pre-empted the materialisation of the CTE via the /*+ inline */ hint, the parallel slave shared the execution plan of its QC, as shown below:

with got_my_dates
 as (select /*+ inline */
          id
            ,padding
            ,start_date
    from
         t2
    where
      start_date   > to_date('07/06/2012','dd/mm/yyyy')
    and start_date <= to_date('10/06/2012','dd/mm/yyyy')
    )
select
     /*+ parallel(4) */
     t1.*
   ,cte.padding
   ,cte.start_date
from
     t1
   ,got_my_dates cte
where
    t1.product_id = cte.id
and t1.product_id <= 1e3;

1000 rows selected.

SQL> start xpsimp

SQL_ID  4h1qa708b9p3j, child number 0
-------------------------------------
Plan hash value: 2637578939
--------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes |    TQ  |IN-OUT|
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |       |       |        |      |
|   1 |  PX COORDINATOR             |          |       |       |        |      |
|   2 |   PX SEND QC (RANDOM)       | :TQ10002 |   681 | 47670 |  Q1,02 | P->S |
|*  3 |    HASH JOIN BUFFERED       |          |   681 | 47670 |  Q1,02 | PCWP |
|   4 |     JOIN FILTER CREATE      | :BF0000  |   681 | 16344 |  Q1,02 | PCWP |
|   5 |      PX RECEIVE             |          |   681 | 16344 |  Q1,02 | PCWP |
|   6 |       PX SEND HYBRID HASH   | :TQ10000 |   681 | 16344 |  Q1,00 | P->P |
|   7 |        STATISTICS COLLECTOR |          |       |       |  Q1,00 | PCWC |
|   8 |         PX BLOCK ITERATOR   |          |   681 | 16344 |  Q1,00 | PCWC |
|*  9 |          TABLE ACCESS FULL  | T2       |   681 | 16344 |  Q1,00 | PCWP |
|  10 |     PX RECEIVE              |          |  1830 | 84180 |  Q1,02 | PCWP |
|  11 |      PX SEND HYBRID HASH    | :TQ10001 |  1830 | 84180 |  Q1,01 | P->P |
|  12 |       JOIN FILTER USE       | :BF0000  |  1830 | 84180 |  Q1,01 | PCWP |
|  13 |        PX BLOCK ITERATOR    |          |  1830 | 84180 |  Q1,01 | PCWC |
|* 14 |         TABLE ACCESS FULL   | T1       |  1830 | 84180 |  Q1,01 | PCWP |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."PRODUCT_ID"="ID")
   9 - access(:Z>=:Z AND :Z<=:Z)
       filter(("ID"<=1000 AND "START_DATE">TO_DATE(' 2012-06-07 00:00:00', 'syyyy-mm-dd
               hh24:mi:ss')
       AND  "START_DATE"<=TO_DATE(' 2012-06-10 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))
  14 - access(:Z>=:Z AND :Z<=:Z)
       filter(("T1"."PRODUCT_ID"<=1000 AND
       SYS_OP_BLOOM_FILTER(:BF0000,"T1"."PRODUCT_ID")))

Note
-----
   - Degree of Parallelism is 4 because of hint
 

Notice below how, now that the CTE is not materialized, the query uses a single child cursor:

SQL> @gv$sql2
Enter value for sql_id: 4h1qa708b9p3j

SQL_ID        PLAN_HASH_VALUE CHILD_NUMBER LAST_LOAD_TIME      EXECUTIONS END_OF_FETCH_COUNT
------------- --------------- ------------ ------------------- ---------- ------------------
4h1qa708b9p3j      2637578939            0 2016-09-27/18:00:54          1                  1

SQL> @nonshared 4h1qa708b9p3j
Show why existing SQL child cursors were not reused (V$SQL_SHARED_CURSOR)...

SQL_ID                        : 4h1qa708b9p3j
ADDRESS                       : 00007FF921744A58
CHILD_ADDRESS                 : 00007FF921744698
CHILD_NUMBER                  : 0
REASON                        :
CON_ID                        : 1
-----------------

And notice as well that, with the CTE no longer materialized, the number of requested parallel servers (2*DOP) equals the number of allocated servers, as shown below:

SQL> select
        sql_id
       ,process_name px_slave
     from gv$sql_monitor
     where sql_id = '4h1qa708b9p3j'
     and sql_exec_id =16777216
     and trunc(sql_exec_start) = trunc(sysdate)
     and sql_text is null
     order by 2 ;

SQL_ID        PX_SL
------------- -----
4h1qa708b9p3j p000
4h1qa708b9p3j p001
4h1qa708b9p3j p002
4h1qa708b9p3j p003
4h1qa708b9p3j p004
4h1qa708b9p3j p005
4h1qa708b9p3j p006
4h1qa708b9p3j p007

8 rows selected.

Global Information
------------------------------
 Status              :  DONE (ALL ROWS)
 Instance ID         :  1
 Session             :  C##MHOURI (7:4495)
 SQL ID              :  4h1qa708b9p3j
 SQL Execution ID    :  16777216
 Execution Started   :  09/27/2016 18:00:54
 First Refresh Time  :  09/27/2016 18:00:54
 Last Refresh Time   :  09/27/2016 18:00:57
 Duration            :  3s
 Module/Action       :  SQL*Plus/-
 Service             :  orcl
 Program             :  sqlplus.exe
 Fetch Calls         :  3

Parallel Execution Details (DOP=4 , Servers Allocated=8)

And for those who can’t do without an execution plan, here below is the new execution plan:

Plan hash value: 2637578939
--------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes |    TQ  |IN-OUT|
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |       |       |        |      |
|   1 |  PX COORDINATOR             |          |       |       |        |      |
|   2 |   PX SEND QC (RANDOM)       | :TQ10002 |   738 | 51660 |  Q1,02 | P->S |
|*  3 |    HASH JOIN BUFFERED       |          |   738 | 51660 |  Q1,02 | PCWP |
|   4 |     JOIN FILTER CREATE      | :BF0000  |   737 | 17688 |  Q1,02 | PCWP |
|   5 |      PX RECEIVE             |          |   737 | 17688 |  Q1,02 | PCWP |
|   6 |       PX SEND HYBRID HASH   | :TQ10000 |   737 | 17688 |  Q1,00 | P->P |
|   7 |        STATISTICS COLLECTOR |          |       |       |  Q1,00 | PCWC |
|   8 |         PX BLOCK ITERATOR   |          |   737 | 17688 |  Q1,00 | PCWC |
|*  9 |          TABLE ACCESS FULL  | T2       |   737 | 17688 |  Q1,00 | PCWP |
|  10 |     PX RECEIVE              |          |  1986 | 91356 |  Q1,02 | PCWP |
|  11 |      PX SEND HYBRID HASH    | :TQ10001 |  1986 | 91356 |  Q1,01 | P->P |
|  12 |       JOIN FILTER USE       | :BF0000  |  1986 | 91356 |  Q1,01 | PCWP |
|  13 |        PX BLOCK ITERATOR    |          |  1986 | 91356 |  Q1,01 | PCWC |
|* 14 |         TABLE ACCESS FULL   | T1       |  1986 | 91356 |  Q1,01 | PCWP |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."PRODUCT_ID"="ID")
   9 - access(:Z>=:Z AND :Z<=:Z)
       filter(("ID"<=1000 AND "START_DATE">TO_DATE(' 2012-06-07 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "START_DATE"<=TO_DATE(' 2012-06-10 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))
  14 - access(:Z>=:Z AND :Z<=:Z)
       filter(("T1"."PRODUCT_ID"<=1000 AND SYS_OP_BLOOM_FILTER(:BF0000,"T1"."PRODUCT_ID")))

Note
-----
- Degree of Parallelism is 4 because of hint

While the initial execution plan uses two DFO trees (Q1 and Q2 in the TQ column), the new one uses a single DFO tree with 3 DFOs. Having multiple parallelisers (or DFO trees) might sometimes create issues, as in the current case of TEMP TABLE transformation and PQ_SLAVE_MISMATCH.
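The number of DFO trees can also be looked at after the run through v$pq_tqstat. This is a hedged sketch, not something done in the original test. The view is populated only in the session that ran the parallel query, so the statement has to be issued from that same session immediately after the query completes. I would expect more than one dfo_number value for the materialized version, but its content can be hard to interpret when several DFO trees are involved:

```sql
-- Sketch: table queue activity per DFO tree for the last parallel query of this session.
select
     dfo_number
    ,tq_id
    ,server_type
    ,process
    ,num_rows
from
     v$pq_tqstat
order by
     dfo_number
    ,tq_id
    ,server_type desc
    ,process;
```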

BOTTOM LINE

Even though this issue doesn’t reproduce in 12.1.0.2, in previous Oracle releases watch out for your parallel queries using a temp table transformation. You might be confronted with execution plan instability due to the PQ_SLAVE_MISMATCH non-sharing reason, where a parallel slave refuses to share the execution plan of its query coordinator. Bear in mind as well that in such a situation even a SQL Profile will not guarantee the same execution plan at each execution.
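To check whether other statements in the shared pool are exposed to the same non-sharing reason, something like the following could be used. It is a sketch under my own assumptions (join on inst_id, sql_id and child_address, cursors still present in the shared pool), not a validated diagnostic:

```sql
-- Sketch: statements having child cursors flagged with PQ_SLAVE_MISMATCH,
-- with the number of distinct plans and the SQL Profile attached (if any).
select
     s.sql_id
    ,count(*)                          mismatched_children
    ,count(distinct s.plan_hash_value) distinct_plans
    ,max(s.sql_profile)                sql_profile
from
     gv$sql_shared_cursor c
    ,gv$sql               s
where
     s.inst_id           = c.inst_id
and  s.sql_id            = c.sql_id
and  s.child_address     = c.child_address
and  c.pq_slave_mismatch = 'Y'
group by
     s.sql_id
order by
     mismatched_children desc;
```

A statement showing a SQL Profile together with more than one distinct plan would be a candidate for the instability described above.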


