Quantcast
Channel: Mohamed Houri’s Oracle Notes
Viewing all articles
Browse latest Browse all 224

On the impact of stale statistics when enabling Foreign keys

$
0
0

If you are going to enable a foreign key integrity constraint, then make sure you have fresh and representative statistics on both child and parent table ends. If this is not so, then you might lose a lot of time enabling your constraint because of a non-easy to spot recursive query.

Here’s below a summary of a real life 11.2.0.4 case:

SQL Text
------------------------------
BEGIN XZA_DDL.ENABLE_FK(owner => 'ABCDXY', nomtab => 'TABLE_ABC_C', nomfk => NULL); END;

Global Information
------------------------------
 Status              :  EXECUTING           
 Instance ID         :  1                   
 Session             :  XYZ_ABC(13:25335)  
 SQL ID              :  46bn1bvfkkpvn       
 SQL Execution ID    :  16777216            
 Execution Started   :  01/13/2020 16:54:37 
 First Refresh Time  :  01/13/2020 16:54:43 
 Last Refresh Time   :  01/14/2020 15:23:35 
 Duration            :  80939s              
 Module/Action       :  SQL Developer/-     
 Service             :  SYS$USERS           
 Program             :  SQL Developer       

Global Stats
===========================================================================================
| Elapsed |   Cpu   |    IO    | Concurrency | PL/SQL  |  Other   | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Time(s) | Waits(s) |  Gets  | Reqs | Bytes |
===========================================================================================
|   80937 |   79809 |     3.27 |        1.74 |    0.00 |     1124 |     2G |  215 |  21MB |
===========================================================================================

The very long enabling FK process was still in EXECUTING status after more than 22 hours as shown above. Let’s check what ASH can tell us about this situation:

SQL> select sql_id, count(1)
  2  from gv$active_session_history
  3  where
  4      sample_time between to_date('14012020 12:00:00', 'ddmmyyyy hh24:mi:ss')
  5                  and     to_date('14012020 15:45:00', 'ddmmyyyy hh24:mi:ss')
  6  group by sql_id
  7  order by 2 desc
  8  ;

SQL_ID          COUNT(1)
------------- ----------
                   17801
608srf0vf5f3q      12937 -->  alter table ABCDXY. TABLE_ABC_C
a8gtvr24afy70      10919
1vtd595xys9fp       9135
7q4kq2auz89x1       4203
fdxa2ph5250x1       3058

Manifestly it seems that we must focus our attention on the 608srf0vf5f3q SQL_ID.

From a wait event point of view, ASH shows this:

SQL> select event, count(1)
  2  from gv$active_session_history
  3  where
  4      sample_time between to_date('14012020 12:00:00', 'ddmmyyyy hh24:mi:ss')
  5                  and     to_date('14012020 15:45:00', 'ddmmyyyy hh24:mi:ss')
  6  group by event
  7  order by 2 desc;

EVENT                      COUNT(1)
------------------------ ----------
                              48756
row cache lock                12194
db file parallel write         6867
db file scattered read         6519
db file sequential read        5356
direct path read               1794
…/../

A lot of CPU burning followed by the row cache lock wait event. However, the SQL_ID 608srf0vf5f3q is reported to be only burning CPU as the followings proves:

SQL> select event, count(1)
  2  from gv$active_session_history
  3  where
  4      sample_time between to_date('14012020 12:00:00', 'ddmmyyyy hh24:mi:ss')
  5                  and     to_date('14012020 15:45:00', 'ddmmyyyy hh24:mi:ss')
  6  and sql_id = '608srf0vf5f3q'
  7  group by event
  8  order by 2 desc;

EVENT     COUNT(1)
------- ----------
             12937

Since the enabling process was still running I decided to “snap’’ the SID (13) of the PL/SQL stored package which I got from the corresponding gv$sql_monitor

SQL> @snapper all 5 1 13
Sampling SID 13 with interval 5 seconds, taking 1 snapshots...

-- Session Snapper v4.26 - by Tanel Poder ( http://blog.tanelpoder.com/snapper ) 
- Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! 

------------------------------------------------------------------------------------------------
SID, USERNAME, TYPE, STATISTIC                            ,         DELTA, HDELTA/SEC,    %TIME,
------------------------------------------------------------------------------------------------
 13, XYZ_ABC, STAT, session logical reads                ,        142350,     27.03k,         ,
 13, XYZ_ABC, STAT, consistent gets                      ,        142351,     27.03k,         ,
 13, XYZ_ABC, STAT, consistent gets from cache           ,        142352,     27.03k,         ,
 13, XYZ_ABC, STAT, consistent gets from cache (fastpath),        142352,     27.03k,         ,
 13, XYZ_ABC, STAT, logical read bytes from cache        ,    1166147584,    221.41M,         , --> this looks interesting
 13, XYZ_ABC, STAT, calls to kcmgcs                      ,            22,       4.18,         ,
 13, XYZ_ABC, STAT, no work - consistent read gets       ,        142334,     27.02k,         ,
 13, XYZ_ABC, STAT, table scans (short tables)           ,             3,        .57,         ,
 13, XYZ_ABC, STAT, table scan rows gotten               ,      23484525,      4.46M,         ,
 13, XYZ_ABC, STAT, table scan blocks gotten             ,        142335,     27.02k,         ,
 13, XYZ_ABC, STAT, buffer is pinned count               ,             3,        .57,         ,
 13, XYZ_ABC, TIME, DB CPU                               ,       6003174,      1.14s,   114.0%,
 13, XYZ_ABC, TIME, sql execute elapsed time             ,       6003624,      1.14s,   114.0%,
 13, XYZ_ABC, TIME, DB time                              ,       6003624,      1.14s,   114.0%,

--  End of Stats snap 1, end=2020-01-14 15:30:20, seconds=5.3

----------------------------------------------------------------------------------
  ActSes   %Thread | INST | SQL_ID          | SQL_CHILD | EVENT  | WAIT_CLASS
----------------------------------------------------------------------------------
    1.00    (100%) |    1 | 608srf0vf5f3q   | 0         | ON CPU | ON CPU

--  End of ASH snap 1, end=2020-01-14 15:30:20, seconds=5, samples_taken=48, AAS=1

Nonchalantly, all the investigations are showing that the culprit query is 608srf0vf5f3q which is burning a lot of CPU. The 221 M of logical read bytes from cache could very well correlate with high CPU consumption. But we still don’t know why and who is reading those hundreds of millions of logical reads from cache?

In addition, the ash_wait_chain script applied to this SQL_ID 608srf0vf5f3q shows that this query is not blocked by any other process:

SQL> @ash_wait_chains event2 sql_id='608srf0vf5f3q' "timestamp'2020-01-14 12:00:00'" "timestamp'2020-01-14 15:45:00'"

-- Display ASH Wait Chain Signatures script v0.4 BETA by Tanel Poder ( http://blog.tanelpoder.com )

%This     SECONDS     AAS WAIT_CHAIN    LAST_SEEN
------ ---------- ------- ------------- --------------------
 100%       12937     1.0 -> ON CPU     2020-01-14 15:44:59

That’s weird. Isn’t it?

After a couple of minutes going round and round in circles, I decided to use the brute force by:

  • Killing the process
  •  Activating the 10046 trace
  •  Running the process for a couple of minutes
  •  Stopping again the process
  •  Analyzing the trace file
alter session set tracefile_identifier = EnablingForeignKey;
@46on 12
exec XZA_DDL.ENABLE_FK(owner => 'ABCDXY', nomtab => 'TABLE_ABC_C', nomfk => NULL);
@46off

And here’s below what I found in the TKPROFED generated trace file

SQL ID: 56tj8jdcw15jb Plan Hash: 2257352843

select /*+ all_rows ordered dynamic_sampling(2) */ A.rowid, :1, :2, :3 
from
 "ABCDXY"."TABLE_ABC_C" A , "ABCDXY"."TABLE_PARENT_ABC" B where( 
  "A"."COL_1" is not null and "A"."COL_2" is not null and "A"."COL_3" 
  is not null and "A"."COL_4" is not null) and( "B"."COL_1" (+)= 
  "A"."COL_1" and "B"."COL_2" (+)= "A"."COL_2" and "B"."COL_3" (+)=
   "A"."COL_3" and "B"."COL_4" (+)= "A"."COL_4") and( "B"."COL_1" is 
  null or "B"."COL_2" is null or "B"."COL_3" is null or "B"."COL_4" is 
  null)


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1    217.83     224.03          0    6179071          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3    217.83     224.03          0    6179071          0           0

This recursive query was enough for me to understand and fix this issue. I’ve then started by checking the historical execution of this recursive query as shown below:

SQL> @SqlHistStat
Enter value for sql_id: 56tj8jdcw15jb
Enter value for from_date: 01012020

SNAP_BEGIN                PLAN_HASH_VALUE    EXECS END_OF_FETCH  AVG_ETIME     AVG_LIO   AVG_ROWS
------------------------- --------------- -------- ------------ ----------  ---------- ----------
13-JAN-20 04.00.44.563 PM      2257352843        1            0        381    10934401          0
13-JAN-20 05.00.56.481 PM      2257352843        0            0       3551   104544327          0
13-JAN-20 06.00.05.981 PM      2257352843        0            0       3609   104844651          0
13-JAN-20 07.00.15.180 PM      2257352843        0            0       3619    84153912          0
13-JAN-20 08.00.29.468 PM      2257352843        0            0       3610   104607796          0
13-JAN-20 09.00.38.801 PM      2257352843        0            0       3619    93027006          0
13-JAN-20 10.00.49.488 PM      2257352843        0            0       3595   102089647          0
13-JAN-20 11.00.59.446 PM      2257352843        0            0       3545    99557582          0
14-JAN-20 12.00.09.039 AM      2257352843        0            0       3609   104246681          0
14-JAN-20 01.00.18.823 AM      2257352843        0            0       3608   106861170          0
14-JAN-20 02.00.28.120 AM      2257352843        0            0       3611   103242627          0
14-JAN-20 03.00.37.711 AM      2257352843        0            0       3610   106157348          0
14-JAN-20 04.00.47.161 AM      2257352843        0            0       3608   103770559          0
14-JAN-20 05.00.56.531 AM      2257352843        0            0       3547   103304809          0
14-JAN-20 06.00.05.487 AM      2257352843        0            0       3609   105243323          0
14-JAN-20 07.00.14.675 AM      2257352843        0            0       3608   107326829          0
14-JAN-20 08.00.23.442 AM      2257352843        0            0       3608   108137854          0
14-JAN-20 09.00.32.133 AM      2257352843        0            0       3609   107183542          0
14-JAN-20 10.00.41.111 AM      2257352843        0            0       3610   104855775          0
14-JAN-20 11.00.50.825 AM      2257352843        0            0       3554   100223228          0
14-JAN-20 12.00.00.751 PM      2257352843        0            0       3610    99540252          0
14-JAN-20 01.00.11.282 PM      2257352843        0            0       3614   101153086          0
14-JAN-20 02.00.21.974 PM      2257352843        0            0       3610    98943412          0
14-JAN-20 03.00.32.525 PM      2257352843        1            0        436    12358214          0

24 rows selected.

As you can see, through the END_OF_FETCH column, this query has spanned 24 one-hour snapshots without completing. It uses the following execution plan:

--------------------------------------------------------------------------------
| Id  | Operation                  | Name              | Rows  | Pstart| Pstop |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                   |       |       |       |
|*  1 |  FILTER                    |                   |       |       |       |
|   2 |   NESTED LOOPS OUTER       |                   |     1 |       |       |
|   3 |    PARTITION RANGE ALL     |                   |     1 |     1 |     2 |
|   4 |     TABLE ACCESS FULL      | TABLE_ABC_C       |     1 |     1 |     2 |
|   5 |    PARTITION RANGE ITERATOR|                   |     1 |   KEY |   KEY |
|*  6 |     TABLE ACCESS FULL      | TABLE_PARENT_ABC  |     1 |   KEY |   KEY |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(("B"."COL_1" IS NULL OR "B"."COL_2" IS NULL OR "B"."COL_3" IS NULL OR
              "B"."COL_4" IS NULL))
   6 - filter(("B"."COL_1"="A"."COL_1" AND "B"."COL_2"="A"."COL_2" AND
               "B"."COL_3"="A"."COL_3" AND "B"."COL_4"="A"."COL_4"))

I don’t know why whenever I have to fix a very long non-ending process, I almost always end up by finding that a NESTED LOOPS is for something in this pain 😊

It is clear now that the TABLE_ABC_C child table and its parent TABLE_PARENT_ABC need to have fresh statistics before relaunching the FK enabling process. I always keep saying that, when an estimated cardinality of 1 is reported in the execution plan, then there is a big change that this is due to a non-fresh or non-representative statistic.

SQL> exec dbms_stats.gather_table_stats ('ABCDXY', 'TABLE_ABC_C', cascade => true,         no_invalidate => false);
SQL> exec dbms_stats.gather_table_stats ('ABCDXY', 'TABLE_PARENT_ABC', cascade => true, no_invalidate => false);

SQL> @HistStat
Enter value for owner: ABCDXY
Enter value for table_name: TABLE_ABC_C

OBJECT_NAME    OBJECT_TYPE  PREV_STAT_TIME           ROWCNT SAMPLESIZE ESTIMATE_PCT
-------------- ------------ -------------------- ---------- ---------- ------------
TABLE_ABC_C  TABLE       
TABLE_ABC_C  TABLE        09/01/2020 16:59:25           0          0            0
TABLE_ABC_C  TABLE        14/01/2020 16:34:17      946798     946798          100
TABLE_ABC_C  TABLE        14/01/2020 17:09:02      946798     946798          100

SQL> @HistStat
Enter value for owner: ABCDXY
Enter value for table_name: TABLE_PARENT_ABC

OBJECT_NAME         OBJECT_TYPE  PREV_STAT_TIME        ROWCNT SAMPLESIZE ESTIMATE_PCT
------------------- ------------ -------------------- ------- ---------- ------------
TABLE_PARENT_ABC   TABLE       
TABLE_PARENT_ABC   TABLE        09/01/2020 16:59:25        0          0            0
TABLE_PARENT_ABC   TABLE        14/01/2020 16:34:43  9032840    9032840          100
TABLE_PARENT_ABC   TABLE        14/01/2020 17:08:52  9032840    9032840          100

And finally, the coast is clear to re-enable the foreign key in about 10 seconds as shown below:

SQL> exec DBA_DDL.ENABLE_FK(owner => 'ABCDXY', nomtab => 'TABLE_ABC_C', nomfk => NULL);

PL/SQL procedure successfully completed.

Elapsed: 00:00:10.29 --> 10 seconds!

This has been possible because the new execution plan switched from that dramatic NL to a more suitable HASH JOIN as shown below:

SQL_ID  56tj8jdcw15jb, child number 0
-------------------------------------
Plan hash value: 1651427782
------------------------------------------------------------------------------------------
| Id  | Operation            | Name              | Rows  | Bytes |TempSpc| Pstart| Pstop |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                   |       |       |       |       |       |
|   1 |  PARTITION RANGE ALL |                   |     1 |    90 |       |     1 |     3 |
|*  2 |   FILTER             |                   |       |       |       |       |       |
|*  3 |    HASH JOIN OUTER   |                   |     1 |    90 |    18M|       |       |
|   4 |     TABLE ACCESS FULL| TABLE_ABC_C       |   946K|    46M|       |     1 |     3 |
|   5 |     TABLE ACCESS FULL| TABLE_PARENT_ABC  |  9032K|   335M|       |     1 |     3 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("B"."COL_1" IS NULL OR "B"."COL_2" IS NULL OR "B"."COL_3" IS NULL 
              OR "B"."COL_4" IS NULL))
   3 - access("B"."COL_4"="A"."COL_4" AND "B"."COL_1"="A"."COL_1" AND
              "B"."COL_2"="A"."COL_2" AND "B"."COL_3"="A"."COL_3")

SQL> @sqlstats
Enter value for sql_id: 56tj8jdcw15jb

     CHILD PLAN_HASH_VALUE   AVG_GETS   AVG_PIOS  AVG_ETIME     EXECS
---------- --------------- ---------- ---------- ---------- ---------
         0      1651427782     133519     109286  27.018412         1

Bottom Line

Regardless of the kind of Oracle SQL operations you are executing you should always make sure you have given, beforehand, Oracle enough statistics information about your table/index/column. If this is not so, then be sure that you will certainly end up by stopping to be lucky and by facing a performance trouble. And, in the case of enabling Foreign key constraint, this performance pain will be exacerbated by a recursive query involving the parent-child tables. If this recursive query has a wrong execution plan because of lack of statistics, then it will drastically delay the FK enabling execution time and it will be very hard to point out that the root cause of the performance pain is due to that recursive query.


Viewing all articles
Browse latest Browse all 224

Trending Articles