Channel: Mohamed Houri’s Oracle Notes

AWR and superfluous historical statistics


An active sql_id is subject to capture and load into ASH (gv$active_session_history). As long as this sql_id is present in ASH, each of its samples has a 1-in-10 chance of being persisted into the AWR historical tables as well. In addition to gv$active_session_history and its AWR alter ego, the dba_hist_active_sess_history table, performance and tuning specialists make extensive use of the dba_hist_sqlstat table, into which snapshots of gv$sql are periodically sent by the MMON Lite process via the SGA ASH circular buffer.
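As a quick illustration of the 1-in-10 persistence, the samples flagged for AWR can be spotted directly in ASH (a sketch; the is_awr_sample column of gv$active_session_history exists from 11g onward, and the substitution variable is just a placeholder):

```sql
-- ASH samples flagged with is_awr_sample = 'Y' (roughly 1 out of 10)
-- are the ones persisted into dba_hist_active_sess_history
SELECT sample_time, session_id, sql_plan_hash_value
FROM   gv$active_session_history
WHERE  sql_id        = '&sql_id'
AND    is_awr_sample = 'Y'
ORDER BY sample_time;
```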

One of the queries I use against this table is the following one, which I refer to as histstats.sql (I think I originally picked it up somewhere on the internet):

SELECT
sn.snap_id,
plan_hash_value,
st.sql_profile,
executions_delta execs,
TRUNC(elapsed_time_delta/1e6/DECODE(executions_delta, 0, 1, executions_delta)) avg_etime,
  ROUND(disk_reads_delta    /DECODE(executions_delta,0,1, executions_delta),1) avg_pio,
  ROUND(buffer_gets_delta   /DECODE(executions_delta,0,1, executions_delta), 1) avg_lio ,
  ROUND(rows_processed_delta/DECODE(executions_delta,0, 1, executions_delta), 1) avg_rows
FROM
  dba_hist_sqlstat st,
  dba_hist_snapshot sn
WHERE st.snap_id        = sn.snap_id
AND sql_id              = '&sql_id'
AND begin_interval_time >= to_date('&date','ddmmyyyy')
ORDER BY 1 ASC;

That said, how would you read and interpret the following output of the above query, taken from a running system for a particular sql_id?

SNAP_ID     PLAN_HASH_VALUE SQL_PROFILE                     EXECS AVG_ETIME AVG_LIO
----------  --------------- ------------------------------ ----- ---------- ----------
30838        726354567                                      0       7227    3945460
30838        726354567                                      0       7227    3945460
30839       4211030025      prf_2yhzvghb06vh4_4211030025    1          3      28715
30839       4211030025      prf_2yhzvghb06vh4_4211030025    1          3      28715
30839        726354567                                      0       7140    5219336
30839        726354567                                      0       7140    5219336
30840        726354567                                      0       7203    9389840
30840        726354567                                      0       7203    9389840
30840       4211030025      prf_2yhzvghb06vh4_4211030025    0       2817    7831649
30840       4211030025      prf_2yhzvghb06vh4_4211030025    0       2817    7831649
30841        726354567                                      0       7192    5200201
30841        726354567                                      0       7192    5200201
30841       4211030025      prf_2yhzvghb06vh4_4211030025    0          0          0
30841       4211030025      prf_2yhzvghb06vh4_4211030025    0          0          0
30842       4211030025      prf_2yhzvghb06vh4_4211030025    0          0          0
30842       4211030025      prf_2yhzvghb06vh4_4211030025    0          0          0
30842        726354567                                      0       4956    3183667
30842        726354567                                      0       4956    3183667

Or, to make things less confusing, how would you interpret the different rows of snapshot 30841, reproduced below?

SNAP_ID     PLAN_HASH_VALUE SQL_PROFILE                     EXECS AVG_ETIME AVG_LIO
----------  --------------- ------------------------------ ----- ---------- ----------
30841        726354567                                      0       7192    5200201
30841        726354567                                      0       7192    5200201
30841       4211030025      prf_2yhzvghb06vh4_4211030025    0          0          0
30841       4211030025      prf_2yhzvghb06vh4_4211030025    0          0          0

What do those zero values in the lines with a non-null SQL Profile actually mean?

The answer to this question resides in the way Oracle dumps the statistics of a SQL statement from memory into the AWR tables. When an active sql_id has multiple child cursors in gv$sql, the MMON Lite process will (provided the sql_id qualifies for capture, of course) dump into dba_hist_sqlstat the statistics of all the corresponding distinct child cursors, including those not used by the active sql_id at capture time (even the statistics of non-shareable cursors are dumped into this view, provided these cursors are present in gv$sql at the capture moment).

For example, in the above list you can see that we have four rows of execution statistics at snapshot 30841, of which the first two are actually using plan_hash_value 726354567. According to their elapsed time (7192 seconds), these executions span multiple snapshots (this is a serial execution plan, by the way). But the remaining lines, those showing a SQL Profile, are in fact superfluous and confusing. They have been captured only because a second shareable child cursor (plan_hash_value 4211030025) was present in gv$sql at the moment of the capture (snapshot 30841). One indication that they are superfluous is the zero values in their average logical I/O, physical I/O, executions and number of rows processed.
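Such candidates can be flagged mechanically (a sketch; treating all-zero deltas as "superfluous" is my heuristic here, not an official Oracle indicator):

```sql
-- Flag dba_hist_sqlstat rows whose deltas are all zero: they were
-- most likely captured only because the corresponding child cursor
-- happened to be present in gv$sql at snapshot time
SELECT snap_id,
       plan_hash_value,
       CASE
         WHEN executions_delta    = 0
          AND buffer_gets_delta   = 0
          AND elapsed_time_delta  = 0
         THEN 'superfluous?'
       END AS flag
FROM   dba_hist_sqlstat
WHERE  sql_id = '&sql_id'
ORDER BY snap_id, plan_hash_value;
```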

Now that the scene has been set, we need a reproducible example. For that purpose, any SQL statement having multiple child cursors in gv$sql will do the trick. An easy way of engineering such a case, among other possible scenarios, is to use Adaptive Cursor Sharing (here in 12.1.0.2.0):

The model

INSERT INTO t1
SELECT level n1 ,
  CASE
    WHEN level = 1
    THEN 1
    WHEN level > 1
    AND level <= 101
    THEN 100
    WHEN level > 101
    AND level <= 1101
    THEN 1000
    WHEN level > 10001
    AND level <= 11000
    THEN 10000
    ELSE 1000000
  END n2
FROM dual
  CONNECT BY level < 1200150;

CREATE INDEX t1_i1 ON t1(n2);

BEGIN
  dbms_stats.gather_table_stats
         (user
         ,'t1'
         ,method_opt => 'for all columns size auto'
         ,cascade => true
         ,estimate_percent => dbms_stats.auto_sample_size
         );
END;
/
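Before generating the child cursors, it is worth verifying the engineered skew in the n2 column (a quick sanity check of my own, not part of the original demo):

```sql
-- Sanity check: the deliberately skewed distribution of n2 means
-- some bind values favour an index range scan and others a full
-- table scan -- the prerequisite for Adaptive Cursor Sharing
SELECT   n2, COUNT(*) cnt
FROM     t1
GROUP BY n2
ORDER BY n2;
```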

Generating multiple child cursors using ACS

var ln2 number;
exec :ln2 := 100;

SQL> select count(1) FROm t1 WHERE n2 <= :ln2;

  COUNT(1)
----------
       101

exec :ln2 := 1000000
SQL> select count(1) FROm t1 WHERE n2 <= :ln2;

  COUNT(1)
----------
   1200149

SQL_ID  9sp9wvczrvpty, child number 0
-------------------------------------
select count(1) FROm t1 WHERE n2 <= :ln2

Plan hash value: 2432955788

---------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes |
---------------------------------------------------
|   0 | SELECT STATEMENT  |       |       |       |
|   1 |  SORT AGGREGATE   |       |     1 |     3 |
|*  2 |   INDEX RANGE SCAN| T1_I1 |   218 |   654 |
---------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N2"<=:LN2)

In order for the above sql_id to become bind aware and get a new optimal execution plan at each execution, it needs to be executed once again (more details can be found here):

SQL> select count(1) FROm t1 WHERE n2 <= :ln2;

  COUNT(1)
----------
   1200149

SQL_ID  9sp9wvczrvpty, child number 1
-------------------------------------
select count(1) FROm t1 WHERE n2 <= :ln2

Plan hash value: 3724264953
----------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |
----------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |
|   1 |  SORT AGGREGATE    |      |     1 |     3 |
|*  2 |   TABLE ACCESS FULL| T1   |  1191K|  3489K|
----------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("N2"<=:LN2)

Notice now that a new child cursor n°1 has been generated, while the existing child cursor n°0 switched to a non-shareable status. In order to wake it up, we need to run the same query again with the corresponding bind variable value:

exec :ln2 := 100;

select count(1) FROm t1 WHERE n2 <= :ln2;

SQL_ID  9sp9wvczrvpty, child number 2
-------------------------------------
select count(1) FROm t1 WHERE n2 <= :ln2

Plan hash value: 2432955788
---------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes |
---------------------------------------------------
|   0 | SELECT STATEMENT  |       |       |       |
|   1 |  SORT AGGREGATE   |       |     1 |     3 |
|*  2 |   INDEX RANGE SCAN| T1_I1 |   218 |   654 |
---------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N2"<=:LN2)

Finally, we have achieved the initial goal of having multiple execution plans in gv$sql for the above SQL statement:

SELECT
  sql_id ,
  child_number ,
  first_load_time ,
  last_load_time ,
  plan_hash_value ,
  executions execs ,
  rows_processed ,
  trunc(buffer_gets/executions) lio ,
  is_shareable
FROM gv$sql
WHERE sql_id = '9sp9wvczrvpty'
--and is_shareable = 'Y'
order by last_load_time desc;

SQL_ID        CHILD_NUMBER FIRST_LOAD_TIME     LAST_LOAD_TIME      PLAN_HASH_VALUE  EXECS ROWS_PROCESSED        LIO I
------------- ------------ ------------------- ------------------- --------------- ------ -------------- ---------- -
9sp9wvczrvpty            0 2016-09-28/07:19:21 2016-09-28/07:19:21      2432955788      2              2       1225 N
9sp9wvczrvpty            1 2016-09-28/07:19:21 2016-09-28/07:20:44      3724264953      1              1       2222 Y
9sp9wvczrvpty            2 2016-09-28/07:19:21 2016-09-28/07:22:03      2432955788      1              1          3 Y

Notice that the sql_id (9sp9wvczrvpty) has three child cursors, of which child cursor 0 is no longer shareable. Nothing has been sent to the AWR tables yet, as shown below:

SQL>  @histstats
Enter value for sql_id: 9sp9wvczrvpty
Enter value for date: 27092016

no rows selected

Forcing AWR to dump SQL statistics
To make Oracle dump the statistics of this SQL statement, we are going to force a manual AWR snapshot capture by means of the following package call:

SQL> exec dbms_workload_repository.create_snapshot;

The call will create a snapshot and dump the above content of gv$sql into the dba_hist_sqlstat table, as shown below:

SQL>  @histstats
Enter value for sql_id: 9sp9wvczrvpty
Enter value for date: 27092016

   SNAP_ID SQL_ID        PLAN_HASH_VALUE SQL_PROFILE                     EXECS  AVG_ETIME    AVG_LIO  AVG_PIO  AVG_ROWS
---------- ------------- --------------- ------------------------------ ------ ---------- ---------- -------- ---------
      4305 9sp9wvczrvpty      2432955788                                     3          0        817        0         1
      4305 9sp9wvczrvpty      3724264953                                     1          0       2222        0         1

This is exactly what was expected: Oracle has captured into the dba_hist_sqlstat table 3 executions at plan_hash_value 2432955788 (child cursors 0 and 2 combined) and 1 execution at plan_hash_value 3724264953.

Now that we have multiple child cursors present in gv$sql, if AWR is to dump the sql_id, it will dump all of its child cursors, even those not used during the capture interval. For example, if I run the same query again with the bind variable value ln2 = 1000000, Oracle will use the plan with plan_hash_value 3724264953; but it will nevertheless dump the statistics of the other, unused plan_hash_value into dba_hist_sqlstat as well:

exec :ln2 := 1000000

select count(1) FROm t1 WHERE n2 <= :ln2;

exec dbms_workload_repository.create_snapshot;

SQL> @histstats
Enter value for sql_id: 9sp9wvczrvpty
Enter value for date: 27092016

SNAP_ID SQL_ID        PLAN_HASH_VALUE SQL_PROFILE    EXECS  AVG_ETIME    AVG_LIO  AVG_PIO  AVG_ROWS
----- ------------- --------------- ------------- ------ ---------- ---------- -------- ---------
 4305 9sp9wvczrvpty      2432955788                    3          0        817        0         1
 4305 9sp9wvczrvpty      3724264953                    1          0       2222        0         1

 4306 9sp9wvczrvpty      2432955788                    0          0          0        0         0
 4306 9sp9wvczrvpty      3724264953                    1          0       2222        0         1

That’s it. Notice how, as expected, Oracle has dumped two rows for the new snapshot 4306, of which the first one, with plan_hash_value 2432955788, is superfluous: it has been captured in the AWR tables only because it was present in gv$sql at the snapshot capture time.

Now that we know that Oracle dumps all the plan_hash_values present in gv$sql at snapshot capture time, even those not used during the snapshot interval, I have added a minor where clause to the above extensively used script so that the superfluous lines no longer appear in the historical execution statistics:

SELECT
 *
FROM
 (SELECT
   sn.snap_id
   ,plan_hash_value
   ,sql_profile
   ,executions_delta execs
   ,trunc(elapsed_time_delta/1e6/decode(executions_delta, 0, 1, executions_delta)) avg_etime
   ,round(disk_reads_delta/decode(executions_delta,0,1, executions_delta),1) avg_pio
   ,round(buffer_gets_delta/decode(executions_delta,0,1, executions_delta), 1) avg_lio
   ,round(px_servers_execs_delta/decode(executions_delta,0,1, executions_delta), 1) avg_px
   ,round(rows_processed_delta/decode(executions_delta,0, 1, executions_delta), 1) avg_rows
  FROM
     dba_hist_sqlstat st,
     dba_hist_snapshot sn
  WHERE st.snap_id = sn.snap_id
  AND sql_id = '&sql_id'
  AND begin_interval_time > to_date('&from_date','ddmmyyyy')
 )
 WHERE avg_lio != 0 -- added clause
 ORDER BY 1 ASC;

SQL> @histstats2
Enter value for sql_id: 9sp9wvczrvpty
Enter value for from_date: 27092016

SNAP_ID PLAN_HASH_VALUE SQL_PROFILE  EXECS  AVG_ETIME  AVG_PIO    AVG_LIO     AVG_PX  AVG_ROWS
------- --------------- ----------- ------ ---------- -------- ---------- ---------- ---------
   4305      2432955788                  3          0        0      817.7          0         1
   4305      3724264953                  1          0        0       2222          0         1

   4306      3724264953                  1          0        0       2222          0         1

The above added where clause excludes SQL statements with 0 logical I/O.

But wait a moment!!!

This might, at the same time, exclude vital information from the historical statistics. For example, the following situation will not consume any logical I/O but can spend a lot of time waiting for a locked resource to be released:

SQL-1 > select * from t1 where rownum <=  3 for update;

        N1         N2
---------- ----------
      1825    1000000
      1826    1000000
      1827    1000000

SQL-2> lock table t1 in exclusive mode;

The second session will obviously hang. Wait a couple of seconds, go back to session n°1, issue a rollback, and watch the new situation of session n°2:

SQL-1> rollback;

SQL-2> lock table t1 in exclusive mode;

Table(s) Locked.

All that remains to do is to take a manual snapshot and check how many logical I/Os have been consumed by the SQL statement which tried to lock the table in exclusive mode:

SQL> exec dbms_workload_repository.create_snapshot;

select sql_id
from gv$sql where sql_text like '%lock table t1%'
and sql_text not like '%v$sql%';

SQL_ID
-------------
a9nb52un8wqf4

SQL> @histstats
Enter value for sql_id: a9nb52un8wqf4
Enter value for date: 27092016

SNAP_ID SQL_ID        PLAN_HASH_VALUE SQL_PROFILE  EXECS  AVG_ETIME    AVG_LIO  AVG_PIO  AVG_ROWS
------- ------------- --------------- ----------- ------ ---------- ---------- -------- ---------
   4308 a9nb52un8wqf4               0                  1        159          0        0         0

That’s how a SQL statement can take a long time to complete while consuming 0 logical I/O. This is why I have added a second extra where clause to the histstats2 script, as shown below:

SELECT
*
FROM
  (SELECT
    sn.snap_id ,
    plan_hash_value ,
    executions_delta execs ,
TRUNC(elapsed_time_delta/1e6/DECODE(executions_delta, 0, 1, executions_delta)) avg_etime ,
    ROUND(disk_reads_delta      /DECODE(executions_delta,0,1, executions_delta),1) avg_pio ,
    ROUND(buffer_gets_delta     /DECODE(executions_delta,0,1, executions_delta), 1) avg_lio ,
    ROUND(px_servers_execs_delta/DECODE(executions_delta,0,1, executions_delta), 1) avg_px ,
    ROUND(rows_processed_delta  /DECODE(executions_delta,0, 1, executions_delta), 1) avg_rows
  FROM
    dba_hist_sqlstat st,
    dba_hist_snapshot sn
  WHERE st.snap_id        = sn.snap_id
  AND sql_id              = '&sql_id'
  AND begin_interval_time > to_date('&from_date','ddmmyyyy')
  )
WHERE avg_lio != 0
   OR (avg_lio = 0 AND avg_etime > 0)
ORDER BY 1 ASC;

This new script (histstats3.sql), when applied to the two sql_ids we have investigated in this post, gives respectively:

SQL> @HistStats3
Enter value for sql_id: 9sp9wvczrvpty
Enter value for from_date: 27092016

   SNAP_ID PLAN_HASH_VALUE  EXECS  AVG_ETIME  AVG_PIO    AVG_LIO     AVG_PX  AVG_ROWS
---------- --------------- ------ ---------- -------- ---------- ---------- ---------
      4305      2432955788      3          0        0      817.7          0         1
      4305      3724264953      1          0        0       2222          0         1
      4306      3724264953      1          0        0       2222          0         1

SQL> @HistStats3
Enter value for sql_id: a9nb52un8wqf4
Enter value for from_date: 27092016

   SNAP_ID PLAN_HASH_VALUE  EXECS  AVG_ETIME  AVG_PIO    AVG_LIO     AVG_PX  AVG_ROWS
---------- --------------- ------ ---------- -------- ---------- ---------- ---------
      4308               0      1        159        0          0          0         0

It goes without saying that I would be very grateful for feedback about the added where clause. I might have neglected a situation in which both logical I/O and elapsed time are not greater than 0 but which is nevertheless important to keep in the historical statistics.
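One alternative formulation I have been considering (a sketch only, not tested against all corner cases): since all the computed averages are non-negative, the outer where clause of histstats3 could keep any row in which at least one metric is non-zero, without enumerating the combinations:

```sql
-- alternative outer filter for histstats3: keep a row as soon as
-- any of its metrics is non-zero (all these averages are >= 0,
-- so GREATEST does the job in a single predicate)
WHERE GREATEST(avg_lio, avg_pio, avg_etime, execs, avg_rows) > 0
ORDER BY 1 ASC;
```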

Last but not least, you might have already noticed the appearance of the AVG_PX column in the above output. It will be the subject of the next article, about ASH/AWR statistics for parallel queries. Stay tuned: I have a nice example from a running system to share with you, where this column proves to be very helpful. I may also show a version of the script which includes the end_of_fetch_count column, which also proves to be valuable information when trying to understand the average elapsed time of queries coming from a web service where a timeout has been implemented.


