An active sql_id is subject to being sampled and loaded into ASH (gv$active_session_history). As long as the sql_id is present in ASH, each of its samples has a one-in-ten chance of being persisted into the AWR historical tables as well. In addition to the in-memory ASH view gv$active_session_history and its AWR alter ego dba_hist_active_sess_history, performance and tuning specialists make extensive use of the dba_hist_sqlstat table, into which execution statistics from gv$sql are periodically flushed by the MMON background process at snapshot time.
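As a quick sanity check of that one-in-ten claim, a sketch of a query you could run, assuming 11g or later where the is_awr_sample flag is exposed in gv$active_session_history:

-- Roughly 1 ASH sample out of 10 should be flagged for persistence
-- into dba_hist_active_sess_history (counts taken over whatever is
-- still in the in-memory ASH buffer).
SELECT is_awr_sample
     , COUNT(*) samples
FROM   gv$active_session_history
GROUP  BY is_awr_sample;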
One of the queries I use against this table is the following one, which I refer to as histstats.sql (I think I originally hijacked it from somewhere on the internet):
SELECT sn.snap_id
     , plan_hash_value
     , st.sql_profile
     , executions_delta execs
     , TRUNC(elapsed_time_delta/1e6/DECODE(executions_delta, 0, 1, executions_delta)) avg_etime
     , ROUND(disk_reads_delta    /DECODE(executions_delta, 0, 1, executions_delta), 1) avg_pio
     , ROUND(buffer_gets_delta   /DECODE(executions_delta, 0, 1, executions_delta), 1) avg_lio
     , ROUND(rows_processed_delta/DECODE(executions_delta, 0, 1, executions_delta), 1) avg_rows
FROM   dba_hist_sqlstat  st
     , dba_hist_snapshot sn
WHERE  st.snap_id = sn.snap_id
AND    sql_id = '&sql_id'
AND    begin_interval_time >= to_date('&date','ddmmyyyy')
ORDER  BY 1 ASC;
That said, how would you read and interpret the following output of the above query, taken from a running system for a particular sql_id?
   SNAP_ID PLAN_HASH_VALUE SQL_PROFILE                    EXECS  AVG_ETIME    AVG_LIO
---------- --------------- ------------------------------ ----- ---------- ----------
     30838       726354567                                    0       7227    3945460
     30838       726354567                                    0       7227    3945460
     30839      4211030025 prf_2yhzvghb06vh4_4211030025       1          3      28715
     30839      4211030025 prf_2yhzvghb06vh4_4211030025       1          3      28715
     30839       726354567                                    0       7140    5219336
     30839       726354567                                    0       7140    5219336
     30840       726354567                                    0       7203    9389840
     30840       726354567                                    0       7203    9389840
     30840      4211030025 prf_2yhzvghb06vh4_4211030025       0       2817    7831649
     30840      4211030025 prf_2yhzvghb06vh4_4211030025       0       2817    7831649
     30841       726354567                                    0       7192    5200201
     30841       726354567                                    0       7192    5200201
     30841      4211030025 prf_2yhzvghb06vh4_4211030025       0          0          0
     30841      4211030025 prf_2yhzvghb06vh4_4211030025       0          0          0
     30842      4211030025 prf_2yhzvghb06vh4_4211030025       0          0          0
     30842      4211030025 prf_2yhzvghb06vh4_4211030025       0          0          0
     30842       726354567                                    0       4956    3183667
     30842       726354567                                    0       4956    3183667
Or, to make things less confusing, how would you interpret the different rows of snapshot 30841, reproduced below?
   SNAP_ID PLAN_HASH_VALUE SQL_PROFILE                    EXECS  AVG_ETIME    AVG_LIO
---------- --------------- ------------------------------ ----- ---------- ----------
     30841       726354567                                    0       7192    5200201
     30841       726354567                                    0       7192    5200201
     30841      4211030025 prf_2yhzvghb06vh4_4211030025       0          0          0
     30841      4211030025 prf_2yhzvghb06vh4_4211030025       0          0          0
What do those zero values in the lines with a non-null SQL Profile actually mean?
The answer to this question resides in the way Oracle dumps the statistics of a SQL statement from memory into the AWR tables. When an active sql_id has multiple child cursors in gv$sql, the snapshot capture will (provided the sql_id qualifies for capture, of course) dump into dba_hist_sqlstat the statistics of all its distinct child cursors, including those not used by the active sql_id at capture time (even the statistics of non-shareable cursors are dumped, provided those cursors are still present in gv$sql at the moment of the capture).
For example, in the above list you can see four rows of execution statistics at snapshot 30841, of which the first two correspond to the plan that was effectively used, plan_hash_value 726354567. According to their elapsed time (7192 seconds) these executions spanned multiple snapshots (this is a serial execution plan, by the way). But the remaining lines, those showing a SQL Profile, are in fact superfluous and confusing. They have been captured only because a second shareable child cursor (plan_hash_value 4211030025) was present in gv$sql at the moment of the capture (snapshot 30841). One indication that they are superfluous is the zero values of their average logical I/O, physical I/O, executions and rows processed.
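A quick way to spot such superfluous rows for a given sql_id, before we get to the modified script later in this post, could look like the following sketch (not part of the original histstats.sql):

-- A sketch only: list, per snapshot, the plan_hash_values that were
-- captured with all-zero deltas, i.e. plans merely present in gv$sql
-- at capture time but which did no work during the interval.
SELECT snap_id
     , plan_hash_value
     , sql_profile
FROM   dba_hist_sqlstat
WHERE  sql_id = '&sql_id'
AND    executions_delta   = 0
AND    buffer_gets_delta  = 0
AND    elapsed_time_delta = 0
ORDER  BY snap_id;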
Now that the scene has been set, we need a reproducible example. For that purpose, any SQL statement having multiple child cursors in gv$sql will do the trick. Among several possible scenarios, an easy way to engineer such a case is to use Adaptive Cursor Sharing (here in 12.1.0.2.0):
The model
-- t1 is assumed to have been created beforehand, e.g. as:
-- create table t1 (n1 number, n2 number);
INSERT INTO t1
SELECT level n1
     , CASE
         WHEN level = 1                        THEN 1
         WHEN level > 1     AND level <= 101   THEN 100
         WHEN level > 101   AND level <= 1101  THEN 1000
         WHEN level > 10001 AND level <= 11000 THEN 10000
         ELSE 1000000
       END n2
FROM dual
CONNECT BY level < 1200150;

CREATE INDEX t1_i1 ON t1(n2);

BEGIN
  dbms_stats.gather_table_stats
    (user
    ,'t1'
    ,method_opt       => 'for all columns size auto'
    ,cascade          => true
    ,estimate_percent => dbms_stats.auto_sample_size
    );
END;
/
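For completeness, the skewed distribution of n2 that Adaptive Cursor Sharing will react to can be verified with a simple aggregate (a quick check, not required for the demo):

-- Check the n2 skew: a handful of popular values plus one dominant value
SELECT n2
     , COUNT(*) cnt
FROM   t1
GROUP  BY n2
ORDER  BY n2;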
Generating multiple child cursors using ACS
var ln2 number;
exec :ln2 := 100;

SQL> select count(1) FROm t1 WHERE n2 <= :ln2;

  COUNT(1)
----------
       101

exec :ln2 := 1000000

SQL> select count(1) FROm t1 WHERE n2 <= :ln2;

  COUNT(1)
----------
   1200149

SQL> select * from table(dbms_xplan.display_cursor);

SQL_ID  9sp9wvczrvpty, child number 0
-------------------------------------
select count(1) FROm t1 WHERE n2 <= :ln2

Plan hash value: 2432955788

---------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes |
---------------------------------------------------
|   0 | SELECT STATEMENT  |       |       |       |
|   1 |  SORT AGGREGATE   |       |     1 |     3 |
|*  2 |   INDEX RANGE SCAN| T1_I1 |   218 |   654 |
---------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N2"<=:LN2)
In order for the above sql_id to become bind aware and get a new optimal execution plan at each execution, it needs to be executed once again (more details on how a cursor becomes bind aware can be found in the Adaptive Cursor Sharing literature):
SQL> select count(1) FROm t1 WHERE n2 <= :ln2;

  COUNT(1)
----------
   1200149

SQL> select * from table(dbms_xplan.display_cursor);

SQL_ID  9sp9wvczrvpty, child number 1
-------------------------------------
select count(1) FROm t1 WHERE n2 <= :ln2

Plan hash value: 3724264953

----------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes  |
----------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |        |
|   1 |  SORT AGGREGATE    |      |     1 |      3 |
|*  2 |   TABLE ACCESS FULL| T1   |  1191K|  3489K |
----------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("N2"<=:LN2)
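As a side check at this stage, the Adaptive Cursor Sharing flags can be inspected directly in gv$sql (these are standard gv$sql columns); on my understanding, child 0 should show bind sensitive but not bind aware, while the new child should show both:

-- Side check of the ACS flags for the child cursors of our statement
SELECT child_number
     , plan_hash_value
     , is_bind_sensitive
     , is_bind_aware
     , is_shareable
FROM   gv$sql
WHERE  sql_id = '9sp9wvczrvpty'
ORDER  BY child_number;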
Notice that a new child cursor n°1 has been generated, while the existing child cursor n°0 has switched to a non-shareable status. To get the index plan compiled into a fresh, shareable child cursor, we need to run the same query again with the corresponding (selective) bind variable value:
exec :ln2 := 100;

select count(1) FROm t1 WHERE n2 <= :ln2;

SQL> select * from table(dbms_xplan.display_cursor);

SQL_ID  9sp9wvczrvpty, child number 2
-------------------------------------
select count(1) FROm t1 WHERE n2 <= :ln2

Plan hash value: 2432955788

---------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes |
---------------------------------------------------
|   0 | SELECT STATEMENT  |       |       |       |
|   1 |  SORT AGGREGATE   |       |     1 |     3 |
|*  2 |   INDEX RANGE SCAN| T1_I1 |   218 |   654 |
---------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N2"<=:LN2)
Finally, we have achieved the initial goal of having multiple execution plans in gv$sql for the above SQL statement:
SELECT sql_id
     , child_number
     , first_load_time
     , last_load_time
     , plan_hash_value
     , executions execs
     , rows_processed
     , trunc(buffer_gets/executions) lio
     , is_shareable
FROM   gv$sql
WHERE  sql_id = '9sp9wvczrvpty'
--AND  is_shareable = 'Y'
ORDER  BY last_load_time desc;

SQL_ID        CHILD_NUMBER FIRST_LOAD_TIME     LAST_LOAD_TIME      PLAN_HASH_VALUE  EXECS ROWS_PROCESSED        LIO I
------------- ------------ ------------------- ------------------- --------------- ------ -------------- ---------- -
9sp9wvczrvpty            0 2016-09-28/07:19:21 2016-09-28/07:19:21      2432955788      2              2       1225 N
9sp9wvczrvpty            1 2016-09-28/07:19:21 2016-09-28/07:20:44      3724264953      1              1       2222 Y
9sp9wvczrvpty            2 2016-09-28/07:19:21 2016-09-28/07:22:03      2432955788      1              1          3 Y
Notice that the sql_id (9sp9wvczrvpty) now has three child cursors, of which child cursor 0 is no longer shareable. Nothing has been sent to the AWR tables yet, as shown below:
SQL> @histstats
Enter value for sql_id: 9sp9wvczrvpty
Enter value for date: 27092016

no rows selected
Forcing AWR to dump SQL statistics
In order to make Oracle dump the statistics of this SQL statement, we are going to force a manual AWR snapshot by means of the following package call:
SQL> exec dbms_workload_repository.create_snapshot;
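If you want to double-check that the snapshot has indeed been recorded, a simple verification against dba_hist_snapshot (a side check, not part of the demo) will do:

-- Confirm that a recent snapshot exists
SELECT snap_id
     , begin_interval_time
     , end_interval_time
FROM   dba_hist_snapshot
WHERE  end_interval_time > sysdate - 1/24
ORDER  BY snap_id;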
The create_snapshot call creates a snapshot and dumps the above content of gv$sql into the dba_hist_sqlstat table, as shown below:
SQL> @histstats
Enter value for sql_id: 9sp9wvczrvpty
Enter value for date: 27092016

   SNAP_ID SQL_ID        PLAN_HASH_VALUE SQL_PROFILE                     EXECS  AVG_ETIME    AVG_LIO  AVG_PIO  AVG_ROWS
---------- ------------- --------------- ------------------------------ ------ ---------- ---------- -------- ---------
      4305 9sp9wvczrvpty      2432955788                                     3          0        817        0         1
      4305 9sp9wvczrvpty      3724264953                                     1          0       2222        0         1
That is exactly what was expected: Oracle has captured into the dba_hist_sqlstat table 3 executions at plan_hash_value 2432955788 (child cursors 0 and 2) and 1 execution at plan_hash_value 3724264953.
Now that we have multiple child cursors present in gv$sql, if AWR dumps the sql_id it will dump all of its child cursors, even those not used during the snapshot interval. For example, if I run the same query again with the bind variable value ln2 = 1000000, Oracle will use the plan with plan_hash_value 3724264953; but it will nevertheless dump the statistics of the other, unused plan_hash_value into dba_hist_sqlstat as well.
exec :ln2 := 1000000

select count(1) FROm t1 WHERE n2 <= :ln2;

exec dbms_workload_repository.create_snapshot;

SQL> @histstats
Enter value for sql_id: 9sp9wvczrvpty
Enter value for date: 27092016

   SNAP_ID SQL_ID        PLAN_HASH_VALUE SQL_PROFILE    EXECS  AVG_ETIME    AVG_LIO  AVG_PIO  AVG_ROWS
---------- ------------- --------------- ------------- ------ ---------- ---------- -------- ---------
      4305 9sp9wvczrvpty      2432955788                    3          0        817        0         1
      4305 9sp9wvczrvpty      3724264953                    1          0       2222        0         1
      4306 9sp9wvczrvpty      2432955788                    0          0          0        0         0
      4306 9sp9wvczrvpty      3724264953                    1          0       2222        0         1
That's it. Notice how, as expected, Oracle has dumped two rows for the new snapshot 4306, of which the first one (plan_hash_value 2432955788) is superfluous: it has been captured in the AWR tables only because it was present in gv$sql at the snapshot capture time.
Now that we know that Oracle dumps all plan_hash_values present in gv$sql at snapshot capture time, even those not used during the snapshot interval, I have added a small where clause to the above extensively used script so that the superfluous lines no longer appear in the historical execution statistics:
SELECT *
FROM  (SELECT sn.snap_id
            , plan_hash_value
            , sql_profile
            , executions_delta execs
            , trunc(elapsed_time_delta/1e6/decode(executions_delta, 0, 1, executions_delta)) avg_etime
            , round(disk_reads_delta      /decode(executions_delta, 0, 1, executions_delta), 1) avg_pio
            , round(buffer_gets_delta     /decode(executions_delta, 0, 1, executions_delta), 1) avg_lio
            , round(px_servers_execs_delta/decode(executions_delta, 0, 1, executions_delta), 1) avg_px
            , round(rows_processed_delta  /decode(executions_delta, 0, 1, executions_delta), 1) avg_rows
       FROM   dba_hist_sqlstat  st
            , dba_hist_snapshot sn
       WHERE  st.snap_id = sn.snap_id
       AND    sql_id = '&sql_id'
       AND    begin_interval_time > to_date('&from_date','ddmmyyyy')
      )
WHERE  avg_lio != 0   -- added clause
ORDER  BY 1 asc;
SQL> @histstats2
Enter value for sql_id: 9sp9wvczrvpty
Enter value for from_date: 27092016

SNAP_ID PLAN_HASH_VALUE SQL_PROFILE  EXECS  AVG_ETIME  AVG_PIO    AVG_LIO     AVG_PX  AVG_ROWS
------- --------------- ----------- ------ ---------- -------- ---------- ---------- ---------
   4305      2432955788                  3          0        0      817.7          0         1
   4305      3724264953                  1          0        0       2222          0         1
   4306      3724264953                  1          0        0       2222          0         1
The added where clause simply excludes the snapshot rows with zero logical I/O.
But wait a moment!!!
This might, at the same time, exclude vital information from the historical statistics. For example, the following situation does not consume any logical I/O, yet it can spend a lot of time blocked, waiting for a resource to be released:
SQL-1> select * from t1 where rownum <= 3 for update;

        N1         N2
---------- ----------
      1825    1000000
      1826    1000000
      1827    1000000

SQL-2> lock table t1 in exclusive mode;
The second session will obviously hang. Wait a couple of seconds, go back to session n°1, issue a rollback, and watch the new situation of session n°2:
SQL-1> rollback;

SQL-2> lock table t1 in exclusive mode;

Table(s) Locked.
All that remains to be done is to take a manual snapshot and check how many logical I/Os have been consumed by the SQL statement that tried to lock the table in exclusive mode:
SQL> exec dbms_workload_repository.create_snapshot;

SQL> select sql_id
     from   gv$sql
     where  sql_text like '%lock table t1%'
     and    sql_text not like '%v$sql%';

SQL_ID
-------------
a9nb52un8wqf4

SQL> @histstats
Enter value for sql_id: a9nb52un8wqf4
Enter value for date: 27092016

SNAP_ID SQL_ID        PLAN_HASH_VALUE SQL_PROFILE  EXECS  AVG_ETIME    AVG_LIO  AVG_PIO  AVG_ROWS
------- ------------- --------------- ----------- ------ ---------- ---------- -------- ---------
   4308 a9nb52un8wqf4               0                  1        159          0        0         0
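So where did those 159 seconds go? ASH can answer that question. A sketch, assuming the blocked session was sampled while waiting (the exact enqueue event name may differ on your version, but on a TM lock wait you would expect something like "enq: TM - contention"):

-- Break down the time of the locking statement by wait event:
-- a long elapsed time with zero logical I/O shows up under an enqueue wait.
SELECT NVL(event, 'ON CPU') event
     , COUNT(*) samples
FROM   gv$active_session_history
WHERE  sql_id = 'a9nb52un8wqf4'
GROUP  BY event
ORDER  BY 2 DESC;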
That's how a SQL statement can take a long time to complete while consuming zero logical I/O. This is why I have added a second extra where clause to the histstats2 script, as shown below:
SELECT *
FROM  (SELECT sn.snap_id
            , plan_hash_value
            , executions_delta execs
            , TRUNC(elapsed_time_delta/1e6/DECODE(executions_delta, 0, 1, executions_delta)) avg_etime
            , ROUND(disk_reads_delta      /DECODE(executions_delta, 0, 1, executions_delta), 1) avg_pio
            , ROUND(buffer_gets_delta     /DECODE(executions_delta, 0, 1, executions_delta), 1) avg_lio
            , ROUND(px_servers_execs_delta/DECODE(executions_delta, 0, 1, executions_delta), 1) avg_px
            , ROUND(rows_processed_delta  /DECODE(executions_delta, 0, 1, executions_delta), 1) avg_rows
       FROM   dba_hist_sqlstat  st
            , dba_hist_snapshot sn
       WHERE  st.snap_id = sn.snap_id
       AND    sql_id = '&sql_id'
       AND    begin_interval_time > to_date('&from_date','ddmmyyyy')
      )
WHERE  avg_lio != 0
OR    (avg_lio = 0 AND avg_etime > 0)
ORDER  BY 1 asc;
This new script (histstats3.sql), when applied to the two sql_ids investigated in this post, gives respectively:
SQL> @HistStats3
Enter value for sql_id: 9sp9wvczrvpty
Enter value for from_date: 27092016

   SNAP_ID PLAN_HASH_VALUE  EXECS  AVG_ETIME  AVG_PIO    AVG_LIO     AVG_PX  AVG_ROWS
---------- --------------- ------ ---------- -------- ---------- ---------- ---------
      4305      2432955788      3          0        0      817.7          0         1
      4305      3724264953      1          0        0       2222          0         1
      4306      3724264953      1          0        0       2222          0         1

SQL> @HistStats3
Enter value for sql_id: a9nb52un8wqf4
Enter value for from_date: 27092016

   SNAP_ID PLAN_HASH_VALUE  EXECS  AVG_ETIME  AVG_PIO    AVG_LIO     AVG_PX  AVG_ROWS
---------- --------------- ------ ---------- -------- ---------- ---------- ---------
      4308               0      1        159        0          0          0         0
It goes without saying that I would be very grateful for feedback about the added where clause. I might have overlooked a situation in which both logical I/O and elapsed time are zero and which nevertheless deserves to appear in the historical statistics.
Last but not least, you might already have noticed the appearance of the AVG_PX column in the above output. That is the subject of the next article, on ASH/AWR statistics for parallel queries. Stay tuned: I have a nice example from a running system to share with you, in which this column proves very helpful. I may also show a version of the script that includes the end_of_fetch_count column, which also turns out to be valuable information when trying to understand the average elapsed time of queries coming from a web service where a timeout has been implemented.
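As a teaser, those extra columns would simply be added to the inline view of the script, something along the following lines (a sketch; px_servers_execs_delta and end_of_fetch_count_delta are standard dba_hist_sqlstat columns):

-- Sketch of the extra per-execution metrics for the next article:
-- PX slave executions and end-of-fetch count
SELECT sn.snap_id
     , st.plan_hash_value
     , st.executions_delta execs
     , ROUND(st.px_servers_execs_delta
             /DECODE(st.executions_delta, 0, 1, st.executions_delta), 1) avg_px
     , ROUND(st.end_of_fetch_count_delta
             /DECODE(st.executions_delta, 0, 1, st.executions_delta), 1) avg_eofc
FROM   dba_hist_sqlstat  st
     , dba_hist_snapshot sn
WHERE  st.snap_id = sn.snap_id
AND    st.sql_id  = '&sql_id'
ORDER  BY 1;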
