Quantcast
Channel: Mohamed Houri’s Oracle Notes
Viewing all articles
Browse latest Browse all 224

Disk file operations I/O

$
0
0

When Disk file operations I/O wait event starts showing at the top of the AWR Top 10 Foreground Events by Total Wait Time it can consume a lot of CPU and freeze your database. Since I have gone through that myself a couple of weeks ago, I thought it might be interesting to write a succinct blog article explaining how to find and fix the root cause of this non-common wait event.

When I started investigating the real-life case all that I have on my hand was the p1 and p3 parameters of this wait event:

  •  p1=8 corresponds to wait for miscellaneous io (ftp, block dump, passwd file)
  •  p3=8 corresponds to any other type of file

To say the least I have been unfortunate with this number.

As I was searching online, I came across a forum question where one of the contributors was pointing to an audit file type and operation. And from there I built the following simple model

SQL> show parameter audit

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
audit_file_dest                      string      /u01/app/oracle/product/19c/dbhome_1/rdbms/audit                                                
audit_sys_operations                 boolean     TRUE
audit_syslog_level                   string
audit_trail                          string      OS
unified_audit_common_systemlog       string
unified_audit_sga_queue_size         integer     1048576
unified_audit_systemlog              string
AUDIT ALL BY c##test BY ACCESS;
AUDIT SELECT TABLE, UPDATE TABLE, INSERT TABLE, DELETE TABLE BY c##test BY ACCESS;
AUDIT EXECUTE PROCEDURE BY c##test BY ACCESS;
create table t1 as select rownum n1, mod(rownum,2) n2, lpad('x',10) vc from dual connect by level <=1e5;

create or replace procedure p1
as
   cursor c1 is
 	select *
    from t1;
begin
  for r1 in c1
  loop
    update t1 set vc = 'yz' where n1 = r1.n1;
    insert into t1 values (r1.n1, r1.n2, r1.vc);
    delete from t1 where n1=r1.n1 and n2=r1.n2 and vc =r1.vc;
   end loop;
commit;
end p1;

I’ve set the audit trail parameter at OS level and I managed to audit several DML operations done by the c##test user.

All that I have to do now to reproduce the Disk file operations I/O wait event on the p1 and p3 values set to 8 is to execute the p1 procedure as shown below

  SQL> exec p1();

While the session executing the p1 procedure (sid=205) was running I snapped it to see what it was doing behind the scene

SQL> @snapper all 5 1 205
Sampling SID 205 with interval 5 seconds, taking 1 snapshots...

-- Session Snapper v4.30 - by Tanel Poder ( http://blog.tanelpoder.com/snapper ) 
- Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! 

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                            ,         DELTA, HDELTA/SEC,    %TIME, GRAPH       , NUM_WAITS,  WAITS/SEC,   AVERAGES
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    205, C##TEST   , STAT, opened cursors cumulative            ,          1859,     188.24,         ,             ,          ,           ,       1.01 per execution
    205, C##TEST   , STAT, recursive calls                      ,          1866,     188.95,         ,             ,          ,           ,     1.28ms recursive CPU per recursive call
    205, C##TEST   , STAT, recursive cpu usage                  ,           238,       24.1,         ,             ,          ,           ,      2.38s total recursive CPU usage
    205, C##TEST   , STAT, session logical reads                ,        413327,     41.85k,         ,             ,          ,           ,    414.55k total buffer visits
    205, C##TEST   , STAT, CPU used by this session             ,           241,       24.4,         ,             ,          ,           ,      2.41s total CPU used by this session
    205, C##TEST   , STAT, user I/O wait time                   ,             6,        .61,         ,             ,          ,           ,          0 per execution
    205, C##TEST   , STAT, non-idle wait time                   ,             6,        .61,         ,             ,          ,           ,          0 per execution
    205, C##TEST   , STAT, non-idle wait count                  ,          1858,     188.14,         ,             ,          ,           ,       1.01 per execution
    205, C##TEST   , STAT, physical read total IO requests      ,             1,         .1,         ,             ,          ,           ,     16.38k bytes per request
    205, C##TEST   , STAT, physical read total bytes            ,         16384,      1.66k,         ,             ,          ,           ,       8.94 per execution
    205, C##TEST   , STAT, cell physical IO interconnect bytes  ,         16384,      1.66k,         ,             ,          ,           ,       8.94 per execution
    205, C##TEST   , STAT, db block gets                        ,          1911,      193.5,         ,             ,          ,           ,       1.04 per execution
    205, C##TEST   , STAT, db block gets from cache             ,          1911,      193.5,         ,             ,          ,           ,       1.04 per execution
    205, C##TEST   , STAT, db block gets from cache (fastpath)  ,           644,      65.21,         ,             ,          ,           ,        .35 per execution
    205, C##TEST   , STAT, consistent gets                      ,        412820,      41.8k,         ,             ,          ,           ,     225.34 per execution
    205, C##TEST   , STAT, consistent gets from cache           ,        412811,      41.8k,         ,             ,          ,           ,     225.33 per execution
    205, C##TEST   , STAT, consistent gets pin                  ,        412811,      41.8k,         ,             ,          ,           ,     225.33 per execution
    205, C##TEST   , STAT, consistent gets pin (fastpath)       ,        412811,      41.8k,         ,             ,          ,           ,     225.33 per execution
    205, C##TEST   , STAT, consistent gets direct               ,             9,        .91,         ,             ,          ,           ,          0 per execution
    205, C##TEST   , STAT, logical read bytes from cache        ,    3397402624,    344.01M,         ,             ,          ,           ,      1.85M per execution
    205, C##TEST   , STAT, physical reads                       ,             2,         .2,         ,             ,          ,           ,          0 per execution
    205, C##TEST   , STAT, physical reads direct                ,             2,         .2,         ,             ,          ,           ,          0 per execution
    205, C##TEST   , STAT, physical read IO requests            ,             1,         .1,         ,             ,          ,           ,     16.38k bytes per request
    205, C##TEST   , STAT, physical read bytes                  ,         16384,      1.66k,         ,             ,          ,           ,       8.94 per execution
    205, C##TEST   , STAT, db block changes                     ,          3749,     379.61,         ,             ,          ,           ,       2.05 per execution
    205, C##TEST   , STAT, change write time                    ,            49,       4.96,         ,             ,          ,           ,        .03 per execution
    205, C##TEST   , STAT, free buffer requested                ,          1265,     128.09,         ,             ,          ,           ,        .69 per execution
    205, C##TEST   , STAT, dirty buffers inspected              ,            10,       1.01,         ,             ,          ,           ,        .01 per execution
    205, C##TEST   , STAT, pinned buffers inspected             ,             1,         .1,         ,             ,          ,           ,  .55*10^-3 per execution
    205, C##TEST   , STAT, hot buffers moved to head of LRU     ,            65,       6.58,         ,             ,          ,           ,        .04 per execution
    205, C##TEST   , STAT, free buffer inspected                ,           123,      12.45,         ,             ,          ,           ,        .07 per execution
    205, C##TEST   , STAT, switch current to new buffer         ,          1240,     125.56,         ,             ,          ,           ,        .68 per execution
    205, C##TEST   , STAT, switch current caused by our pin     ,          1240,     125.56,         ,             ,          ,           ,        .68 per execution
    205, C##TEST   , STAT, shared hash latch upgrades - no wait ,           726,      73.51,         ,             ,          ,           ,         .4 per execution
    205, C##TEST   , STAT, calls to kcmgcs                      ,          3714,     376.07,         ,             ,          ,           ,       2.03 per execution
    205, C##TEST   , STAT, calls to kcmgas                      ,             4,        .41,         ,             ,          ,           ,          0 per execution
    205, C##TEST   , STAT, calls to get snapshot scn: kcmgss    ,          1855,     187.83,         ,             ,          ,           ,       1.01 per execution
    205, C##TEST   , STAT, redo entries                         ,          1856,     187.93,         ,             ,          ,           ,       1.01 per execution
    205, C##TEST   , STAT, redo size                            ,        521376,     52.79k,         ,             ,          ,           ,          ~ bytes per user commit
    205, C##TEST   , STAT, redo ordering marks                  ,             4,        .41,         ,             ,          ,           ,          0 per execution
    205, C##TEST   , STAT, file io wait time                    ,         22467,      2.27k,         ,             ,          ,           ,    22.47ms bad guess of IO wait time per IO request
    205, C##TEST   , STAT, Number of read IOs issued            ,             1,         .1,         ,             ,          ,           ,  .55*10^-3 per execution
    205, C##TEST   , STAT, undo change vector size              ,        182320,     18.46k,         ,             ,          ,           ,      99.52 per execution
    205, C##TEST   , STAT, no work - consistent read gets       ,        406064,     41.12k,         ,             ,          ,           ,     221.65 per execution
    205, C##TEST   , STAT, table scans (long tables)            ,          1231,     124.65,         ,             ,          ,           ,        .67 per execution
    205, C##TEST   , STAT, table scan rows gotten               ,     123963294,     12.55M,         ,             ,          ,           ,     67.67k per execution
    205, C##TEST   , STAT, table scan disk non-IMC rows gotten  ,     123898306,     12.55M,         ,             ,          ,           ,     67.63k per execution
    205, C##TEST   , STAT, table scan blocks gotten             ,        404643,     40.97k,         ,             ,          ,           ,     220.88 per execution
    205, C##TEST   , STAT, heap block compress                  ,             6,        .61,         ,             ,          ,           ,          0 per execution
    205, C##TEST   , STAT, HSC Heap Segment Block Changes       ,          1853,     187.63,         ,             ,          ,           ,       1.01 per execution
    205, C##TEST   , STAT, session cursor cache hits            ,          1840,     186.31,         ,             ,          ,           ,      1.84k softparses avoided thanks to cursor cache
    205, C##TEST   , STAT, buffer is pinned count               ,          1222,     123.74,         ,             ,          ,           ,        .29 % buffer gets avoided thanks to buffer pin caching
    205, C##TEST   , STAT, buffer is not pinned count           ,          2446,     247.68,         ,             ,          ,           ,       1.34 per execution
    205, C##TEST   , STAT, execute count                        ,          1832,      185.5,         ,             ,          ,           ,          ~ executions per parse
    205, C##TEST   , TIME, PL/SQL execution elapsed time        ,         11791,     1.19ms,      .1%, [          ],          ,           ,
    205, C##TEST   , TIME, DB CPU                               ,       2460784,   249.17ms,    24.9%, [@@@       ],          ,           ,
    205, C##TEST   , TIME, sql execute elapsed time             ,       8059797,   816.12ms,    81.6%, [######### ],          ,           ,
    205, C##TEST   , TIME, DB time                              ,       8059797,   816.12ms,    81.6%, [######### ],          ,           ,      18.39 % unaccounted time
    205, C##TEST   , WAIT, Disk file operations I/O             ,         46307,     4.69ms,      .5%, [          ],      1828,      185.1,    25.33us average wait ----------> here
    205, C##TEST   , WAIT, ASM IO for non-blocking poll         ,             8,      .81us,      .0%, [          ],         1,         .1,        8us average wait

--  End of Stats snap 1, end=2020-12-20 09:02:36, seconds=9.9

------------------------------------------------------------------------------------------
  ActSes   %Thread | INST | SQL_ID          | SQL_CHILD | EVENT          | WAIT_CLASS
------------------------------------------------------------------------------------------
     .48     (48%) |    1 | fnbt9ts7bsqzn   | 0         | ON CPU         | ON CPU
     .48     (48%) |    1 | 7wv0zz7965u54   | 0         | ON CPU         | ON CPU
     .03      (3%) |    1 | 1nf0v48kt3a30   | 0         | ON CPU         | ON CPU
     .03      (3%) |    1 | 8myfvr1fb8gju   | 0         | ON CPU         | ON CPU

--  End of ASH snap 1, end=2020-12-20 09:02:36, seconds=5, samples_taken=40, AAS=1

SQL> @snapper ash 5 1 205
Sampling SID 205 with interval 5 seconds, taking 1 snapshots...

-- Session Snapper v4.30 - by Tanel Poder ( http://blog.tanelpoder.com/snapper ) 
- Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet!
----------------------------------------------------------------------------------------------------
  ActSes   %Thread | INST | SQL_ID          | SQL_CHILD | EVENT                    | WAIT_CLASS
----------------------------------------------------------------------------------------------------
     .46     (46%) |    1 | fnbt9ts7bsqzn   | 0         | ON CPU                   | ON CPU
     .46     (46%) |    1 | 7wv0zz7965u54   | 0         | ON CPU                   | ON CPU
     .03      (3%) |    1 | 1nf0v48kt3a30   | 0         | ON CPU                   | ON CPU
     .03      (3%) |    1 | fnbt9ts7bsqzn   | 0         | Disk file operations I/O | User I/O ----------> here
     .03      (3%) |    1 | 8myfvr1fb8gju   | 0         | ON CPU                   | ON CPU

As you can see there is a certain proportion of time spent waiting on Disk file operations I/O. And, naturally, the p1 and p3 values, in this case, are respectively 8 and 8 as the followings prove:

SQL>  @ashtopSID
Enter value for sid: 205
Enter value for from: 12-20 09:00:00
Enter value for to: 12-20 09:15:00

  Total                                                                                                                                                                        
  Seconds     AAS %This   SQL_ID        SQL_OPNAME      TOP_LEVEL_SQL EVENT2                     P1TEXT           P1 P3TEXT        P3 
--------- ------- ------- ------------- --------------- ------------- -------------------------- --------------- --- ----------- ---- 
      309      .3   45% | 7wv0zz7965u54 DELETE          8myfvr1fb8gju ON CPU                     FileOperation     8 filetype       8 
      306      .3   44% | fnbt9ts7bsqzn UPDATE          8myfvr1fb8gju ON CPU                     FileOperation     8 filetype       8 
       24      .0    3% | 1nf0v48kt3a30 INSERT          8myfvr1fb8gju ON CPU                     FileOperation     8 filetype       8 
       13      .0    2% | 7wv0zz7965u54 DELETE          8myfvr1fb8gju db file sequential read    file#             1 blocks         1 
        8      .0    1% | fnbt9ts7bsqzn UPDATE          8myfvr1fb8gju db file sequential read    file#             1 blocks         1 
        7      .0    1% | 7wv0zz7965u54 DELETE          8myfvr1fb8gju Disk file operations I/O   FileOperation     8 filetype       8 ----------> here
        6      .0    1% | fnbt9ts7bsqzn UPDATE          8myfvr1fb8gju Disk file operations I/O   FileOperation     8 filetype       8 ----------> here
        5      .0    1% | 1nf0v48kt3a30 INSERT          8myfvr1fb8gju Disk file operations I/O   FileOperation     8 filetype       8 ----------> here
        3      .0    0% | 7wv0zz7965u54 DELETE          8myfvr1fb8gju db file scattered read     file#             1 blocks         3 
        3      .0    0% |                               8myfvr1fb8gju ON CPU                     FileOperation     8 filetype       8 
        2      .0    0% | 0jjc60pmrntdv SELECT          8myfvr1fb8gju ON CPU                     FileOperation     8 filetype       8 
        1      .0    0% | 1nf0v48kt3a30 INSERT          8myfvr1fb8gju ON CPU                     file#             1 blocks         1 
        1      .0    0% | 1nf0v48kt3a30 INSERT          8myfvr1fb8gju db file sequential read    file#             5 blocks         1 
        1      .0    0% | 7wv0zz7965u54 DELETE          8myfvr1fb8gju resmgr:cpu quantum         location          3                0 
        1      .0    0% | 8myfvr1fb8gju PL/SQL EXECUTE  8myfvr1fb8gju ON CPU                     FileOperation     8 filetype       8

Interestingly ASH shows that the CPU consumed by Oracle is also related to the manipulation of files of type 8. I don’t know really how to correctly interpret these p1 and p3 values in the context of CPU


select decode(event,null, 'on cpu', event), p1,p3, count(1)
    from dba_hist_active_sess_history
    where    sample_time between to_date('20122020 09:00:00', 'ddmmyyyy hh24:mi:ss')
                and     to_date('20122020 09:15:00', 'ddmmyyyy hh24:mi:ss')
group by decode(event,null, 'on cpu', event), p1,p3 
order by 4 desc;

DECODE(EVENT,NULL,'ONCPU',EVENT)          P1         P3   COUNT(1)
--------------------------------  ---------- ---------- ----------
on cpu                                     8          8         64
db file sequential read                    1          1         10
on cpu                                   100          0          6
Disk file operations I/O                   8          8          4
on cpu                                     1          1          4
db file scattered read                     3          3          3
on cpu                                     0          0          3
control file sequential read               0          1          3
on cpu                                     3          1          2
ASM file metadata operation                0          0          2
on cpu                               2490368          0          1
db file parallel write                     1 2147483647          1
db file async I/O submit                 106          0          1
on cpu                                     3          0          1
on cpu                                     2          1          1
on cpu                                131072          0          1
on cpu                                     5          1          1
on cpu                                 65536          0          1
oracle thread bootstrap              5910576          0          1
on cpu                                262144          0          1

Summary

Whenever you see a Disk file operations I/O wait event consuming a lot of time and CPU then get the p1,p2,p3 values from ASH or AWR


SQL> @sed "Disk file operations"
Show wait event descriptions matching %Disk file operations%..

EVENT_NAME                     WAIT_CLASS           PARAMETER1           PARAMETER2           PARAMETER3
------------------------------ -------------------- -------------------- -------------------- --------------------
Disk file operations I/O       User I/O             FileOperation        fileno               filetype

If p1 and p3 values equal to 8 and 8 respectively then check if you are, inadvertently, auditing too many repetitive actions by means of the following select


SQL>  select user_name, audit_option from DBA_STMT_AUDIT_OPTS where user_name='C##TEST';

USER_NAME                 AUDIT_OPTION
------------------------- ----------------------------------------
C##TEST                   ALTER SYSTEM
C##TEST                   SYSTEM AUDIT
C##TEST                   CREATE SESSION
C##TEST                   TABLE
C##TEST                   CLUSTER
C##TEST                   TABLESPACE
C##TEST                   USER
C##TEST                   ROLLBACK SEGMENT
C##TEST                   TYPE
C##TEST                   SYNONYM
C##TEST                   PUBLIC SYNONYM
C##TEST                   VIEW
C##TEST                   SEQUENCE
C##TEST                   DATABASE LINK
C##TEST                   PUBLIC DATABASE LINK
C##TEST                   ROLE
C##TEST                   DIMENSION
C##TEST                   PROCEDURE
C##TEST                   TRIGGER
C##TEST                   PROFILE
C##TEST                   DIRECTORY
C##TEST                   MATERIALIZED VIEW
C##TEST                   MINING MODEL
C##TEST                   INDEX
C##TEST                   SELECT TABLE
C##TEST                   INSERT TABLE
C##TEST                   UPDATE TABLE
C##TEST                   DELETE TABLE
C##TEST                   NOT EXISTS
C##TEST                   SYSTEM GRANT
C##TEST                   EXECUTE PROCEDURE
C##TEST                   CONTEXT
C##TEST                   SQL TRANSLATION PROFILE
C##TEST                   LOCKDOWN PROFILE
C##TEST                   ATTRIBUTE DIMENSION
C##TEST                   HIERARCHY
C##TEST                   ANALYTIC VIEW
C##TEST                   READ DIRECTORY
C##TEST                   WRITE DIRECTORY
C##TEST                   EXECUTE DIRECTORY 

If so then reduce the number of audited actions if you want to reduce the impact of this wait event.
In passing, I think that this wait event would not have shown up at the TOP if I have set the audit trail parameter value to DB. I left this not-demonstrated though.


Viewing all articles
Browse latest Browse all 224