The Problem
Suppose you want to find out the resource consumed by a session. The resources could be redo generation, CPU used, logical I/O, undo records generated – the list is endless. This is required for a lot of things. Consider a case where you want to find out which apps are generating the most redo; you would issue a query like this:
select sid, value from v$sesstat s, v$statname n where n.statistic# = s.statistic# and n.name = 'redo size' /
The value column will show the redo generated. From the SID you can identify the session. Your next stop is v$session to get the other relevant information such as username, module, authentication scheme, etc. Problem solved, right?
Not so fast. Look at the above query; it selects from v$sesstat. When the session is disconnected, the stats disappear, making the entries for that session go from v$sesstat. If you run the query, you will not find these sessions. You have to constantly select from the v$sesstat view to capture the stats of the sessions hoping that you would capture the stats before the session disconnects. But it will be not be guaranteed. Some short sessions will be missed in between collection samples. Even if you are lucky to capture some stats of a short session, the other relevant information from v$session will be gone.
Oracle provides a package dbms_monitor, where a procedure named client_id_stat_enable allows you to enable stats collection on a future session where the client_id matches a specific value, e.g. CLIENT1. Here is an example:
execute dbms_monitor.client_id_stat_enable('CLIENT1');
However there are three issues:
(1) It collects only about 27 stats, out of 400+
(2) It offers only three choices for selecting sessions – client_id, module_name and service_name.
(3) It aggregate them, sums up all stats for a specific client_id. That is pretty much useless without a detailed session level.
So, in short, I didn’t have a readily available solution.
Solution
Well, necessity is the mother of invention. When you can’t find a decent tool; you build it; and so did I. I built this tool to capture the stats. This is version 1 of the tool. It has some limitations, as shown at the end. These limitations do not apply to all situations; so the tool may be useful in a majority of the cases. Later I will expand the tool to overcome these limitations.
Concept
The fundamental problem, as you recall, is not the dearth of data (v$sesstat has plenty); it’s the sessions in the future. To capture those sessions, the tool relies on a post-logon database trigger to capture the values.
The second problem was persistence. V$SESSTAT is a dynamic performance view, which means the records of the session will be gone when the session disappears. So, the tool relies on a table to store the data.
The third problem is the getting the values at the very end of the session. The difference between the values captured at the end and beginning of the session are the stats. To capture the values at the very end; not anytime before, the tool relies on a pre-logoff database trigger.
The fourth challenge is identification of sessions. SID of a session is not unique; it can be reused for a new session; it will definitely be reused when the database is recycled. So, the tool uses a column named CAPTURE_ID, a sequentially incremented number for each capture. Since we capture once at the beginning and then at the end, I must use the same capture_id. I use a package variable to store that capture_Id.
Finally, the tool allows you to enable stats collections based on some session attributes such as username, client_id, module, service_name, etc. For instance you may want to enable stats for any session where the username = ‘SCOTT’ or where the os_user is ‘ananda’, etc. These preferences are stored in a table reserved for that purpose.
Construction
Now that you understand how the tool is structured, let me show the actual code and scripts to create the tool.
(1) First, we should create the table that holds the preferences. Let’s call this table RECSTATS_ENABLED. This table holds all the different sessions attributes (ip address, username, module, etc.) that can enable stats collection in a session.
CREATE TABLE SYS.RECSTATS_ENABLED ( SESSION_ATTRIBUTE VARCHAR2(200 BYTE), ATTRIBUTE_VALUE VARCHAR2(2000 BYTE) ) /
If you want to enable stats collection of a session based on a session attribute, insert a record into this table with the session attribute and the value. Here are some examples. I want to collect stats on all sessions where client_info matches ‘MY_CLIENT_INFO1’. You would insert a record like this:
insert into recstats_enabled values ('CLIENT_INFO','MY_CLIENT_INFO1');
Here are some more examples. All sessions where ACTION is ‘MY_ACTION1’:
insert into recstats_enabled values ('ACTION','MY_ACTION1');
Those of user SCOTT:
insert into recstats_enabled values ('SESSION_USER','SCOTT')
Those with service name APP:
insert into recstats_enabled values ('SERVICE_NAME','APP')
You can insert as many preferences as you want. You can even insert multiple values of a specific attribute. For instance, to enable stats on sessions with service names APP1 and APP2, insert two records.
Important: the session attribute names follow the naming convention of the USERENV context used in SYS_CONTEXT function.
(2) Next, we will create a table to hold the statistics
CREATE TABLE SYS.RECSTATS ( CAPTURE_ID NUMBER, CAPTURE_POINT VARCHAR2(10 BYTE), SID NUMBER, SERIAL# NUMBER, ACTION VARCHAR2(2000 BYTE), CLIENT_DENTIFIER VARCHAR2(2000 BYTE), CLIENT_INFO VARCHAR2(2000 BYTE), CURRENT_EDITION_NAME VARCHAR2(2000 BYTE), CURRENT_SCHEMA VARCHAR2(2000 BYTE), CURRENT_USER VARCHAR2(2000 BYTE), DATABASE_ROLE VARCHAR2(2000 BYTE), HOST VARCHAR2(2000 BYTE), IDENTIFICATION_TYPE VARCHAR2(2000 BYTE), IP_ADDRESS VARCHAR2(2000 BYTE), ISDBA VARCHAR2(2000 BYTE), MODULE VARCHAR2(2000 BYTE), OS_USER VARCHAR2(2000 BYTE), SERVICE_NAME VARCHAR2(2000 BYTE), SESSION_USER VARCHAR2(2000 BYTE), TERMINAL VARCHAR2(2000 BYTE), STATISTIC_NAME VARCHAR2(2000 BYTE), STATISTIC_VALUE NUMBER; ) TABLESPACE USERS
Note, I used the tablespace USERS; because I don’t want this table, which can potentially grow to huge size, to overwhelm the system tablespace. The STATISTIC_NAME and STATISTIC_VALUE columns record the stats collected. The other columns record the other relevant data from the sessions. All the attributes here have been shown with VARCHAR2(2000) for simplicity; of course they don’t need that much of space. In the future versions, I will put a more meaningful limit; but 2000 does not hurt as it is varchar2.
The capture point will show when the values were captured – START or END of the session.
(3) We will also need a sequence to identify the sessions. Each session will have 400+ stats; we will have a set at the end and once at the beginning. We could choose SID as an identifier; but SIDs could be reused. So, we need something that is truly unique – a sequence number. This will be recorded in the CAPTURE_ID column in the stats table.
SQL> create sequence seq_recstats;
(4) To store the capture ID when the post-logon trigger is fired, to be used inside the pre-logoff trigger, we must use a variable that would be visible to entire session. A package variable is the best for that.
create or replace package pkg_recstats is g_recstats_id number; end;
(5) Finally, we will go on to the meat of the tool – the triggers. First, the post-logon trigger to capture the stats in the beginning of the session:
CREATE OR REPLACE TRIGGER SYS.tr_post_logon_recstats
after logon on database
declare
 l_stmt varchar2(32000);
 l_attr_val recstats_enabled.attribute_value%TYPE;
 l_capture_point recstats.capture_point%type := 'START';
 l_matched boolean := FALSE;
begin
 pkg_recstats.g_recstats_id := null;
 for r in (
  select session_attribute, attribute_value
  from recstats_enabled
  order by session_attribute
 )
 loop
  exit when l_matched;
            -- we select the userenv; but the null values may cause
            -- problems in matching; so let’s use a value for NVL
            -- that will never be used - !_!_!
  l_stmt := 'select nvl(sys_context(''USERENV'','''||
   r.session_attribute||'''),''!_!_!_!'') from dual';
  execute immediate l_stmt into l_attr_val;
  if l_attr_val = r.attribute_value then
   -- match; we should record the stats
   -- and exit the loop; since stats should
   -- be recorded only for one match.
   l_matched := TRUE;
   select seq_recstats.nextval
   into pkg_recstats.g_recstats_id
   from dual;
   insert into recstats
   select
    pkg_recstats.g_recstats_id,
    l_capture_point,
    sys_context('USERENV','SID'),
    null,
    sys_context('USERENV','ACTION'),
    sys_context('USERENV','CLIENT_IDENTIFIER'),
    sys_context('USERENV','CLIENT_INFO'),
    sys_context('USERENV','CURRENT_EDITION_NAME'),
    sys_context('USERENV','CURRENT_SCHEMA'),
    sys_context('USERENV','CURRENT_USER'),
    sys_context('USERENV','DATABASE_ROLE'),
    sys_context('USERENV','HOST'),
    sys_context('USERENV','IDENTIFICATION_TYPE'),
    sys_context('USERENV','IP_ADDRESS'),
    sys_context('USERENV','ISDBA'),
    sys_context('USERENV','MODULE'),
    sys_context('USERENV','OS_USER'),
    sys_context('USERENV','SERVICE_NAME'),
    sys_context('USERENV','SESSION_USER'),
    sys_context('USERENV','TERMINAL'),
    n.name,
    s.value
   from v$mystat s, v$statname n
   where s.statistic# = n.statistic#;
  end if;
 end loop;
end;
The code is self explanatory. I have provided more explanation as comments where needed.
(6) Next, the pre-logoff trigger to capture the stats at the end of the session:
CREATE OR REPLACE TRIGGER SYS.tr_pre_logoff_recstats
before logoff on database
declare
 l_capture_point recstats.capture_point%type := 'END';
begin
 if (pkg_recstats.g_recstats_id is not null) then
  insert into recstats
  select
   pkg_recstats.g_recstats_id,
   l_capture_point,
   sys_context('USERENV','SID'),
   null,
   sys_context('USERENV','ACTION'),
   sys_context('USERENV','CLIENT_IDENTIFIER'),
   sys_context('USERENV','CLIENT_INFO'),
   sys_context('USERENV','CURRENT_EDITION_NAME'),
   sys_context('USERENV','CURRENT_SCHEMA'),
   sys_context('USERENV','CURRENT_USER'),
   sys_context('USERENV','DATABASE_ROLE'),
   sys_context('USERENV','HOST'),
   sys_context('USERENV','IDENTIFICATION_TYPE'),
   sys_context('USERENV','IP_ADDRESS'),
   sys_context('USERENV','ISDBA'),
   sys_context('USERENV','MODULE'),
   sys_context('USERENV','OS_USER'),
   sys_context('USERENV','SERVICE_NAME'),
   sys_context('USERENV','SESSION_USER'),
   sys_context('USERENV','TERMINAL'),
   n.name,
   s.value
  from v$mystat s, v$statname n
  where s.statistic# = n.statistic#;
  commit;
 end if;
end;
/
Again the code is self explanatory. We capture the stats only of the global capture ID has been set by the post-logoff trigger. If we didn’t do that all the sessions would have started recording stats at their completion.
Execution
Now that the setup is complete, let’s perform a test by connecting as a user with the service name APP:
SQL> connect arup/arup@app
In this session, perform some actions that will generate a lot of activity. The following SQL will do nicely:
SQL> create table t as select * from all_objects;
SQL> exit
Now check the RECSTATS table to see the stats on this catured_id, which happens to be 1330.
col name format a60 col value format 999,999,999 select a.statistic_name name, b.statistic_value - a.statistic_value value from recstats a, recstats b where a.capture_id = 1330 and a.capture_id = b.capture_id and a.statistic_name = b.statistic_name and a.capture_point = 'START' and b.capture_point = 'END' and (b.statistic_value - a.statistic_value) != 0 order by 2 /
Here is the output:
NAME VALUE ------------------------------------------------------------ ------------ workarea memory allocated -2 change write time 1 parse time cpu 1 table scans (long tables) 1 cursor authentications 1 sorts (memory) 1 user commits 2 opened cursors current 2 IMU Flushes 2 index scans kdiixs1 2 parse count (hard) 2 workarea executions - optimal 2 redo synch writes 2 redo synch time 3 rows fetched via callback 5 table fetch by rowid 5 parse time elapsed 5 recursive cpu usage 8 switch current to new buffer 10 cluster key scan block gets 10 cluster key scans 10 deferred (CURRENT) block cleanout applications 10 Heap Segment Array Updates 10 table scans (short tables) 12 messages sent 13 index fetch by key 15 physical read total multi block requests 15 SQL*Net roundtrips to/from client 18 session cursor cache hits 19 session cursor cache count 19 user calls 25 CPU used by this session 28 CPU used when call started 29 buffer is not pinned count 33 execute count 34 parse count (total) 35 opened cursors cumulative 36 physical read total IO requests 39 physical read IO requests 39 calls to get snapshot scn: kcmgss 45 non-idle wait count 67 user I/O wait time 116 non-idle wait time 120 redo ordering marks 120 calls to kcmgas 143 enqueue releases 144 enqueue requests 144 DB time 149 hot buffers moved to head of LRU 270 recursive calls 349 active txn count during cleanout 842 cleanout - number of ktugct calls 842 consistent gets - examination 879 IMU undo allocation size 968 physical reads cache prefetch 997 physical reads 1,036 physical reads cache 1,036 table scan blocks gotten 1,048 commit cleanouts 1,048 commit cleanouts successfully completed 1,048 no work - consistent read gets 1,060 redo subscn max counts 1,124 Heap Segment Array Inserts 1,905 calls to kcmgcs 2,149 consistent gets from cache (fastpath) 2,153 free buffer requested 2,182 free buffer inspected 2,244 HSC Heap Segment Block Changes 2,519 db block gets from cache (fastpath) 2,522 consistent gets 3,067 consistent gets from cache 3,067 bytes received via SQL*Net from client 3,284 bytes sent via SQL*Net to client 5,589 redo entries 6,448 db block changes 9,150 db block gets 10,194 db block gets from cache 10,194 session logical reads 13,261 IMU Redo allocation size 16,076 table scan rows gotten 72,291 session uga memory 88,264 session pga memory 131,072 session uga memory max 168,956 undo change vector size 318,640 session pga memory max 589,824 physical read total bytes 8,486,912 cell physical IO interconnect bytes 8,486,912 physical read bytes 8,486,912 redo size 8,677,104
This clearly shows you all the stats of that session. Of course the table recorded all other details of the session as well – such as username, client_id, etc., which are useful later for more detailed analysis. You can perform aggregations as well now. Here is an example of the stats collected for redo size:
select session_user, sum(STATISTIC_VALUE) STVAL from recstats where STATISTIC_NAME = 'redo size' group by session_user / Output: SESSION_USER STVAL ------------ --------- ARUP 278616 APEX 4589343 … and so on …
You can disassemble the aggregates to several attributes as well. Here is an example where you want to find out the redo generated from different users coming from different client machines
select session_user, host, sum(STATISTIC_VALUE) stval from recstats where STATISTIC_NAME = 'redo size' group by session_user, host / Output: SESSION_USER HOST STVAL ------------ ----------- ------- ARUP oradba2 12356 ARUP oradba1 264567 APEX oradba2 34567 … and so on …
Granularity like this shows you how the application from different client servers helped; not just usernames.
Limitations
As I mentioned, there are some limitations you should be aware of. I will address them in the next iterations of the tool. These are not serious and applicable in only certain cases. As long as you don’t encounter that case, you should be fine.
(1) The logoff trigger does not fire when the user exits from the session ungracefully, such as closing down the SQL*Plus window, or closing the program before exiting. In such cases the stats at the end of the session will not be recorded. In most application infrastructure it does not happen; but it could happen for adhoc user sessions such as people connecting through TOAD.
(2) The session attributes such as module, client_id and action can be altered within the session. If that is the case, this tool does not record that fact since there is no triggering event. The logoff trigger records the module, action and client_id set at that time. These attributes are not usually changed in application code; so it may not apply to your case.
(3) Parallel Query sessions will have a special issue since there will be no logoff trigger. So in case of parallel queries, you will not see any differential stats. If you don’t use PQ, as most OLTP applications do, you will not be affected.
(4) If the session just sits there without disconnecting, the logoff trigger will never fire and the stats will never be captured. Of course, it will be eventually captured when the session exits.
Once again, these limitations apply only to certain occasions. As long as you are aware of these caveats, you will be able to use this tool to profile many of your applications.
Happy Profiling!
Hi Arup,
ReplyDeleteThanks for the tool , could we normalize the recstats table to two tables so that the environment variables need not be stored for each stat value.
Thanks,
Balaji
Why do you capture the stats at logon? All the accumulators (e.g., "redo size") in v$sesstat start from 0 anyway; diffing the absolute values (e.g., "workarea memory allocated") makes no sense as they can go up and down during the life of the session.
ReplyDeleteOne capture just before logoff (perhaps capturing v$session_event as well) should be all that's needed.
Cheers,
Flado
@Balaji - yes, they can be normalized into two tables. I didn't deliberately do it because the repeating elements are too few (about 15); and keeping them in the same table helps doing some serious filtering, group by operations. Normalizing into two tables means always joining, requiring an index and so on.
ReplyDelete@Flado - yes, the accumilators wil lbe all zero at the very beginning of the session. What if there are some special recursive process that happens before the session gets completely established, such as VPD, application contexts, etc; or some other trigger that establishes some other variables? These will acculmulate the values. In normal application profiling you *may* want to ignore these "noises". That's where the post-login trigger comes handy.
ReplyDeleteIf you would rather not ignore these recursive processes increasing the stats, by all means ignore the front end stats and assume 0. No harm done. But the presence of the front end stats provide flexibility.
Hi Arup,
ReplyDeleteWouldn't it be easier to just use Statspack for the collection? The post-logon trigger would simply call
statspack.snap(
i_session_id=>sys_context('USERENV','SID'),
i_ucomment=>'START or betrer SID', i_snap_level=>0)
instead of the INSERT-SELECT.
The pre-logoff trigger would make the same call with a different comment value to enable matching of snapshots made by the same session.
Advantages I see over your approach:
1. Statspack is supplied by and supported by Oracle which makes it easier for the production DBA or consultant to convince management to install it if not already installed.
2. Amount of data captured is easily and extensively configurable (snapshot levels, thresholds)
3. Reporting tools are available from Oracle (spreport.sql, sprepsql.sql) and many third parties, since the schema is well-known.
4. It is easy to manually take intermediate snapshots while a session is running to help concentrating on parts of the app processing.
Drawbacks I see:
1. Increased disk space usage as Statspack captures much more data even at level 0.
2. Increased resource usage, especially for applications with short-lived sessions or prone to cause logon storms.
The second drawback is very significant, but when used sparingly, this method provides far greater flexibility.
Cheers!
Flado
@Flado - you are right about the resource consumption of statspack. That's precisely why I discounted it.
ReplyDeleteHowever, your point about Statspack being "official" is valid. Some organizations may balk at a third party tool, especially one that is free; but not at something that is standard in Oracle.
Now that you have both approaches, you may decode the one that best fits your need.
You made some good points here. I did a search on the topic and found most people agree with your blog. Thanks
ReplyDeleteThis article highlights the importance of capturing session stats for application profiling, addressing the challenge of collecting data from future or short-lived sessions. The tool developed leverages database triggers and tables for persistence, overcoming limitations of dbms_monitor. This reminds me of performance tuning in games like slope unblocked where identifying resource-intensive elements can drastically improve the user experience.
ReplyDelete