Sunday, September 12, 2010

A Tool to Enable Stats Collection for Future Sessions for Application Profiling

The other day I was putting together my presentation for Oracle Open World on Application Profiling in RAC. I was going to describe a methodology for putting a face to an app by measuring how it behaves in a database – a sort of signature of that application. I was going to use the now-ubiquitous 10046 trace for wait events and other activities inside the database. For resource consumption such as redo generated, logical I/Os, etc., I planned to use v$sesstat; but then I was stuck. How would I collect the stats of a session when the session has not even started and I don't know the SID? That problem led to the development of this tool, where the stats of a future session can be recorded based on some identifying factors such as username, module, etc. I hope this helps in your performance management efforts.


The Problem

Suppose you want to find out the resources consumed by a session. The resources could be redo generated, CPU used, logical I/O, undo records generated – the list is endless. This is needed for a lot of things. Consider a case where you want to find out which apps are generating the most redo; you would issue a query like this:

select sid, value
from v$sesstat s, v$statname n
where n.statistic# = s.statistic#
and n.name = 'redo size'
/



The VALUE column will show the redo generated, and the SID identifies the session. Your next stop is v$session to get the other relevant information such as username, module, authentication scheme, etc. Problem solved, right?
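For completeness, here is a minimal sketch of that join with v$session (illustrative only; pick whichever v$session columns you need):

select s.sid, s.username, s.module, st.value as redo_size
from v$session s, v$sesstat st, v$statname n
where st.sid = s.sid
and n.statistic# = st.statistic#
and n.name = 'redo size'
order by st.value desc
/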

Not so fast. Look at the above query; it selects from v$sesstat. When a session disconnects, its stats disappear, and its entries are gone from v$sesstat. If you run the query after that, you will not find those sessions. You would have to select from v$sesstat constantly, hoping to capture the stats before the sessions disconnect; but that is not guaranteed. Short sessions will be missed between collection samples. And even if you are lucky enough to capture some stats of a short session, the other relevant information from v$session will be gone.

Oracle provides a package, dbms_monitor, in which a procedure named client_id_stat_enable allows you to enable stats collection on future sessions whose client_id matches a specific value, e.g. CLIENT1. Here is an example:

execute dbms_monitor.client_id_stat_enable('CLIENT1');

However, there are three issues:

(1) It collects only about 27 stats, out of 400+.

(2) It offers only three choices for selecting sessions – client_id, module_name and service_name.

(3) It aggregates, summing up all the stats for a specific client_id. That is pretty much useless without session-level detail. (The aggregated values show up in V$CLIENT_STATS; see the sketch below.)
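For reference, this is how you would read those aggregated values – a quick sketch, assuming the CLIENT1 identifier from the example above:

select stat_name, value
from v$client_stats
where client_identifier = 'CLIENT1'
/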

So, in short, I didn’t have a readily available solution.

Solution

Well, necessity is the mother of invention. When you can't find a decent tool, you build one; and so did I. I built this tool to capture the stats. This is version 1 of the tool. It has some limitations, listed at the end. These limitations do not apply to all situations, so the tool should be useful in the majority of cases. Later I will expand the tool to overcome these limitations.

Concept

The fundamental problem, as you recall, is not the dearth of data (v$sesstat has plenty); it's that the sessions are in the future. To capture values from those sessions, the tool relies on a post-logon database trigger.

The second problem was persistence. V$SESSTAT is a dynamic performance view, which means a session's records are gone the moment the session disappears. So the tool relies on a regular table to store the data.

The third problem is getting the values at the very end of the session. The difference between the values captured at the end and at the beginning of the session is the set of stats we are after. To capture the values at the very end, and not any time before, the tool relies on a pre-logoff database trigger.

The fourth challenge is identification of sessions. The SID of a session is not unique; it can be reused for a new session, and it will definitely be reused when the database is recycled. So the tool uses a column named CAPTURE_ID, a sequentially incremented number for each capture. Since we capture once at the beginning and once at the end, both captures must use the same capture_id; I store it in a package variable.

Finally, the tool allows you to enable stats collection based on session attributes such as username, client_id, module, service_name, etc. For instance, you may want to enable stats for any session where the username is 'SCOTT' or where the os_user is 'ananda'. These preferences are stored in a table reserved for that purpose.

Construction

Now that you understand how the tool is structured, let me show you the actual code and scripts to create it.

(1) First, we should create the table that holds the preferences. Let's call this table RECSTATS_ENABLED. It holds all the different session attributes (ip address, username, module, etc.) that can enable stats collection in a session.

CREATE TABLE SYS.RECSTATS_ENABLED
(
  SESSION_ATTRIBUTE  VARCHAR2(200 BYTE),
  ATTRIBUTE_VALUE    VARCHAR2(2000 BYTE)
)
/

If you want to enable stats collection for sessions based on a session attribute, insert a record into this table with the attribute name and value. Here are some examples. Suppose I want to collect stats on all sessions where client_info matches 'MY_CLIENT_INFO1'; I would insert a record like this:

insert into recstats_enabled values ('CLIENT_INFO','MY_CLIENT_INFO1');

Here are some more examples. All sessions where ACTION is ‘MY_ACTION1’:

insert into recstats_enabled values ('ACTION','MY_ACTION1');

Those of user SCOTT:

insert into recstats_enabled values ('SESSION_USER','SCOTT');

Those with service name APP:

insert into recstats_enabled values ('SERVICE_NAME','APP');



You can insert as many preferences as you want. You can even insert multiple values for a specific attribute. For instance, to enable stats on sessions with service names APP1 and APP2, insert two records, as shown below.
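The two records would look like this (APP1 and APP2 standing in for whatever service names your application actually uses):

insert into recstats_enabled values ('SERVICE_NAME','APP1');
insert into recstats_enabled values ('SERVICE_NAME','APP2');
commit;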

Important: the session attribute names follow the naming convention of the USERENV context used in the SYS_CONTEXT function.
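If you are not sure what value an attribute carries for a given session, you can check it from within that session, e.g.:

select sys_context('USERENV','SERVICE_NAME') from dual
/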

(2) Next, we will create a table to hold the statistics:

CREATE TABLE SYS.RECSTATS
(
  CAPTURE_ID            NUMBER,
  CAPTURE_POINT         VARCHAR2(10 BYTE),
  SID                   NUMBER,
  SERIAL#               NUMBER,
  ACTION                VARCHAR2(2000 BYTE),
  CLIENT_IDENTIFIER     VARCHAR2(2000 BYTE),
  CLIENT_INFO           VARCHAR2(2000 BYTE),
  CURRENT_EDITION_NAME  VARCHAR2(2000 BYTE),
  CURRENT_SCHEMA        VARCHAR2(2000 BYTE),
  CURRENT_USER          VARCHAR2(2000 BYTE),
  DATABASE_ROLE         VARCHAR2(2000 BYTE),
  HOST                  VARCHAR2(2000 BYTE),
  IDENTIFICATION_TYPE   VARCHAR2(2000 BYTE),
  IP_ADDRESS            VARCHAR2(2000 BYTE),
  ISDBA                 VARCHAR2(2000 BYTE),
  MODULE                VARCHAR2(2000 BYTE),
  OS_USER               VARCHAR2(2000 BYTE),
  SERVICE_NAME          VARCHAR2(2000 BYTE),
  SESSION_USER          VARCHAR2(2000 BYTE),
  TERMINAL              VARCHAR2(2000 BYTE),
  STATISTIC_NAME        VARCHAR2(2000 BYTE),
  STATISTIC_VALUE       NUMBER
)
TABLESPACE USERS
/



Note that I used the tablespace USERS, because I don't want this table, which can potentially grow to a huge size, to overwhelm the SYSTEM tablespace. The STATISTIC_NAME and STATISTIC_VALUE columns record the stats collected. The other columns record the relevant attributes of the session. All the attributes here have been declared VARCHAR2(2000) for simplicity; of course they don't need that much space. In future versions I will put more meaningful limits; but 2000 does not hurt, since a VARCHAR2 column consumes only as much space as the actual data.

The CAPTURE_POINT column will show when the values were captured – at the START or at the END of the session.
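Once the tool is in operation, a quick sanity check like this (a sketch; run it after a few sessions have come and gone) confirms that each capture_id has both a START set and an END set:

select capture_id, capture_point, count(*)
from recstats
group by capture_id, capture_point
order by capture_id
/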

(3) We will also need a sequence to identify the captures. Each session will have 400+ stats, captured once at the beginning and once at the end. We could choose the SID as an identifier, but SIDs can be reused. So we need something truly unique – a sequence number. It will be recorded in the CAPTURE_ID column of the stats table.

SQL> create sequence seq_recstats;

(4) To store the capture ID when the post-logon trigger fires, so that it can be used later inside the pre-logoff trigger, we need a variable that is visible for the entire life of the session. A package variable is best for that:

create or replace package pkg_recstats
is
   g_recstats_id number;
end;
/


(5) Finally, we get to the meat of the tool – the triggers. First, the post-logon trigger, to capture the stats at the beginning of the session:

CREATE OR REPLACE TRIGGER SYS.tr_post_logon_recstats
after logon on database
declare
 l_stmt varchar2(32000);
 l_attr_val recstats_enabled.attribute_value%TYPE;
 l_capture_point recstats.capture_point%type := 'START';
 l_matched boolean := FALSE;
begin
 pkg_recstats.g_recstats_id := null;
 for r in (
  select session_attribute, attribute_value
  from recstats_enabled
  order by session_attribute
 )
 loop
  exit when l_matched;
  -- we select the userenv attribute; null values may cause
  -- problems in matching, so let's use an NVL default
  -- that will never occur naturally - !_!_!_!
  l_stmt := 'select nvl(sys_context(''USERENV'','''||
   r.session_attribute||'''),''!_!_!_!'') from dual';
  execute immediate l_stmt into l_attr_val;
  if l_attr_val = r.attribute_value then
   -- match; we should record the stats
   -- and exit the loop; since stats should
   -- be recorded only for one match.
   l_matched := TRUE;
   select seq_recstats.nextval
   into pkg_recstats.g_recstats_id
   from dual;
   insert into recstats
   select
    pkg_recstats.g_recstats_id,
    l_capture_point,
    sys_context('USERENV','SID'),
    null,
    sys_context('USERENV','ACTION'),
    sys_context('USERENV','CLIENT_IDENTIFIER'),
    sys_context('USERENV','CLIENT_INFO'),
    sys_context('USERENV','CURRENT_EDITION_NAME'),
    sys_context('USERENV','CURRENT_SCHEMA'),
    sys_context('USERENV','CURRENT_USER'),
    sys_context('USERENV','DATABASE_ROLE'),
    sys_context('USERENV','HOST'),
    sys_context('USERENV','IDENTIFICATION_TYPE'),
    sys_context('USERENV','IP_ADDRESS'),
    sys_context('USERENV','ISDBA'),
    sys_context('USERENV','MODULE'),
    sys_context('USERENV','OS_USER'),
    sys_context('USERENV','SERVICE_NAME'),
    sys_context('USERENV','SESSION_USER'),
    sys_context('USERENV','TERMINAL'),
    n.name,
    s.value
   from v$mystat s, v$statname n
   where s.statistic# = n.statistic#;
  end if;
 end loop;
end;
/

The code is self-explanatory. I have provided more explanation as comments where needed.

(6) Next, the pre-logoff trigger to capture the stats at the end of the session:

CREATE OR REPLACE TRIGGER SYS.tr_pre_logoff_recstats
before logoff on database
declare
 l_capture_point recstats.capture_point%type := 'END';
begin
 if (pkg_recstats.g_recstats_id is not null) then
  insert into recstats
  select
   pkg_recstats.g_recstats_id,
   l_capture_point,
   sys_context('USERENV','SID'),
   null,
   sys_context('USERENV','ACTION'),
   sys_context('USERENV','CLIENT_IDENTIFIER'),
   sys_context('USERENV','CLIENT_INFO'),
   sys_context('USERENV','CURRENT_EDITION_NAME'),
   sys_context('USERENV','CURRENT_SCHEMA'),
   sys_context('USERENV','CURRENT_USER'),
   sys_context('USERENV','DATABASE_ROLE'),
   sys_context('USERENV','HOST'),
   sys_context('USERENV','IDENTIFICATION_TYPE'),
   sys_context('USERENV','IP_ADDRESS'),
   sys_context('USERENV','ISDBA'),
   sys_context('USERENV','MODULE'),
   sys_context('USERENV','OS_USER'),
   sys_context('USERENV','SERVICE_NAME'),
   sys_context('USERENV','SESSION_USER'),
   sys_context('USERENV','TERMINAL'),
   n.name,
   s.value
  from v$mystat s, v$statname n
  where s.statistic# = n.statistic#;
  commit;
 end if;
end;
/

Again, the code is self-explanatory. We capture the stats only if the global capture ID has been set by the post-logon trigger. Without that check, every session would record stats at its completion, whether or not it matched a preference.

Execution

Now that the setup is complete, let’s perform a test by connecting as a user with the service name APP:

SQL> connect arup/arup@app



In this session, perform some actions that will generate a lot of activity. The following SQL will do nicely:

SQL> create table t as select * from all_objects;


SQL> exit



Now check the RECSTATS table to see the stats for this session's capture_id, which happens to be 1330 in my case.
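If you don't remember the capture ID, you can look it up from the attributes recorded at the START capture – a quick sketch, assuming the test session above (user ARUP, service name APP):

select distinct capture_id
from recstats
where session_user = 'ARUP'
and service_name = 'APP'
and capture_point = 'START'
/

With the capture ID in hand, diff the START and END values of each statistic: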

col name format a60
col value format 999,999,999
select a.statistic_name name, b.statistic_value - a.statistic_value value
from recstats a, recstats b
where a.capture_id = 1330
and a.capture_id = b.capture_id
and a.statistic_name = b.statistic_name
and a.capture_point = 'START'
and b.capture_point = 'END'
and (b.statistic_value - a.statistic_value) != 0
order by 2
/


Here is the output:


NAME                                                                VALUE
------------------------------------------------------------ ------------
workarea memory allocated                                              -2
change write time                                                       1
parse time cpu                                                          1
table scans (long tables)                                               1
cursor authentications                                                  1
sorts (memory)                                                          1
user commits                                                            2
opened cursors current                                                  2
IMU Flushes                                                             2
index scans kdiixs1                                                     2
parse count (hard)                                                      2
workarea executions - optimal                                           2
redo synch writes                                                       2
redo synch time                                                         3
rows fetched via callback                                               5
table fetch by rowid                                                    5
parse time elapsed                                                      5
recursive cpu usage                                                     8
switch current to new buffer                                           10
cluster key scan block gets                                            10
cluster key scans                                                      10
deferred (CURRENT) block cleanout applications                         10
Heap Segment Array Updates                                             10
table scans (short tables)                                             12
messages sent                                                          13
index fetch by key                                                     15
physical read total multi block requests                               15
SQL*Net roundtrips to/from client                                      18
session cursor cache hits                                              19
session cursor cache count                                             19
user calls                                                             25
CPU used by this session                                               28
CPU used when call started                                             29
buffer is not pinned count                                             33
execute count                                                          34
parse count (total)                                                    35
opened cursors cumulative                                              36
physical read total IO requests                                        39
physical read IO requests                                              39
calls to get snapshot scn: kcmgss                                      45
non-idle wait count                                                    67
user I/O wait time                                                    116
non-idle wait time                                                    120
redo ordering marks                                                   120
calls to kcmgas                                                       143
enqueue releases                                                      144
enqueue requests                                                      144
DB time                                                               149
hot buffers moved to head of LRU                                      270
recursive calls                                                       349
active txn count during cleanout                                      842
cleanout - number of ktugct calls                                     842
consistent gets - examination                                         879
IMU undo allocation size                                              968
physical reads cache prefetch                                         997
physical reads                                                      1,036
physical reads cache                                                1,036
table scan blocks gotten                                            1,048
commit cleanouts                                                    1,048
commit cleanouts successfully completed                             1,048
no work - consistent read gets                                      1,060
redo subscn max counts                                              1,124
Heap Segment Array Inserts                                          1,905
calls to kcmgcs                                                     2,149
consistent gets from cache (fastpath)                               2,153
free buffer requested                                               2,182
free buffer inspected                                               2,244
HSC Heap Segment Block Changes                                      2,519
db block gets from cache (fastpath)                                 2,522
consistent gets                                                     3,067
consistent gets from cache                                          3,067
bytes received via SQL*Net from client                              3,284
bytes sent via SQL*Net to client                                    5,589
redo entries                                                        6,448
db block changes                                                    9,150
db block gets                                                      10,194
db block gets from cache                                           10,194
session logical reads                                              13,261
IMU Redo allocation size                                           16,076
table scan rows gotten                                             72,291
session uga memory                                                 88,264
session pga memory                                                131,072
session uga memory max                                            168,956
undo change vector size                                           318,640
session pga memory max                                            589,824
physical read total bytes                                       8,486,912
cell physical IO interconnect bytes                             8,486,912
physical read bytes                                             8,486,912
redo size                                                       8,677,104




This clearly shows you all the stats of that session. Of course, the table recorded all the other details of the session as well – username, client_id, etc. – which are useful later for more detailed analysis. You can now perform aggregations as well. Here is an example for the redo size stat:

select session_user, sum(STATISTIC_VALUE) STVAL
from recstats
where STATISTIC_NAME = 'redo size'
group by session_user
/

Output:

SESSION_USER STVAL
------------ ---------
ARUP            278616
APEX           4589343
… and so on …

You can also break the aggregates down by several attributes. Here is an example where you want to find out the redo generated by different users coming from different client machines:

select session_user, host, sum(STATISTIC_VALUE) stval
from recstats
where STATISTIC_NAME = 'redo size'
group by session_user, host
/

Output:

SESSION_USER HOST          STVAL
------------ ----------- -------
ARUP         oradba2       12356
ARUP         oradba1      264567
APEX         oradba2       34567
… and so on …


Granularity like this shows you how the application behaved from different client machines, not just under different usernames.



Limitations

As I mentioned, there are some limitations you should be aware of. I will address them in the next iterations of the tool. They are not serious and apply only in certain cases; as long as you don't encounter those cases, you should be fine.

(1) The logoff trigger does not fire when the user exits the session ungracefully, such as by closing the SQL*Plus window or killing the program before exiting. In such cases the stats at the end of the session will not be recorded. In most application infrastructures this does not happen; but it could happen for ad hoc user sessions, such as people connecting through TOAD.

(2) Session attributes such as module, client_id and action can be altered within the session. If that happens, this tool does not record the change, since there is no triggering event; the logoff trigger simply records whatever module, action and client_id are set at that time. These attributes are not usually changed in application code, so this may not apply to your case. The example below shows how easily such a change can happen with nothing for a trigger to hook into.
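For instance, a session can reset its own module and action at any point with a single call (the values here are purely illustrative):

SQL> exec dbms_application_info.set_module('NEW_MODULE','NEW_ACTION')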

(3) Parallel query sessions pose a special issue, since the logoff trigger does not fire for them. So in the case of parallel queries, you will not see any differential stats. If you don't use PQ – which is true of most OLTP applications – you will not be affected.

(4) If a session just sits there without disconnecting, the logoff trigger will not fire and the end-of-session stats will not be captured. Of course, they will eventually be captured when the session exits.

Once again, these limitations apply only to certain occasions. As long as you are aware of these caveats, you will be able to use this tool to profile many of your applications.

Happy Profiling!

7 comments:

Balaji Ramachandran said...

Hi Arup,

Thanks for the tool. Could we normalize the recstats table into two tables, so that the environment attributes need not be stored for each stat value?

Thanks,
Balaji

Flado said...

Why do you capture the stats at logon? All the accumulators (e.g., "redo size") in v$sesstat start from 0 anyway; diffing the absolute values (e.g., "workarea memory allocated") makes no sense as they can go up and down during the life of the session.
One capture just before logoff (perhaps capturing v$session_event as well) should be all that's needed.

Cheers,
Flado

Arup Nanda said...

@Balaji - yes, they can be normalized into two tables. I deliberately didn't do it, because the repeating elements are too few (about 15), and keeping them in the same table helps in doing some serious filtering and group-by operations. Normalizing into two tables means always joining, requiring an index and so on.

Arup Nanda said...

@Flado - yes, the accumulators will all be zero at the very beginning of the session. But what if some special recursive processing happens before the session is completely established, such as VPD, application contexts, etc., or some other trigger that sets up other variables? These will accumulate values. In normal application profiling you *may* want to ignore this "noise". That's where the post-logon trigger comes in handy.

If you would rather not ignore the stats added by these recursive processes, by all means ignore the front-end stats and assume 0. No harm done. But the presence of the front-end stats provides flexibility.

Flado said...

Hi Arup,

Wouldn't it be easier to just use Statspack for the collection? The post-logon trigger would simply call
statspack.snap(
i_session_id=>sys_context('USERENV','SID'),
i_ucomment=>'START or better SID', i_snap_level=>0)
instead of the INSERT-SELECT.
The pre-logoff trigger would make the same call with a different comment value to enable matching of snapshots made by the same session.

Advantages I see over your approach:
1. Statspack is supplied by and supported by Oracle which makes it easier for the production DBA or consultant to convince management to install it if not already installed.
2. Amount of data captured is easily and extensively configurable (snapshot levels, thresholds)
3. Reporting tools are available from Oracle (spreport.sql, sprepsql.sql) and many third parties, since the schema is well-known.
4. It is easy to manually take intermediate snapshots while a session is running to help concentrating on parts of the app processing.

Drawbacks I see:
1. Increased disk space usage as Statspack captures much more data even at level 0.
2. Increased resource usage, especially for applications with short-lived sessions or prone to cause logon storms.
The second drawback is very significant, but when used sparingly, this method provides far greater flexibility.

Cheers!
Flado

Arup Nanda said...

@Flado - you are right about the resource consumption of statspack. That's precisely why I discounted it.

However, your point about Statspack being "official" is valid. Some organizations may balk at a third party tool, especially one that is free; but not at something that is standard in Oracle.

Now that you have both approaches, you may decide on the one that best fits your need.

Kyle Heath said...

You made some good points here. I did a search on the topic and found most people agree with your blog. Thanks
