Thursday, August 07, 2008

Full Table Scans on RAC

The phone goes off, yet again, and this time it seems to be a serious problem. Here are the symptoms: all the applications seems to be working fine albeit at a lesser performance level, except one app, which has crawled to a halt. The app has abended since the queries are timing out. This is, by the a way a RAC database.

This is rather interesting - only one application is slow, instead of all the applications. If this were a systemwide issue, then all the apps would have been affected; not just one. The DBA checked the wait events, as she should have and the predminant events were "gc buffer busy". Well, this is a systemwide issue as a result of saturated interconnect, isn't it - she muses; then how come application A is the only one affected?

So, this is a big clue: the waits are RAC related - "gc buffer busy", yet only one application is slow. First, I got the SIDs of the sessions waiting for that event:

select SID, sql_id, sql_child_number
from v$session
where event = 'gc buffer busy';

After I got the SIDs, I checked the SQL statements issued by the sessions:

select sql_text
from v$sql
where sql_id = ''
and child_number = 24

The results were interesting - all of them were updating or selecting from a single table. Next I checked the execution plan for the SQLs:

select * from table (
dbms_xplan.display_cursor (

select ...

Plan hash value: 871755554
Id Operation Name

The output tells it all - there is a full table scan going on for a 14 million row table. Well, why? I checked for the presence of indexes and yes, there is an index. Why isn't the index being used?

This is yet another case of "why my index is not being used". It happens at a relatively lesser frequency; but happens nevertheless.

Upon examination I discovered that the table is range partitioned and the index is local. Next I checked if the index has become unusable somehow:

select partition_name, status
from dba_ind_partitions
where index_name = 'the_inex_name';

------------------------------ --------


Note the status of the last two partitions - UNUSABLE. Why so?

I checked the last_ddl_time:

select last_ddl_time
from dba_objects
where object_name = 'the_index_name';

And voila! The index was altered today. Why? The answer came quickly: there is a partition maintenance script that splits the PMAX partition to PMAX and a partition named PT_OCT2009. The split worked fine; but the DBA forgot to add a line like:

alter index this_index_name rebuild partition PMAX;

So, the partition remained unusable. Remember, this is a local index; so the clause MAINTAIN GLOBAL INDEXES will not do anything. The partition must be explicitly rebuilt. Since the partition was not usable, the query plan suddenly changed to full table scan, as I saw earlier. Since this caused a lot of buffer gets, the buffer pool of all the instances of the RAC cluster got flloded with the blocks of this table, causing gc buffer busy waits. So, even though the wait event is RAC related, the issue was due to abnormally large buffer gets and was limited to a specific application only. Eventually, the issue would have saturated the interconnect and affected other apps; but since we caught it early, that event was averted.

I rebuilt the index partitions by explicitly issuing the alter index commands. After that, the queries started using the index and, of course, the gc waits disappeared.


(1) just because you see "gc" waits, don't automatically assume it is RAC specific issue. The underlying issue may have manifested itself as the RAC component issues.
(2) Add to the list of "why oracle is not using my index" one more: unusable index partition(s)
(3) When you split partition of a table, in addition to the clause "maintain global indexes", issue the rebuild of the split partitions of local indexes.
Post a Comment