Sunday, December 12, 2010

New York Oracle User Group 2010 Session

Thank you to all those who came to my session at the NYOUG Metro Area Meeting. I know you had choices, and I am honored that you chose to grace my session. I apologize for running out of time. In hindsight, I should have entertained questions at the end.

You can download the presentation here.
You can download the paper that describes the presentation here.

Based on the feedback I received, I started a new blog series "100 Things You Probably Didn't Know About Oracle Database". You may want to check out the first of the series. It talks about the commits and the frequency of block flushes to the disk - questions that came up during my presentation.

100 Things You Probably Didn't Know About Oracle Database

Recently, while delivering a presentation on Cache Fusion at the New York Oracle Users Group (www.nyoug.org), the regional user group where I have been a long-time member, I was surprised to hear from many participants some beliefs they had held for a long time that were completely wrong. I always thought these were as obvious as they come; but of course I was dead wrong. What was even more surprising was that most of these believers were veterans in Oracle Database technologies, not newbies. Part of the problem, I think, lies with a system that focuses on execution rather than learning; part of it is due to the lack of clear documentation. During that discussion some encouraged me to write about these. I immediately agreed it was a great idea and merited serious attention. Here is the product: my attempt at explaining some of the “mysteries” of how Oracle Database operates. I will cover 100 such nuggets of information, roughly once a week.

Before you start, however, I would like to draw your attention to an important point. You may already be aware of these facts. I was; so it is reasonable to believe that a vast majority would be as well. Under no circumstances am I claiming these to be groundbreaking or awe-inspiring. If you are already familiar with this specific knowledge, I am not at all surprised; please feel free to skip. For those who read on, I hope you find these helpful and will take a moment to write to me with your feedback.

Part 1: Myth of Commit Causing Buffer to be Flushed to the Disk

Consider a scenario: in the EMP table I update my salary from 1000 to 2000 and commit immediately. The very instant after I issue the commit, if I check the datafile on disk for the tablespace where this table is located, which value should I see: 1000 or 2000? (Remember, the value was committed.)

Did you answer 2000 (perhaps because it was committed)? If so, then consider a normal application where commits are issued up to tens of thousands of times every minute. In a traditional database the weakest link in the chain is always I/O. If Oracle updated the datafile every time someone committed, it would grind to a halt.

Did you answer 1000? Well, in that case, consider what happens when the instance crashes. The datafile would have 1000, not 2000, the value that was committed. In such a case the recovery must bring the committed value (2000, in this case) back to the datafile. How?

Let’s examine a different scenario. Suppose I did not issue a commit after the update (perhaps I was not sure of the implications of giving myself a pay hike, or perhaps I had a pang of conscience). I left the session as is and left for the day. The next day I was sick and didn’t come to work. Twenty-four hours have passed since I updated the record. At that point, if someone reads the datafile, what value will they see: 1000 or 2000?

Did you answer 1000, a logical choice since the read consistency model of Oracle Database guarantees that other sessions will see the pre-change data for uncommitted transactions?

Question #3 in this scenario: if you check the redo log file (not the datafile), what value will you find there: 1000 or 2000? Remember, it has not been committed. Did you answer 1000? It sort of makes sense; the changes are not committed, so there is no reason for them to be in the redo log file, which is a very important part of the recovery process. If you answered 2000, then how would you explain the recovery process? In case of an instance failure, the recovery must read the redo log file and, since the transaction was not committed, it must roll the value back to the previous one: 1000. How would it do that if the redo log file contains 2000, not 1000?

The answers, if you can’t wait any longer: 1000 for the first question, 2000 for the second and 2000 for the third. How so? Please read on.

Explanation

To understand the mechanics of the process, let’s go over the buffer management process of the Oracle database. It’s a rudimentary detail but quite vital to understanding the myth here. Consider a very small table in an equally small tablespace:


SQL> create tablespace testts datafile '/tmp/testts_01.dbf' size 1M;

SQL> create table mythbuster1 (col1 varchar2(200)) tablespace testts;


Insert a row:


SQL> insert into mythbuster1 values ('ORIGINAL_VALUE');
SQL> commit;

Shut down and restart the database so that the buffer cache is completely devoid of this table. You could also issue ALTER SYSTEM FLUSH BUFFER_CACHE; but I want to make sure all traces of this table (and the value of the column inside) vanish from all memory areas: buffer cache, shared pool, PGA, whatever. You can now check the presence of the value in the datafile:

$ strings /tmp/testts_01.dbf
}|{z
-N?pD112D2
TESTTS
 1j)
 w>!
ORIGINAL_VALUE

The value is right there. Now suppose a user issues a statement like this from SQL*Plus:
SQL> select * from mythbuster1;

Oracle creates a process, called the “server process”, on behalf of this user session to service the requests from the session. This process is named, in Unix and similar OSes, oracle. Here is how you can find it:



$ ps -aef|grep sqlplus
oracle   14257 14214  0 13:42 pts/2    00:00:00 sqlplus   as sysdba
$ ps -aef | grep 14257
oracle   14257 14214  0 13:42 pts/2    00:00:00 sqlplus   as sysdba
oracle   14258 14257  0 13:42 ?        00:00:00 oracleD112D2 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
The process 14258 is the server process. The SQL*Plus process is known as the user process which can be any process a user executes such as a Java program, a Pro*C code, a TOAD session and so on. It’s the server process that handles all the interaction with the Oracle database; not the user process. This is why Oracle database interaction is said to be based on a Two Task Architecture; there are always two tasks – the user task that a regular user has written and the server task that performs the database operations. This is an important concept established during the early foundations of the Oracle database to protect the database from errant code in the user task introduced either maliciously or inadvertently.
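You can also see this mapping from inside the database. Here is a small sketch, using the standard V$SESSION and V$PROCESS views and run from the session in question, that shows the operating system PID of the server process serving your session:

select s.sid, s.serial#, p.spid as server_pid, s.program as user_program
from   v$session s, v$process p
where  s.paddr = p.addr
and    s.sid   = sys_context('userenv','sid');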

The server process then identifies the block the row exists in. Since the database instance just came up, the buffer cache is empty and the block will not be found there. Therefore the server process issues a read call against the datafile for that specific block. The block is read from the disk into the buffer cache. Until the loading of the block from the disk to the buffer cache is complete, the session waits on the event db file scattered read. In this case the session performs a full table scan. Had it performed an index scan, the session would have waited on the event db file sequential read. [I know, I know; it seems to defy conventional logic a little bit. I would have assumed index scans to be named scattered reads and full table scans sequential.]
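If you want to see this wait yourself, a simple sketch is to query V$SESSION_WAIT from another session while the scan is running; the SID used below is just an example, so substitute that of the session doing the select:

-- 132 is just an example SID; use the SID of the session doing the select
select sid, event, state, seconds_in_wait
from   v$session_wait
where  sid = 132;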

Once this process is complete, the buffer cache holds a copy of the block of the table mythbuster1. Subsequent sessions, if they select from the table, will simply get the data from this buffer, not from the disk.

Now, suppose the session issues the statement:

SQL> update mythbuster1 set col1 = 'CHANGED_VALUE';
And commits:
SQL> commit;

Immediately afterwards, check for the presence of the values in the datafile:
$ strings /tmp/testts_01.dbf
}|{z
-N?pD112D2
TESTTS
 1j)
 w>!
ORIGINAL_VALUE

The old value, not the new value, is found. The datafile on the disk still has the old value even though the transaction has been committed. The update statement actually updated only the buffer in the cache, not the disk. So, when is the data on the disk updated?

The datafile gets updated by a process known as the Database Writer (a.k.a. the Database Buffer Writer). It’s named DBW0. Actually, there may be more than one such process and they are named DBW0, DBW1, etc., more conveniently addressed as DBWn. For the purpose of the discussion here, let’s assume only one process: DBW0. It has only one responsibility: to update the datafiles with the most up-to-date buffers from the buffer caches. [Note: I used buffer caches, plural. This is not a typo. There may be more than one buffer cache in the database (keep, recycle, default and other block sizes), but that’s for another day.] A buffer that has been updated is known as a dirty buffer, since its contents are different from the block on the disk. The DBW0 process writes the contents of the buffer to the disk, making it clean again.
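You can see how many database writer processes your instance has started with a quick query against the standard V$BGPROCESS view (a background process that is actually running has a non-zero PADDR), or simply with ps -ef | grep dbw on Unix:

select name, description
from   v$bgprocess
where  name like 'DBW%'
and    paddr <> hextoraw('00');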

But the big question is: when does DBW0 write the dirty buffers to the disk? Ah, that’s the very question we are pondering here. There are several “triggering” events that cause DBW0 to copy the buffers to the disk, also called flushing of the buffers. By the way, DBW0 is a lazy process; it does not flush buffers by itself or on a regular basis. It sleeps most of the time and must be woken up by another process to perform its duties. One such watchdog process is the checkpoint process (you can check its existence with ps -aef | grep ckpt on Unix systems). The checkpoint process does not actually perform the flushing (also called checkpointing); it calls the DBW0 process to do it. How often does the checkpoint process perform a checkpoint? It depends on various conditions, the biggest of which is the MTTR setting, which we will cover later in a different installment.
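Although the MTTR mechanics will be covered in a later installment, here is a small sketch you can run now to see the setting and the instance's current checkpointing estimates; the parameter and the V$INSTANCE_RECOVERY view are standard:

SQL> show parameter fast_start_mttr_target

select target_mttr, estimated_mttr, actual_redo_blks, target_redo_blks
from   v$instance_recovery;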

Next, let's examine a different scenario. Drop the table, create the table again, recycle the database to remove all buffers of the table and then perform the update; but do not commit. Then flush the buffers from the cache to the disk. You can also trigger the checkpointing activity manually instead of waiting for the checkpoint process. Here is how to do it:

SQL> alter system checkpoint;
After that statement completes, check the presence of the values in the datafile again:
$ strings /tmp/testts_01.dbf
}|{z
-N?pD112D2
TESTTS
 1j)
 w>!
CHANGED_VALUE,
ORIGINAL_VALUE
The old value is still there, but that is an artifact; it will eventually be gone. The new value has been written to the datafile. But do you remember a very important fact: the transaction is still not committed? In a different session, if you check the data in the COL1 column, you will see the value ORIGINAL_VALUE. Where does Oracle get that value from? It gets it from the undo segments in the undo tablespace. The undo tablespace contains the pre-change value.

Well, now you may wonder how on earth the server process knows that the data is uncommitted and therefore that the undo segment is to be checked. Good question. Let me add yet another wrinkle to it: the datafile contains the new value, not the old one. How does Oracle even know which rows to return in their pre-change form? It gets that information from the header of the block, where the transactions are recorded in a structure called the Transaction Table, also known as the Interested Transaction List (ITL). I will cover that in detail in a future installment of this series. For the time being, please bear in mind that the block header holds that information. When the server process accesses the buffer (or the block on the disk) to get the column value, it checks the transaction table, sees that there is an uncommitted transaction against it and gets the undo information from there. Finally it creates a different copy of the buffer, as it would have looked had the update statement not been issued. This process is called Consistent Read (CR) processing.
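If you are curious, you can peek at that block header yourself by dumping the block to a trace file. Here is a hedged sketch; the file and block numbers shown are only examples, so derive your own with DBMS_ROWID first:

select dbms_rowid.rowid_relative_fno(rowid) file#,
       dbms_rowid.rowid_block_number(rowid) block#
from   mythbuster1;

-- 7 and 131 are examples; use the file# and block# returned by the query above
alter system dump datafile 7 block 131;

The trace file produced in the user dump destination will show the ITL entries, along with the transaction and undo information described above.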

Now back to our original discussion. Since DBW0 does not immediately flush the buffers to the datafile, the datafile is inconsistent with the committed data. Won’t that compromise the recovery process? What would happen if the instance crashed before the flushing occurred? Since the transaction was previously committed, the recovery should update the datafile. Where does that information come from? The undo tablespace? No; the undo tablespace is just another set of datafiles; it gets flushed in the same manner, so it may not have those values. Besides, it may not even contain the new value.

Redo Stream

This is where the other leg of the database’s guarantee of the committed transaction comes in. When the changes occur in the table, Oracle also records the information in another pool in memory called the log buffer. Compared to the buffer caches, which could be several terabytes, this buffer is tiny, often just a few MB. The update statement records the pre- and post-change values in the log buffer (not in the log file, mind you). But the log buffer is just an area of memory; it also goes away when the instance crashes. So how does Oracle use the information to protect the committed data?

This is where the redo log files (a.k.a. online redo logs) come into the picture. When the session commits, the contents of the log buffer are flushed to the redo log files. Until the flushing is complete, the session waits with various wait events depending on conditions, the most common of which are “log file sync” and “log file parallel write”. But does the log buffer flushing occur only when a commit occurs? No. There are other triggering events as well:
(1) When one third of the log buffer is full
(2) When 1 MB of log buffer is written
(3) Every three seconds

There are other events as well; but these are the major ones. Since the commit statement flushes the log buffer to the redo log file, even if the instance crashes the information is already in the redo log file and can be easily read by the instance recovery process. In the case of a RAC database, a single instance may have crashed. The instance recovery is done by one of the surviving instances, but it must read the redo entries of the crashed instance to reconstruct the blocks on the disk. This is why the redo log files, although they belong to only one instance, must be visible to all nodes.
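To get a sense of how small the log buffer is in your own instance, and how often it gets written, here is a quick sketch; the LOG_BUFFER parameter and the V$SYSSTAT statistics used are standard:

SQL> show parameter log_buffer

select name, value
from   v$sysstat
where  name in ('redo writes', 'user commits');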

Even if a commit is not issued, the other triggering events flush the contents of the log buffer to the redo log files as well. The presence of redo entries in the redo log files is therefore independent of the commit. From the previous discussion you learned that the checkpoint flushes the buffers from the cache to the disk, regardless of the issuance of the commit statement. Therefore, these are the interesting possibilities after a session updates the data (which is updated in the buffer cache):


Scenario  Session     Log Buffer  Checkpoint  Datafile  Redo Log  Comment
          Committed?  Flushed     Occurred    Updated   Updated
--------  ----------  ----------  ----------  --------  --------  ----------------------------------
   1      No          No          No          No        No
   2      No          Yes         No          No        Yes
   3      No          No          Yes         Yes       No
   4      No          Yes         Yes         Yes       Yes
   5      Yes         Yes         No          No        Yes       Commit will force a redo log flush
   6      Yes         Yes         Yes         Yes       Yes


Looking at the table above, you may see some interesting conundrums: the redo log has the changed data but the datafile does not, and vice versa. How does Oracle know when and what exactly to recover, since the presence of a record in the redo log file is not a guarantee that the data was committed?

To address that issue, Oracle places a special “marker”, called a commit marker, in the redo stream that goes into the log buffer. When instance recovery is required, Oracle doesn’t just recover anything that is present in the redo log files; it looks for a commit marker. If one is not found, then the changes are deemed uncommitted, and therefore Oracle rolls them back. If the changes are not found in the redo log at all, then the changes are guaranteed to be uncommitted (remember, a commit will definitely flush the log buffer to the redo log). If such uncommitted changes made it to the datafiles, Oracle rolls them back there, a process known as rolling back. When the changes are found in the redo log (along with the commit marker) but no corresponding changes exist in the datafile (scenario #5), Oracle applies the changes to the datafile from the redo entries, a process known as rolling forward. Recovery consists of both rolling back and rolling forward.

To put it all together, here is a rough algorithm for the actions of the recovery process:

Read the redo log entries starting with the oldest one
Check the SCN number of the change
Look for the commit marker. If the commit marker is found, then data has been committed.
If found, then look for the changes in the datafile (via the SCN number)
    Change has been reflected in the datafile?
    If yes, then move on
    If no, then apply the changes to the datafile (roll forward)
If not found, then the data is uncommitted. Look for the changes in the datafile.
    Change found in datafile?
    If no, then move on
    If yes, then update the datafile with the pre-change data (rollback)

Takeaways

Let me reiterate some of the lessons from this installment.

(1) Data buffers are flushed to the disk from the buffer cache independently of the commit statement. Commit does not flush the buffers to the disk.
(2) If the buffer is modified in the buffer cache but not yet flushed to the disk, it is known as a dirty buffer.
(3) If a buffer is clean (i.e. not dirty), it does not mean that the data changes have been committed.
(4) When a commit occurs, the log buffer (not the buffer cache) is flushed to the disk.
(5) The log buffer may already have been flushed to the disk due to other triggering events. So if a change is found in the redo log file, the change is not necessarily committed.
(6) A commit statement puts a special “commit marker” on the redo log, which is the guarantee of a commit.
(7) The frequency of the flushing of the buffer cache to the datafiles is controlled by the MTTR setting and whether free buffers are needed in the cache due to incoming blocks from the datafiles.

How do you use this information? There are several things for you to consider:

(1) The more you commit, the more often the log buffer will be flushed, which is not good for I/O.
(2) The more aggressive the MTTR target, the less time instance recovery will take after a crash, but the more frequent the flushing to the datafiles will be, causing I/O (a small query to observe this flushing activity follows this list).
(3) The MTTR target has nothing to do with commit frequency; they are two independent activities. So, reducing commit frequency will not cause a reduction in flushing frequency.
(4) If your buffer cache is small, there will be more need to flush.
(5) Exactly how small is “small”? There is no fixed formula; it depends on how much of the data in the buffer cache is updated.
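Here is that query, a minimal sketch using standard V$SYSSTAT statistics; sample the values twice, a few minutes apart, and compare the deltas to see how much checkpoint-driven flushing and commit activity is going on:

select name, value
from   v$sysstat
where  name in ('DBWR checkpoint buffers written',
                'physical writes from cache',
                'user commits');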

I hope you enjoyed this installment of “100 Things …”. In future installments I will explain some other nuances of the Oracle database that you may not have been aware of. As always, I would highly appreciate it if you could drop me a line with your feedback: good, bad and anything in between.

Friday, December 03, 2010

Exadata on Cloud

I just saw this from Oracle Newsflash. Oracle Launches Exadata On Demand for High-Performance Computing. http://www.oracle.com/newsletters/information-indepth/on-demand/dec-10/exadataod.html.

This is very exciting. Of course, the details are scant at this time. But as the model matures, it will allow us to play with the technology without investing a whole lot in it. A pet peeve of mine about Exadata was that it was hardware, not software. Unlike other Oracle products, I couldn't just download it and test the features. That limitation is huge. My role in my organization is to develop roadmaps and champion technological innovations. But without getting my hands dirty, it becomes a pure paper exercise, which is not something I ever do.

With this, I am sure a lot of folks will now be able to validate their current systems against the Exadata platforms, making it easier and more prudent for them to champion it within their sphere of influence.

Friday, November 19, 2010

Seminar in Bucharest

Thank you to all those who attended my seminars in Bucharest, and thank you for spending two days with me. I hope it has been worthwhile. I sincerely apologize for the delayed start on the first day. It was entirely my fault - I messed up the time difference calculations.

Please access the scripts and other materials from http://www.proligence.com/advrman_bucharest. You should have the userid and password from the class. If you have forgotten, please let me know.

Here is the additional resource I was referring to, a tool to present a consolidated view of the status of all RMAN jobs in a single catalog: http://arup.blogspot.com/2010/11/tool-to-present-consolidated-view-of.html

Friday, November 12, 2010

My Oracle New Features Series on OTN

Some people have emailed me that they can't find my Oracle 11g New Features for DBAs and Developers on Oracle Technology Network anymore.

OTN has apparently done some cleanup, defragmentation and coalescing and as a result a lot of URLs have moved around, including those for articles of mine. The new URL for my Oracle 11g NF series is http://www.oracle.com/technetwork/articles/sql/index-099021.html

Just in case you are looking for my Oracle 10g New Features for DBAs and Developers series (including the 10gR2 addendum), here it is http://www.oracle.com/technetwork/articles/sql/index-082320.html.

While on the subject of changed URLs, in case you are wondering about my other series on OTN - Project Lockdown - that is a four part series on Oracle Security, here it is http://www.oracle.com/technetwork/articles/index-087388.html

Hope you find them useful.

Thursday, November 11, 2010

Tool to Add Range Partitions Automatically to a Table

Continuing my earlier posts on partitioning, here is another tool on this topic. If you have been using partitioned tables, you might have noticed that one of the thorny issues in partitioning is the need to keep creating partitions. If you don't have the partition ready, inserts will fail with the following message:

ERROR at line 1:
ORA-14400: inserted partition key does not map to any partition

This error is pretty disastrous to applications as they essentially stop functioning, causing an outage, until you add the necessary partition. What can you do to prevent it? Maintain an unblinking vigil, of course, which quickly becomes impractical. Oracle 11g has a nice answer - INTERVAL partitioning, which automatically creates partitions as new records come in. What if you are on 10g or do not want to use INTERVAL partitioning for some reason? Is there a way to simulate interval partitioning?
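For comparison, this is roughly what the 11g INTERVAL syntax looks like. The table below is purely hypothetical and is shown only to illustrate what the tool simulates:

-- sales_demo is a hypothetical table, for illustration only
create table sales_demo
(
   sale_id  number,
   sale_dt  date
)
partition by range (sale_dt)
interval (numtoyminterval(1,'MONTH'))
(
   partition p_initial values less than (to_date('01/01/2010','mm/dd/yyyy'))
);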

After getting frustrated with the lack of proper partitions more than once, I decided to develop this tool. I perfected the tool over several months to make it more generic. Here it is. Enjoy.

Important: This is provided for educational purposes only. Use it at your own risk. There is no expressed or implied warranty and the author assumes no responsibility for any adverse effects.

What the Tool does

The tool is implemented as a stored procedure. When executed, it reads a driver table called partition_create_schedule, which records the tables to which the partitions are to be added. The table is created as shown below:

/* ---------------------------------------------------------------------
Name        : partition_create_schedule
Purpose     : To record the tables for which partitions are to be
            : added automatically. This data is used by the
            : partition_create procedure to add partitions.
----------------------------------------------------------------------*/

create table partition_create_schedule
(
 OWNER                  VARCHAR2(30),
 TABLE_NAME             VARCHAR2(30),
 PARTITION_UNIT         VARCHAR2(15),
 NUM_PARTS_TO_CREATE    NUMBER(4)
)
/
comment on table partition_create_schedule is 'To record the tables for which partitions are to be added automatically'
/
comment on  column partition_create_schedule.owner is 'the owner of the table'
/
comment on  column partition_create_schedule.table_name is 'the name of the table'
/
comment on  column partition_create_schedule.PARTITION_UNIT is 'the partitioning scheme - MONTHLY/DAILY/WEEKLY/QUARTERLY/YEARLY'
/
comment on column partition_create_schedule.num_parts_to_create is 'the number of units to skip in creating the partition. For instance, if you want to create the next partition 3 days from now for a DAILY unit, place 3 here'
/

The partitioning unit is important. Remember, in date-range partitioned tables you merely provide the range boundaries, not whether the range is monthly, weekly or something else. Even if the existing partitions follow a regular interval, there is no guarantee that they will continue to do so; you may decide to split a partition or coalesce two. Therefore it is important that you tell the tool what partitioning duration the table uses.

Let's insert some records:

insert into partition_create_schedule values ('ARUP','PTEST1','MONTHLY',1);
insert into partition_create_schedule values ('ARUP','PTEST2','MONTHLY',1);
insert into partition_create_schedule values ('ARUP','PTEST3','QUARTERLY',1);
insert into partition_create_schedule values ('ARUP','PTEST4','DAILY',1);
commit;

Now comes the tool - the procedure. To build it, I considered these objectives:

(1) The intervals could be daily, weekly, monthly, quarterly and yearly
(2) When this procedure executes, it should automatically compute the boundary of the partition to add from the existing partitions. This should not be expected to be input from the user.
(3) This procedure will be executed every day automatically. In case of a DAILY partition, a new partition will be created; but not for other ranges. For weekly ranges, the new partition should be created only on the first day of the week; for monthly, on the first day of the month, etc.
(4) The global indexes must be updated automatically
(5) It must handle the two possibilities of the existing highest partition, e.g. a partition with a distinct boundary value and one with (MAXVALUE) clause. In the former case, a partition is added. In the latter case, the maxvalue partition has to be "split"; not added. The procedure must take care of both cases automatically.
(6) When the partition is split, the local indexes get invalidated and must be rebuilt.
(7) It should send an email to the DBA in both success and failure.

The tool handles all the above objectives pretty well.

CREATE OR REPLACE procedure SYS.partition_create
as
 cursor sched_cur is
  select * 
  from partition_create_schedule
  ;
 l_sched_rec    sched_cur%rowtype;
 l_cutoff_date   date;
 l_hvc     varchar2(2000);
 l_new_hvc    varchar2(2000);
 l_old_hvc    varchar2(2000);
 l_part_name    varchar2(2000);
 l_new_part_name   varchar2(2000);
 l_table_name   varchar2(2000);
 l_hv     date;
 l_new_hv    date;
 l_sql     varchar2(2000);
 l_full     varchar2(32000);
 l_ts     varchar2(2000);
 l_errm     varchar2(2000);
 l_mail_body    varchar2(2000);
 l_out     varchar2(2000);
 l_recipients   varchar2(2000) :=
  'dba.ops@mycompany.com';
 errc     number(10);
 l_place     varchar2(2);
 l_exec_flag    varchar2(4);
 invalid_partition_unit exception;
begin
 execute immediate 'alter session set smtp_out_server = ''smtp.mycompany.com''';
 l_out :=
  rpad('Owner',10)||
  rpad('Table',30)||
  rpad('Part Unit',10)||
  rpad('Old HV',11)||
  rpad('New HV',11)||
  rpad('Exec',4)
  ;
 dbms_output.put_line(l_out);
 l_out :=
  rpad('-',9,'-')||' '||
  rpad('-',29,'-')||' '||
  rpad('-',9,'-')||' '||
  rpad('-',10,'-')||' '||
  rpad('-',10,'-')||' '||
  rpad('-',3,'-')
  ;
 dbms_output.put_line(l_out);
 open sched_cur;
 loop --{
  fetch sched_cur into l_sched_rec;
  exit when sched_cur%notfound;
  select high_value, tablespace_name
  into l_old_hvc, l_ts
  from dba_tab_partitions tp
  where partition_position = (
    select max(partition_position) from dba_tab_partitions
    where table_name = tp.table_name
    and table_owner = tp.table_owner
  )
  and table_name = l_sched_rec.table_name
  and table_owner = l_sched_rec.owner;
  if (l_old_hvc = 'MAXVALUE') then --{
   -- PMAX code. Split the partition
    -- first get the high value of the partition just prior to PMAX
   select high_value, tablespace_name
   into l_hvc, l_ts
   from dba_tab_partitions tp
   where partition_position = (
     select max(partition_position) - 1 from dba_tab_partitions
     where table_name = tp.table_name
     and table_owner = tp.table_owner
   )
   and table_name = l_sched_rec.table_name
   and table_owner = l_sched_rec.owner;
   execute immediate 'select '||l_hvc||' from dual' into l_hv;
   l_exec_flag := 'NO';
   case l_sched_rec.partition_unit  --{
    when 'DAILY' then
     l_new_hv := l_hv + l_sched_rec.num_parts_to_create;
     l_new_part_name := 'Y'||to_char(l_new_hv,'YY')||'M'||to_char(l_new_hv,'MM')||'D'||to_char(l_new_hv,'DD');
     l_exec_flag := 'YES';
    when 'WEEKLY' then
     l_new_hv := l_hv + (7*l_sched_rec.num_parts_to_create);
     l_new_part_name := 'Y'||to_char(l_new_hv,'YY')||'W'||to_char(l_new_hv,'WW');
     if (to_char(sysdate,'d') = '1') then
      l_exec_flag := 'YES';
     end if;
    when 'MONTHLY' then
     l_new_hv := add_months(l_hv,l_sched_rec.num_parts_to_create);
     if (to_char(sysdate,'dd') = '1') then
      l_exec_flag := 'YES';
     end if;
     l_new_part_name := 'Y'||to_char(l_new_hv,'YY')||'M'||to_char(l_new_hv,'MM');
    when 'QUARTERLY' then
     l_new_hv := add_months(l_hv,3*l_sched_rec.num_parts_to_create);
     l_new_part_name := 'Y'||to_char(l_new_hv,'YY')||'Q'||to_char(l_new_hv,'Q');
     if (to_char(sysdate,'mm/dd') in ('01/01','04/01','07/01','10/01')) then
      l_exec_flag := 'YES';
     end if;
    when 'ANNUALLY' then
     l_new_hv := add_months(l_hv,12*l_sched_rec.num_parts_to_create);
     l_new_part_name := 'Y'||to_char(l_new_hv,'YY');
     if to_char(sysdate,'mm/dd') = '01/01' then
      l_exec_flag := 'YES';
     end if;
    else
     raise invalid_partition_unit; 
   end case; --}
   l_new_hvc := to_char(l_new_hv,'mm/dd/yyyy');
   l_sql := 'alter table '||
    l_sched_rec.owner||'.'||
    l_sched_rec.table_name||
    ' split partition pmax '||
    ' at (to_date('''||l_new_hvc||
    ''',''mm/dd/yyyy''))'||
    ' into ( partition '||l_new_part_name||
    ' tablespace '||l_ts||
    ' initrans 11 storage (freelists 11 freelist groups 7)'||
    ' , partition pmax) update global indexes';
  else --}{
   -- non PMAX code. Add the partition
   l_hvc := l_old_hvc;
   execute immediate 'select '||l_hvc||' from dual' into l_hv;
   l_exec_flag := 'NO';
   case l_sched_rec.partition_unit  --{
    when 'DAILY' then
     l_new_hv := l_hv + l_sched_rec.num_parts_to_create;
     l_new_part_name := 'Y'||to_char(l_new_hv,'YY')||'M'||to_char(l_new_hv,'MM')||'D'||to_char(l_new_hv,'DD');
     l_exec_flag := 'YES';
    when 'WEEKLY' then
     l_new_hv := l_hv + (7*l_sched_rec.num_parts_to_create);
     l_new_part_name := 'Y'||to_char(l_new_hv,'YY')||'W'||to_char(l_new_hv,'WW');
     if (to_char(sysdate,'d') = '1') then
      l_exec_flag := 'YES';
     end if;
    when 'MONTHLY' then
     l_new_hv := add_months(l_hv,l_sched_rec.num_parts_to_create);
     if (to_char(sysdate,'dd') = '1') then
      l_exec_flag := 'YES';
     end if;
     l_new_part_name := 'Y'||to_char(l_new_hv,'YY')||'M'||to_char(l_new_hv,'MM');
    when 'QUARTERLY' then
     l_new_hv := add_months(l_hv,3*l_sched_rec.num_parts_to_create);
     l_new_part_name := 'Y'||to_char(l_new_hv,'YY')||'Q'||to_char(l_new_hv,'Q');
     if (to_char(sysdate,'mm/dd') in ('01/01','04/01','07/01','10/01')) then
      l_exec_flag := 'YES';
     end if;
    when 'ANNUALLY' then
     l_new_hv := add_months(l_hv,12*l_sched_rec.num_parts_to_create);
     l_new_part_name := 'Y'||to_char(l_new_hv,'YY');
     if to_char(sysdate,'mm/dd') = '01/01' then
      l_exec_flag := 'YES';
     end if;
    else
     raise invalid_partition_unit; 
   end case; --}
   l_new_hvc := to_char(l_new_hv,'mm/dd/yyyy');
   l_sql := 'alter table '||
    l_sched_rec.owner||'.'||
    l_sched_rec.table_name||
    ' add partition '||
    l_new_part_name||
    ' values less than '||
    ' (to_date('''||l_new_hvc||
    ''',''mm/dd/yyyy''))'||
    ' tablespace '||l_ts||
    ' initrans 11 storage (freelists 11 freelist groups 7)';
  end if; --}
  l_out :=
   rpad(l_sched_rec.owner,10)||
   rpad(l_sched_rec.table_name,30)||
   rpad(l_sched_rec.partition_unit,10)||
   rpad(to_char(l_hv,'mm/dd/yyyy'),11)||
   rpad(l_new_hvc,11)||
   rpad(l_exec_flag,4)
  ;
  dbms_output.put_line(l_out);
  if (l_exec_flag = 'YES') then --{
   dbms_output.put_line('l_new_part_name='||l_new_part_name);
   dbms_output.put_line('l_new_hvc='||l_new_hvc);
   l_mail_body := 'Statement Executed:'||l_sql;
         dbms_output.put_line('l_mail_body='||l_mail_body);
   begin
    execute immediate l_sql;
             l_mail_body := l_mail_body ||'...OK';
   exception
    when OTHERS then
     l_errm := SQLERRM;
     l_mail_body := l_mail_body||' ERROR='||l_errm;
     raise;
   end;
   -- table partition split. Now index.
   for ind_cur in (
    select index_owner, index_name, partition_name
    from dba_ind_partitions
    where (index_owner, index_name) in (
     select owner, index_name
     from dba_indexes
     where table_name = l_sched_rec.table_name
     and table_owner = l_sched_rec.owner
    )
   and status = 'UNUSABLE'
   ) loop --{
    dbms_output.put_line('Index Part='||ind_cur.index_owner||'.'||ind_cur.index_name||'.'||ind_cur.partition_name);
    l_sql := 'alter index '||
     ind_cur.index_owner||'.'||
     ind_cur.index_name||' rebuild partition '||
     ind_cur.partition_name||' nologging online';
    l_mail_body := l_mail_body||chr(12)||
     'Statement Executed:'||l_sql;
             dbms_output.put_line('l_mail_body='||l_mail_body);
    begin
     execute immediate l_sql;
        l_mail_body := l_mail_body||'...OK';
    exception
     when OTHERS then
         l_errm := SQLERRM;
         l_mail_body := l_mail_body||' ERROR='||l_errm;
      raise;
    end;
   end loop; --}
   -- index partitions made usable
  end if; --}
 end loop; --}
 close sched_cur;
 dbms_output.put_line(l_full);
 utl_mail.send(
  'oracle@prodb1',l_recipients,
  null, null,
  'Succeeded: PROPRD1 Partition Create',
  l_mail_body
 );
exception
 when OTHERS then
  l_errm := SQLERRM;
  utl_mail.send(
   'oracle@prodb1',l_recipients,
   null, null,
   'Failed: PROPRD1 Partition Create',
   l_mail_body
  );
  raise_application_error(-20001,l_errm);
end;
/
show error

The code is self-explanatory. Let's see some example outputs:

SQL> @ex
Owner     Table                         Part Unit Old HV     New HV     Exec
--------- ----------------------------- --------- ---------- ---------- ---
ARUP      PTEST1                        MONTHLY   02/01/2010 04/01/2010 NO
ARUP      PTEST2                        MONTHLY   02/01/2010 04/01/2010 NO
ARUP      PTEST3                        QUARTERLY 04/01/2010 10/01/2010 NO
ARUP      PTEST4                        DAILY     01/12/2010 01/14/2010 YES
l_new_part_name=Y10M01D14
l_new_hvc=01/14/2010
l_mail_body=Statement Executed:alter table ARUP.PTEST4 split partition pmax  at
(to_date('01/14/2010','mm/dd/yyyy')) into ( partition Y10M01D14 tablespace
MGMT_TABLESPACE initrans 11 storage (freelists 11 freelist groups 7) , partition
pmax) update global indexes
ARUP      PTEST5                        DAILY     01/04/2010 01/06/2010 YES
l_new_part_name=Y10M01D06
l_new_hvc=01/06/2010
l_mail_body=Statement Executed:alter table ARUP.PTEST5 add partition Y10M01D06
values less than  (to_date('01/06/2010','mm/dd/yyyy')) tablespace
MGMT_TABLESPACE initrans 11 storage (freelists 11 freelist groups 7)
ARUP      PTEST6                        DAILY     01/02/2010 01/04/2010 YES
l_new_part_name=Y10M01D04
l_new_hvc=01/04/2010
l_mail_body=Statement Executed:alter table ARUP.PTEST6 add partition Y10M01D04
values less than  (to_date('01/04/2010','mm/dd/yyyy')) tablespace
MGMT_TABLESPACE initrans 11 storage (freelists 11 freelist groups 7)
ARUP      PTEST7                        DAILY     01/02/2010 01/04/2010 YES
l_new_part_name=Y10M01D04
l_new_hvc=01/04/2010
l_mail_body=Statement Executed:alter table ARUP.PTEST7 add partition Y10M01D04
values less than  (to_date('01/04/2010','mm/dd/yyyy')) tablespace
MGMT_TABLESPACE initrans 11 storage (freelists 11 freelist groups 7)

The output is sent to the mail address recorded in the stored procedure, in case of both success and failure. Now all you have to do is put it in a Scheduler job and let it run every day. Populate the table PARTITION_CREATE_SCHEDULE as needed and partitions will be added to those tables automatically.
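A minimal sketch of such a Scheduler job is shown below; the job name and the schedule are only illustrative, so adjust them to your environment:

begin
   dbms_scheduler.create_job (
      job_name        => 'PARTITION_CREATE_JOB',   -- illustrative name
      job_type        => 'STORED_PROCEDURE',
      job_action      => 'SYS.PARTITION_CREATE',
      start_date      => systimestamp,
      repeat_interval => 'FREQ=DAILY; BYHOUR=1',   -- once a day, at 1 AM
      enabled         => TRUE,
      comments        => 'Adds partitions per PARTITION_CREATE_SCHEDULE'
   );
end;
/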

Hope you enjoy and find it useful. As always, I will appreciate your feedback.

Tuesday, November 09, 2010

Tool to Present a Consolidated View of the Status of RMAN Backups

A very important area of database administration is backups, which are essential to the availability of the databases. If you use RMAN to back up your databases (as you should be doing), how will you ensure that all the database backups have completed successfully? One way is to go into each server and check the backup log, a task that is not just tedious but often impractical considering the large number of databases we have. Is there an easier way, via a consolidated output for all databases?

The Tool

Yes, there is an easy way: by checking the catalog, assuming of course that you are using a catalog, as you should be. I developed this tool to extract the information from the catalog for each database. Since the repository may be different for each database being backed up, I used a view to union all the RC_* views from all the repositories in a common schema, which I named RMAN_COMMON. Then I wrote a PL/SQL program to pull the report presenting the information on all databases together. Since my objective was to have a consolidated view of all the backups, whether they succeeded or not, I selected from multiple views to provide an indented output in case of failures, to identify specifically which component failed. I will illustrate with the output shown below. The tool can be customized for the number of days it reports; here I have chosen the last 2 days.

SQL> conn rman_common/rman_common@catalogdb
Connected.
SQL> @status

********  RMAN Database Backup Report *********

DB Name  Start Time         End Time           Status   Time Tak Output S Type
-------- ------------------ ------------------ -------- -------- -------- -------
DEMO1    2010-11-08 00:25:1 2010-11-08 00:50:4 COMPLETE 00:25:27    17.99 DISK
DEMO1    2010-11-09 00:25:1 2010-11-09 00:50:3 COMPLETE 00:25:16    17.99 DISK
ECR      2010-11-07 23:12:0 2010-11-07 23:14:2 COMPLETE 00:02:13     5.55 DISK
ECR      2010-11-08 23:12:0 2010-11-08 23:14:3 COMPLETE 00:02:21     5.58 DISK
EMREP    2010-11-08 01:00:2 2010-11-08 01:02:5 COMPLETE 00:02:35   326.07 DISK
EMREP    2010-11-09 01:00:2 2010-11-09 01:02:5 COMPLETE 00:02:28   353.71 DISK
PROPRD   2010-11-07 23:00:1 2010-11-08 00:04:0 COMPLETE 01:03:49    50.48 DISK
PROPRD   2010-11-08 00:04:2 2010-11-08 11:47:1 COMPLETE 11:42:42    27.59
PROPRD   2010-11-08 10:35:3 2010-11-08 11:20:4 COMPLETE 00:45:12    30.00 DISK
PROPRD   2010-11-08 11:28:5 2010-11-08 12:21:2 COMPLETE 00:52:33    30.00 DISK
PROPRD   2010-11-08 12:23:5 2010-11-08 12:38:5 COMPLETE 00:15:00    10.00 DISK
PROPRD   2010-11-08 12:43:3 2010-11-08 12:43:4 COMPLETE 00:00:07   192.00 DISK
PROPRD   2010-11-08 12:46:1 2010-11-08 12:46:2 COMPLETE 00:00:07   224.00 DISK
PROPRD   2010-11-08 12:48:1 2010-11-08 13:14:0 COMPLETE 00:25:50    20.00 DISK
PROPRD   2010-11-08 13:37:3 2010-11-08 13:58:4 COMPLETE 00:21:11    15.00 DISK
PROPRD   2010-11-08 14:00:2 2010-11-08 14:13:5 COMPLETE 00:13:30    10.00 DISK
PROPRD   2010-11-08 14:29:0 2010-11-08 14:29:0 FAILED   00:00:01     0.00
. Operation            Input                Status
. -------------------- -------------------- --------------------
. .                    .                    FAILED
.  Level  Status   Operation            Object Type
.  ------ -------- -------------------- --------------------
.  >      COMPLETE RMAN                 .
.  ->     FAILED   BACKUP               DATAFILE INCR
PROPRD   2010-11-08 23:00:2 2010-11-09 00:21:0 COMPLETE 01:20:43    66.75 DISK
PROPRD   2010-11-09 00:21:2 2010-11-09 14:07:3 COMPLETE 13:46:09    40.25
LIGPRD1  2010-11-08 03:00:0 2010-11-08 03:00:4 COMPLETE 00:00:37     1.74 DISK
LIGPRD1  2010-11-08 03:00:5 2010-11-08 03:04:1 COMPLETE 00:03:12     1.49
LIGPRD1  2010-11-09 03:00:0 2010-11-09 03:00:5 COMPLETE 00:00:45     2.59 DISK
LIGPRD1  2010-11-09 03:01:0 2010-11-09 03:05:1 COMPLETE 00:04:05     1.37
LIGPRD2  2010-11-08 02:00:0 2010-11-08 02:00:3 COMPLETE 00:00:29     1.04 DISK
LIGPRD2  2010-11-08 02:00:4 2010-11-08 02:02:3 COMPLETE 00:01:52     1.34
LIGPRD2  2010-11-09 02:00:0 2010-11-09 02:00:4 COMPLETE 00:00:35     1.67 DISK
LIGPRD2  2010-11-09 02:00:5 2010-11-09 02:03:2 COMPLETE 00:02:26     1.38
SW1      2010-11-08 00:05:0 2010-11-08 00:06:1 COMPLETE 00:01:06   519.17 DISK
SW1      2010-11-09 00:05:0 2010-11-09 00:08:0 COMPLETE 00:03:04     2.01 DISK

PL/SQL procedure successfully completed.

Here you can see the databases in the catalog – PROPRD, LIGPRD1, LIGPRD2 and SW1. The columns – “Start Time”, “End Time” and “Time Taken” – are self-explanatory. The “Output Size” shows the size of the backupset produced. The “Status” column shows the status of the job – the key to this report. If it shows “COMPLETE”, then all was well in the job. If it shows “FAILED” then lines below show what actually failed. For instance you can see on 8th Nov, incremental backup of one datafile of PROPRD failed. That one definitely needs investigating.  You got all that important information in just one report. As you add all the databases into the same catalog, your reports will be more complete and expansive.

Construction

Now that you saw the result of the tool, let’s see the code behind it. First I created a user – RMAN_COMMON:

SQL> grant create session, unlimited tablespace, create view to rman_common identified by rman_common;

We need just three views from the repositories; so this user needs to be granted select privileges on those only. As SYS user, grant these:

grant select on rman_PROPRD.rc_rman_backup_job_details to rman_common;
grant select on rman_LIGPRD11.rc_rman_backup_job_details to rman_common;
grant select on rman_LIGPRD21.rc_rman_backup_job_details to rman_common;
grant select on rman_11g.rc_rman_backup_job_details to rman_common;
--
grant select on rman_PROPRD.rc_rman_backup_subjob_details to rman_common;
grant select on rman_LIGPRD11.rc_rman_backup_subjob_details to rman_common;
grant select on rman_LIGPRD21.rc_rman_backup_subjob_details to rman_common;
grant select on rman_11g.rc_rman_backup_subjob_details to rman_common;
--
grant select on rman_PROPRD.rc_rman_status to rman_common;
grant select on rman_LIGPRD11.rc_rman_status to rman_common;
grant select on rman_LIGPRD21.rc_rman_status to rman_common;
grant select on rman_11g.rc_rman_status to rman_common;

Of course, you will need to customize the above script to add more repositories as you add them.

Next, you will create the views by selecting from the corresponding views from individual repositories. Instead of creating a view with all the columns, I chose only a few columns. This will help us in combining the views from 10g and 11g where the columns could be different.


conn rman_common/rman_common
--
create view rc_rman_backup_job_details
as
select
        db_name,
        start_time,
        end_time,
        status,
        time_taken_display,
        output_bytes_display,
        output_device_type,
        session_key,
        session_recid,
        session_stamp
from rman_PROPRD.rc_rman_backup_job_details
union all
select
        db_name,
        start_time,
        end_time,
        status,
        time_taken_display,
        output_bytes_display,
        output_device_type,
        session_key,
        session_recid,
        session_stamp
from rman_LIGPRD11.rc_rman_backup_job_details
union all
select
        db_name,
        start_time,
        end_time,
        status,
        time_taken_display,
        output_bytes_display,
        output_device_type,
        session_key,
        session_recid,
        session_stamp
from rman_LIGPRD21.rc_rman_backup_job_details
union all
select
        db_name,
        start_time,
        end_time,
        status,
        time_taken_display,
        output_bytes_display,
        output_device_type,
        session_key,
        session_recid,
        session_stamp
from rman_11g.rc_rman_backup_job_details
/
create view rc_rman_backup_subjob_details
as
select
        operation,
        input_type,
        status,
        session_stamp
from rman_PROPRD.rc_rman_backup_subjob_details
union all
select
        operation,
        input_type,
        status,
        session_stamp
from rman_LIGPRD11.rc_rman_backup_subjob_details
union all
select
        operation,
        input_type,
        status,
        session_stamp
from rman_LIGPRD21.rc_rman_backup_subjob_details
union all
select
        operation,
        input_type,
        status,
        session_stamp
from rman_11g.rc_rman_backup_subjob_details
/
create view rc_rman_status
as
select
        row_level,
        operation,
        object_type,
        status,
        session_key,
        session_recid
from rman_PROPRD.rc_rman_status
union all
select
        row_level,
        operation,
        object_type,
        status,
        session_key,
        session_recid
from rman_LIGPRD11.rc_rman_status
union all
select
        row_level,
        operation,
        object_type,
        status,
        session_key,
        session_recid
from rman_LIGPRD21.rc_rman_status
union all
select
        row_level,
        operation,
        object_type,
        status,
        session_key,
        session_recid
from rman_11g.rc_rman_status
/

Now that the views are in place, you can create the actual tool. Here is the PL/SQL code:

/*---------------------------------------------------------------------------
Name            : RMAN Job Summary Report
Version         : 2.0
Released        : Nov 9th, 2010
Author          : Arup Nanda.
Description     : Creates a report of all backup jobs in a specified number
                  of days. If anything failed, it also shows the sub-job details.
                  Tested on 11g; but will work on 10gR2 as well.
                  Connect to RMAN repository and execute.
                  For number of days, change the l_days constant.
---------------------------------------------------------------------------*/
prompt
prompt ********  RMAN Database Backup Report *********
prompt
set serveroutput on size unlimited
declare
        l_days                  constant number := 2;
        l_place                 varchar2(400);
        l_db_name               rc_rman_backup_job_details.db_name%type;
        l_start_time            varchar2(40);
        l_end_time              varchar2(40);
        l_status                rc_rman_backup_job_details.status%type;
        l_time_taken_display    rc_rman_backup_job_details.time_taken_display%type;
        l_output_bytes_display  rc_rman_backup_job_details.output_bytes_display%type;
        l_output_device_type    rc_rman_backup_job_details.output_device_type%type;
        l_session_key           rc_rman_backup_job_details.session_key%type;
        l_session_recid         rc_rman_backup_job_details.session_recid%type;
        l_session_stamp         rc_rman_backup_job_details.session_stamp%type;

        l_operation             rc_rman_backup_subjob_details.operation%type;
        l_input_type            rc_rman_backup_subjob_details.input_type%type;

        l_command_level         varchar2(9);
        l_object_type           rc_rman_status.object_type%type;
        cursor bjd_cur
        (
                p_days in number
        )
        is
        select
                bjd.db_name,
                to_char(bjd.start_time, 'yyyy-mm-dd hh24:mi:ss'),
                to_char(bjd.end_time, 'yyyy-mm-dd hh24:mi:ss'),
                bjd.status,
                bjd.time_taken_display,
                bjd.output_bytes_display,
                bjd.output_device_type,
                bjd.session_key,
                bjd.session_recid,
                bjd.session_stamp
        from rc_rman_backup_job_details bjd
        where end_time > sysdate - p_days
        order by  bjd.db_name, bjd.start_time;

begin
        l_place := 'Place 100';
        dbms_output.put_line(
                rpad('DB Name',8)
                ||' '||
                rpad('Start Time',18)
                ||' '||
                rpad('End Time',18)
                ||' '||
                rpad('Status',8)
                ||' '||
                rpad('Time Taken',8)
                ||' '||
                rpad('Output Size',8)
                ||' '||
                rpad('Type',8)
        );
        dbms_output.put_line(
                rpad('-',8,'-')
                ||' '||
                rpad('-',18,'-')
                ||' '||
                rpad('-',18,'-')
                ||' '||
                rpad('-',8,'-')
                ||' '||
                rpad('-',8,'-')
                ||' '||
                rpad('-',8,'-')
                ||' '||
                rpad('-',8,'-')
        );
        open bjd_cur (l_days);
        loop
                fetch bjd_cur
                into
                        l_db_name,
                        l_start_time,
                        l_end_time,
                        l_status,
                        l_time_taken_display,
                        l_output_bytes_display,
                        l_output_device_type,
                        l_session_key,
                        l_session_recid,
                        l_session_stamp
                ;
                exit when bjd_cur%notfound;
                dbms_output.put_line(
                        rpad(l_db_name ,8)
                        ||' '||
                        rpad(l_start_time ,18)
                        ||' '||
                        rpad(l_end_time ,18)
                        ||' '||
                        rpad(l_status ,8)
                        ||' '||
                        rpad(l_time_taken_display ,8)
                        ||' '||
                        rpad(l_output_bytes_display ,8)
                        ||' '||
                        rpad(l_output_device_type,8)
                );
                --
                --
                l_place := 'Place 300';
                if (l_status != 'COMPLETED') then
                        for bsjd_rec in (
                        select
                                operation,
                                input_type,
                                status
                        from rc_rman_backup_subjob_details
                        where session_stamp = l_session_stamp
                        ) loop
                        l_place := 'Place 400';
                        dbms_output.put_line(
                                '.'
                                ||' '||
                                rpad('Operation',20)
                                ||' '||
                                rpad('Input',20)
                                ||' '||
                                rpad('Status',20)
                        );
                        dbms_output.put_line(
                                '.'
                                ||' '||
                                rpad('-',20,'-')
                                ||' '||
                                rpad('-',20,'-')
                                ||' '||
                                rpad('-',20,'-')
                        );
                        dbms_output.put_line(
                                '.'
                                ||' '||
                                rpad(nvl(bsjd_rec.operation,'.') ,20)
                                ||' '||
                                rpad(nvl(bsjd_rec.input_type,'.') ,20)
                                ||' '||
                                rpad(nvl(bsjd_rec.status,'.') ,20)
                        );
                        end loop;
                        --
                        l_place := 'Place 500';
                        dbms_output.put_line(
                                '.  '||
                                rpad('Level' ,6)
                                ||' '||
                                rpad('Status' ,8)
                                ||' '||
                                rpad('Operation' ,20)
                                ||' '||
                                rpad('Object Type' ,20)
                        );
                        dbms_output.put_line(
                                '.  '||
                                rpad('-' ,6,'-')
                                ||' '||
                                rpad('-' ,8,'-')
                                ||' '||
                                rpad('-' ,20,'-')
                                ||' '||
                                rpad('-' ,20,'-')
                        );
                        for status_rec in (
                                select
                                        rpad('-', row_level, '-')||'>' command_level,
                                        operation,
                                        object_type,
                                        status
                                from rc_rman_status
                                where session_key = l_session_key
                                order by row_level, session_recid
                        ) loop
                                l_place := 'Place 600';
                                dbms_output.put_line(
                                        '.  '||
                                        rpad(nvl(status_rec.command_level,'.') ,6)
                                        ||' '||
                                        rpad(nvl(status_rec.status,'.') ,8)
                                        ||' '||
                                        rpad(nvl(status_rec.operation,'.') ,20)
                                        ||' '||
                                        rpad(nvl(status_rec.object_type,'.') ,20)
                                );
                        end loop;
                end if;
                end loop;
exception
        when OTHERS then
                dbms_output.put_line(l_place);
                raise;
end;
/

This is the PL/SQL code in the file status.sql. You can adjust the value of the constant l_days to generate the report for as many days as you want. When you run the script it shows the status of the RMAN jobs in the last 2 days.
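If you want to keep a copy of the report, for example to mail it to the team, a simple sketch is to spool it from SQL*Plus; the file name below is just an example:

SQL> REM the spool file name is only an example
SQL> spool rman_backup_report.txt
SQL> @status
SQL> spool off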

Hope you find this useful. As always, any feedback will be highly appreciated.

Sunday, November 07, 2010

Automatic Range Partition Dropping Tool

You have probably used partitions on your tables, especially period-based range partitions on date columns. In some applications older partitions become useless after a specified period of time and have to be dropped. But you have to drop them manually; there is no automatic drop utility. Oracle offers interval partitioning for automatic creation of range partitions, but there is no counterpart for the dropping operation. It can be a chore, especially when the partitions are defined on different time intervals on different tables.

I have developed a tool to make this process automatic. The concept of the tool is simple: I use a driver table that stores the preferences, such as the partitioning time intervals of the individual tables I am interested in and how often they should be purged. The actual tool is a stored procedure that reads this driver table, determines whether the oldest partition satisfies the purge condition and drops it if that is the case. You can put this procedure in a Scheduler job and run it every day. It will identify and drop the relevant partitions automatically every day. It will also update global indexes.

Disclaimer: this is being provided for educational purposes only; use at your own risk. The author assumes absolutely no responsibility for any issues caused by the use of this tool.

First, I will need the driver table:

/* ---------------------------------------------------------------------
Name : partition_drop_schedule
Purpose : To store the retention period of partitions of 
        : individual tables. This data is used by partition_drop
        : procedure to automatically drop partitions.
----------------------------------------------------------------------*/
create table partition_drop_schedule
(
   OWNER VARCHAR2(30),
   TABLE_NAME VARCHAR2(30),
   PART_SCHEME VARCHAR2(10),
   RETENTION_DURATION NUMBER(5),
   RETENTION_UNIT VARCHAR2(10)
) 
/
comment on table partition_drop_schedule is 'To store the retention period of partitions of individual tables. This data is used by partition_drop procedure to automatically drop partitions.'
/
comment on column partition_drop_schedule.owner is 'the owner of the table'
/
comment on column partition_drop_schedule.table_name is 'the name of the table'
/
comment on column partition_drop_schedule.part_scheme is 'the partitioning time scheme (DAILY/WEEKLY/MONTHLY/QUARTERLY/HALFYEARLY/YEARLY) of the table'
/
comment on column partition_drop_schedule.RETENTION_DURATION is 'the duration of retention of the partition in number'
/
comment on column partition_drop_schedule.RETENTION_UNIT is 'the unit of duration of retention (DAY/WEEK/MONTH/QUARTER/HALFYEAR/YEAR) of the table'
/ 

Here are some sample inserts into the table:

insert into partition_drop_schedule values
(
    'ARUP','TEST','MONTHLY',12,'MONTH'
)
/
commit
/
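
To illustrate a second scheme, here is another hypothetical entry (the table name DAILY_SALES is only an example) for a daily-partitioned table whose partitions are kept for 90 days:

insert into partition_drop_schedule values
(
    'ARUP','DAILY_SALES','DAILY',90,'DAY'
)
/
commit
/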

Next comes the stored procedure:

CREATE OR REPLACE procedure SYS.partition_drop
as
 cursor sched_cur is
  select * 
  from partition_drop_schedule
  ;
 l_sched_rec    sched_cur%rowtype;
 l_cutoff_date   date;
 l_hv     date;
 l_sql     varchar2(2000);
 l_dis_cons_sql   varchar2(2000);
 l_ena_cons_sql   varchar2(2000);
 l_drop_sql    varchar2(2000);
 errm     varchar2(2000);
 l_out     varchar2(2000);
 l_full     varchar2(32000);
 l_recipients   varchar2(2000) :=
  'dba.ops@myowncompany.com';
 errc     number(10);
 l_place     varchar2(2);
 l_cons_disabled   boolean := FALSE;
 type l_varchar2_30_ty is table of varchar2(30);
 l_child_owner    l_varchar2_30_ty;
 l_child_table_name  l_varchar2_30_ty;
 l_constraint_name  l_varchar2_30_ty;
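 -- find enabled foreign key constraints on child tables that reference the
 -- primary key of the table whose partition is being dropped; these are
 -- disabled before the drop and re-enabled with NOVALIDATE afterwards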
 CURSOR cons_cur (
  p_owner in varchar2,
  p_table_name in varchar2
 ) is
  select owner, table_name, constraint_name
  from dba_constraints
  where constraint_type = 'R'
  and status = 'ENABLED'
  and r_constraint_name = (
   select constraint_name
   from dba_constraints
   where owner = p_owner
   and table_name = p_table_name
   and constraint_type = 'P'
  );
begin
 -- if the email server is not set already, set it here
      execute immediate 'alter session set smtp_out_server = ''smtp.myowncompany.com''';
 l_out :=
  rpad('Owner',10)||
  rpad('Table',30)||
  rpad('Retention',10)||
  rpad('Cutoff Dt',10)
  ;
 dbms_output.put_line(l_out);
 l_full := l_out;
 l_out :=
  rpad('-',9,'-')||' '||
  rpad('-',29,'-')||' '||
  rpad('-',9,'-')||' '||
  rpad('-',9,'-')
  ;
 l_full := l_full||chr(12)||l_out;
 dbms_output.put_line(l_out);
 open sched_cur;
 loop
  fetch sched_cur into l_sched_rec;
  exit when sched_cur%notfound;
  l_out := rpad(l_sched_rec.owner,10);
  l_out := l_out||rpad(l_sched_rec.table_name,30);
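  -- compute the cutoff date from the retention settings; partitions whose
  -- high value falls before this date are candidates for dropping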
  l_cutoff_date :=
   case l_sched_rec.PART_SCHEME
    when 'MONTHLY' then
     case l_sched_rec.retention_unit
      when 'MONTH' then
       add_months(sysdate,-1*l_sched_rec.retention_duration)
      else
       null
      end
    when 'DAILY' then
     case l_sched_rec.retention_unit
      when 'DAY' then
       sysdate-l_sched_rec.retention_duration
      else
       null
      end
    when 'HALFYEARLY' then
     case l_sched_rec.retention_unit
      when 'HALFYEAR' then
       add_months(sysdate,-6*l_sched_rec.retention_duration)
      when 'MONTH' then
       add_months(sysdate,-1*l_sched_rec.retention_duration)
      else
       null
      end
    when 'YEARLY' then
     null
    when 'WEEKLY' then
     null
    when 'QUARTERLY' then
     case l_sched_rec.retention_unit
      when 'QUARTER' then
       add_months(sysdate,-3*l_sched_rec.retention_duration)
      when 'MONTH' then
       add_months(sysdate,-1*l_sched_rec.retention_duration)
      else
       null
      end
   end;
  l_out := l_out||rpad(l_sched_rec.retention_duration||' '||l_sched_rec.retention_unit,10);
  l_out := l_out||rpad(l_cutoff_date,10);
  dbms_output.put_line(l_out);
  l_full := l_full||chr(12)||l_out;
  for part_cur in
  (
   select partition_name, high_value
   from dba_tab_partitions
   where table_owner = l_sched_rec.owner
   and table_name = l_sched_rec.table_name
   order by partition_position
  ) loop
   -- dbms_output.put_line('l_sched_rec.owner='||l_sched_rec.owner);
   -- dbms_output.put_line('l_sched_rec.table_name='||l_sched_rec.table_name);
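    -- high_value is stored as text (e.g. a TO_DATE expression); skip the
    -- MAXVALUE partition and evaluate the rest into a DATE via dynamic SQL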
   if part_cur.high_value != 'MAXVALUE' then
    execute immediate 'select '||part_cur.high_value||
     ' from dual' into l_hv;
    if l_cutoff_date > l_hv then
     l_out := 'Partition '||
      part_cur.partition_name||
      ' with high value '||
      l_hv||
      ' to be dropped ...';
     dbms_output.put_line(l_out);
     l_full := l_full||chr(12)||l_out;
     l_drop_sql := 'alter table '||l_sched_rec.owner||'.'||
     l_sched_rec.table_name||
     ' drop partition '||part_cur.partition_name||
     ' update global indexes';
     dbms_output.put_line('l_drop_sql='||l_drop_sql);
     begin
      --
      -- Disable the FKs 
      --
      l_cons_disabled := FALSE;
      open cons_cur (l_sched_rec.owner, l_sched_rec.table_name);
      fetch cons_cur
      bulk collect
      into l_child_owner, l_child_table_name, l_constraint_name;
      close cons_cur;
      if nvl(l_child_owner.COUNT,0) > 0 then
       l_cons_disabled := TRUE;
       for ctr in l_child_owner.FIRST .. l_child_owner.LAST loop
        dbms_output.put_line('l_child_owner='||l_child_owner(ctr));
        dbms_output.put_line('l_child_table_name='||l_child_table_name(ctr));
        dbms_output.put_line('l_constraint_name='||l_constraint_name(ctr));
        l_dis_cons_sql := 'alter table '||l_child_owner(ctr)||'.'||
         l_child_table_name(ctr)||' disable constraint '||
         l_constraint_name(ctr);
        dbms_output.put_line('l_dis_cons_sql='||l_dis_cons_sql);
        l_sql := l_dis_cons_sql;
        execute immediate l_sql;
       end loop;
      end if;
      l_sql := l_drop_sql;
      execute immediate l_sql;
      --
      -- Enable the FKs
      --
      if (l_cons_disabled) then
       for ctr in l_child_owner.FIRST .. l_child_owner.LAST loop
        l_dis_cons_sql := 'alter table '||l_child_owner(ctr)||'.'||
         l_child_table_name(ctr)||' enable novalidate constraint '||
         l_constraint_name(ctr);
        dbms_output.put_line('l_dis_cons_sql='||l_dis_cons_sql);
        l_sql := l_dis_cons_sql;
        execute immediate l_sql;
       end loop;
      end if;
      l_out := l_out||'DONE';
      dbms_output.put_line(l_sql);
      dbms_output.put_line(l_out);
      l_full := l_full||'DONE';
      l_full := l_full||chr(12)||l_sql;
     exception
      when others then
       errm := SQLERRM;
       errc := SQLCODE;
       l_out := l_out||'FAILED';
       dbms_output.put_line(l_sql);
       dbms_output.put_line('Place:'||l_place);
       l_full := l_full||'FAILED';
       l_full := l_full||chr(12)||chr(12)||l_sql;
     end;
    end if;
   end if;
  end loop;
 end loop;
 close sched_cur;
 dbms_output.put_line(l_full);
 utl_mail.send(
  'oracle@prodb1',l_recipients,
  null, null,
  'Succeeded: PROPRD Partition Drop',
  l_full
 );
exception
 when OTHERS then
  errm := SQLERRM;
  utl_mail.send(
   'oracle@prodb1',l_recipients,
   null, null,
   'Failed: PROPRD Partition Drop',
   l_full
  );
  raise_application_error(-20001,errm);
end;
/
show error

Now that the procedure is complete, you should do a quick test to make sure it works as expected. Update the record in the table PARTITION_DROP_SCHEDULE to reflect a 1 month retention and execute the procedure. It should drop the qualifying partition and maintain the global indexes. You can confirm the actions by:

(1) checking DBA_TAB_PARTITIONS to see that the partition is indeed gone
(2) checking DBA_INDEXES to see that the global indexes are still in USABLE state (both checks are sketched right after this list)
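
Assuming the owner ARUP and the table TEST from the sample insert above (adjust the names for your own table), the checks might look like this:

-- (1) the dropped partition should no longer appear
select partition_name, high_value
from dba_tab_partitions
where table_owner = 'ARUP'
and table_name = 'TEST'
order by partition_position
/
-- (2) the global indexes should still be USABLE
select index_name, status
from dba_indexes
where table_owner = 'ARUP'
and table_name = 'TEST'
/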

If all looks well, put the procedure in a DBMS_SCHEDULER job and set it to run every day at a time considered off-peak, e.g. 3 AM.
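
One possible way to create such a job is sketched below; the job name and the 3 AM schedule are only examples, so adjust them to your environment:

begin
   dbms_scheduler.create_job (
      job_name        => 'PARTITION_DROP_JOB',
      job_type        => 'STORED_PROCEDURE',
      job_action      => 'SYS.PARTITION_DROP',
      start_date      => systimestamp,
      repeat_interval => 'FREQ=DAILY; BYHOUR=3; BYMINUTE=0',
      enabled         => TRUE,
      comments        => 'Drops expired partitions listed in partition_drop_schedule'
   );
end;
/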


Important Considerations

  1. This is not intended for mass partition drops. For instance, if you are planning to drop 100 partitions of a table, do not use it. This tool updates the global indexes for each partition drop - a process highly inefficient for large numbers of partitions. You are much better off making the global indexes unusable, dropping all the partitions and then rebuilding the global indexes. That will be much faster (a sketch of that approach follows this list).
  2. At the time of the drop, the session will get an exclusive lock on the table for a fraction of a second. So you will see a slight performance degradation for that instant only.
  3. Never use this tool without adequate testing. A partition drop is a DDL operation - an irreversible one.
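
For reference, the mass-drop alternative mentioned in point 1 might look roughly like the following; the index, table and partition names are purely illustrative:

-- mark the global index unusable instead of maintaining it on every drop
alter index arup.test_gidx unusable
/
-- drop the obsolete partitions without the UPDATE GLOBAL INDEXES clause
alter table arup.test drop partition p_2009_01
/
alter table arup.test drop partition p_2009_02
/
-- (repeat for the remaining obsolete partitions)
-- rebuild the global index once at the end
alter index arup.test_gidx rebuild
/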

As always, I would appreciate comments and suggestions from you on this and any other blog entries. Thanks for reading and happy tooling.

Friday, October 22, 2010

Keynote Presentation at NOUG Oct 2010

Thank you to all who came to watch my keynote presentation - "Real World DBA Best Practices" - for the Northeast Oracle User Group in Boston on October 21st, 2010.

You can download the presentation here. Before downloading, please let me reiterate what I mentioned during the meeting - a best practice is not one if you do not understand the reason behind it and its applicability to your specific situation. I was not selling a product or service, nor was I asking you to follow these practices blindly. All I wanted was for you to consider the points.

Monday, October 04, 2010

The Business of Religion

Most of you who read my blog will probably be shocked to see a topic like this. I work with Oracle technologies and my blog posts have been predominantly about that (well, actually always about that). But I am also a human being living in society, and other thoughts do cross my mind. So here is something very, very different. This is not about Oracle, or IT, or even work in general. Please stop reading if you are not interested.

One of the activities I perform outside of my work is mentoring. These are not just rookies trying to move up in their careers, but accomplished professionals (some more accomplished than me) trying to get some sense of things from a different perspective, and I am willing to help. One such mentee (I will not name him, to protect his privacy) mentioned something about religion that got me thinking. It was about a debate on religion, the threats to Americans travelling internationally and some other assorted topics. As a rule, I do not engage in debates about religion. Don't get me wrong, I love to debate passionately on topics I find interesting or important, but with some exceptions - religion being one of them.

It's rather sad to see how some reasonably intelligent or outwardly sound people justify the killing of innocent people as a manifestation of their religious obligations. One could argue that the act by itself is not religious or even spiritual; it's fanatical, bordering on lunacy. But from the perspective of the perpetrators, that's what it is - sanctioned religious belief. Religion is a very dangerous double-edged sword. On one hand it brings discipline, moral values and, for lack of a better term, "humanity" to humans. Consider this - why would you not steal from thy neighbor? The act would be profitable and immensely efficient (a lot of gain from a small effort); but you won't, because it's against established morality. A belief system - whether in the form of religion or otherwise - pours the very foundation of morality, a fact most of us probably do not realize. It differentiates us from other animals.

But the root of the problem is that there is no boundary. Where should the belief system stop? Religion is, after all, a set of rituals and behavioral standards formed by human beings (although in some cases presumed or claimed to be vetted by supreme beings). Its specific actions are defined by the leaders, like those of any other group, but with an important difference. Belief systems adapt to situations - making the process of forming the set of acceptable religious activities highly fluid and devoid of specific direction. Some adapt it to their own interpretation, so much so that they actually believe in that interpretation. Contrary to popular belief, these people may not be illiterate goat-herding nomads; they may be at the top echelons of established societies, with a firm belief in their own direction of religious activity.

And that makes the business of religion highly dangerous. While on one hand we cherish the extremely positive role religion plays in forming societies, we also witness the catastrophic consequences of the abuse of religion. That's why I avoid debating religion in any form - there is a fine line between being spiritual and being fanatical, with each side deciding where the line should be. Such debates have at least one constant attribute - logic and reason take a backseat to emotion and sentiment.
