The Arup Nanda Blog

Tuesday, December 29, 2009

Instance_Number is busy Message during Standby Instance Startup

Recently I encountered a situation that demonstrates how, in an Oracle database, an error may stem from a very unrelated cause. A DBA was building a physical standby database for an upcominmg training. The two servers he was using were part of a RAC cluster; so the Oracle binaries were already there. He decided to use the same ORACLE_HOME for the new databases as well - a quite logical decision to savbe on space and administration issues. He created a primary database on server n1 and a standby database on the server n2. Follooing the usual manual procedure in building the standby database, he copied the pfile from primary database, modified the parameters and brought up the standby instance in nomount mode on the server n2.

SQL> startup nomount pfile=initSTBY.ora

But it refused to come up, with the following error:

ORA-00304: requested INSTANCE_NUMBER is busy

Alert log showed:

USER (ospid: 14210): terminating the instance due to error 304
Instance terminated by USER, pid = 14210

This was highly unusual. The primary and standby both were non-RAC; there was no instance number concept in a non-RAC database. By the way, the RAC instance on that server (or on the server n1) was not running; so there was no question of any conflict with the RAC instances either. The primary database was called PRIM while the standby was called STBY - eliminating the possibility of an instance name clash as well. And this error came while merely trying to start the instance, not even while mounting - eliminating the standby controlfile as a cause.

The error 304 showed:

00304, 00000, "requested INSTANCE_NUMBER is busy"
// *Cause: An instance tried to start by using a value of the
// initialization parameter INSTANCE_NUMBER that is already in use.
// *Action: Either
// a) specify another INSTANCE_NUMBER,
// b) shut down the running instance with this number
// c) wait for instance recovery to complete on the instance with
// this number.

Needless to say, being for a non-RAC database there was no "instance_number" parameter in the initialization parameter file of primary or the standby. So, the suggestions for the resolution seemed odd. MetaLink provided no help. All the ORA-304 errors were related to RAC with the instance_number mismatch.

As it always happens, it fell on my lap at this time. With just days to go live, I had to find a solution quickly. Long hours of troubleshooting, tracing the processes and examination of occasional trace files did not yield any clue. All the clue seemed to point to RAC, which this database was not. The Oracle Home was a RAC home, which meant the oracle binary was linked with the "rac" option.

So, the next logical step was to install a new Oracle Home without the rac option. After doing so, I tried to bring up the instance, using the new ORACLE_HOME and LD_LIBRARY_PATH variable; but, alas, the same error popped up.

Just to eliminate the possibility of some unknown bug, I decided to put an instance_number parameter, setting it to "1", from the default "0". The same error. I changed it to "2", again, the result was the same error.

Although this didn't help, at least it gave a clue that the error was not related to instance_number. The error message was clearly wrong. With this in mind, I went back to the basics. I went through the alert log with a fine toothed comb, scanning and analyzing each line.

The following line drew my attention:

DB_UNQIUE_NAME STBY is not in the Data Guard configuration

This was odd; the db_unique_name STBY was not defined in DG configuration. [BTW, note the spelling of "unique" in the message above. That is not what I typed; it was a copy and paste from the actual message in the alert log. Someone in Oracle Development should really pay atytention to typos in messages. This is clearly more than a nuisance; what if some process scans for db_unique_name related errors? It will not find the message at all!]

Checking the dg configuration, I found that the DBA has correctly defined the primary and standby names. In any case, Data Guard has not been started yet; this is merely at the instance startup - why is it complaining for data guard configuration at this time.

Perplexed, I resorted to a different approach. I renamed the pfile and all other relevant files. Then I built the standby myself, from scratch - using the same names - PRIM and STBY. And this time, everything worked fine. The instance STBY did come up.

While this solvbed the urgency problem, everyone, inclduing myself, wanted to know what the issue was in the earlier case where the DBA had failed to bring up the instance. To get the answer, I compared the files I created with the DBA created when tried and failed. Voila! The cause was immediately clear - the DBA forgot to put a vital parameter in the pfile of the standby instance:

db_unique_name = 'STBY'

This parameter was absent; so it took the default value as the db_name, which was "PRIM". This caused the instance to fail with a seemingly unrelated message - "ORA-304 Instance_number is busy"!

Learning Points

  1. In Oracle, most errors are obvious; but some are not. So, do not assume the error message is accurate. If all logic fails, assume the error messsage is wrong, or at least inaccurate.
  2. Instead of over-analyzing the process already followed, it may make sense to take a breather, wipe out everything and start fropm scratch. This is evben mor effective when someone else does it, offering a fresh approach and possibly not repeating the same mistakes.
  3. Finally, the issue at hand: if you do not define db_unique_name parameter in the standnby instance, you will receive ORA-304 during instance startup.

Hope this was helpful. Happy New Year, everybody.

Labels: , , , , , ,

Friday, December 18, 2009

Resolving Gaps in Data Guard Apply Using Incremental RMAN BAckup

Recently, we had a glitch on a Data Guard (physical standby database) on infrastructure. This is not a critical database; so the monitoring was relatively lax. And that being done by an outsourcer does not help it either. In any case, the laxness resulted in a failure remaining undetected for quite some time and it was eventually discovered only when the customer complained. This standby database is usually opened for read only access from time to time.This time, however, the customer saw that the data was significantly out of sync with primary and raised a red flag. Unfortunately, at this time it had become a rather political issue.

Since the DBA in charge couldn’t resolve the problem, I was called in. In this post, I will describe the issue and how it was resolved. In summary, there are two parts of the problem:

(1) What happened
(2) How to fix it

What Happened

Let’s look at the first question – what caused the standby to lag behind. First, I looked for the current SCN numbers of the primary and standby databases. On the primary:

SQL> select current_scn from v$database;

CURRENT_SCN
-----------
1447102

On the standby:

SQL> select current_scn from v$database;

CURRENT_SCN
-----------
1301571

Clearly there is a difference. But this by itself does not indicate a problem; since the standby is expected to lag behind the primary (this is an asynchronous non-real time apply setup). The real question is how much it is lagging in the terms of wall clock. To know that I used the scn_to_timestamp function to translate the SCN to a timestamp:

SQL> select scn_to_timestamp(1447102) from dual;

SCN_TO_TIMESTAMP(1447102)
-------------------------------
18-DEC-09 08.54.28.000000000 AM

I ran the same query to know the timestamp associated with the SCN of the standby database as well (note, I ran it on the primary database, though; since it will fail in the standby in a mounted mode):

SQL> select scn_to_timestamp(1301571) from dual;

SCN_TO_TIMESTAMP(1301571)
-------------------------------
15-DEC-09 07.19.27.000000000 PM

This shows that the standby is two and half days lagging! The data at this point is not just stale; it must be rotten.

The next question is why it would be lagging so far back in the past. This is a 10.2 database where FAL server should automatically resolved any gaps in archived logs. Something must have happened that caused the FAL (fetch archived log) process to fail. To get that answer, first, I checked the alert log of the standby instance. I found these lines that showed the issue clearly:


Fri Dec 18 06:12:26 2009
Waiting for all non-current ORLs to be archived...
Media Recovery Waiting for thread 1 sequence 700
Fetching gap sequence in thread 1, gap sequence 700-700

Fri Dec 18 06:13:27 2009
FAL[client]: Failed to request gap sequence
GAP - thread 1 sequence 700-700
DBID 846390698 branch 697108460
FAL[client]: All defined FAL servers have been attempted.


Going back in the alert log, I found these lines:

Tue Dec 15 17:16:15 2009
Fetching gap sequence in thread 1, gap sequence 700-700
Error 12514 received logging on to the standby
FAL[client, MRP0]: Error 12514 connecting to DEL1 for fetching gap sequence
Tue Dec 15 17:16:15 2009
Errors in file /opt/oracle/admin/DEL2/bdump/del2_mrp0_18308.trc:
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
Tue Dec 15 17:16:45 2009
Error 12514 received logging on to the standby
FAL[client, MRP0]: Error 12514 connecting to DEL1 for fetching gap sequence
Tue Dec 15 17:16:45 2009
Errors in file /opt/oracle/admin/DEL2/bdump/del2_mrp0_18308.trc:
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor

This clearly showed the issue. On December 15th at 17:16:15, the Managed Recovery Process encountered an error while receiving the log information from the primary. The error was ORA-12514 “TNS:listener does not currently know of service requested in connect descriptor”. This is usually the case when the TNS connect string is incorrectly specified. The primary is called DEL1 and there is a connect string called DEL1 in the standby server.

The connect string works well. Actually, right now there is no issue with the standby getting the archived logs; so there connect string is fine - now. The standby is receiving log information from the primary. There must have been some temporary hiccups causing that specific archived log not to travel to the standby. If that log was somehow skipped (could be an intermittent problem), then it should have been picked by the FAL process later on; but that never happened. Since the sequence# 700 was not applied, none of the logs received later – 701, 702 and so on – were applied either. This has caused the standby to lag behind since that time.

So, the fundamental question was why FAL did not fetch the archived log sequence# 700 from the primary. To get to that, I looked into the alert log of the primary instance. The following lines were of interest:


Tue Dec 15 19:19:58 2009
Thread 1 advanced to log sequence 701 (LGWR switch)
Current log# 2 seq# 701 mem# 0: /u01/oradata/DEL1/onlinelog/o1_mf_2_5bhbkg92_.log
Tue Dec 15 19:20:29 2009Errors in file /opt/oracle/product/10gR2/db1/admin/DEL1/bdump/del1_arc1_14469.trc:
ORA-00308: cannot open archived log '/u01/oraback/1_700_697108460.dbf'
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory
Additional information: 3
Tue Dec 15 19:20:29 2009
FAL[server, ARC1]: FAL archive failed, see trace file.
Tue Dec 15 19:20:29 2009
Errors in file /opt/oracle/product/10gR2/db1/admin/DEL1/bdump/del1_arc1_14469.trc:
ORA-16055: FAL request rejected
ARCH: FAL archive failed.
Archiver continuing
Tue Dec 15 19:20:29 2009
ORACLE Instance DEL1 - Archival Error. Archiver continuing.


These lines showed everything clearly. The issue was:

ORA-00308: cannot open archived log '/u01/oraback/1_700_697108460.dbf'
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory


The archived log simply was not available. The process could not see the file and couldn’t get it across to the standby site.

Upon further investigation I found that the DBA actually removed the archived logs to make some room in the filesystem without realizing that his action has removed the most current one which was yet to be transmitted to the remote site. The mystery surrounding why the FAL did not get that log was finally cleared.

Solution

Now that I know the cause, the focus was now on the resolution. If the archived log sequence# 700 was available on the primary, I could have easily copied it over to the standby, registered the log file and let the managed recovery process pick it up. But unfortunately, the file was gone and I couldn’t just recreate the file. Until that logfile was applied, the recovery will not move forward. So, what are my options?

One option is of course to recreate the standby - possible one but not technically feasible considering the time required. The other option is to apply the incremental backup of primary from that SCN number. That’s the key – the backup must be from a specific SCN number. I have described the process since it is not very obvious. The following shows the step by step approach for resolving this problem. I have shown where the actions must be performed – [Standby] or [Primary].

1. [Standby] Stop the managed standby apply process:

SQL> alter database recover managed standby database cancel;

Database altered.

2. [Standby] Shutdown the standby database

3. [Primary] On the primary, take an incremental backup from the SCN number where the standby has been stuck:

RMAN> run {
2> allocate channel c1 type disk format '/u01/oraback/%U.rmb';
3> backup incremental from scn 1301571 database;
4> }

using target database control file instead of recovery catalog
allocated channel: c1
channel c1: sid=139 devtype=DISK

Starting backup at 18-DEC-09
channel c1: starting full datafile backupset
channel c1: specifying datafile(s) in backupset
input datafile fno=00001 name=/u01/oradata/DEL1/datafile/o1_mf_system_5bhbh59c_.dbf

piece handle=/u01/oraback/06l16u1q_1_1.rmb tag=TAG20091218T083619 comment=NONE
channel c1: backup set complete, elapsed time: 00:00:06
Finished backup at 18-DEC-09
released channel: c1

4. [Primary] On the primary, create a new standby controlfile:

SQL> alter database create standby controlfile as '/u01/oraback/DEL1_standby.ctl';

Database altered.

5. [Primary] Copy these files to standby host:

oracle@oradba1 /u01/oraback# scp *.rmb *.ctl oracle@oradba2:/u01/oraback
oracle@oradba2's password:
06l16u1q_1_1.rmb 100% 43MB 10.7MB/s 00:04
DEL1_standby.ctl 100% 43MB 10.7MB/s 00:04

6. [Standby] Bring up the instance in nomount mode:

SQL> startup nomount

7. [Standby] Check the location of the controlfile:

SQL> show parameter control_files

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
control_files string /u01/oradata/standby_cntfile.ctl

8. [Standby] Replace the controlfile with the one you just created in primary.

9. $ cp /u01/oraback/DEL1_standby.ctl /u01/oradata/standby_cntfile.ctl

10.[Standby] Mount the standby database:

SQL> alter database mount standby database;

11.[Standby] RMAN does not know about these files yet; so you must let it know – by a process called cataloging. Catalog these files:

$ rman target=/

Recovery Manager: Release 10.2.0.4.0 - Production on Fri Dec 18 06:44:25 2009

Copyright (c) 1982, 2007, Oracle. All rights reserved.

connected to target database: DEL1 (DBID=846390698, not open)
RMAN> catalog start with '/u01/oraback';

using target database control file instead of recovery catalog
searching for all files that match the pattern /u01/oraback

List of Files Unknown to the Database
=====================================
File Name: /u01/oraback/DEL1_standby.ctl
File Name: /u01/oraback/06l16u1q_1_1.rmb

Do you really want to catalog the above files (enter YES or NO)? yes
cataloging files...
cataloging done

List of Cataloged Files
=======================
File Name: /u01/oraback/DEL1_standby.ctl
File Name: /u01/oraback/06l16u1q_1_1.rmb

12.Recover these files:

RMAN> recover database;

Starting recover at 18-DEC-09
using channel ORA_DISK_1
channel ORA_DISK_1: starting incremental datafile backupset restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
destination for restore of datafile 00001: /u01/oradata/DEL2/datafile/o1_mf_system_5lptww3f_.dbf
...
channel ORA_DISK_1: reading from backup piece /u01/oraback/05l16u03_1_1.rmb
channel ORA_DISK_1: restored backup piece 1
piece handle=/u01/oraback/05l16u03_1_1.rmb tag=TAG20091218T083619
channel ORA_DISK_1: restore complete, elapsed time: 00:00:07

starting media recovery

archive log thread 1 sequence 8012 is already on disk as file /u01/oradata/1_8012_697108460.dbf
archive log thread 1 sequence 8013 is already on disk as file /u01/oradata/1_8013_697108460.dbf


13. After some time, the recovery fails with the message:

archive log filename=/u01/oradata/1_8008_697108460.dbf thread=1 sequence=8009
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 12/18/2009 06:53:02
RMAN-11003: failure during parse/execution of SQL statement: alter database recover logfile '/u01/oradata/1_8008_697108460.dbf'
ORA-00310: archived log contains sequence 8008; sequence 8009 required
ORA-00334: archived log: '/u01/oradata/1_8008_697108460.dbf'

This happens because we have come to the last of the archived logs. The expected archived log with sequence# 8008 has not been generated yet.

14.At this point exit RMAN and start managed recovery process:

SQL> alter database recover managed standby database disconnect from session;

Database altered.

15.Check the SCN’s in primary and standby:

[Standby] SQL> select current_scn from v$database;

CURRENT_SCN
-----------
1447474
[Primary] SQL> select current_scn from v$database;

CURRENT_SCN
-----------
1447478
Now they are very close to each other. The standby has now caught up.

Labels: , , , ,

Tuesday, December 08, 2009

Sepaking at New York Oracle Users Group

I had the opportunity to speak at the New York Oracle Users Group (http://nyoug.org/info/metro_area_user_group/Metro_Area_User_Group_Meeting_Sep_2009_Agenda.pdf) today at Hotel New Yorker. I spoke about reucing the risk during the 10g to 11g upgrade using various tools and techniques available to us as a DBA. Here is the slide deck, if you are iterested.

Some of the technologies I have described during the talk have been explained in detail on various articles I have written. Here they are:

Database Replay: http://www.oracle.com/technology/oramag/oracle/08-jan/o18dbasereplay.html
SQL Performance Analyzer (SPA):
http://www.oracle.com/technology/oramag/oracle/08-mar/o28sqlperf.html
SQL Plan Management: http://www.oracle.com/technology/oramag/oracle/09-mar/o29spm.html

As always, I highly appreciate folks coming to me after the talk and provide the feedback. Please keep the feedbacks coming, via emails, phone calls, telepathy, whatever. I really look forward to them.

Labels:

Wednesday, November 18, 2009

RAC Perf Tuning Seminar in Istanbul, Turkey

For those who attended my 2-day training event in Istanbul, I wish to express my sincere thanks for the participation. For me, or any speaker, the privilege of having your attention for 2 days away from your work and family, means a lot. Some folks came all the way from Ankara and had to go back to work the following day. I hope you all got to learn something worthwhile your time.

The scripts are located in: http://www.proligence.com/racperf_istanbul/scripts.zip The password and userid is the ones I gave you in the class.

I also want to reiterate my earlier request to send me your detailed and honest feedback at arup@proligence.com. You may want to write it in Turkish, if that is more convenient. The important thing is to provide the feedback; I can always translate using Google.

Thanks, Hande and Madalina from Oracle University for arranging it. Much appreciated. Now I am off to Estonia.

Labels: , , , , ,

Thursday, October 15, 2009

OOW09 Session#4 DBA 11g New Features

For all those who came to my last of my four sessions - 11g New Features for DBAs - I appreciate your taking the time. It was a pleasant surprise to see about 500 people showing up at a lunch time slot on the last day of the conference.

Here is the presentation link. I hope you enjoyed the session and found it useful.

Labels: , , , ,

OOW09 Session# 3

I just finished the third of my four presentations - SQL Plan Management. Considering it was at 9 AM on the last day of the conference, right after the big Aerosmith concert, I was expecting a lot less crowd. But, to my pleasant surprise about 150 brave souls turned up. Thank you all. I hope you found it useful.

Here is the presentation material. While you are there, feel free to browse around. And, of course, I will highly appreciate if you could send me your comments, either here or via email - whatever you are comfortable with.

Labels: , , , , ,

Wednesday, October 14, 2009

OOW09 My Session#2

Today I delivered the second of my four sessions - "Upgrade Case Study: Database Replay, Snapshot Standby and Plan Baselines".

For those you attended, I thank you very much. Here is the presentation.

Labels: , , , ,

Tuesday, October 13, 2009

OOW Day2

Why do you come to Open World? I'm sure we will get all kinds of reasons, as many as there are stars in the sky. Some predominant themes are - getting to know more about the Oracle (or related) technologies by attending sessions, reconnecting with old friends and building networking. Of course, getting freebies from the Exhibit Halls, I'm sure, can't be far behind as a motivator.

I come to OOW for all those reasons as well. But high up in my list is the visit to the Exhibit Halls. No; not for the tee-shirts that do not fit me and graphics I don't really dig. I visit the demogrounds and exhibit halls to know about the products and tools that I should be aware of. Where else would you find 1000+ companies advertising the products at one place? Sure, I can call them and find out; but ho do I find them? OOW exhibit halls are prime "hunting" grounds to look for new ideas and tools that I should be interested in; or at least be aware of. I can not only look at the tools; I can actually get some relevant technical facts in 5 minutes which might take weeks of scheduling and hours of marketing talk. And, if I decide the product is not relevant; I can always walk away. I have the privilege of walking away; they don't. If I call them to my office, "they" have that option; not me :) If I find something attractive, I can always follow up and get to know more.

Oracle demogrounds are even better. Not only I can meet Oracle PMs there; but the people who never come out to the public world - developers, development managers, architects and so on. These unsung heroes are mostly the reason why Oracle is what it is now. I meet the known faces, get to know new ones and establish new relationships. They hear from me what customers want and I learn the innards of some features I am curious about.

So, I spent almost the whole day yesterday navigating through demo grounds and exhibit halls. I could cover only a small fraction. In between I had to attend some meetings at work. Going to OOW is never "going away". I wish it was.

Labels: , , ,

Sunday, October 11, 2009

OOW09 - RAC Performance Tuning

For all those who came to my session - many, many thanks. There is no better sight for a presenter than to see a roomful of attendees, especially with people standing near the walls. The fire marshal was not amused probably; but I was grateful. The harrowing incident of a blue screen of death on my PC - not just once but twice - just before the presentation was about to start was enough to throw me into a panic mode; but the third time was a charm. It worked. Phew!

You can download the presentation here. And while you are there, look around and download some more of my sessions as well.

Thanks a lot once again. I'm off the keynote now.

Labels: , , , ,

ACE Directors Product Briefing '09

One of the most valuable benefits of being an Oracle ACE Director is the briefings by Oracle Product Managers at the Oracle HQ. This year the briefing was on Friday Oct 9th at Oracle conference center rather than the customary Hilton Hotel.

While I was a little disappointed at the coverage of the database topics, I quickly recovered from the alphabet soup that makes up the netherworld of middleware and tools. However, a surprise visit by Thomas Kurian to address questions from the audience about the various product roadmaps was testimonial that Oracle is dead serious about the ACE Program. That proves the commitment Oracle has made for the user community - very heartening.

As always, Vikky Lira and Lillian Buziak did a wonderful job of organizing the event. Considering about 100 ACE Directors from 20+ countries, that is no small task. Perhaps the highlight of the organization was the detailed briefing sheets Lillian prepared for each one individually, down to what car service one takes and when - simply superb! No amount of thanks will be enough. From the bottom of my heart, thank you, Vikky and Lillian. And, thank you Justin Kestelyn - for kicking off and running the event year after year.

Labels: , , ,

Open World 09 Starts

Oracle Open World 2009 has officially started with the User Group sessions today. I am presenting a session today. I started off by registering and getting my cool Blogger badge holder, hanging off the even cooler ACE Director lanyard.

I went off to the first session of today on the IOUG bucket - Workload Management by Alex Gorbachev. Alex is one of those people who know their stuff; so there is always something to be learned from there. Alex successfully demonstrated the difference between Connection Load Balancing and Server Side Listener Load Balancing, with pmon trace to show how the sessions are balanced. It sheds light on the question - why Oracle is not balancing the workload.

If you didn't attend this, you should definitely download the presentation and check it out later.

Labels: , , ,

Thursday, September 03, 2009

ASM Dynamic Volume Manager and ASM Clustered File System

Two of the top features in 11gR2 are the ASM Dynamic Volume Manager (ADVM) and ASM Clustered File System (ACFS). What is the big deal about these two?

ADVM allows you to create a volume from an ASM diskgroup. Here is an example where we created a volume called asm_vol1 of 100 MB on a diskgroup called DATA:


ASMCMD [+] > volcreate -G DATA -s 100M asm_vol1


Internally it issues the command


alter diskgroup DATA add volume 'asm_vol1' size 100M;


Now you enable the volume you just created:


ASMCMD [+] > volenable -G DATA asm_vol1


Internally it issues:
alter diskgroup DATA enable volume 'asm_vol1';


You can perform other commands like resize, delete, disable; but more on that later on a full length article.

Now that the volume is created, what can you do with it. Well, like all volumes, you can create a filesystem on it. Here is an example of creating a FS called acfs1:


[root@oradba2 ~]# mkdir /acfs1
[root@oradba2 ~]# /sbin/mkfs -t acfs /dev/asm/asm_vol1-207
mkfs.acfs: version = 11.2.0.1.0.0
mkfs.acfs: on-disk version = 39.0
mkfs.acfs: volume = /dev/asm/asm_vol1-207
mkfs.acfs: volume size = 268435456


Register MountPoint Command:


[root@oradba2 ~]# /sbin/acfsutil registry -a -f /dev/asm/asm_vol1-207 /acfs1
acfsutil registry: mount point /acfs1 successfully added to Oracle Registry


If you get an error, use the force option:


[root@oradba2 /]# /sbin/mkfs.acfs -f /dev/asm/asm_vol1-207
mkfs.acfs: version = 11.2.0.1.0.0
mkfs.acfs: on-disk version = 39.0
mkfs.acfs: volume = /dev/asm/asm_vol1-207
mkfs.acfs: volume size = 268435456
mkfs.acfs: Format complete.



Now you mount the the filesystem:


[root@oradba2 /]# /bin/mount -t acfs /dev/asm/asm_vol1-207 /acfs1


Now if you will check the filesystem, you will notice a new one - /acfs1


[root@oradba2 /]# df -k
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
149313708 22429752 119176924 16% /
/dev/sda1 101086 11765 84102 13% /boot
tmpfs 907628 591640 315988 66% /dev/shm
/dev/asm/asm_vol1-207
262144 37632 224512 15% /acfs1


This new filesystem is actually carved out of the ASM diskspace. This can be used as a regular filesystem:


[root@oradba2 /]# cd /acfs1
[root@oradba2 acfs1]# ls
lost+found
[root@oradba2 acfs1]# touch 1


So, what't the big deal about it? Plenty.

First, this is part of the ASM management; so all the bells and whistles of ASM - such as asynch i/o, etc. - applies to this filesystem.

Second, this is now a "cluster" filesystem; it is visible across a cluster. So, now you have a fully functional generic filesystem visible across the cluster.

Third, this is now protected by the Grid infrastructure, if you have installed it. Remember from my earlier posts that in 11gR2 you can now have a restartable grid infrastructure even on a single instance.

More on ASM Dynamic Volume Manager later in a full length article. But I hope this makes you really interested.

Labels:

Wednesday, September 02, 2009

Oracle 11g R2 Features

Continuing on the previous posts, here is another gee-whiz feature of 11gR2 - the "deinstall" feature. Yes, that's right the deinstall one. Sometimes installations fail; sometimes you have to deinstall something to clean out the server for other use. Sometimes, I did, you have to clean out beta code to install production code. A deinstall utility stops all the processes, removes all the relevant software and components (such as diskgroups), updates all config files and make all necessary modifications to the other files. All these are done without you ever bothering about remnants that may cause issues later.

You have to download the deinstall software from 11gR2 download from OTN. Choose "see all" to get to that software.

Here is the demonstration of the deinstall utility:


[oracle@oradba2 deinstall]$ ./deinstall -home /opt/oracle/product/11.2/grid1
ORACLE_HOME = /opt/oracle/product/11.2/grid1
Location of logs /opt/oracle/oraInventory/logs/

############ ORACLE DEINSTALL & DECONFIG TOOL START ############


######################## CHECK OPERATION START ########################
Install check configuration START


Checking for existence of the Oracle home location /opt/oracle/product/11.2/grid
1
Oracle Home type selected for de-install is: SIHA
Oracle Base selected for de-install is: /opt/oracle
Checking for existence of central inventory location /opt/oracle/oraInventory
Checking for existence of the Oracle Grid Infrastructure home /opt/oracle/produc
t/11.2/grid1

Install check configuration END

Traces log file: /opt/oracle/oraInventory/logs//crsdc.log

Network Configuration check config START

Network de-configuration trace file location: /opt/oracle/oraInventory/logs/netd
c_check22387.log

Specify all Oracle Restart enabled listeners that are to be de-configured [LISTE
NER]:

Network Configuration check config END

Asm Check Configuration START

ASM de-configuration trace file location: /opt/oracle/oraInventory/logs/asmcadc_
check22388.log

Automatic Storage Management (ASM) instance is detected in this Oracle home /opt/oracle/product/
11.2/grid1.
ASM Diagnostic Destination : /opt/oracle
ASM Diskgroups : +DATA1,+FRA1
Diskgroups will be dropped
De-configuring ASM will drop all the diskgroups and it's contents at cleanup time. This will aff
ect all of the databases and ACFS that use this ASM instance(s).


After some initial question and answer it shows a summary of activities and prompts you for a confirmation:


####################### CHECK OPERATION SUMMARY #######################
Oracle Grid Infrastructure Home is: /opt/oracle/product/11.2/grid1
The cluster node(s) on which the Oracle home exists are: (Please input nodes seperated by ",", eg: node1,node2,...)null
Oracle Home selected for de-install is: /opt/oracle/product/11.2/grid1
Inventory Location where the Oracle home registered is: /opt/oracle/oraInventory
Following Oracle Restart enabled listener(s) will be de-configured: LISTENER
ASM instance will be de-configured from this Oracle home
Do you want to continue (y - yes, n - no)? [n]: y
A log of this session will be written to: '/opt/oracle/oraInventory/logs/deinstall_deconfig2009-09-02_02-12-22-PM.out'
Any error messages from this session will be written to: '/opt/oracle/oraInventory/logs/deinstall_deconfig2009-09-02_02-12-22-PM.err'



After you press "y", it starts the operation of a clean deinstallation. The output continues as shown below:


######################## CLEAN OPERATION START ########################
ASM de-configuration trace file location: /opt/oracle/oraInventory/logs/asmcadc_clean22389.log
ASM Clean Configuration START
ASM deletion in progress. This operation may take few minutes.
ASM Clean Configuration END

Network Configuration clean config START

Network de-configuration trace file location: /opt/oracle/oraInventory/logs/netdc_clean22390.log

De-configuring Oracle Restart enabled listener(s): LISTENER

De-configuring listener: LISTENER
Stopping listener: LISTENER
Listener stopped successfully.
Unregistering listener: LISTENER
Listener unregistered successfully.
Deleting listener: LISTENER
Listener deleted successfully.
Listener de-configured successfully.

De-configuring Listener configuration file...
Listener configuration file de-configured successfully.

De-configuring Naming Methods configuration file...
Naming Methods configuration file de-configured successfully.

De-configuring backup files...
Backup files de-configured successfully.

The network configuration has been cleaned up successfully.

Network Configuration clean config END


---------------------------------------->



At some point you will be asked to shutdown cssd, etc. which need root privileges. The deinstall utility shows a comamnd string you can run as root to accomplish this task:


Run the following command as the root user or the administrator on node "oradba2".

/opt/oracle/software/11gR2/deinstall/perl/bin/perl -I/opt/oracle/software/11gR2/deinstall/perl/lib -I/opt/oracle/software/11gR2/deinstall/crs/install /opt/oracle/software/11gR2/deinstall/crs/install/roothas.pl -force -delete -paramfile /opt/oracle/software/11gR2/deinstall/response/deinstall_Ora11g_gridinfrahome1.rsp

Press Enter after you finish running the above commands


Running the command on a different terminal as root:


[root@oradba2 ~]# /opt/oracle/software/11gR2/deinstall/perl/bin/perl -I/opt/oracle/software/11gR2/deinstall/perl/lib -I/opt/oracle/software/11gR2/deinstall/crs/install /opt/oracle/software/11gR2/deinstall/crs/install/roothas.pl -force -delete -paramfile /opt/oracle/software/11gR2/deinstall/response/deinstall_Ora11g_gridinfrahome1.rsp
2009-09-02 14:20:57: Checking for super user privileges
2009-09-02 14:20:57: User has super user privileges
2009-09-02 14:20:57: Parsing the host name
Using configuration parameter file: /opt/oracle/software/11gR2/deinstall/response/deinstall_Ora11g_gridinfrahome1.rsp
CRS-2673: Attempting to stop 'ora.cssd' on 'oradba2'
CRS-2677: Stop of 'ora.cssd' on 'oradba2' succeeded
CRS-4549: Stopping resources.
CRS-2673: Attempting to stop 'ora.diskmon' on 'oradba2'
CRS-2677: Stop of 'ora.diskmon' on 'oradba2' succeeded
CRS-4133: Oracle High Availability Services has been stopped.
ACFS-9200: Supported
Successfully deconfigured Oracle Restart stack


Now going back to the original terminal where deinstall was called from, press Enter. The output continues:


Oracle Universal Installer clean START

Detach Oracle home '/opt/oracle/product/11.2/grid1' from the central inventory on the local node : Done

Delete directory '/opt/oracle/product/11.2/grid1' on the local node : Done

The Oracle Base directory '/opt/oracle' will not be removed on local node. The directory is in use by Oracle Home '/opt/oracle/product/10.2/db1'.

The Oracle Base directory '/opt/oracle' will not be removed on local node. The directory is in use by central inventory.

Oracle Universal Installer cleanup was successful.

Oracle Universal Installer clean END


Oracle install clean START

Clean install operation removing temporary directory '/tmp/install' on node 'oradba2'

Oracle install clean END

Moved default properties file /opt/oracle/software/11gR2/deinstall/response/deinstall_Ora11g_gridinfrahome1.rsp as /opt/oracle/software/11gR2/deinstall/response/deinstall_Ora11g_gridinfrahome1.rsp1

######################### CLEAN OPERATION END #########################


####################### CLEAN OPERATION SUMMARY #######################
ASM instance was de-configured successfully from the Oracle home
Following Oracle Restart enabled listener(s) were de-configured successfully: LISTENER
Oracle Restart was already stopped and de-configured on node "oradba2"
Oracle Restart is stopped and de-configured successfully.
Successfully detached Oracle home '/opt/oracle/product/11.2/grid1' from the central inventory on the local node.
Successfully deleted directory '/opt/oracle/product/11.2/grid1' on the local node.
Oracle Universal Installer cleanup was successful.

Oracle install successfully cleaned up the temporary directories.
#######################################################################


############# ORACLE DEINSTALL & DECONFIG TOOL END #############


The components are cleanly deinstalled now. The directories have been cleaned up by this tool.

This was available in 11gR1 as well; but R2 just makes it very user friendly.

More on this later.

Labels:

Tuesday, September 01, 2009

Oracle 11g Release 2 is Finally Out

Finally, it's that time again - the birth of a new versionof Oracle - 11g Release 2. Being Release 2, it does not have as much bells and whistles as the 11g.

I downloaded it immediately and started installation. Some of the gee-whiz features of this release are:

(1) Editions
(2) ASM Filesystem
(3) Oracle Restart
(5) Columnar Compression

I have been beta testing this for some time; so I had seen previews of the release. Continuing the previous serieses, I will write the new features series for 11gR2 on OTN as well - it will be a 11 part series.

A little bit about Oracle Restart. It adds a lightweight clusterware functionality to a single instance database. If the instance crashes, OR brings it up, monitors it ans so on. And by the way, this is called "Grid Infrastructure". So you have to install two Oracle Homes - one each for grid and the rdbms.

When there is Grid, there is srvctl, of course. The grid infrastructure comes with srvctl. Here is how you check what is running from a specific Oracle Home:

oracle@oradba1 ~# srvctl status home -o /opt/oracle/product/11gR2/db1 -s state.txt
Database d112d1 is running on node oradba1


The above command create a file called state.txt.

oracle@oradba1 ~# cat state.txt
db-d112d1


It shows the database name - D112D1.

This is done on a single instance Oracle database; not a cluster. But the grid infrastructure looks and feels like a cluster. Here are some more commands to check status:

oracle@oradba1 ~# srvctl status listener
Listener LISTENER is enabled
Listener LISTENER is running on node(s): oradba1
oracle@oradba1 ~# srvctl status asm -a
ASM is running on oradba1
ASM is enabled.


A bunch of new processes suppor this grid infrastructure:


oracle 19046 1 0 18:13 ? 00:00:03 /opt/oracle/product/11gR2/grid1/bin/ohasd.bin reboot
oracle 19487 1 0 18:15 ? 00:01:14 /opt/oracle/product/11gR2/grid1/bin/oraagent.bin
oracle 19502 1 0 18:15 ? 00:00:01 /opt/oracle/product/11gR2/grid1/bin/tnslsnr LISTENER -inherit
oracle 19656 1 0 18:15 ? 00:00:01 /opt/oracle/product/11gR2/grid1/bin/cssdagent
oracle 19658 1 0 18:15 ? 00:00:02 /opt/oracle/product/11gR2/grid1/bin/orarootagent.bin
oracle 19674 1 0 18:15 ? 00:00:01 /opt/oracle/product/11gR2/grid1/bin/ocssd.bin
oracle 19687 1 0 18:15 ? 00:00:00 /opt/oracle/product/11gR2/grid1/bin/diskmon.bin -d -f


Let's see what happens when you kill the instance.


oracle@oradba1 ~# ps -aefgrep pmon
oracle 14225 13768 0 23:15 pts/7 00:00:00 grep pmon
oracle 19866 1 0 18:16 ? 00:00:00 asm_pmon_+ASM
oracle 26965 1 0 20:53 ? 00:00:00 ora_pmon_D112D1
oracle@oradba1 ~# kill -9 26965


This will, of course, crash the instance. Let's chck after some time:


oracle@oradba1 ~# ps -aef|grep pmon
oracle 14315 1 0 23:15 ? 00:00:00 ora_pmon_D112D1
oracle 14686 11492 0 23:17 pts/2 00:00:00 grep pmon
oracle 19866 1 0 18:16 ? 00:00:00 asm_pmon_+ASM


Where did the pmon come from? Didn't the instance just crash?

The instance was restarted by Oracle Restart.

What if you want to just keep the instance down, e.g. during a maintenance. Well, just shutdown normally; the instance will stay down. When you are ready, start the instance using either SQL*Plus or srvctl:

oracle@oradba1 ~# srvctl start database -d d112d1

Remember, D112D1 is a single instance database.

More on this later, on OTN.

Labels:

Monday, May 11, 2009

Collaborate 09 Ends

I wrapped up a hectic three days at Collaborate this year. As if to echo the sentiments on the economy in general, and Oracle database technology related in particular, the attendance was way down this year. The vast expanse of Orange County Convention Center didn't make it easy either; it made it appear even smaller! Well, it was Orlando; the weather and traffic cooperated and I'm pretty sure some attendees picked up a few well deserved resting points either alone or with visiting family.

I presented three sessions:

(1) RAC Performance Tuning
(2) Real World Best Practices for DBAs (with Webcast)
(3) All About Deadlocks

I also participated in the RAC Experts' Panel as a Panelist.

As always, I used to opportunity to meet old friends and acquintances. Collaborate is all about knowledge sharing; I didn't lose when it came to getting some myself. I attended some highly useful sessions:

(1) Database Capacity Planning by Ashish Rege
(2) Partitioning for DW by Vincent
(3) Hints on Hints by Jonathan Lewis
(4) 11g Performance Tuning by Rich Niemic
(5) HA Directions from Oracle by Ashish Ray
(6) Storage Performance Diagnosis by Gaja
(7) Two sessions - on Database Replay and Breaking Oracle by Jeremiah Wilton

and some more. It has been a worthy conference.

Labels:

Friday, March 13, 2009

Excellent Information on Exadata

If you are interested to know more about the technology (not the marketing hype) behind Exadata - the datawarehouse appliance from Oracle, I just discovered a treasure trove of information on this blog:
http://www.pythian.com/news/1267/interview-kevin-closson-on-the-oracle-exadata-storage-server#comment-350433

Kevin Closson, one the lead architects of the Exadata machine has been interviewed by Christo Kutovsky and Paul Paul Vallée of Phythian. Kevin, in his usual detailed style explains the innards of the beast, aided by the excellent leading questioning by Christo.

I believe this is best writing I have seen on Exadata - all good stuff, no fluff. Thank you, Kevin.

Saturday, January 24, 2009

Ultra-Fast MV Alteration using Prebuilt Table Option

Here is an interesting question posed to me one time and I had found a solution. After 9 years, I encountered the same question and was shocked to find that many people still don't know about a little trick that could avoid a potential problem later.


Someone asked me how to modify a column of a Materialized View, e.g. from varchar2(20) to varchar2(25), or something similar. Drop and recreate? Not an option. We are talking about a several hundred GB MV with a very complex query that will take days to complete.

Problem
When you alter a materialized view to add a column or modify a column definition, unfortunately there is no command functionally equivalent to ALTER MATERIALIZED VIEW … ADD COLUMN. The only way to alter an MV is to completely drop and recreate it with the alteration. That approach may be acceptable for small MVs; but for larger MVs the cost of rebuilding can make the process quite infeasible. In addition to the time it will take to rebuild the entire MV (which could be days, depending on the size), the redo/undo generation and the surge in logical I/O due to the MV query may seriously affect the performance of the source database. In some cases, large MVs may even fail to be rebuilt as sometimes the undo segments may not have the undo information for long running queries – causing ORA-1555 errors.

So is there a better approach? Yes, there is. In this document I am going to explain a better approach for creating an MV that makes the alterations possible without rebuilding the MV – a task accomplished in mere seconds as opposed to potentially days.

Concept of Segments

Segments are stored units in Oracle. So, a table has a segment; not a view – since the contents of the view are not stored; only the view definition is. A Materialized View, however, stores the contents; so it is a segment.

Actually, the concept of segment goes a little bit further. If the table is partitioned, then each partition is a different segment. So, the relationship between tables and segments is one-to-many.

When you create an object that needs storage, such as a table, an MV or an index, Oracle first creates the corresponding segment. Once that is complete, the segment is shrouded by the cover of the object. The segment still continue to exist; but is now connected to the object. Until the segment is completely created and populated, the object technically does not exist. The segment may, in some cases, have a different name from the object. If the segment creation (or population) fails, Oracle automatically cleans up the remnants of the failed segment; but sometimes it may not be, leaving behind the chards that are eventually cleaned up by SMON process.

MVs and Segments

Anyway, how is this discussion about segments relevant to our objective here –the fast alteration of MViews?

Plenty. Remember, MVs are nothing but tables behind the covers? Property-wise, MVs and tables are like sisters, not even cousins. You can think of MVs are regular tables with some built in intelligence about how they were created (the defining query), how often they should be refreshed automatically by a job and how queries should be transformed to take advantage of the presence of the MVs. But apart from that, there is not much difference. You can directly insert into an MV, create indexes and so on. As far as a segment is concerned, there is no difference between an MV and a table. In fact Oracle stores the segment as a table:

SQL> select SEGMENT_TYPE
2 from user_segments
3 where SEGMENT_NAME = 'MV1';

SEGMENT_TYPE
------------------
TABLE

However, the biggest difference is the very issue we are discussing – you can’t add/modify columns of an MV while you can do that freely for a table. If I could attempt to logically represent tables and MVs, here is how it would look like.



The segment is the same. If it was created as an MV, the properties of MV take over the segment. If it was created as a table, the properties of a table take over the control.

Prebuilt Table

Since under the covers the segment is the same for both MV and table, can’t you take advantage of the fact? Suppose you have a table and you now want to convert that to an MV. In other words, you want to repoint that arrow initially pointed at the table to the MV properties:




Can you do it? Yes, of course you can. Since at the segment level it is the same, Oracle allows you to do it. When you create an MV, you can use a special clause ON PREBUILT TABLE. Here is how you create a MV in the regular approach:

create materialized view mv1
never refresh as
select cast(count (1) as number(10)) cnt from t1;


If you check the objects created:

SQL> select object_id, data_object_id, object_type
2 from user_objects
3 where object_name = 'MV1';

OBJECT_ID DATA_OBJECT_ID OBJECT_TYPE
---------- -------------- -------------------
74842 74842 TABLE
74843 MATERIALIZED VIEW

So, it creates two objects – a table and an MV - anyway. Note a very important difference though: the DATA_OBJECT_ID for the MV object is null. If you drop the MV and check for the objects:

SQL> drop materialized view mv1;

Materialized View dropped.

SQL> select object_id, data_object_id, object_type
2 from user_objects
3 where object_name = 'MV1';

no rows selected

Even though there were two objects – a table and an MV, when you dropped the MV, both were dropped. The table object didn’t have an independent existence. Dropping the MV drops the table automatically.

Now, in the modified approach, you first create the table in the same name as the MV you are going to create:

SQL> create table mv1 (cnt number(10));

Next you create the MV by adding a new clause called ON PREBUILT TABLE shown below:

create materialized view mv1
on prebuilt table
never refresh
as
select cast(count (1) as number(10)) cnt from t1;

Now there will be two objects as well – one table and one MV. The MV simply took over the command over the segment but since the table already existed, it did not recreate the table object. So there are still only 2 objects.

One concern: since you created the table manually, can you accidentally drop it? Let’s see:

SQL> drop table mv1;
drop table mv1
*
ERROR at line 1:
ORA-12083: must use DROP MATERIALIZED VIEW to drop "ARUP"."MV1"

That answers it. The table simply loses its independent existence. However, see what happens when you drop the MV:

SQL> DROP MATERIALIZED VIEW mv1;

Materialized view dropped.

Now check the segment:

SQL> select segment_type
2 from user_segments
3 where segment_name = 'MV1';

SEGMENT_TYPE
------------------
TABLE

The segment still exists! When you dropped the MV, the segment was not dropped; it simply reverted to being a table. You can confirm that by checking the objects view:

OBJECT_ID DATA_OBJECT_ID OBJECT_TYPE
---------- -------------- -------------------
77432 77432 TABLE

Voila! The object still exists as a table. Previously you saw dropping the MV removed all the objects and the segment. However, in this approach the segment was preserved. Since it reverted to a table, you can do all things possible in a table – select from it, create index, and - most important – modify the column. You can alter the column to make NUMBER(11).

SQL> alter table mv1 modify (cnt number(11));

Table altered.

Now, create the MV again:

create materialized view mv1
on prebuilt table
never refresh as
select cast(count (1) as number(11)) cnt from t1;

That’s it. The MV is altered. The whole process took about a few seconds, and since you didn’t have to recreate the segment, you saved enormous load on the database. Here a schematic representation of what happened.

Now you know how powerful prebuilt table option is. It only affects how you define the MV; nothing else. All other properties of the MV remain intact. The end users don’t even know about the prebuilt table option; but for the DBA it remains a powerful tool in the arsenal. As a best practice I recommend creating any MV, regardless of size, with the ON PREBUILT TABLE clause. In small tables you probably don’t see a huge advantage; but what if today’s small table grows to a large one tomorrow? It’s better to be safe than sorry.

Conversion to the New Approach

Now that you understand the power of the prebuilt option, you may be wondering how to convert the existing MVs to the new clause. Unfortunately there is no conversion path. You have to drop and recreate the MVs. That is why this time – when we are moving MVs to new tablespaces – we have the golden opportunity.

One approach is to create new tables with new names and then rename them. Here are the steps:

1. Create a table with nologging clause from the old MV
create table new_mv1
nologging
as
select * from mv1;

2. Capture the MV definition from the data dictionary:

select dbms_metadata.get_ddl ('MATERIALIZED_VIEW','MV1')
from dual ;

DBMS_METADATA.GET_DDL('MATERIALIZED_VIEW','MV1')
------------------------------------------------
CREATE MATERIALIZED VIEW "ARUP"."MV1" ("CNT")
ORGANIZATION HEAP PCTFREE 10
… and so on …

3. Spool this to a file to be executed later.

4. Edit this file to place ON PREBUILT TABLE CLAUSE.

CREATE MATERIALIZED VIEW "ARUP"."MV1" ("CNT")
ORGANIZATION HEAP ON PREBUILT TABLE PCTFREE 10

5. Take a Data Pump export with CONTENTS=METADATA_ONLY option. This creates all relevant privileges on the export dump file. Keep it aside.

6. Drop the Materialized View MV1.

7. Rename table NEW_MV1 to MV1

8. Execute the script you created earlier to recreate the MV.

9. Import the export dump file. It will recreate all the privileges.

This is slow; but the best approach since it generates minimum amount of redo and undo.

Hope this is helpful. You may look at an article I wrote http://www.dbazine.com/oracle/or-articles/nanda2 The article describes, with complete code, how to alter an MV where the refresh occurs across databases.


Labels:

Tuesday, January 20, 2009

The other day, we had a serious issue in the ASM diskgroups - one diskgroup refused to come up because one disk was missing; but it was not clear from the message which of the 283 devices was missing. This underscores the difficulty in diagnosing ASM discovery related issues. In this post, I have tried to present a way to diagnose this sort of issues through a real example.

We had planned to move from one storage device to another (EMC DMX-3 to DMX-4) using SRDF/A technology. The new storage was attached to a new server. The idea was to replicate data at the storage level using SRDF/A. At the end of the replication process, we shut the database and ASM down and brought up the ASM instance in the newer storage on the new server. Since the copy was disk level, the disk signatures were intact and the ASM disks retained their identity from the older storage. So, when the ASM instance was started, it recognized all the disks and mounted all the diskgroups (10 of them) except one.

While bringing up a disk group called APP_DG3 on the new server it complained that disk number “1” is missing; but it was not clear which particular disk was. In this blog the situation was diagnosed and performed.

Note: the asm disk paths changed on the storage. This was not really a problem; since we could simply define a new asm_diskstring. Remember: the diskstring initialization parameter simply tells the ASM instance which disks should be looked at while discovering. Once those disks are identified, ASM looks at its signature on the disk headers to check the properties - the disk number, the diskgroup it belongs to, the capacity, version compatibilty and so on. So as long as the correct asm_diskstring init parameter is provided, ASM can readily discover the disks and get the correct names.

Diagnosis

This issue arises when ASM does not find all the disks required for that disk group. There could be several problems:

(i) the disk itself is physically not present
(ii) it’s present but not allowed to be read/write at the SAN level
(iii) it’s present but permissions not present in the OS
(iv) it’s present but the disk is not mapped properly; so the disk header shows something else. ASM knows the disk number, group, etc. from the disk header. If the disk header is not readable; or is not an ASM disk, the header will not reveal anything to ASM and hence will not mount.

If an ASM diskgroup is not mounted, the group_number for that disk shows “0”. If it’s mounted, the group number shows whatever the group number of the disk group is. Please note: the disk numbers are dynamic. So, APP_DG1 may have a group number “1” but the number may change to “2” after the next recycle.

Since the issue involved APP_DG3, I checked the group number for the group APP_DG3 from the production ASM (the old SAN on the old server) by issuing the query:

ASM> select group_number, name
2 from v$asm_diskgroup
3 where name = 'APP_DG3'
4 /

GROUP_NUMBER NAME
------------ ------------------------------
4 ASM_DG3

This shows the group number is 4 for the APP_DG3 group. I will use this information later during the analsysis.

On the current production server, I checked the devices and disk number of group number 4:

ASM> select disk_number, path
2 from v$asm_disk
3 where group_number = 4;

DISK_NUMBER PATH
----------- --------------------
0 /dev/rdsk/c83t7d0
1 /dev/rdsk/c83t13d5
2 /dev/rdsk/c83t7d2
… and so on …
53 /dev/rdsk/c83t7d1

54 rows selected.

On the new server, I listed out the disks not mounted by the disk groups. Knowing that disks belonging to an unmounted diskgroup show a group number 0, the following query pulls the information:

ASM> select disk_number, path
2 from v$asm_disk
3 where group_number = 0;


DISK_NUMBER PATH
----------- --------------------
0 /dev/rdsk/c110t1d1
2 /dev/rdsk/c110t1d2

3 /dev/rdsk/c110t1d3
… and so on …
254 /dev/rdsk/c110t6d7

54 rows selected.

Carefully study the output. The results did not show anything for disk number “1”. The disks were numbered 0 followed by 2, 3 and so on. The final disk was numbered “254”, instead of 54. So, the disk number “1” was not discovered by ASM.

From the output we know that production disk /dev/rdsk/c83t7d0 mapped to new server disk /dev/rdsk/c110t1d1, since they have the same disk# (“0”). For disk# 2, production disk /dev/rdsk/c83t7d2 is mapped to /dev/rdsk/c110t1d2 and so on. However, production disk /dev/rdsk/c83t13d5 is not mapped to anything on the new server, since there is no disk #1 on the new server.

Next I asked the Storage Admin what he mapped for disk /dev/rdsk/c83t13d5 from production. He mentioned a disk called c110t6d25.

I checked in the new server, if that disk is even visible:

ASM> select path
2 from v$asm_disk
3 where path = '/dev/rdsk/c110t6d25'
4 /

no rows selected

It confirmed my suspicion – ASM can’t even read the disk. Again, the reasons could any of the above mentioned ones - disk is not presented, does not have correct permission, etc.

In this case the physical disk was actually present and was owned by “oracle”; but not accessible to ASM. The issue was with SRDF not making the disk read/write. It was still in sync mode, preventing the disk to be enabled for writing. ASM couldn't open the disk in read write mode; so it rejected it as a member of any diskgroup and assigned it a default disk number 254.

After Storage Admin fixed the issue by making the disk read write, I re-issued the discovery:

ASM> select path
2 from v$asm_disk
3 where path = '/dev/rdsk/c110t6d25'
4 /

PATH
-------
/dev/rdsk/c110t6d25

It returned with a value. Now ASM can read it correctly. I mounted the disk:

ASM> alter diskgroup APP_DG3 mount;

It mounted successfully; because it got all the disks to make up the complete group.

After that the disk# 254 also went away. Now the disks showed 0, 1, 2, 3, … 53 for the group on both prod and the new server.

Labels:

Monday, January 19, 2009

Making a Shell Variable Read Only

Being inherently lazy, I am always a sucker for shortcuts, neat tricks to cut my work and, most important, not to do the same thing again and again. Here is a tip I find useful.

Have you ever been frustrated to find that some line has changed some important shell variable such as ORACLE_BASE inside a shell script? The list of variables that are important to safety and efficiency of your shell is a long one - PS1, ORACLE_BASE, PATH, and so on. Using this little known command, you can easily "protect" a variable. The trick is to make it readonly. First, set the variable:

# export ORACLE_BASE=/opt/oracle

Then make it readonly:

# readonly ORACLE_BASE

Now if you want to set it:

# export ORACLE_BASE=/opt/oracle1
-bash: ORACLE_BASE: readonly variable

You can't. You can't even unset the variable:

# unset ORACLE_BASE
-bash: unset: ORACLE_BASE: cannot unset: readonly variable

This is a cool way to protect important variables.

To get a list of variables that are readonly, use

# declare -r
declare -ar BASH_VERSINFO='([0]="3" [1]="00" [2]="15" [3]="1" [4]="release" [5]="i386-redhat-linux-gnu")'
declare -ir EUID="500"
declare -rx ORACLE_BASE="/opt/oracle"
declare -ir PPID="13204"
declare -r SHELLOPTS="braceexpand:emacs:hashall:histexpand:history:interactive-comments:monitor"
declare -ir UID="500"

Unfortunately there is no comamnd to make it readwrite.

In the same way, you can also prevent a specific variable not to be set. LD_LIBRARY_PATH should not be set during some type of installations. To force it that way:

# export LD_LIBRARY_PATH
# readonly LD_LIBRARY_PATH

Now if you want to assign a value:

# export LD_LIBRARY_PATH=d
-bash: LD_LIBRARY_PATH: readonly variable


You will not be able to. You can also achieve the same goal by:

# declare -r LD_LIBRARY_PATH=

I hope you find it useful.

Labels:

Friday, October 10, 2008

About Clear Communications

Like everyone else I like to rant once in a while. I rant about the shortcomings in Oracle software, the tools and technologies I work with. But this time I want to rant about a decisively non-technical topic. Arguably it is something that everyone must have felt - at least once. It's about communicating clearly. Why don't people do it? Why don't they articulate whatever they are trying to say. Instead they spit out incoherently with thoughts coming across as sloppily slapped together expecting the other person to somehow put it all together. I am not talking about children; these are responsible adults who supposedly make up policies and act as thought leaders. Unclear thoughts and communication not only frustrates people; but is dangerous. It misdirects efforts leading to wastage and often utter failure.



Today I had on the receiving end of such a travesty. Earlier, a manager of an application team wrote to me this email (reproduced verbatim) about a requirement:



We are having a shortage of capabilities on the servers. So we want to increase the capabilities somehow. What do you recommend?



I was scratching my head. How can I comment or influence the capabilities of their applications? Perhaps they are asking about some limitations which might be solved by some Oracle technology features. So, I called them for a quick chat. After half hour I still wasn't clear about what limitations they are trying to solve.



And then, after one hour, I got it: they are talking about capacity; not capability! And not only that it's about the database server; not the app server. [Trying to pull my hair out at this time]



My recommendation would have been to send them to an English school; but, being occasionally wise, I kept it to myself.



OK; let's move on. I promised to have a DBA look at the capacity issue.



And a DBA did. Sme days went by and it apprently reached a boiling point. I was told nothing has been done by the DBA and, well, that's not acceptable. so, I intervened. I asked the DBA for her side of the story. It was pretty simple - the CPU, I/O are all normal, way below utilization. The growth projections were eight times. Yes, eight times. So, the DBA made a request for the increase in capacity and that's where the friction has started. No one anticipated the eight fold increase; so there is simply no room. Stalemate!

As the head of database architecture, I question any growth projections, especially ones that go up 8 times. And I did. Here was the response "we are running on 2 legs and we will run on 16 legs in the near future".

2 legs?!!! What is that? What is a leg?

As it turns out, the application is running 2 java virtual machines. the app architects are recommending to run 16 JVMs to add redundancy and distribute local processing. From the database perspective, that means 2 clients now will become 16; but the overall processing will *not* go up.

Instead of saying that in plain English, the App manager coined a term "leg" to describe the issue in apprently in some technical way. This was communiated to his management chain, which in turn interpreted it as 8 fold increase in processing and demanded that we create 7 more databases. They approved the new "databases" and allocated the budget. But since the friction came with a member of my team, I became involved. As I always do, I questioned the decision and that's how the truth came out, which in turn also cleared the mystery behind the "capability" story described above.

All these hoopla about people not communicating in a clear manner. Adding 16 more clients should have simple enough to connvery even to the mail guy; calling it "legs" confused every one, wasted time and increased frustration. If I hadn't questioned it, it would have been implemented too. 7 more databases doing nothing to solve the issues present.

Communication is all about articulation and presentation of thoughts. The key is empathy - put yourself in the recipient's shoes and try to look at what you are saying from that persepctive. It's not about English (or whatever the language used); it's about the clarity of thought process. Granted not everyone will be able to present the thought process equally coherently while speaking; but what about writing? There is no excuse for not writing coherently, is there? The only reason for being incoherent is just a I-don't-care attitude. Of course, there are other things - unfamilairity with the language used, lack of time, environment (typing on the blackberry with one finger while cooking with the other hand), state of mind (typing out the report while waiting for the third drink to arrive at the bar); but most of the time it's just plain lack of empathy. If you don't have that, you just don't communicate; at least not effectively. And when you don't communicate, you fail, regardless of your professional stature.

As Karen Morton once said during a HotSos Symposium, and a mantra I took to heart and live by the lines everyday:


Knowledge and Experience Makes you Credible
Ability to Communicate Effectively makes you Useful


Useful - that's what I and you want to be; not just credible. Thank you, Karen.

Sunday, October 05, 2008

Upgrade Experience for Patchkit 11.1.0.7

The first patchkit for 11g - 11.1.0.7 - came out a few weeks ago, for Linux systems. Unlike most other patchkits, this one contains some new functionalities. Oracle patchkits are usually only bugfixes with al patches regression tested collectively; not added features. This one does have some new (albeit minor) features.

The upgrade was relatively easy but took a long time - about 30 minutes. I encountered one issue and a roadblock. Here is the annoying roadblock

A Very Import Pre-req

Do not forget to check for this pre-requisite. This is the time_zone check, as specified in the patchkit readme file.

SQL> select version from v$timezone_file;

VERSION
----------
4

In my case it returned 4, which means no further action is necessary. Had it returned something else, I would have to follow the instructions mentioned in MetaLink Note 568125.1.




Asking for Email for Security Notification

It was interesting to note that the installer asked me to enter my Email address to send me updates on security. Funny; I get that already anyway. So I ignored and clicked "Next". Nope; it popped a little window asking me to confirm that I do not need email. Sure, I confirmed and pressed Next. No, the same issue, it asked me again to confirm and so on. Vicious circle!

There was a little checkbox below that said "track security via MetaLink" and asked my MetaLink password (not "id", which it presumed to be my email). I checked that box and entered my email and password, and it allowed me to go further.

This is a little awkward. Not everyone will want to get the email. And in any case, a mere email will explain little about the security issues. And why would anyone want to embed his/her MetaLink password in a response file?

Cool Feature: The Pre-upgrade Script

Oracle now provides a script utlu111i.sql in the $OH/rdbms/admin directory. Run this script before you shut the database down for upgrade. It will tell a lot about the issues you may need to fix before trying the upgrade.

The Issue

The issue I encountered was while running the catupgrd.sql script. After applyng the patch, I started up with the UPGRADE option and then executed catupgrd.sql script, which promptly failed. From the error message (which, I unfortunately, I couldn't capture), it was clear that the failure was due to the presence of Data Vault feature. I apparently installed Data Vault option while installing the software. The message says it clearly:

Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options

Well, I need to remove it before I can upgrade. It needed a recomp of the Oracle software:

oracle@prolin1$ cd $ORACLE_HOME/rdbms/lib
oracle@prolin1$ make -f ins_rdbms.mk dv_off

Next, I relinked the Oracle software:

oracle@prolin1$ relink oracle

Now the catupgrd.sql script ran successfully as expected. It took about 1 hour to complete.

SQL> select * from v$version
2 /

BANNER
-----------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - Production
PL/SQL Release 11.1.0.7.0 - Production
CORE 11.1.0.7.0 Production
TNS for Linux: Version 11.1.0.7.0 - Production
NLSRTL Version 11.1.0.7.0 - Production

Labels:

Sunday, September 28, 2008

Most Striking Object at Open World 08

What was the most amazing thing I witnessed for the first time at OOW 08?

Hands down, it is the compostible drinking cup! Yes, compostible, not recyclable. It looks just like any other plastic (not paper) drink cup you see at almost any public water dispensers. The difference? It's made of corn syrup, I was told, not plastic and hence compostible. Wow!

I am not a green fanatic; but I consider myself a responsible adult concerned about the environment doing his share to reduce the landfills, pollutions and paper consumption. I do things that are practical: I don't print something I can read on the monitor; project emails, powerpoints on the screen/projector while conferring with colleagues rather than printing; use back sides of printouts to scribble; use 2-sided printing; donate kids' toys and cloths to charity rather than throw them in trash and so on. But there are some things I just couldn't jettison; at least not yet. One of them was the ubiquitous plastic drinking cup and the bottled water. The convenience of the water bottle was just too much to ignore and my lazy bones reigned over my conscience and I always gravitated, albeit a little guiltly, to the water bottle.

Not any more. I hope these compostible corn syrup based polymer material makes its way to all things plastic - bottles, cups, packaging and so on. The material is called polylactic acid (PLA), which is a polymer made from lactic acid from strachy produce like corn, wheat, patato and beet. However, due to its low melting point, it's not suitable for hot liquids, at least not yet. There is a compostible version - paper cups lines with PLA instea dof petroleum based products. But that's still paper; not 100% PLA.

According to a Smithsonian article, producing this PLA requires 65% less energy and emits 68% fewer greenhouse gases. Wow! That's good enough for me.

But, is it all rosy and smell nice? Well, afraid not. The biggest caveat: the PLA decomposes in a controlled composting facility, not the backyard composting bin. you need something of industrail strength - the sort used by municipalities and large industrial plants. Do they exist? Yes, for commercial use; but almost none for residential use. So, that's the catch. While the material is compostible; the facility to compost is not available.

But I am not going to look at it as glass half full. This is a major first step. Perhaps the ecological and political pressures will force the residential facilities to open up as well. Until then, let the power be with PLA.

Labels: , , , ,

OOW'08 Oracle 11g New Features for DBAs

It was my second session at Open World this year. It was full with 332 attendees with a whopping 277 attendees on wait list! the room capacity was 397. Of course, the room did have some fragmentation and not everyone could make it.

Here is the abstract:

There is a world outside the glittering marketing glitz surrounding Oracle 11g. In this session, a DBA and author of the popular 11g New Features series on OTN covers features that stand out in the real world and make your job easier, your actions more efficient and resilient, and so on. Learn the new features with working examples: how to use Database Replay and SQL Performance Analyzer to accurately predict the effect of changes and Recovery Manager (RMAN) Data Recovery Advisor to catch errors and corruption so new stats won't cause issues.

Thank you very much for those who decided to attend. I hope you found it useful. Here is the presentation. You can download it from the Open World site too. Please note, the companion site to see al working examples and a more detailed coverage is still my Oracle 11g New Features Series on Oracle Technology Network.

Labels: , , , , , , ,

OOW'08 : Real Life Best Practices for DBAs

Once more, it was a full session with 278 attendees and 251 waitlisted due to lack of space! I wish the organizers simply moved to a bigger room. Oh, well!

For those who attended, I truly wish to express my gratitude. As a speaker, I feel honored when people choose to attend my session over others. I wish you found something useful here.

If you haven't already downloaded it from Open World site, here is the presentation: http://www.proligence.com/OOW08_DBA.pdf.

Here is the abstract:

This session covers best practices for excelling at being a DBA in the field from someone who has been managing Oracle Database instances for 14 years; was named the DBA of the Year in 2003; and has been through it all, from modeling to performance tuning, disaster recovery, security, and beyond. Best practices should be justifiable and situation-aware, not just because someone said they were good. Hear several tips and tricks for surviving and succeeding in the part art/part wizardry of database administration. The session covers Oracle 11g.

I will highly appreciate if you post your comments.

Labels: , , , , , ,

OOW'08 Partitioning - Why, When, What and How

I had a full session (300+ in attendance with 120 wait-listed). There was a gentleman from the organizing team who was trying to let in people flow in to the room as unobtrusively as possible. He was the same one for my other two sessions (which happend to be in the same room - full and and some 100+ wait listed) and he was visibly flustered to conduct the same routine. "You must be quite popular" - he exasperated! I took it as a compliment. For those who attended, thank you for taking the time to spend with me. I truly feel honored. For those who couldn't come in, I assume you have found some other helpful session.

Here is the abstract:

How do you decide which partitioning schemes to use, which column to partition on, or whether or not to use more than one column for partitioning keys? These are questions you must answer to develop a good strategy. Too many architects and DBAs choose a wrong strategy, making partitioning a failure. In this session, you will *not* learn the syntax or types of partitioning but how to decide the best strategy, presented through real-life case studies and examples. This session covers Oracle 11g features in partitioning.

If you haven't already done so from the Oracle Open World site, you can download here: http://www.proligence.com/OOW08_Part_ppt.pdf

By the way, I have also written an article on the same presentation, which you can find here: http://www.proligence.com/OOW08_Part_doc.pdf

Please leave your comments on these two documents. As always, thank you visiting and offering your feedback.

Labels: , , , , , , , ,

OOW'08 ACE Directors Forum Session

Lewis Cunningham kindly took the reins of organizing a unique session - a panel of ACE Directors. But instead of a panel didcussing some topic, it welomed questions from the audience. sort of stump the chump with the questions. Lewis couldn;t have been a more perfect moderator/host. Most amazing, he came up with the idea to hand out blank index cards for the attendees to fill out with their questions, which forced the questions to be clearer.

On the panel we had Brad Brown (TUSC), Eddie Awad, Tim Hill, Mark Rittman, Hans Forbrich and your truly. Here is a coverage on Mark Rittman's blog (with a picture of the crew) http://www.rittmanmead.com/2008/09/26/oracle-open-world-2008-day-5-exadata-storage-server-and-ask-the-oracle-ace-directors/

I was nervous; and who wouldn't be, with the pressure? Fortunately, we, as a panel, with the expert moderation by Lewis, could ace the volleys. some of the questions I responded to with my answers:

  • Q: Will Transparent Tablespace Encryption (TTE) spell the doom for Tranparent data Encryption (TDE)?
  • A: Not at all. TDE encrypts a specific column or columns. TTE encrypts everything the tablespace - all tables and all columns. So, the performance definitely impacted. However, the biggest difference is the encryption in the database. Both technologies encrypt data in storage; but TTE decrypts the data in the SGA. So index scans do not suffer in case of TTE. TDE does *not* decrypt the values in SGA; so index scans are rather useless. So, in the case where a data value will most likely be found in SGA, the TTE option works well. The penalty is in the time when data is loaded from the database to the SGA. Since that happens a lot less, this will not cause a serious issue. In case data is frequently aged out of the buffer cache, the TTE option may prove expensive and TDE might become relatively attactive.
  • Q: What approach would you recommend for upgrading a 10 GB database to 11g from 10g - Data Pump, Exp/Imp, Transportable Tablespace?
  • A: None of the above. I would go for a Direct Path Insert (insert with the APPEND hint) over DB Link. This allows me several benefits - (i) I can do a database capture and replay it on 11g to minimize the risk of something breaking after upgrade. (ii) I can do a database reorg at the time of the move, i.e. partition unpartitioned objects, etc. (iii) have minimal time for migration.
  • Q: What is your least favorite new feature in Oracle?
  • A: I would rather answer it as most "unnecessary" new feature. It would be bigfile tablespaces - hands down. I always recommend creating smaller datafiles for tablespaces, no more thna 32 GB. This reduces the risk significantly in case of failures. If a block media recovery fails due to whatever reason, you can at least restore the file (or switch over to a copy) quickly. The bigger the file, the more time will be for restore and recovery. A large number of files increase the checkpoint time. so, try to find a balance. But in any case, dump bigfiles.
  • Q: How has life changed for you after being an OCP?
  • A: Not in the least. I have been an OCP since 8i and I finishd 9i, 10g and now 11g upgrade exams. However, no one ever bothered to ask me if I am an OCP.

Labels: , , , , ,

Wednesday, September 24, 2008

From the Trenches at OOW'08

I had two sessions today:

(1) I signed up a Unconference session - "Practical 11g Upgrade". I signed up for this on the spot; so I didn't have any slides or time for preparation. The unconference room had a whiteboard and several markers. So, there I was - explaining our 11g upgrade process using a whiteboard. Was a lot of fun. Someday, I will try to capture the events on paper and publish as a blog.

(2) I had signed for "Meet the ACEs" event at OTN Lounge. Wasn't much of an event. It was shaowed by the much anticipated iPod Touch drawing at OTN lounge. Nope; I didn't win.

Labels: , , , ,

Wednesday, September 17, 2008

Speaking on Partitioning at NY Metro Oracle Day

I recently delivered a session at New York Metro Area Oracle Day organized by User Groups in New York, New Jersey and Connecticut. If you attended, thank you for coming. There is nothing much satisfying than see people taking time off their daily grind and be in the room to listen to someone they may or may not know; but trust they can get something from. Thank you for all those who attended.

The session was about how to use partitioning, not what it is. So I didn't talk about syntax; but how to exploit partitioning techniques to accomplish common objectives.

Here is the powerpoint presentation deck. However, you may find a descriptive article more suitable for reading.

Congratulations to the lady who won my book RMAN Recipes for asking the best question. I apologize, I couldn't get your name. If you read this, please get back to me.

The question was: In a RAC envrionment does partitioning help or hurt?

Answer: Partitioning helps in these ways:
(1) it allows several segments to be create for the same object. Therefore the chances of cache buffer chains gets reduced, which ultimately reduces the global cache related waits.
(2) it reduces the chance that rows will be found on one block, leading to block contention among instances and reduces gc buffer busy waits.
(3) in hash partitioned indexes, it reduces contention for a few index blocks, leading to lesser gc current grant related waits.

The downside? Well, not much. Partition operations (drop/add/split) are DDL, not DML operations. So, when you perform one of these, the library cache must be synced up on all instances, whihc may create library cache lock/latch issues. So, as a rule, perform partition maintenance operations during off-hours or periods of light activity.

Labels: , , , ,

Monday, September 01, 2008

Magic of Block Change Tracking

The other day, one of the instances of our main 3-node RAC database crashed due to an I/O error. Apparently there were many "dead paths" from the host to the SAN, which made some LUNs in the ASM diskgroup not being recognizable by the ASM instance. Why? Well, that's the topic for another blog. The point behind this one has nothing to do with the reason. All I want to convey that there was a LUN problem on one of the nodes which brought the instance down. Being RAC, the database was serviced from other two nodes - praise God for RAC! - and the users didn't notice it terribly (or, I would like to think that way).

After a few days we noticed the incremental RMAN backup taking a long time. This caused major issues - it took a long time and I/O waits went through the roof. In fact it took increasingly longer every day that passed by that unfortunate collapse of the node. Everyone was quite intrigued - what could be the connection between an instance crash and instance crashing? All sorts of theories cropped up - from failed HBA cards to undiscovered RAC bugs.
This is where I got involved. The following chronicles the diagnosis of the issue and the resolution.

First, the increased length of time is obviously a result of the incemental backups doing more work, i.e. more changed blocks. What caused so many changed blocks? Interviews with stakeholdrs yielded no clear answer - there was absolutely no reason for increased activity. Since we are doing proper research, I decided to start with the facts. How much was the extra blocks processed by incrementals?

I started with this simple query:

select completion_time, datafile_blocks, blocks_read, blocks
from v$backup_datafile
where file# = 1
order by 1
/

Output:

COMPLETIO DATAFILE_BLOCKS BLOCKS_READ BLOCKS
--------- --------------- ----------- ----------
18-JUL-08 524288 32023 31713
19-JUL-08 524288 11652 10960
20-JUL-08 524288 524288 12764
21-JUL-08 524288 524288 5612
22-JUL-08 524288 524288 11089


The columns are:

DATAFILE_BLOCKS - the number of blocks in the datafile at that time
BLOCKS_READ - the exact number of blocks the RMAN incremental backup read
BLOCKS - the numberof blocks it actually backed up

From the above output, a pattern emerges - until Jul 20th, the backup read only a few blocks; but on July 20th, it started scanning the entire file - all the blocks! I checked for a few other datafiles and the story is the same everywhere. With a 4.5 TB database, if the incremental backup reads the datafiles in entirity, then I/O would obviously go for a toss. That explains the I/O and time issue.

But why did RMAN switch from reading a few blocks to the whole file that day? The #1 suspect is Block Change Tracking. The 10g feature BCT allows RMAN to scan only the changed blocks and not the entire file. We use that. So, did something happen to make that disappear?

to answer, I issued a modified query:

select completion_time, datafile_blocks, blocks_read, blocks, used_change_tracking
from v$backup_datafile
where file# = 1
order by 1
/


Output:

COMPLETIO DATAFILE_BLOCKS BLOCKS_READ BLOCKS USE
--------- --------------- ----------- ---------- ---
18-JUL-08 524288 32023 31713 YES
19-JUL-08 524288 11652 10960 YES
20-JUL-08 524288 524288 12764 NO
21-JUL-08 524288 524288 5612 NO
22-JUL-08 524288 524288 11089 NO


Bingo! The BCT use ceased from the 20th July date. That was what caused the whole file to be scanned. But why was it stopped? No one actually stopped it.

Investigating even further, I found from the alert log of Node 1:

Sun Jul 20 00:23:52 2008
CHANGE TRACKING ERROR in another instance, disabling change tracking
Block change tracking service stopping.


From Node 2:

Sun Jul 20 00:23:51 2008
CHANGE TRACKING ERROR in another instance, disabling change tracking
Block change tracking service stopping.

Alert log of Node 3 showed the issue:

Sun Jul 20 00:23:50 2008
Unexpected communication failure with ASM instance:
ORA-12549: TNS:operating system resource quota exceeded
CHANGE TRACKING ERROR 19755, disabling change tracking
Sun Jul 20 00:23:50 2008
Errors in file /xxx/oracle/admin/XXXX/bdump/xxx3_ctwr_20729.trc:
ORA-19755: could not open change tracking file
ORA-19750: change tracking file: '+DG1/change_tracking.dbf'
ORA-17503: ksfdopn:2 Failed to open file +DG1/change_tracking.dbf
ORA-12549: TNS:operating system resource quota exceeded
Block change tracking service stopping.

The last message shows the true error. The error was “operating system resource quota exceeded”, making the diskgroup unavailable. Since the ASM diskgroup was down, all the files were also not available, including BCT file. Surprisingly, Oracle decided to stop BCT altogether rather than report it as a problem and let the user decide the next steps. So block change tracking was silently disabled and the DBAs didn't get a hint of that. Ouch!

Resolution

Well, now that we discovered the issue, we took the necessary steps to correct it. Because of the usual change control process, it took some time to have the change approved and put in place. We executed the following to put the BCT file.

alter database enable block change tracking using file '+DG1/change_tracking.dbf'

The entry in alert log confirms it (all all nodes)

Block change tracking file is current.

But this does not solve the issue completely. to use block change tracking, there has to be a baseline, which is generally a full backup. We never take a full backup. We always take an incremental image copy and then merge to a full backup on a separate location. So, the first order of business was to take a full backup. After that we immediately took an incremental. It took just about an hour, down from some 18+ hours earlier.

Here is some analysis. Looking at the backup of just one file - file#1, i.e. SYSTEM datafile:

select COMPLETION_TIME, USED_CHANGE_TRACKING, BLOCKS, BLOCKS_READ
from v$backup_datafile
where file# = 1
order by 1
/

The output:

COMPLETIO USE BLOCKS BLOCKS_READ
--------- --- ---------- -----------
18-AUG-08 NO 31713 524288
18-AUG-08 NO 10960 524288
20-AUG-08 NO 12764 524288
21-AUG-08 NO 5612 524288
22-AUG-08 NO 11089 524288
23-AUG-08 NO 8217 524288
23-AUG-08 NO 8025 524288
25-AUG-08 NO 3230 524288
26-AUG-08 NO 6629 524288
27-AUG-08 NO 11094 524288 <= the filesize was increased 28-AUG-08 NO 3608 786432 29-AUG-08 NO 8199 786432 29-AUG-08 NO 12893 786432 31-AUG-08 YES 1798 6055 01-SEP-08 YES 7664 35411

Columns descriptions:

USE - was Block Change Tracking used?
BLOCKS - the number of blocks backed up
BLOCKS_READ - the number of blocks read by the backup

Note, when the BCT was not used, the *entire* file - 524288 blocks - were
being read every time. Of course only a percent of that was being backed up
since that percentage changed; but the whole file was being checked.
After BCT, note how the "blocks read" number dropped dramatically. That is
the magic behind the dropped time.

I wanted to find out exactly how much I/O savings BCT was bringing us. A simple query would show that:

select sum(BLOCKS_READ)/sum(DATAFILE_BLOCKS)
from v$backup_datafile
where USED_CHANGE_TRACKING = 'YES'
/

The output:
.09581342

That's just 9.58%. After BCT, only 9.58% of the blocks of the datafiles were scanned! Consider the impact of that. Before BCT, the entire file was scanned for changed blocks. After BCT, only about 9.58% of the blocks were scanned for changed blocks. Just 9.58%. How sweet is that?!!!

Here are three representative files:

File# Blocks Read Actual # of blocks Pct Read
----- ------------- ------------------- --------
985 109 1254400 .009
986 1 786432 .000
987 1 1048576 .000


Note, files 986 and 987 were virtually unread (only one block was read). Before BCT, all the 1048576 blocks were read; after BCT only 1 was. This makes perfect sense. These files are essentially older data; so nothing changes there. RMAN incremental is now blazing fast because it scans less than 10% of the blocks. The I/O problem disappered too, making the database performance even better.

So, we started with some random I/O issue, causing a node failure, which led to increased time for incremental, which was tracjed down to a block change tracking file being suspended by Oracle silently without raising an error.

Takeaways:

The single biggest takeway you should get is that just because it is defined, don't get the idea it is going to be there. So, a periodic check for the BCT file is a must. I will work on developing an automated tool to check for non-use of BCT file. The tool will essentially issue:

SELECT count(1)
FROM v$backup_datafile
where USED_CHANGE_TRACKING = 'NO'
/

If the output is >1, then an alert should be issued. Material for the next blog. Thanks for reading.

Labels: , , , , , , , ,

Saturday, August 23, 2008

Diagnosing Library Cache Latch Contention: A Real Case Study

The other day, our data warehouse database was effectively down. The database was up and running; but all connection attempts to the database would just hang. Since connections just failed, the DBA couldn’t even get in to see if the sessions connected were running normally or not. The ETL team reported that the processing slowed down to a crawl. The DBA, as instructed, went on to check the wait events; but since she couldn’t even log in, she couldn’t do that. Interestingly, CPU was around 70%, normal for the time of the day; and I/O was about 90%, again normal.

So, in simple terms – she was stuck and she applied the age-old technique: when in doubt, reboot. She asked the SA to reboot the server. It took about 30 minutes; and after it cam backup and the database started everything seemed normal for about 10 minutes. And, then the same problem stuck – the database got unresponsive.

At that point, I was called for help. In this blog I am sort of trying to paint a picture of what I did in the next half hour or so to resolve the issue. I hope by giving this running commentary of the events, facts and figures, you will follow my thoughts as I went about diagnosing the issue.

Symptoms

(1) Database connections were hanging
(2) Connections on the server using SQL*Plus as sysdba was also hanging; so we couldn’t even examine the issue.
(3) The system couldn’t be rebooted everytime; every reboot means more work for the ETL folks to clean up the failed sessions.

Initial Analysis

Obviously something is holding some resource back in the database; but what? Unless we connected to the database, we couldn’t check. And since couldn’t check, we couldn’t remediate. It was a classic case of Catch-22.

Action

This is one of those cases where a backdoor entry to the database instance comes really handy. Most people are not aware of the option in SQL*Plus called “prelim”. This option connects to the SGA; but does not open a session.

(1) So, the first thing I did is to open a sessionless SQL*Plus session using the Oracle user:

$ sqlplus -prelim
SQL>

Note, it didn’t say anything familiar like “Connected to Oracle Database 10.2.0.3”, etc. All it showed was the “SQL>” prompt. That was because it didn’t actually connect to the database.

(2) Then I used the “oradebug” utility to analyze the SGA

SQL> oradebug setmypid
SQL> oradebug hanganalyze 12

This produced a tracefile in the user_dump_dest directory. The file wasn’t difficult to find, since it was the last file created. Even if I didn’t find the file, I could have used the process ID to find the file. The file would have been named crmprd1_ora_13392.trc, assuming 13392 was the process ID.

(3) Let’s examine the file. Here are first few lines:

*** 2008-08-23 01:21:44.200
==============
HANG ANALYSIS:
==============
Found 163 objects waiting for
<0/226/17/0x1502dab8/16108/no>
Open chains found:
Chain 1 : :
<0/226/17/0x1502dab8/16108/no>
<0/146/1/0x1503e898/19923/latch:>

This tells me a lot. First it shows that the SID 146 Serial# 17 is waiting for library cache latch. The blocking session is SID 226 Serial# 17. The latter is not waiting for anything of blocking nature. I also noted the OS process IDs of these sessions – 16108 and 19923.

(4) Next I checked for two more tracefiles with these OS PIDs in their names.

crmprd1_ora_16108.trc
crmprd1_ora_19923.trc

(5) I opened the first one, the one that is the blocker. Here are the first few lines:

*** 2008-08-23 01:08:18.840
*** SERVICE NAME:(SYS$USERS) 2008-08-23 01:08:18.781
*** SESSION ID:(226.17) 2008-08-23 01:08:18.781
LIBRARY OBJECT HANDLE: handle=c0000008dc703810 mtx=c0000008dc703940(8000) cdp=32737
name=UPDATE DW_ETL.FRRS_PROFILER SET CONSUMER_LINK = :"SYS_B_0", ADDRESS_LINK = :"SYS_B_1", ADDRESS_MATCH = :"SYS_B_2", PROC
ESSED=:"SYS_B_3" WHERE RNUM = :"SYS_B_4"
hash=a029fce7bb89655493e7e51a544592a4 timestamp=08-23-2008 00:10:23
namespace=CRSR flags=RON/KGHP/TIM/OBS/PN0/MED/KST/DBN/MTX/[504100d0]
kkkk-dddd-llll=0000-0001-0001 lock=N pin=0 latch#=10 hpc=0058 hlc=0058
lwt=c0000008dc7038b8[c0000008dc7038b8,c0000008dc7038b8] ltm=c0000008dc7038c8[c0000008dc7038c8,c0000008dc7038c8]
pwt=c0000008dc703880[c0000008dc703880,c0000008dc703880] ptm=c0000008dc703890[c0000008dc703890,c0000008dc703890]
ref=c0000008dc7038e8[c0000008dc7038e8,c0000008dc7038e8] lnd=c0000008dc703900[c0000008dc703900,c0000008dc703900]
LOCK OWNERS:
lock user session count mode flags
---------------- ---------------- ---------------- ----- ---- ------------------------
c0000008d079f1b8 c0000006151744d8 c0000006151744d8 16 N [00]
c0000008d4e90c40 c0000006151bcb58 c0000006151bcb58 16 N [00]
c0000008d0812c40 c0000008151a0438 c0000008151a0438 16 N [00]

(6) This is a treasure trove of information for debugging. First it shows the SID and Serial# (226.17), which confirms the SID we identified earlier. It shows the exact SQL statement being used. Finally it shows all the locks. I didn’t particularly care about the specifics of locks; but it gave me enough information to prove that the SID 226 was causing a wait for a whole lot of other sessions.

(7) My investigation is not done; I need to find out the sessions waiting for this. So, I searched the file for a section called “PROCESS STATE”. Here is a snippet from the file:

PROCESS STATE
-------------
Process global information:
process: c00000081502dab8, call: c000000817167890, xact: 0000000000000000, curses: c00000081519ef88, usrses: c000000815
19ef88
----------------------------------------
SO: c00000081502dab8, type: 2, owner: 0000000000000000, flag: INIT/-/-/0x00
(process) Oracle pid=370, calls cur/top: c000000817167890/c000000817167890, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 115 0 4
last post received-location: kslpsr
last process to post me: c000000615002038 1 6
last post sent: 0 0 24
last post sent-location: ksasnd
last process posted by me: c000000615002038 1 6
(latch info) wait_event=0 bits=20
holding (efd=4) c0000008d7b69598 Child library cache level=5 child#=10
Location from where latch is held: kglhdgc: child:: latch
Context saved from call: 13
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
291 (197, 1219468295, 197)
279 (197, 1219468295, 197)
374 (197, 1219468295, 197)
267 (197, 1219468295, 197)
372 (197, 1219468295, 197)
... several lines sniped ...
307 (15, 1219468295, 15)
181 (6, 1219468295, 6)
waiter count=58
Process Group: DEFAULT, pseudo proc: c0000008e03150d8
O/S info: user: oracrmp, term: UNKNOWN, ospid: 16108
OSD pid info: Unix process pid: 16108, image: oracle@sdwhpdb1

(8) This told me everything I needed to know. There were 58 sessions waiting for library cache latch held by SID 226. I also knew the OS Process ID and the SQL statement of the blocking session.

(9) At that time we engaged the Application Owner to explain what was going on. As he explained it, he issues the update statement in a loop. And that’s not all; he executes it in 8 different threads. No wonder we have had library cache latch contention. So, we had to track 8; not just one session. We trudged on. All the sessions dumped their information. So, I searched the directory for some other files with the same issues:

$ grep “UPDATE DW_ETL” *.trc

(10) And I found 9 more sessions (or, rather, processes). Here is a snippet from another file:

350 (167, 1219470122, 167)
197 (167, 1219470122, 167)
waiter count=185
Process Group: DEFAULT, pseudo proc: c0000008e03150d8
O/S info: user: oracrmp, term: UNKNOWN, ospid: 16114

This process had 185 waiters! Ouch!

(11) Now comes a decision point. I knew who is blocking and who were being blocked; although I didn’t yet know what latch exactly is being contented for. I could have dumped the library cache latches to get that information; but the application owner volunteered to terminate the sessions. The application, fortunately, was restartable. So, we decided to kill all of these errant sessions on the unix prompt.

$ kill -9

(12) After killing a few processes, the database started responding. After killing all of them, the database wait events came back to completely normal. Connections were established and applications started behaving normally.

After step 11, I could have used the library cache dump to examine the exact library element in the center of the contention; but that’s a topic for another blog.

Takeaways

(1) When something seems to be “hung”, don’t get hung up on that. A session almost always waits for something; rarely it is just “hung”. You should check what it is waiting for by selecting the EVENT column of V$SESSION (10g) or V$SESSION_WAIT (9i).
(2) When you can’t logon to the database to get the information, try using oradebug command.
(3) To use oradebug, you should use SQL*Plus. Since you can’t login, use “sqlplus -prelim" to get the SQL prompt.
(4) Use oradebug setmypid to start the oradebug session, and then use oradebug hanganalyze to create a dumpfile of all hang related issues
(5) Use oradebug help to see all oradebug commands

Labels: , , , , , , ,

Tuesday, August 19, 2008

Put External Procedures on a Different Listener

As a best practice, I recommend placing external procedures, if you use them, on a different listener.ora file. Why so?

External Procedures are actually O/S level programs executed by the listener, as the user "oracle" (or whatever the username for Oracle software is).

Let me repeat that: external procs allow database sessions to execute O/S level programs with the privilege level of the Oracle user! Do you see a problem here? Since the programs run as "oracle", they can do anything the user can do at the command line: "ls -l", "rm listener.log", or even "rm datafiles". Do you want that? Of course not.

Unfortunately there are several vulnerabilities that exist which exploit this particular feature. the CPU patches address some of them; but the bad guys respond by exposing even more holes. In this cat and mouse game, the best thing, in my opinion, is to reduce or even eliminate the possibility if feasible. If you don't use external procs (most people don't), why put them on the listener.ora file? Unfortunately the default config puts the external procs; so you should remove them and you can do that easily.

If you must use external procs, then I suggest using a different listener for that. Doing so allows you to shutdown that functionality while allowing normal database connectivity. If you perceive an imminent threat, you can take evasive action by shutting down the ext proc listener. You can't do that if the external jobs are on the same listener.

Again, consider this: what is the harm is doing that? Nothing. and what is the benefit? I just showed you, however small that may be. So, if you lost nothing and potentially gain something, why not do it?

Labels: , , , , , ,

Why should you set the ADMIN_RESTRICTIONS_LISTENER to ON

Recently someone probably went through the slides of my session on "Real Life DBA Best Practices" and had a question on OTN forum why I was recommending setting the parameter to ON, as a best practice. I responded on the forum; but I feel it's important enough to put it here as well.

As a best practice, I recommend setting this parameter to ON (the default is OFF). But as I profess, a best practice is not one without a clear explanation. Here is the explanation.

Over the period of time, the Oracle Database has encountered several security vulnerabilities, some of them on the listener. Some are related to buffer overflow. others involve unauthorized access into the listener process itself. Some of the listener access exploits come from external listener manipulations. Did you know that you do not need to even log into a server to connect to the listener? As long as the port the listener is listeneing on is open (and it will be, for obvious reasons) you can connect to the listener from a remote server.

In 10g, Oracle provided a default mechanism that does not require password from the oracle user manipulating the listener via online commands. Having said that, there have been bugs and there will be. Those vulnerabilities usually get fixed later; but most often the fix does not get to the software quickly enough.

So, what should you do to protect against these vulnerabilities? I consider a simple thing to do is to remove the possibilty altogether; and that's where the admin restrictions come into picture. After setting this parameter, you can't dynamically change the parameter. So, even though a connection is made somehow from an outside server - bug or not - eliminating the possibilty altogether mitigates the risk. And, that's why recommend it.

Let's ponder on the problem a little bit more. Is that a problem is setting the parameter? Absolutely not. When you need to change a parameter, you simply log on to the server, update the listener.ora and issue "lsnrctl reload". This reloads the parameter file dynamically. Since you never stopped the listener, you will not see unsuccessful conection requests from clients. So, it is dynamic. If you are the oracle user, then you can log on to the server; so there is no issue there.

I advocate this policy rather than dyanamic parameter changes, for these simple reasons:

(1) It plugs a potential hole dues to remote listener vulnerability attacks, regardless of the probabilty of that happening.
(2) It forces you to make changes to listener.ora file, which shows the timestamp.
(3) I ask my DBAs to put extensive comments on the parameter files, including the listener.ora file, to explain the change. I also ask them to comment a previous line and create a new line with the new value, rather than updating a value directly. This sort of documentation is a gem during debugging. Changing in the parameter file allows that, while dynamic change does not.

So, I don't see a single functionality I lose by this practice; and I just showed you some powerful reasons to adopt this practice. No loss, and some gain, however small you consider that to be - and that's why I suggest it.

As I mentioned earlier, a best practice is not one without a clear explanation. I hope this explanation makes it clear.

Labels: , , ,

Saturday, August 09, 2008

Resource Manager I/O Calibration in 11g

You are considering a new platform for the Oracle database; or perhaps worried about the I/O capabilities of the existing platform. How do you know if the I/O drivers, LUN creation, etc. are capableof handling the I/O load? This is a very important question; I/O is the single most important thing in an Oracle database that can cripple an application, however large or small.


Here comes the Calibrate I/O tool in Oracle Database 11g. There are some really cool features of 11g that never made into the blitzkrieg of Oracle Marketing. This is one of them. This is a procedure in the DBMS_RESOURCE_MANAGER package. It drives some I/O to the database files to test the throughput and much more metrics.


So, what is the big deal? This type of tests can be done by simple unix system as well; or even by little apps from the storage vendors.


The important difference is that the tool uses Oracle database software libraries to drive the I/O; not some generic I/O generator. This capability is important - you are testing the I/O for an Oracle database, right?

So, let's get on with the how-it-works part of the post. You need to have asynch I/O capability otherwise this will fail with the following message:



DECLARE*

ERROR at line 1:ORA-56708: Could not find any datafiles with asynchronous i/o capability

ORA-06512: at "SYS.DBMS_RMIN", line 453ORA-06512: at "SYS.DBMS_RESOURCE_MANAGER", line 1153ORA-06512: at line 7

To enable asynch I/O, set two values in the init.ora file (or spfile) and recycle the instances:



disk_asynch_io = true



filesystemio_options = asynch

The first one is default; so you may not have to change it. The default of the second one is NONE; so you will probably have to change it. After setting these values and recycling the instance, confirm at the file level:



select name,asynch_io from v$datafile f,v$iostat_file i 

where f.file#=i.file_no 

and (filetype_name='Data File' or filetype_name='Temp File')

/









NAME ASYNCH_IO


-------------------------------------------------- ---------
+DG1/labrac/datafile/system.261.648437429 ASYNC_ON

+DG1/labrac/datafile/system.261.648437429 ASYNC_ON

+DG1/labrac/datafile/sysaux.262.648437447 ASYNC_ON


+DG1/labrac/datafile/undotbs1.263.648437455 ASYNC_ON

+DG1/labrac/datafile/undotbs2.265.648437477 ASYNC_ON


+DG1/labrac/datafile/users.266.648437523 ASYNC_ON

+DG1/sras_01 ASYNC_ON

This shows the files have Asynch I/O enabled.

Caution: even if the above query returns ASYNC_ON, you may not be able to run the calibrate process. The culprit is the number of asynch I/O slots available in the system. If there are no free slots, the files will not use asynch I/O and you will get ORA-56708. To find out the maximum avalable asynch I/O slots:


$ cat /proc/sys/fs/aio-max-nr
65536

To find out how many are being used:



$ cat /proc/sys/fs/aio-nr
65536

Well, that sort of tells it all. Of the 65536, all are being used; so we are getting ORA-56708. The reason was I was runninganother database on the server, which was consuming all those slots. After shutting down the other database, I checked the utilization again:

$ cat /proc/sys/fs/aio-nr


35456


The value dropped substantially and allowed my calibration to proceed.

Now start the calibration process. The precedure CALIBRATE_IO of the package DBMS_RESOURCE_MANAGER accepts two parameters:

  • NUM_PHYSICAL_DISKS - the number of physical disks you have. Remember: this is the number for disks, not LUNs.
  • MAX_LATENCY - the maximum latency for the disk access (in milliseconds)

And it has three OUT parameters:

  • MAX_IOPS - the maximum number of I/O Operations per second
  • MAX_MBPS - the maximum Megabytes per second
  • ACTUAL_LATENCY - the actual latency observed during the test

Since this has OUT parameters, we will have to write a little PL/SQL program to get the values.

set serveroutput on

declare

l_latency integer;

l_iops integer;

l_mbps integer;

begin

dbms_resource_manager.calibrate_io (

18, /* # of disks */

10, /* max latency */

l_iops, /* I/O Ops/sec */

l_mbps, /* MBytes/sec */

l_latency /* actual latency */

);

dbms_output.put_line ('I/O Ops/sec = ' l_iops);

dbms_output.put_line ('Actual Latency = ' l_latency);

dbms_output.put_line('MB/sec = ' l_mbps);

end;

/


When this is going on, you can check the status:






SQL> select * from v$io_calibration_status

2 /

STATUS
-------------
CALIBRATION_TIME
----------------
IN PROGRESS


This shows that the operation is still going on. After some time you will see the status change to READY. Here is the output from my system (a RAC database with 2 nodes, each with 4 CPUs, with a Dell NAS, with 28 disks).


I/O Ops/sec = 122
Actual Latency = 15
MB/sec = 13

PL/SQL procedure successfully completed.


So, it shows us that the actual latency is somewhat like 15 milliseconds. The system can sustain a maximum of 122 I/O Ops per second with a throuput of 13 MB/sec.


You can check the results of the I/O Calibration run later from the data dictionary view DBA_RSRC_IO_CALIBRATE. Let's check this view. Since it has only one row and there are many columns, I used the very handy Print Table utility by Tom Kyte.



SQL> exec print.tbl('select * from dba_rsrc_io_calibrate');

START_TIME : 09-AUG-08 10.49.34.440392 AM
END_TIME : 09-AUG-08 11.04.33.446146 AM
MAX_IOPS : 122
MAX_MBPS : 13
MAX_PMBPS : 4
LATENCY : 15
NUM_PHYSICAL_DISKS : 28



The columns are self explanatory, in line with the previous output, except the MAX_PMBPS, which shows the maximum sustainable I/O in MB/s for a single process.

This output is great; but it is an aggregation. You can check another view V$IOSTAT_FILE for file-level metrics in calibration run:


select file_no,small_read_megabytes,small_read_reqs,

large_read_megabytes,large_read_reqs

from v$iostat_file






Output:







FILE_NO SMALL_READ_MEGABYTES SMALL_READ_REQS LARGE_READ_MEGABYTES LARGE_READ_REQS
---------- -------------------- --------------- -------------------- ---------------
1 267 33234 230 256 1 0 6 0 2
2 514 50337 318 331
3 121 15496 118 118
4 44 5661 43 43
5 1 83 0 0 6 46 5846 45 45

In addition to the columns, this view has a lot more columns. Here is the output for a single datafile:


SQL> exec print.tbl('select * from v$iostat_file where file_no = 1')


FILE_NO : 1


FILETYPE_ID : 2


FILETYPE_NAME : Data File


SMALL_READ_MEGABYTES : 267


SMALL_WRITE_MEGABYTES : 11


LARGE_READ_MEGABYTES : 230


LARGE_WRITE_MEGABYTES : 0


SMALL_READ_REQS : 33235


SMALL_WRITE_REQS : 1083


SMALL_SYNC_READ_REQS : 33013


LARGE_READ_REQS : 256


LARGE_WRITE_REQS : 2


SMALL_READ_SERVICETIME : 2650538


SMALL_WRITE_SERVICETIME : 10676


SMALL_SYNC_READ_LATENCY : 2647154


LARGE_READ_SERVICETIME : 2745759


LARGE_WRITE_SERVICETIME : 10


ASYNCH_IO : ASYNC_ON


RETRIES_ON_ERROR : 0


-----------------


Using these merics you can decide if the I/O is sufficient or still needs more tweaking.

Recently I used it on a system that required Oracle 10g. But I installed 11g on that server and ran this calibration test, which unearthed some inherent issues with the I/O system. Once rectified, and reconfirmed with the calibration tool, it was ready for the 10g installation.


Labels: , , ,

Thursday, August 07, 2008

Full Table Scans on RAC

The phone goes off, yet again, and this time it seems to be a serious problem. Here are the symptoms: all the applications seems to be working fine albeit at a lesser performance level, except one app, which has crawled to a halt. The app has abended since the queries are timing out. This is, by the a way a RAC database.

This is rather interesting - only one application is slow, instead of all the applications. If this were a systemwide issue, then all the apps would have been affected; not just one. The DBA checked the wait events, as she should have and the predminant events were "gc buffer busy". Well, this is a systemwide issue as a result of saturated interconnect, isn't it - she muses; then how come application A is the only one affected?

So, this is a big clue: the waits are RAC related - "gc buffer busy", yet only one application is slow. First, I got the SIDs of the sessions waiting for that event:

select SID, sql_id, sql_child_number
from v$session
where event = 'gc buffer busy';

After I got the SIDs, I checked the SQL statements issued by the sessions:

select sql_text
from v$sql
where sql_id = ''
and child_number = 24

The results were interesting - all of them were updating or selecting from a single table. Next I checked the execution plan for the SQLs:

select * from table (
dbms_xplan.display_cursor (
'9h1r67dtjxfbs',24,'BASIC'
))
/

PLAN_TABLE_OUTPUT
------------------------------------------
EXPLAINED SQL STATEMENT:
------------------------
select ...

Plan hash value: 871755554
---------------------------------------------------------------------------------
Id Operation Name
---------------------------------------------------------------------------------
0 SELECT STATEMENT


The output tells it all - there is a full table scan going on for a 14 million row table. Well, why? I checked for the presence of indexes and yes, there is an index. Why isn't the index being used?

This is yet another case of "why my index is not being used". It happens at a relatively lesser frequency; but happens nevertheless.

Upon examination I discovered that the table is range partitioned and the index is local. Next I checked if the index has become unusable somehow:

select partition_name, status
from dba_ind_partitions
where index_name = 'the_inex_name';


PARTITION_NAME STATUS
------------------------------ --------

PT_SEP2009 USABLE
PT_OCT2009 UNUSABLE
PMAX UNUSABLE

Note the status of the last two partitions - UNUSABLE. Why so?

I checked the last_ddl_time:

select last_ddl_time
from dba_objects
where object_name = 'the_index_name';

And voila! The index was altered today. Why? The answer came quickly: there is a partition maintenance script that splits the PMAX partition to PMAX and a partition named PT_OCT2009. The split worked fine; but the DBA forgot to add a line like:

alter index this_index_name rebuild partition PMAX;

So, the partition remained unusable. Remember, this is a local index; so the clause MAINTAIN GLOBAL INDEXES will not do anything. The partition must be explicitly rebuilt. Since the partition was not usable, the query plan suddenly changed to full table scan, as I saw earlier. Since this caused a lot of buffer gets, the buffer pool of all the instances of the RAC cluster got flloded with the blocks of this table, causing gc buffer busy waits. So, even though the wait event is RAC related, the issue was due to abnormally large buffer gets and was limited to a specific application only. Eventually, the issue would have saturated the interconnect and affected other apps; but since we caught it early, that event was averted.

I rebuilt the index partitions by explicitly issuing the alter index commands. After that, the queries started using the index and, of course, the gc waits disappeared.

Lessons:

(1) just because you see "gc" waits, don't automatically assume it is RAC specific issue. The underlying issue may have manifested itself as the RAC component issues.
(2) Add to the list of "why oracle is not using my index" one more: unusable index partition(s)
(3) When you split partition of a table, in addition to the clause "maintain global indexes", issue the rebuild of the split partitions of local indexes.

Labels: , , , ,

Monday, August 04, 2008

Issues in RAC with Wrong Listener Configuration

The one thing that frustrates me the most is the lack of clear communication. Without clarity in what is being transpired, it's impossible for the receiver to understand the meaning - not just the superficial content but the subtle undertones as well. And it fails to materialize a manageable goal. I could be rambling about humans, of course; but the issue in communication goes beyond - to the things we create - writings, books, articles, poems, software programs, whatever. Oh, did I mention software programs?

Well, in this case I am talking about the software programs we all live by - the Oracle database; more specifically the RAC. Recently we suffered an outage. The reason turned out to be a simple issue of human forgetfulness; but the frustrating part was the lack of communication from the CRS about the issue. So, here is how it went.

This is a three node RAC. We applied the RAC bundle patch in a rolling fashion. Node 3 was brought down, patch applied and was brought up - everything was hunky dorey.

Then node 2 was brought down, patch applied and was brought up; but here was a little problem - the VIP didn't come up. The VIP relocated to node 3. But all the 23 services defined on the node were online. Strange. We assumed the issue was with the bundle patch ina rolling manner; so we decided to go ahead and do the deed on node 1, the remaining one.

When node 1 was brought up, we saw the real issues. The VIP for node 1 came up on node 3 and none of the services were up on node 1. Worse, the services that were up on node 2 suddenly went offline.

A lot of frantic calls to Oracle Support, request for esaclation to duty manager followed, along with the customary yelling and sioghing by the customer. At the end of 2 hours and with a Sev 1 SR, we hadn't gone anywhere.

At this time I took an executive decision. I asked the DBA to issue an ALTER SYSTEM comamnd on node 2 to bring up the services manually, bypassing CRS.

ALTER SYSTEM SET SERVICE_NAME = 'X','Y','Z',....

All the services where up on the instance 2 at this time, confirmed by SHOW PARAMETER SERVICE_NAMES command; but they still showed up as OFFLINE in crs_stat command. That meant, the instance had no issue with the services; the CRS had.

Then we issued a lsnrctl reload command to relaod the listener (note, we didn't use CRS to manipulate the listener - a very important clue you will see why later). The listener started fine with all service names. So far so good. We tried a few connections from the client and they connected fine. We did the same for node 1 and it worked too.

So, at this time, we had the services all up and running; but CRS had no idea about them. That meant the service failover probably wouldn't occur.

Next, we reloacted the VIP using the crs_relocate command. Al the VIPs (of node 1 and node 2) came back to the respective nodes. That was strange as well. If there was an issue with VIPs related to the networking software or hardware, how could they come back?

All these time we analyzed all the logs - the css, crs, racg alogs along with the alert log of the CRS (this is 10.2; so we have the alert log for CRS as well). Nothing there to indicate why it happened. And that's the basis of my rambling about lack of communication.

So, as always we went through the changes in the last few weeks. One of the crucial changes was the introduction of local_listener and remote_listener parameters in init.ora. These parameters can be placed as absolute values - with the host name, ports, etc. - or as the TNS alias entries. We had chosen the latter. The local_listener parameter was ok; but the remote_listener was listener_xxx.

Here was the entry for listener_xxx on tnsnames.ora:

listeners_xxx = (address_list = (address = (protocol=tcp)(host=sxxx1)(port=1521)) (address = (protocol=tcp)(host=sxxx2)(port=1521)) (address = (protocol=tcp)(host=sxxx3)(port=1521)))

There is nothing wrong with the syntax; but note the host value; sxxx1, sxxx2 and sxxx3 are all staging servers; and this is a production cluster. The communication does not work between prod and staging. So, the remote listener parameter pointed to hosts that were not even reachable! By the way, the entries were correct in node 3.

This caused the services to fail when started. Well, then how come they came up fine with ALTER SYSTEM?

My guess is since the ALTER SYSTEM command didn't go through the listener, it was not affected. CRS, on the other hand, goes through the listener.

Fine; but how come my lsnrctl reload command work? Again, my guess is, that comamnd simply starts the listener; the remote listener parameter in the database doesn't affect it. Since the local_listener parameter was correct, the services were successfully registered.

What about the VIP? How did the relocate work? I can only guess, of course. The conjecture is that the VIP is forced to go to the preferred node unless there is some sort of hardware issue, which was not in our case. So the relocate worked.

Phew! all these due to the sloppy work by a DBA. While this was deplorable, the issue with human error will continue to exist. Typos and wrong copy-and-paste will happen, no matter how much severe we make the punishment. I strongly feel CRS should have pointed out this error somewhere. In all that gets spewn out on CRS and CSS logs, how much is actually decipherable? Almost 99% has no meaning attached to it, at least for mere mortals. The idea behind the alert log of the CRS was to show the customer relevant issues in a common log; but there was no indication of this issue.

See the point about communication?

Labels: , , , ,

Wednesday, July 30, 2008

My Little World in Open World

Oracle Open World is coming up in just two months ... actually sooner. It will be quite active for me.

First, I am sharing the stage with a panel with some very distinguished folks - Lewis Cunningham, Eddie Awad, Mark Rittman, Tim Hall, Hans Forbich, Rich Niemiec, and Bradley Brown. This is a real session - S300480 "Oracle ACE Director Birds-of-a-Feather Tips and Techniques" but the format is free question and answer session.

On one hand I feel deeply honored and thrilled to be on the same forum that reads like a who's who in the Oracle community. But on the other hand, the abstract says "Attendees are encouraged to submit their hardest questions in a game of Stump the Chump" Huh? Anyway, it's going to be thrilling nonetheless! Come see us at 10:30 AM on Sep 25th Room 301 Moscone South.

Then, I have three sessions, yes three! All three were selected by user voting in Oracle Mix.

  1. Tue the 23rd at 5:00 PM: Real World Best Practices for DBAs (S301683) Room 236 Moscone South
  2. Wed the 24th at 5:00 PM: 11g New Features for DBAs (S301684) Room 236 Moscone South
  3. Thu the 25th at 12:00 noon: Partitioning: When, Which and How (S301710) Room 200 Moscone South
I also have an ACE Directors' Meeting someday, probably on Sunday. Well, I will have my hands full. As always, Exhibition hall will be a place I will spen a lot of time. I get to know about new products and technology, some of which actually debut during Open World. So, it leaves a tough question - which sessions I should attend.

Labels: , , , , ,

Wednesday, January 16, 2008

Comparing CLOBs

What if you have two CLOB columns to compare? A construct like

select someColumn
from tab1
where col1 != col2

will not work.

This is how you will need to compare:

select AUDIT_PAYLOAD_TEXT_DEC, AUDIT_PAYLOAD_TEXT
from audit_payload
where dbms_lob.compare(AUDIT_PAYLOAD_TEXT_DEC, AUDIT_PAYLOAD_TEXT) != 0

The return value is non-zero if the comparison fails.

Labels: , , ,

Tuesday, July 31, 2007

Auditing Nuances

I had posted it earlier on my other blog - Oracle Security Musings (http://www.proligence.com/blog/index.html). I reproduce here.

This came up in the Oracle-L maiilng list, which I used to frequent and I responded to this thread.

In regular Oracle auditing (via the AUDIT command), what happens when the statement that causes the audit trail is rolled back? Does the trail disapper, too?

The answer may startle you. It doesn't! Audit entry generation is not part of the transaction; it might have been implemented through autonomous transactions or some other means; but it is outside the scope of the tranasaction. Here is a little test to prove it.

Make sure that audit_trail is set to DB.

create table atest1 (col1 number, col2 number, col3 number, col4 number);
insert into atest1 values (1,1,1,1);
audit update on atest1 by access;
update atest1 set col1 = 2;

Do NOT commit.

From another session as user SYS,

select action_name, obj_name, ses_actions, returncode from dba_audit_trail;

ACTION_NAME OBJ_NAME SES_ACTIONS RETURNCODE
----------- -------- ----------- -----------------
UPDATE ATEST1 0

The entry is there even if the transaction is not committed.

Now rollback the update and check the audit trail; it will be there.

If the auditing option were BY SESSION, instead of action, the ACTION_NAME would have been SESSION REC and the column SES_ACTIONS would've been"----------S-----". Beware of this side-effect of the audit functionality. It may not be what you wanted to develop!

A recent picture

someone observed that my picture on OTN appears much younger than what I really look. Well, the OTN picture was taken some 4 years ago. Here is one, just days ago.

I hope I appear "old" enough!


RAT in 11g Replacing QA Team?

We are in the process of moving to a different hardware and cluster technology and the quesiton on everyone's mind is: will it perform.

Good question. The QA team has been helping us in definign some QA. Then, one day, the manager of the QA team sees my interview on eWeek: and sends me a quizzical email: "so, you think my team has outlived its useful life?" or words to that effect. and he also wants to know how RAT really is.

Here is my response:

It’s really, really cool! It does not replace what you are doing for [the current buisness project] or for that matter any future development efforts. It does however help in situations like the data center migrations as we are doing now where we can capture the workload from [primary site] and replay every piece in [new site] to see if the hardware and design keeps up or crumbles. Currently we don’t do that for a variety of reasons – the enormous time requirement to create synthetic transactions for [third party tool] being the primary one. RAT will alleviate that and make that testing possible and realistic.

So, your team’s job is more value added; not replaced.

He may have accepted it with a grumble.

Labels: , , , ,

Standby Database Using RMAN

I learn something about Oracle everyday and in the last few days the learnings have been rather painful. I have been trying to create a physical standby database on version 10.1.0.4 on HP-UX. The target is a two node RAC and the standby is a single instance database. the reason for the standby was to move to an ASM storage on a differerent server.

I will try to produce a whole cookbook on the effort; but in the meantime let me explain the pain points.

Like everyone else, I scanned the books, articles, MetaLink, the internet, the socks drawer and broom closet; but couldn't find a good write up on how to perform this task in real life.

For instance, most of the recipes suggested taking a cold backup, something that was impossible since the target is a production database.

Others recipes suggested taking a special backup of the database using RMAN with a special clause - WITH CONTROLFILECOPY AS STANDBY (or something like that; I forgot th exact syntax). Well, duh! the backup is already taken and I can't re-exeute this "special" backup. So down with Option 2.

There was no clear document on how to do it. So, here was my first attempt in a nutshell:

(1) Mount the RMAN backup filesystems on the standby server using NFS
(2) On the standby server, create the pfile
(3) start the instance (nomount)
(4) add the service to the listener
(5) add the service to the tnsnames.ora
(6) on the primary, add the tns entry for the standby
(7) On the primary issue the command

RMAN> backup current controlfile for standby format '/backup/c.ctl';

(8) create the rman file:

connect target /
connect auxiliary sys/manager@odssby
connect catalog rman/rman@catdb
run {
set until scn = 71196399221;
duplicate target database for standby
dorecover;
}

(9) and now comes the golden moment. I run it:

$ rman @r1.rman

and it fails after some time with the error:

RMAN 6024 no backup or copy of the controlfile found to restore

Well, this is wierd. The controlfile has been created on the primary as a standby controlfile. This is also recorded in the catalog. So, what's the deal with the "no backup of controlfile"?

Some Metalink articles even suggested something like

run {
restore controfile from '/backup/c.ctl';
replicate controlfile from ....
}

this could not have worked for me, since I am not creating a new database; I'm creating a standby one.

So, I delvedinto some research. I will spae you the details and cut straight to the chase. The issue was with the time when the controlfile backup was taken. In my case, here is what I did:

Time -> ----+----T1---------T2----------
Arc Log Controlfile
Backup Backup
Taken Created

This cuased the recovery to start at a point earlier to T1, where there was no controlfile backup for standby. Naturally the restore failed.

The solution: simple. Just took another backup of archived logs after time T2. Once the backups were succefully created on the disk, I ran the rman script again and voila! everything works.

The real issue is how RMAN displays messages. Instead of producing a relevant message that points to the issue, it produces a rather cryptic message that gives the impression that some controlfile backup is not present. Well, perhaps in 12g, becuase I know that is not available in 11g.

Resuscitating the Comatose

Well, my foray into world of bloging didn't have a great fanfare but I didn't expect it to die down so soon. Here is my attempt at Blogging version 2 and I intend to keep it up.

Monday, May 30, 2005

"What is a DBA?"

It has been a while since I last posted a blog. Lethargy is mostly the reason behind it; but lack of what to say has been the only one whenever I sat down to inscribe. Watching others post similar ideas is definitely an encouragement and right now the collective thrust of that encouragement is propelling me and here I go ....

This is a question my father asked me twelve years ago when I got my first production support DBA job in Oracle. "An administrator?", he sighed away, completely disapprovingly. He probably expected his only some to be something of value, not exactly a non-descript acronym which included the term "administrator", which is a glorified term for a clerk. My mother was more hopeful, which appeased me a bit; but it quickly evaporated when I learned she would have accepted me in any profession! Several years later I met the woman who would later be my wife, and the same question also puzzled her. Over the period of year several people - my family, friends, business acquaintances, colleagues, even some of my own bosses have fielded the question; and I don't think anyone has been satisfied with a complete comprehensible answer.

Do I develop program? No, not really! Oh, I do architectural stuff, no? Kinda; but not the entire system. So, where does that leave me? Taking care of database somewhat defines it; but for most people database is a black box; taking are of it as as clear as making sense of world hunger. Even for people who do understand what a database is, "taking care" is not something they fathom, or even want to. So time and gain, I have been confronted with the question of my usability and what I actually contribute. I don't produce nice webpages, nothing that even remotely resemble an "application" that people seem to accept; in fact, I don't produce anything. The database hums along just great; so, what exactly do I do every day?!!!

This is not a musing or even anecdotal comments; it has been asked to me and my superiors in a rather unflattering manner.

Over the years, the definition has changed a lot; my role has not necessarily remained the same. So, a rather simplistic definition of database-care is not going to cut it; I needed a more clear and universally acceptable explanation of what I do and what role I play in the business solution delivery process.

I understand that I'm not alone; there are a lot more like me who are asked this question, perhaps at this very moment. Even if they are not overtly expressed in so many words, the questions seems to pop up rather covertly during meetings, happy hours, company parties and sometimes during performance appraisals! No, this is not a rambling! I want to start a discussion on how we DBAs face this question from people all around us - people who matter in our lives and our careers. I will take the first stab in the next post. Please post your thoughts on this topic as you see fit.

Regards,

Arup

Saturday, January 11, 2003

So, here I am, with my very own blog. I hope to post here my thoughts, sentiments, fights and agreements on a varietyof Oracle DBA related work.