Tuesday, December 29, 2009

Instance_Number is busy Message during Standby Instance Startup

Recently I encountered a situation that demonstrates how, in an Oracle database, an error may stem from a very unrelated cause. A DBA was building a physical standby database for an upcominmg training. The two servers he was using were part of a RAC cluster; so the Oracle binaries were already there. He decided to use the same ORACLE_HOME for the new databases as well - a quite logical decision to savbe on space and administration issues. He created a primary database on server n1 and a standby database on the server n2. Follooing the usual manual procedure in building the standby database, he copied the pfile from primary database, modified the parameters and brought up the standby instance in nomount mode on the server n2.

SQL> startup nomount pfile=initSTBY.ora

But it refused to come up, with the following error:

ORA-00304: requested INSTANCE_NUMBER is busy

Alert log showed:

USER (ospid: 14210): terminating the instance due to error 304
Instance terminated by USER, pid = 14210

This was highly unusual. The primary and standby both were non-RAC; there was no instance number concept in a non-RAC database. By the way, the RAC instance on that server (or on the server n1) was not running; so there was no question of any conflict with the RAC instances either. The primary database was called PRIM while the standby was called STBY - eliminating the possibility of an instance name clash as well. And this error came while merely trying to start the instance, not even while mounting - eliminating the standby controlfile as a cause.

The error 304 showed:

00304, 00000, "requested INSTANCE_NUMBER is busy"
// *Cause: An instance tried to start by using a value of the
// initialization parameter INSTANCE_NUMBER that is already in use.
// *Action: Either
// a) specify another INSTANCE_NUMBER,
// b) shut down the running instance with this number
// c) wait for instance recovery to complete on the instance with
// this number.

Needless to say, being for a non-RAC database there was no "instance_number" parameter in the initialization parameter file of primary or the standby. So, the suggestions for the resolution seemed odd. MetaLink provided no help. All the ORA-304 errors were related to RAC with the instance_number mismatch.

As it always happens, it fell on my lap at this time. With just days to go live, I had to find a solution quickly. Long hours of troubleshooting, tracing the processes and examination of occasional trace files did not yield any clue. All the clue seemed to point to RAC, which this database was not. The Oracle Home was a RAC home, which meant the oracle binary was linked with the "rac" option.

So, the next logical step was to install a new Oracle Home without the rac option. After doing so, I tried to bring up the instance, using the new ORACLE_HOME and LD_LIBRARY_PATH variable; but, alas, the same error popped up.

Just to eliminate the possibility of some unknown bug, I decided to put an instance_number parameter, setting it to "1", from the default "0". The same error. I changed it to "2", again, the result was the same error.

Although this didn't help, at least it gave a clue that the error was not related to instance_number. The error message was clearly wrong. With this in mind, I went back to the basics. I went through the alert log with a fine toothed comb, scanning and analyzing each line.

The following line drew my attention:

DB_UNQIUE_NAME STBY is not in the Data Guard configuration

This was odd; the db_unique_name STBY was not defined in DG configuration. [BTW, note the spelling of "unique" in the message above. That is not what I typed; it was a copy and paste from the actual message in the alert log. Someone in Oracle Development should really pay atytention to typos in messages. This is clearly more than a nuisance; what if some process scans for db_unique_name related errors? It will not find the message at all!]

Checking the dg configuration, I found that the DBA has correctly defined the primary and standby names. In any case, Data Guard has not been started yet; this is merely at the instance startup - why is it complaining for data guard configuration at this time.

Perplexed, I resorted to a different approach. I renamed the pfile and all other relevant files. Then I built the standby myself, from scratch - using the same names - PRIM and STBY. And this time, everything worked fine. The instance STBY did come up.

While this solvbed the urgency problem, everyone, inclduing myself, wanted to know what the issue was in the earlier case where the DBA had failed to bring up the instance. To get the answer, I compared the files I created with the DBA created when tried and failed. Voila! The cause was immediately clear - the DBA forgot to put a vital parameter in the pfile of the standby instance:

db_unique_name = 'STBY'

This parameter was absent; so it took the default value as the db_name, which was "PRIM". This caused the instance to fail with a seemingly unrelated message - "ORA-304 Instance_number is busy"!

Learning Points
  1. In Oracle, most errors are obvious; but some are not. So, do not assume the error message is accurate. If all logic fails, assume the error messsage is wrong, or at least inaccurate.
  2. Instead of over-analyzing the process already followed, it may make sense to take a breather, wipe out everything and start fropm scratch. This is evben mor effective when someone else does it, offering a fresh approach and possibly not repeating the same mistakes.
  3. Finally, the issue at hand: if you do not define db_unique_name parameter in the standnby instance, you will receive ORA-304 during instance startup.

Hope this was helpful. Happy New Year, everybody.

7 comments:

Joel Garry said...

Sure enough:

/oracle/app/oracle/product/10.2.0/db_1/bin$ grep UNQIUE oracle
Using log_archive_dest parameter standby_archive_dest valueLOG_ARCHIVE_DUPLEX_DESTWarning: LOG_ARCHIVE_DUPLEX_DEST invalidLOG_ARCHIVE_DUPLEX_DEST destination string exceeds length limitLOG_ARCHIVE_DUPLEX_DEST destination cannot be translatedWarning: LOG_ARCHIVE_DUPLEX_DEST deferredLOG_ARCHIVE_DUPLEX_DEST destination duplicates %sbuffer for concatenated stringskcrrdini.1kcrrdini.2Archive destination %s invalidatedDB_UNQIUE_NAME %s is not in the Data Guard configurationDB_UNIQUE_NAME attribute is requiredenablealternateUsing %s parameter default value as %*s%s: Warning; less destinations available than specifiedby LOG_ARCHIVE_MIN_SUCCEED_DEST init.ora parameterUsing %s parameter default value as %.*sUsing %s parameter default value as %sLOCATION=LOG_ARCHIVE_CALLOUT buffer_LOG_ARCHIVE_CALLOUTChanging _LOG_ARCHIVE_CALLOUT from '%s' to '%s'

word: dimagie

The Human Fly said...

Interestingly, few weeks ago, I had similar story but in a different way than yours. I was trapped into similar situation when I was trying to restore a database twice on two different nodes of our 6 node RAC environment with similar db name. After wasting couple of hours, I set the db_unique_name to the second restore and it worked perfectly. I must agree with you that the ORA error definitely misleads.

Happy holidays.

Jaffar

20123 said...

我們不是因為快樂而歌唱,而是唱歌使我們快樂 ..................................................

Anonymous said...

很期待新的內容,幫你推個文^^ ........................................

Dmitry said...

you can quickly reproduce the import .sql into mssql process with another utility for data recovery

data recovery ohio said...

Just wanted to drop a comment and say I am new to your blog and really like what I am reading.Some recovery programs have interactive and intuitive user interface, easy to use and thus allows you to have successful recovery without any trouble.A good recovery program is capable of recovering all sorts of lost files from all file versions of Mac OS X and windows.

meldaresearchusa said...

Our non plagiarized college essays writers have been 14 years experience in the field where they have assisted thousands of returning clients in attaining high scores in their college research paper services.

Translate