Tuesday, April 22, 2014

Restoring Controlfile When AUTOBACKUP Fail

Allow me to present the snapshot of a day from the life of John--the DBA at Acme Bank. On this particular day a database John manages crashed entirely and had to be restored from the backup. He takes regular (backupset) RMAN backups to tape. Since everything--including the controlfile--had crashed, John had to first restore the controlfile and then restore the database. The controlfile is always backed up with the backup database command. John was sure of that. However, restore controlfile from autobackup gave the error:
RMAN-06172: no AUTOBACKUP found or specified handle is not a valid copy or piece
Without the controlfile, the recovery was stuck, even though all the valid pieces were there. It was a rather alarming situation. Others would have panicked; but not John. As always, he managed to resolve the situation by completing recovery. Interested to learn how? Read on.

Background

Since controlfile was also damaged, the first task at hand was to restore the controlfile. To restore the controlfile, John needs a very special information: the DBID--database identifier. This is not something that would be available until the database is at least mounted. In unmounted state--which is how the database is in right now--John couldn't just go and get it from the database. Fortunately, he follows a best practice: he records the DBID in a safe place.

This is  the command John used to restore the controlfile from the backup. The commands assume the usage of Data Domain Boost, the media management layer (MML) plugin for Data Domain backup appliance; but it could apply to any MML--NetBackup, TSM, etc.

SQL> startup nomount;

RMAN> run {
2>      allocate channel c1 type sbt_tape PARMS 
'BLKSIZE=1048576,SBT_LIBRARY=/prodb/oradb/db1/lib/libddobk.so,ENV=(STORAGE_UNIT=DDB01,BACKUP_HOST=prolin1.proligence.com,ORACLE_HOME=/prodb/oradb/db1)';
3>      set dbid = 2553024456;
4>      restore controlfile from autobackup;
5>      release channel c1;
6> }

using target database control file instead of recovery catalog
allocated channel: c1
channel c1: SID=1045 device type=SBT_TAPE
channel c1: Data Domain Boost API
sent command to channel: c1

executing command: SET DBID

Starting restore at 22-APR-14

channel c1: looking for AUTOBACKUP on day: 20140422
channel c1: looking for AUTOBACKUP on day: 20140421
channel c1: looking for AUTOBACKUP on day: 20140420
channel c1: looking for AUTOBACKUP on day: 20140419
channel c1: looking for AUTOBACKUP on day: 20140418
channel c1: looking for AUTOBACKUP on day: 20140417
channel c1: looking for AUTOBACKUP on day: 20140416
channel c1: no AUTOBACKUP in 7 days found

released channel: c1

RMAN-00571:===========================================================
RMAN-00569:=============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571:===========================================================
RMAN-03002: failure of restore command at 04/22/2014 16:08:25
RMAN-06172: no AUTOBACKUP found or specified handle is not a valid copy or piece 

So, RMAN couldn't locate the backup of the controlfile. John knew that by default, RMAN searches only 7 days of backup. Thinking that perhaps the controlfile was somehow not backed up in the last seven days, he expanded the search to 20 days, using the special parameter maxdays, shown below:

RMAN> run {
2>      allocate channel c1 type sbt_tape PARMS 'BLKSIZE=1048576,SBT_LIBRARY=/prodb/oradb/db1/lib/libddobk.so,ENV=(STORAGE_UNIT=DDB01,BACKUP_HOST=prolin1.proligence.com,ORACLE_HOME=/prodb/oradb/db1)';
3>      send 'set username ddboostadmin password password servername prolin1.proligence.com';
3>      set dbid = 2553024456;
4>      restore controlfile from autobackup maxdays 20;
5>      release channel c1;
6> }

allocated channel: c1
channel c1: SID=1045 device type=SBT_TAPE
channel c1: Data Domain Boost API
sent command to channel: c1
executing command: SET DBID

Starting restore at 22-APR-14
channel c1: looking for AUTOBACKUP on day: 20140422
channel c1: looking for AUTOBACKUP on day: 20140421
channel c1: looking for AUTOBACKUP on day: 20140420
channel c1: looking for AUTOBACKUP on day: 20140419
channel c1: looking for AUTOBACKUP on day: 20140418
channel c1: looking for AUTOBACKUP on day: 20140417
channel c1: looking for AUTOBACKUP on day: 20140416
channel c1: looking for AUTOBACKUP on day: 20140415
channel c1: looking for AUTOBACKUP on day: 20140414
channel c1: looking for AUTOBACKUP on day: 20140413
channel c1: looking for AUTOBACKUP on day: 20140412
channel c1: looking for AUTOBACKUP on day: 20140411
channel c1: looking for AUTOBACKUP on day: 20140410
channel c1: looking for AUTOBACKUP on day: 20140409
channel c1: looking for AUTOBACKUP on day: 20140408
channel c1: looking for AUTOBACKUP on day: 20140407
channel c1: looking for AUTOBACKUP on day: 20140406
channel c1: looking for AUTOBACKUP on day: 20140405
channel c1: looking for AUTOBACKUP on day: 20140404
channel c1: looking for AUTOBACKUP on day: 20140403
channel c1: no AUTOBACKUP in 20 days found

released channel: c1

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 04/22/2014 16:17:56
RMAN-06172: no AUTOBACKUP found or specified handle is not a valid copy or piece  
No luck; it gave the same error. So--John concluded--it was not an issue with the absence of controlfile backup. Something else caused the backup of controlfile to be invisible. He did, however, know that the controlfiles are backed up along with the regular backups. Without the database in mounted mode, he couldn't find out the location of those controlfile backups. If this database was registered to a catalog, he could have got that information from the catalog; but unfortunately, being a new database, it was not yet registered. That avenue was closed.

He did, however, follow another best practice--saving the rman log files. As a rule, he sends the RMAN output logs to the tape along with the backup. He recalled the most recent backup log and checked the log for the name of the backup piece. Here is an excerpt from the log:

... output truncated ...
channel c8: starting piece 1 at 21-APR-14
channel c8: finished piece 1 at 21-APR-14
piece handle=14p69u7q_1_1 tag=TAG20140421T141608 comment=API Version 2.0,MMS Version 1.1.1.0
channel c8: backup set complete, elapsed time: 00:00:01
channel c5: finished piece 1 at 21-APR-14
piece handle=10p69rhb_1_1 tag=TAG20140421T141608 comment=API Version 2.0,MMS Version 1.1.1.0
channel c5: backup set complete, elapsed time: 00:47:33
channel c6: finished piece 1 at 21-APR-14
... output truncated ... 

Looking at the output, John notes the names of the backup pieces created, listed next to "piece handle"--14p69u7q_1_1, 10p69rhb_1_1, etc.He still did not know exactly which one contained the controlfile backup; but it was not difficult to try them one by one. He tried to get the controlfile from the first backuppiece, using the following command where he used a special clause: restore controlfile from a location.

RMAN> run {
2>      allocate channel c1 type sbt_tape PARMS 'BLKSIZE=1048576,SBT_LIBRARY=/prodb/oradb/db1/lib/libddobk.so,ENV=(STORAGE_UNIT=DDB01,BACKUP_HOST=prolin1.proligence.com,ORACLE_HOME=/prodb/oradb/db1)';
3>      set dbid = 2553024456;
4>      restore controlfile from '14p69u7q_1_1';
5>      release channel c1;
6> }

allocated channel: c1
channel c1: SID=1045 device type=SBT_TAPE
channel c1: Data Domain Boost API

sent command to channel: c1
executing command: SET DBID
Starting restore at 22-APR-14

channel c1: restoring control file
channel c1: restore complete, elapsed time: 00:01:25
output file name=+prodb3CTL1/prodb3/control01.ctl
output file name=+prodb3DATA1/prodb3/control02.ctl
output file name=+prodb3INDX1/prodb3/control03.ctl

Finished restore at 22-APR-14
released channel: c1  

It worked; the controlfile was restored! If it hadn't worked, John would have tried the other backup pieces one by one until he hit the one with the controlfile backup.

[Update April 27th, 2013 This tip came from a reader Kamil Stawiarski of Poland who is an Oracle Certified Master (http://education.oracle.com/education/otn/KStawiarski.html). Thank you, Kamil] If the specific location of the controlfile backup was not known and the backup is on a disk, John could have used a trick to locate it using the RMAN duplicate command:

c:\> rman auxiliary /

RMAN responds with:

connected to auxiliary database: PROQA3 (not mounted)

Next, John used the following command (he used the auxiliary database name as ORCL completely randomly; any name would have been fine, as long as there is no real instance with that name):

RMAN> duplicate database to orcl backup location='c:\temp\oraback';

Here is the output from RMAN:

Starting Duplicate Db at 27-APR-14
contents of Memory Script:
{
   sql clone "alter system set  db_name =
 ''PROQA3'' comment=
 ''Modified by RMAN duplicate'' scope=spfile";
   shutdown clone immediate;
   startup clone force nomount
   restore clone primary controlfile from  'C:\TEMP\oraback\05P6PAN2_1_1.RMAN';
   alter clone database mount;
}

executing Memory Script

Remember, he has no intention of actually duplication. All he wants to know is the location of the controlfile backup. He gets that from the above output:

restore clone primary controlfile from  'C:\TEMP\oraback\05P6PAN2_1_1.RMAN';

Now he knows the location of the controlfile. He presses Control-C to stop the process. With the location known, he uses the earlier used command restore controlfile from 'location' to restore the controlfile.

[Update Apr 27th, 2013: this tip came from Anuj Mohan, another reader from the US. Excellent tip, Anuj and thank you for sharing]. When RMAN starts, it creates a snaphot controlfile, whose default location is $ORACLE_HOME/dbs. The snapshot controlfile is usually named with .f as an extension.

With the controlfile restored, John mounted the database.

RMAN> alter database mount;
database mounted 

The rest was easy; all he had to do was to issue "restore database" and "recover database using backup controlfile". The first thing John did after the database was mounted was checking the controlfile autobackup setting:
RMAN> show CONTROLFILE AUTOBACKUP;

RMAN configuration parameters for database with db_unique_name prodb3 are:
CONFIGURE CONTROLFILE AUTOBACKUP OFF; #default 

Well, lesson learned. John immediately changed it to ON.

RMAN> CONFIGURE CONTROLFILE AUTOBACKUP on;

new RMAN configuration parameters:

CONFIGURE CONTROLFILE AUTOBACKUP ON;
new RMAN configuration parameters are successfully stored 
Someone suggested that he could have tried to restore the controlfile from the TAG instead of the actual backup piece. Had he attempted the restore from the TAG, he would have got a different error:

RMAN> run {
2>      allocate channel c1 type sbt_tape PARMS 'BLKSIZE=1048576,SBT_LIBRARY=/prodb/oradb/db1/lib/libddobk.so,ENV=(STORAGE_UNIT=DDB01,BACKUP_HOST=prolin1.proligence.com,ORACLE_HOME=/prodb/oradb/db1)';
3>      send 'set username ddboostadmin password password servername prolin1.proligence.com';
4>      set dbid = 2553024456;
5>      restore controlfile from tag=TAG20140421T141608;
6>      release channel c1;
7> }

allocated channel: c1
channel c1: SID=1045 device type=SBT_TAPE
channel c1: Data Domain Boost API

sent command to channel: c1
 
executing command: SET DBID
  
Starting restore at 22-APR-14
 
released channel: c1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 04/22/2014 16:10:04
RMAN-06563: control file or SPFILE must be restored using FROM AUTOBACKUP

So that command could not have helped John.

No Backup of Controlfile

Let's consider another scenario: there was no backup of controlfile. Dreaded as it sounds, it's still not end of the world. John could create the controlfile from a backup located somewhere else. This special backup could be created by two special commands:
SQL> alter database backup controlfile to '/tmp/cont.dbf';

Database altered.

The above command creates a copy of the controlfile with the data as
of the time of the command. Another command is:

SQL> alter database backup controlfile to trace as '/tmp/cbk.trc' reuse;

Database altered.

This command creates a text file that you can use as SQL statement (after some minor editing) to create a controlfile. The major difference between these two approaches is that the first approach produces a snapshot of the controlfile as of that time, along with all the data--the backup, the archived logs, etc. The second approach creates a brand new "blank" controfile that you must feed to bring up. John uses both options as a Plan B. On another post we will see how he saved the day using these two special controlfile backups.

Takeaways

What did you learn from the story. Here are some key takeaways:
  1. Always write down the DBID of all your databases somewhere. If you use a recovery catalog, it's there; but it's good to note it down separately. This number does not change unless you use NID utility; so recording once is enough.
  2. Always configure controlfile autobackup. The default is OFF; make it ON.
  3. Always save the backup log files. In a crunch, they yield valuable information otherwise not available.
  4. When controlfile is not found, you can use restore controlfile from 'location' syntax in RMAN to pull the controlfile from the location. If that location does not have a controlfile backup, don't worry; just try all available locations. One might contain what you are looking for. You have nothing to lose but everything to gain.
  5. Always use a script for this type RMAN restore activities instead of typing at the prompt. You will find changing data, e.g. various backup locations, easier and make less mistakes that way.
  6. Always create a backup controlfile everyday, even if you don't think you need it. You may someday and you will thank yourself when you do.
Post a Comment

Translate