Monday, September 01, 2008

Magic of Block Change Tracking

The other day, one of the instances of our main 3-node RAC database crashed due to an I/O error. Apparently there were many "dead paths" from the host to the SAN, which made some LUNs in the ASM diskgroup not being recognizable by the ASM instance. Why? Well, that's the topic for another blog. The point behind this one has nothing to do with the reason. All I want to convey that there was a LUN problem on one of the nodes which brought the instance down. Being RAC, the database was serviced from other two nodes - praise God for RAC! - and the users didn't notice it terribly (or, I would like to think that way).

After a few days we noticed the incremental RMAN backup taking a long time. This caused major issues - it took a long time and I/O waits went through the roof. In fact it took increasingly longer every day that passed by that unfortunate collapse of the node. Everyone was quite intrigued - what could be the connection between an instance crash and instance crashing? All sorts of theories cropped up - from failed HBA cards to undiscovered RAC bugs.
This is where I got involved. The following chronicles the diagnosis of the issue and the resolution.

First, the increased length of time is obviously a result of the incemental backups doing more work, i.e. more changed blocks. What caused so many changed blocks? Interviews with stakeholdrs yielded no clear answer - there was absolutely no reason for increased activity. Since we are doing proper research, I decided to start with the facts. How much was the extra blocks processed by incrementals?

I started with this simple query:

select completion_time, datafile_blocks, blocks_read, blocks
from v$backup_datafile
where file# = 1
order by 1
/

Output:

COMPLETIO DATAFILE_BLOCKS BLOCKS_READ BLOCKS
--------- --------------- ----------- ----------
18-JUL-08 524288 32023 31713
19-JUL-08 524288 11652 10960
20-JUL-08 524288 524288 12764
21-JUL-08 524288 524288 5612
22-JUL-08 524288 524288 11089


The columns are:

DATAFILE_BLOCKS - the number of blocks in the datafile at that time
BLOCKS_READ - the exact number of blocks the RMAN incremental backup read
BLOCKS - the numberof blocks it actually backed up

From the above output, a pattern emerges - until Jul 20th, the backup read only a few blocks; but on July 20th, it started scanning the entire file - all the blocks! I checked for a few other datafiles and the story is the same everywhere. With a 4.5 TB database, if the incremental backup reads the datafiles in entirity, then I/O would obviously go for a toss. That explains the I/O and time issue.

But why did RMAN switch from reading a few blocks to the whole file that day? The #1 suspect is Block Change Tracking. The 10g feature BCT allows RMAN to scan only the changed blocks and not the entire file. We use that. So, did something happen to make that disappear?

to answer, I issued a modified query:

select completion_time, datafile_blocks, blocks_read, blocks, used_change_tracking
from v$backup_datafile
where file# = 1
order by 1
/


Output:

COMPLETIO DATAFILE_BLOCKS BLOCKS_READ BLOCKS USE
--------- --------------- ----------- ---------- ---
18-JUL-08 524288 32023 31713 YES
19-JUL-08 524288 11652 10960 YES
20-JUL-08 524288 524288 12764 NO
21-JUL-08 524288 524288 5612 NO
22-JUL-08 524288 524288 11089 NO


Bingo! The BCT use ceased from the 20th July date. That was what caused the whole file to be scanned. But why was it stopped? No one actually stopped it.

Investigating even further, I found from the alert log of Node 1:

Sun Jul 20 00:23:52 2008
CHANGE TRACKING ERROR in another instance, disabling change tracking
Block change tracking service stopping.


From Node 2:

Sun Jul 20 00:23:51 2008
CHANGE TRACKING ERROR in another instance, disabling change tracking
Block change tracking service stopping.

Alert log of Node 3 showed the issue:

Sun Jul 20 00:23:50 2008
Unexpected communication failure with ASM instance:
ORA-12549: TNS:operating system resource quota exceeded
CHANGE TRACKING ERROR 19755, disabling change tracking
Sun Jul 20 00:23:50 2008
Errors in file /xxx/oracle/admin/XXXX/bdump/xxx3_ctwr_20729.trc:
ORA-19755: could not open change tracking file
ORA-19750: change tracking file: '+DG1/change_tracking.dbf'
ORA-17503: ksfdopn:2 Failed to open file +DG1/change_tracking.dbf
ORA-12549: TNS:operating system resource quota exceeded
Block change tracking service stopping.

The last message shows the true error. The error was “operating system resource quota exceeded”, making the diskgroup unavailable. Since the ASM diskgroup was down, all the files were also not available, including BCT file. Surprisingly, Oracle decided to stop BCT altogether rather than report it as a problem and let the user decide the next steps. So block change tracking was silently disabled and the DBAs didn't get a hint of that. Ouch!

Resolution

Well, now that we discovered the issue, we took the necessary steps to correct it. Because of the usual change control process, it took some time to have the change approved and put in place. We executed the following to put the BCT file.

alter database enable block change tracking using file '+DG1/change_tracking.dbf'

The entry in alert log confirms it (all all nodes)

Block change tracking file is current.

But this does not solve the issue completely. to use block change tracking, there has to be a baseline, which is generally a full backup. We never take a full backup. We always take an incremental image copy and then merge to a full backup on a separate location. So, the first order of business was to take a full backup. After that we immediately took an incremental. It took just about an hour, down from some 18+ hours earlier.

Here is some analysis. Looking at the backup of just one file - file#1, i.e. SYSTEM datafile:

select COMPLETION_TIME, USED_CHANGE_TRACKING, BLOCKS, BLOCKS_READ
from v$backup_datafile
where file# = 1
order by 1
/

The output:

COMPLETIO USE BLOCKS BLOCKS_READ
--------- --- ---------- -----------
18-AUG-08 NO 31713 524288
18-AUG-08 NO 10960 524288
20-AUG-08 NO 12764 524288
21-AUG-08 NO 5612 524288
22-AUG-08 NO 11089 524288
23-AUG-08 NO 8217 524288
23-AUG-08 NO 8025 524288
25-AUG-08 NO 3230 524288
26-AUG-08 NO 6629 524288
27-AUG-08 NO 11094 524288 <= the filesize was increased 28-AUG-08 NO 3608 786432 29-AUG-08 NO 8199 786432 29-AUG-08 NO 12893 786432 31-AUG-08 YES 1798 6055 01-SEP-08 YES 7664 35411

Columns descriptions:

USE - was Block Change Tracking used?
BLOCKS - the number of blocks backed up
BLOCKS_READ - the number of blocks read by the backup

Note, when the BCT was not used, the *entire* file - 524288 blocks - were
being read every time. Of course only a percent of that was being backed up
since that percentage changed; but the whole file was being checked.
After BCT, note how the "blocks read" number dropped dramatically. That is
the magic behind the dropped time.

I wanted to find out exactly how much I/O savings BCT was bringing us. A simple query would show that:

select sum(BLOCKS_READ)/sum(DATAFILE_BLOCKS)
from v$backup_datafile
where USED_CHANGE_TRACKING = 'YES'
/

The output:
.09581342

That's just 9.58%. After BCT, only 9.58% of the blocks of the datafiles were scanned! Consider the impact of that. Before BCT, the entire file was scanned for changed blocks. After BCT, only about 9.58% of the blocks were scanned for changed blocks. Just 9.58%. How sweet is that?!!!

Here are three representative files:

File# Blocks Read Actual # of blocks Pct Read
----- ------------- ------------------- --------
985 109 1254400 .009
986 1 786432 .000
987 1 1048576 .000


Note, files 986 and 987 were virtually unread (only one block was read). Before BCT, all the 1048576 blocks were read; after BCT only 1 was. This makes perfect sense. These files are essentially older data; so nothing changes there. RMAN incremental is now blazing fast because it scans less than 10% of the blocks. The I/O problem disappered too, making the database performance even better.

So, we started with some random I/O issue, causing a node failure, which led to increased time for incremental, which was tracjed down to a block change tracking file being suspended by Oracle silently without raising an error.

Takeaways:

The single biggest takeway you should get is that just because it is defined, don't get the idea it is going to be there. So, a periodic check for the BCT file is a must. I will work on developing an automated tool to check for non-use of BCT file. The tool will essentially issue:

SELECT count(1)
FROM v$backup_datafile
where USED_CHANGE_TRACKING = 'NO'
/

If the output is >1, then an alert should be issued. Material for the next blog. Thanks for reading.

21 comments:

Prem Khanna J said...

Arup ,

Thanks for sharing that.
Great Information.

Just curious about :

"Apparently there were many "dead paths" from the host to the SAN, which made some LUNs in the ASM diskgroup not being recognizable by the ASM instance. Why? Well, that's the topic for another blog."

Watching this space to hear more
on that :)

~ jp ~

Noons said...

This is actually a good example of a disturbing trend I'm seeing:

messages that are important being "dumped" on the alert log without an associated "ORA-***" that can be picked up by a "grep" or whatever equivalent in an alert script.

In large installations, it is completely unrealistic to expect dbas to pour through MBs of alert logs every day, for every instance, looking for errors hidden among the cacophony of otherwise normal messages!

I've seen this happen since 10g came out and I'm sincerely hoping it is not a new "trend"!

ANY abnormal message in the alert logs MUST be associated with a specific - or even generic - Oracle error code. Otherwise, it's impossible to "fish" them out in usable time!

This is even more relevant with RAC, as demonstrated here.

And no: dba2.0 tools do NOT help in cases like this nor are they the solution!

Arup said...

For Prem Khanna's comment: yes, I will dig more into with the SA and SAN admins and post it here. But please note, this is entirely within the SA area; something I may not be able to reproduce effectively or accurately.

Arup said...

for Nuno's comment:

Yes, it's disturbing; but fortunately it's not a trend. There are a few of these sorts of "errors" without ORA- error numbers. Incidentally, Oracle does not agree that this is indeed an error. Their argument: if BCT fails, the database should be allowed to continue. My counter: leave that option to the end user. Given a choice, I would have liked to raise an error in such a case. And, why would the BCT fail? I put that on my data disk group. So if it fails, that mean my data diskgroup had failed too, which is a bigger problem anyway. so, what do I get by masking the BCT failure? Nothing.

My fight continues with Oracle product managament to make it optional for the user to decide whether to regard BCT failure as a failure or warning. In any case, this should have an ORA- number. there are plenty of ORA- numbers for warning as well that make into alert log, e.g. ORA-1555, deadlocks and so on. Why not this one?

Vishal Gupta said...

Arup, thanks for sharing this with us. Its interesting to know that in case of problem with BCT file access, oracle simply disables it. I can understand the reasoning to disable it. Its quite possible that the datafiles and BCT might be on different normal/ASM disks. And it would not make sense to continue updating the BCT once it becomes avialable again. Because then it would not contain the changes done in between. But i agree, its definitely an error which should be reported as ORA- message. If one has not enabled BCT, then its fine. But if someone has enabled BCT then its for a reason. And silently disabling it is not good on Oracle's part.

Anonymous said...

Hell Yeah! I've been forcing to use BCT for our incremental backup but few believes in BCT becuase there might be problem like degradation of performance everytime it logs changes of blocks in datafile. Now because of your post, I have something to show them and support me to pursue the BCT.

Thanks a Lot Arup,
Jay

Arup said...

For Vishal - yes, absolutely. When the BCT file is available, it will not track the changes from the last time it successfully participated in incremental backup; but it should have certainly thrown up an ORA- error. There is no excuse for not doing that.

Arup said...

For Jay:

Oh no; BCT is a good thing. A very good thing actually. I observed no noticeable performance degradation as a result of enabling BCT. Or perhaps 10%, for argument sake.

But the benefits are huge. As I showed in the example, presence of BCT shaved 90% off the I/O. That's just sweet music to any DBA's ears.

I was merely peeved that Oracle silently shut down BCT without throwing an error.

Anonymous said...

Arup, Thanks for the great article. Do you have any articles on how to estimate the size of a BCT file. We have a 5 TB DW env and have a 6 node RAC cluster. I have seen a few articles but non of them very specific.

Any information would help
Thanks
Deepa

Arup said...

For Deepa: the calculation is really simple. It's 1 MB per 1 TB of the database. If the last TB is a fraction, i.e. less than a TB, it's still 1 MB for the BCT file. For instance a 4.5 TB database and a 5 TB one both will have the same size - 5 MB of BCT file.

Just curious: why do you want to find out the exact size of the BCT file. In contrast to your 5TB database, it's practically non-existent, isn't it?

Feroz Khan said...

hi,

We are using veritas netbackup to take rman backup.
i want to use BCT to make backup faster, is it possible to use BCT for backupsets (level o and level1).



Thanks
Feroz khan

Arup said...

For Feroz: yes, you can use BCT for any types of incremental backups. You are probably using Netbackup agent for Oracle (also called MML - Media Management Library). Behind the scenes, RMAN is in the play and BCT will be useful.

Feroz Khan said...

HI,

Thanks for your reply,

I have taken Rman backup (as backupset) to disk using BCT. i have given below the steps which i have followed. could you please review and advise me.

1. Enabled Block Change Tracking.

2. Taken a Level 0 backup.
Script
run { allocate channel t1 type disk;
backup format 'd:\oracle\bct\%d_t%t_s%s_p%p'
incremental level 0 database;
sql 'alter system archive log current'
release channel t1;
...archive log
...ctl file

}

starting backup at 05-AMR-09
chaneel1 t1: starting compressed full datafile backupset
channel t1: specifying datafile(s) in backupset
input datafile fno=000001 name=D:\oracle.....\system.dbf
....................\sysaux.dbf
....................\users.dbf
.....

then, from V$backup_datafile

datafile_blocks blks_read BLK USE
--------------- --------- ---- ---
62720 61513 46266 YES
56800 47433 45620 YES
30720 30665 19265 YES
12800 10057 8295 YES
430 430 430 NO(ARCH files)
430 430 430 NO (ctl files)

***********************************3. Updated some tables

4. Taken Level 1 backup

{backup incremental level 1 database;}

Starting backup at 05-AMR-09
input datafile fno=000001 name=d:\oracle....\system
..............................
....

v$backup_datafile output
-------------------------
datafile_blocks blks_read BLK USE
--------------- -------- ---- ---
62720 77 11 YES
56800 669 129 YES
30720 17 4 YES
12800 1 1 YES
4480 230 33 YES
430 430 430 NO

Is BCT is working fine or not..

Hope to hear from you

Thanks & Regards
Feroz

Feroz Khan said...

ONE OF OUR CLIENT IS HAVING TWO 10G DATABASE, DEV1 AND DEV2 WITH SAME SETUP.

CREATED A TABLE SAMPLE IN BOTH THE DATABASE. DAILY (50 - 60) THOUSAND'S OF ROWS ARE INSERTED and deleted.

WHATEVER THE PROCESS (INSERT OR DELETE OR UPDATE) WE do iN Dev1 DATABASE, WE USE TO RUN THE SAME SCRIPT IN DEV2 DATABASE ALSO.

but in the end of the day if we check the size of the table, it varies.

for example,

size of the sample table in DEV1 = 5 gb
size of the sample table in dev2 = 6 gb

PCTfree is set to = 10 (it is enough)

what could be the reason..what are the storage clause i have to check.

THANKSIN ADVANCE

FEROZ

Vipul said...

Hi Arup,
This is great info. One of the questions that I always come across as a DBA is what % of data is changing in the database on a daily basis. I often wonder if I can use v$block_change_tracking and v$backup_database views to get this details. I do take level 0 image copy backups over the weendend and then take a level 1 backup daily (which I then merge with level 0). What SQL statement can I use to get this info which shows % block changes. Thank you for the wonderful post.

vipul.d.kapadia@gmail.com

Arup said...

For Vipul - thank you for the compliments. Much appreciated.

About your question, there are better ways to calculate the block changes. The easiest is merely getting the AWR report for the day. The top section "Load Section" will show the db block chages as well as that per second and per transaction.

Vipul said...

Arup,
Thanks for your quick reponse. I appreciate it. Unfortunately, we have not purchased management packs so we can not use AWR reports. Do you think the following query would do the job? (extracted from your blog):

select (sum(BLOCKS_READ)/sum(DATAFILE_BLOCKS)) * 100 as "Blocks scanned after BCT %"
from v$backup_datafile
where USED_CHANGE_TRACKING = 'YES'
/

Thank you.

Sara Reid said...

Prior to Oracle 10g, one of the most important advices given to VLDB administrators was to store their non-changing static data segments within read-only tablespaces for backup performance. Since RMAN just skips any read-only tablespace during incremental backups, read-only tablespace usage had significant impact on incremental backup performance for VLDB sites. This was so-called pseudo incremental backup technique, which was strongly coupled with read-only tablespaces.

4gb micro sd karte

Dmitry said...

you can quickly reproduce the repair sql database file process with another utility for data recovery

Kyle Hailey said...

Wondering how to calculate/predict the size of incremental backups.
Wondering if there is a way to query the number of blocks changed since the backup.
Of course there is the statistic for db block changes, but the same blocks could be changed over and over again, thus this stat is going to he the high side, but the actual size, number of blocks, that will be used in the incremental backup could be much lower

data recovery ohio said...

Marvelous work.Just wanted to drop a comment and say I am new to your blog and really like what I am reading.

Translate