The Arup Nanda Blog: 09/01/2008

Sunday, September 28, 2008

Most Striking Object at Open World 08

What was the most amazing thing I witnessed for the first time at OOW 08?

Hands down, it is the compostible drinking cup! Yes, compostible, not recyclable. It looks just like any other plastic (not paper) drink cup you see at almost any public water dispensers. The difference? It's made of corn syrup, I was told, not plastic and hence compostible. Wow!

I am not a green fanatic; but I consider myself a responsible adult concerned about the environment doing his share to reduce the landfills, pollutions and paper consumption. I do things that are practical: I don't print something I can read on the monitor; project emails, powerpoints on the screen/projector while conferring with colleagues rather than printing; use back sides of printouts to scribble; use 2-sided printing; donate kids' toys and cloths to charity rather than throw them in trash and so on. But there are some things I just couldn't jettison; at least not yet. One of them was the ubiquitous plastic drinking cup and the bottled water. The convenience of the water bottle was just too much to ignore and my lazy bones reigned over my conscience and I always gravitated, albeit a little guiltly, to the water bottle.

Not any more. I hope these compostible corn syrup based polymer material makes its way to all things plastic - bottles, cups, packaging and so on. The material is called polylactic acid (PLA), which is a polymer made from lactic acid from strachy produce like corn, wheat, patato and beet. However, due to its low melting point, it's not suitable for hot liquids, at least not yet. There is a compostible version - paper cups lines with PLA instea dof petroleum based products. But that's still paper; not 100% PLA.

According to a Smithsonian article, producing this PLA requires 65% less energy and emits 68% fewer greenhouse gases. Wow! That's good enough for me.

But, is it all rosy and smell nice? Well, afraid not. The biggest caveat: the PLA decomposes in a controlled composting facility, not the backyard composting bin. you need something of industrail strength - the sort used by municipalities and large industrial plants. Do they exist? Yes, for commercial use; but almost none for residential use. So, that's the catch. While the material is compostible; the facility to compost is not available.

But I am not going to look at it as glass half full. This is a major first step. Perhaps the ecological and political pressures will force the residential facilities to open up as well. Until then, let the power be with PLA.

OOW'08 Oracle 11g New Features for DBAs

It was my second session at Open World this year. It was full with 332 attendees with a whopping 277 attendees on wait list! the room capacity was 397. Of course, the room did have some fragmentation and not everyone could make it.

Here is the abstract:

There is a world outside the glittering marketing glitz surrounding Oracle 11g. In this session, a DBA and author of the popular 11g New Features series on OTN covers features that stand out in the real world and make your job easier, your actions more efficient and resilient, and so on. Learn the new features with working examples: how to use Database Replay and SQL Performance Analyzer to accurately predict the effect of changes and Recovery Manager (RMAN) Data Recovery Advisor to catch errors and corruption so new stats won't cause issues.

Thank you very much for those who decided to attend. I hope you found it useful. Here is the presentation. You can download it from the Open World site too. Please note, the companion site to see al working examples and a more detailed coverage is still my Oracle 11g New Features Series on Oracle Technology Network.

OOW'08 : Real Life Best Practices for DBAs

Once more, it was a full session with 278 attendees and 251 waitlisted due to lack of space! I wish the organizers simply moved to a bigger room. Oh, well!

For those who attended, I truly wish to express my gratitude. As a speaker, I feel honored when people choose to attend my session over others. I wish you found something useful here.

If you haven't already downloaded it from Open World site, here is the presentation: http://www.proligence.com/OOW08_DBA.pdf.

Here is the abstract:

This session covers best practices for excelling at being a DBA in the field from someone who has been managing Oracle Database instances for 14 years; was named the DBA of the Year in 2003; and has been through it all, from modeling to performance tuning, disaster recovery, security, and beyond. Best practices should be justifiable and situation-aware, not just because someone said they were good. Hear several tips and tricks for surviving and succeeding in the part art/part wizardry of database administration. The session covers Oracle 11g.

I will highly appreciate if you post your comments.

OOW'08 Partitioning - Why, When, What and How

I had a full session (300+ in attendance with 120 wait-listed). There was a gentleman from the organizing team who was trying to let in people flow in to the room as unobtrusively as possible. He was the same one for my other two sessions (which happend to be in the same room - full and and some 100+ wait listed) and he was visibly flustered to conduct the same routine. "You must be quite popular" - he exasperated! I took it as a compliment. For those who attended, thank you for taking the time to spend with me. I truly feel honored. For those who couldn't come in, I assume you have found some other helpful session.

Here is the abstract:

How do you decide which partitioning schemes to use, which column to partition on, or whether or not to use more than one column for partitioning keys? These are questions you must answer to develop a good strategy. Too many architects and DBAs choose a wrong strategy, making partitioning a failure. In this session, you will *not* learn the syntax or types of partitioning but how to decide the best strategy, presented through real-life case studies and examples. This session covers Oracle 11g features in partitioning.

If you haven't already done so from the Oracle Open World site, you can download here: http://www.proligence.com/OOW08_Part_ppt.pdf

By the way, I have also written an article on the same presentation, which you can find here: http://www.proligence.com/OOW08_Part_doc.pdf

Please leave your comments on these two documents. As always, thank you visiting and offering your feedback.

OOW'08 ACE Directors Forum Session

Lewis Cunningham kindly took the reins of organizing a unique session - a panel of ACE Directors. But instead of a panel didcussing some topic, it welomed questions from the audience. sort of stump the chump with the questions. Lewis couldn;t have been a more perfect moderator/host. Most amazing, he came up with the idea to hand out blank index cards for the attendees to fill out with their questions, which forced the questions to be clearer.

On the panel we had Brad Brown (TUSC), Eddie Awad, Tim Hill, Mark Rittman, Hans Forbrich and your truly. Here is a coverage on Mark Rittman's blog (with a picture of the crew) http://www.rittmanmead.com/2008/09/26/oracle-open-world-2008-day-5-exadata-storage-server-and-ask-the-oracle-ace-directors/

I was nervous; and who wouldn't be, with the pressure? Fortunately, we, as a panel, with the expert moderation by Lewis, could ace the volleys. some of the questions I responded to with my answers:

Q: Will Transparent Tablespace Encryption (TTE) spell the doom for Tranparent data Encryption (TDE)?
A: Not at all. TDE encrypts a specific column or columns. TTE encrypts everything the tablespace - all tables and all columns. So, the performance definitely impacted. However, the biggest difference is the encryption in the database. Both technologies encrypt data in storage; but TTE decrypts the data in the SGA. So index scans do not suffer in case of TTE. TDE does *not* decrypt the values in SGA; so index scans are rather useless. So, in the case where a data value will most likely be found in SGA, the TTE option works well. The penalty is in the time when data is loaded from the database to the SGA. Since that happens a lot less, this will not cause a serious issue. In case data is frequently aged out of the buffer cache, the TTE option may prove expensive and TDE might become relatively attactive.
Q: What approach would you recommend for upgrading a 10 GB database to 11g from 10g - Data Pump, Exp/Imp, Transportable Tablespace?
A: None of the above. I would go for a Direct Path Insert (insert with the APPEND hint) over DB Link. This allows me several benefits - (i) I can do a database capture and replay it on 11g to minimize the risk of something breaking after upgrade. (ii) I can do a database reorg at the time of the move, i.e. partition unpartitioned objects, etc. (iii) have minimal time for migration.
Q: What is your least favorite new feature in Oracle?
A: I would rather answer it as most "unnecessary" new feature. It would be bigfile tablespaces - hands down. I always recommend creating smaller datafiles for tablespaces, no more thna 32 GB. This reduces the risk significantly in case of failures. If a block media recovery fails due to whatever reason, you can at least restore the file (or switch over to a copy) quickly. The bigger the file, the more time will be for restore and recovery. A large number of files increase the checkpoint time. so, try to find a balance. But in any case, dump bigfiles.
Q: How has life changed for you after being an OCP?
A: Not in the least. I have been an OCP since 8i and I finishd 9i, 10g and now 11g upgrade exams. However, no one ever bothered to ask me if I am an OCP.

Wednesday, September 24, 2008

From the Trenches at OOW'08

I had two sessions today:

(1) I signed up a Unconference session - "Practical 11g Upgrade". I signed up for this on the spot; so I didn't have any slides or time for preparation. The unconference room had a whiteboard and several markers. So, there I was - explaining our 11g upgrade process using a whiteboard. Was a lot of fun. Someday, I will try to capture the events on paper and publish as a blog.

(2) I had signed for "Meet the ACEs" event at OTN Lounge. Wasn't much of an event. It was shaowed by the much anticipated iPod Touch drawing at OTN lounge. Nope; I didn't win.

Wednesday, September 17, 2008

Speaking on Partitioning at NY Metro Oracle Day

I recently delivered a session at New York Metro Area Oracle Day organized by User Groups in New York, New Jersey and Connecticut. If you attended, thank you for coming. There is nothing much satisfying than see people taking time off their daily grind and be in the room to listen to someone they may or may not know; but trust they can get something from. Thank you for all those who attended.

The session was about how to use partitioning, not what it is. So I didn't talk about syntax; but how to exploit partitioning techniques to accomplish common objectives.

Here is the powerpoint presentation deck. However, you may find a descriptive article more suitable for reading.

Congratulations to the lady who won my book RMAN Recipes for asking the best question. I apologize, I couldn't get your name. If you read this, please get back to me.

The question was: In a RAC envrionment does partitioning help or hurt?

Answer: Partitioning helps in these ways:
(1) it allows several segments to be create for the same object. Therefore the chances of cache buffer chains gets reduced, which ultimately reduces the global cache related waits.
(2) it reduces the chance that rows will be found on one block, leading to block contention among instances and reduces gc buffer busy waits.
(3) in hash partitioned indexes, it reduces contention for a few index blocks, leading to lesser gc current grant related waits.

The downside? Well, not much. Partition operations (drop/add/split) are DDL, not DML operations. So, when you perform one of these, the library cache must be synced up on all instances, whihc may create library cache lock/latch issues. So, as a rule, perform partition maintenance operations during off-hours or periods of light activity.

Monday, September 01, 2008

Magic of Block Change Tracking

The other day, one of the instances of our main 3-node RAC database crashed due to an I/O error. Apparently there were many "dead paths" from the host to the SAN, which made some LUNs in the ASM diskgroup not being recognizable by the ASM instance. Why? Well, that's the topic for another blog. The point behind this one has nothing to do with the reason. All I want to convey that there was a LUN problem on one of the nodes which brought the instance down. Being RAC, the database was serviced from other two nodes - praise God for RAC! - and the users didn't notice it terribly (or, I would like to think that way).

After a few days we noticed the incremental RMAN backup taking a long time. This caused major issues - it took a long time and I/O waits went through the roof. In fact it took increasingly longer every day that passed by that unfortunate collapse of the node. Everyone was quite intrigued - what could be the connection between an instance crash and instance crashing? All sorts of theories cropped up - from failed HBA cards to undiscovered RAC bugs.
This is where I got involved. The following chronicles the diagnosis of the issue and the resolution.

First, the increased length of time is obviously a result of the incemental backups doing more work, i.e. more changed blocks. What caused so many changed blocks? Interviews with stakeholdrs yielded no clear answer - there was absolutely no reason for increased activity. Since we are doing proper research, I decided to start with the facts. How much was the extra blocks processed by incrementals?

I started with this simple query:

select completion_time, datafile_blocks, blocks_read, blocks
from v$backup_datafile
where file# = 1
order by 1
/

Output:

COMPLETIO DATAFILE_BLOCKS BLOCKS_READ BLOCKS
--------- --------------- ----------- ----------
18-JUL-08 524288 32023 31713
19-JUL-08 524288 11652 10960
20-JUL-08 524288 524288 12764
21-JUL-08 524288 524288 5612
22-JUL-08 524288 524288 11089

The columns are:

DATAFILE_BLOCKS - the number of blocks in the datafile at that time
BLOCKS_READ - the exact number of blocks the RMAN incremental backup read
BLOCKS - the numberof blocks it actually backed up

From the above output, a pattern emerges - until Jul 20th, the backup read only a few blocks; but on July 20th, it started scanning the entire file - all the blocks! I checked for a few other datafiles and the story is the same everywhere. With a 4.5 TB database, if the incremental backup reads the datafiles in entirity, then I/O would obviously go for a toss. That explains the I/O and time issue.

But why did RMAN switch from reading a few blocks to the whole file that day? The #1 suspect is Block Change Tracking. The 10g feature BCT allows RMAN to scan only the changed blocks and not the entire file. We use that. So, did something happen to make that disappear?

to answer, I issued a modified query:

select completion_time, datafile_blocks, blocks_read, blocks, used_change_tracking
from v$backup_datafile
where file# = 1
order by 1
/

Output:

COMPLETIO DATAFILE_BLOCKS BLOCKS_READ BLOCKS USE
--------- --------------- ----------- ---------- ---
18-JUL-08 524288 32023 31713 YES
19-JUL-08 524288 11652 10960 YES
20-JUL-08 524288 524288 12764 NO
21-JUL-08 524288 524288 5612 NO
22-JUL-08 524288 524288 11089 NO

Bingo! The BCT use ceased from the 20th July date. That was what caused the whole file to be scanned. But why was it stopped? No one actually stopped it.

Investigating even further, I found from the alert log of Node 1:

Sun Jul 20 00:23:52 2008
CHANGE TRACKING ERROR in another instance, disabling change tracking
Block change tracking service stopping.

From Node 2:

Sun Jul 20 00:23:51 2008
CHANGE TRACKING ERROR in another instance, disabling change tracking
Block change tracking service stopping.

Alert log of Node 3 showed the issue:

Sun Jul 20 00:23:50 2008
Unexpected communication failure with ASM instance:
ORA-12549: TNS:operating system resource quota exceeded
CHANGE TRACKING ERROR 19755, disabling change tracking
Sun Jul 20 00:23:50 2008
Errors in file /xxx/oracle/admin/XXXX/bdump/xxx3_ctwr_20729.trc:
ORA-19755: could not open change tracking file
ORA-19750: change tracking file: '+DG1/change_tracking.dbf'
ORA-17503: ksfdopn:2 Failed to open file +DG1/change_tracking.dbf
ORA-12549: TNS:operating system resource quota exceeded
Block change tracking service stopping.

The last message shows the true error. The error was “operating system resource quota exceeded”, making the diskgroup unavailable. Since the ASM diskgroup was down, all the files were also not available, including BCT file. Surprisingly, Oracle decided to stop BCT altogether rather than report it as a problem and let the user decide the next steps. So block change tracking was silently disabled and the DBAs didn't get a hint of that. Ouch!

Resolution

Well, now that we discovered the issue, we took the necessary steps to correct it. Because of the usual change control process, it took some time to have the change approved and put in place. We executed the following to put the BCT file.

alter database enable block change tracking using file '+DG1/change_tracking.dbf'

The entry in alert log confirms it (all all nodes)

Block change tracking file is current.

But this does not solve the issue completely. to use block change tracking, there has to be a baseline, which is generally a full backup. We never take a full backup. We always take an incremental image copy and then merge to a full backup on a separate location. So, the first order of business was to take a full backup. After that we immediately took an incremental. It took just about an hour, down from some 18+ hours earlier.

Here is some analysis. Looking at the backup of just one file - file#1, i.e. SYSTEM datafile:

select COMPLETION_TIME, USED_CHANGE_TRACKING, BLOCKS, BLOCKS_READ
from v$backup_datafile
where file# = 1
order by 1
/

The output:

COMPLETIO USE BLOCKS BLOCKS_READ
--------- --- ---------- -----------
18-AUG-08 NO 31713 524288
18-AUG-08 NO 10960 524288
20-AUG-08 NO 12764 524288
21-AUG-08 NO 5612 524288
22-AUG-08 NO 11089 524288
23-AUG-08 NO 8217 524288
23-AUG-08 NO 8025 524288
25-AUG-08 NO 3230 524288
26-AUG-08 NO 6629 524288
27-AUG-08 NO 11094 524288 <= the filesize was increased 28-AUG-08 NO 3608 786432 29-AUG-08 NO 8199 786432 29-AUG-08 NO 12893 786432 31-AUG-08 YES 1798 6055 01-SEP-08 YES 7664 35411
Columns descriptions:

USE - was Block Change Tracking used?
BLOCKS - the number of blocks backed up
BLOCKS_READ - the number of blocks read by the backup

Note, when the BCT was not used, the *entire* file - 524288 blocks - were
being read every time. Of course only a percent of that was being backed up
since that percentage changed; but the whole file was being checked.
After BCT, note how the "blocks read" number dropped dramatically. That is
the magic behind the dropped time.

I wanted to find out exactly how much I/O savings BCT was bringing us. A simple query would show that:

select sum(BLOCKS_READ)/sum(DATAFILE_BLOCKS)
from v$backup_datafile
where USED_CHANGE_TRACKING = 'YES'
/

The output:
.09581342

That's just 9.58%. After BCT, only 9.58% of the blocks of the datafiles were scanned! Consider the impact of that. Before BCT, the entire file was scanned for changed blocks. After BCT, only about 9.58% of the blocks were scanned for changed blocks. Just 9.58%. How sweet is that?!!!

Here are three representative files:

File# Blocks Read Actual # of blocks Pct Read
----- ------------- ------------------- --------
985 109 1254400 .009
986 1 786432 .000
987 1 1048576 .000

Note, files 986 and 987 were virtually unread (only one block was read). Before BCT, all the 1048576 blocks were read; after BCT only 1 was. This makes perfect sense. These files are essentially older data; so nothing changes there. RMAN incremental is now blazing fast because it scans less than 10% of the blocks. The I/O problem disappered too, making the database performance even better.

So, we started with some random I/O issue, causing a node failure, which led to increased time for incremental, which was tracjed down to a block change tracking file being suspended by Oracle silently without raising an error.

Takeaways:

The single biggest takeway you should get is that just because it is defined, don't get the idea it is going to be there. So, a periodic check for the BCT file is a must. I will work on developing an automated tool to check for non-use of BCT file. The tool will essentially issue:

SELECT count(1)
FROM v$backup_datafile
where USED_CHANGE_TRACKING = 'NO'
/

If the output is >1, then an alert should be issued. Material for the next blog. Thanks for reading.

The Arup Nanda Blog

Pages