The Arup Nanda Blog: 2008

Friday, October 10, 2008

About Clear Communications

Like everyone else I like to rant once in a while. I rant about the shortcomings in Oracle software, the tools and technologies I work with. But this time I want to rant about a decisively non-technical topic. Arguably it is something that everyone must have felt - at least once. It's about communicating clearly. Why don't people do it? Why don't they articulate whatever they are trying to say. Instead they spit out incoherently with thoughts coming across as sloppily slapped together expecting the other person to somehow put it all together. I am not talking about children; these are responsible adults who supposedly make up policies and act as thought leaders. Unclear thoughts and communication not only frustrates people; but is dangerous. It misdirects efforts leading to wastage and often utter failure.

Today I had on the receiving end of such a travesty. Earlier, a manager of an application team wrote to me this email (reproduced verbatim) about a requirement:

We are having a shortage of capabilities on the servers. So we want to increase the capabilities somehow. What do you recommend?

I was scratching my head. How can I comment or influence the capabilities of their applications? Perhaps they are asking about some limitations which might be solved by some Oracle technology features. So, I called them for a quick chat. After half hour I still wasn't clear about what limitations they are trying to solve.

And then, after one hour, I got it: they are talking about capacity; not capability! And not only that it's about the database server; not the app server. [Trying to pull my hair out at this time]

My recommendation would have been to send them to an English school; but, being occasionally wise, I kept it to myself.

OK; let's move on. I promised to have a DBA look at the capacity issue.

And a DBA did. Sme days went by and it apprently reached a boiling point. I was told nothing has been done by the DBA and, well, that's not acceptable. so, I intervened. I asked the DBA for her side of the story. It was pretty simple - the CPU, I/O are all normal, way below utilization. The growth projections were eight times. Yes, eight times. So, the DBA made a request for the increase in capacity and that's where the friction has started. No one anticipated the eight fold increase; so there is simply no room. Stalemate!

As the head of database architecture, I question any growth projections, especially ones that go up 8 times. And I did. Here was the response "we are running on 2 legs and we will run on 16 legs in the near future".

2 legs?!!! What is that? What is a leg?

As it turns out, the application is running 2 java virtual machines. the app architects are recommending to run 16 JVMs to add redundancy and distribute local processing. From the database perspective, that means 2 clients now will become 16; but the overall processing will *not* go up.

Instead of saying that in plain English, the App manager coined a term "leg" to describe the issue in apprently in some technical way. This was communiated to his management chain, which in turn interpreted it as 8 fold increase in processing and demanded that we create 7 more databases. They approved the new "databases" and allocated the budget. But since the friction came with a member of my team, I became involved. As I always do, I questioned the decision and that's how the truth came out, which in turn also cleared the mystery behind the "capability" story described above.

All these hoopla about people not communicating in a clear manner. Adding 16 more clients should have simple enough to connvery even to the mail guy; calling it "legs" confused every one, wasted time and increased frustration. If I hadn't questioned it, it would have been implemented too. 7 more databases doing nothing to solve the issues present.

Communication is all about articulation and presentation of thoughts. The key is empathy - put yourself in the recipient's shoes and try to look at what you are saying from that persepctive. It's not about English (or whatever the language used); it's about the clarity of thought process. Granted not everyone will be able to present the thought process equally coherently while speaking; but what about writing? There is no excuse for not writing coherently, is there? The only reason for being incoherent is just a I-don't-care attitude. Of course, there are other things - unfamilairity with the language used, lack of time, environment (typing on the blackberry with one finger while cooking with the other hand), state of mind (typing out the report while waiting for the third drink to arrive at the bar); but most of the time it's just plain lack of empathy. If you don't have that, you just don't communicate; at least not effectively. And when you don't communicate, you fail, regardless of your professional stature.

As Karen Morton once said during a HotSos Symposium, and a mantra I took to heart and live by the lines everyday:

Knowledge and Experience Makes you Credible
Ability to Communicate Effectively makes you Useful

Useful - that's what I and you want to be; not just credible. Thank you, Karen.

Sunday, October 05, 2008

Upgrade Experience for Patchkit 11.1.0.7

The first patchkit for 11g - 11.1.0.7 - came out a few weeks ago, for Linux systems. Unlike most other patchkits, this one contains some new functionalities. Oracle patchkits are usually only bugfixes with al patches regression tested collectively; not added features. This one does have some new (albeit minor) features.

The upgrade was relatively easy but took a long time - about 30 minutes. I encountered one issue and a roadblock. Here is the annoying roadblock

A Very Import Pre-req

Do not forget to check for this pre-requisite. This is the time_zone check, as specified in the patchkit readme file.

SQL> select version from v$timezone_file;

VERSION
----------
4

In my case it returned 4, which means no further action is necessary. Had it returned something else, I would have to follow the instructions mentioned in MetaLink Note 568125.1.

Asking for Email for Security Notification

It was interesting to note that the installer asked me to enter my Email address to send me updates on security. Funny; I get that already anyway. So I ignored and clicked "Next". Nope; it popped a little window asking me to confirm that I do not need email. Sure, I confirmed and pressed Next. No, the same issue, it asked me again to confirm and so on. Vicious circle!

There was a little checkbox below that said "track security via MetaLink" and asked my MetaLink password (not "id", which it presumed to be my email). I checked that box and entered my email and password, and it allowed me to go further.

This is a little awkward. Not everyone will want to get the email. And in any case, a mere email will explain little about the security issues. And why would anyone want to embed his/her MetaLink password in a response file?

Cool Feature: The Pre-upgrade Script

Oracle now provides a script utlu111i.sql in the $OH/rdbms/admin directory. Run this script before you shut the database down for upgrade. It will tell a lot about the issues you may need to fix before trying the upgrade.

The Issue

The issue I encountered was while running the catupgrd.sql script. After applyng the patch, I started up with the UPGRADE option and then executed catupgrd.sql script, which promptly failed. From the error message (which, I unfortunately, I couldn't capture), it was clear that the failure was due to the presence of Data Vault feature. I apparently installed Data Vault option while installing the software. The message says it clearly:

Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options

Well, I need to remove it before I can upgrade. It needed a recomp of the Oracle software:

oracle@prolin1$ cd $ORACLE_HOME/rdbms/lib
oracle@prolin1$ make -f ins_rdbms.mk dv_off

Next, I relinked the Oracle software:

oracle@prolin1$ relink oracle

Now the catupgrd.sql script ran successfully as expected. It took about 1 hour to complete.

SQL> select * from v$version
2 /

BANNER
-----------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - Production
PL/SQL Release 11.1.0.7.0 - Production
CORE 11.1.0.7.0 Production
TNS for Linux: Version 11.1.0.7.0 - Production
NLSRTL Version 11.1.0.7.0 - Production

Sunday, September 28, 2008

Most Striking Object at Open World 08

What was the most amazing thing I witnessed for the first time at OOW 08?

Hands down, it is the compostible drinking cup! Yes, compostible, not recyclable. It looks just like any other plastic (not paper) drink cup you see at almost any public water dispensers. The difference? It's made of corn syrup, I was told, not plastic and hence compostible. Wow!

I am not a green fanatic; but I consider myself a responsible adult concerned about the environment doing his share to reduce the landfills, pollutions and paper consumption. I do things that are practical: I don't print something I can read on the monitor; project emails, powerpoints on the screen/projector while conferring with colleagues rather than printing; use back sides of printouts to scribble; use 2-sided printing; donate kids' toys and cloths to charity rather than throw them in trash and so on. But there are some things I just couldn't jettison; at least not yet. One of them was the ubiquitous plastic drinking cup and the bottled water. The convenience of the water bottle was just too much to ignore and my lazy bones reigned over my conscience and I always gravitated, albeit a little guiltly, to the water bottle.

Not any more. I hope these compostible corn syrup based polymer material makes its way to all things plastic - bottles, cups, packaging and so on. The material is called polylactic acid (PLA), which is a polymer made from lactic acid from strachy produce like corn, wheat, patato and beet. However, due to its low melting point, it's not suitable for hot liquids, at least not yet. There is a compostible version - paper cups lines with PLA instea dof petroleum based products. But that's still paper; not 100% PLA.

According to a Smithsonian article, producing this PLA requires 65% less energy and emits 68% fewer greenhouse gases. Wow! That's good enough for me.

But, is it all rosy and smell nice? Well, afraid not. The biggest caveat: the PLA decomposes in a controlled composting facility, not the backyard composting bin. you need something of industrail strength - the sort used by municipalities and large industrial plants. Do they exist? Yes, for commercial use; but almost none for residential use. So, that's the catch. While the material is compostible; the facility to compost is not available.

But I am not going to look at it as glass half full. This is a major first step. Perhaps the ecological and political pressures will force the residential facilities to open up as well. Until then, let the power be with PLA.

OOW'08 Oracle 11g New Features for DBAs

It was my second session at Open World this year. It was full with 332 attendees with a whopping 277 attendees on wait list! the room capacity was 397. Of course, the room did have some fragmentation and not everyone could make it.

Here is the abstract:

There is a world outside the glittering marketing glitz surrounding Oracle 11g. In this session, a DBA and author of the popular 11g New Features series on OTN covers features that stand out in the real world and make your job easier, your actions more efficient and resilient, and so on. Learn the new features with working examples: how to use Database Replay and SQL Performance Analyzer to accurately predict the effect of changes and Recovery Manager (RMAN) Data Recovery Advisor to catch errors and corruption so new stats won't cause issues.

Thank you very much for those who decided to attend. I hope you found it useful. Here is the presentation. You can download it from the Open World site too. Please note, the companion site to see al working examples and a more detailed coverage is still my Oracle 11g New Features Series on Oracle Technology Network.

OOW'08 : Real Life Best Practices for DBAs

Once more, it was a full session with 278 attendees and 251 waitlisted due to lack of space! I wish the organizers simply moved to a bigger room. Oh, well!

For those who attended, I truly wish to express my gratitude. As a speaker, I feel honored when people choose to attend my session over others. I wish you found something useful here.

If you haven't already downloaded it from Open World site, here is the presentation: http://www.proligence.com/OOW08_DBA.pdf.

Here is the abstract:

This session covers best practices for excelling at being a DBA in the field from someone who has been managing Oracle Database instances for 14 years; was named the DBA of the Year in 2003; and has been through it all, from modeling to performance tuning, disaster recovery, security, and beyond. Best practices should be justifiable and situation-aware, not just because someone said they were good. Hear several tips and tricks for surviving and succeeding in the part art/part wizardry of database administration. The session covers Oracle 11g.

I will highly appreciate if you post your comments.

OOW'08 Partitioning - Why, When, What and How

I had a full session (300+ in attendance with 120 wait-listed). There was a gentleman from the organizing team who was trying to let in people flow in to the room as unobtrusively as possible. He was the same one for my other two sessions (which happend to be in the same room - full and and some 100+ wait listed) and he was visibly flustered to conduct the same routine. "You must be quite popular" - he exasperated! I took it as a compliment. For those who attended, thank you for taking the time to spend with me. I truly feel honored. For those who couldn't come in, I assume you have found some other helpful session.

Here is the abstract:

How do you decide which partitioning schemes to use, which column to partition on, or whether or not to use more than one column for partitioning keys? These are questions you must answer to develop a good strategy. Too many architects and DBAs choose a wrong strategy, making partitioning a failure. In this session, you will *not* learn the syntax or types of partitioning but how to decide the best strategy, presented through real-life case studies and examples. This session covers Oracle 11g features in partitioning.

If you haven't already done so from the Oracle Open World site, you can download here: http://www.proligence.com/OOW08_Part_ppt.pdf

By the way, I have also written an article on the same presentation, which you can find here: http://www.proligence.com/OOW08_Part_doc.pdf

Please leave your comments on these two documents. As always, thank you visiting and offering your feedback.

OOW'08 ACE Directors Forum Session

Lewis Cunningham kindly took the reins of organizing a unique session - a panel of ACE Directors. But instead of a panel didcussing some topic, it welomed questions from the audience. sort of stump the chump with the questions. Lewis couldn;t have been a more perfect moderator/host. Most amazing, he came up with the idea to hand out blank index cards for the attendees to fill out with their questions, which forced the questions to be clearer.

On the panel we had Brad Brown (TUSC), Eddie Awad, Tim Hill, Mark Rittman, Hans Forbrich and your truly. Here is a coverage on Mark Rittman's blog (with a picture of the crew) http://www.rittmanmead.com/2008/09/26/oracle-open-world-2008-day-5-exadata-storage-server-and-ask-the-oracle-ace-directors/

I was nervous; and who wouldn't be, with the pressure? Fortunately, we, as a panel, with the expert moderation by Lewis, could ace the volleys. some of the questions I responded to with my answers:

Q: Will Transparent Tablespace Encryption (TTE) spell the doom for Tranparent data Encryption (TDE)?
A: Not at all. TDE encrypts a specific column or columns. TTE encrypts everything the tablespace - all tables and all columns. So, the performance definitely impacted. However, the biggest difference is the encryption in the database. Both technologies encrypt data in storage; but TTE decrypts the data in the SGA. So index scans do not suffer in case of TTE. TDE does *not* decrypt the values in SGA; so index scans are rather useless. So, in the case where a data value will most likely be found in SGA, the TTE option works well. The penalty is in the time when data is loaded from the database to the SGA. Since that happens a lot less, this will not cause a serious issue. In case data is frequently aged out of the buffer cache, the TTE option may prove expensive and TDE might become relatively attactive.
Q: What approach would you recommend for upgrading a 10 GB database to 11g from 10g - Data Pump, Exp/Imp, Transportable Tablespace?
A: None of the above. I would go for a Direct Path Insert (insert with the APPEND hint) over DB Link. This allows me several benefits - (i) I can do a database capture and replay it on 11g to minimize the risk of something breaking after upgrade. (ii) I can do a database reorg at the time of the move, i.e. partition unpartitioned objects, etc. (iii) have minimal time for migration.
Q: What is your least favorite new feature in Oracle?
A: I would rather answer it as most "unnecessary" new feature. It would be bigfile tablespaces - hands down. I always recommend creating smaller datafiles for tablespaces, no more thna 32 GB. This reduces the risk significantly in case of failures. If a block media recovery fails due to whatever reason, you can at least restore the file (or switch over to a copy) quickly. The bigger the file, the more time will be for restore and recovery. A large number of files increase the checkpoint time. so, try to find a balance. But in any case, dump bigfiles.
Q: How has life changed for you after being an OCP?
A: Not in the least. I have been an OCP since 8i and I finishd 9i, 10g and now 11g upgrade exams. However, no one ever bothered to ask me if I am an OCP.

Wednesday, September 24, 2008

From the Trenches at OOW'08

I had two sessions today:

(1) I signed up a Unconference session - "Practical 11g Upgrade". I signed up for this on the spot; so I didn't have any slides or time for preparation. The unconference room had a whiteboard and several markers. So, there I was - explaining our 11g upgrade process using a whiteboard. Was a lot of fun. Someday, I will try to capture the events on paper and publish as a blog.

(2) I had signed for "Meet the ACEs" event at OTN Lounge. Wasn't much of an event. It was shaowed by the much anticipated iPod Touch drawing at OTN lounge. Nope; I didn't win.

Wednesday, September 17, 2008

Speaking on Partitioning at NY Metro Oracle Day

I recently delivered a session at New York Metro Area Oracle Day organized by User Groups in New York, New Jersey and Connecticut. If you attended, thank you for coming. There is nothing much satisfying than see people taking time off their daily grind and be in the room to listen to someone they may or may not know; but trust they can get something from. Thank you for all those who attended.

The session was about how to use partitioning, not what it is. So I didn't talk about syntax; but how to exploit partitioning techniques to accomplish common objectives.

Here is the powerpoint presentation deck. However, you may find a descriptive article more suitable for reading.

Congratulations to the lady who won my book RMAN Recipes for asking the best question. I apologize, I couldn't get your name. If you read this, please get back to me.

The question was: In a RAC envrionment does partitioning help or hurt?

Answer: Partitioning helps in these ways:
(1) it allows several segments to be create for the same object. Therefore the chances of cache buffer chains gets reduced, which ultimately reduces the global cache related waits.
(2) it reduces the chance that rows will be found on one block, leading to block contention among instances and reduces gc buffer busy waits.
(3) in hash partitioned indexes, it reduces contention for a few index blocks, leading to lesser gc current grant related waits.

The downside? Well, not much. Partition operations (drop/add/split) are DDL, not DML operations. So, when you perform one of these, the library cache must be synced up on all instances, whihc may create library cache lock/latch issues. So, as a rule, perform partition maintenance operations during off-hours or periods of light activity.

Monday, September 01, 2008

Magic of Block Change Tracking

The other day, one of the instances of our main 3-node RAC database crashed due to an I/O error. Apparently there were many "dead paths" from the host to the SAN, which made some LUNs in the ASM diskgroup not being recognizable by the ASM instance. Why? Well, that's the topic for another blog. The point behind this one has nothing to do with the reason. All I want to convey that there was a LUN problem on one of the nodes which brought the instance down. Being RAC, the database was serviced from other two nodes - praise God for RAC! - and the users didn't notice it terribly (or, I would like to think that way).

After a few days we noticed the incremental RMAN backup taking a long time. This caused major issues - it took a long time and I/O waits went through the roof. In fact it took increasingly longer every day that passed by that unfortunate collapse of the node. Everyone was quite intrigued - what could be the connection between an instance crash and instance crashing? All sorts of theories cropped up - from failed HBA cards to undiscovered RAC bugs.
This is where I got involved. The following chronicles the diagnosis of the issue and the resolution.

First, the increased length of time is obviously a result of the incemental backups doing more work, i.e. more changed blocks. What caused so many changed blocks? Interviews with stakeholdrs yielded no clear answer - there was absolutely no reason for increased activity. Since we are doing proper research, I decided to start with the facts. How much was the extra blocks processed by incrementals?

I started with this simple query:

select completion_time, datafile_blocks, blocks_read, blocks
from v$backup_datafile
where file# = 1
order by 1
/

Output:

COMPLETIO DATAFILE_BLOCKS BLOCKS_READ BLOCKS
--------- --------------- ----------- ----------
18-JUL-08 524288 32023 31713
19-JUL-08 524288 11652 10960
20-JUL-08 524288 524288 12764
21-JUL-08 524288 524288 5612
22-JUL-08 524288 524288 11089

The columns are:

DATAFILE_BLOCKS - the number of blocks in the datafile at that time
BLOCKS_READ - the exact number of blocks the RMAN incremental backup read
BLOCKS - the numberof blocks it actually backed up

From the above output, a pattern emerges - until Jul 20th, the backup read only a few blocks; but on July 20th, it started scanning the entire file - all the blocks! I checked for a few other datafiles and the story is the same everywhere. With a 4.5 TB database, if the incremental backup reads the datafiles in entirity, then I/O would obviously go for a toss. That explains the I/O and time issue.

But why did RMAN switch from reading a few blocks to the whole file that day? The #1 suspect is Block Change Tracking. The 10g feature BCT allows RMAN to scan only the changed blocks and not the entire file. We use that. So, did something happen to make that disappear?

to answer, I issued a modified query:

select completion_time, datafile_blocks, blocks_read, blocks, used_change_tracking
from v$backup_datafile
where file# = 1
order by 1
/

Output:

COMPLETIO DATAFILE_BLOCKS BLOCKS_READ BLOCKS USE
--------- --------------- ----------- ---------- ---
18-JUL-08 524288 32023 31713 YES
19-JUL-08 524288 11652 10960 YES
20-JUL-08 524288 524288 12764 NO
21-JUL-08 524288 524288 5612 NO
22-JUL-08 524288 524288 11089 NO

Bingo! The BCT use ceased from the 20th July date. That was what caused the whole file to be scanned. But why was it stopped? No one actually stopped it.

Investigating even further, I found from the alert log of Node 1:

Sun Jul 20 00:23:52 2008
CHANGE TRACKING ERROR in another instance, disabling change tracking
Block change tracking service stopping.

From Node 2:

Sun Jul 20 00:23:51 2008
CHANGE TRACKING ERROR in another instance, disabling change tracking
Block change tracking service stopping.

Alert log of Node 3 showed the issue:

Sun Jul 20 00:23:50 2008
Unexpected communication failure with ASM instance:
ORA-12549: TNS:operating system resource quota exceeded
CHANGE TRACKING ERROR 19755, disabling change tracking
Sun Jul 20 00:23:50 2008
Errors in file /xxx/oracle/admin/XXXX/bdump/xxx3_ctwr_20729.trc:
ORA-19755: could not open change tracking file
ORA-19750: change tracking file: '+DG1/change_tracking.dbf'
ORA-17503: ksfdopn:2 Failed to open file +DG1/change_tracking.dbf
ORA-12549: TNS:operating system resource quota exceeded
Block change tracking service stopping.

The last message shows the true error. The error was “operating system resource quota exceeded”, making the diskgroup unavailable. Since the ASM diskgroup was down, all the files were also not available, including BCT file. Surprisingly, Oracle decided to stop BCT altogether rather than report it as a problem and let the user decide the next steps. So block change tracking was silently disabled and the DBAs didn't get a hint of that. Ouch!

Resolution

Well, now that we discovered the issue, we took the necessary steps to correct it. Because of the usual change control process, it took some time to have the change approved and put in place. We executed the following to put the BCT file.

alter database enable block change tracking using file '+DG1/change_tracking.dbf'

The entry in alert log confirms it (all all nodes)

Block change tracking file is current.

But this does not solve the issue completely. to use block change tracking, there has to be a baseline, which is generally a full backup. We never take a full backup. We always take an incremental image copy and then merge to a full backup on a separate location. So, the first order of business was to take a full backup. After that we immediately took an incremental. It took just about an hour, down from some 18+ hours earlier.

Here is some analysis. Looking at the backup of just one file - file#1, i.e. SYSTEM datafile:

select COMPLETION_TIME, USED_CHANGE_TRACKING, BLOCKS, BLOCKS_READ
from v$backup_datafile
where file# = 1
order by 1
/

The output:

COMPLETIO USE BLOCKS BLOCKS_READ
--------- --- ---------- -----------
18-AUG-08 NO 31713 524288
18-AUG-08 NO 10960 524288
20-AUG-08 NO 12764 524288
21-AUG-08 NO 5612 524288
22-AUG-08 NO 11089 524288
23-AUG-08 NO 8217 524288
23-AUG-08 NO 8025 524288
25-AUG-08 NO 3230 524288
26-AUG-08 NO 6629 524288
27-AUG-08 NO 11094 524288 <= the filesize was increased 28-AUG-08 NO 3608 786432 29-AUG-08 NO 8199 786432 29-AUG-08 NO 12893 786432 31-AUG-08 YES 1798 6055 01-SEP-08 YES 7664 35411
Columns descriptions:

USE - was Block Change Tracking used?
BLOCKS - the number of blocks backed up
BLOCKS_READ - the number of blocks read by the backup

Note, when the BCT was not used, the *entire* file - 524288 blocks - were
being read every time. Of course only a percent of that was being backed up
since that percentage changed; but the whole file was being checked.
After BCT, note how the "blocks read" number dropped dramatically. That is
the magic behind the dropped time.

I wanted to find out exactly how much I/O savings BCT was bringing us. A simple query would show that:

select sum(BLOCKS_READ)/sum(DATAFILE_BLOCKS)
from v$backup_datafile
where USED_CHANGE_TRACKING = 'YES'
/

The output:
.09581342

That's just 9.58%. After BCT, only 9.58% of the blocks of the datafiles were scanned! Consider the impact of that. Before BCT, the entire file was scanned for changed blocks. After BCT, only about 9.58% of the blocks were scanned for changed blocks. Just 9.58%. How sweet is that?!!!

Here are three representative files:

File# Blocks Read Actual # of blocks Pct Read
----- ------------- ------------------- --------
985 109 1254400 .009
986 1 786432 .000
987 1 1048576 .000

Note, files 986 and 987 were virtually unread (only one block was read). Before BCT, all the 1048576 blocks were read; after BCT only 1 was. This makes perfect sense. These files are essentially older data; so nothing changes there. RMAN incremental is now blazing fast because it scans less than 10% of the blocks. The I/O problem disappered too, making the database performance even better.

So, we started with some random I/O issue, causing a node failure, which led to increased time for incremental, which was tracjed down to a block change tracking file being suspended by Oracle silently without raising an error.

Takeaways:

The single biggest takeway you should get is that just because it is defined, don't get the idea it is going to be there. So, a periodic check for the BCT file is a must. I will work on developing an automated tool to check for non-use of BCT file. The tool will essentially issue:

SELECT count(1)
FROM v$backup_datafile
where USED_CHANGE_TRACKING = 'NO'
/

If the output is >1, then an alert should be issued. Material for the next blog. Thanks for reading.

Saturday, August 23, 2008

Diagnosing Library Cache Latch Contention: A Real Case Study

The other day, our data warehouse database was effectively down. The database was up and running; but all connection attempts to the database would just hang. Since connections just failed, the DBA couldn’t even get in to see if the sessions connected were running normally or not. The ETL team reported that the processing slowed down to a crawl. The DBA, as instructed, went on to check the wait events; but since she couldn’t even log in, she couldn’t do that. Interestingly, CPU was around 70%, normal for the time of the day; and I/O was about 90%, again normal.

So, in simple terms – she was stuck and she applied the age-old technique: when in doubt, reboot. She asked the SA to reboot the server. It took about 30 minutes; and after it cam backup and the database started everything seemed normal for about 10 minutes. And, then the same problem stuck – the database got unresponsive.

At that point, I was called for help. In this blog I am sort of trying to paint a picture of what I did in the next half hour or so to resolve the issue. I hope by giving this running commentary of the events, facts and figures, you will follow my thoughts as I went about diagnosing the issue.

Symptoms

(1) Database connections were hanging
(2) Connections on the server using SQL*Plus as sysdba was also hanging; so we couldn’t even examine the issue.
(3) The system couldn’t be rebooted everytime; every reboot means more work for the ETL folks to clean up the failed sessions.

Initial Analysis

Obviously something is holding some resource back in the database; but what? Unless we connected to the database, we couldn’t check. And since couldn’t check, we couldn’t remediate. It was a classic case of Catch-22.

Action

This is one of those cases where a backdoor entry to the database instance comes really handy. Most people are not aware of the option in SQL*Plus called “prelim”. This option connects to the SGA; but does not open a session.

(1) So, the first thing I did is to open a sessionless SQL*Plus session using the Oracle user:

$ sqlplus -prelim
SQL>

Note, it didn’t say anything familiar like “Connected to Oracle Database 10.2.0.3”, etc. All it showed was the “SQL>” prompt. That was because it didn’t actually connect to the database.

(2) Then I used the “oradebug” utility to analyze the SGA

SQL> oradebug setmypid
SQL> oradebug hanganalyze 12

This produced a tracefile in the user_dump_dest directory. The file wasn’t difficult to find, since it was the last file created. Even if I didn’t find the file, I could have used the process ID to find the file. The file would have been named crmprd1_ora_13392.trc, assuming 13392 was the process ID.

(3) Let’s examine the file. Here are first few lines:

*** 2008-08-23 01:21:44.200
==============
HANG ANALYSIS:
==============
Found 163 objects waiting for
<0/226/17/0x1502dab8/16108/no>
Open chains found:
Chain 1 : :
<0/226/17/0x1502dab8/16108/no>
<0/146/1/0x1503e898/19923/latch:>

This tells me a lot. First it shows that the SID 146 Serial# 17 is waiting for library cache latch. The blocking session is SID 226 Serial# 17. The latter is not waiting for anything of blocking nature. I also noted the OS process IDs of these sessions – 16108 and 19923.

(4) Next I checked for two more tracefiles with these OS PIDs in their names.

crmprd1_ora_16108.trc
crmprd1_ora_19923.trc

(5) I opened the first one, the one that is the blocker. Here are the first few lines:

*** 2008-08-23 01:08:18.840
*** SERVICE NAME:(SYS$USERS) 2008-08-23 01:08:18.781
*** SESSION ID:(226.17) 2008-08-23 01:08:18.781
LIBRARY OBJECT HANDLE: handle=c0000008dc703810 mtx=c0000008dc703940(8000) cdp=32737
name=UPDATE DW_ETL.FRRS_PROFILER SET CONSUMER_LINK = :"SYS_B_0", ADDRESS_LINK = :"SYS_B_1", ADDRESS_MATCH = :"SYS_B_2", PROC
ESSED=:"SYS_B_3" WHERE RNUM = :"SYS_B_4"
hash=a029fce7bb89655493e7e51a544592a4 timestamp=08-23-2008 00:10:23
namespace=CRSR flags=RON/KGHP/TIM/OBS/PN0/MED/KST/DBN/MTX/[504100d0]
kkkk-dddd-llll=0000-0001-0001 lock=N pin=0 latch#=10 hpc=0058 hlc=0058
lwt=c0000008dc7038b8[c0000008dc7038b8,c0000008dc7038b8] ltm=c0000008dc7038c8[c0000008dc7038c8,c0000008dc7038c8]
pwt=c0000008dc703880[c0000008dc703880,c0000008dc703880] ptm=c0000008dc703890[c0000008dc703890,c0000008dc703890]
ref=c0000008dc7038e8[c0000008dc7038e8,c0000008dc7038e8] lnd=c0000008dc703900[c0000008dc703900,c0000008dc703900]
LOCK OWNERS:
lock user session count mode flags
---------------- ---------------- ---------------- ----- ---- ------------------------
c0000008d079f1b8 c0000006151744d8 c0000006151744d8 16 N [00]
c0000008d4e90c40 c0000006151bcb58 c0000006151bcb58 16 N [00]
c0000008d0812c40 c0000008151a0438 c0000008151a0438 16 N [00]

(6) This is a treasure trove of information for debugging. First it shows the SID and Serial# (226.17), which confirms the SID we identified earlier. It shows the exact SQL statement being used. Finally it shows all the locks. I didn’t particularly care about the specifics of locks; but it gave me enough information to prove that the SID 226 was causing a wait for a whole lot of other sessions.

(7) My investigation is not done; I need to find out the sessions waiting for this. So, I searched the file for a section called “PROCESS STATE”. Here is a snippet from the file:

PROCESS STATE
-------------
Process global information:
process: c00000081502dab8, call: c000000817167890, xact: 0000000000000000, curses: c00000081519ef88, usrses: c000000815
19ef88
----------------------------------------
SO: c00000081502dab8, type: 2, owner: 0000000000000000, flag: INIT/-/-/0x00
(process) Oracle pid=370, calls cur/top: c000000817167890/c000000817167890, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 115 0 4
last post received-location: kslpsr
last process to post me: c000000615002038 1 6
last post sent: 0 0 24
last post sent-location: ksasnd
last process posted by me: c000000615002038 1 6
(latch info) wait_event=0 bits=20
holding (efd=4) c0000008d7b69598 Child library cache level=5 child#=10
Location from where latch is held: kglhdgc: child:: latch
Context saved from call: 13
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
291 (197, 1219468295, 197)
279 (197, 1219468295, 197)
374 (197, 1219468295, 197)
267 (197, 1219468295, 197)
372 (197, 1219468295, 197)
... several lines sniped ...
307 (15, 1219468295, 15)
181 (6, 1219468295, 6)
waiter count=58
Process Group: DEFAULT, pseudo proc: c0000008e03150d8
O/S info: user: oracrmp, term: UNKNOWN, ospid: 16108
OSD pid info: Unix process pid: 16108, image: oracle@sdwhpdb1

(8) This told me everything I needed to know. There were 58 sessions waiting for library cache latch held by SID 226. I also knew the OS Process ID and the SQL statement of the blocking session.

(9) At that time we engaged the Application Owner to explain what was going on. As he explained it, he issues the update statement in a loop. And that’s not all; he executes it in 8 different threads. No wonder we have had library cache latch contention. So, we had to track 8; not just one session. We trudged on. All the sessions dumped their information. So, I searched the directory for some other files with the same issues:

$ grep “UPDATE DW_ETL” *.trc

(10) And I found 9 more sessions (or, rather, processes). Here is a snippet from another file:

350 (167, 1219470122, 167)
197 (167, 1219470122, 167)
waiter count=185
Process Group: DEFAULT, pseudo proc: c0000008e03150d8
O/S info: user: oracrmp, term: UNKNOWN, ospid: 16114

This process had 185 waiters! Ouch!

(11) Now comes a decision point. I knew who is blocking and who were being blocked; although I didn’t yet know what latch exactly is being contented for. I could have dumped the library cache latches to get that information; but the application owner volunteered to terminate the sessions. The application, fortunately, was restartable. So, we decided to kill all of these errant sessions on the unix prompt.

$ kill -9

(12) After killing a few processes, the database started responding. After killing all of them, the database wait events came back to completely normal. Connections were established and applications started behaving normally.

After step 11, I could have used the library cache dump to examine the exact library element in the center of the contention; but that’s a topic for another blog.

Takeaways

(1) When something seems to be “hung”, don’t get hung up on that. A session almost always waits for something; rarely it is just “hung”. You should check what it is waiting for by selecting the EVENT column of V$SESSION (10g) or V$SESSION_WAIT (9i).
(2) When you can’t logon to the database to get the information, try using oradebug command.
(3) To use oradebug, you should use SQL*Plus. Since you can’t login, use “sqlplus -prelim" to get the SQL prompt.
(4) Use oradebug setmypid to start the oradebug session, and then use oradebug hanganalyze to create a dumpfile of all hang related issues
(5) Use oradebug help to see all oradebug commands

Tuesday, August 19, 2008

Put External Procedures on a Different Listener

As a best practice, I recommend placing external procedures, if you use them, on a different listener.ora file. Why so?

External Procedures are actually O/S level programs executed by the listener, as the user "oracle" (or whatever the username for Oracle software is).

Let me repeat that: external procs allow database sessions to execute O/S level programs with the privilege level of the Oracle user! Do you see a problem here? Since the programs run as "oracle", they can do anything the user can do at the command line: "ls -l", "rm listener.log", or even "rm datafiles". Do you want that? Of course not.

Unfortunately there are several vulnerabilities that exist which exploit this particular feature. the CPU patches address some of them; but the bad guys respond by exposing even more holes. In this cat and mouse game, the best thing, in my opinion, is to reduce or even eliminate the possibility if feasible. If you don't use external procs (most people don't), why put them on the listener.ora file? Unfortunately the default config puts the external procs; so you should remove them and you can do that easily.

If you must use external procs, then I suggest using a different listener for that. Doing so allows you to shutdown that functionality while allowing normal database connectivity. If you perceive an imminent threat, you can take evasive action by shutting down the ext proc listener. You can't do that if the external jobs are on the same listener.

Again, consider this: what is the harm is doing that? Nothing. and what is the benefit? I just showed you, however small that may be. So, if you lost nothing and potentially gain something, why not do it?

Why should you set the ADMIN_RESTRICTIONS_LISTENER to ON

Recently someone probably went through the slides of my session on "Real Life DBA Best Practices" and had a question on OTN forum why I was recommending setting the parameter to ON, as a best practice. I responded on the forum; but I feel it's important enough to put it here as well.

As a best practice, I recommend setting this parameter to ON (the default is OFF). But as I profess, a best practice is not one without a clear explanation. Here is the explanation.

Over the period of time, the Oracle Database has encountered several security vulnerabilities, some of them on the listener. Some are related to buffer overflow. others involve unauthorized access into the listener process itself. Some of the listener access exploits come from external listener manipulations. Did you know that you do not need to even log into a server to connect to the listener? As long as the port the listener is listeneing on is open (and it will be, for obvious reasons) you can connect to the listener from a remote server.

In 10g, Oracle provided a default mechanism that does not require password from the oracle user manipulating the listener via online commands. Having said that, there have been bugs and there will be. Those vulnerabilities usually get fixed later; but most often the fix does not get to the software quickly enough.

So, what should you do to protect against these vulnerabilities? I consider a simple thing to do is to remove the possibilty altogether; and that's where the admin restrictions come into picture. After setting this parameter, you can't dynamically change the parameter. So, even though a connection is made somehow from an outside server - bug or not - eliminating the possibilty altogether mitigates the risk. And, that's why recommend it.

Let's ponder on the problem a little bit more. Is that a problem is setting the parameter? Absolutely not. When you need to change a parameter, you simply log on to the server, update the listener.ora and issue "lsnrctl reload". This reloads the parameter file dynamically. Since you never stopped the listener, you will not see unsuccessful conection requests from clients. So, it is dynamic. If you are the oracle user, then you can log on to the server; so there is no issue there.

I advocate this policy rather than dyanamic parameter changes, for these simple reasons:

(1) It plugs a potential hole dues to remote listener vulnerability attacks, regardless of the probabilty of that happening.
(2) It forces you to make changes to listener.ora file, which shows the timestamp.
(3) I ask my DBAs to put extensive comments on the parameter files, including the listener.ora file, to explain the change. I also ask them to comment a previous line and create a new line with the new value, rather than updating a value directly. This sort of documentation is a gem during debugging. Changing in the parameter file allows that, while dynamic change does not.

So, I don't see a single functionality I lose by this practice; and I just showed you some powerful reasons to adopt this practice. No loss, and some gain, however small you consider that to be - and that's why I suggest it.

As I mentioned earlier, a best practice is not one without a clear explanation. I hope this explanation makes it clear.

Saturday, August 09, 2008

Resource Manager I/O Calibration in 11g

You are considering a new platform for the Oracle database; or perhaps worried about the I/O capabilities of the existing platform. How do you know if the I/O drivers, LUN creation, etc. are capableof handling the I/O load? This is a very important question; I/O is the single most important thing in an Oracle database that can cripple an application, however large or small.

Here comes the Calibrate I/O tool in Oracle Database 11g. There are some really cool features of 11g that never made into the blitzkrieg of Oracle Marketing. This is one of them. This is a procedure in the DBMS_RESOURCE_MANAGER package. It drives some I/O to the database files to test the throughput and much more metrics.

So, what is the big deal? This type of tests can be done by simple unix system as well; or even by little apps from the storage vendors.

The important difference is that the tool uses Oracle database software libraries to drive the I/O; not some generic I/O generator. This capability is important - you are testing the I/O for an Oracle database, right?

So, let's get on with the how-it-works part of the post. You need to have asynch I/O capability otherwise this will fail with the following message:

DECLARE*

ERROR at line 1:ORA-56708: Could not find any datafiles with asynchronous i/o capability

ORA-06512: at "SYS.DBMS_RMIN", line 453ORA-06512: at "SYS.DBMS_RESOURCE_MANAGER", line 1153ORA-06512: at line 7

To enable asynch I/O, set two values in the init.ora file (or spfile) and recycle the instances:

disk_asynch_io = true



filesystemio_options = asynch

The first one is default; so you may not have to change it. The default of the second one is NONE; so you will probably have to change it. After setting these values and recycling the instance, confirm at the file level:

select name,asynch_io from v$datafile f,v$iostat_file i

where f.file#=i.file_no

and (filetype_name='Data File' or filetype_name='Temp File')

/

NAME ASYNCH_IO

-------------------------------------------------- ---------
+DG1/labrac/datafile/system.261.648437429 ASYNC_ON

+DG1/labrac/datafile/system.261.648437429 ASYNC_ON

+DG1/labrac/datafile/sysaux.262.648437447 ASYNC_ON

+DG1/labrac/datafile/undotbs1.263.648437455 ASYNC_ON

+DG1/labrac/datafile/undotbs2.265.648437477 ASYNC_ON

+DG1/labrac/datafile/users.266.648437523 ASYNC_ON

+DG1/sras_01 ASYNC_ON

This shows the files have Asynch I/O enabled.

Caution: even if the above query returns ASYNC_ON, you may not be able to run the calibrate process. The culprit is the number of asynch I/O slots available in the system. If there are no free slots, the files will not use asynch I/O and you will get ORA-56708. To find out the maximum avalable asynch I/O slots:


$ cat /proc/sys/fs/aio-max-nr
65536

To find out how many are being used:

$ cat /proc/sys/fs/aio-nr
65536

Well, that sort of tells it all. Of the 65536, all are being used; so we are getting ORA-56708. The reason was I was runninganother database on the server, which was consuming all those slots. After shutting down the other database, I checked the utilization again:

$ cat /proc/sys/fs/aio-nr

35456

The value dropped substantially and allowed my calibration to proceed.

Now start the calibration process. The precedure CALIBRATE_IO of the package DBMS_RESOURCE_MANAGER accepts two parameters:

NUM_PHYSICAL_DISKS - the number of physical disks you have. Remember: this is the number for disks, not LUNs.
MAX_LATENCY - the maximum latency for the disk access (in milliseconds)

And it has three OUT parameters:

MAX_IOPS - the maximum number of I/O Operations per second
MAX_MBPS - the maximum Megabytes per second
ACTUAL_LATENCY - the actual latency observed during the test

Since this has OUT parameters, we will have to write a little PL/SQL program to get the values.


set serveroutput on
declare
    l_latency   integer;
     l_iops      integer;
     l_mbps      integer;
 begin
    dbms_resource_manager.calibrate_io (
         18, /* # of disks */
         10, /* max latency */
         l_iops, /* I/O Ops/sec */
         l_mbps, /* MBytes/sec */
         l_latency /* actual latency */
     );
       dbms_output.put_line ('I/O Ops/sec = '  l_iops);
     dbms_output.put_line ('Actual Latency = '  l_latency);
     dbms_output.put_line('MB/sec = '  l_mbps);
end;
/

When this is going on, you can check the status:


SQL> select * from v$io_calibration_status 

2 /

STATUS
-------------
CALIBRATION_TIME
----------------
IN PROGRESS

This shows that the operation is still going on. After some time you will see the status change to READY. Here is the output from my system (a RAC database with 2 nodes, each with 4 CPUs, with a Dell NAS, with 28 disks).


I/O Ops/sec = 122
Actual Latency = 15
MB/sec = 13

PL/SQL procedure successfully completed.

So, it shows us that the actual latency is somewhat like 15 milliseconds. The system can sustain a maximum of 122 I/O Ops per second with a throuput of 13 MB/sec.

You can check the results of the I/O Calibration run later from the data dictionary view DBA_RSRC_IO_CALIBRATE. Let's check this view. Since it has only one row and there are many columns, I used the very handy Print Table utility by Tom Kyte.

SQL> exec print.tbl('select * from dba_rsrc_io_calibrate');

START_TIME : 09-AUG-08 10.49.34.440392 AM
END_TIME : 09-AUG-08 11.04.33.446146 AM
MAX_IOPS : 122
MAX_MBPS : 13
MAX_PMBPS : 4
LATENCY : 15
NUM_PHYSICAL_DISKS : 28

The columns are self explanatory, in line with the previous output, except the MAX_PMBPS, which shows the maximum sustainable I/O in MB/s for a single process.

This output is great; but it is an aggregation. You can check another view V$IOSTAT_FILE for file-level metrics in calibration run:

select file_no,small_read_megabytes,small_read_reqs,

large_read_megabytes,large_read_reqs

from v$iostat_file

Output:

FILE_NO SMALL_READ_MEGABYTES SMALL_READ_REQS LARGE_READ_MEGABYTES LARGE_READ_REQS
---------- -------------------- --------------- -------------------- ---------------
1 267 33234 230 256 1 0 6 0 2
2 514 50337 318 331
3 121 15496 118 118
4 44 5661 43 43
5 1 83 0 0 6 46 5846 45 45

In addition to the columns, this view has a lot more columns. Here is the output for a single datafile:


SQL> exec print.tbl('select * from v$iostat_file where file_no = 1')

FILE_NO                       : 1

FILETYPE_ID                   : 2

FILETYPE_NAME                 : Data File

SMALL_READ_MEGABYTES          : 267

SMALL_WRITE_MEGABYTES         : 11

LARGE_READ_MEGABYTES          : 230

LARGE_WRITE_MEGABYTES         : 0

SMALL_READ_REQS               : 33235

SMALL_WRITE_REQS              : 1083

SMALL_SYNC_READ_REQS          : 33013

LARGE_READ_REQS               : 256

LARGE_WRITE_REQS              : 2

SMALL_READ_SERVICETIME        : 2650538

SMALL_WRITE_SERVICETIME       : 10676

SMALL_SYNC_READ_LATENCY       : 2647154

LARGE_READ_SERVICETIME        : 2745759

LARGE_WRITE_SERVICETIME       : 10

ASYNCH_IO                     : ASYNC_ON

RETRIES_ON_ERROR              : 0

-----------------


 
 
 
 
 
 
 
 
Using these merics you can decide if the I/O is sufficient or still needs more tweaking. 
Recently I used it on a system that required Oracle 10g. But I installed 11g on that server and ran this calibration test, which unearthed some inherent issues with the I/O system. Once rectified, and reconfirmed with the calibration tool, it was ready for the 10g installation.

Thursday, August 07, 2008

Full Table Scans on RAC

The phone goes off, yet again, and this time it seems to be a serious problem. Here are the symptoms: all the applications seems to be working fine albeit at a lesser performance level, except one app, which has crawled to a halt. The app has abended since the queries are timing out. This is, by the a way a RAC database.

This is rather interesting - only one application is slow, instead of all the applications. If this were a systemwide issue, then all the apps would have been affected; not just one. The DBA checked the wait events, as she should have and the predminant events were "gc buffer busy". Well, this is a systemwide issue as a result of saturated interconnect, isn't it - she muses; then how come application A is the only one affected?

So, this is a big clue: the waits are RAC related - "gc buffer busy", yet only one application is slow. First, I got the SIDs of the sessions waiting for that event:

select SID, sql_id, sql_child_number
from v$session
where event = 'gc buffer busy';

After I got the SIDs, I checked the SQL statements issued by the sessions:

select sql_text
from v$sql
where sql_id = ''
and child_number = 24

The results were interesting - all of them were updating or selecting from a single table. Next I checked the execution plan for the SQLs:

select * from table (
dbms_xplan.display_cursor (
'9h1r67dtjxfbs',24,'BASIC'
))
/

PLAN_TABLE_OUTPUT
------------------------------------------
EXPLAINED SQL STATEMENT:
------------------------
select ...

Plan hash value: 871755554
---------------------------------------------------------------------------------
Id Operation Name
---------------------------------------------------------------------------------
0 SELECT STATEMENT

The output tells it all - there is a full table scan going on for a 14 million row table. Well, why? I checked for the presence of indexes and yes, there is an index. Why isn't the index being used?

This is yet another case of "why my index is not being used". It happens at a relatively lesser frequency; but happens nevertheless.

Upon examination I discovered that the table is range partitioned and the index is local. Next I checked if the index has become unusable somehow:

select partition_name, status
from dba_ind_partitions
where index_name = 'the_inex_name';

PARTITION_NAME STATUS
------------------------------ --------

PT_SEP2009 USABLE
PT_OCT2009 UNUSABLE
PMAX UNUSABLE

Note the status of the last two partitions - UNUSABLE. Why so?

I checked the last_ddl_time:

select last_ddl_time
from dba_objects
where object_name = 'the_index_name';

And voila! The index was altered today. Why? The answer came quickly: there is a partition maintenance script that splits the PMAX partition to PMAX and a partition named PT_OCT2009. The split worked fine; but the DBA forgot to add a line like:

alter index this_index_name rebuild partition PMAX;

So, the partition remained unusable. Remember, this is a local index; so the clause MAINTAIN GLOBAL INDEXES will not do anything. The partition must be explicitly rebuilt. Since the partition was not usable, the query plan suddenly changed to full table scan, as I saw earlier. Since this caused a lot of buffer gets, the buffer pool of all the instances of the RAC cluster got flloded with the blocks of this table, causing gc buffer busy waits. So, even though the wait event is RAC related, the issue was due to abnormally large buffer gets and was limited to a specific application only. Eventually, the issue would have saturated the interconnect and affected other apps; but since we caught it early, that event was averted.

I rebuilt the index partitions by explicitly issuing the alter index commands. After that, the queries started using the index and, of course, the gc waits disappeared.

Lessons:

(1) just because you see "gc" waits, don't automatically assume it is RAC specific issue. The underlying issue may have manifested itself as the RAC component issues.
(2) Add to the list of "why oracle is not using my index" one more: unusable index partition(s)
(3) When you split partition of a table, in addition to the clause "maintain global indexes", issue the rebuild of the split partitions of local indexes.

Monday, August 04, 2008

Issues in RAC with Wrong Listener Configuration

The one thing that frustrates me the most is the lack of clear communication. Without clarity in what is being transpired, it's impossible for the receiver to understand the meaning - not just the superficial content but the subtle undertones as well. And it fails to materialize a manageable goal. I could be rambling about humans, of course; but the issue in communication goes beyond - to the things we create - writings, books, articles, poems, software programs, whatever. Oh, did I mention software programs?

Well, in this case I am talking about the software programs we all live by - the Oracle database; more specifically the RAC. Recently we suffered an outage. The reason turned out to be a simple issue of human forgetfulness; but the frustrating part was the lack of communication from the CRS about the issue. So, here is how it went.

This is a three node RAC. We applied the RAC bundle patch in a rolling fashion. Node 3 was brought down, patch applied and was brought up - everything was hunky dorey.

Then node 2 was brought down, patch applied and was brought up; but here was a little problem - the VIP didn't come up. The VIP relocated to node 3. But all the 23 services defined on the node were online. Strange. We assumed the issue was with the bundle patch ina rolling manner; so we decided to go ahead and do the deed on node 1, the remaining one.

When node 1 was brought up, we saw the real issues. The VIP for node 1 came up on node 3 and none of the services were up on node 1. Worse, the services that were up on node 2 suddenly went offline.

A lot of frantic calls to Oracle Support, request for esaclation to duty manager followed, along with the customary yelling and sioghing by the customer. At the end of 2 hours and with a Sev 1 SR, we hadn't gone anywhere.

At this time I took an executive decision. I asked the DBA to issue an ALTER SYSTEM comamnd on node 2 to bring up the services manually, bypassing CRS.

ALTER SYSTEM SET SERVICE_NAME = 'X','Y','Z',....

All the services where up on the instance 2 at this time, confirmed by SHOW PARAMETER SERVICE_NAMES command; but they still showed up as OFFLINE in crs_stat command. That meant, the instance had no issue with the services; the CRS had.

Then we issued a lsnrctl reload command to relaod the listener (note, we didn't use CRS to manipulate the listener - a very important clue you will see why later). The listener started fine with all service names. So far so good. We tried a few connections from the client and they connected fine. We did the same for node 1 and it worked too.

So, at this time, we had the services all up and running; but CRS had no idea about them. That meant the service failover probably wouldn't occur.

Next, we reloacted the VIP using the crs_relocate command. Al the VIPs (of node 1 and node 2) came back to the respective nodes. That was strange as well. If there was an issue with VIPs related to the networking software or hardware, how could they come back?

All these time we analyzed all the logs - the css, crs, racg alogs along with the alert log of the CRS (this is 10.2; so we have the alert log for CRS as well). Nothing there to indicate why it happened. And that's the basis of my rambling about lack of communication.

So, as always we went through the changes in the last few weeks. One of the crucial changes was the introduction of local_listener and remote_listener parameters in init.ora. These parameters can be placed as absolute values - with the host name, ports, etc. - or as the TNS alias entries. We had chosen the latter. The local_listener parameter was ok; but the remote_listener was listener_xxx.

Here was the entry for listener_xxx on tnsnames.ora:

listeners_xxx = (address_list = (address = (protocol=tcp)(host=sxxx1)(port=1521)) (address = (protocol=tcp)(host=sxxx2)(port=1521)) (address = (protocol=tcp)(host=sxxx3)(port=1521)))

There is nothing wrong with the syntax; but note the host value; sxxx1, sxxx2 and sxxx3 are all staging servers; and this is a production cluster. The communication does not work between prod and staging. So, the remote listener parameter pointed to hosts that were not even reachable! By the way, the entries were correct in node 3.

This caused the services to fail when started. Well, then how come they came up fine with ALTER SYSTEM?

My guess is since the ALTER SYSTEM command didn't go through the listener, it was not affected. CRS, on the other hand, goes through the listener.

Fine; but how come my lsnrctl reload command work? Again, my guess is, that comamnd simply starts the listener; the remote listener parameter in the database doesn't affect it. Since the local_listener parameter was correct, the services were successfully registered.

What about the VIP? How did the relocate work? I can only guess, of course. The conjecture is that the VIP is forced to go to the preferred node unless there is some sort of hardware issue, which was not in our case. So the relocate worked.

Phew! all these due to the sloppy work by a DBA. While this was deplorable, the issue with human error will continue to exist. Typos and wrong copy-and-paste will happen, no matter how much severe we make the punishment. I strongly feel CRS should have pointed out this error somewhere. In all that gets spewn out on CRS and CSS logs, how much is actually decipherable? Almost 99% has no meaning attached to it, at least for mere mortals. The idea behind the alert log of the CRS was to show the customer relevant issues in a common log; but there was no indication of this issue.

See the point about communication?

Wednesday, July 30, 2008

My Little World in Open World

Oracle Open World is coming up in just two months ... actually sooner. It will be quite active for me.

First, I am sharing the stage with a panel with some very distinguished folks - Lewis Cunningham, Eddie Awad, Mark Rittman, Tim Hall, Hans Forbich, Rich Niemiec, and Bradley Brown. This is a real session - S300480 "Oracle ACE Director Birds-of-a-Feather Tips and Techniques" but the format is free question and answer session.

On one hand I feel deeply honored and thrilled to be on the same forum that reads like a who's who in the Oracle community. But on the other hand, the abstract says "Attendees are encouraged to submit their hardest questions in a game of Stump the Chump" Huh? Anyway, it's going to be thrilling nonetheless! Come see us at 10:30 AM on Sep 25th Room 301 Moscone South.

Then, I have three sessions, yes three! All three were selected by user voting in Oracle Mix.

Tue the 23rd at 5:00 PM: Real World Best Practices for DBAs (S301683) Room 236 Moscone South
Wed the 24th at 5:00 PM: 11g New Features for DBAs (S301684) Room 236 Moscone South
Thu the 25th at 12:00 noon: Partitioning: When, Which and How (S301710) Room 200 Moscone South

I also have an ACE Directors' Meeting someday, probably on Sunday. Well, I will have my hands full. As always, Exhibition hall will be a place I will spen a lot of time. I get to know about new products and technology, some of which actually debut during Open World. So, it leaves a tough question - which sessions I should attend.

Wednesday, January 16, 2008

Comparing CLOBs

What if you have two CLOB columns to compare? A construct like

select someColumn
from tab1
where col1 != col2

will not work.

This is how you will need to compare:

select AUDIT_PAYLOAD_TEXT_DEC, AUDIT_PAYLOAD_TEXT
from audit_payload
where dbms_lob.compare(AUDIT_PAYLOAD_TEXT_DEC, AUDIT_PAYLOAD_TEXT) != 0

The return value is non-zero if the comparison fails.

The Arup Nanda Blog

Pages