As a best practice, I recommend placing external procedures, if you use them, on a separate listener with its own listener.ora file. Why?
External procedures are actually OS-level programs executed by the listener as the user "oracle" (or whatever the username of the Oracle software owner is).
Let me repeat that: external procs allow database sessions to execute OS-level programs with the privilege level of the Oracle user! Do you see a problem here? Since the programs run as "oracle", they can do anything that user can do at the command line: "ls -l", "rm listener.log", or even "rm" on the datafiles. Do you want that? Of course not.
Unfortunately, several vulnerabilities exist that exploit this particular feature. The CPU patches address some of them; but the bad guys respond by exposing even more holes. In this cat-and-mouse game, the best thing, in my opinion, is to reduce or even eliminate the exposure if feasible. If you don't use external procs (most people don't), why put them in the listener.ora file at all? Unfortunately, the default configuration includes the external proc entries; so you should remove them, and you can do that easily.
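For reference, the default external procedure entries in listener.ora typically look something like this (the ORACLE_HOME path, host name and IPC key below are just typical defaults, not taken from any specific system); these are the pieces you would remove if you don't use external procs:

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME = PLSExtProc)
      (ORACLE_HOME = /u01/app/oracle/product/10.2.0/db_1)
      (PROGRAM = extproc)
    )
  )

LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1))
      (ADDRESS = (PROTOCOL = TCP)(HOST = yourdbhost)(PORT = 1521))
    )
  )

The SID_DESC for PLSExtProc and the IPC address with the EXTPROC key are the external procedure pieces; the TCP address stays, because database clients need it.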
If you must use external procs, then I suggest using a different listener for them. Doing so allows you to shut down that functionality while allowing normal database connectivity to continue. If you perceive an imminent threat, you can take evasive action by shutting down just the ext proc listener. You can't do that if the external procs are served by the same listener as the database.
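Here is a rough sketch of what a dedicated external procedure listener might look like in its own listener.ora (the listener name, IPC key and ORACLE_HOME below are illustrative only):

LISTENER_EXTPROC =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC_DEDICATED))
  )

SID_LIST_LISTENER_EXTPROC =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME = PLSExtProc)
      (ORACLE_HOME = /u01/app/oracle/product/10.2.0/db_1)
      (PROGRAM = extproc)
    )
  )

With that in place, taking external procedures offline is a one-liner - "lsnrctl stop listener_extproc" - while the main listener keeps serving database connections.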
Again, consider this: what is the harm in doing that? Nothing. And what is the benefit? I just showed you, however small it may be. So, if you lose nothing and potentially gain something, why not do it?
Tuesday, August 19, 2008
Why should you set the ADMIN_RESTRICTIONS_LISTENER to ON
Recently someone probably went through the slides of my session on "Real Life DBA Best Practices" and asked on the OTN forum why I was recommending setting this parameter to ON as a best practice. I responded on the forum; but I feel it's important enough to put it here as well.
As a best practice, I recommend setting this parameter to ON (the default is OFF). But as I profess, a best practice is not one without a clear explanation. Here is the explanation.
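For a listener with the default name LISTENER, the line in listener.ora is simply:

ADMIN_RESTRICTIONS_LISTENER = ON

The parameter name carries the listener name as a suffix; a listener named, say, LISTENER_PROD would use ADMIN_RESTRICTIONS_LISTENER_PROD = ON.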
Over the years, the Oracle Database has encountered several security vulnerabilities, some of them in the listener. Some are related to buffer overflows; others involve unauthorized access to the listener process itself. Some of the listener exploits come from external listener manipulation. Did you know that you do not even need to log into the server to connect to the listener? As long as the port the listener is listening on is open (and it will be, for obvious reasons), you can connect to the listener from a remote server.
In 10g, Oracle provided a default mechanism that does not require a password when the oracle user manipulates the listener via online commands. Having said that, there have been bugs and there will be more. Those vulnerabilities usually get fixed eventually; but most often the fix does not reach the software quickly enough.
So, what should you do to protect against these vulnerabilities? The simplest thing, I think, is to remove the possibility altogether; and that's where admin restrictions come into the picture. After setting this parameter, listener parameters can no longer be changed dynamically through lsnrctl SET commands. So, even if a connection is somehow made from an outside server - bug or no bug - eliminating the possibility altogether mitigates the risk. And that's why I recommend it.
Let's ponder the problem a little more. Is there a problem in setting the parameter? Absolutely not. When you need to change a parameter, you simply log on to the server, update listener.ora and issue "lsnrctl reload". This reloads the parameter file without restarting the listener. Since you never stopped the listener, clients will not see failed connection requests. So, it is still effectively dynamic. And if you are the oracle user, you can log on to the server; so there is no issue there.
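A quick sketch of that workflow, assuming the default listener name and the usual location of the file:

$ vi $ORACLE_HOME/network/admin/listener.ora   # make the change here
$ lsnrctl reload listener                      # pick up the change; the listener never stops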
I advocate this policy, rather than dynamic parameter changes, for these simple reasons:
(1) It plugs a potential hole due to remote listener vulnerability attacks, regardless of the probability of that happening.
(2) It forces you to make changes in the listener.ora file, whose timestamp then shows when the change was made.
(3) I ask my DBAs to put extensive comments in the parameter files, including listener.ora, to explain each change. I also ask them to comment out the previous line and create a new line with the new value, rather than updating a value in place (see the sketch after this list). This sort of documentation is a gem during debugging. Changing the parameter file allows that; a dynamic change does not.
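To make point (3) concrete, a changed entry in listener.ora might look like this (the parameter, values and ticket number are purely illustrative):

# 19-Aug-2008 ARUP: raised the inbound connect timeout per ticket 12345
# INBOUND_CONNECT_TIMEOUT_LISTENER = 10
INBOUND_CONNECT_TIMEOUT_LISTENER = 60

Anyone debugging a listener issue later can see at a glance what changed, when, why, and what the previous value was.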
So, I don't see a single piece of functionality I lose by this practice; and I just showed you some powerful reasons to adopt it. No loss, and some gain, however small you consider that to be - and that's why I suggest it.
As I mentioned earlier, a best practice is not one without a clear explanation. I hope this explanation makes it clear.
Monday, August 04, 2008
Issues in RAC with Wrong Listener Configuration
The one thing that frustrates me the most is the lack of clear communication. Without clarity in what is being conveyed, it's impossible for the receiver to understand the meaning - not just the superficial content but the subtle undertones as well - and no manageable goal can materialize. I could be rambling about humans, of course; but the issue of communication goes beyond that - to the things we create: writings, books, articles, poems, software programs, whatever. Oh, did I mention software programs?
Well, in this case I am talking about the software we all live by - the Oracle database; more specifically, RAC. Recently we suffered an outage. The reason turned out to be a simple case of human forgetfulness; but the frustrating part was the lack of communication from CRS about the issue. So, here is how it went.
This is a three-node RAC. We applied the RAC bundle patch in a rolling fashion. Node 3 was brought down, the patch applied and the node brought back up - everything was hunky-dory.
Then node 2 was brought down, patched and brought back up; but here was a little problem - the VIP didn't come up. The VIP relocated to node 3. Yet all 23 services defined on the node were online. Strange. We assumed the issue was with applying the bundle patch in a rolling manner; so we decided to go ahead and do the deed on node 1, the remaining one.
When node 1 was brought up, we saw the real issues. The VIP for node 1 came up on node 3, and none of the services came up on node 1. Worse, the services that were up on node 2 suddenly went offline.
A lot of frantic calls to Oracle Support and a request for escalation to the duty manager followed, along with the customary yelling and sighing by the customer. At the end of two hours, and with a Sev 1 SR, we hadn't gotten anywhere.
At this point I made an executive decision. I asked the DBA to issue an ALTER SYSTEM command on node 2 to bring up the services manually, bypassing CRS.
ALTER SYSTEM SET SERVICE_NAMES = 'X','Y','Z',....
All the services were up on instance 2 at this point, as confirmed by the SHOW PARAMETER SERVICE_NAMES command; but they still showed up as OFFLINE in the crs_stat output. That meant the instance had no issue with the services; CRS did.
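For reference, the two checks were essentially these (output omitted):

SQL> show parameter service_names
$ crs_stat -t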
Then we issued an lsnrctl reload command to reload the listener (note, we didn't use CRS to manipulate the listener - a very important clue, as you will see later). The listener came up fine with all the service names. So far so good. We tried a few connections from the client and they connected fine. We did the same on node 1 and it worked too.
So, at this point, we had all the services up and running; but CRS had no idea about them. That meant service failover probably wouldn't occur.
Next, we relocated the VIPs using the crs_relocate command. All the VIPs (of node 1 and node 2) came back to their respective nodes. That was strange as well. If there was an issue with the VIPs related to the networking software or hardware, how could they come back?
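The relocation commands were along these lines; the VIP resources follow the usual ora.<nodename>.vip naming, and the node names below are made up:

$ crs_relocate ora.prodnode1.vip
$ crs_relocate ora.prodnode2.vip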
All this time we analyzed all the logs - the CSS, CRS and RACG logs, along with the alert log of the CRS (this is 10.2, so we have an alert log for CRS as well). Nothing there indicated why it happened. And that's the basis of my rambling about lack of communication.
So, as always, we went through the changes of the last few weeks. One of the crucial changes was the introduction of the local_listener and remote_listener parameters in the init.ora. These parameters can be set as absolute values - with the host name, port, etc. - or as TNS aliases. We had chosen the latter. The local_listener parameter was OK; but the remote_listener was set to the alias listeners_xxx.
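In init.ora terms, the settings were along these lines (the instance prefix and the local listener alias are illustrative; only listeners_xxx is the actual alias in question):

prod1.local_listener = 'LISTENER_PROD1'
prod1.remote_listener = 'listeners_xxx'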
Here was the entry for listeners_xxx in tnsnames.ora:
listeners_xxx =
  (address_list =
    (address = (protocol=tcp)(host=sxxx1)(port=1521))
    (address = (protocol=tcp)(host=sxxx2)(port=1521))
    (address = (protocol=tcp)(host=sxxx3)(port=1521))
  )
There is nothing wrong with the syntax; but note the host values: sxxx1, sxxx2 and sxxx3 are all staging servers, and this is a production cluster. Communication does not work between prod and staging. So, the remote_listener parameter pointed to hosts that were not even reachable! By the way, the entries were correct on node 3.
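The correct entry would simply point the same alias at the production hosts; a sketch, with hypothetical production host names:

listeners_xxx =
  (address_list =
    (address = (protocol=tcp)(host=pxxx1)(port=1521))
    (address = (protocol=tcp)(host=pxxx2)(port=1521))
    (address = (protocol=tcp)(host=pxxx3)(port=1521))
  )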
This wrong entry caused the services to fail when started through CRS. Well, then how come they came up fine with ALTER SYSTEM?
My guess is that since the ALTER SYSTEM command didn't go through the listener, it was not affected by the bad entry. CRS, on the other hand, goes through the listener.
Fine; but how come my lsnrctl reload command worked? Again, my guess is that the command simply reloads the listener configuration; the remote_listener parameter in the database doesn't affect it. Since the local_listener parameter was correct, the services were successfully registered.
What about the VIPs? How did the relocate work? I can only guess, of course. The conjecture is that the VIP goes back to its preferred node unless there is some sort of hardware issue, which was not the case for us. So the relocate worked.
Phew! All this because of sloppy work by a DBA. While that was deplorable, human error will continue to exist. Typos and wrong copy-and-paste will happen, no matter how severe we make the punishment. I strongly feel CRS should have pointed out this error somewhere. Of all that gets spewed out into the CRS and CSS logs, how much is actually decipherable? Almost 99% has no meaning attached to it, at least for mere mortals. The idea behind the alert log of the CRS was to show the customer relevant issues in a common log; but there was no indication of this issue there.
See the point about communication?