For all those attended my session at Northeast Oracle User Group at Babson College in Wellesley, MA. Have you ever wondered how Cache Fusion knows where to get the block from? Or, how block locks vary from row locks? Or you are confused about the meaning and purpose of various Global Cache Service (GCS), Global Resource Directory (GRD) and Global Enqueue Service (GES). The session was meant to explain how all these actually work under the covers with live demos.

I hope you found the session useful and entertaining. As with all my sessions, I use slides as an aid to presentation; not to communicate the concepts. I have written an accompanying paper which explains these in detail. You can download the paper here. Or, the presentation here. All the scripts I used can be downloaded here.

As always, your feedback will be greatly appreciated.

I don’t usually write about specific products; but once in a while I encounter something worth talking about, especially if it addresses a very common issue anyone with datacenter management responsibilities will understand: avoiding the unavoidable loss of last minute data changes in database after a disaster but without expensive synchronous replication. This product solves that problem with an innovative out of the box approach. The product is Phoenix Systems from Axxana.

Replication and Last Mile Transactions

Imagine this scenario: the datacenter was on fire. All components—servers, storage, network—are destroyed. You can’t salvage anything of value. What happens to the data you need to continue your business? Fortunately you have a disaster recovery plan. You replicate the database to a remote datacenter using either Data Guard (from Oracle), or some storage level technology such as EMC’s SRDF. So the database comes up on the remote site; but there is a problem. Since you used asynchronous replication (such as maximum performance mode in Data Guard), not all changes have made it to remote site. The final few changes, usually known as last mile transactions were yet to be shipped to the remote site when the disaster struck. These are lost forever in the fire. While you do have a database at the remote datacenter, and that database is almost complete—the operative word is “almost”; not 100%, mind you—some data is inevitably lost. What if you can’t afford to be “almost” complete? Organizations such as financial institutions, hospitals, manufacturing concerns and airlines where losing data is simply not an option, this is not a luxury they can afford. Even in case of other organizations where data loss may not be as unacceptable, the loss of data creates a sense of uncertainty, since you don’t know exactly what was lost.

If you need to be 100% certain that all data is available at the remote site, what would you do?

It’s easy, using synchronous replication. All changes on the primary database are reflected on the remote database site in real time. If you used Data Guard, you would have run it in maximum protection mode. In that case, Oracle would have committed the changes in the remote database before confirming the commit at the local site. When the primary site is gone, the remote site is available with 100% of the data. Bye bye, “almost”.

Brilliant. So, why didn’t you do it? Better yet, why doesn’t everyone do it, since it is that simple? Data loss could be unacceptable in the worst case and unsettling at best. Why accept it?

Synchronous Replication

Well, there is a pesky little detail. Synchronous replication means the network connectivity has to be rock solid. Networks have three general characteristics: throughput (how much data they can pump. Imagine a three lane highway compared to a single lane local road), latency (how much time passes to process the data; not the actual speed. For instance, the car on the highway may travel at 70 miles per hour; but it will be a few minutes spent on the ramp to the highway) and reliability (does the data move from point A to point B with 100% accuracy or it has to be re-transmitted a few times).

Synchronous replication requires the throughput to be very high (to support the large amounts change at least during bursts of activity), latency to be very low (otherwise the remote site will get the data late and respond to the primary even later, causing the primary to hold off the commit) and extremely reliable (otherwise the primary may think the remote site is not accessible and therefore shutdown itself to protect the data). If you run the Data Guard in maximum protection mode, you are essentially telling Oracle to make sure that the remote site has absolutely, positively, undoubtedly (add any other adjective you can think of) received the change and has committed.

If Oracle can’t ensure that because of any reason, such as not getting the response in time due to a less reliable network, what choices does it have? If it allows further data changes, the changes are not there at the remote site yet. At this time if the primary database fails, the data is gone. Therefore Oracle has no choice but to shut down the primary database to stop any changes coming in. It’s maximum protection, after all, and that’s what you have instructed it to do.

So, if you decide to use maximum protection, you have to use a high throughput, low latency and extremely reliable network infrastructure. Most public commercial network infrastructures are not. Either you have to contract a commercial carrier to provide this elevated level of service, or build your own, such as using dark fiber. The costs of the network infrastructure become exponentially high, especially when the remote site is farther away from the primary. In many cases, the cost of the network itself may be several times that of the database infrastructure it protects. Owing to the cost limitations, you may be forced to locate the remote site close by, e.g. in New York City and Hoboken, NJ. It will still be exceedingly expensive; and it may not offer the same degree of protection that you expect. These two cities are close enough to be in the same exposure area of disasters such as floods, hurricanes, war and so on. The farther away the remote site is, the more protected your data is; but so is your cost. Could you accomplish no data loss without this expensive proposition?

Until now, the decision was really black and white. If you want no data loss at all, you have no choice but to go for a super expensive network solution. Many companies can’t justify that high sticker price and therefore settle for potential data loss. Many, in fact, make detailed plans as a part of the business continuity efforts on handling this lost data.

It’s assumed that zero data loss is synonymous with expensive network. If you don’t have a high throughput, low latency, highly reliable network, you have to live with some data loss.

Here is the good news. No, it doesn’t have to. Now, it’s possible to have the cheaper commoditized public network infrastructure and still have complete data protection. Allow me to explain.

Data Protection in Oracle

In Oracle, the data files are written asynchronously at different intervals unrelated to the data changes and commits. In other words, when you commit a change, the data files may not have that changed data. In fact the change occurs in the memory only (called a buffer cache) and may not exist in the data files for hours afterwards. Similarly when you make a change but not commit, the data can still be persisted to the data files. Let me repeat that: the data files are updated with the changed data even if you didn’t commit yet. This is the reason why if you have a storage or operating system level replication solution—even synchronous—replicating the data files, the remote site may or may not have the data, even hours after the change.

How does Oracle protect the data that was changed and committed but in the memory, if the data files do not have them? It captures the pre- and post-change data and packages them into something called redo blocks. Remember, these have nothing to do with data blocks. These are merely changes created by activities performed on the database. This redo data—also known as redo vector—is written to a special area in memory called log buffer. When you commit, the relevant redo blocks from the log buffer are written to special files in the database called redo log files, also known as online redo log files. The commit waits until this writing—known as redo flushing—has ended. You can check the Oracle sessions waiting for this flushing to complete by looking at the event “log file sync”. Since the changes—most importantly—committed changes are recorded in the redo log files, Oracle does not need to rely on the memory alone to know which changes are committed and which are not. In case of a failure, Oracle examines the redo logs to find these changes and updates the data files accordingly. Redo logs are very small compared to the data files.

By the way, redo flushing also occurs at other times: every three seconds, every filled 1 MB of log buffer, when a third of the log buffer is full and some other events; but those additional flushes merely make sure the redo log file is up to date, even if there is no commit.

As you can see, this redo log file becomes the most important thing in the data recovery process. When a disaster occurs, you may have copies of the data files at the remote site (thanks to the replication); but as you learned in the previous section the copies are not useful yet since they may not have all the committed changes and may even have uncommitted changes. In other words, this copy is not considered “consistent” by Oracle. After a disaster, Oracle needs to check the redo log files and apply the changes to make the data files consistent. This is known as a “media recovery”. You have to initiate the media recovery process. In case of a synchronous replication at storage or Operating System level, the redo logs are perfectly in sync with the primary site and Oracle has no trouble getting to the last committed transaction just before the failure. There will be no data lost as a result of the recovery process. In case of Data Guard with maximum protection, this is not required since the changes are updated at the remote site anyway. But what about your cheaper, commodity network with asynchronous replication? The redo logs at the remote site will not be up to date with the primary site’s redo logs. When you perform media recovery, you can’t get to the very last change before the failure, simply because you may not have them. Therefore you can perform a recovery only up to the last available information in the redo at the remote site. This is known as “incomplete” media recovery, distinguished from the earlier described “complete” media recovery. To complete the media recovery, you need the information in the redo log files at the primary site; but remember, the primary site is destroyed. This information is gone. You end up with a data loss. Perhaps even worse, you won’t even know exactly how much you lost, since you don’t have the access to the primary redo log files.

Now, consider this situation carefully. All you need is the last redo log file from the primary database to complete the recovery. Unfortunately the file is not available because it’s destroyed or otherwise inaccessible since the site itself is inaccessible. This tiny little file is the only thing that stays between you and complete recovery. What if you somehow magically had access to this file, even though the rest of the data center is gone? You would have been able to complete the recovery with no data loss and looked like a hero and that too without a synchronous replication solution with super expensive network.

Enter the Black Box

Oracle’s redo data can be written to multiple files at the same time. This is called a redo log group and the files are called members. Log flushing writes to all members before confirming the flush. As you can see, all members of a group have the same data. Multiple members are created only for redundancy. As long as one member of a group is available, Oracle can use it to perform recovery. This is where the new tool from Axxana comes in. It is a storage device—named Phoenix—where you create the second member of the redo log groups. The first member of the group is on your normal storage as usual. When disaster strikes and nukes the primary site, you have a copy of the all-important redo log from the Phoenix system.

This is where the first benefit of Phoenix systems comes in. The storage is not just any ordinary one. It’s encased in a bomb-proof, fire-proof and water-proof container which preserves the internal storage from many calamities. The storage has normal connectivity such a network port to connect a network as a NAS and a fiber port to connect to a fiber switch as a SAN. Under ordinary circumstances you would use these ports to connect to your infrastructure system and use the storage. After a disaster, due to the indestructible nature of the enclosure, this storage will most likely be intact. All you have to do is to access the data from it and perform a complete recovery. No data needs to be lost.

But how do you get to the information on the Phoenix system? This is where the second benefit comes in. The Phoenix system transfers its data to another component of the Phoenix system at the remote site. The Phoenix system creates a replica of the required redo logs at the remote site. Since the only data on it are the small redo log files, the amount to transfer is very little and does not put a toll on your other network infrastructure.

Note a very important point: Phoenix system does not perform any transfer of data during normal operation. It’s only during a disaster the system transports the required redo log files to complete a 100% recovery at the remote site.

But it still depends on getting the data out of the Phoenix system that was at the primary site. What if the disaster site is physically inaccessible or it is infeasible to physically transport the Phoenix system to a location where it can be connected to your network? It’s quite possible in case of floods, hurricanes or other natural disasters or manmade ones like war or strikes. The network cables are also likely out of commission after a disaster. Without that access, how can Phoenix system extract the needed last mile transactions from the primary site—you might ask.

No worries; there is a cellular modem built into the Phoenix system that allows you to connect to it from the remote site and transfer the data over a cellular network wirelessly. The system also has its own battery that allows it to stay operational even when external power is gone—a common occurrence in almost any calamity. What’s more: the transfer of data after the disaster can also utilize this cellular connectivity; so you may not even need to physically connect to this storage at the primary site (now defunct due to the disaster) to your network. The data you need to perform the complete no-data-loss recovery may already be transferred over to the system at the remote site and is waiting for you. In any case, you have access to the data. And all this comes without the need to invest in synchronous replication and expensive network infrastructure.

In summary, assuming the primary site is gone and you are using a cheaper asynchronous network, the remote site will be partially up-to-date and the last mile transactions will be lost with the loss of the primary database. However, since you used the disaster-proof Phoenix system for an additional member of the redo log group, the last mile transactions will be intact in that system; the remote system won’t have it yet. At this point you have multiple options:

1) If the network infrastructure between primary and remote sites is still operational (rare; but possible), Phoenix system data transfer creates the copy of redo logs at the remote site automatically. With this in place, you can perform a complete database recovery. Data Loss: 0%.

2) If the network infrastructure is not operational, Phoenix will automatically engage the built-in cellular modem and initiate the data transfer to the remote site. It will be slow; but the amount of data to be transferred is small; so it won’t matter much. Once that is complete, your recovery will be complete with 0% data loss.

3) If you don’t have cellular access either, but have access to the physical Phoenix system device, you can mount it on your network and perform a complete recovery.

As I started off, this is a very simple and elegant out of the box solution to a complex problem. It opens up possibilities for a no-data-loss recovery scenario in case of a disaster, where the option didn’t even exist. In a future post I will describe in detail my hands on experience with screenshots, scripts, etc. with the Phoenix system. Stay tuned.

More about Axxana: www.axxana.com

The Arup Nanda Blog

Pages

Monday, November 09, 2015

NOUG Session: How Cache Fusion Works

Sunday, October 25, 2015

Oracle Open World 2015: The Art and Craft of Tracing

Thursday, September 24, 2015

NYOUG Session: Latches Demystified

Saturday, August 01, 2015

NYOUG July 2015 Session on Tracing and Profiling

Sunday, April 12, 2015

No Data Loss without Synchronous Network

Replication and Last Mile Transactions

Synchronous Replication

Data Protection in Oracle

Enter the Black Box

Translate

Total Pageviews