The Arup Nanda Blog: Partitioning

Showing posts with label Partitioning. Show all posts

Tuesday, September 10, 2013

New York Oracle User Group Fall Conference Materials

Thank you all who attended my sessions at NYOUG Fall Conference this morning. I appreciate spending you most precious commodity - your time - with me. I sincerely hope you found both the presentations enlightening as well as entertaining.

Please see the details of the sessions below along with the download links.

Keynote: Oracle 12c Gee Whiz Features

Yet another Oracle version is out and so are about 1500 new features in a variety of areas. Some are well marketed (e.g. pluggable database or the multitenant option) and some shine by their sheer usefulness. And there are some that do not get a whole lot of coverage but are are hidden gems. In this session you learned 12 broad areas of Oracle Database 12c I feel are worth learning about to make your job as a DBA or developer better, easier, smoother and, in some cases, even make it possible what was hitherto impossible or impractical.

Download slides here and scripts here.

Big Data for Oracle DBAs

Have you ever considered the impact of Big Data on your career as database professionals? Or do you feel frustrated by the lack of a coherent set of explanations of various concepts like Hadoop, Hive, HDFS and Pig Latin - the essential vocabulary of Big Data? If you did, then session was for you. In this, I explained the various concepts of Big Data - Hadoop, Hive, HDFS, etc. and explained how they relate to and contrast with modern relational database concepts.

Download the slides here.

Sunday, April 04, 2010

Online Materialized View Complete Refresh With Partitions

The other day I was posed with an interesting problem regarding the refresh of the materialized views. First, a little background. MVs are nothing new in Oracle; they have been around since Oracle 7 (called snapshots then). Essentially they are similar to tables (as they actually store data) but populated by running a query on some source tables. This query is the defining query of the MV. So, the user can select from the MV instead of executing the defining query – reducing the execution time significantly.

However, the MV and the query would both give the same result if the source tables have not changed. If the data has changed, the MV will not have known about it and will give a stale result. Therefore, you need to refresh the MV from time to time. Usually the MVs are refreshed by the following method:

begin
dbms_mview.refresh(‘’);
end;

So, what’s the problem? The problem is, during complete refreshes, the MV remains unavailable to the users. This duration of refresh depends on the size of the MV, the execution time of the underlying query and so on; and the unavailability is a nuisance for most applications. In some cases the applications are designed to timeout after some wait; and they do so quite consistently in this case – making it more than just a nuisance.

So, I was asked to devise a solution to make the MVs available during the complete refreshes. I accomplished that with a not-so-common use of partitions. Here is how I did it.

Prebuilt Table

First, as an architectural standard, I advocate the use of prebuilt tables in MVs. Here is how an MV is usually created:

create materialized view mv_bookings_01
refresh complete on demand
enable query rewrite
as
select hotel_id, sum(book_txn) tot_txn
from bookings
group by hotel_id
/

This creates the segment mv_bookings_01 during this process. Instead of this approach, I recommend first creating and populating a regular table and then converting that to an MV. In this revised approach, you would first create the table:

create table mv_bookings_01
as
select HOTEL_ID, sum(book_txn) tot_txn
from bookings
group by HOTEL_ID
/

Then, you would create the MV using the prebuilt table option:

create materialized view mv_bookings_01
on prebuilt table
refresh complete on demand
enable query rewrite
as
select HOTEL_ID, sum(book_txn) tot_txn
from bookings
group by HOTEL_ID
/

This does not do anything to the functionality of the MV itself. The MV feels, looks and smells like it was before; but with two significant advantages:

(1) You can easily alter the columns
(2) You can manipulate the table

I blogged about it at http://arup.blogspot.com/2009/01/ultra-fast-mv-alteration-using-prebuilt.html. Anyway, back to the issue at hand. Using the prebuilt trick, I can create a partitioned MV as well. In this specific case, I will add a column to partition on. Remember, this column does not actually participate in the application; it simply creates an anchor point for the partition.

create table mv_bookings_01
(
ref_type number,
hotel_id number(3),
tot_txn number
)
partition by list (ref_type)
(
partition p0 values (0),
partition p1 values (1)
)
/

After the table is created, I can insert data into it:

insert /*+ append */ into mv_bookings_01
select 0, hotel_id, sum(book_txn) tot_txn
from bookings
group by 0, hotel_id
/

Of course I could have selected the data from the source tables directly while creating the prebuilt table; but I just wanted to show a different approach of data loading. The column REF_TYPE is not part of the application; but I have added it to divide the table along a value by list partitioning. The column can hold only two values – 0 and 1, and hence the table has only two partitions.

Once the table is created, I can use the prebuilt table option to create the MV as shown above. However, the presence of the new column makes it a little different. My MV creation script now looks like this:

create materialized view mv_bookings_01
on prebuilt table
enable query rewrite
as
select 0 as ref_type, HOTEL_ID, sum(book_txn) tot_txn
from bookings
group by 0, HOTEL_ID
/

Note that I have used “0 as ref_type” in the select clause, i.e. the ref_type will always be 0. This is not going to a problem as you can see later. When the MV is first created, the value of ref_type is 0; hence the partition p0 is the one is that is populated; not the partition p1. We can confirm that:

SQL> select count(1) from mv_bookings_01 partition (p0);

COUNT(1)
----------
2

SQL> select count(1) from mv_bookings_01 partition (p1);

COUNT(1)
----------
0

Now that the partitions are in place, let’s see how we refresh this MV. We will no longer use the dbms_mview approach. Rather, we will use this:

(1) Create a temporary table
(2) Create all the indexes
(3) Exchange the partition that is not used with this temporary table

Step 1: Temporary table

First we create a temporary table that is identical to the prebuilt table of the MV in structure; but just not partitioned. We will use the value of ref_type column as 0 or 1 based on the value already in the table, in fact exactly opposite of what is in the table. Since we have 0 in the table, we will use 1 in the temporary table. This temporary table will contain the data that we need refreshed as.

create table tmp1
as
select 1 as ref_type, hotel_id, sum(book_txn) tot_txn
from bookings
group by 1, hotel_id
/

We also need to create another temporary table with no data.

create table tmp0
(
ref_type number,
hotel_id number(3),
tot_txn number
)
/

Step 2: Create Indexes

In this case we don’t have any indexes on the MV; but if we had, we would have created them here on the TMP0 and TMP1 tables.

Step 3: Exchange Partitions

When the temporary table is ready, we can exchange the inactive partition with this.

alter table mv_bookings_01
exchange partition p1
with table tmp1
including indexes
without validation
update global indexes
/
alter table mv_bookings_01
exchange partition p0
with table tmp0
including indexes
without validation
update global indexes
/

This SQL performs a switch: the segment that was called partition P1 in the table mv_bookings_01 is now called TMP1 and former segment TMP1 is now called partition P1 in the table mv_bookings_01. This occurs at the data dictionary level; no actual data is ever transferred from one segment to the other. As a result this is very quick. A lock is required on the table only for that instance is such an insignificant amount of time that it may not be even noticeable. Similarly the segment used to be called P0 partition is now known as TMP0 table and TMP0 is called P0.

There are several key things to note here. Note the clause “without validation”, which tells Oracle not to bother checking inside the TMP0 table that it will confirm to the partition specification. This is critical for the performance. Since we built the table with “1” as a hardcoded value, we know that the REF_TYPE column will definitely contain 1, satisfying the partition requirement for P1. A further checking is not required.

The “including indexes” clause switched the local index partitions as well. If there is a global index on the MV’s prebuilt table, that would have been invalidated and needed rebuilding afterwards. But, we avoided that invalidation by using a “update global indexes” clause in the alter statement.

Once the switchover is complete, we can check the MV to see if the data has been visible.

SQL> select count(1) from mv_bookings_01 partition (p0)
2 /

COUNT(1)
----------
0

SQL> select count(1) from mv_bookings_01 partition (p1)
2 /

COUNT(1)
----------
2

This completes the refreshing of the MV and the data is visible to end users. Let’s see the timeline

(1) Building of temporary table
(2) Building of indexes on the temporary table
(3) Exchange partitions

Steps 1 and 2 take most of the time; but that is done offline, without affecting the MV itself. So, long time there does not affect the availability of the MV itself. The step 3 is where the availability is impacted; but that is miniscule.

The table TMP0 now has the rows from partition P0 of the MV. To confirm that:

SQL> select count(1) from tmp0;

COUNT(1)
----------
2

You can drop the table by “drop table tmp0 purge”. You should also drop TMP1 since that table will contain the previous contents of the partition P1, which is useless now.

The next time this MV needs refreshing, you have to repeat the process; but with a very important distinction – the partition P0 needs exchange now. I used a script that checks the value of REF_TYPE in the MV now and use the other value passed to an SQL script with a positional parameter that accepts 0 or 1. Using this approach you can refresh a MV very quickly.

Oracle 11g Variation

In Oracle 11g, there is a new feature – Virtual Column. You can define a column as virtual in the table, which is not actually stored in the table; but calculated during runtime. What’s more, this virtual column can also be used as a partition key. You can define the REF_TYPE as a virtual column in that case.

Takeaways

(1) Just because there is an Materialized View, it does not have to be refreshed by the usual approach, i.e. dbms_mview.refresh. There are other options as well.
(2) Partitioning is not just for performance or purge reasons; it can be used in clever ways such as MV refreshes.
(3) In this specific case, we didn’t have a column that could have been used as a partitioning key; so we had to resort to adding a column. In many cases, the table may already have a column for that purpose, making the process even simpler.

Sunday, September 28, 2008

OOW'08 Partitioning - Why, When, What and How

I had a full session (300+ in attendance with 120 wait-listed). There was a gentleman from the organizing team who was trying to let in people flow in to the room as unobtrusively as possible. He was the same one for my other two sessions (which happend to be in the same room - full and and some 100+ wait listed) and he was visibly flustered to conduct the same routine. "You must be quite popular" - he exasperated! I took it as a compliment. For those who attended, thank you for taking the time to spend with me. I truly feel honored. For those who couldn't come in, I assume you have found some other helpful session.

Here is the abstract:

How do you decide which partitioning schemes to use, which column to partition on, or whether or not to use more than one column for partitioning keys? These are questions you must answer to develop a good strategy. Too many architects and DBAs choose a wrong strategy, making partitioning a failure. In this session, you will *not* learn the syntax or types of partitioning but how to decide the best strategy, presented through real-life case studies and examples. This session covers Oracle 11g features in partitioning.

If you haven't already done so from the Oracle Open World site, you can download here: http://www.proligence.com/OOW08_Part_ppt.pdf

By the way, I have also written an article on the same presentation, which you can find here: http://www.proligence.com/OOW08_Part_doc.pdf

Please leave your comments on these two documents. As always, thank you visiting and offering your feedback.

Wednesday, September 17, 2008

Speaking on Partitioning at NY Metro Oracle Day

I recently delivered a session at New York Metro Area Oracle Day organized by User Groups in New York, New Jersey and Connecticut. If you attended, thank you for coming. There is nothing much satisfying than see people taking time off their daily grind and be in the room to listen to someone they may or may not know; but trust they can get something from. Thank you for all those who attended.

The session was about how to use partitioning, not what it is. So I didn't talk about syntax; but how to exploit partitioning techniques to accomplish common objectives.

Here is the powerpoint presentation deck. However, you may find a descriptive article more suitable for reading.

Congratulations to the lady who won my book RMAN Recipes for asking the best question. I apologize, I couldn't get your name. If you read this, please get back to me.

The question was: In a RAC envrionment does partitioning help or hurt?

Answer: Partitioning helps in these ways:
(1) it allows several segments to be create for the same object. Therefore the chances of cache buffer chains gets reduced, which ultimately reduces the global cache related waits.
(2) it reduces the chance that rows will be found on one block, leading to block contention among instances and reduces gc buffer busy waits.
(3) in hash partitioned indexes, it reduces contention for a few index blocks, leading to lesser gc current grant related waits.

The downside? Well, not much. Partition operations (drop/add/split) are DDL, not DML operations. So, when you perform one of these, the library cache must be synced up on all instances, whihc may create library cache lock/latch issues. So, as a rule, perform partition maintenance operations during off-hours or periods of light activity.

The Arup Nanda Blog

Pages