Saturday, April 16, 2011

Can I Fit a 80MB Database Completely in a 80MB Buffer Cache?

This is in the Series "100 Things You Probably Didn't Know About Oracle". If you haven't already, I urge you to read the other parts -

  • Part1 (Commit does not force writing of buffers into the disk), 
  • Part2 (How Oracle Locking Works), 
  • Part3 (More on Interested Transaction Lists).

During the recently concluded Collaborate 11 (http://collaborate11.ioug.org) I was asked a question which led me to the idea for this entry - the fourth in the series. If the database size is 100GB and you have a 100GB buffer cache, all the blocks will be in the memory and there will never be an I/O to the datafiles other than the regular DBWR lazy writes, right?

This is a very important question and you must consider the implications carefully. Many folks assume that by getting a large buffer cache eliminates or reduces the buffer related I/O - a very wrong assumption.A large buffer cache helps; but the relationship between buffer and block is not one to one. A block may have more than one buffer in the buffer cache. How so? Let's see how that happens. We will examine this in a non-RAC database to keep it simple.

Setup

First, let's create a table and insert some rows into it.

SQL> create table bhtest (col1 number, col2 char(2000));

Table created.

SQL> insert into bhtest values (&n,'&c');
Enter value for n: 1
Enter value for c: x
old   1: insert into bhtest values (&n,'&c')
new   1: insert into bhtest values (1,'x')

1 row created.

SQL> /
Enter value for n: 2
Enter value for c: x
old   1: insert into bhtest values (&n,'&c')
new   1: insert into bhtest values (2,'x')

1 row created.

... insert 6 rows ...
commit;

Note how I used char(2000) instead of varchar2(2000). The reason is simple. The char datatype takes up entire string of values, all 2000 of them regardless of the actual value placed there. So, even though I put 'x' there, the entire row will be quite a long one.

After the insertion, check which blocks these rows went into. We can do that by calling dbms_rowid.rowid_block_number() function.

select col1, dbms_rowid.rowid_block_number(rowid)
from bhtest;

      COL1 DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID)
---------- ------------------------------------
         4                                 4419
         5                                 4419
         1                                 4423
         2                                 4423
         3                                 4423

5 rows selected.

From the output we can see that rows 1 through 3 went to block 4423 and 4 and 5 went to block 4419. We also need the object ID of the object

SQL> select object_id from dba_objects
  2  where object_name = 'BHTEST'

  3  /
 OBJECT_ID
----------
     99360

This completes the setup. In summary, we know that these rows are in blocks 4419 and 4423 and the object ID is 99360.

Experiment


If possible, clear out the cache by shutting down and restarting the database. This brings up an empty cache. It's not absolutely necessary though. Now select from the table:

SQL> select * from bhtest;  

This will bring up all the blocks of the table into the buffer cache, To check for the same, you can check the view V$BH (the buffer headers). The column OBJD is the object_id. (Actually it's the DATA_OBJECT_ID. In this case both are the same; but may not be in all cases). Here are the columns of interest to us:

  • FILE# - the file_id 
  • BLOCK# - the block number 
  • CLASS# - the type of the block, e.g. data block, segment header, etc. Shown as a code 
  • STATUS - the status of the buffer, Exclusive Current, Current, etc. 
  • LOCK_ELEMENT_ADDR - if there is a buffer lock on the buffer, then the address of the lock 


To make it simpler to understand, we will use a decode() on the class# field to show the type of the block. With that, here is our query:

select file#, block#,
    decode(class#,1,'data block',2,'sort block',3,'save undo block', 4,
'segment header',5,'save undo header',6,'free list',7,'extent map',
8,'1st level bmb',9,'2nd level bmb',10,'3rd level bmb', 11,'bitmap block',
12,'bitmap index block',13,'file header block',14,'unused',
15,'system undo header',16,'system undo block', 17,'undo header',
18,'undo block') class_type, status, lock_element_addr
from v$bh
where objd = 99360
order by 1,2,3
/

Save this query because we will be using it a lot in this experiment. Here is the output.

     FILE#     BLOCK# CLASS_TYPE         STATUS     LOCK_ELE
---------- ---------- ------------------ ---------- --------
         7       4418 segment header     cr         00
         7       4418 segment header     xcur       00
         7       4419 data block         xcur       00
         7       4420 data block         xcur       00
         7       4421 data block         xcur       00
         7       4422 data block         xcur       00
         7       4423 data block         xcur       00

7 rows selected.


There are 7 buffers. In this example we have not restarted the cache. So there are two buffers for the segment header. There is one buffer for each data block - from 4419 to 4423. The status is "xcur", which stands for Exclusive Current. I will explain that in detail in a later blog. But in summary it means that the buffer was acquired (or filled by a block) with the intention of being modified. If the intention is merely to select, then the status would have shown CR (Consistent Read). In this case since the rows were inserted modifying the buffer, the blocks were gotten in xcur mode. From a different session update a single row. For easier identification I have used Sess2> as the prompt:

Sess> update bhtest set col2 = 'Y' where col1 = 1;

1 row updated.

From the original session, check the buffers:

     FILE#     BLOCK# CLASS_TYPE         STATUS     LOCK_ELE
---------- ---------- ------------------ ---------- --------
         7       4418 segment header     cr         00
         7       4418 segment header     xcur       00
         7       4419 data block         xcur       00
         7       4420 data block         xcur       00
         7       4421 data block         xcur       00
         7       4422 data block         xcur       00
         7       4423 data block         cr         00
         7       4423 data block         xcur       00

8 rows selected. 

There are 8 buffers now, up one from the previous seven. Note there are two buffers for block ID 4423. One CR and one xcur. Why two?

It's because when the update statement was issued, it would have modified the block. Instead of modifying the existing buffer, Oracle creates a "copy" of the buffer and modifies that. This copy is now [Note there was a typo earlier "not", it should have been "now". Corrected. Thanks to Martin Bex] XCUR status because it was acquired for the purpose of being modified. The previous buffer of this block, which used to be xcur, is converted to "CR". There can't be more than one XCUR buffer for a specific block, that's why it is exclusive. If someone wants to find out the most recently updated buffer, it will just have to look for the copy with the XCUR status. All others are marked CR.

Let's continue with this experiment. From a third session, update a different row in the same block.

Sess3> update bhtest set col2 = 'Y' where col1 = 2;

1 row updated.

From the original session, find out the buffers.

     FILE#     BLOCK# CLASS_TYPE         STATUS     LOCK_ELE
---------- ---------- ------------------ ---------- --------
         7       4418 segment header     xcur       00
         7       4418 segment header     cr         00
         7       4419 data block         xcur       00
         7       4420 data block         xcur       00
         7       4421 data block         xcur       00
         7       4422 data block         xcur       00
         7       4423 data block         cr         00
         7       4423 data block         xcur       00
         7       4423 data block         cr         00
         7       4423 data block         cr         00

There are 4 buffers for block 4423 now - up from 2. What happened? Since the buffer was required to be modified once more, Oracle created yet another "copy", marked it "xcur" and relegated the older one to "cr". What about the extra CR copy? That was done because Oracle had to perform something called CR processing to create a CR copy from another CR copy or an XCUR copy.

You can notice how the number of buffers proliferate. Let's change the experiment a little bit. From a 4th session, select from the table, instead of updating a row:

Sess4>  select * from bhtest ;

From the original session, check for the buffers.

     FILE#     BLOCK# CLASS_TYPE         STATUS     LOCK_ELE
---------- ---------- ------------------ ---------- --------
         7       4418 segment header     xcur       00
         7       4418 segment header     cr         00
         7       4419 data block         xcur       00
         7       4420 data block         xcur       00
         7       4421 data block         xcur       00
         7       4422 data block         xcur       00
         7       4423 data block         cr         00
         7       4423 data block         cr         00
         7       4423 data block         cr         00
         7       4423 data block         cr         00
         7       4423 data block         cr         00
         7       4423 data block         xcur       00

12 rows selected.

Whoa! there are 12 buffers now. Block 4423 now has 6 buffers - up from 4 earlier. This was merely a select statement, which, by definition does not change data. Why did Oracle create a buffer for that?

Again, the answer is CR processing. The CR processing creates copies of the buffer and rolls them back or forward to create the CR copy as of the correct SCN number. This created 2 additional CR copies. From one block, now you have 6 buffers and some buffers were created as a result of select statement. This should answer the question whether the buffer cache of size of the database will be able to hold all the buffers.

Free Buffers

While being on the subject, let's ponder over another question - what happens when you flush the buffer cache? Let's see.

SQL> alter system flush buffer_cache;

System altered.

Checking the buffers using the script shown earlier:

     FILE#     BLOCK# CLASS_TYPE         STATUS     LOCK_ELE
---------- ---------- ------------------ ---------- --------
         7       4418 segment header     free       00
         7       4418 segment header     free       00
         7       4419 data block         free       00
         7       4420 data block         free       00
         7       4421 data block         free       00
         7       4422 data block         free       00
         7       4423 data block         free       00
         7       4423 data block         free       00
         7       4423 data block         free       00
         7       4423 data block         free       00
         7       4423 data block         free       00
         7       4423 data block         free       00

The buffers are still there and marked as belonging to the object. However the status is "free", i.e. the buffers can be reused if some session wants them for some other block. If a session wants a free buffer and can't find one, it waits with the wait event "free buffer wait". At that point, Oracle makes room in the buffer cache for the blocks requested by the session by forcing the buffers out of the buffer cache. The CR copies are discarded (since they were not updated) and the XCUR copies were written to the disk (if not written already).

Who does the writing? It's the process known as DBWR - Database Buffer Writer, which is generally named DBW0, DBW1, etc. The DBWR is a very lazy process. It sleeps most of the time, unless it is woken up by someone. In this case the session (actually the server process) kicks DBWn to write the dirty buffers to the disk and change the status to non-dirty. This is why sometimes SELECT statement may generate writing to data files. Until the buffers are freed up, the session waits patiently and displays to all that it's waiting on free buffer waits. You can check it by selecting the EVENT column from V$SESSION.

Impact on Segment Header

So far we talked about the data block. When does the segment header see some action? Segment header does not contain any data; so it does not need to be updated every time an update is made. But when the segment itself is modified, segment header is updated. Let's see an example with adding a column:

SQL> alter table bhtest add (col3 number);

Table altered.

Check the buffers:

     FILE#     BLOCK# CLASS_TYPE         STATUS     LOCK_ELE
---------- ---------- ------------------ ---------- --------
         7       4418 segment header     cr         00
         7       4418 segment header     cr         00
         7       4418 segment header     cr         00
         7       4418 segment header     cr         00
         7       4418 segment header     free       00
         7       4418 segment header     free       00
         7       4419 data block         free       00
         7       4420 data block         free       00
         7       4421 data block         free       00
         7       4422 data block         free       00
         7       4423 data block         free       00
         7       4423 data block         free       00
         7       4423 data block         free       00
         7       4423 data block         free       00
         7       4423 data block         free       00
         7       4423 data block         free       00

16 rows selected.

There are 6 copies of the segment header. Table alteration changed the segment header block - block# 4418. Just like any other block, the buffer was copied over and modified, creating multiple copies of the block. Whe you issue another DDL operation - truncate - the result is the same:

SQL> truncate table bhtest; 

     FILE#     BLOCK# CLASS_TYPE         STATUS     LOCK_ELE
---------- ---------- ------------------ ---------- --------
         7       4416 1st level bmb      free       00
         7       4417 2nd level bmb      free       00
         7       4418 segment header     cr         00
         7       4418 segment header     cr         00
         7       4418 segment header     free       00
         7       4418 segment header     free       00
         7       4418 segment header     cr         00
         7       4418 segment header     cr         00
         7       4418 segment header     cr         00
         7       4418 segment header     free       00
         7       4419 data block         free       00
         7       4420 data block         free       00
         7       4421 data block         free       00
         7       4422 data block         free       00
         7       4423 data block         free       00
         7       4423 data block         free       00
         7       4423 data block         free       00
         7       4423 data block         free       00
         7       4423 data block         free       00
         7       4423 data block         free       00

20 rows selected.

There are now additional segment header buffers, since truncate is a DDL command and segment header is modified. Even though there was no block of the table in the buffer cache (remember, we flushed it), the segment header still needs to be updated. Truncate also marks all blocks as free, and resets the high water mark and updates the bitmap block. The bitmap block (BMB) is used in tablespaces with automatic segment space management (ASSM) to display whether a block is free or not, which is similar to the functionality of freelists. The truncate caused the BMBs (there are two - 1st level and 2nd level) to be modified as well and they also come into the buffer cache.

Takeaways

From the above discussion you saw how a table with just two blocks populated fills up the buffer cache with 20 buffers. Imagine a normal database with, say 10000 filled blocks (8KX10,000 = 80M). It might easily fill 200,000 buffers. With a 8K block size that amounts to 8 K X 200 K = 1600M, or about 1.6 GB of buffer cache. The amount of buffers taken up depends on several factors:

(1) less buffers consumed in selects
(2) less buffers consumed if the commits occur more frequently, since the demand for CR processing is less
(3) more buffers are consumed if the activity updates buffers
(4) more the logical I/O, the more the need for buffers
(5) DDL activities increase the demand for the buffers, even if the blocks are empty

Food for Thought

Now that you know how the buffers are consumed, what do you think the situation is in a RAC environment  where buffers are transferred from one instance to the other - better, worse, or about the same? The answer will come in a future blog. Stay tuned.

13 comments:

Surachart Opun said...

Great -)
Thank You

Bryan Grenn said...

Very nice article Arup. I have been giving this concept a lot of thought. We have been meeting with hardware vendors that are promising 4tb of memory in the next couple of years. Of course your topic has come to mind. Can you fit the whole database in memory ? What does this mean to our future Storage decisions ?
I am going to look closely at your block usage testing (and include some index block testing since we have a lot of indexes), to see how big of a database we can fit into 4tb of memory. From the looks of your testing, if there are lots of updates, we might find a 1tb database might not even fit. I certainly didn't expect that !

Anonymous said...

Nice work Arup! And it reminds me to remind customers to consider this when intending to place volatile objects in keep cache they cannot budget merely the size of the object in question. Measurements over time are required to have enough head room to avoid most thrashing.

mwf@rsiz.com (anon for not logging in)

Arup Nanda said...

@Bryan - your point is exactly what I was trying to illustrate. Oracle' buffer cache mechanism does not allow entire database to fit into the buffer cache. If a all-memory access is desired, the best choice is to get a database specifically designed for that purpose, e.g. TimesTen.

In part 2 of this topic I am going to explain how you can accomplish the objective of most of the buffers of the block.

Arup Nanda said...

@MWF - thanks for chiming in. One of the most common myths folks tend to dabble in is that a table defined as cached (alter table xxx cache) is cached. In a future blog in this series I hope to bust that myth.

Arup Nanda said...

@Surachart - thank you.

painting techniques said...

This is indeed interesting!...Thanks for sharing this!...Daniel

Enrique Aviles said...

Excellent illustration. I logged in to an Oracle session and typed in the examples as I read.

It's interesting to see how many people think having a large SGA means the entire database is in memory. Even if that was possible, in order to do that the application must select all rows from all tables and indexes which is not realistic.

Additionally, just because a row was loaded in the buffer cache doesn't mean it will stay there indefinitely so at some point in the future the same row would have to be read from disk again.

This brings me to another question: If a row is loaded in the buffer cache and later the buffer is marked free will Oracle reuse that buffer or will it allocate a new one (assuming there is a lot of free, unallocated memory)? I'm just trying to understand if Oracle continues to consume or allocate memory even if there are free buffers that could be reused. Would that only happen if it's running low on memory?

Thanks

Alberto said...

Great article Arup, we already attend the next one...

Flado said...

Quote:
The CR processing creates copies of the buffer and rolls them back or forward to create the CR copy as of the correct SCN number.

Could you expand a bit on that? I can see how a buffer may be rolled back (by aplying undo), but forward? That would require redo, which is a whole different can of worms.

Cheers,
Flado

NEERAJ VISHEN said...

very good post........

all d best

Nitin R said...

Great article. Neatly explained.
Many thanks to you.

Anonymous said...

I searched a long time for such an great article. Thank you very much.
disk doctors digital media recovery

Translate