Thursday, June 13, 2013

Primary Keys Guarantee Uniqueness? Think Again.

When you create a table with a primary key or a unique constraint, Oracle automatically creates a unique index, to ensure that the column does not contain a duplicate value on that column; or so you have been told. It must be true because that is the fundamental tenet of an Oracle database, or for that matter, any database.

Well, the other day I was checking a table. There is a primary key on the column PriKey. Here are the rows:

Select PriKey from TableName;

PriKey
------
1
1

I got two rows with the same value. The table does have a primary key on this column and it is enforced. I can test it by inserting another record with the same value - “1”:

SQL> insert into TableName values (1,…)

It errors with ORA-00001: unique constraint violated error. The question is: why there are two rows with duplicate values in a column that has an enforced primary key, and it refuses to accept the very value that is already violated the primary key? It could be a great interview question, test your mettle; or just entertaining. Read on for the answer.


Setup

Let’s start with creating a table

SQL> create table pktest1 (
  2     pk1     number,
  3     col2    varchar2(200)
  4  );

Table created.

Notice how I deliberately decided not to add a primary key column now. Let’s add some records. Note that I inserted two records with the pk1 value of 1.

SQL> insert into pktest1 values (1,'One');
1 row created.

SQL> insert into pktest1 values (1,'Second One');

1 row created.

SQL> insert into pktest1 values (2,'Two');

1 row created.

SQL> commit;

Commit complete.

Now we will create an index.

SQL> create index in_pktest1_01 on pktest1 (pk1);

Index created.

Note, I did not use a uniqueness clause; so the index will be created as nonunique. I can confirm that by checking the status and uniqueness of the index quickly:

SQL> select index_name, status, uniqueness
  2  from user_indexes
  3  where table_name = 'PKTEST1';

INDEX_NAME                     STATUS   UNIQUENES
------------------------------ -------- ---------
IN_PKTEST1_01                  VALID    NONUNIQUE

And, now I will add a primary key constraint on that column:

SQL> alter table pktest1 add constraint in_pktest1_01 primary key (pk1);

alter table pktest1 add constraint in_pktest1_01 primary key (pk1)
                                                 *
ERROR at line 1:

ORA-02437: cannot validate (ARUP.IN_PKTEST1_01) - primary key violated

The constraint creation failed, as expected since there are two rows with the same value in the column. We have to delete the offending row for this PK to be created. I should have done that; but instead I used something like the following:

SQL> alter table pktest1 add constraint in_pktest1_01 primary key (pk1) disable keep index;

Table altered.

The constraint was created, even with the duplicate values! So, how did Oracle allow that? The statement succeeded because you created the constrained with a disabled status. You can confirm by checking the status of the constraint:

SQL> select constraint_name, status
  2  from user_constraints
  3  where table_name = 'PKTEST1';

CONSTRAINT_NAME                STATUS
------------------------------ --------
IN_PKTEST1_01                  DISABLED

This is where you need to understand a very important attribute of the key enforcement of Oracle Database through an index. It’s true that PK or UK constraints are enforced through unique indexes. The only reason you the unique index was created in that case is to enforce the uniqueness. But what if Oracle already has a unique index on that column? In that case, Oracle decides to repurpose that index for the primary key. That’s what happened in this case.

But wait, the index we created was non-unique; how did Oracle use it to enforce the PK? Well, the answer is simple. Oracle simply doesn’t care. If an index exists, Oracle just uses it – unique or not.

When you disable the constraint, the purpose of the index is also eliminated and it is dropped. However, if you pre-created the index, it is not dropped. If the index was created with the constraint definition, then the “keep index” clause preserves the index. You can check that the index still exists even when the PK is gone.

SQL> select index_name, status
  2  from user_indexes
  3  where table_name = 'PKTEST1';

INDEX_NAME                     STATUS
------------------------------ --------
IN_PKTEST1_01                  VALID

Remember, the constraint is still disabled. Later, I enabled the constraint:

SQL> alter table pktest1 modify constraint in_pktest1_01 enable novalidate;

Table altered.

How was the constraint enabled? Looking at the table data:

SQL> select * from pktest1;
       PK1 COL2
---------- --------------------
         1 One
         1 Second One
         2 Two

The data still shows duplicate rows. The trick is using the clause “novalidate”, which instructs Oracle to skip checking of the existing data in the table. This is why the duplicate value in the table were tolerated while creating the constraint. The constraint is still enabled.

SQL> select constraint_name, status
  2  from user_constraints
  3  where table_name = 'PKTEST1';

CONSTRAINT_NAME                STATUS
------------------------------ --------
IN_PKTEST1_01                  ENABLED

However, only the existing rows are skipped; the future rows are subject to the enforcement of the constraint, as shown below:

SQL> insert into pktest1 values (1,'Third One');

insert into pktest1 values (1,'Third One')
*
ERROR at line 1:
ORA-00001: unique constraint (ARUP.IN_PKTEST1_01) violated

This is yet another property of the Oracle database you should be aware of – the presence of NOVALIDATE clause in constraint enablement. This is why there were duplicate values but a primary key constraint was still enabled.

One question you may be wondering about is – why would Oracle even have such a feature that could possibly allow non-conforming values? Consider this: you have a billion row table in a datawarehouse and using ETL processes you move them around between tables. If you have a constraint enabled, the constraint will be checked for the rows inserted – consuming considerable time and system resources. The data integrity in each step does not change; so why would you have the constraint. Well, you do have the constraints like primary key and foreign key to show the relationship between tables for modeling systems and query tools that rely on such information. One small example is Query Rewrite by Oracle, which allows a query to be rewritten from a summary table (materialized view) even though the query used a lot of dependent tables. Without the relationships between the tables Oracle could not have decided which materialized view to pull the data from. So, you want the constraints; but when you enable them, you don’t want to check for the conformance in the existing data. Here is the how the process looks like

1 Disable constraints on TargetTable
2 Load TargetTable Select * from SourceTable
3 alter table TargetTable enable constraint … novalidate

Since the constraints are disables during Step 1, the loading will be much faster in step 2. When the constraints are enabled in Step 3, they are not checked for in the existing data – making it almost instantaneous. This is why Oracle allows this process.

Conclusion

Let’s summarize what we learned from this:

(1) Just because there is an enabled constraint on a table does not mean that all the rows will conform to the constraint. For instance, a primary key on a column does not mean that the column will not contain any duplicate values.
(2) The primary key constraint is enforced through a unique index. However, if Oracle already finds an index – unique or not – it uses it for the primary key.
(3) In that case, if the index was initially created as nonunique, it will continue to show as nonunique; but it will actually be a unique index.
(4) If you disable the primary key constraint, the index, if created by Oracle to support the PK, is also dropped. To keep the index, use the “keep index” clause.
(5) When you enable a constraint with the “novalidate” clause the constraint does not check for the existing data; so there could be non-conforming values in the table.

By the way, what is the difference between primary key and unique constraints? They both enforce uniqueness of values in a column; but unique keys allow nulls while primary keys don't. Both of them enforce the uniqueness by creating a unique index.

I hope it was useful. Like always, I will appreciate you feedback.

11 comments:

Nishant Baurai said...

Good one Arup.

Enrique Aviles said...

Excellent explanation and examples as usual. Still, I think the title could be a bit misleading. Primary keys do guarantee uniqueness when enabled otherwise you would not get ORA-00001 errors. Duplicates were allowed when the table had no primary key. I thought you were going to show an example where a table with an enabled primary key still allowed duplicates. I'm glad to see that is not possible. :)

Now, if someone thinks duplicate values will be magically removed by creating a PK then that's a different and incorrect assumption.

Arup Nanda said...

@Enrique - that is the whole point. The presence of primary keys do enforce the non-duplicity of values. Many people assume that the corollary to this fact must be when there is a primary key, there is no possibility of duplicate values. This corollary is not correct; and that was the point of this post. Hope it explains.

Maja said...

Thanks for the post. I really like the way you explain things.

Anonymous said...

Great Job!

Srikar Dasari said...


Nice one Arup.

It appears to me that, Tom kyte says there would not be any non-unique key as such, Oracle appends the row id internally if it is defined as a non unique index to enforce uniquness

http://www.oracle.com/technetwork/issue-archive/o14asktom-084433.html

Regards,
Srikar Dasari

Gokhan Atil said...

Hi Arup,

Definitely an interesting post I should mention that it's not only about primary keys. We can ask the same question for any constraint. Unique constraint guarantees uniqueness? "Not null" constraint guarantees that column will not have nulls? etc...

The "novalidate" parameter lets users skip validating data for the constraints, so this is an intentionally designed behavior of Oracle. Of course, creating a nonunique index is a good trick for PK/unique constraints but it is not necessary to enable "invalid" null/check and foreign key constraints.

By the way, in my tests, I didn't need to use "keep index" while creating the constraint. Oracle used existing index even I didn't add "keep index".

Regards

Gokhan

Mohamed Houri said...
This comment has been removed by the author.
Mohamed Houri said...

Hi Arup,
Nice post and, and as always, elegantly narrated. You are one of those persons that possess a really attractive writing style.
Today, I learnt the option “keep” that allows keeping an index (created automatically during the primary key constraint creation) when dropping that primary key.
There is also the direct path option which silently disables the primary key index and allow duplicate keys to occurs behind the scene:
http://hourim.wordpress.com/category/direct-path/
However, I have one remark related to point 3)
(3) In that case, if the index was initially created as nonunique, it will continue to show as nonunique; but it will actually be a unique index.
I don’t think that a non unique index enforcing a primary key will be considered as a unique index.
SQL> create table t1 as select rownum n1, rownum + 3 n2 from dual connect by level <= 10;

Table created.

SQL> create index ind_1 on t1(n1,n2);

Index created.

SQL> alter table t1 add constraint t1_pk primary key (n1);

Table altered.

SQL> select index_name, uniqueness
from user_indexes
where table_name = 'T1';

INDEX_NAME UNIQUENES
------------------------------ ---------
IND_1 NONUNIQUE

SQL> select * from t1;

N1 N2
---------- ----------
1 4
2 5
3 6
4 7
5 8
6 9
7 10
8 11
9 12
10 13

10 rows selected.

SQL> insert into t1 values (1,25);
insert into t1 values (1,25)
*
ERROR at line 1:
ORA-00001: unique constraint (XXX.T1_PK) violated

SQL> insert into t1 values (11,25);
1 row created.
SQL> insert into t1 values (12,25);
1 row created.
It will be instead considered unique only for the primary key columns which should be at the leading edge of the non unique index.
SQL> alter table t1 drop constraint t1_pk;

Table altered.

SQL> drop index ind_1;

Index dropped.

SQL> create index ind_1 on t1(n2,n1); --> index doesn’t start with the pk column

Index created.

SQL> alter table t1 add constraint t1_pk primary key (n1);

Table altered.

SQL> select index_name, uniqueness
2 from user_indexes
3 where table_name = 'T1';

INDEX_NAME UNIQUENES
------------------------------ ---------
IND_1 NONUNIQUE
T1_PK UNIQUE

Oracle realizes that it is impossible to enforce primary key constraint using an index which doesn’t start with the primary key columns.

Non unique index can easily check if an index entry already exist that violates the primary key constraint even if this index has extra columns when compared to the primary key columns; all what it needs is to start with these primary key columns.

Best regards
Mohamed Houri

Anonymous said...

Took me time to read all the comments, but I really enjoyed the article. It proved to be Very helpful to me and I am sure to all the commenters here! Its always nice when you can not only be informed, but also entertained! Im sure you had fun writing this article.
Valuable information and excellent design you got here! I would like to thank you for sharing your thoughts and time into the stuff you post!!
http://www.sqlservermasters.com/

ok said...

think back work from home supervisor

Translate