Thursday, June 13, 2013

Primary Keys Guarantee Uniqueness? Think Again.

When you create a table with a primary key or a unique constraint, Oracle automatically creates a unique index, to ensure that the column does not contain a duplicate value on that column; or so you have been told. It must be true because that is the fundamental tenet of an Oracle database, or for that matter, any database.

Well, the other day I was checking a table. There is a primary key on the column PriKey. Here are the rows:

Select PriKey from TableName;

PriKey
------
1
1

I got two rows with the same value. The table does have a primary key on this column and it is enforced. I can test it by inserting another record with the same value - “1”:

SQL> insert into TableName values (1,…)

It errors with ORA-00001: unique constraint violated error. The question is: why there are two rows with duplicate values in a column that has an enforced primary key, and it refuses to accept the very value that is already violated the primary key? It could be a great interview question, test your mettle; or just entertaining. Read on for the answer.


Setup

Let’s start with creating a table

SQL> create table pktest1 (
  2     pk1     number,
  3     col2    varchar2(200)
  4  );

Table created.

Notice how I deliberately decided not to add a primary key column now. Let’s add some records. Note that I inserted two records with the pk1 value of 1.

SQL> insert into pktest1 values (1,'One');
1 row created.

SQL> insert into pktest1 values (1,'Second One');

1 row created.

SQL> insert into pktest1 values (2,'Two');

1 row created.

SQL> commit;

Commit complete.

Now we will create an index.

SQL> create index in_pktest1_01 on pktest1 (pk1);

Index created.

Note, I did not use a uniqueness clause; so the index will be created as nonunique. I can confirm that by checking the status and uniqueness of the index quickly:

SQL> select index_name, status, uniqueness
  2  from user_indexes
  3  where table_name = 'PKTEST1';

INDEX_NAME                     STATUS   UNIQUENES
------------------------------ -------- ---------
IN_PKTEST1_01                  VALID    NONUNIQUE

And, now I will add a primary key constraint on that column:

SQL> alter table pktest1 add constraint in_pktest1_01 primary key (pk1);

alter table pktest1 add constraint in_pktest1_01 primary key (pk1)
                                                 *
ERROR at line 1:

ORA-02437: cannot validate (ARUP.IN_PKTEST1_01) - primary key violated

The constraint creation failed, as expected since there are two rows with the same value in the column. We have to delete the offending row for this PK to be created. I should have done that; but instead I used something like the following:

SQL> alter table pktest1 add constraint in_pktest1_01 primary key (pk1) disable keep index;

Table altered.

The constraint was created, even with the duplicate values! So, how did Oracle allow that? The statement succeeded because you created the constrained with a disabled status. You can confirm by checking the status of the constraint:

SQL> select constraint_name, status
  2  from user_constraints
  3  where table_name = 'PKTEST1';

CONSTRAINT_NAME                STATUS
------------------------------ --------
IN_PKTEST1_01                  DISABLED

This is where you need to understand a very important attribute of the key enforcement of Oracle Database through an index. It’s true that PK or UK constraints are enforced through unique indexes. The only reason you the unique index was created in that case is to enforce the uniqueness. But what if Oracle already has a unique index on that column? In that case, Oracle decides to repurpose that index for the primary key. That’s what happened in this case.

But wait, the index we created was non-unique; how did Oracle use it to enforce the PK? Well, the answer is simple. Oracle simply doesn’t care. If an index exists, Oracle just uses it – unique or not.

When you disable the constraint, the purpose of the index is also eliminated and it is dropped. However, if you pre-created the index, it is not dropped. If the index was created with the constraint definition, then the “keep index” clause preserves the index. You can check that the index still exists even when the PK is gone.

SQL> select index_name, status
  2  from user_indexes
  3  where table_name = 'PKTEST1';

INDEX_NAME                     STATUS
------------------------------ --------
IN_PKTEST1_01                  VALID

Remember, the constraint is still disabled. Later, I enabled the constraint:

SQL> alter table pktest1 modify constraint in_pktest1_01 enable novalidate;

Table altered.

How was the constraint enabled? Looking at the table data:

SQL> select * from pktest1;
       PK1 COL2
---------- --------------------
         1 One
         1 Second One
         2 Two

The data still shows duplicate rows. The trick is using the clause “novalidate”, which instructs Oracle to skip checking of the existing data in the table. This is why the duplicate value in the table were tolerated while creating the constraint. The constraint is still enabled.

SQL> select constraint_name, status
  2  from user_constraints
  3  where table_name = 'PKTEST1';

CONSTRAINT_NAME                STATUS
------------------------------ --------
IN_PKTEST1_01                  ENABLED

However, only the existing rows are skipped; the future rows are subject to the enforcement of the constraint, as shown below:

SQL> insert into pktest1 values (1,'Third One');

insert into pktest1 values (1,'Third One')
*
ERROR at line 1:
ORA-00001: unique constraint (ARUP.IN_PKTEST1_01) violated

This is yet another property of the Oracle database you should be aware of – the presence of NOVALIDATE clause in constraint enablement. This is why there were duplicate values but a primary key constraint was still enabled.

One question you may be wondering about is – why would Oracle even have such a feature that could possibly allow non-conforming values? Consider this: you have a billion row table in a datawarehouse and using ETL processes you move them around between tables. If you have a constraint enabled, the constraint will be checked for the rows inserted – consuming considerable time and system resources. The data integrity in each step does not change; so why would you have the constraint. Well, you do have the constraints like primary key and foreign key to show the relationship between tables for modeling systems and query tools that rely on such information. One small example is Query Rewrite by Oracle, which allows a query to be rewritten from a summary table (materialized view) even though the query used a lot of dependent tables. Without the relationships between the tables Oracle could not have decided which materialized view to pull the data from. So, you want the constraints; but when you enable them, you don’t want to check for the conformance in the existing data. Here is the how the process looks like

1 Disable constraints on TargetTable
2 Load TargetTable Select * from SourceTable
3 alter table TargetTable enable constraint … novalidate

Since the constraints are disables during Step 1, the loading will be much faster in step 2. When the constraints are enabled in Step 3, they are not checked for in the existing data – making it almost instantaneous. This is why Oracle allows this process.

Conclusion

Let’s summarize what we learned from this:

(1) Just because there is an enabled constraint on a table does not mean that all the rows will conform to the constraint. For instance, a primary key on a column does not mean that the column will not contain any duplicate values.
(2) The primary key constraint is enforced through a unique index. However, if Oracle already finds an index – unique or not – it uses it for the primary key.
(3) In that case, if the index was initially created as nonunique, it will continue to show as nonunique; but it will actually be a unique index.
(4) If you disable the primary key constraint, the index, if created by Oracle to support the PK, is also dropped. To keep the index, use the “keep index” clause.
(5) When you enable a constraint with the “novalidate” clause the constraint does not check for the existing data; so there could be non-conforming values in the table.

By the way, what is the difference between primary key and unique constraints? They both enforce uniqueness of values in a column; but unique keys allow nulls while primary keys don't. Both of them enforce the uniqueness by creating a unique index.

I hope it was useful. Like always, I will appreciate you feedback.
Post a Comment

Translate