Duplicate records in teradata tables

Teradata offers the user the choice whether of not a check is made on duplicate records. Let’s first look at some code that allows duplicate records to be inserted.
The code below has two elements that enables duplicate records:

it contains a primary index that is not unique.
the table is a multiset table.

CREATE MULTISET  TABLE SAN_D_FAAPOC_01.EMPLOYEE
(
EMP_NUM INTEGER NOT NULL,
EMP_NAME CHAR(30) NOT NULL,
DEPT_NO INTEGER NOT NULL
)
 PRIMARY  INDEX(EMP_NUM);
 
INSERT INTO SAN_D_FAAPOC_01.EMPLOYEE VALUES (123456,'VINAY',101);
INSERT INTO SAN_D_FAAPOC_01.EMPLOYEE VALUES (123457,'SACHIN',102);
INSERT INTO SAN_D_FAAPOC_01.EMPLOYEE VALUES (123457,'SACHIN',102);

select * from SAN_D_FAAPOC_01.EMPLOYEE;

Two identical records can be inserted.
The table is a so-called multiset table. This type table allows duplicate records. Note that this is not a Teradata standard. The standard is a table that doesn’t have duplicate records. The standard would be: “CREATE SET TABLE …”. Such a table doesn’t allow duplicate records.

CREATE SET  TABLE SAN_D_FAAPOC_01.EMPLOYEE
(
EMP_NUM INTEGER NOT NULL,
EMP_NAME CHAR(30) NOT NULL,
DEPT_NO INTEGER NOT NULL
)
 PRIMARY  INDEX(EMP_NUM);
 
INSERT INTO SAN_D_FAAPOC_01.EMPLOYEE VALUES (123456,'VINAY',101);
INSERT INTO SAN_D_FAAPOC_01.EMPLOYEE VALUES (123457,'SACHIN',102);
INSERT INTO SAN_D_FAAPOC_01.EMPLOYEE VALUES (123457,'SACHIN',102);

select * from SAN_D_FAAPOC_01.EMPLOYEE;

This set of inserts generates an error as an attempt is made to insert two identical records. However it takes time to do such a check. Hence in a data warehouse environment, such a check with its accompanying overhead costs, might be too expensive. This explains why a multiset is preferred above the standard Teradata standard.
Finally, in most cases the primary index is set to being unique. As the attribute that is used in the primary key in unique, the whole record is always unique as well. In that case an additional check whether the record is unique is not performed. Hence this definition will lead to unique records as the primary index is unique. Again, this uniqueness is enforced without an explicit test on the records.

CREATE MULTISET  TABLE SAN_D_FAAPOC_01.EMPLOYEE
(
EMP_NUM INTEGER NOT NULL,
EMP_NAME CHAR(30) NOT NULL,
DEPT_NO INTEGER NOT NULL
--,CONSTRAINT FOREIGN_EMP_DEPT FOREIGN KEY ( DEPT_NO)  REFERENCES  WITH NO CHECK OPTION DEPARTMENT(DEPT_NO)   
)
UNIQUE  PRIMARY  INDEX(EMP_NUM);

Hence, we have two rules that may enforce unicity of records:

using the standard set option that prevent duplicate inserting duplicate rows
using a unique primary index conditions that prevents usage of duplicate values for attributes that form the primary index.

It is easier to check the unicity of a primary index as a primary index is only defined on a part of the record. Hence the combination multiset / non unique primary index is most easy to check. In that case, nothing needs to be checked on duplicates. On the other hand, set/ non unique is most difficult to check as the whole record must be checked.

Breaking

Door tom

Gerelateerd bericht

Je miste

Flask and JSON

A webserver from the command line

Use the node.js server as restful app server

Reading a CSV file and translate into dataframe