The overextended usage of surrogate keys

I recently encountered a situation whereby surrogate keys were used as primary keys. It looks as a reasonable choice; since the so-called SCD2 mechanism was also used to store historic information. The mechanism then works as follows: if we get updated information on a record that is already in the table, we insert a new record with the new information. As a consequence a new surrogate key is issued that acts to identify the new record. This implies that an object that regularly needs to be updated also repeatedly has new surrogate keys that regularly needs to be inserted. Each value of a surrogate key corresponds to a new record that was inserted as a result from new information on the object.
We know that situation from the dimensional models that Kimball advocates. The dimensions are regularly updated and (hence) objects regularly are attached to new values of the surrogate keys that are used as a primary key. If new facts are added, these new surrogate keys will be used. See the scheme below:

scheme usage SK

Hence, in the Kimball situation, the usage of surrogate keys will not lead to major problems. Each time an update on the dimension is received, a new record will be inserted. A new surrogate key will then be used. If new facts are received, the surrogate keys are looked up from the dimensions and they are used as a foreign key to link the facts to dimensions.

So far, so good.

However, the dimensions may also be linked to other tables. The linkage may be maintained with these surrogate keys. The organisation asked to update these surrogate keys that are used as foreign keys to display the most recent situation.
This leads to a trickling down effect of a change as a consequenc from using surrogate keys.

scheme usage SK

See the scheme above.
1: A change on a first table arrives. As a consequence, a new record is created that captures the change. As a surrogate key is used as a primary key, a new value for the surrogate key is retrieved.
2: The surrogate key value is also used in a second table as a foreign key. As the organisation asks to update such information, a new record must be inserted that captures the new foreign key value. New record implies a new surrogate key.
3: This induces another round of updates. The new surrogate key may on its turn be used in yet a third table. New record in the third table. New surrogate key.
4: Etc.

What happened is that a complicated and interrelated structure does not sustain a surrogate key structure that is volatile with respect with the underlying objects. If each new representation of the object leads to a new primary key, we are faced with a continous task of updating the foreign keys. A change in one row of a fundamental table may trickle down to a series of updates of foreign keys elsewhere. It not only leads to updates of foreign keys that refer to the product table. It also leads to updates of foreign keys that point to tables that point to product tables etc.

Consequence. If we have an interrelated model, we should maintain a foreign key that does not change its value with respect to the underlying object. If we use a SCD2 mechanism , we should use a primary key that is composed of a stable surrogate key plus the date at which the change is implemented. This will uniquely identify the record. It also avoids the constant update of foreign key as we may issue the stable surrogate key.

Breaking

Door tom

Gerelateerd bericht

Je miste

Flask and JSON

A webserver from the command line

Use the node.js server as restful app server

Reading a CSV file and translate into dataframe