This question may seems very trivial or even stupid but surprisingly, depending on who you ask it to, you may get different answers. What developers call Primary Key is Synthetic Key for Database Administrators, while their version of primary key is known to us as Natural Key. In this article we would explore why the difference in terminology and does it really matter what name you call it by?

Primary key is fundamental way of identifying a row in the database. Every record in database needs to have an absolute unique identifier which would identify this record.

This is not a database problem, in real world we need to identify various things, for example people. Is their name the unique identifier, you could have multiple people with same names, so what would you do to uniquely identify a person. May be identify them by their phone numbers, but if two person from different countries may have same phone number and very unlikely but theoretically possible to have same name. So we still need a unique identifier to get to this person, so how about a combination key, the chances of a person having same phone number, same name and same date of birth is realistically rare. So we can use this combination to uniquely identify a person or a person record in database. Now depending on your role as a developer or database administrator, you would call this key as the Primary Key or Natural Key.

In pretty much every database you can use combination keys as Primary Key however the issue with combination key is that, if you use it as a Primary Key you would need to use the same three column combination as a Foreign Key as well. Oops! So now you would have Name, DOB and PhoneNumber as Foreign Key in every table which has a reference to our User table. Not a very pretty scenario, and then if you imagine mapping this to an ORM framework like Hibernate, it would be a nightmare. You would also need to pass it to every UI page, and search parameter which would increase the complexity and ugliness of the whole application.

So get out of this mess, databases allow you to generate auto generated keys, like Counters or Sequence or some thing similar. Most of the database either use Sequence like Oracle, PostGreSQL or an auto generated counter backed column like MySQL, MSSql Server e.t.c. These auto generated keys are what database administrators call as Synthetic Key for which I have to agree with them. Using these Synthetic Key allows you to have simpler foreign keys and query params.

So what exactly is the issue with Synthetic Key or may be these DBAs are just touchy about terminology. Actually for this you need to understand a little bit about how database store data.

Databases store data sorted based on Primary Key or Clustered Index, a Clustered Index is a unique identifier based on which the sorting order of physical storage of data is defined. In almost all databases except MSSSql Server, you can not have non primary key clustered index which means the data is sorted based on primary key. When you are loading the data through any query, the database would not load one record at a time, it loads one Page at a time, a Page is a unit of bytes which is the minimum amount of data a database loads in one I/O operation. This Page is usually placed in cache for easier access, so if the next query is based on Clustered Index aka Primary Key the data would be returned from cache instead of additional I/O, which makes it tad bit faster.

So if you listed to your DBA and define your natural keys as primary keys your queries would run a lot faster.



There are currently no comments.