Rules to be checked before denormalizing the tables
Frequently, however, performance needs dictate very quick retrieval capability for data stored in relational databases. To accomplish this, sometimes the decision is made to denormalize the physical implementation. Denormalization is the process of putting one fact in numerous places. This speeds data retrieval at the expense of data modification.
It is not the intention of this article to promote the concept of denormalization. Of course, a normalized set of relational tables is the optimal environment and should be implemented for whenever possible. Yet, in the real world, denormalization is sometimes necessary. Denormalization is not necessarily a bad decision if implemented wisely. You should always consider these issues before denormalizing:
- can the system achieve acceptable performance without denormalizing?
- will the performance of the system after denormalizing still be unacceptable?
- will the system be less reliable due to denormalization?
If the answer to any of these questions is "yes," then you should avoid denormalization because any benefit that is accrued will not exceed the cost. If, after considering these issues, you decide to denormalize be sure to adhere to the general guidelines that follow.
If enough DASD is available at your shop, create two sets of tables: one set fully normalized and another denormalized. Populate the denormalized versions by querying the data in the normalized tables and loading or inserting it into the denormalized tables. Your application can access the denormalized tables in a read-only fashion and achieve performance gains. It is imperative that a controlled and scheduled population function is maintained to keep the data in the denormalized and normalized tables synchronized.
If DASD is not available for two sets of tables, then maintain the denormalized tables programmatically. Be sure to update each denormalized table representing the same entity at the same time, or alternately, to provide a rigorous schedule whereby tables will be synchronized. At any rate, all users should be informed of the implications of inconsistent data if it is deemed impossible to avoid unsynchronized data.
When updating any column that is replicated in many different tables, always update it everywhere that it exists simultaneously, or as close to simultaneously as possible given the physical constraints of your environment. If the denormalized tables are ever out of sync with the normalized tables be sure to inform end-users that batch reports and on-line queries may not contain sound data; if at all possible, this should be avoided.
Finally, be sure to design the application so that it can be easily converted from using denormalized tables to using normalized tables.
Can't find what you're looking for? Try Google Search!
Comments on "Rules to be checked before denormalizing the tables"