Normalisation

This is a perfect example of common sense made complicated. Don't panic if this turns into a blur for you!

The rules

Normalisation says that you should make your database work properly. You probably would have anyway (to a point).

It is based on a series of rules or guidelines called "normal forms". They just mean "do it right". Three are understandable by mere mortals:

  1. First normal form (1NF) says:
    • don't expect the user to type the same data in over and over on different rows
    • don't put more than one thing into one field
    • don't store any data which can be calculated from other data
    • don't store a list of similar things in one record
  2. Second normal form says that a table should only hold data which belongs to all parts of a composite primary key
  3. Third normal form is similar to 2NF but says that any data in a table must depend on the primary key and not on another field in the table (that just means that data should be in the right table!)

Examples

Normalisation often means splitting tables into two. When you do that it is often easier to handle the data.

First normal form

Going against first normal form could happen when:

Second normal form

An example of going against second normal form could be an appointments database which holds a record of visits by sales people. The primary key of the main table is salesPersonId and dateTime which together identify the records uniquely. If this table included the telephone number of the sales person then that should be split out into a separate table as it depends on just one part of the composite primary key (the sales person) and not the other (date and time).

Third normal form

Going against third normal form might mean putting full address details into an table recording customers. They do not belong there because the zip code/postcode allows you to identify the street, town/city and county. Those things "depend" on postcode/zip code and not on the primary key (the person). When the postcode is PL26 4FF the street, town and county will always be the same. Better to have a postcode table which holds the street, village, town and country. Then only the postcode needs to be entered for a person's address and the full address is available from the related table.

Benefits

All of these make sense because they make the database better in many ways including:

This page is not a good explanation of normalisation. This is because you need to be aware of normalisation at this stage rather than understand it completely. For now, be aware that it is possible to create a working database that is very inefficient and that normalisation is the name for fixing that. If you plan to work a lot with databases then later you should come back to normalisation and try to understand it fully.Practice creating a good data structure

submit to reddit Delicious Tweet