Normalisation is performed between building the logical and the physical model and before the actual implementation. It allows to have a set of tables optimised for space.
We need to ensure that all of our data structures are able to be implemented in a relational database. This is not a problem if the structure is what we term Simple, for example:
Customer: Customer Number; Name; Address; Telephone; Item Ordered; Quantity; Price
This can easily be implemented in a database table. However, a customer may order several items and each customer in our database may order a different number of items. This situation makes it difficult for us to implement the data in a relational database since we do not know how many Order entries to allow for.
For example, we could have the situation where most of our customers only order a single item, several order 2 or 3 and a few occasionally order 15. In this case we would have to provide enough space in the table so that every customer could order up to 15 items but because most only order one product this would be a very wasteful use of space and would make the database perform very poorly.
A structure like this is not called Simple - it is called Complex. A complex entity must be converted to simple entities before it can be implemented in the database. This conversion process is called Normalisation.
Games order form
Let’s look at an order form example to understand how a complex structure works.
| Image: An order form example |
The document shown here represents an order form for a computer games retail company. Customers may place several orders and obviously the same product may be ordered by more than one customer. You will notice that certain data repeats more than once (Quantity, Product ID, Description, Unit Price and Amount) - this is therefore a Complex entity.
Those of us who have an ordered mind but aren't quite aware of relational databases might try to capture the Order data in a spreadsheet, such as Microsoft Excel.
Click here to access the example spreadsheet. The first Start tab shows you how it would originally look like. We will use the rest of the tabs later on, as we go through the whole normalisation process.
It isn't such a bad thing to use a Microsoft Excel spreadsheet, since it records every purchase made by every customer. But what if you started to ask complicated questions, such as:
How many copies of Alpine Ski Instructor Games-R-Us ordered in 2011?
What are total sales of Deck Chair Attendant Deluxe in the county of Buckinghamshire?
What items were sold on July 31, 2011?
As the spreadsheet grows, it becomes increasingly difficult to address these questions. In an attempt to put the data into a state where we can reasonably expect to answer such questions, we begin the normalisation process.
With the Entity-Relationship (ER) models, the modelling or mapping the business starts at the top by asking questions and addressing issues. When the database tables have been designed from an ER model (“top-down”) it is very important that these tables are normalised to eliminate redundancy. This happens at the level of the Logical model. This ensures that data only exists in the database in one place, hence giving faster retrieval from smaller databases. Updating, inserting and deleting data will also only have to occur in one place, and should also be quicker and less prone to problems caused by anomalies in the data.
Normalisation can also be applied from the bottom up. The examples shown here are using such an approach where the design of the database is done by collating all the business reports and placing this information in database tables, which are then normalised.
Thus, normalisation can be applied both when designing the tables for a new project or to check or transform existing files in databases.
The normalisation method is very formal and at times can appear quite mechanical in its nature. This has advantages, in that the techniques can be applied by relatively junior members of the business analysis team. However, the major disadvantage is that if only current business reports are used, the design will only relate to the current model of the business rather than taking on board any new requirements.
Disadvantages can occur when the normalised tables of data are queried. The querying of data across multiple tables means those tables have to be joined. The joining of tables is time consuming and uses resources both of the processor and the file I/O. This can mean that tables are often denormalised to give adequate performance.
Denormalisation is staying at lower level of normalisation or deliberately (but knowingly) breaking some of the normalisation rules in order to work with reduced number of tables. This is often applied in data warehousing with data that have hierarchical structure with multiple levels. Thus, denormalisation provides a “trade-off” between eliminating redundancies and achieving acceptable query speed.
To summarise, the normalisation method has got the following characteristics:
- Approach to database design from ‘top-down’ or ‘bottom up’
- Useful for transforming existing files, checking existing databases
- Rigorous and formal approach
- Leads to increasing the number of tables, hence slowing down the querying of data
- Denormalisation provides a “trade-off” between eliminating redundancies and achieving acceptable query speed.
When applied top-down, normalisation is a very good strategy for designing the tables and selecting primary and foreign keys. When applied bottom-up, it leads to improving the structure of tables. Its big benefit is removing redundancy and thus organising the data in the best way with regard to space.
An additional benefit when normalisation is applied top-down is that any problems show up early in the design, before implementation in the database.
As a result of normalisation, a table may have to be broken into two or more tables, but ultimately their further maintenance is easier.
There are no insertion anomalies when data cannot be added due to absence of other data. For instance, impossibility to add a new customer unless they make an order.
There are no deletion anomalies. When deleting a record means deleting data that should have been kept. For instance, deleting the cancelled order from a customer with only one order means deleting the customer too.
Any modifications are easier and faster because there are no redundancies (duplications). Everything is stored in one place only.
To summarise, the benefits of normalisation are:
- Strategy for key and table selection
- Problems show up early in the design
- Eliminates insertion anomalies
- Eliminates deletion anomalies
- Reduction in database modification time
Next, let’s see what solution is possible for our order example by following the process of normalization through its series of stages called Normal forms.
In this course, you will go through the normalisation process and learn about Normal Forms.
A world-leading tech and digital skills organization, we help many of the world’s leading companies to build their tech and digital capabilities via our range of world-class training courses, reskilling bootcamps, work-based learning programs, and apprenticeships. We also create bespoke solutions, blending elements to meet specific client needs.