Friday, 22 April 2016

What is Normalization ?


Normalization is a process used in Relational DB design to organize the data for minimizing the duplication.
We divide the database in two or more tables and create relationship between them. After isolating the data we perform some addition, deletion or modification on the fields of a table then we propagate and remove the duplicate data from the related tables.

The main goals of normalization process are : 
  1. Eliminate the redundancy of data
  2. Make sure that the data dependencies (relationship) make sense.
It reduces the space occupied by the duplicate data in the database tables and ensure that the data is logically stored.

Some of the positive points of the data normalization :
  • Data integrity
  • To make optimized queries on the normalized tables and produce fast, efficient results.
  • To make faster index and also make a perfect sorting.
  • To increase the database performance

First normal Form (1NF)
Atomicity for values + Primary key
  • Each column should have atomic values
  • All the fields should contain only scalar values (Not a list of values)
  • Table must have at least one candidate key
    • Designate a primary key for each table
  • Table shouldn't have any duplicate record
  • No repeating groups
    (
    No attributes which occur a different number of times on different records)
    • Put items in the repeating group in a new table
BEFORE






AFTER




















Second normal Form (2NF)
  • Table must be in 1NF
  • All non-key attributes in table must be functionally dependent on primary key
    • Subset of data is removed and is organized in separate tables.
    • This process is applied to multiple rows of a table till the duplicity get reduced.
BEFORE





AFTER






Third Normal Form (3NF)
  • Table must be in 2NF
  • All columns should depend on the primary key only
    No inter-dependencies among non-key attributes
    • Move all items in transitive dependencies to new entity
    • Identify primary key in new entity
    • Place primary key for new entity as a foreign key on original entity
  • If there is no non-key attribute in table, it is already in 3NF
    BEFORE





    AFTER








    Boyce-Codd Normal Form (BCNF)
    • Table must be in 3NF
    • No inter-dependency among attributes of candidate keys
    BEFORE





    Table can have one of the 2 composite keys : 
    Here we have 2 dependencies :
    City, Street -> ZipCode
    ZipCode -> City

    So, dependency between attributes belongs to a key. It's not in BCNF.

    AFTER
    If we make 2 tables : 
       ZipCode, Street
       City , Street
    We will loose relation ZipCode -> City

    So, make 2 schemas : 







    Fourth Normal Form (4NF)
    • Table must be in BCNF.
    • Avoid independent may-to-one relationship between columns.
    • Table must not contain more than one multi-valued attribute.
    BEFORE









    Here, Primary key is { MovieName, City, MovieType }
    All columns are part of one & only candidate key, so it is in BCNF.
    But : 
    • Many movies can have same movie type
    • Many cities may screen the same movie

    So, it violates 4NF.

    Example 2.











    AFTER
    Split into 2 schemas :





















    Fifth Normal Form (5NF)
    • Table must be in 4NF.
    • All non-trivial join dependencies get eliminated.
    • 5NF is satisfied when all tables are broken into as many tables as possible in order to avoid redundancy.
      • Cannot be broken into smaller tables

    No comments:

    Post a Comment

    Note: only a member of this blog may post a comment.