Normalization in DBMS: 1NF, 2NF, 3NF and BCNF Example

In relational database management system design, Normalization is a technique in which the process of organizing data to minimize redundancy. 

Normalization  usually involves dividing a database into two or more tables and defining relationships between the tables. In other words Normalization is a database design technique which organizes tables in a manner that reduces redundancy and dependency of data. It divides larger tables to smaller tables and links them using relationships.

If a database design is not perfect, it may contain anomalies, which are like a bad dream for any database administrator. Managing a database with anomalies is next to impossible.Managing a database with anomalies is next to impossible.

Update anomalies: If data items are scattered and are not linked to each other properly, then it could lead to strange situations. For example, when we try to update one data item having its copies scattered over several places, a few instances get updated properly while a few others are left with old values. Such instances leave the database in an inconsistent state.

Deletion anomalies: We tried to delete a record, but parts of it was left undeleted because of unawareness, the data is also saved somewhere else.

Insert anomalies: We tried to insert data in a record that does not exist at all.

Normalization is a method to remove all these anomalies and bring the database to a consistent state.

database-normalization

Why Database Normalization is important ?

Normalization basically affords you to store data with little duplication - how much depends on the actual degree of normalization.Duplication makes the data difficult to maintain without additional work, like creating triggers or manual conventions to store data in lock steps. Normalization prevents anomalies. Anomalies are nothing but problems that can occur in poorly planned or un-normalized databases.

There are three types of anomalies; update, deletion and insertion anomalies. These anomalies are caused by having duplicate values in a table, or having no enforcement for uniqueness and referential integrity.Data without normalization contains duplicate values, which leads to anomalies.

According to E. F. Codd the objectives of normalization were stated as follows:

  1. To free the collection of relations from undesirable insertion, update and deletion dependencies.
  2. To reduce the need for restructuring the collection of relations as new types of data are introduced, and thus increase the life span of application programs.
  3. To make the relational model more informative to users.
  4. To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by.

As of now there are total 8 normal forms, but to keep our data consistent & non-redundant the first 3 Normal Formsare sufficient.

Types of Database Normalization

  1. First Normal Form (1 NF)
  2. Second Normal Form (2 NF)
  3. Third Normal Form (3 NF)
  4. Boyce-Codd Normal Form (3.5 NF)
  5. Fourth Normal Form (4 NF)
  6. Fifth Normal Form (5 NF)

In this Database management system tutorial i will discuss about First Normal Form (1 NF) , Second Normal Form (2 NF) and Third Normal Form (3 NF) .

1NF (First Normal Form) Rules

  1. There are no repeating or duplicate fields
  2. Each cell contain only a single value
  3. Each record is unique

The below table is not 1NF- 

not-in-first-normal-form

After completing database normalization (1 NF) we will get two table which are look like

first-normal-form

 

2NF (Second Normal Form) 

Before we learn about the second normal form, we need to understand the following:

  1. Prime attribute: An attribute, which is a part of the prime-key, is known as a prime attribute.
  2. Non-prime attribute: An attribute, which is not a part of the prime-key, is said to be a non-prime attribute.

If we follow second normal form, then every non prime attribute should be fully functionally dependent on prime key attribute. That is, if X → A holds, then there should not be any proper subset Y of X for which Y → A also holds true.

second-normal-form-example

We see here in Student_Project relation that the prime key attributes are Stu_ID and Proj_ID. According to the rule, non-key attributes, i.e., Stu_Name and Proj_Name must be dependent upon both and not on any of the prime key attribute individually.

But we find that Stu_Name can be identified by Stu_ID and Proj_Name can be identified by Proj_ID independently. This is called partial dependency, which is not allowed in Second Normal Form.

example-normalization

We broke the relation in two as depicted in the above picture. So there exists no partial dependency.

2NF (Second Normal Form) Rules

  1. Be in 1NF
  2. All non-key fields depend on all components of the primary key .
  3. Guaranteed when primary key is a single field.

Another example - the below table is not 2NF- 

not-in-second-normal-form

After completing database normalization (2 NF) we will get two table which are look like

second-normal-form.png

3NF (Third Normal Form) 

For a relation to be in Third Normal Form, it must be in Second Normal form and the following must satisfy:

  1. No non-key field depends upon another
  2. All non-key fields depend only on the primary key .
  3. Other words ; There can be no independencies among non-key attribute
  4. Has no transitive functional dependencies
  5. Be in 2NF.
  6. No non-prime attribute is transitively dependent on prime key attribute.
  7. For any non-trivial functional dependency, X → A, then either: o X is a superkey or, o A is prime attribute.

Now question is what is primary key ?

A primary is a single column value used to identify a database record uniquely. It has following attributes 

  1. A primary key cannot be NULL
  2. A primary key value must be unique
  3. The primary key values cannot be changed
  4. The primary key must be given a value when a new record is inserted.

not-3nf

We find that in the above Student_detail relation, Stu_ID is the key and only prime key attribute. We find that City can be identified by Stu_ID as well as Zip itself. Neither Zip is a superkey nor is City a prime attribute. Additionally, Stu_ID → Zip → City, so there exists transitive dependency.

To bring this relation into third normal form, we break the relation into two relations as follows:

3nf-normalization

Another example - the below table is not 3NF- 

not-in-third-normal-form

After completing database normalization (3 NF) we will get two table which are look like

third-normal-form

If you would like to know more about that , you can visit click here . 

Codechief is a very fast growing community among programmers and have a reach of around 1 million+ readers globally. Contribution at Codechief is open for all those who have a passion to learn and help others by sharing their knowledge. If you think you have the zeal to learn, start contributing on Codechief contribute. you can also mail your article to [email protected] See your article appearing on the codechief main page and help other code.

We believe that everyone has the right to learn, so we allow both students and professionals to contribute on Codechief.Please write comments if you find anything incorrect, or you want to share more information about the topic discussed above.

A web enthusiastic, self-motivated Full-Stack Web Developer from Dhaka, Bangladesh with experience in developing applications using JavaScript, Laravel & Wordpress specifically. Facebook Github Website