Custom Search

Popular Posts

Sunday, June 15, 2014

MODELLING DATA


All organizations need to store the information and this is done through database. Data models refer to the conceptual model of the data and the underlying relationships among them. DBMS abstract some generic structures to represent conceptually every possible file structure.

Data models can be classified in two classes viz; Record-based logical Models and Object-based logical models. Record-Based logical data models can be classified (Sadgopan, 1997) into the following categories: 

Hierarchical Models: These are the early data models used in 1970’s. Hierarchical models capture the intuitive hierarchy of the data elements. The early generation of large DBMS e.g. IMS belongs to the hierarchical data models. Even today some large databases are maintained on IMS platform. 

Network Models: Since hierarchical models are unable to represent data items that existing at two different level of hierarchy, network models were proposed. The notable systems built using this model were ADABAS and DBMS-10 on DEC-10 machines. B2Bdata provides world class B2B Phone and email lists for a wide range of industries and regions.  

Relational Models: Though network models were quite powerful, they lacked in elegance. The systems built on this data model were dBase, Xbase and ORACLE. Almost all commercial systems presently available like Oracle 8i, 9i, 11 etc., SQL Server, MySQL are built on the relational models. There are 12 rules that are required to be followed in a relational model. 


Data modelling is achieved in two levels:

1) E-R modelling that builds the conceptual model of the data.
2) Normalization, which removes the redundancies. 

Object Oriented Logical Models can be of several types. One of them is Entity Relationship model. The Entity Relationship (ER) data model is the most common data model which is based on the perception of the real world.. This model allows us to represent the relationships among the objects called entities. It uses following three concepts to represent itself graphically. Although we have discussed them earlier, we revisit them here once again: 

Entity: Any real-world object that has certain properties (attributes of its own) and this object are uniquely identified by the system on the basis of these. This is represented by a rectangle with the entity name specified in the centre. For e.g.: student is an entity.

Attribute: Attribute is the properties of an entity, like a student entity can have attribute as student name, student roll no., student class etc. Attributes are placed inside the circles and attached to the entities and relationships.

Relationship: Relationship specifies the meaningful relation between two entities. This relationship can also have attributes. A rhombus represents these with a relation specified in it. For e.g.: if we have student and course as entities, then we can relate these entities as student enrols for a course. 

Fig-1 


Degree of Relationship (DOR): This specifies the occurrence of entity with other entity. 

One to One (1:1): In this, there can be almost one related occurrence for each entity. For example, A single manager manages one department and one department can be managed by a single manager. It is represented as follows.
 


One to Many (1:N): Here, for one occurrence of the first entity, there may exist many occurrences of the second entity and for every occurrence of the second entity, these exists only one occurrence of the first entity e.g.:
 


Here one manager supervises many employees and every employee reports to only one manager. 

Many to Many (M: N): In this degree of relationship, for one occurrence of the first entity, these may exists many occurrences of the second entity and also for every occurrences of second entity, there exists many occurrences of the first entity e.g.: One employee can work for many projects and many employees can handle one project. 


Normalization is another concept in data modelling. Normalization is a process of converting complex data into simpler form without any loss of information. The Normalization technique ensures that there is no dependent and duplicate data when the E-R model has been made. Normalization first of all, converts the data into tables or relations. These tables are checked for redundancy. Finally the form is converted to a database definition. Why do we need normalization? Normalization improves the database design. Whenever the design is modified, we need to re-organize the data.

Normalization reduces the need of reorganizing the data. Process of normalization removes all undesirable consequences that may occur due to inserting, updating and deleting values from the tables. Normalization reduces the unnecessary repetition of data (redundancy) that causes the problem with storage and retrieval of data. Poorly designed databases have several data management difficulties e.g. Consider a relation. 

LIBRARY (MNAME, MADD, B_TITLE, ISS_DATE, DUE_DATE)

This relation has the following problems:

1) Here duplicate data exists because member address will be repeated if a member issues more than one book.

2) Insertion Problems/ Anomalies: If a member does not issue a book then we cannot record the address of a member. Here MNAME & B_TITLE are both necessary and hence we cannot leave any one of them as blank.

3) Updating Problem: If a member has issued two or more books, then address is necessary in all and if a member’s address needs to be updated, then all the records needs to be updated.

4) Delete Anomalies: Here if we delete the issue details, then address would not be there. 

But all the above problems can be eliminated by normalization. If this single relation is broken down in two relations: 

MEMBER_DETAILS (MNAME, MADDR)

MEMBER_TRANS (MNAME, B_TITLE, ISS_DATE, DUE_DATE) 

But this requires a join between these two relations, which is very expensive. Now lets look into several normal forms with the help of an example: 

Table-1   
Roll No.
Name
Subject
Mark
1
Amit
English
65
 
 
Hindi
72
 
 
Math
70
2
Seema
English
70
3
Anjali
English
54
 
 
Hindi
60

To make this unnormalized data into 1 NF (First Normal Form), we have to make each cell containing one value. And all the repeated information should be removed.
Table-2
Roll No.
Name
Subject
Mark
1
Amit
English
65
1
Amit
Hindi
72
1
Amit
Math
70
2
Seema
English
70
3
Anjali
English
54
3
Anjali
Hindi
60

The data is still redundant in this form. Here combination of two keys (composite key), Roll No. and Subject, is a primary key (PK, that uniquely identifies a record). But the attributes of this table depend on the part of the PK. Roll No. and subject determines marks, Roll No. determines Name, Name has no dependency on the attribute subject. This may lead to following problems:

Insertion: the name of a subject cannot be recorded until a student gives any exam.

Updation: Roll No. and name is repeated several times. If name of the student is to be changed, then at every place the name needs to be updated otherwise may lead to inconsistencies.

Deletion: if a student has not given any of the three exams, then the name of that particular student will be deleted.

So, for overcoming these problems, this first normal form needs to be decomposed and converted into Second Normal Form (2NF) without any loss of information. 

Table-3
Roll No.
Name
Roll No.
Name
Subject
Mark
1
Amit
1
Amit
English
65
2
Seema
1
Amit
Hindi
72
3
Anjali
1
Amit
Math
70
 
 
2
Seema
English
70
 
 
3
Anjali
English
54
 
 
3
Anjali
Hindi
60

 A relation is said to be in 2NF, only if it is in 1NF and every attribute is dependent on the whole key and not just a part of it. 

Other major normal forms are 3NF and 4NF. Third normal form or 3NF is used to prevent loss of information and dependencies preserving decomposition. Fourth normal form or 4NF is used to preserve multi-valued dependency, which is essentially a constraint. There are several other normal forms like project-join NF, domain-key NF.

0 comments:

Blog Widget by LinkWithin