Database systems are complex by nature, so we need to keep them as simple as possible. Data abstraction keeps a database system manageable by breaking it into levels and hiding (abstracting) irrelevant details at each level.
This lets us focus on one level at a time, which makes the whole database system easier to manage.
What is Data abstraction in DBMS?
The process of hiding irrelevant information at each level of a database is known as data abstraction.
The type of information that is relevant and irrelevant depends upon the level itself which we will see later in the post.
Data abstraction in DBMS is very helpful when dealing with a complex database system because it breaks the problem into subproblems, which makes it easier to manage.
Example: We use Google daily but have no idea how it stores its data. Information such as how and where Google stores its data is irrelevant to us, so it is hidden from us. This is known as data abstraction.
Three levels of Data abstraction
There are three levels of data abstraction in DBMS which reduce the complexity of the database and also provide data independence at each level.
Physical level
The physical level is the first and lowest level of abstraction. It describes how a record is actually stored in the system's storage and is a low-level representation of the database.
It deals with the storage of the data for the whole database system.
The Database Administrator (DBA) manages the physical level. The DBA decides things such as which drive the data is stored on and whether the storage is centralized or decentralized.
The physical level contains the database storage files and binary files that make up the actual storage of the database system. It depends on the hardware and the OS of the system.
Access modes such as sequential or random access, and file organisation methods such as B+ trees, indexing and hashing, are implemented at this level.
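To make the difference between sequential access and hashing concrete, here is a minimal Python sketch. It is only a toy model, not how a real DBMS lays out its files: the records, the function names and the in-memory dictionary standing in for a hash index are all assumptions made for illustration.

```python
# Toy records standing in for rows in a storage file (hypothetical data).
records = [
    {"id": 101, "name": "Asha"},
    {"id": 102, "name": "Ben"},
    {"id": 103, "name": "Chen"},
]


def sequential_lookup(rows, record_id):
    """Sequential access: scan the rows in order until the id matches."""
    for row in rows:
        if row["id"] == record_id:
            return row
    return None


def build_hash_index(rows):
    """Hashing: map each id to its position so a lookup can skip the scan."""
    return {row["id"]: position for position, row in enumerate(rows)}


def hashed_lookup(rows, index, record_id):
    """Direct access: jump to the row position recorded in the hash index."""
    position = index.get(record_id)
    return rows[position] if position is not None else None


hash_index = build_hash_index(records)
print(sequential_lookup(records, 103))
print(hashed_lookup(records, hash_index, 103))
```

Both lookups return the same record. Only the way the record is located differs, which is exactly the kind of detail the physical level hides from the levels above it.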
Logical level
The logical level is the second level of abstraction in DBMS. It describes what data is stored in the database and the relationships among that data.
The logical level defines the overall structure of the database: which data is stored, what its types are, and how the data is related.
In simple words, we create the blueprint of the database at the logical level.
Example: Take the example of a university database. We need to store data about teachers and students. But what data are we going to store? What are the data types? How will the data be related?
At the logical level, we define all of this. For example, the table of teachers contains columns such as TEACHER_ID and SALARY, and the table of students contains columns such as PROJECT_GUIDE, and so on. PROJECT_GUIDE may only contain values that are present in TEACHER_ID. Here we define the structure of the database and the relationships among the data.
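The university example can be sketched with Python's built-in sqlite3 module. This is one possible blueprint, not the definitive one: the NAME and STUDENT_ID columns, the sample rows and the choice of SQLite are assumptions added for illustration. Only TEACHER_ID, SALARY and PROJECT_GUIDE come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE teachers (
        TEACHER_ID INTEGER PRIMARY KEY,
        NAME       TEXT NOT NULL,          -- assumed column
        SALARY     REAL
    )
""")
conn.execute("""
    CREATE TABLE students (
        STUDENT_ID    INTEGER PRIMARY KEY, -- assumed column
        NAME          TEXT NOT NULL,       -- assumed column
        PROJECT_GUIDE INTEGER REFERENCES teachers (TEACHER_ID)
    )
""")

conn.execute("INSERT INTO teachers VALUES (1, 'Dr. Rao', 50000)")
conn.execute("INSERT INTO students VALUES (10, 'Asha', 1)")  # guide 1 exists

try:
    # Teacher 99 does not exist, so this row breaks the relationship.
    conn.execute("INSERT INTO students VALUES (11, 'Ben', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)
```

The foreign key on PROJECT_GUIDE is what expresses the rule that a project guide must be an existing teacher. Nothing in this definition says where or how the rows are stored, which is left to the physical level.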
View level
The view level is the last and highest level of abstraction in DBMS. It is intended for end users.
The application program (which general users use) shows the data according to the user's role. Data that is irrelevant to a role is hidden from that role's view. This is easier to understand with an example.
Example: Students only need to view their scores, courses, attendance and other details that are relevant to them. Students cannot view a teacher's salary because that data is irrelevant to them.
But teachers can view every detail of the students as well as their own data.
Here we create two separate views: one for the students and one for the teachers, each with the appropriate set of data.
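You might ask why we need a view when we could simply use SELECT in this case. A view is a SELECT query saved in the database under a name, so every application program that needs the same subset of data can reuse one definition instead of repeating the query.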
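The student and teacher views described above could look like the following sketch, again using Python's sqlite3 module. The table layout, the view names student_view and teacher_view, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teachers (TEACHER_ID INTEGER PRIMARY KEY, NAME TEXT, SALARY REAL);
    CREATE TABLE students (STUDENT_ID INTEGER PRIMARY KEY, NAME TEXT,
                           SCORE INTEGER, PROJECT_GUIDE INTEGER);
    INSERT INTO teachers VALUES (1, 'Dr. Rao', 50000);
    INSERT INTO students VALUES (10, 'Asha', 88, 1);

    -- Student view: student details plus the guide's name, but no salary.
    CREATE VIEW student_view AS
        SELECT s.STUDENT_ID, s.NAME, s.SCORE, t.NAME AS GUIDE_NAME
        FROM students AS s
        JOIN teachers AS t ON t.TEACHER_ID = s.PROJECT_GUIDE;

    -- Teacher view: the teacher's own data together with student details.
    CREATE VIEW teacher_view AS
        SELECT t.TEACHER_ID, t.NAME AS TEACHER_NAME, t.SALARY,
               s.STUDENT_ID, s.NAME AS STUDENT_NAME, s.SCORE
        FROM teachers AS t
        LEFT JOIN students AS s ON s.PROJECT_GUIDE = t.TEACHER_ID;
""")


def column_names(view):
    """Return the column names that a view exposes."""
    cursor = conn.execute(f"SELECT * FROM {view}")
    return [description[0] for description in cursor.description]


student_columns = column_names("student_view")
teacher_columns = column_names("teacher_view")
print(student_columns)
print(teacher_columns)
```

The SALARY column appears only in the teacher view. An application that reads student_view never sees it, even though both views are built on the same two tables.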
What is Data Independence?
Data independence is an advantage of levels of abstraction in the database management system.
It means that the levels are independent of each other: if we modify the schema at one level, the levels above it are not affected and keep working as expected.
This is important because we do not have to change one level and then fix the other two levels as well.
Types of data independence
There are two types of data independence, which separate the three levels of abstraction from each other.
The physical level of data independence
It refers to the ability to modify the physical level of the DBMS without affecting the logical level.
We usually modify the physical level of the database for performance reasons.
Suppose that we created a database with 10,000 records in mind, but the database keeps growing. In this case we need to upgrade the physical level and optimize it for performance.
Here are a few common changes that we make at the physical level to optimize it.
- Change the operating system
- Change the file system
- Change the data structure to store data
- Change the storage device
The physical level of the DBMS changes, but this has no effect on the logical level because the two levels are separate.
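A small way to observe this is to add an index, which changes the data structures used to store and find rows, and check that the same query still returns the same rows. The sketch below uses Python's sqlite3 module. The table, the index name idx_students_score and the generated rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (STUDENT_ID INTEGER PRIMARY KEY, NAME TEXT, SCORE INTEGER)"
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(i, f"student{i}", i % 100) for i in range(1, 1001)],
)

query = "SELECT STUDENT_ID FROM students WHERE SCORE = 42 ORDER BY STUDENT_ID"
before = conn.execute(query).fetchall()

# Physical-level change: add an index on SCORE. The table definition
# (the logical level) and the query text stay exactly the same.
conn.execute("CREATE INDEX idx_students_score ON students (SCORE)")

after = conn.execute(query).fetchall()
print(before == after)
```

The query is written against the logical schema only, so it does not need to know whether an index exists. The database is free to use the index to find the rows faster.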
Logical level of data independence
It refers to the ability to modify the logical level without affecting the view or application level.
This feature is hard to implement, because if we modify the logical level the view level is usually affected: it runs queries that depend on the logical schema of the database.
However, there are cases where it works well. For example, when we add new columns, rows or tables to an existing database, the view level keeps working because its existing queries can still deal with the data.
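The case of adding a column or a table can be checked with a short sketch using Python's sqlite3 module. The view name student_scores, the EMAIL column and the courses table are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (STUDENT_ID INTEGER PRIMARY KEY, NAME TEXT, SCORE INTEGER);
    INSERT INTO students VALUES (10, 'Asha', 88);
    CREATE VIEW student_scores AS SELECT NAME, SCORE FROM students;
""")

before = conn.execute("SELECT * FROM student_scores").fetchall()

# Logical-level changes: a new column on an existing table and a new table.
conn.execute("ALTER TABLE students ADD COLUMN EMAIL TEXT")
conn.execute("CREATE TABLE courses (COURSE_ID INTEGER PRIMARY KEY, TITLE TEXT)")

after = conn.execute("SELECT * FROM student_scores").fetchall()
print(before == after)
```

The view names the columns it needs, so additions to the schema do not disturb it. Dropping or renaming the NAME or SCORE column would be a different matter, which is why logical data independence is the harder of the two to achieve.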
The database is divided into three levels of abstraction.
- Physical level: Describes how a record is actually stored in the database.
- Logical level: Describes what data is stored in the database and how it is related.
- View level: Divides the data into multiple groups (views) as per the needs of the application program.
The advantages of dividing the database into three levels of abstraction are:
- Reduces complexity
- Provides customized views according to the requirements
- Maintains security
- Provides data independence
Types of data independence:
- Physical data independence: The physical level is separated from the logical level, so changes at the physical level do not affect the logical level.
- Logical data independence: Changes at the logical level do not affect the view level.