
DBMS Interview Questions

Introduction

Data is a collection of facts that conveys meaningful information.
A database is a collection of related data.
DBMS stands for Database Management System.
A DBMS provides the software to create and manage the database.
Following are the operations performed on a database: insertion, deletion,
updating, sorting, searching, traversing, etc.

Following are the different types of DBMS:

FMS (File Management System), Hierarchical, Network DBMS, and RDBMS

FMS (File Management System)

It is simple to create but difficult to manage.
It supports no relationships (such as 1-to-1 or 1-to-many).

Hierarchical System

Here data is stored in a tree-like structure.
It supports the 1-to-many relationship only.

Network DBMS

It supports all relationships and is quite fast at managing data.
Limitation: as the size increases, it becomes very complicated to manage.

RDBMS

It supports all relationships and makes it very easy to manage data.
In an RDBMS, data is stored in tabular form.
Tables have rows and columns.
Rows represent records and columns represent fields.
To access the data in an RDBMS, SQL (Structured Query Language) is used.

Limitations of RDBMS:

1. It is slower compared to a network DBMS.
2. Redundancy of data is needed to establish relationships between tables.

Different types of RDBMS:

ORACLE, SQL Server, Sybase, DB2, MySQL, etc.

SQL queries are divided into three parts:

1. DDL: CREATE, ALTER, RENAME, TRUNCATE, etc.
2. DML: INSERT, UPDATE, DELETE, etc.
3. Data retrieval: SELECT queries (see the sketch below).
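
As a quick illustration of the three categories, here is a minimal sketch using the pcdsEmployee table referred to throughout this article (the exact columns are assumed for the example):

-- DDL: define the structure of a table
CREATE TABLE pcdsEmployee (
    EmpID INT PRIMARY KEY,
    name  VARCHAR(40),
    age   INT
);

-- DML: change the data held in the table
INSERT INTO pcdsEmployee (EmpID, name, age) VALUES (1, 'rohit', 24);
UPDATE pcdsEmployee SET age = 25 WHERE name = 'rohit';

-- Data retrieval: read the data back
SELECT name, age FROM pcdsEmployee;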

A database provides a systematic and organized way of storing, managing and retrieving a collection of logically related information.

Secondly, the information has to be persistent: even after the application is closed, the information should persist.

Finally, it should provide an independent way of accessing the data, so that access is not dependent on the application.

The main difference between a simple file and a database is that a database has an independent way (SQL) of accessing information while simple files do not. A file meets the storing, managing and retrieving parts of a database, but not the independent way of accessing data. Many experienced programmers think that the main difference is that a file cannot provide the multi-user capabilities which a DBMS provides. However, if we look at some old COBOL and C programs where files were the only means of storing data, we can see functionality like locking and multi-user access provided very efficiently. So it is a matter of debate; if an interviewer considers this the main difference between files and databases, accept it, because getting into a debate will probably lose you the job.

There is a set of rules that has been established to aid in the design of tables that are meant to be connected through relationships. This set of rules is known as Normalization.

Benefits of normalizing your database include:

  • Avoiding repetitive entries
  • Reducing required storage space
  • Preventing the need to restructure existing tables to accommodate new data.
  • Increased speed and flexibility of queries, sorts, and summaries.

Following are the three normal forms:

First Normal Form

For a table to be in first normal form, data must be broken up into the smallest meaningful units possible. In addition, tables in first normal form should not contain repeating groups of fields.

Second Normal form

The second normal form states that each field in a table with a multiple-field primary key must be directly related to the entire primary key. In other words, each non-key field should be a fact about all the fields in the primary key.

Third normal form

A non-key field should not depend on another non-key field.

LIKE operator is used to match patterns. A “%” sign is used to define the pattern.

The below SQL statement will return all names starting with the letter "S":

SELECT * FROM pcdsEmployee WHERE EmpName LIKE 'S%'

The below SQL statement will return all names ending with the letter "S":

SELECT * FROM pcdsEmployee WHERE EmpName LIKE '%S'

The below SQL statement will return all names containing the letter "S" anywhere:

SELECT * FROM pcdsEmployee WHERE EmpName LIKE '%S%'

 

The "_" operator (read as the "underscore operator") matches exactly one character at that position. In the below sample we fire the query:

Select name from pcdsEmployee where name like '_s%'

So all names where the second letter is "s" are returned.

If you want to find out the number of descendants for a node, all you need are the left_val and right_val of the node for which you want the descendant count.

The formula is

No. of descendants = (right_val - left_val - 1) / 2

So, for 6-11 Amanda, (11 - 6 - 1) / 2 = 2 descendants.

For 1-12 Peter, (12 - 1 - 1) / 2 = 5 descendants.

For 3-4 Mary, (4 - 3 - 1) / 2 = 0, which means she is a leaf node and has no descendants.
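
As a quick check, the formula can be applied directly in SQL. A minimal sketch, assuming the tree is stored in an employee table with left_val and right_val columns as above:

-- One row per node, with its descendant count computed from the nested-set values
SELECT emp_name,
       (right_val - left_val - 1) / 2 AS descendants
FROM employee;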

The modified preorder traversal is a little more complicated to understand, but is very useful.

INNER JOIN

An inner join shows rows only when matches exist in both tables. In the example below there are two tables, Customers and Orders, and the inner join is made on Customers.CustomerID and Orders.CustomerID. So this SQL will only give you customers who have orders. If a customer does not have an order, that record will not be displayed.

SELECT Customers.*, Orders.* FROM Customers INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID

LEFT OUTER JOIN

A left join will display all records from the left table of the SQL statement. In the SQL below, customers with or without orders will be displayed. Order data for customers without orders appears as NULL values. For example, you want to determine the amount ordered by each customer and you also need to see who has not ordered anything. You can also see the LEFT OUTER JOIN as a mirror image of the RIGHT OUTER JOIN (covered in the next section) if you switch the side of each table.

SELECT Customers.*, Orders.* FROM Customers LEFT OUTER JOIN Orders ON Customers.CustomerID = Orders.CustomerID

RIGHT OUTER JOIN

A right join will display all records from the right table of the SQL statement. In the SQL below, all orders with or without matching customer records will be displayed. Customer data for orders without customers appears as NULL values. For example, you want to determine if there are any orders in the data with undefined CustomerID values (say, after a conversion or something like it). You can also see the RIGHT OUTER JOIN as a mirror image of the LEFT OUTER JOIN if you switch the side of each table.

SELECT Customers.*, Orders.* FROM Customers RIGHT OUTER JOIN Orders ON Customers.CustomerID = Orders.CustomerID

A star schema is good when you do not have big tables in data warehousing. But when the dimension tables start becoming really huge, it is better to normalize them. When you normalize the dimension tables of a star schema, it is nothing but a snowflake design. For instance, the CustomerAddress table has been normalized out as a child table of the Customer table. The same holds true for the Salesperson table.

UNION SQL syntax is used to select information from two tables, but it selects only distinct records from both tables, while UNION ALL selects all records from both tables.
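
A minimal sketch of the difference, reusing the pcdsEmployee and pcdsEmployeeBackup tables mentioned elsewhere in this article:

-- UNION removes duplicate rows from the combined result
SELECT name FROM pcdsEmployee
UNION
SELECT name FROM pcdsEmployeeBackup;

-- UNION ALL keeps every row, including duplicates
SELECT name FROM pcdsEmployee
UNION ALL
SELECT name FROM pcdsEmployeeBackup;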

There are basically two types of indexes:-

  • Clustered Indexes.
  • Non-Clustered Indexes.

In a clustered index the leaf level contains the actual data pages of the table. In a non-clustered index the leaf nodes contain pointers (row IDs) which then point to the actual data.
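
For reference, both kinds of index are created with very similar syntax. A minimal SQL Server-style sketch (the index and column names are illustrative):

-- Clustered index: the table's rows are physically ordered by EmpID
CREATE CLUSTERED INDEX IX_pcdsEmployee_EmpID ON pcdsEmployee (EmpID);

-- Non-clustered index: a separate structure whose leaf level points back to the rows
CREATE NONCLUSTERED INDEX IX_pcdsEmployee_Name ON pcdsEmployee (name);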

The SELECT INTO statement is mostly used to create backups. The SQL below backs up the pcdsEmployee table into the pcdsEmployeeBackup table. One point to note is that the structure of pcdsEmployeeBackup and pcdsEmployee should be the same.

SELECT * INTO pcdsEmployeeBackup FROM pcdsEmployee

SQL stands for Structured Query Language. SQL is an ANSI (American National Standards Institute) standard computer language for accessing and manipulating database systems. SQL statements are used to retrieve and update data in a database.

Denormalization is the process of putting one fact in numerous places (it is the reverse of normalization). Only one valid reason exists for denormalizing a relational design: to enhance performance. The trade-off is increased redundancy in the database.

The INSERT statement is used to insert new rows into a table, UPDATE to update existing data in the table, and DELETE to delete a record from the table. Below are code snippets for INSERT, UPDATE and DELETE (MySQL-style syntax):

  • INSERT INTO pcdsEmployee SET name='rohit', age='24';
  • UPDATE pcdsEmployee SET age='25' WHERE name='rohit';
  • DELETE FROM pcdsEmployee WHERE name='sonia';

A view is a virtual table which is created on the basis of the result set returned by a SELECT statement.

CREATE VIEW [MyView] AS SELECT * FROM pcdsEmployee WHERE LastName = 'singh'

In order to query the view:

SELECT * FROM [MyView]

A "CROSS JOIN" or "CARTESIAN PRODUCT" combines all rows from both tables. The number of rows will be the product of the number of rows in each table. In a real-life scenario it is hard to imagine where we would want a full Cartesian product, but there are scenarios where we want permutations and combinations, and a Cartesian product is probably the easiest way to achieve them.

ETL (Extraction, Transformation and Loading) are the different stages in data warehousing. Just as in software development we follow stages like requirement gathering, designing, coding and testing, in a similar fashion we have stages for data warehousing.

Extraction:-

In this process we extract data from the source. In actual scenarios the data source can be in many forms: EXCEL, ACCESS, delimited text, CSV (Comma Separated Values) files, etc. So the extraction process handles the complexity of understanding the data source and loading it into the structures of the data warehouse.

Transformation:-

This process can also be called the clean-up process. It is not guaranteed that after the extraction process the data is clean and valid. For instance, some of the financial figures may have NULL values but you want them to be ZERO for better analysis. So you can have some kind of stored procedure which runs through all extracted records and sets the value to zero.
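
For instance, a clean-up step like the one described above can be as simple as the following sketch (staging_financials and amount are hypothetical names):

-- Replace NULL financial figures with zero before loading
UPDATE staging_financials
SET    amount = 0
WHERE  amount IS NULL;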

Loading:-

After transformation you are ready to load the information in to your final data warehouse database.

SQL statements are good for set-at-a-time operations, so SQL is good at handling sets of data. But there are scenarios where we want to process rows one by one, updating each row depending on certain criteria as we loop through them. That is where cursors come into the picture.
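
A minimal cursor sketch in SQL Server (T-SQL) syntax, looping over the pcdsEmployee table used elsewhere in this article (the cap-at-60 rule is purely illustrative):

DECLARE @name VARCHAR(40), @age INT;

DECLARE emp_cursor CURSOR FOR
    SELECT name, age FROM pcdsEmployee;

OPEN emp_cursor;
FETCH NEXT FROM emp_cursor INTO @name, @age;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- row-by-row processing: apply some per-row rule
    IF @age > 60
        UPDATE pcdsEmployee SET age = 60 WHERE name = @name;

    FETCH NEXT FROM emp_cursor INTO @name, @age;
END;

CLOSE emp_cursor;
DEALLOCATE emp_cursor;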

Hierarchical data is an example of the composite design pattern. Entity relationship diagrams (aka ER diagrams) are used to represent logical and physical relationships between the database tables. The diagram below shows how the table can be designed to store tree data by maintaining the adjacency information via superior_emp_id.

 

As you can see, "superior_emp_id" is a foreign key that points to emp_id in the same table. So Peter has NULL as he has no superior. John and Amanda point to Peter, who is their manager or superior, and so on.

The above table can be created using SQL DDL (Data Definition Language) as shown below.

CREATE TABLE employee
(
  emp_id          NUMBER(4)     CONSTRAINT emp_pk PRIMARY KEY,
  emp_name        VARCHAR2(40)  NOT NULL,
  title           VARCHAR2(40),
  dept_id         NUMBER(2)     NOT NULL,
  superior_emp_id NUMBER(4)     CONSTRAINT emp_fk REFERENCES employee (emp_id)
)

This can be represented as an object model to map the relational data as shown below:

public class Employee
{
    private Long id;
    private String name;
    private String title;
    private Employee superior;
    private Set<Employee> subordinates;

    // getters and setters are omitted
}

A DBMS provides a systematic and organized way of storing, managing and retrieving a collection of logically related information. An RDBMS provides what a DBMS provides, but on top of that it provides relationship integrity. So in short we can say:

RDBMS = DBMS + REFERENTIAL INTEGRITY

These relationships are defined by using foreign keys in any RDBMS. Many DBMS vendors claimed their DBMS product was RDBMS-compliant, but according to industry convention a DBMS is truly an RDBMS only if it fulfils the twelve Codd rules. Mainstream products (SQL Server, Oracle, etc.) largely fulfil the twelve Codd rules and are considered true RDBMSs.

For a table to be in fourth normal form, it should not contain two or more independent multi-valued facts about an entity, and it should satisfy third normal form.

Fifth normal form deals with reconstructing information from smaller pieces of information. These smaller pieces of information can be maintained with less redundancy.

  • The ORDER BY clause helps to sort data in either ascending or descending order.
  • Ascending order sort query:

SELECT name,age FROM pcdsEmployee ORDER BY age ASC

  • Descending order sort query

SELECT name FROM pcdsEmployee ORDER BY age DESC

It is a form of attack on a database-driven web site in which the attacker executes unauthorized SQL commands by taking advantage of insecure code on a system connected to the Internet, bypassing the firewall. SQL injection attacks are used to steal information from a database that would normally not be available and/or to gain access to an organization's host computers through the computer that is hosting the database.

SQL injection attacks typically are easy to avoid by ensuring that a system has strong input validation.

As the name suggests, we inject SQL, which can be quite dangerous for the database. For example, here is a simple SQL statement:

SELECT email, passwd, login_id, full_name

FROM members WHERE email = 'x'

Now suppose somebody does not supply "x" as the input but instead supplies "x'; DROP TABLE members; --".

So the actual SQL which will execute is:

SELECT email, passwd, login_id, full_name FROM members WHERE email = 'x'; DROP TABLE members; --'

Think what will happen to your database.
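
Beyond input validation, the standard safeguard is to pass user input as a parameter rather than concatenating it into the SQL string. A minimal SQL Server sketch using sp_executesql (the members table is the one from the example above):

-- The injected text is now treated as a plain value, not as SQL to execute
DECLARE @input VARCHAR(100) = 'x''; DROP TABLE members; --';

EXEC sp_executesql
    N'SELECT email, passwd, login_id, full_name FROM members WHERE email = @email',
    N'@email VARCHAR(100)',
    @email = @input;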

Select top 1 * from sales.salesperson

Data mining is a concept by which we can analyze the current data from different perspectives and summarize the information in a more useful manner. It is mostly used either to derive valuable information from the existing data or to predict sales and grow the customer market.

There are two basic aims of Data mining:-

  • Prediction: –

From the given data we can focus on how the customer or market will perform. For instance, if we have sales of $40,000 per month in India, how much in sales can the company expect if the same product is sold at a discount?

  • Summarization: –

To derive important information to analyze the current business scenario. For example, a weekly sales report gives top management a picture of how we are performing on a weekly basis.

The "GROUP BY" clause groups similar data so that aggregate values can be derived.
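
A minimal sketch, again assuming the pcdsEmployee table with an age column:

-- One row per distinct age, with the count of employees of that age
SELECT age, COUNT(*) AS employees
FROM pcdsEmployee
GROUP BY age;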

You can use a self-join to find the manager of the employee whose emp_id is 3:

SELECT s.emp_id, s.emp_name, s.title
FROM employee e, employee s
WHERE e.superior_emp_id = s.emp_id
  AND e.emp_id = 3

This should return: 1, Peter, CIO

The SQL IN operator is used to see if a value exists in a group of values. For instance, the below SQL checks if the name is either 'Rohit' or 'Anuradha':

SELECT * FROM pcdsEmployee WHERE name IN ('Rohit', 'Anuradha')

You can also specify NOT with the same operator:

SELECT * FROM pcdsEmployee WHERE age NOT IN (17, 16)

In 1985 Dr. E. F. Codd laid down twelve rules which a DBMS should adhere to in order to earn the label of a true RDBMS.

Rule 1: Information Rule.

“All information in a relational data base is represented explicitly at the logical level and in exactly one way – by values in tables.”

Rule 2: Guaranteed access Rule.

“Each and every datum (atomic value) in a relational data base is guaranteed to be logically accessible by resorting to a combination of table name, primary key value and column name.”

With flat files we have to parse the file and know the exact location of field values. But if a DBMS is a true RDBMS, you can access a value by specifying the table name, primary key value and field name, for instance Customers.Fields['Customer Name'].

Rule 3: Systematic treatment of null values.

“Null values (distinct from the empty character string or a string of blank characters and distinct from zero or any other number) are supported in fully relational DBMS for representing missing information and inapplicable information in a systematic way, independent of data type.”.

Rule 4: Dynamic on-line catalog based on the relational model.

“The data base description is represented at the logical level in the same way as ordinary data, so that authorized users can apply the same relational language to its interrogation as they apply to the regular data.” The data dictionary is held within the RDBMS, so there is no need for off-line volumes to tell you the structure of the database.

Rule 5: Comprehensive data sub-language Rule.

“A relational system may support several languages and various modes of terminal use (for example, the fill-in-the-blanks mode). However, there must be at least one language whose statements are expressible, per some well-defined syntax, as character strings and that is comprehensive in supporting all the following items

  • Data Definition
  • View Definition
  • Data Manipulation (Interactive and by program).
  • Integrity Constraints
  • Authorization
  • Transaction boundaries (begin, commit and rollback)

Rule 6: View updating Rule.

“All views that are theoretically updatable are also updatable by the system.”

Rule 7: High-level insert, update and delete.

“The capability of handling a base relation or a derived relation as a single operand applies not only to the retrieval of data but also to the insertion, update and deletion of data.”

Rule 8: Physical data independence.

“Application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representations or access methods.”

Rule 9: Logical data independence.

“Application programs and terminal activities remain logically unimpaired when information-preserving changes of any kind that theoretically permit un-impairment are made to the base tables.”

Rule 10: Integrity independence.

“Integrity constraints specific to a particular relational data base must be definable in the relational data sub-language and storable in the catalog, not in the application programs.”

Rule 11: Distribution independence.

“A relational DBMS has distribution independence.”

Rule 12: Non-subversion Rule.

“If a relational system has a low-level (single-record-at-a-time) language, that low level cannot be used to subvert or bypass the integrity Rules and constraints expressed in the higher level relational language (multiple-records-at-a-time).”

If we want a relational system in conjunction with time, we use sixth normal form. At the moment SQL Server does not support it directly.

Data warehousing is a process in which data is stored in and accessed from a central location and is meant to support strategic decisions. Data warehousing is not a requirement for data mining, but it makes your data mining process more efficient.

Data warehouse is a collection of integrated, subject-oriented databases designed to support the decision-support functions (DSF), where each unit of data is relevant to some moment in time.

"Data warehousing" is a technical process in which we centralize our data, while "data mining" is more of a business activity which analyzes how well your business is doing or predicts how it will do in the future using the current data. As said before, data warehousing is not a prerequisite for data mining, but it is better to do data mining on a data warehouse rather than on an actual production database. Data warehousing is essential when we want to consolidate data from different sources; it acts as a cleaner, matured store which sits between the various data sources and brings them into one format. Data warehouses are normally physical entities which are meant to improve the accuracy of the data mining process. For example, if you have 10 companies sending data in different formats, you create one physical database to consolidate all the data from the different company sources, while data mining can be a physical model or a logical model. You can produce a report in data mining which gives you the net sales for this year for all companies; this need not be a physical database as such, but a simple query.

The “HAVING” clause is used to specify filtering criteria for the groups produced by “GROUP BY”, while the “WHERE” clause filters individual rows.
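
A minimal sketch showing both clauses together (pcdsEmployee and its columns are assumed as in earlier examples):

SELECT age, COUNT(*) AS employees
FROM pcdsEmployee
WHERE age >= 18          -- WHERE filters individual rows before grouping
GROUP BY age
HAVING COUNT(*) > 5;     -- HAVING filters the groups produced by GROUP BY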

Yes, it can be done using the “modified preorder tree traversal” described earlier. Each node is marked with left and right numbers using a modified preorder traversal, and these values can be stored as two extra columns (left_val and right_val) in the table.

The below SQL selects employees born between '1975-01-01' and '1978-01-01' (MySQL date format):

SELECT * FROM pcdsEmployee WHERE DOB BETWEEN '1975-01-01' AND '1978-01-01'

An E-R diagram, also termed an Entity-Relationship diagram, shows the relationships between the various tables in the database.

DML stands for Data Manipulation Language. DML statements update data values in tables. Below are the most important DML statements:

  • SELECT – gets data from a database table
  • UPDATE – updates data in a table
  • DELETE – deletes data from a database table
  • INSERT INTO – inserts new data into a database table

DDL stands for Data Definition Language. DDL statements change the structure of database objects such as tables and indexes. The most important DDL statements are shown below:

  • CREATE TABLE – creates a new table in the database.
  • ALTER TABLE – changes table structure in database.
  • DROP TABLE – deletes a table from database
  • CREATE INDEX – creates an index
  • DROP INDEX – deletes an index

Data marts are smaller sections of data warehouses; they help data warehouses collect data. For example, suppose your company has many branches spread across the globe and the head office decides to collect data from all these branches to anticipate the market. To achieve this, the IT department can set up a data mart in each branch office and a central data warehouse where all the data will finally reside.

If we want to join two instances of the same table, we can use a self-join.

An index makes your searches faster, so defining indexes on your database will speed up searching. Most indexing implementations use the "B-tree" or "balanced tree" principle. It is not something invented by SQL Server or Oracle; it is a mathematically derived fundamental. For a B-tree to work properly, both sides of the tree should remain balanced.

A query nested inside a SELECT statement is known as a subquery and is an alternative to complex join statements. A subquery combines data from multiple tables and returns a result that is used in the WHERE condition of the main query. A subquery is always enclosed within parentheses and returns a single column. A subquery can also be referred to as an inner query, and the main query as an outer query. A JOIN gives better performance than a subquery when you have to check for the existence of records.

For example, to retrieve all EmployeeID and CustomerID records from the ORDERS table that have the EmployeeID greater than the average of the EmployeeID field, you can create a nested query, as shown:

SELECT DISTINCT EmployeeID, CustomerID FROM ORDERS WHERE EmployeeID > (SELECT AVG(EmployeeID) FROM ORDERS)

The DISTINCT keyword is used to return only distinct values.

Below is the syntax, using column age and table pcdsEmp:

SELECT DISTINCT age FROM pcdsEmp

The below SQL returns the ancestors of the node whose left_val is 7 and right_val is 8, ordered from the root downwards:

SELECT * FROM employee WHERE left_val < 7 AND right_val > 8 ORDER BY left_val ASC;

The below SQL query finds the second highest salary:

SELECT * FROM pcdsEmployeeSalary a WHERE (2=(SELECT COUNT(DISTINCT(b.salary)) FROM pcdsEmployeeSalary b WHERE b.salary>=a.salary))

When we design a transactional database we always think in terms of normalizing the design to its least form. But when it comes to designing for a data warehouse, we think more in terms of denormalizing the database. Data warehousing databases are designed using dimensional modeling. Dimensional modeling uses the existing relational database structure and builds on that.

There are two basic tables in dimensional modeling:-

  • Fact Tables.
  • Dimension Tables.

Fact tables are the central tables in data warehousing. Fact tables hold the actual aggregate values which will be needed in a business process, while dimension tables revolve around fact tables and describe the attributes of the facts.

Following are the differences between them:

  • The DELETE statement logs each deleted row, which makes delete operations slow. TRUNCATE TABLE does not log individual rows; it only logs the deallocation of the table's data pages. So TRUNCATE TABLE is faster than DELETE (see the sketch after this list).
  • DELETE can have criteria (a WHERE clause) while TRUNCATE cannot.
  • TRUNCATE does not fire DELETE triggers.
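
A minimal sketch of the two statements (table and criteria are illustrative):

-- Row-by-row, fully logged, and can be restricted with a WHERE clause
DELETE FROM pcdsEmployee WHERE age > 60;

-- Removes every row, only page deallocations are logged, no WHERE clause allowed
TRUNCATE TABLE pcdsEmployee;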

Inserts are slower on tables which have indexes. Justify this, or: why does page splitting happen?

All indexing in databases is built on the "B-tree" fundamental. Whenever new data is inserted or deleted, the tree tends to become unbalanced.

To rebalance it, the database creates a new page and shuffles and moves data between pages.

So if your table has heavy inserts, meaning it is transactional, you can visualize the amount of page splitting it will be doing. This will not only increase insert time but will also upset the end user sitting at the screen. So when you forecast that a table will have a lot of inserts, it is not a good idea to create many indexes.

Aggregate and scalar functions are built-in functions for counting and calculations.

Aggregate functions operate against a group of values but return only one value.

  • AVG(column) :- Returns the average value of a column
  • COUNT(column) :- Returns the number of rows (without a NULL value) of a column
  • COUNT(*) :- Returns the number of selected rows
  • MAX(column) :- Returns the highest value of a column
  • MIN(column) :- Returns the lowest value of a column

Scalar functions operate against a single value and return a value based on that single value. A few examples follow the list below.

  • UCASE(c) :- Converts a field to upper case
  • LCASE(c) :- Converts a field to lower case
  • MID(c,start[,end]) :- Extract characters from a text field
  • LEN(c) :- Returns the length of a text
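
A minimal sketch contrasting the two kinds of function, using the function names listed above (Access/MySQL-style names; other products use UPPER, LENGTH, etc., and the pcdsEmployee columns are assumed):

-- Aggregate functions: one result for the whole set of rows
SELECT COUNT(*) AS employees,
       AVG(age) AS average_age,
       MAX(age) AS oldest
FROM pcdsEmployee;

-- Scalar functions: one result per row
SELECT UCASE(name) AS name_upper,
       LEN(name)   AS name_length
FROM pcdsEmployee;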

The collection of the database and the DBMS software together is known as a database system. Through the database system we can perform many activities, such as the following.

The data can be stored in the database with ease, and there are no issues of data redundancy and data inconsistency.

The data can be extracted from the database using the DBMS software whenever required. So, the combination of the database and the DBMS software enables one to store, retrieve and access data with considerable accuracy and security.

Denormalization is the process of adding redundant data to boost database performance and avoid complex, costly joins; it is part of database optimization. Denormalization does not mean "do not normalize"; rather, it takes place after normalization. In this process, the redundancy of the data is first removed using normalization, and then, through denormalization, we add redundant data as per the requirement so that we can easily avoid costly joins.

An extension of an entity type is specified as a collection of entities of a particular entity type that are grouped into an entity set.

BCNF stands for Boyce-Codd Normal Form. It is an advanced version of 3NF, so it is also referred to as 3.5NF. BCNF is stricter than 3NF.

A table complies with BCNF if it satisfies the following conditions:

  • It is in 3NF.
  • For every functional dependency X->Y, X should be the super key of the table. It merely means that X cannot be a non-prime attribute if Y is a prime attribute.

A relationship is defined as an association among two or more entities. There are three types of relationships in a DBMS:

One-To-One: Here one record of an object can be related to one record of another object.

One-To-Many (Many-To-One): Here one record of an object can be related to many records of another object, and vice versa.

Many-To-Many: Here more than one record of an object can be related to any number of records of another object.

You have to use Structured Query Language (SQL) to communicate with an RDBMS. Using SQL queries, we give input to the database, and after processing the queries the database provides us the required output.

The DML compiler translates DML statements into a query language that the query evaluation engine can understand. A DML compiler is required because DML is a family of syntax elements very similar to other programming languages, which require compilation. So, it is essential to compile the code into a language which the query evaluation engine can understand and can then execute with the proper output.

Data independence specifies that "the application is independent of the storage structure and access strategy of data." It makes you able to modify the schema definition at one level without altering the schema definition at the next higher level.

There are two types of Data Independence:

Physical Data Independence: Physical data is the data actually stored in the database, in bit format. Modification at the physical level should not affect the logical level.

For example: changing how the data inside a table is stored should not require changing the table's logical definition.

Logical Data Independence: Logical data is the data about the database; it basically defines the structure, such as the tables stored in the database. Modification at the logical level should not affect the view level.

For example: if we need to modify the structure of a table, that modification should not affect the views built on top of it.

PROJECTION and SELECTION are unary operations in relational algebra. Unary operations are operations which use a single operand; the unary operations are SELECTION, PROJECTION, and RENAME.

SELECTION uses comparison operators, for example =, <, >, <=, >= and <>.

Following are the advantages of a DBMS:

  • Redundancy control
  • Restriction for unauthorized access
  • Provides multiple user interfaces
  • Provides backup and recovery
  • Enforces integrity constraints
  • Ensure data consistency
  • Easy accessibility
  • Easy data extraction and data processing due to the use of queries

Functional dependency is the starting point of normalization. It exists when a relationship between two attributes allows you to determine the corresponding attribute's value uniquely. Functional dependency is also known as database dependency and is defined as the relationship which occurs when one attribute in a relation uniquely determines another attribute. It is written as A->B, which means B is functionally dependent on A.

ACID properties are basic rules which have to be satisfied by every transaction to preserve integrity. These properties are:

ATOMICITY: Atomicity is more generally known as the 'all or nothing' rule, which implies that all the operations of a transaction are treated as one unit and either run to completion or are not executed at all.

CONSISTENCY: This property refers to the uniformity of the data. Consistency implies that the database is consistent before and after the transaction.

ISOLATION: This property states that multiple transactions can be executed concurrently without leading to inconsistency of the database state.

DURABILITY: This property ensures that once the transaction is committed it is stored in non-volatile memory, and a system crash can no longer affect it.
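
A minimal sketch of a transaction that relies on these properties, in SQL Server style (the accounts table and the transfer amount are hypothetical):

BEGIN TRANSACTION;

UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;

-- Atomicity: either both updates take effect or neither does.
-- Durability: once committed, the change survives a crash.
COMMIT;
-- (ROLLBACK; would instead undo both updates and leave the database consistent.)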

Following are the disadvantages of the traditional file-based system:

  • Inconsistent
  • Not secure
  • Data redundancy
  • Difficult in accessing data
  • Data isolation
  • Data integrity
  • Concurrent access is not possible
  • Limited data sharing
  • Atomicity problem

Shared lock: A shared lock is required for reading a data item. With a shared lock, many transactions may hold a lock on the same data item; when more than one transaction is allowed to read a data item, that is a shared lock.

Exclusive lock: When a transaction is about to perform a write operation, the lock taken on the data item is an exclusive lock, because allowing more than one writing transaction would lead to inconsistency in the database.

Relational algebra is a procedural query language which contains a set of operations that take one or two relations as input and produce a new relation. Relational algebra is the basic set of operations for the relational model. The decisive point of relational algebra is that it is similar to the algebra which operates on numbers.

There are a few fundamental operations of relational algebra:

  • select
  • project
  • set difference
  • union
  • rename, etc.

An entity set that doesn't have sufficient attributes to form a primary key is referred to as a weak entity set. A member of a weak entity set is known as a subordinate entity. A weak entity set does not have a primary key, but we need a means to differentiate among all those entries in the entity set that depend on one particular strong entity set.

Following are the three levels of data abstraction:

Physical level: It is the lowest level of abstraction. It describes how the data is stored.

Logical level: It is the next higher level of abstraction. It describes what data is stored in the database and what the relationships among those data are.

View level: It is the highest level of data abstraction. It describes only part of the entire database.

For example, users interact with the system using the GUI and fill in the required details, but they have no idea how the data is being used. So the abstraction at the view level is the highest.

Then, the next level is for programmers, as at this level the fields and records are visible and the programmer has knowledge of this layer. So the level of abstraction here, the logical level, is a little lower.

And lastly comes the physical level, in which the storage blocks are described.

RDBMS stands for Relational Database Management System. It is used to maintain data records and indices in tables. An RDBMS is the form of DBMS which uses the structure to identify and access data in relation to other pieces of data in the database. An RDBMS is a system that enables you to perform different operations such as update, insert, delete, manipulate and administer a relational database with minimal difficulty. Most of the time an RDBMS uses the SQL language because it is easily understandable and widely used.

The Checkpoint is a type of mechanism where all the previous logs are removed from the system and permanently stored in the storage disk.

There are two ways which can help the DBMS in recovering and maintaining the ACID properties: maintaining a log of each transaction, and maintaining shadow pages. When it comes to a log-based recovery system, checkpoints come into the picture. Checkpoints are those points to which the database engine can recover after a crash; they act as a specified minimal point from which the transaction log records can be used to recover all the committed data up to the point of the crash.

There are four types of database languages:

Data Definition Language (DDL), e.g., CREATE, ALTER, DROP, TRUNCATE, RENAME, etc. These commands are used for defining and changing the structure of database objects, which is why they are known as Data Definition Language.
Data Manipulation Language (DML), e.g., SELECT, UPDATE, INSERT, DELETE, etc. These commands are used for manipulating the data stored in the database, which is why they are part of Data Manipulation Language.
Data Control Language (DCL), e.g., GRANT and REVOKE. These commands are used for granting and removing user access to the database, so they are part of Data Control Language.
Transaction Control Language (TCL), e.g., COMMIT, ROLLBACK, and SAVEPOINT. These commands are used for managing transactions in the database; TCL manages the changes made by DML.
A database language comprises the commands that are used to define, update and manipulate the data.

Data abstraction in DBMS is the process of hiding irrelevant details from users. Because database systems are made of complex data structures, data abstraction makes user interaction with the database easier.

For example: we know that most users prefer systems with a simple GUI, that is, with no complex processing. So, to keep users comfortable and to make access to the data easy, it is necessary to do data abstraction. In addition, data abstraction divides the system into different layers so that the work is specified and well defined.

The E-R model is a short name for the Entity-Relationship model. This model is based on the real world. It contains the necessary objects (known as entities) and the relationships among these objects. Here the primary objects are the entity, the attributes of that entity, the relationship set and the attributes of that relationship set, which can be mapped in the form of an E-R diagram.

In an E-R diagram, entities are represented by rectangles, relationships by diamonds, and attributes (the characteristics of entities) by ellipses; straight lines connect the entities to their relationships.

An attribute refers to a database component. It is used to describe the property of an entity. An attribute can be defined as the characteristics of the entity. Entities can be uniquely identified using the attributes. Attributes represent the instances in the row of the database.

For example: If a student is an entity in the table then age will be the attribute of that student.

There are following types of keys:

Primary key: The Primary key is an attribute in a table that can uniquely identify each record in a table. It is compulsory for every table.

Candidate key: The Candidate key is an attribute or set of an attribute which can uniquely identify a tuple. The Primary key can be selected from these attributes.

Super key: The Super key is a set of attributes which can uniquely identify a tuple. Super key is a superset of the candidate key.

Foreign key: The Foreign key is a primary key from one table, which has a relationship with another table. It acts as a cross-reference between tables.
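
A minimal sketch showing a primary key and a foreign key working together (the department and staff tables are hypothetical):

CREATE TABLE department (
    dept_id   INT PRIMARY KEY,        -- primary key: uniquely identifies each department
    dept_name VARCHAR(40) NOT NULL
);

CREATE TABLE staff (
    staff_id   INT PRIMARY KEY,       -- primary key of this table
    staff_name VARCHAR(40) NOT NULL,
    dept_id    INT,
    FOREIGN KEY (dept_id) REFERENCES department (dept_id)  -- cross-reference to department
);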

A stored procedure is a group of SQL statements that has been created and stored in the database. A stored procedure increases reusability, as the code is stored in the system and used again and again, which makes the work easier, takes less processing time and decreases the complexity of the system. So, if you have code which you need to use again and again, save it as a procedure and call it whenever it is required.
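
A minimal sketch in SQL Server (T-SQL) syntax; the procedure name and the pcdsEmployee columns are assumed for the example:

CREATE PROCEDURE GetEmployeesByAge
    @minAge INT
AS
BEGIN
    SELECT name, age
    FROM pcdsEmployee
    WHERE age >= @minAge;
END;
GO

-- Reuse the stored code whenever it is needed:
EXEC GetEmployeesByAge @minAge = 25;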

Relational calculus is a non-procedural query language which uses mathematical predicate calculus instead of algebra. Relational calculus doesn't work on mathematical fundamentals such as algebra, differentials, integration, etc.; that's why it is also known as predicate calculus.

There are two types of relational calculus:

  • Tuple relational calculus
  • Domain relational calculus

The join operation is one of the most useful activities in relational algebra. It is the most commonly used way to combine information from two or more relations. A join is always performed on the basis of the same or related columns. Most complex SQL queries involve the JOIN command.

There are following types of join:

Inner joins: Inner joins fall into 3 categories. They are:

  • Theta join
  • Natural join
  • Equi join

Outer joins: Outer joins have three types. They are:

  • Left outer join
  • Right outer join
  • Full outer join

An entity is a set of attributes in a database. An entity can be a real-world object which physically exists in the world. All entities have attributes, which in the real world are considered the characteristics of the object.

For example: In the employee database of a company, the employee, department, and the designation can be considered as the entities. These entities have some characteristics which will be the attributes of the corresponding entity.

The data model is specified as a collection of conceptual tools for describing data, data relationships, data semantics and constraints. These models are used to describe the relationships between the entities and their attributes.

There are a number of data models:

  • Hierarchical data model
  • Network model
  • Relational model
  • Entity-Relationship model, and so on.

DELETE command: The DELETE command is used to delete rows from a table based on the condition provided in a WHERE clause.

  • DELETE deletes only those rows which are specified by the WHERE clause.
  • DELETE can be rolled back.
  • DELETE maintains a log, which is why it is slow.
  • DELETE uses row locks while performing the delete.

TRUNCATE command: The TRUNCATE command is used to remove all rows (the complete data) from a table. It is similar to a DELETE command with no WHERE clause.

  • The TRUNCATE command removes all the rows from the table.
  • The TRUNCATE command cannot be rolled back.
  • The TRUNCATE command doesn't maintain a per-row log, which is why it is fast.
  • TRUNCATE uses a table lock while performing the truncate.


Data integrity is one significant aspect of maintaining a database, so data integrity is enforced in the database system by imposing a series of rules. That set of rules is known as the integrity rules.

There are two integrity rules in DBMS:

Entity Integrity : It specifies that “Primary key cannot have a NULL value.”

Referential Integrity: It specifies that "a foreign key can be either a NULL value or should be the primary key value of another relation."


The term query optimization refers to finding an efficient execution plan for evaluating a query, one that has the least estimated cost. The concept of query optimization came into the frame because a number of methods and algorithms exist for the same task, so the question arises which one is most efficient; the process of determining the most efficient way is known as query optimization.

There are many benefits of query optimization:

  • It reduces time and space complexity.
  • More queries can be performed, because with optimization every query takes comparatively less time.
  • User satisfaction, as it provides output faster.

1NF is the First Normal Form. It is the simplest type of normalization that you can implement in a database. The primary objectives of 1NF are to:

  • Ensure every column holds atomic (single) values
  • Remove duplicate columns from the same table
  • Create separate tables for each group of related data and identify each row with a unique column

A DBMS is a collection of programs that facilitates users in creating and maintaining a database. In other words, a DBMS provides an interface or tool for performing different operations such as creating a database, inserting data into it, deleting data from it, updating the data, etc. A DBMS is software in which data is stored in a more secure way than in a file-based system. Using a DBMS, we can overcome many problems such as data redundancy, data inconsistency, difficult access and poor organization. Some popular database management systems are MySQL, Oracle, SQL Server, Amazon SimpleDB (cloud-based), etc.

2NF is the Second Normal Form. A table is said to be 2NF if it follows the following conditions:

  • The table is in 1NF, i.e., it is first necessary that the table follows the rules of 1NF.
  • Every non-prime attribute is fully functionally dependent on the primary key, i.e., every non-key attribute should depend on the whole primary key rather than on just a part of it.

A relation schema is specified as a set of attributes. It is also known as a table schema. It defines what the name of the table is. A relation schema is known as the blueprint with the help of which we can explain how the data is organized into tables. This blueprint contains no data.

A relation is specified as a set of tuples. A relation is a set of related attributes with identifying key attributes.

See this example:

Let r be a relation which contains a set of tuples (t1, t2, t3, ..., tn). Each tuple is an ordered list of n values t = (v1, v2, ..., vn).

An entity type is specified as a collection of entities having the same attributes. An entity type typically corresponds to one or several related tables in the database. In other words, the set of characteristics or traits shared by its entities is what defines an entity type.

For example, a student has student_id, department, and course as its characteristics.

Data Definition Language (DDL) is a standard for commands which define the different structures in a database. The most common DDL statements are CREATE, ALTER, and DROP. These commands are used for creating and modifying the structure of database objects rather than the data itself.

Extension: The extension is the set of tuples present in a table at any instant. It changes as tuples are created, updated and destroyed. The actual data in the database changes quite frequently, so the data in the database at a particular moment in time is known as the extension, the database state or a snapshot. It is time-dependent.

Intension: Intension is also known as the data schema and is defined as the description of the database, which is specified during database design and is expected to remain unchanged. The intension is a constant value that gives the name and structure of the tables and the constraints laid on them.

A checkpoint is like a snapshot of the DBMS state. Using checkpoints, the DBMS can reduce the amount of work to be done during a restart in the event of subsequent crashes. Checkpoints are used for recovery of the database after a system crash and are part of the log-based recovery system. When we need to restart the system after a crash, we restart from a checkpoint so that we don't have to redo the transactions from the very beginning.

The 2-tier architecture is the same as a basic client-server setup. In the two-tier architecture, applications on the client end can directly communicate with the database on the server side.

Once the DBMS informs the user that a transaction has completed successfully, its effect should persist even if the system crashes before all its changes are reflected on disk. This property is called durability. Durability ensures that once the transaction is committed into the database, it will be stored in the non-volatile memory and after that system failure cannot affect that data anymore.

A database is a logical, consistent and organized collection of data that can easily be accessed, managed and updated. Databases, also known as electronic databases, are structured to provide facilities for creating, inserting and updating data efficiently, and are stored in the form of a file or set of files on magnetic disks, tapes and other sorts of secondary storage devices. A database mostly consists of objects (tables), and tables consist of records and fields. Fields are the basic units of data storage, which contain information about a particular aspect or attribute of the entity described by the database. A DBMS is used for extracting data from the database in the form of queries.

3NF stands for Third Normal Form. A database is said to be in 3NF if it satisfies the following conditions:

  • It is in second normal form.
  • There is no transitive functional dependency, i.e., no chain X->Y and Y->Z (where Y does not determine X) through which X->Z holds only indirectly. A small decomposition example is sketched below.
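
A minimal decomposition sketch (student and course are hypothetical tables; before the split, course_fee depended on course_id, which in turn depended on student_id, a transitive dependency):

-- Before: student (student_id, student_name, course_id, course_fee)
-- After decomposing into 3NF, course_fee is stored only once per course:

CREATE TABLE course (
    course_id  INT PRIMARY KEY,
    course_fee DECIMAL(8,2)
);

CREATE TABLE student (
    student_id   INT PRIMARY KEY,
    student_name VARCHAR(40),
    course_id    INT REFERENCES course (course_id)
);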

The degree of a relation is the number of attributes in its relation schema. The degree (or cardinality) of a relationship is defined as the number of occurrences of one entity that can be connected to occurrences of another entity. There are three such relationship degrees: one-to-one (1:1), one-to-many (1:M) and many-to-many (M:N).

The 3-tier architecture contains another layer between the client and the server. The 3-tier architecture was introduced for the ease of users, as it provides a GUI which makes the system more secure and much more accessible. In this architecture, the application on the client end interacts with an application on the server, which further communicates with the database system.

The entity set specifies the collection of all entities of a particular entity type in the database. An entity set is known as the set of all the entities which share the same properties.

For example, a set of people, a set of students, a set of companies, etc.

Data Manipulation Language (DML) is a language that enables the user to access or manipulate data as organized by the appropriate data model, for example SELECT, UPDATE, INSERT and DELETE.

There are two types of DML:

Procedural DML or low-level DML: It requires the user to specify what data are needed and how to get those data.

Non-procedural DML or high-level DML: It requires the user to specify what data are needed without specifying how to get those data.

System R was designed and developed from 1974 to 1979 at the IBM San Jose Research Center. System R is the first implementation of SQL, which is the standard relational data query language, and it was also the first system to demonstrate that an RDBMS could provide good transaction processing performance. It was a prototype built to show that it is possible to build a relational system that can be used in a real-life environment to solve real-life problems.

Following are the two major subsystems of System R:

  • Research Storage System (RSS)
  • Relational Data System (RDS)

A transparent DBMS is a type of DBMS which keeps its physical structure hidden from users. The physical structure, or physical storage structure, refers to the memory manager of the DBMS and describes how the data is stored on disk.

Normalization is the process of analysing the given relation schemas according to their functional dependencies. It is used to minimize redundancy and also to minimize insertion, deletion and update anomalies. Normalization is considered an essential process as it is used to avoid data redundancy, insertion anomalies, update anomalies and deletion anomalies.

The most commonly used normal forms are:

  • First Normal Form(1NF)
  • Second Normal Form(2NF)
  • Third Normal Form(3NF)
  • Boyce & Codd Normal Form(BCNF)