Friday, 14 December 2012


B.E/B.Tech DEGREE EXAMINATION Nov/Dec 2006
Fifth semester

                                   
Computer Science and Engineering

CS1301 -DATABASE MANAGEMENT SYSTEMS
(Regulation 2004)
Time: 3 hours                                                                                              Maximum marks: 100

Answer ALL questions

PART A (10 x 2 =20 marks)
1. Compare database systems with file systems.
2. Give the distinction between primary key, candidate key and super key.
3. Write a SQL statement to find the names and loan numbers of all customers who have a loan at Chennai branch.
4. What is multivalued dependency?
5. Give the measures of the quality of the disk?
6. What are the two types of ordered indices?
7. List out the ACID properties.
8. What is shadow paging?
9. Compare DBMS versus object oriented DBMS.
10. What is Data warehousing?
PART B (5 x 16 = 80)
11. (a) (i) Describe the system structure of database system. (12)
 
(ii) List out the functions of DBA (4)
(or)
(b) (i) Illustrate the issues to be considered while developing an ER-diagram. (8)

(ii) Consider the relational database

employee (empname, street, city)

works (empname, companyname, salary)

company (companyname, city)

manages (empname, managername).
               
 Give an expression in the relational algebra for each request.

(1) Find the names of all employees who work for First Bank Corporation.
(2) Find the names, street addresses and cities of residence of all employees who work for First Bank Corporation and earn more than 200000 per annum.
(3) Find the names of all employees in this database who live in the same city as the company for which they work.
(4) Find the names of all employees who earn more than every employee of Small Bank Corporation. (4 * 2 = 8)

12. (a) (i) Discuss about triggers. How do triggers offer a powerful mechanism for dealing with the changes to database with suitable example. (10)
(ii) What are nested queries? Explain with example. (6)
(or)
          
(b) (i) What is normalization? Give the various normal forms of relational schema and define a relation which is in BCNF and explain with suitable example. (12)
(ii)Compare BCNF versus 3NF. (4)
13. (a) (i) Describe about RAID levels. (10)
(ii) Explain why the allocation of records to blocks affects database system performance significantly. (6)
(or)
(b) (i) Describe the structure of B+ tree and give the algorithm for search in the B+ tree with example. (12)
(ii) Give the comparison between ordered indexing and hashing. (4)
14. (a) (i) Explain the different forms of Serializability. (10)
(ii) What are the different types of schedules that are acceptable for recoverability? (6)
(or)
(b) (i) Discuss on two-phase locking protocol and timestamp-based protocol. (12)
(ii) Write short notes on log-based recovery. (4)

15. (a) (i) Discuss in detail about the object relational database and its advantages. (8)
(ii) Illustrate the issues to implement distributed database. (8)
(or)
(b) (i) Give the basic structure of XML and its document schema. (8)
(ii) What are the two important classes of data mining problems?

Explain about rule discovery using those classes. (8)

1. Compare database systems with file systems.
Database Systems                              File Systems
(i) Data independence is provided             Data independence is not provided
(ii) Data integrity and security are high     Data integrity and security are low
(iii) It is easy to access the data           It is difficult to access the data
(iv) Data can be accessed concurrently        Concurrent access produces anomalies
2. Give the distinction between primary key, candidate key and super key.
Super key:
              A super key is a set of one or more attributes that, taken collectively, allow us to identify an entity in the entity set uniquely.
E.g.: Emp id is a super key of employee relation.

Candidate Key:
            
           A super key for which no proper subset is also a super key is called a candidate key (a minimal super key).
E.g.: {Empname, Empstreet} is a candidate key of the employee relation.

Primary Key:
          It is the candidate key chosen by the database designer as the principal means of identifying entities within an entity set.
3. Write a SQL statement to find the names and loan numbers of all customers who have a loan at Chennai branch.
Using the bank schema relations borrower (customername, loannumber) and loan (loannumber, branchname, amount):

SELECT customername, loannumber
FROM borrower NATURAL JOIN loan
WHERE branchname = 'Chennai';

4. What is multivalued dependency?

A multivalued dependency (MVD) x ->-> y specified on relation schema R, where x and y are both subsets of R, specifies the following constraint on any relation state r of R:
         If two tuples t1 and t2 exist in r such that t1[x] = t2[x], then two tuples t3 and t4 should also exist in r with the following properties, where we use z to denote (R - (x U y)):
          t3[x] = t4[x] = t1[x] = t2[x]
          t3[y] = t1[y] and t4[y] = t2[y]
          t3[z] = t2[z] and t4[z] = t1[z]
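This tuple-generating constraint can be checked mechanically on a small relation instance. A sketch (the course/teacher/book relation and the helper function are illustrative, not part of the exam answer) that tests x ->-> y by searching for the required tuple t3 for every pair t1, t2; because the loop visits each ordered pair in both orders, the t4 requirement is covered as well:

```python
from itertools import product

def satisfies_mvd(rows, x, y, z):
    """Check the MVD x ->-> y on a relation instance given as a list of dicts.
    z must be the remaining attributes R - (x U y)."""
    proj = lambda t, attrs: tuple(t[a] for a in attrs)
    rel = {tuple(sorted(t.items())) for t in rows}
    for t1, t2 in product(rows, repeat=2):
        if proj(t1, x) == proj(t2, x):
            # t3 takes its x and y values from t1 and its z values from t2
            t3 = {**{a: t1[a] for a in x + y}, **{a: t2[a] for a in z}}
            if tuple(sorted(t3.items())) not in rel:
                return False
    return True

# course ->-> teacher holds when teachers and books vary independently
rows = [{"course": c, "teacher": t, "book": b}
        for c, t, b in [("DB", "Smith", "Ullman"), ("DB", "Smith", "Date"),
                        ("DB", "Jones", "Ullman"), ("DB", "Jones", "Date")]]
print(satisfies_mvd(rows, ["course"], ["teacher"], ["book"]))        # True
print(satisfies_mvd(rows[:-1], ["course"], ["teacher"], ["book"]))   # False
```

Dropping any one of the four tuples breaks the independence, which is exactly what the second call shows.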
5. Give the measures of the quality of the disk.
         Access time
              Seek time
              Rotational latency
         Data-transfer rate
         Mean time to failure (MTTF)
6. What are the two types of ordered indices?
   (i) Primary index
   (ii) Secondary index
7. List out the ACID properties.
   1. Atomicity
   2. Consistency preservation
   3. Isolation
   4. Durability
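Atomicity can be observed directly in any transactional DBMS. A minimal sketch using Python's built-in sqlite3 module (the table and amounts are invented for the demo): an exception raised inside the transaction rolls back every statement in it, so a half-finished transfer never becomes visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE acct (name TEXT PRIMARY KEY, bal INTEGER)")
conn.execute("INSERT INTO acct VALUES ('A', 100), ('B', 0)")
conn.commit()

try:
    with conn:  # one transaction: both updates commit, or neither does
        conn.execute("UPDATE acct SET bal = bal - 50 WHERE name = 'A'")
        raise RuntimeError("crash before crediting B")  # simulate a failure
except RuntimeError:
    pass  # the context manager has already rolled the transaction back

balance_a = conn.execute("SELECT bal FROM acct WHERE name = 'A'").fetchone()[0]
print(balance_a)  # 100 -- the debit was undone
```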
8. What is shadow paging?
Shadow paging considers the database to be made up of a number of fixed-size disk pages (or disk blocks), say n, which are tracked for recovery purposes through a current page table and a shadow page table.

9. Compare DBMS versus object oriented DBMS.

DBMS                                            Object Oriented DBMS
(i) Uses simple data types                      (i) Deals with complex data types
(ii) Structured data types are not allowed      (ii) Structured data types are allowed
(iii) Multivalued attributes are specified      (iii) Multivalued attributes are
      by creating a new relation                      represented as is in the relation

10. What is Data warehousing?

A data warehouse is a repository of information gathered from multiple sources, stored under a unified schema at a single site.






11. (a) (i) Describe the system structure of database system. (12)

Database system structure
A database system is partitioned into modules that deal with each of the responsibilities of the overall system. The functional components of a database system can be broadly divided into the storage manager and the query processor components.
     Storage Manager:
A storage manager is a program module that provides the interface between the low level data stored in the database and the application programs and queries submitted to the system. The storage manager is responsible for the interaction with the file manager. The raw data are stored on the disk using the file system, which is usually provided by a conventional operating system. The storage manager translates the various DML statements into low-level file system commands. Thus the storage manager is responsible for storing, retrieving and updating data in the database.
The storage manager components include:
*  Authorization and integrity manager, which tests for the satisfaction of integrity constraints and checks the authority of users to access data.

*  Transaction manager, which ensures that the database remains in a consistent (correct) state despite system failures, and that concurrent transaction executions proceed without conflicting.
*  File manager, which manages the allocation of space on disk storage and the data structures used to represent information stored on disk.
*  Buffer manager, which is responsible for fetching data from disk storage into main memory, and deciding what data to cache in main memory. The buffer manager is a critical part of the database system, since it enables the database to handle data sizes that are much larger than the size of main memory.

The storage manager implements several data structures as part of the physical system implementation:
*  Data files, which store the database itself.
*  Data Dictionary, which stores metadata about the structure of the database, in particular the schema of the database.

*  Indices, which provide fast access to data items that hold particular values.
Query processor:
The query processor components include
*  DDL interpreter, which interprets DDL statements and records the definitions in the data dictionary.
*  DML Compiler, which translates DML statements in a query language into an evaluation plan consisting of low-level instructions that the query evaluation engine understands. A query can usually be translated into any of a number of alternative evaluation plans that all give the same result. The DML compiler also performs query optimization, that is, it picks the lowest cost evaluation plan from among the alternatives.
*  Query evaluation engine, which executes low-level instructions generated by the DML compiler.


              
(ii) List out the functions of DBA (4)
Functions of Database Administrator:
The DBA is responsible for authorizing access to the database, for   coordinating and monitoring its use, and for acquiring software and hardware resources as needed.
The DBA is accountable for problems such as breach of security or poor system response time.


(b) (i) Illustrate the issues to be considered while developing an ER-diagram. (8)

Design Issues of ER-Diagram:
Use of Entity Sets versus Attributes
Consider the entity set employee with attributes employee-name and telephone-number. It can easily be argued that a telephone is an entity in its own right with attributes telephone-number and location (the office where the telephone is located). If we take this point of view, we must redefine the employee entity set as:
*  The employee entity set with attribute employee-name
*  The telephone entity set with attributes telephone-number and location
*  The relationship set emp-telephone, which denotes the association between employees and the telephones that they have
What is the main difference between these two definitions of an
employee? Treating a telephone as an attribute telephone-number implies that employees have precisely one telephone number each. Treating a telephone as an entity telephone permits employees to have several telephone numbers (including zero) associated with them. However, we could instead easily define telephone-number as a multivalued attribute to allow multiple telephones per employee.
     The main difference then is that treating a telephone as an entity better models a situation where one may want to keep extra information about a telephone, such as its location, or its type (mobile, video phone or plain old telephone), or who all share the telephone. Thus, treating a telephone as an entity is appropriate when this generality may be useful.

Use of Entity Sets versus Relationship Sets
     It is not always clear whether an object is best expressed by an entity set or a relationship set. We assumed that a bank loan is modeled as an entity. An alternative is to model a loan not as an entity, but rather as a relationship between customers and branches, with loan-number and amount as descriptive attributes. Each loan is represented by a relationship between a customer and a branch.
    
     If every loan is held by exactly one customer and is associated with exactly one branch, we may find satisfactory the design where a loan is represented as a relationship. If, however, a loan can be held jointly by several customers, we must define a separate relationship for each holder of the joint loan. Then we must replicate the values of the descriptive attributes loan-number and amount in each such relationship. Each such relationship must, of course, have the same value for the descriptive attributes loan-number and amount.

     Two problems arise as a result of the replication:
*  the data are stored multiple times, wasting storage space and
*  Updates potentially leave the data in an inconsistent state, where the values differ in two relationships for attributes that are supposed to have the same value.

The problem of replication of the attributes loan-number and
amount is absent in the original design, because there loan is an entity set.

Binary versus n-ary relationship sets
     Relationships in databases are often binary. Some relationships that appear to be non-binary could actually be better represented by several binary relationships. For instance, one could create a ternary relationship parent, relating a child to his/her mother and father. However, such a relationship could also be represented by two binary relationships, mother and father, relating a child to his/her mother and father separately. Using the two relationships mother and father allows us to record a child's mother even if we are not aware of the father's identity; a null value would be required if the ternary relationship parent is used. Using binary relationship sets is preferable in this case.
In fact, it is always possible to replace a non-binary (n-ary, for n > 2) relationship set by a number of distinct binary relationship sets.
For simplicity, consider the abstract ternary (n=3) relationship set R, relating entity sets A, B and C. We replace the relationship set R by an entity set E, and create three relationship sets:

*  RA, relating E and A
*  RB, relating E and B
*  RC, relating E and C

If the relationship set R had any attributes, these are assigned to entity set E; further, a special identifying attribute is created for E (since it must be possible to distinguish different entities in an entity set on the basis of their attribute values).
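The replacement described above can be written down as table definitions. A sketch using sqlite (the entity sets A, B, C and E are the abstract ones from the text; the table names, id columns, and sample values are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Replacing a ternary relationship set R(A, B, C) with an entity set E
# plus three binary relationship sets RA, RB, RC.
conn.executescript("""
CREATE TABLE E  (e_id INTEGER PRIMARY KEY);  -- special identifying attribute
CREATE TABLE RA (e_id INTEGER REFERENCES E, a_id INTEGER);
CREATE TABLE RB (e_id INTEGER REFERENCES E, b_id INTEGER);
CREATE TABLE RC (e_id INTEGER REFERENCES E, c_id INTEGER);
""")
# one former R-tuple (a=10, b=20, c=30) becomes one E entity plus three links
conn.execute("INSERT INTO E VALUES (1)")
conn.execute("INSERT INTO RA VALUES (1, 10)")
conn.execute("INSERT INTO RB VALUES (1, 20)")
conn.execute("INSERT INTO RC VALUES (1, 30)")

# joining the three binary sets on e_id reconstructs the ternary tuple
row = conn.execute("""
    SELECT a_id, b_id, c_id
    FROM RA JOIN RB USING (e_id) JOIN RC USING (e_id)""").fetchone()
print(row)  # (10, 20, 30)
```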


(ii) Consider the relational database

employee (empname, street, city)

works (empname, companyname, salary)

company (companyname, city)

manages (empname, managername)

(1) Π empname (σ companyname = "First Bank Corporation" (works))

(2) Π empname, street, city (σ companyname = "First Bank Corporation" ∧ salary > 200000 (works ⋈ employee))

(3) Π empname (σ employee.city = company.city (employee ⋈ works ⋈ company))

(4) Π empname (works) − Π works.empname (σ works.salary ≤ w2.salary ∧ w2.companyname = "Small Bank Corporation" (works × ρ w2 (works)))
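The same four requests can also be run as SQL against a toy instance of the schema; the sample rows below are invented purely to exercise each query, and query (4) uses the aggregate formulation of "more than every employee" (salary greater than the maximum at Small Bank Corporation):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (empname TEXT, street TEXT, city TEXT);
CREATE TABLE works (empname TEXT, companyname TEXT, salary INTEGER);
CREATE TABLE company (companyname TEXT, city TEXT);
INSERT INTO employee VALUES ('anna', '1st St', 'Chennai'), ('bob', '2nd St', 'Delhi');
INSERT INTO works VALUES ('anna', 'First Bank Corporation', 250000),
                         ('bob', 'Small Bank Corporation', 150000);
INSERT INTO company VALUES ('First Bank Corporation', 'Chennai'),
                           ('Small Bank Corporation', 'Mumbai');
""")
q1 = "SELECT empname FROM works WHERE companyname = 'First Bank Corporation'"
q2 = """SELECT e.empname, street, city
        FROM employee e JOIN works w ON e.empname = w.empname
        WHERE w.companyname = 'First Bank Corporation' AND w.salary > 200000"""
q3 = """SELECT e.empname
        FROM employee e JOIN works w ON e.empname = w.empname
        JOIN company c ON w.companyname = c.companyname
        WHERE e.city = c.city"""
q4 = """SELECT empname FROM works
        WHERE salary > (SELECT MAX(salary) FROM works
                        WHERE companyname = 'Small Bank Corporation')"""
for q in (q1, q2, q3, q4):
    print(conn.execute(q).fetchall())
```

With this sample data every query returns anna: she works for First Bank Corporation, earns more than 200000, lives in the company's city, and out-earns everyone at Small Bank Corporation.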






12. (a) (i) Discuss about triggers. How do triggers offer a powerful mechanism for dealing with the changes to database with suitable example. (10)


A trigger is a statement that the system executes automatically as a side effect of a modification to the database.
We must meet two requirements:

*  Specify when a trigger is to be executed. This is broken up into an event that causes the trigger to be checked and a condition that must be satisfied for trigger execution to proceed.
*  Specify the actions to be taken when the trigger executes.
The above model of triggers is referred to as the event-condition-action-model for triggers.
    The database stores triggers just as if they were regular data, so that they are persistent and are accessible to all database operations. Once we enter a trigger into the database, the database system takes on the responsibility of executing it whenever the specified event occurs and the corresponding condition is satisfied.

In a DBMS, a trigger is a SQL procedure that initiates an action (i.e., fires an action) when an event (insert, delete or update) occurs.

Since triggers are event driven specialized procedures, they are stored in and managed by the DBMS.
A trigger can’t be called or executed; the DBMS automatically fires the trigger as a result of data modification to the associated table.
Triggers are used to maintain the referential integrity of data by changing the data in a systematic fashion.
Each trigger is attached to a single, specified table in the database.
For e.g.:

If a trigger is monitoring for changes to a column, one could write a trigger like the following:

CREATE TRIGGER salary_trigger
BEFORE UPDATE ON employee_table
REFERENCING NEW ROW AS n, OLD ROW AS o
FOR EACH ROW
WHEN (n.salary <> o.salary)
BEGIN
    -- do something here
END;
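The event-condition-action idea can be tried out with SQLite, which ships with Python. SQLite's trigger dialect differs slightly from the SQL:1999 form above (it uses NEW/OLD directly rather than a REFERENCING clause); the tables and values here are invented for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_table (empname TEXT PRIMARY KEY, salary INTEGER);
CREATE TABLE salary_log (empname TEXT, old_salary INTEGER, new_salary INTEGER);

-- event: UPDATE of salary; condition: the value actually changed;
-- action: record the change in salary_log
CREATE TRIGGER salary_trigger
AFTER UPDATE OF salary ON employee_table
WHEN NEW.salary <> OLD.salary
BEGIN
    INSERT INTO salary_log VALUES (OLD.empname, OLD.salary, NEW.salary);
END;
""")
conn.execute("INSERT INTO employee_table VALUES ('fred', 1000)")
conn.execute("UPDATE employee_table SET salary = 1200 WHERE empname = 'fred'")
rows = conn.execute("SELECT * FROM salary_log").fetchall()
print(rows)  # [('fred', 1000, 1200)]
```

The trigger fired automatically: no application code ever inserted into salary_log.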



(ii) What are nested queries? Explain with example. (6)


Nested Queries

The language SQL allows for the formulation of the same query semantics in a variety of different ways. Queries containing joins can be expressed in a "canonical" form (more than one relation name in the from clause, join condition in the where clause) or by using nested sub queries (SFW-blocks in the where clause). Depending on
  1. the form of the where predicate,
  2. the presence or absence of aggregates,
  3. whether or not the sub query is correlated,
  4. and other characteristics,
We can distinguish a number of different types of nested sub queries. Some of those types can easily be transformed into a canonical form of join query; others cannot.
Sub queries are similar to SELECT chaining. While SELECT chaining combines SELECTs on the same level of a query, sub queries allow SELECTs to be embedded inside other queries:

They can take the place of a constant (scalar sub queries),
They can return a list of values for use in a comparison,
They can take the place of a constant yet vary based on the row being processed.
Sub queries are used in the following sub query expressions which return Boolean (true/false) results.

EXISTS (sub query)

The argument of EXISTS is an arbitrary SELECT statement. The sub query is evaluated to determine whether it returns any rows. If it returns at least one row, the result of EXISTS is TRUE; if the sub query returns no rows, the result of EXISTS is FALSE.
The sub query can refer to variables from the surrounding query, which will act as constants during any one evaluation of the sub query.
This simple example is like an inner join on col2, but it produces at most one output row for each tab1 row, even if there are multiple matching tab2 rows:

SELECT col1 FROM tab1
    WHERE EXISTS (SELECT 1 FROM tab2 WHERE col2 = tab1.col2);
Example "Students in Projects":
SELECT name FROM stud WHERE EXISTS (SELECT 1 FROM assign WHERE assign.stud = stud.id);
 Name 
------
 fred
 john
 lisa
(3 rows)

[NOT] IN
IN [NOT] (scalar form)

Expression IN (value [...])
The right-hand side of this form of IN is a parenthesized list of scalar expressions. The result is TRUE if the left-hand expression's result is equal to any of the right-hand expressions.
IN [NOT] (sub query form)

Expression IN (sub query)
The right-hand side of this form of IN is a parenthesized sub query, which must return exactly one column. The left-hand expression is evaluated and compared to each row of the sub query result. The result of IN is TRUE if any equal sub query row is found.
Example "Is fred in Project 1? (--Yes)":

SELECT id, name FROM stud WHERE id in (SELECT stud FROM assign WHERE id = 1);
 id | name 
----+------
  1 | fred
(1 row)
 ANY and SOME
Expression operator ANY (sub query)
Expression operator SOME (sub query)
The right-hand side of this form of ANY is a parenthesized sub query, which must return exactly one column. The left-hand expression is evaluated and compared to each row of the sub query result using the given operator, which must yield a Boolean result. The result of ANY is TRUE if any true result is obtained.
SOME is a synonym for ANY. IN is equivalent to = ANY.

ALL

Expression operator ALL (sub query)
The right-hand side of this form of ALL is a parenthesized sub query, which must return exactly one column. The left-hand expression is evaluated and compared to each row of the sub query result using the given operator, which must yield a Boolean result. The result of ALL is TRUE if all rows yield TRUE (including the special case where the sub query returns no rows).
NOT IN is equivalent to <> ALL.

Row-wise comparison

(Expression, expression [...]) operator (sub query)
(Expression, expression [...]) operator (expression, expression [...])
The left-hand side is a list of scalar expressions. The right-hand side can be either a list of scalar expressions of the same length, or a parenthesized sub query, which must return exactly as many columns as there are expressions on the left-hand side. Furthermore, the sub query cannot return more than one row. (If it returns zero rows, the result is taken to be NULL.) The left-hand side is evaluated and compared row-wise to the single sub query result row, or to the right-hand expression list. Presently, only = and <> operators are allowed in row-wise comparisons. The result is TRUE if the two rows are equal or unequal, respectively.
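The EXISTS and IN forms above can be exercised directly in SQLite, which ships with Python (SQLite supports EXISTS and IN, though not the quantified ANY/SOME/ALL comparisons). The stud and assign rows below are invented, with assign(stud, proj) assumed as the assignment schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stud (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE assign (stud INTEGER, proj INTEGER);
INSERT INTO stud VALUES (1, 'fred'), (2, 'john'), (3, 'mary');
INSERT INTO assign VALUES (1, 1), (2, 2);
""")
# EXISTS: students that appear in at least one assignment (correlated sub query)
names = [r[0] for r in conn.execute(
    "SELECT name FROM stud WHERE EXISTS"
    " (SELECT 1 FROM assign WHERE assign.stud = stud.id)")]
print(names)  # ['fred', 'john']

# IN: students assigned to project 1 (uncorrelated sub query)
in_p1 = [r[0] for r in conn.execute(
    "SELECT name FROM stud"
    " WHERE id IN (SELECT stud FROM assign WHERE proj = 1)")]
print(in_p1)  # ['fred']
```

mary is returned by neither query because she has no row in assign at all.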





(b) (i) What is normalization? Give the various normal forms of relational schema and define a relation which is in BCNF and explain with suitable example. (12)



Normalization of data is a process of analyzing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of
  1. Minimizing redundancy
  2. Minimizing the insertion, deletion and update anomalies.
Different Normal Forms are
  1. FIRST NORMAL FORM
  2. SECOND NORMAL FORM 
  3. THIRD NORMAL FORM
  4. FOURTH NORMAL FORM
  5. FIFTH NORMAL FORM
  6. BOYCE–CODD NORMAL FORM
A relation schema R is in BCNF if, whenever a nontrivial functional dependency X -> A holds in R, then X is a super key of R.

 


                          

The only difference between the definitions of BCNF and 3NF is that the 3NF condition which allows A to be a prime attribute is absent from BCNF.
Suppose that we have thousands of lots in the relation, but the lots are from only two countries, XX and YY.
Lot sizes in country XX are 0.5, 0.6, 0.7, 0.8, 0.9, 1.0.
Lot sizes in country YY are 1.1, 1.2, ..., 2.0.
We have the additional FD,
FD5: Area -> country name
Then the relation schema is still in 3NF because country name is a prime attribute. The area of a lot determines the country, as specified by FD5, and can be represented by a separate relation R (Area, country name). This representation reduces the redundancy of repeating the same information in thousands of LOTS1A tuples.
 




     FD5 violates BCNF in LOTS1A because Area is not a super key of LOTS1A.
     FD5 satisfies 3NF in LOTS1A because country name is a prime attribute, but this condition does not exist in the definition of BCNF.




(ii)Compare BCNF versus 3NF. (4)

BCNF versus 3NF
              Third Normal Form (3NF):
                        3NF is based on the concept of transitive dependency. A functional dependency X -> Y in a relation schema R is a transitive dependency if there is a set of attributes Z that is neither a candidate key nor a subset of any key of R, and both X -> Z and Z -> Y hold.
 
  The dependency Eid -> DMGRid is transitive through DNUMBER in EMP-DEPT, because both the dependencies Eid -> DNUMBER and DNUMBER -> DMGRid hold, and DNUMBER is neither a key itself nor a subset of the key of EMP-DEPT.
     The dependency of DMGRid on DNUMBER is undesirable in EMP-DEPT, since DNUMBER is not a key of EMP-DEPT.
     The relation schema EMP-DEPT is in 2NF since no partial dependencies on a key exist. However, EMP-DEPT is not in 3NF because of the transitive dependency of DMGRid (and also dname) on Eid via DNUMBER.
     We can normalize EMP-DEPT by decomposing it into the two 3NF relation schemas ED1 and ED2.
     1NF: A relation should have no nonatomic attributes or nested relations. Form a new relation for each nonatomic attribute or nested relation.
     2NF: For relations where the primary key contains multiple attributes, no non-key attribute should be functionally dependent on a part of the primary key. Decompose and set up a new relation for each partial key with its dependent attributes. Make sure to keep a relation with the original primary key and any attributes that are fully functionally dependent on it.
     3NF: A relation should not have a non-key attribute functionally determined by another non-key attribute; i.e., there should be no transitive dependency of a non-key attribute on the primary key. Decompose and set up a relation that includes the non-key attributes that functionally determine other non-key attributes.
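The decomposition of EMP-DEPT into ED1(Eid, DNUMBER) and ED2(DNUMBER, DMGRid) can be checked to be lossless with a small sqlite sketch (the ids and sample rows are invented): the natural join of the two 3NF relations reproduces the original EMP-DEPT tuples, while each manager id is now stored only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- EMP_DEPT(eid, dnumber, dmgrid) had the transitive dependency
-- eid -> dnumber -> dmgrid; decompose along those FDs:
CREATE TABLE ed1 (eid INTEGER PRIMARY KEY, dnumber INTEGER);    -- eid -> dnumber
CREATE TABLE ed2 (dnumber INTEGER PRIMARY KEY, dmgrid INTEGER); -- dnumber -> dmgrid
INSERT INTO ed1 VALUES (1, 10), (2, 10), (3, 20);
INSERT INTO ed2 VALUES (10, 100), (20, 200);
""")
# the natural join recovers the original EMP_DEPT relation losslessly
rows = conn.execute("""
    SELECT eid, ed1.dnumber, dmgrid
    FROM ed1 JOIN ed2 ON ed1.dnumber = ed2.dnumber
    ORDER BY eid""").fetchall()
print(rows)  # [(1, 10, 100), (2, 10, 100), (3, 20, 200)]
```

Note that dmgrid 100 appears once in ed2 even though two employees belong to department 10; in the undecomposed relation it would be repeated per employee.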


13. (a) (i) Describe about RAID levels. (10)


RAID Levels
             
  • Schemes to provide redundancy at lower cost by using disk striping combined with parity bits.
  • Different RAID organizations, or RAID levels, have differing cost, performance and reliability characteristics.
  • RAID Level 0: Block striping; non-redundant.
    • Used in high performance applications where data loss is not critical.
  • RAID Level 1: Mirrored disks with block striping.
    • Offers best write performance.
    • Popular for applications such as storing log files in a database system.
  • RAID Level 2: Memory Style Error Correcting Codes(ECC) with bit striping.
  • RAID Level 3: Bit-Interleaved Parity
  • A single parity bit is enough for error correction, not just detection, since we know which disk has failed.
  • When writing data, corresponding parity bits must also be computed and written to a parity bit disk.
  • To recover data in a damaged disk, compute XOR of bits from other disks (including parity bit disk).
  • Faster data transfer than with a single disk, but fewer I/Os per second since every disk has to participate in every I/O.
  • Subsumes level 2 (provides all its benefits, at lower cost).
  • RAID Level 4: Block-Interleaved Parity; uses block-level striping and keeps a parity block on a separate disk for corresponding blocks from N other disks.
  • When writing a data block, the corresponding block of parity bits must also be computed and written to the parity disk.
  • To find the value of a damaged block, compute the XOR of the corresponding blocks (including the parity block) from the other disks.
    • Provides higher I/O rates for independent block reads than Level3.
    • Block read goes to a single disk, so blocks stored on different disks can be read in parallel.
    • Provides higher transfer rates for reads of multiple blocks than with no striping.
    • Before writing a block, parity data must be computed.
    • Can be done by using old parity block, old values of current block and new value of current block(2 block reads + 2 block writes)
    • Or by recomputing the parity value using the new values of blocks corresponding to the parity block.
      • More efficient for writing large amounts of data sequentially.
      • Parity block becomes a bottleneck for independent block writes since every block write also writes to parity disk.
  • RAID Level 5: Block-Interleaved Distributed Parity; partitions data and parity among all N + 1 disks, rather than storing data on N disks and parity on 1 disk.
    • E.g., with 5 disks, the parity block for the nth set of blocks is stored on disk (n mod 5) + 1, with the data blocks stored on the other 4 disks.
    • Higher I/O rates than Level 4: block writes occur in parallel if the blocks and their parity blocks are on different disks.
    • Subsumes Level 4: provides the same benefits, but avoids the bottleneck of the parity disk.
  • RAID Level 6: P+Q Redundancy scheme; similar to Level 5, but store extra redundant information to guard against multiple disk failures.
  • Better reliability than Level 5 at a higher cost; not used as widely.

Different RAID levels trade off performance, complexity, and reliability.
    • RAID-0 is Striping RAID. It does not, by itself, contribute to EDAP and provides NO redundancy, since if one drive fails, all the data in the array will be lost.
    • Two types of RAID provide EDAP for the drives: Mirroring and Parity.  Mirroring appeared earlier (in the UC Berkeley Papers) and was originally designated as RAID Level 1. Its  main problem is that it requires 100% redundancy or twice as much capacity as was originally needed. On the other hand, its read performance is improved and a higher percentage of drives in a RAID-1 system may fail simultaneously as compared to a Parity RAID system.
    • RAID Level 2 requires the use of non-standard disk drives and is therefore not commercially viable.
    • Parity RAID was identified in the UC Berkeley Papers as RAID Levels 3, 4, 5 and 6. Parity RAID significantly reduces redundancy overhead to a range of 10% to 33% (compared to 100% for Mirroring, RAID-1). RAID Levels 3-5 provide EDAP in case of one disk failure, and RAID-6 tolerates failure of two disks at the same time, or of a second disk that fails later, during the reconstruction period.
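The parity mechanism behind RAID Levels 3-5 is just bytewise XOR across the disks. A minimal sketch (the block contents and number of disks are invented): the parity block is the XOR of all data blocks, and XOR-ing the surviving blocks with the parity block restores a lost one.

```python
import functools

def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

blocks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]   # data blocks on three disks
parity = functools.reduce(xor, blocks)             # stored on the parity disk

lost = 1                                           # pretend disk 1 fails
survivors = [b for i, b in enumerate(blocks) if i != lost]
recovered = functools.reduce(xor, survivors + [parity])
assert recovered == blocks[lost]                   # XOR of the rest restores it
```

This is also why every write must update the parity disk: the stored parity must stay equal to the XOR of the current data blocks.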



(ii) Explain why allocations of records to blocks affects database system performance significantly. (6)

A file is organized logically as a sequence of records; these records are mapped onto disk blocks. Files are provided as a basic construct in operating systems, so we shall assume the existence of an underlying file system. We need to consider ways of representing logical data models in terms of files.
          One approach to mapping the database to files is to use several files, and to store records of only one fixed length in any given file. An alternative is to structure our files so that we can accommodate multiple lengths for records; however, files of fixed length records are easier to implement than are files of variable-length records. Many of the techniques used for the former can be applied to the variable-length case. Thus, we begin by considering a file of fixed-length records.
Fixed-length records:
As an example, let us consider a file of account records for our bank database. Each record of this file is defined as:
          type deposit = record
               account-number: char(10);
               branch-name: char(22);
               balance: real;
          end
          If we assume that each character occupies 1 byte and that a real occupies 8 bytes, our account record is 40 bytes long. A simple approach is to use the first 40 bytes for the first record, the next 40 bytes for the second record, and so on. However two problems in this approach:
    • It is difficult to delete a record from this structure. The space occupied by the record to be deleted must be filled with some other record of the file, or we must have a way of marking deleted records so that they can be ignored.
    • Unless the block size happens to be a multiple of 40, some records will cross block boundaries. That is, part of the record will be stored in one block and part in another. It would thus require two block accesses to read or write such a record.
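The 40-byte layout can be simulated with Python's struct module (the account values are invented; the char fields are modeled as fixed-size byte strings). Because every record has the same size, record i lives at offset 40 * i and can be read with a single offset computation, no scanning required:

```python
import struct

# Fixed-length account record: 10-byte account number, 22-byte branch name,
# 8-byte real balance -> 40 bytes per record.
REC = struct.Struct("10s22sd")
assert REC.size == 40

buf = bytearray(REC.size * 3)  # a "block" with room for three records
for i, (acct, branch, bal) in enumerate(
        [(b"A-101", b"Chennai", 500.0), (b"A-102", b"Madurai", 700.0)]):
    REC.pack_into(buf, i * REC.size, acct, branch, bal)  # record i at offset 40*i

# direct access to record 1: one multiplication, one read
acct, branch, bal = REC.unpack_from(buf, 1 * REC.size)
print(acct.rstrip(b"\x00"), bal)  # b'A-102' 700.0
```

The two problems from the text show up here too: deleting record 0 leaves a 40-byte hole that must be reused or marked, and if the block size were not a multiple of 40, a record would straddle two blocks and cost two I/Os.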
Variable Length Record:
Variable-length records arise in database systems in several ways.
  • Storage of multiple record types in a file.
  • Record types that allow variable lengths for one or more fields.
  • Record types that allow repeating fields.
We use one variable-length record for each branch name and for all the account information for that branch. The format of the record is
type account-list = record
     branch-name: char(22);
     account-info: array [1..∞] of record
          account-number: char(10);
          balance: real;
     end
end
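One common byte-level representation of such a record is a count-prefixed repeating group: the fixed part (branch name) is followed by the number of entries and then that many (account-number, balance) pairs. A struct-based sketch (the format strings and helper names are invented for illustration):

```python
import struct

FIXED = struct.Struct("22sH")   # branch name + count of repeating entries
ENTRY = struct.Struct("10sd")   # one (account-number, balance) pair

def pack_account_list(branch: bytes, accounts) -> bytes:
    """Serialize a variable-length account-list record."""
    out = FIXED.pack(branch, len(accounts))
    for acct, bal in accounts:
        out += ENTRY.pack(acct, bal)
    return out

def unpack_account_list(buf: bytes):
    """Read the count, then that many repeating entries."""
    branch, n = FIXED.unpack_from(buf)
    accounts = [ENTRY.unpack_from(buf, FIXED.size + i * ENTRY.size)
                for i in range(n)]
    return branch.rstrip(b"\x00"), accounts

rec = pack_account_list(b"Chennai", [(b"A-101", 500.0), (b"A-102", 700.0)])
branch, accounts = unpack_account_list(rec)
print(branch, len(accounts))  # b'Chennai' 2
```

The price of the compact layout is that records now differ in length, so a block can no longer locate record i by multiplication alone; this is exactly the fixed- versus variable-length trade-off discussed above.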


