QUESTION
SQL Questions:
Q1:
1. Consider the following table:
Account(accNo, acName, balance, openedDate)
accNo is the primary key in the Account table. There are 1,000,000 accounts in the table distributed
equally in 10,000 pages on disk. Assume that an unclustered B+ Tree index with search key <accNo,
balance> exists with height 2.
The query shown below, which obtains the balance for a given account number, is run very often, and it is important that it runs as efficiently as possible.
SELECT balance
FROM Account
WHERE accNo = ‘xxxx’
Would using the unclustered B+ tree index described above for this query be beneficial? Explain your
answer by estimating the cost, in terms of disk I/Os, of the query plans considered.
2. Consider the following table
Product(productNo, productName, color, size, manufacturer, description)
productNo is the primary key in the Product table. There are 10,000,000 products in the table distributed
equally in 100,000 pages on disk.
A popular query is the “Find Product Name” query, which obtains the product name for a given product
number. This query is shown in SQL below:
SELECT productName
FROM Product
WHERE productNo = ‘<prodNumber>’
To improve the performance of the above workload, you are given the choice of the following
alternatives:
Alternative 1: No indexes on the Product table
Alternative 2: Unclustered B+ tree index on <productNo> with height 2.
Alternative 3: Clustered B+ tree index on <productNo> with height 2.
Alternative 4: Unclustered B+ tree index on <productNo, productName> with height 3.
Which alternative would you choose? Why? Justify your answer by estimating the cost (in terms of disk
I/Os) of query plans under each of the alternatives above.
3. Explain the steps that take place when queries are processed within a DBMS.
Q2:
1. Briefly explain what is meant by the “Next block” concept. What is its purpose?
2. Briefly explain the time taken to read/write a disk block.
3. Consider a disk with a sector size of 512 bytes, 2,000 tracks per surface, 50 sectors per track, 5 double-sided platters, and an average seek time of 10 msec. Suppose that a block size of 1,024 bytes is chosen. Suppose that a file containing 100,000 records of 100 bytes each is to be stored on such a disk and that no record is allowed to span two blocks.
a. How many records fit onto a block?
b. How many blocks are required to store the entire file? If the file is arranged sequentially on disk, how many surfaces are needed?
c. How many records of 100 bytes each can be stored using this disk?
d. If pages are stored sequentially on disk, with page 1 on block 1 of track 1, what is the page stored on block 1 of track 1 on the next disk surface? How would your answer change if the disk were capable of reading/writing from all heads in parallel?
e. What is the time required to read a file containing 100,000 records of 100 bytes each sequentially? Again, how would your answer change if the disk were capable of reading/writing from all heads in parallel (and the data was arranged optimally)? Assume that the transfer time of one track of data is 0.011 seconds.
4. What is a record id? How is it useful? Explain why it is desirable for record ids to remain unchanged even when the records are moved inside a page (due to deletes).
5. Briefly explain an index and its characteristics in a DBMS context.
6. What is the advantage of using indexes? What are its disadvantages? Explain your answer.
7. Explain the following:
a. Three alternatives for data entries in an index
b. Clustered indexes
c. Unclustered indexes
d. Sparse indexes
e. Dense indexes
8. Does the order matter in composite search keys? How? Explain your answer.
9. How many clustered indexes can a relation have? Why?
10. Consider the B+ tree index of order d = 2.
a. Show the tree that would result from inserting a data entry with key 9 into this tree.
b. Show the B+ tree that would result from inserting a data entry with key 3 into the original tree. How many page reads and page writes will the insertion require?
c. Show the B+ tree that would result from deleting the data entry with key 8 from the original tree, assuming that the left sibling is checked for possible redistribution.
d. Show the B+ tree that would result from deleting the data entry with key 8 from the original tree,
assuming that the right sibling is checked for possible redistribution.
Q3:
1. Briefly describe a transaction in the database context and its properties.
2. Explain why concurrent execution of transactions is desirable.
3. Explain the conflicts that may cause anomalies with interleaved execution of actions of multiple
transactions. (Hint: Use examples to illustrate anomalies)
4. Answer the following with regard to Strict Two Phase Locking Protocol (also written as Strict 2PL).
a. Explain Strict 2PL Protocol.
b. Show how Strict 2PL avoids conflicts that may cause anomalies.
5. What is meant by a cascading abort? What is the cause of a cascading abort? Give an example of
a schedule with a cascading abort. How does the Strict 2PL protocol avoid cascading aborts?
6. What is meant by an unrecoverable schedule? Give an example of an unrecoverable schedule.
How does the Strict 2PL protocol ensure recoverable schedules?
7. Consider the following schedule, which uses the Strict 2PL protocol.
T1 T2 T3 T4
S(A)
R(A)
X(B)
W(B)
S(B)
S(C)
R(C)
X(C)
X(B)
(i.) Determine whether a deadlock exists in the above schedule. Explain how you arrived at your
answer.
(ii.) If a deadlock has occurred, how does the DBMS resolve the deadlock?
8. Deadlock Prevention:
(i.) Explain deadlock prevention approaches (Wait-die & wound-wait).
(ii.) Show how a schedule in which a deadlock would not occur can still result in transaction aborts
under the deadlock prevention approaches.
(iii.) Explain why a transaction that is re-started under the deadlock prevention approaches is given its
original timestamp.
9. Answer the following with regard to phantoms.
a. Briefly explain the phantom problem
b. How do DBMSs avoid phantoms?
10. State the different Transaction Isolation Levels and possible anomalies that may occur with each
of them.
SOLUTION
Q1
- Sub-part 1:
SELECT balance FROM Account WHERE accNo = 'xxxx'
There are 1,000,000 / 10,000 = 100 records per page. Without an index, the only plan is a full scan of the Account table, which costs up to 10,000 disk I/Os. With the unclustered B+ tree index on <accNo, balance>, the index itself contains the requested balance, so the query can be answered from the index alone (an index-only plan): with height 2, that is 2 I/Os to descend the internal levels plus 1 I/O for the leaf page, about 3 I/Os in total, with no data page fetched at all. Using the index is therefore clearly beneficial: roughly 3 I/Os instead of 10,000.
- Sub-part 2:
SELECT productName FROM Product WHERE productNo = '<prodNumber>'
There are 10,000,000 / 100,000 = 100 records per page. The estimated costs of the alternatives are:
Alternative 1 (no index): a full scan of the Product table, about 100,000 I/Os.
Alternative 2 (unclustered index on <productNo>, height 2): 2 + 1 = 3 I/Os to reach the leaf, plus 1 I/O to fetch the data page holding the record, about 4 I/Os.
Alternative 3 (clustered index on <productNo>, height 2): the records are stored in index order, so the lookup costs 2 I/Os for the internal levels plus 1 I/O for the page containing the record, about 3 I/Os.
Alternative 4 (unclustered index on <productNo, productName>, height 3): the index contains productName, so an index-only plan costs 3 + 1 = 4 I/Os.
Alternative 3 gives the lowest cost, so the clustered B+ tree index on productNo should be chosen; searching time is reduced the most when the data itself is stored in search-key order.
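To make the arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python (the page counts and heights come from the question; the cost formulas are the standard textbook estimates, not the output of any real optimizer):

# I/O cost estimates for the plans considered in sub-parts 1 and 2.
# A B+ tree lookup descends `height` internal levels plus one leaf page;
# an unclustered, non-covering index pays one extra I/O for the data page.

def full_scan(pages):
    return pages

def index_lookup(height, covers_query, clustered=False):
    io = height + 1                      # internal levels + leaf page
    if not (covers_query or clustered):
        io += 1                          # extra fetch of the heap page
    return io

print("Account, full scan:            ", full_scan(10_000))                    # 10000
print("Account, <accNo,balance> index:", index_lookup(2, covers_query=True))   # 3
print("Product Alt 1 (no index):      ", full_scan(100_000))                   # 100000
print("Product Alt 2 (unclust, h=2):  ", index_lookup(2, covers_query=False))  # 4
print("Product Alt 3 (clust,   h=2):  ", index_lookup(2, covers_query=False, clustered=True))  # 3
print("Product Alt 4 (covering, h=3): ", index_lookup(3, covers_query=True))   # 4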
3. The steps in the execution of a query expressed in a high-level language are as follows:
Figure 1: Steps in the execution of a query in a DBMS (Source: http://cnx.org/content/m28213/latest/)
Parsing and translating the query
The first step in the execution of a query is to convert it into a form usable by the query processing engine. High-level languages like SQL represent a query as a string, that is, a sequence of characters. Certain substrings represent tokens such as keywords, operators, operands, and literal strings (Introduction to Database Indexes, 2012). As in any other high-level language, there are syntax rules that govern how these strings are converted into understandable statements. The primary job of the parser is to extract the tokens from the string and translate them into the corresponding relational algebra operations and operands; the parser then checks the validity of the syntax.
Optimizing the query
In this stage of query processing, the query processor applies rules to the internal data structures of the query to transform them into equivalent, but more efficient, representations. These rules may be based on mathematical models of the relational algebra expressions and trees (heuristics), on cost estimates of the different algorithms that can be applied to the operations, or on the semantics of the query and the relations it involves.
Evaluating the query
The final step in processing a query is evaluation: the best execution plan selected by the optimizer is submitted to the evaluation engine and run to produce the query result.
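As a purely illustrative toy (not how a real engine is built), the three stages can be sketched in a few lines of Python; the in-memory account table and the single supported query shape are assumptions made only for this example:

# Toy illustration of parse -> optimize -> evaluate for one query shape:
#   SELECT <col> FROM <table> WHERE <col2> = '<value>'

account = [{"accNo": "A-101", "balance": 500},
           {"accNo": "A-102", "balance": 900}]
tables = {"account": account}

def parse(sql):
    # Extremely naive tokenizer/parser for the single supported shape.
    toks = sql.replace("=", " = ").split()
    return {"select": toks[1], "from": toks[3],
            "where": (toks[5], toks[7].strip("'"))}

def optimize(plan):
    # A real optimizer would compare alternative plans here (e.g., index
    # lookup vs. full scan); this toy has only one access path.
    return plan

def evaluate(plan):
    col, val = plan["where"]
    return [row[plan["select"]] for row in tables[plan["from"]]
            if row[col] == val]

print(evaluate(optimize(parse("SELECT balance FROM account WHERE accNo = 'A-102'"))))
# -> [900]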
Q2
1. Next block concept: The blocks in a file should be arranged sequentially on the disk (linked by 'next'), to minimize seek and rotational delay. Blocks on the same track come first, followed by the remaining blocks on the same cylinder, followed by blocks on adjacent cylinders.
The next block concept supports sequential scanning and lets the DBMS pre-fetch several blocks at a time, which makes access to the records of a file efficient.
2. The organization of a disk system is shown below:
Figure 2: Organization of the disk system
After having a look at the structure of the disk system, let us consider the read/write operation as asked in the question. When the read/write head is over the proper track, it waits until the appropriate sector is beneath it; it then accesses the block of information in that sector. This process gives rise to four measures of a disk drive's efficiency:
- Seek time: It is the time it takes the read/write head to get into the position over the specified track.
- Latency: It is the time it takes for the specified sector to spin to the read/write head.
- Access time: The time it takes for a block to start being read; the sum of seek time and latency.
- Transfer rate: The rate at which data moves from the disk to the memory. (Cahill, 2009)
(See also Computer Science Illuminated by Nell Dale and John Lewis.)
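A small numeric sketch of these measures (the 10 ms seek matches Q3 below; the 7,200 rpm figure and the track transfer rate are illustrative assumptions, with the latency term taken as the usual half-rotation average):

# Access time = seek time + rotational latency; transfer then streams the block.
seek_ms = 10.0
rpm = 7200
latency_ms = (60_000 / rpm) / 2      # average = half a rotation ~ 4.17 ms
access_ms = seek_ms + latency_ms

block_bytes = 1024
transfer_rate = 25_600 / 11.0        # bytes/ms if one 25,600-byte track takes 11 ms
transfer_ms = block_bytes / transfer_rate

print(f"access time    ~ {access_ms:.2f} ms")    # ~14.17 ms
print(f"block transfer ~ {transfer_ms:.3f} ms")  # ~0.44 ms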
3. a) With 1,024-byte blocks and 100-byte records that may not span blocks, floor(1024 / 100) = 10 records fit per block.
b) Since each block holds exactly 10 records, storing 100,000 records requires 100,000 / 10 = 10,000 blocks (not 10,000,000 / 1,024 ≈ 9,766, because no record may span two blocks, so 24 bytes per block are wasted). Each surface holds 512 × 50 × 2,000 = 51,200,000 bytes, i.e., 50,000 blocks, which is far more than the file needs. Hence only one surface is required to store the file.
c) The capacity of the disk is 51,200,000 × 10 = 512,000,000 bytes (5 double-sided platters give 10 surfaces), i.e., 500,000 blocks. At 10 records per block, 5,000,000 records of 100 bytes each can be stored (not 5,120,000, again because of the no-spanning constraint).
d) Each track holds 25,600 / 1,024 = 25 blocks, i.e., 25 pages. If pages are laid out sequentially one cylinder at a time (track 1 of surface 1, then track 1 of surface 2, and so on), block 1 of track 1 on the next surface holds page 26. The read/write heads themselves do not change where pages are stored; but if the disk could read/write from all heads in parallel, pages would best be striped across the surfaces of a cylinder, and block 1 of track 1 of the next surface would then hold page 2.
e) The file occupies 10,000 blocks, i.e., 10,000 / 25 = 400 tracks. Taking the given track transfer time of 0.011 s plus the 0.010 s average seek per track, reading the file sequentially takes about 400 × (0.010 + 0.011) = 8.4 s. If the disk could read/write from all 10 heads in parallel and the data were arranged optimally (one full cylinder at a time), 10 tracks could be transferred at once, cutting the time by a factor of 10 to about 0.84 s.
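The arithmetic in (a)-(e) can be re-checked with a few lines of Python under the question's assumptions (10 ms average seek per track, 11 ms transfer per track):

# Disk geometry from the question.
sector, tracks, sectors_per_track, surfaces = 512, 2000, 50, 10
block, record, n_records = 1024, 100, 100_000

recs_per_block = block // record                       # (a) 10
blocks_needed = -(-n_records // recs_per_block)        # (b) 10,000 (ceiling division)
surface_bytes = sector * sectors_per_track * tracks    #     51,200,000 > file size
disk_blocks = surface_bytes * surfaces // block        # (c) 500,000 blocks
max_records = disk_blocks * recs_per_block             #     5,000,000 records

blocks_per_track = sector * sectors_per_track // block # (d) 25 -> page 26 on next surface
tracks_needed = blocks_needed // blocks_per_track      # (e) 400 tracks
seq_read_s = tracks_needed * (0.010 + 0.011)           #     8.4 s
parallel_s = seq_read_s / surfaces                     #     0.84 s with 10 heads

print(recs_per_block, blocks_needed, max_records, seq_read_s, parallel_s)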
4. A record id, or rid for short, is a unique identifier for a particular record in a set of records. A rid has the property that the disk address of the page containing the record can be identified from the rid itself, so reading a record given its rid requires only 1 I/O. Since I/O dominates the running time of most database operations, knowing the rid keeps this cost low. It is desirable for rids to remain unchanged when records are moved within a page (e.g., after deletes) because indexes and other structures that store rids then do not have to be updated when records move.
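A minimal sketch of why rids of the form (page id, slot) survive in-page movement; the slotted-page layout below is a simplified stand-in for a real page format:

# rid = (page_id, slot). Inside the page a slot directory maps slot -> offset,
# so after a delete the remaining records can be compacted (offsets change)
# while the rids handed out earlier stay valid.

page = {
    "slots": [0, 100, 200],                      # slot directory: slot -> offset
    "bytes": {0: "rec A", 100: "rec B", 200: "rec C"},
}

def read(page, slot):
    return page["bytes"][page["slots"][slot]]

rid = (7, 2)                                     # (page id, slot) for "rec C"
print(read(page, rid[1]))                        # -> rec C

# Delete "rec B" and compact: offsets move and the slot directory is updated,
# but rid (7, 2) still resolves to "rec C".
page["bytes"] = {0: "rec A", 100: "rec C"}
page["slots"] = [0, None, 100]
print(read(page, rid[1]))                        # -> rec C, same rid as before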
5. An index is a small table having only two columns. The first column contains a copy of the primary or candidate key of a table, and the second column contains a set of pointers holding the addresses of the disk blocks where that particular key value can be found. Once an index is found appropriate for a particular query, we can select the type of index that best fits the job.
The characteristics of the indexes are as follows:
- Clustered vs. non-clustered
- Unique vs. non-unique
- Single-column vs. multi-column
- Ascending or descending order on the columns in the index
- Full-table vs. filtered (for non-clustered indexes)
6.
Carefully designed indexes can reduce the time taken for a search in the database. Suppose we have a table in which each row contains 20 bytes of data; to find the 100th element by scanning the table we would read 99 × 20 = 1,980 bytes of data before reaching the 100th record. However, if the table is indexed, the search starts not in the table but in the index. An index row containing only two small columns may take just 4 bytes, so in that case only 99 × 4 = 396 bytes are read before the desired element is located. Indexing therefore makes record access faster.
Even for databases of moderate complexity, designing indexes is a complex, time-consuming, and error-prone task. Indexes also take up space: a textbook, for example, indexes only a few of its terms, because index entries consume pages and are laborious to add. Another disadvantage of heavy indexing is that it slows down inserting, updating, and deleting records, even though finding data is sped up.
7.
a. Three alternatives for data entries in an index:
- the actual data record (with search key value k);
- <k, rid of the matching data record>;
- <k, list of rids of matching data records>.
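Expressed as plain data structures (a hypothetical sketch; rids are (page, slot) pairs as in Q2.4), the three alternatives for the same key k = 42 look like this:

# Alternative 1: the data entry IS the actual data record with key 42.
alt1 = {42: {"productNo": 42, "productName": "bolt", "color": "grey"}}

# Alternative 2: the data entry is <k, rid of the matching record>.
alt2 = {42: (7, 3)}

# Alternative 3: the data entry is <k, list of rids> (useful for non-unique keys).
alt3 = {42: [(7, 3), (9, 0)]}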
b. Clustered indexes: It might be the case that we are asked to create an index on a non-unique key, such as dept-id, where there could be several employees in each department. Here we use a clustering index: all employees belonging to the same dept-id are considered to be within a single cluster, and the index pointers point to the cluster as a whole.
In the figure below, each disk block contains 4 records, and the index contains entries for 5 separate departments. The pointer of each entry points to the anchor record of the block where the first record with that dept-id in the cluster can be found. The blocks themselves may point to the anchor record of the next block in case a cluster overflows a block.
Figure 3: Clustered indexes (Source: http://www.eazynotes.com/pages/database-management-system/indexing.html)
c. Unclustered indexes: An un-clustered index is useful for columns that have some repeated values, for example an account-type column in a bank database. A bank database can have 10 million rows but only 10-15 distinct account types. Un-clustered indexes have the same structure as clustered indexes, except that the data rows of the underlying table are not stored in the sort order of the un-clustered keys. (Cahill, 2009)
d. Sparse indexes: For large tables the dense primary index itself begins to grow in size. To keep the index smaller, instead of pointing to each and every record in the main table, a sparse index points only to some of the records, leaving a gap between indexed records (typically one entry per block).
Figure 4: Sparse indexes (Source: http://www.eazynotes.com/pages/database-management-system/indexing.html)
e. Dense indexes: Primary indexes are of two types, the dense primary index and the non-dense or sparse index explained above. In a dense index there is a one-to-one relationship between the entries in the index table and the records in the main table: each and every record in the main table has an entry in the index.
Figure 5: Dense indexes (Source: http://www.eazynotes.com/pages/database-management-system/indexing.html)
8. Yes, the order of the columns in a composite search key matters. The common advice is to put the most selective column first, but for a single-key lookup that choice does not normally impact performance: a B+ tree guarantees a fixed amount of logical I/O (the tree depth) regardless of the column order in the index. Order still matters in two ways, though. First, the index can only be used by queries that constrain a prefix of the composite key, so the leading column determines which queries the index can serve at all. Second, the least selective column can even be the best leading choice, because it clusters the entries by that column: if queries often select on a low-cardinality value (e.g., WHERE col = 0 on a column with only a few distinct values), putting that column first places all of the col = 0 entries near each other, which improves the chances of a block cache hit on the index access.
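A small Python sketch of this clustering effect (the account-type data is invented for illustration):

# Leaf entries of a composite index are sorted by the key tuple, so the
# leading column decides which entries end up adjacent.
from bisect import bisect_left

rows = [("savings", 9), ("cheque", 4), ("savings", 1), ("cheque", 7)]

by_type_then_id = sorted(rows)                   # key <acctType, accNo>
by_id_then_type = sorted(r[::-1] for r in rows)  # key <accNo, acctType>

# "WHERE acctType = 'savings'": with acctType first, all matches form one
# contiguous run of leaf entries (good locality, prefix-searchable)...
i = bisect_left(by_type_then_id, ("savings",))
run = [e for e in by_type_then_id[i:] if e[0] == "savings"]
print(run)              # [('savings', 1), ('savings', 9)] -- adjacent

# ...with accNo first, the same predicate matches entries scattered through
# the index, and it is no longer a prefix search at all.
print(by_id_then_type)  # [(1, 'savings'), (4, 'cheque'), (7, 'cheque'), (9, 'savings')]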
9. A clustered index determines the physical order of data in a table. A clustered index is analogous to a telephone directory, which arranges data by last name. Because the clustered index dictates the physical storage order of the data in the table, a table can contain only one clustered index.
Q3.
1. A database transaction is a logical unit of database operations which are executed as a whole to process user requests for retrieving data or updating the database.
Following are the properties of a database transaction, collectively known as ACID:
- Atomicity: all the tasks of a database transaction must be completed; if the transaction is left incomplete for any reason, it must be aborted and its partial effects undone.
- Consistency: The database must be in a consistent or the legal state before and after the database transaction. This means that the database transaction must not break the database integrity constraints.
- Isolation: Data used in the execution of a database transaction must not be used by another database transaction until the first execution is completed.
- Durability: All the database modifications of a transaction will be permanent even if a system failure occurs after the transaction has been completed.
2. There are many complexities which arise due to concurrent execution, but there are two good reasons which make it desirable:
a) Improved throughput and resource utilization: A transaction consists of many steps; some involve I/O activity, others involve CPU activity. The CPU and the disks can operate in parallel, so I/O activity can proceed in parallel with the processing work of the CPU. This can be exploited by running multiple transactions in parallel: while the processor executes one transaction, the disks can perform reads and writes on behalf of other transactions. This increases the throughput of the system, that is, the number of transactions completed in a given amount of time.
b) Reduced waiting time: There may be many transactions running on a system, some short, some long. If the system executes transactions serially, a short transaction may have to wait for a long time while a longer transaction works through the system. In a concurrent system, where transactions run in parallel, the time to complete transactions is shorter.
3. Most of the database applications in the world are multi-user applications. This means that there can be multiple persons or processes reading from and writing to the database, and these persons or processes may try to update the database concurrently. The update cycle consists of the following sequence of actions:
- Read data into memory
- Update data in memory
- Write data back to the database
There may be occasions when two different users read the same data into memory. User 1 may then update the data and write the changes back to the database before user 2 does the same thing. We get a concurrency conflict in such a situation, because user 2 read the data before user 1 wrote it back to the database. (Rupley, 2010)
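The lost-update conflict just described, simulated directly (a sketch: two "users" interleave the read/update/write cycle on one row with no locking):

# Two interleaved read-modify-write cycles on the same row, no locking.
db = {"balance": 100}

u1 = db["balance"]        # user 1 reads 100 into memory
u2 = db["balance"]        # user 2 reads 100 into memory (before user 1 writes!)

db["balance"] = u1 + 50   # user 1 writes 150 back
db["balance"] = u2 + 20   # user 2 writes 120 back -- user 1's update is lost

print(db["balance"])      # 120, not the 170 a serial execution would produce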
4. a) In database transactions, strict two-phase locking (Strict 2PL) is a concurrency control method which helps in guaranteeing serializability. The strict two-phase locking protocol permits release of exclusive locks only at the end of the transaction, in order to ensure recoverability and cascadelessness of the resulting schedules. (The rigorous two-phase locking protocol releases all locks, shared and exclusive, only at the end of the transaction.)
b) Implementing Strict 2PL amounts to building a 2PL scheduler, a lock manager that processes lock requests in accordance with the 2PL specification. This involves acquiring a shared lock on a data item x before reading x, and acquiring an exclusive lock on x before modifying x. The locks must be held for the full duration of the transaction: they can only be released after the transaction commits or aborts. A request for an exclusive lock on x must wait until any locks on x already held by other transactions are released. Similarly, if another transaction already holds an exclusive lock on x, any request for a lock on x must wait until the exclusive lock is released. Because writers hold their exclusive locks until commit, no other transaction can read or overwrite an uncommitted change, which rules out the anomalies described above. If a set of transactions becomes blocked waiting for locks, such that each transaction is waiting for a lock held by one of the others, a deadlock is detected by the DBMS and at least one of the transactions is aborted.
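A minimal lock-table sketch of these rules (illustrative only: a real lock manager also queues waiters, handles upgrades more carefully, and runs deadlock detection):

# Strict 2PL bookkeeping: S/X locks are granted if compatible, and every
# lock is held until the owning transaction commits or aborts.

locks = {}   # item -> {"mode": "S" or "X", "holders": set of txn ids}

def request(txn, item, mode):
    entry = locks.get(item)
    if entry is None:
        locks[item] = {"mode": mode, "holders": {txn}}
        return True
    if entry["mode"] == "S" and mode == "S":
        entry["holders"].add(txn)            # shared locks are compatible
        return True
    if entry["holders"] == {txn}:            # sole holder may upgrade S -> X
        entry["mode"] = max(entry["mode"], mode, key="SX".index)
        return True
    return False                             # incompatible: txn must wait

def release_all(txn):                        # called ONLY at commit/abort
    for item in list(locks):
        locks[item]["holders"].discard(txn)
        if not locks[item]["holders"]:
            del locks[item]

print(request("T1", "A", "S"))   # True  - T1 reads A
print(request("T2", "A", "S"))   # True  - shared with T1
print(request("T2", "A", "X"))   # False - T2 must wait: T1 still holds S(A)
release_all("T1")                # T1 commits, releasing its locks
print(request("T2", "A", "X"))   # True  - now T2 may write A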
5. If a transaction T used a value X that was written by a transaction T' that is later aborted, then T also has to be aborted, and every transaction T'' that used a value written by T must in turn be aborted, and so on recursively. Thus a single transaction abort leads to a series of transaction rollbacks. This is known as a cascading abort.
A schedule avoids cascading aborts (or is cascade-free) if, after an abort, the database can be restored to a consistent state by undoing the effects of the aborted transaction only. Consider the following schedule (B, H8):
W1(X) W1(Y) R2(U) W2(X) R2(Y) W2(Y) W1(Z) A1
Here, T2 has read Y, which was changed by T1, before T1 committed. So when T1 aborts, T2 must also be aborted: T2 has to undo its changes to Y and X before T1 undoes its own changes to Y and X.
A schedule avoids cascading aborts if transactions read only the changes of committed transactions (i.e., a transaction does not read an item changed by another transaction until that transaction has committed). Schedules that avoid cascading aborts are (by definition) recoverable.
Suppose T1 changes A from 5 to 6, then T2 changes A from 6 to 7, then T1 aborts. T1 (operating in isolation) resets A to 5, but then T2's change to A is inadvertently lost even if T2 commits. (See also B, H9.) A schedule is called strict if every value written by a transaction T is neither read nor changed by other transactions until T either aborts or commits. Strict schedules avoid cascading aborts. Every serial schedule is strict, but serializable schedules may or may not be strict, avoid cascading aborts, or be recoverable.
8. (i) 1. Wait-Die Scheme
When transaction Ti requests a data item currently held by Tj, Ti is allowed to wait only if it has a timestamp smaller than that of Tj (i.e. Ti is older than Tj). Otherwise, Ti is rolled back (dies).
For example, suppose that transactions T22, T23 and T24 have timestamps 5, 10 and 15 respectively. If T22 requests a data item held by T23, then T22 will wait. If T24 requests a data item held by T23, then T24 will be rolled back. (Rupley, 2010)
2. Wound-Wait Scheme
When transaction Ti requests a data item currently held by Tj, Ti is allowed to wait only if it has a timestamp larger than that of Tj (i.e. Ti is younger than Tj). Otherwise, Tj is rolled back (Tj is wounded by Ti).
Returning to our example, with transactions T22, T23 and T24, if T22 requests a data item held by T23, then the data item will be preempted from T23, and T23 will be rolled back. If T24 requests a data item held by T23, then T24 will wait.
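Both policies reduce to a single timestamp comparison, sketched below (smaller timestamp = older transaction, matching the T22/T23/T24 example):

# Decide what happens when `requester` wants a lock that `holder` has.
def wait_die(ts_requester, ts_holder):
    return "wait" if ts_requester < ts_holder else "requester dies"

def wound_wait(ts_requester, ts_holder):
    return "holder wounded (rolled back)" if ts_requester < ts_holder else "wait"

# T22, T23, T24 have timestamps 5, 10, 15:
print(wait_die(5, 10))      # T22 asks T23 -> wait
print(wait_die(15, 10))     # T24 asks T23 -> requester dies
print(wound_wait(5, 10))    # T22 asks T23 -> holder wounded (rolled back)
print(wound_wait(15, 10))   # T24 asks T23 -> wait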
(iii) The locking protocols described so far determine the order between every pair of conflicting transactions at execution time, by the first lock that both members of the pair request in incompatible modes. Another method for determining the serializability order is to select an ordering among transactions in advance. The most common method for doing so is to use a timestamp-ordering scheme. (Sudarshan, 2011)
With each transaction Ti in the system, we associate a unique fixed timestamp, denoted by TS(Ti). This timestamp is assigned by the database system before the transaction Ti starts execution. If a transaction Ti has been assigned timestamp TS(Ti), and a new transaction Tj enters the system, then TS(Ti) < TS(Tj). There are two simple methods for implementing this scheme:
1. Use the value of the system clock as the timestamp; that is, a transaction's timestamp is equal to the value of the clock when the transaction enters the system.
2. Use a logical counter that is incremented after a new timestamp has been assigned; that is, a transaction's timestamp is equal to the value of the counter when the transaction enters the system.
A re-started transaction keeps its original timestamp precisely so that it keeps its place in this predetermined order: it grows ever older relative to newly arriving transactions, will eventually win every wait-die or wound-wait comparison, and therefore cannot starve.
9. a. Phantom problem: Let us consider a transaction T1 that executes the following SQL query on the bank database:
SELECT sum(balance)
FROM account
WHERE branch-name = 'Perryridge'
Transaction T1 requires access to all tuples of the account relation pertaining to the Perryridge branch.
Let T2 be the transaction that executes the following SQL insertion:
INSERT INTO account
VALUES (A-201, 'Perryridge', 900)
Let S be a schedule involving T1 and T2. There are two possible cases:
- If T1 uses the tuple newly inserted by T2 in computing sum(balance), then T1 reads a value written by T2. Thus, in a serial schedule equivalent to S, T2 must come before T1.
- If T1 does not use the tuple newly inserted by T2 in computing sum(balance), then in a serial schedule equivalent to S, T1 must come before T2.
The second of the two cases is curious: T1 and T2 do not access any tuple in common, yet they conflict with each other. This is called the phantom problem.
b. Solutions to the phantom problem:
- Table locks: no phantom problems, but no concurrency.
- Index locks: range locking (using next-key locking); more complex, but allows more concurrency.
10. Transaction isolation levels: Transaction isolation levels specify what data is visible to the statements within a transaction. These levels directly impact the degree of concurrency by defining what interaction is possible between transactions against the same target data. The following table gives the anomalies possible at each isolation level:
Isolation Level  | Dirty Read   | Non-repeatable Read | Phantom Read |
Read Uncommitted | Possible     | Possible            | Possible     |
Read Committed   | Not possible | Possible            | Possible     |
Repeatable Read  | Not possible | Not possible        | Possible     |
Serializable     | Not possible | Not possible        | Not possible |
References:
1. Introduction to Database Indexes, viewed 28 May 2012, <http://www.interspire.com/content/2006/02/15/introduction-to-database-indexes/>
2. Indexes, viewed 28 May 2012, <http://www.eazynotes.com/pages/database-management-system/indexing.html>
3. Cahill, Michael James 2009, Serializable Isolation for Snapshot Databases, viewed 28 May 2012, <http://cahill.net.au/wp-content/uploads/2010/02/cahill-thesis.pdf>
4. PostgreSQL 2012, Transaction Isolation, viewed 28 May 2012, <http://www.postgresql.org/docs/9.1/static/transaction-iso.html>
5. General Index Design Guidelines, viewed 28 May 2012, <http://msdn.microsoft.com/en-us/library/ms191195(v=sql.105).aspx>
6. Rupley, Michael L. 2010, Introduction to Query Processing and Optimization
7. Query Processing and Optimization, viewed 28 May 2012, <http://cnx.org/content/m28213/latest/>
8. Silberschatz, Korth & Sudarshan 2011, Database System Concepts, Fourth Edition
9. Dale, Nell & Lewis, John, Computer Science Illuminated