Question:
List libraries or programming language features you made use of?
- This is very useful to transfer into your resume, so provide adequate detail.
Deliverables
Checklist of deliverables | |
Hardcopy of | I/II/III |
This writeup | x |
Zip file containing | I/II/III |
This writeup | x |
Test cases showing input/output | x |
Source code | x |
README.TXT * | x |
- * include at top level a file titled README.TXT that provides Installation and Demo Instructions containing instructions on how to install and demo your program
Coverage – Did you complete all of SURLY Part I/II – what is missing?
version | Feature | Covered/Comment |
I | Relation | |
I | Insert | |
I | ||
I | Index – Heap | |
I | CATALOG | |
II | Destroy | |
II | Delete | |
II | Project | |
III | Join | |
III | Select where … AND/OR | |
III | Delete where … AND/OR | |
III | Import/Export in XML | |
optional | View, Constraint, … |
How did you implement
SURLY I
- Relations –
- Tuples –
- Attributes –
SURLY II
- Project –
- Destroy –
- Delete –
SURLY III
- Join –
- Select where –
- Delete where –
- Export/Import –
Things you did differently (e.g., than the SURLY spec)
Limitations of the current release.
Extra features you added – e.g., going beyond the SURLY I/II/III spec
Things you are especially proud of
Recommendations
Things you would do differently if starting over now.
Did SURLY meet your objectives for this course?
Suggestions on how to improve SURLY I/II/III assignment
Suggestions on how to improve the course?
Any other comments?
I. Core of SUR AND SURLY
A. SUR – Introduction
Data is recognized to be an important long-term asset of an enterprise (company, individual, …). Where formerly the emphasis was on libraries of programs that manipulate files of data according to each program’s particular view of the data, now increasingly enterprises are recognize the importance of representing large amounts of data in a uniform way in a central, formatted database. Advantages of this arrangement – availability of data, redundancy control, are well documented in textbooks on database management systems by Elmasri and Navathe, Date and many others.
A data model is a representation for data which is based on a small number of data object primitives and a well defined collection of primitive operations on the data objects. Over the years, several important data models have emerged in the database area – the hierarchical model is based on trees, the network model is based on graphs, the relational model is based on tables/mathematical relations, the object database model is based on OOP; other data models include or are based on entity-relationship, unified modeling language (UML), XML, and grid technology. Of these, the relational data model has been the industry workhorse over the last 30 years because it is simple and easy to work with. Relational query languages are easy to learn and complex queries are expressible in a straightforward way in terms of powerful operators on tables.
This report describes a small but powerful relational database management system called SUR. SUR, a single user relational DBMS, is designed to offer relational database facilities to a host language – C++, Java, LISP, … on a personal computer. SUR is useful in DBMS environments where (single) relations are never too large to fit in a program’s address space. This, of course, is a major restriction for large applications, but SUR has clear applicability for middle and small scale applications where large relations have fewer than O(many thousand) tuples. These applications include individual and small enterprise databases for small businesses, hobbyists, and home computers, The SUR language facility, SURLY, consists of a data definition language (DDL) and a data manipulation language (DML). The SURLY DDL has facilities for creating and destroying relations and secondary indices. The SURLY DML allows easy insertion and deletion of tuples and printing of relations, and offers a relationally complete algebra including relational assignment as well as the nestable operators UNION, SET-DIFFERENCE, PROJECT, SELECT, and JOIN. In a modular way, SURLY can be extended to include VIEWS, TRIGGERS, simple INTEGRITY CONSTRAINTS, LINKS, READ/WRITE, the calculus statement RETRIEVE, and a hierarchical formatted PRINT statement.[1] SURLY can be used as a batch or interactive language. A near variant ESURLY can be used executably. At the implementation level of SUR, memory is divided into NAME SPACE and POINTER SPACE (similar to LISP’s FULL- and FREE-WORD space). NAME SPACE stores components of relations uniquely as strings. POINTER SPACE is further divided into a list space for tuples, relation heaps, and trees, and a linear address space for hash tables.
This report is arranged in four sections.
- Section I covers SURLY’s RELATION, DESTROY, INSERT, DELETE, INPUT, and PRINT statements. Also the HEAP index (a linked list).
- Section II adds INDEX for storage structures TREE and HASH to make querying more efficient, as well as EXPORT and IMPORT statements.
- Section III covers the relational algebra: PROJECT, SELECT, JOIN, UNION, INTERSECTION as well as DELETE WHERE.
- Section IV is a list of extensions to the basic SURLY language. No computer implementation of SUR is described here. Instead, suggestions for an implementation are provided.
The scale and nature of SURLY gives students an opportunity to put into practice principles learned in this and earlier courses (especially structured programming and data structures).
B. The Syntax of SURLY
The syntax of the SURLY language can be described in a variant of BNF as shown in Figure 1. Figure 2 shows sample data from the CSIT_DEPARTMENT database. Students should feel free to modify the input language to be consistent with the implementation language they choose.[2]
Metasymbols
::= means is defined as
{x} means repeat x zero or more times
[x] means x is optional
x | y means either x or y
x || y means concatenate x to y
<x> means x{[,]x} e.g. x x x …or x,x,x
Rules
surlyinput ::= {command}
command ::= RELATION relationname (<attrib format>); |
INDEX indexname ON relationname
ORDER attriblist
STORAGE STRUCTURE storagestructure; |
DESTORY relname; |
INSERT relationname tuple; |
DELETE relationname [WHERE (qualifier)]; |
INPUT {relationname {tuple;} *} END_INPUT; |
PRINT <relexpr>; |
EXPORT <relationname>; |
IMPORT <relationname>; |
relationname = relexpr;
relexpr ::= relname |
JOIN relexprl AND relexpr2
OVER attriblist1 AND attriblist2 |
PROJECT relexpr OVER attriblist |
SELECT relexpr WHERE (qualifier) |
UNION relexprl AND relexpr2 |
SET_DIFFERENCE relexprl AND relexpr2 |
COPY relexpr |
ASSIGN relationname = relexpr |
PRINT relexpr |
(relexpr [GIVING attriblist])
storagestructure ::= HEAP | HASH | TREE (HEAP is default)
format ::= CHAR length | NUM length
length ::= an integer
relname ::= relationname | indexname
relationname ::= identifier
indexname ::= identifier
attrib ::= identifier
attriblist ::= (<attrib>) | attrib
tuple ::= <value>
value ::= character string varying
identifier ::= letter || {letter | number | _}
–length is implementation dependent
qualifier ::= qualifier1 | qualifier1 OR qualifier
qualifier1 ::= compare | compare AND qualifier1
compare ::= attrib relop value
relop ::= = | < | <= | >= | > | ~=
comment ::= /* comment */
SURLY Test Data (Sample Test File)
/* SURLY COMMAND FILE CONTAINING THE SAMPLE CSIT DEPARTMENT DATABASE */
RELATION COURSE (CNUM CHAR 8, TITLE CHAR 30, CREDITS NUM 4);
RELATION PREREQ (CNUM1 CHAR 8, CNUM2 CHAR 8);
RELATION OFFERING (CNUM CHAR 8, SECTION NUM 5, STARTHOUR CHAR 5, ENDHOUR CHAR 5, DAYS CHAR 5, ROOM CHAR 10, INSTRUCTOR CHAR 20);
RELATION STAFF (NAME CHAR 20, SPOUSE CHAR 10, RANK CHAR 5, CAMPUSADDR CHAR 10, EXTENSION CHAR 9);
RELATION INTERESTS (NAME CHAR 20, INTEREST CHAR 30);
RELATION DEPT (NAME CHAR 20, DEPT CHAR 4);
INSERT COURSE CMPS161 ‘ALGORITHM DESIGN AND IMPLEMENTATION I’ 3;
INSERT COURSE CMPS257 ‘DISCRETE STRUCTURES’ 3;
INSERT COURSE CMPS280 ‘ALGORITHM DESIGN AND IMPLEMENTATION II’ 3;
INSERT COURSE CMPS390 ‘DATA STRUCTURES’ 3;
INSERT COURSE CMPS400 INTERNSHIP 6;
INSERT COURSE CMPS439 ‘DATABASE SYSTEMS’ 3;
INSERT PREREQ CMPS161 MATH161;
INSERT PREREQ CMPS257 CMPS161;
INSERT PREREQ CMPS257 MATH165;
INSERT PREREQ CMPS280 CMPS161;
INSERT PREREQ CMPS390 CMPS257;
INSERT PREREQ CMPS390 CMPS280;
INSERT PREREQ CMPS390 CMPS285;
INSERT PREREQ CMPS439 CMPS390;
INSERT OFFERING CMPS161 27921 8:55 9:45 MW FAY122 BURRIS;
INSERT OFFERING CMPS390 27922 12:30 13:45 MW FAY213 DENEKE;
INSERT OFFERING CMPS390 27935 8:55 9:45 TR PER108 BURRIS;
INSERT OFFERING CMPS339 27950 9:30 10:45 TR FAY122 DENEKE;
INSERT OFFERING CMPS420 27974 14:00 15:15 MW FAY213 DENEKE;
INSERT OFFERING CMPS439 27977 9:30 10:45 MW FAY122 DENEKE;
INSERT STAFF WITTENBERG DON SEC A8C 0030;
INSERT STAFF DENEKE WHO ASSIS ‘FAY 329B’ 3769;
INSERT INTERESTS DENEKE AI;
INSERT INTERESTS DENEKE DBMS;
INSERT DEPT DENEKE CMPS;
INSERT DEPT BURRIS CMPS;
INSERT DEPT GREGORY MATH;
PRINT COURSE, PREREQ, OFFERING, STAFF, INTERESTS, DEPT;
C. The Lexical Analyzer
Legal SURLY input consists of SURLY symbols separated by zero or more blanks. SURLY symbols can be defined as follows:
‘character string containing no single quotes’
character string containing no blanks or break characters
/*comment*/
break characters: ( ) = ~= < <= > >= ; * ” ‘ ,
It is implementation-dependent whether SURLY symbols may be broken across line boundaries. To read your input, write function NEXTSYMBOL which has no arguments, reads and echo-prints the input, skips leading blanks and comments, and returns the next symbol, leaving the “cursor” one position past the end of the symbol (or on the next line}.
For example:
NEXTSYMBOL
A: SKIP BLANKS echoprinting;
CASE current character =
/* –find corresponding */; GO TO A;
, –GO TO A; (ignore commas}
‘ –read until matching ‘ and return string
( or ) or = or * –return I ( or ) or = or *
< or > or ~ –see if = follows and return <= >= ~= or < > ~
; –print carriage return and return ‘;’
ELSE –read until a blank or a break character is encountered and return the string read.
END CASE;
END NEXTSYMBOL;
It might be useful to you to write a function NEXTCHAR(CHAR} which reads and echoprints the next character, setting CHAR to that character and returning that character. NEXTCHAR will then be responsible for keeping track of line boundaries.
D. Basic SURLY Commands
1. RELATION relationname (<attrib format>);
Example.
RELATION COURSE (CNUM CHAR 8,
TITLE CHAR 30,
CREDITS NUM 4);
Description. RELATION enters a new relation into the database by simply adding a new relation descriptor entry to the CATALOG table. If a relation by that name already exists, DESTROY the old relation before creating the new relation. The new relation is stored as a ‘heap’ (see “storage structures”) and initially contains no tuples. The positional ordering of the attributes in the RELATION definition and the positional ordering of the values in a tuple should correspond one to one (see INSERT}. The format of an attribute consists of the type of the attribute (NUMeric or CHARacter} and the maximum length of the attribute. Character strings longer than “length” characters should be truncated at the right (ERRMSG1}; Numeric strings should contain only digits 0-9 (ERRMSG2) and are truncated to the left if longer than length (ERRMSG3). Both types of data will be stored as character strings, and data values may be shorter than the attributes maximum length.
2. INDEX indexname ON relationname
ORDER attriblist
STORAGE STRUCTURE storagestructure;
Example.
INDEX OFFERINGBYINSTRBYCOURSE ON OFFERING
ORDER (INSTRUCTOR COURSE)
STORAGE STRUCTURE TREE;
Description. INDEX is used to create secondary indices on existing relations in order to make retrieval and update with secondary keys more efficient. In order to maintain the integrity of the index, users will not be allowed to update secondary indices directly. However, when a primary relation is changed its secondary indices will automatically be updated by the system. If a DESTROY command is used on the primary relation, all of its secondary indices are destroyed. If a DESTROY command is used on a secondary index, just that index is destroyed. Secondary indices on other indices are not allowed. Secondary indices can be stored as either TREE or HASH storage structures (See “Storage Structures”).
3. DESTROY <relname>;
Example.
DESTROY COURSE, OFFERINGBYINSTRBYCOURSE;
/*Destroy the COURSE relation and all its secondary indices
and destroy the OFFERINGBYINSTRBYCOURSE secondary indices*/
Description. For each of the relnames specified,
- if the relname is a secondary index, delete it from the INDEX table and reclaim storage.
- if the relname is a primary relation, delete it from the CATALOG table, reclaim storage, and DESTROY any secondary indices as in step 1.
4. INSERT relationname tuple;
Example.
INSERT COURSE CMPS390 ‘DATA STRUCTURES’ 3;
Description. INSERT adds a new tuple to relationname. The relation must already exist and must not be an index. Tuple values must agree in type and order with the corresponding attribute list for the relation, with conversion occurring as specified in the section on the RELATION statement. If relationname has any secondary indices they must be updated as well. If too many or too few tuple variables are encountered in the tuple, an error message is generated (ERRMSG4) and the insertion is aborted.
5. DELETE relationname [WHERE qualifier];
Example.
DELETE OFFERING; /*delete all OFFERING tuples*/
DELETE COURSE WHERE CNUM < CMPS2OO AND INSTRUCTOR = DENEKE
OR CNUM >= CMPS300;
Description. DELETE removes tuples which satisfy the qualification (see the discussion on “Qualification”) from the relation, reclaims storage, and updates any secondary indices that may exist. If the WHERE clause is not present, the result is to delete all tuples in the relation – the result is a valid but empty relation. Note that DELETE WHERE is related to SELECT WHERE NOT.
NOTE: A relation may be emptied of tuples but not deleted using the DELETE command.
6. INPUT { relationname {tuple; } * } END_INPUT;
Example.
INPUT COURSE CMPS161 ‘ALGORITHM DESIGN AND IMPLEMENTATION I’ 3;
CMPS400 INTERNSHIP 6;*
INTERESTS DENEKE DBMS;
” AI;*
END_INPUT;
Description. The INPUT command simplifies insertion into relations by:
- allowing the user to specify sequences of ‘relationname tuple tuple… tuple*’, without the words INSERT and relationname for each tuple, and
- allows the user to specify ” (one double quote) as a tuple component indicating that the tuple component is the same as the tuple component in the corresponding position of the last tuple encountered. [This ditto feature is optional.]
As in INSERT, too many or too few values in a tuple is an error (ERRMSG4)
7. PRINT <relexpr>;
Example.
PRINT COURSE;
COURSE | ||
CNUM | TITLE | CREQ |
CS251O | STRUCTURED PROGRAMMING IN JAVA | 4 |
CS141O | BUSINESS FORTRAN | 3 |
CS341 | COBOL | 3 |
Description. PRINT formats and prints in tabular form each of the named relations in its argument list. If a secondary index is specified, PRINT prints the primary relation tuples in the order specified by the secondary index. What action is taken by PRINT when the tuple length exceeds the line length is implementation dependent. Attribute names are truncated to fit the specified format. If a relexpr is not a relname, no relation name is printed. Instead, *TEMPORARY* is printed for the table name.
E. The SURLY Interpreter
The SURLY interpreter has a very simple organization: read an operation, branch to code which reads the operation’s arguments, execute the operation and loop back to read the next operation:
DO WHILE (TRUE);
CASE NEXTSYMBOL =
‘RELATION’ –create a new relations descriptor in the RELATION class.
‘INDEX’ –create a new index descriptor in the INDEX class; add to the corresponding RELATION.INDICES; and build the new index with the indicated storage structure.
‘INPUT’ –DO WHILE (SETC (RELNAME, NEXTSYMBOL) ~= ‘END INPUT’);
REL# = RELNUM(RELNAME); –
DO WHILE (SETN(T, READTUPLE(REL#)) ~= 0);
CALL INSERT (REL#,T);
END;
END;
CALL MATCH(‘;’ );
‘INSERT’ -–CALL INSERT(SETN(REL#,RELNUM(NEXTSYMBOL)),
READTUPLE(REL#));
That is, insert tuple into relation and update any secondary indices.
‘DELETE’ –read RELNAME; if WHERE clause is not present, reclaim storage of all tuples.
‘DESTROY’ –delete all primary tuples and/or all secondary indices, and destroy relation and/or secondary index descriptors
‘PRINT’ –beautifully format the named relations and indices
ELSE –print ‘BYE BYE’ and STOP.
END CASE;
END DO WHILE;
Each of these operations is a module and can be written and tested more or less independently: INPUT calls INSERT, DESTROY calls DELETE, and so you may wish to write a “dummy” INSERT and DELETE to test INPUT and DESTROY. The programming of these modules is fairly straightforward though you will find some hints in the “Programming Notes” section.
[A high-end alternative is to use a compiler-compiler to read the SURLY grammar specification and translate to code that executes SURLY commands. For instance, see the javacc parser generator.]
G. INDEX statement (storage structures)
Refer to the example of storage structures in Figure 4 (below). The example shows the OFFERING relation stored as a heap (linked list of tuples, where each tuple is also a linked list) and two secondary indices, a bucket hash index called OHASH and a binary tree index called OTREE.
HEAP STORAGE (A LINKED LIST)
together (as well as other tuple-key combinations hashing to a giver address). Access to a tuple in hash storage is fast, about order (number of tuples in the relation/hash table size). So the hash table size should be approximately the same as the number of the tuples in the relation. To simplify hash table allocation, make all hash tables the same size –the average size of a relation, say 50. (So access is order (1). ) The hash function should provide as uniform a spread as possible; you may wish to base it on the key values themselves or alternatively on the unique storage ids (NAMETBL indices) of those values. One example of a hash function is:
HASH_INDEX = REMAINDER((Σ KEYSTORAGEIDS)*LARGEPRIME#, HASHTBLSIZE) + 1;
TREE STORAGE (A BINARY TREE)
TREE storage will be implemented with a binary tree, so nodes might look like the following:
H. EXPORT and IMPORT statements
EXPORT <relationname>;
For each relation in the list of relations, export the relation to a file of the same name (in a default directory).
To export a relation, make up an XML format that encodes the RELATION, INDEXes, and tuples for a relation and write that to the file.
IMPORT <relationname>;
For each relation in the list of relations, import the relation from a file of the same name (in a default directory). That means reading the XML statement.
(Alternatively, you can export a file by writing RELATION, INDEX, and INPUT statements and importing is just reading these. In.
I. Assignments
This looks like a healthy project – but we have a whole semester. Still, the best way to attack a big problem is to break it into sub-problems. For this kind of problem, a natural decomposition is to implement only a subset of the system’s rules. Then to add on modules as we go until the whole language has been implemented. The most natural set of rules to start with is one that implements a subset of SURLY.
Do not write and test your program all at once. You have several weeks to finish your assignment, but you won’t finish it if you wait to the last week or two to get started. First break the problem into pieces and decide to work on certain pieces each week. Test each piece with a driver before putting the pieces together.
Consider that you are implementing a prototypical SURLY. In this version, don’t spend a lot of time on handling input errors. You may assume error-free input.
Be sure to include a comprehensive collection of test cases with your assignment. Develop a file of test commands and read these in, executing them. This “regression test” will help you test your program.
II. Relational Algebra Commands
First we describe the rest of the basic SURLY commands.
A. The Relational Assignment Statement and Relexpr’s
relationname = relexpr;
REL EXPR
Let RELNUM be the index of a relation in the RELATION table. Consider implementing the function RELEXPR of no arguments, which reads relation- expressions composed of JOIN’s, PROJECT’s, relname’s, …and returns a RELNUM pointing to a new temporary relation. For example:
RELEXPR =
CASE NEXTSYMBOL =
‘JOIN’ DO;
RELNUMl = REL EXPR; MATCH (‘AND’ );
RELNUM2 = REL-EXPR; MATCH (‘OVER’)’
ATTRIBNUM1 = GETATTRIBNUM (RELNUM1, NEXTSYMBOL);
MATCH(‘AND’)
ATTRIBNUM2 = GETATTRIBNUM (RELNUM2, NEXTSYMBOL);
RETURN (JOIN (RELNUM1, RELNUM2,
ATTRIBNUM1, ATTRIBNUM2));
END;
‘PROJECT’ DO; …RETURN(PROJECT(RELNUM, ATTRIBLISTPTR)); END;
‘SELECT’ DO; …RETURN(SELECT(RELNUM, QUALIFIER)); END;
<a relation name> RETURN(RELNUM(the relation name));
<an index name> RETURN(RELPTR(INDNUM (the index name)));
END CASE;
Note that RELEXPR is RECURSIVE. If your implementation language does not have recursion, you may have to simplify JOIN, PROJECT and SELECT to take ‘relname’s instead of ‘relexpr’s. The disadvantage is that your SURLY users will have to use lots of relation assignment statements and must now keep track of what to name temporary relations and when to destroy them. Alternatively, you may limp along with some simulation of recursion. Also note: since the functions PROJECT, SELECT and JOIN take relations and not indices as arguments, the user gets no computational advantage from using indexname’s instead of relationnames in the corresponding SURLY commands; indexnames are included for convenience. SURLY will decide when to use the indices. Finally, refer back to Figure l and note that relexpr’s may be surrounded by parenthesis. This is for convenience when writing large nested expressions. The user is advised to use an indentation scheme when writing large expressions.
The Relation Assignment Statement
The relation assignment statement has the effect of creating a new relation descriptor (or renaming an old one, see below). Now that RELEXPR has been defined, .it should be a simple matter to add the relational assignment to the SURLY interpreter. Just replace the RELNAME of the temporary relation returned by RELEXPR with the name of the left side of the assignment. Notice that this has the effect of renaming (not copying) a relation if you say relationnamel = relationname2. What happens if you say relationname = indexname? Can two relations with the same name exist in your implementation?
Note that the relexpr (relexpr [GIVING attriblist]) does two things: it allows parenthesis to surround arbitrary relexpr’s. Like indentation, this is useful in writing nested relexprs. Secondly, the GIVING clause allows the attributes of the relexpr to be renamed. This is especially useful in JOIN’s and RETRIEVE’s if non-join fields in two or more relations are similarly named. Add something like the following code to the CASE statement in the definition of RELEXPR above.
‘(‘ DO;
RELNUM = REL EXPR;
IF MATCH (‘GIVING’)
THEN DO; MATCH(‘(‘);
RENAMEATTRIBS(RELNUM,READATTRIBS);
MATCH (‘)’);
END;
MATCH (‘)’);
RETURN (RELNUM);
END;
B. PROJECT
The interpreter, upon recognizing a ‘PROJECT’ command, goes on to read the projection’s arguments and calls:
RELNUMNEW = PROJECT ATTRIBLISTPTR FROM RELNUMOLD;
The PROJECT function should do the following:
- Create a new (temporary) relation, replete with attribute names and formats (as in the RELATION statement).
- Create a new (temporary) secondary index with storage structure TREE or HASH, you decide. (Like the INDEX statement). Both structures are initially empty.
- Now, for each tuple T in RELATION.ENTRYPOINT (RELNUMOLD) do; INSERT (RELNUMNEW, MAKEPROJECTEDTUPLE (T, ATTRIBLISTPTR))
- If RELATION.TEMPORARY? (RELNUMOLD) = true then DESTROY(RELNUMOLD) and put RELNUMOLD on the RELATION available space list.
The secondary index takes care of
- Making sure that the new relation has no duplicates.
- Making insertion relatively inexpensive.
The most obvious MAKEPROJECTEDTUPLE is order (M*N/2) where N = the number of attributes in RELATION (RELNUMOLD) and M = the number of attributes in the Key list. You can do better.
Example.
T1 = PROJECT CREDITS, CNUM FROM COURSE;
C. JOIN
The function
RELNUMNEW = JOIN RELNUM1, RELNUM2 OVER ATTRIBLISTPTR1 = ATTRIBLISTPTR2;
should do the following:
- Check the join attributes for compatible formats.
- Create a new (temporary) relation where attribute names are <attribute names from the first relation> then <attribute names from the second relation, minus the repeated attribute name>
- Create a (temporary) secondary index for the new relations.
- JOIN the two input relations by some efficient algorithm. Hint: if they don’t already exist, create secondary indices on one or both of the input relations where the ORDER is the JOIN attriblist. How does this help?
- For each input relation i= 1 to 2;
If RELATION.TEMPORARY?(RELNUM1) = true
then DESTROY (RELNUMi) and place RELATION(RELNUMi) on the
RELATION available space list.
Example:
if R1(A, B, C) = {<1, 2, 3>, <4, 5, 6>} and
R2(D, E) = {<1, 7>, <1, 8>,<4, 9>}, then
JOIN R1, R2 OVER A = D would result in
TEMP27(A, B, C, E} = {<1, 2, 3, 7>, <1, 2, 3, 8>, <4, 5, 6, 9>}
Example.
J = JOIN COURSE, PREREQ OVER CNUM = CNUM2;
D. SELECT and Qualifiers
The SELECT operation takes a relation and a qualification as arguments and returns a new (temporary) relation as output, composed of all tuples in the original that pass the qualification test. The Qualification is a logical expression composed of AND-ed and OR-ed comparisons where the comparisons are all of the form “attribute relational_operator value”. As usual, AND has higher precedence than OR.
You may wish to represent the qualification as an array or struct or class as follows:
QUALIFICATION ARRAY
AGE < 40 OR AGE < 40 {AND group 1
AND CNUM < CS3000 AND CNUM < CS3000 {AND group 1
OR AGE >= 40 OR AGE >= 40 {AND group 2
AND CNUM >= CS3000 AND CNUM >= CS3000 {AND group 2
AND CNUM < CS5000 AND CNUM < CS5000 {AND group 2
OR AGE = 60 OR AGE = 60 {AND group 3
OR CNUM ~= CS2510 OR CNUM ~= CS2510 {AND group 4
STOP
…
You may wish to write a function TESTTUPLE (TUPLEPTR, QUALIFICATION) which returns “true” or “false” depending on whether the tuple passes the qualification test.
There are a lot of ways to implement the actual SELECTion, ranging from symbolic rearrangement of the qualifier to take optimum advantage of indices, to brute force testing of each tuple in the relation. You are free to implement the selection algorithm any way you wish, but here are some thoughts:
- Since AND has higher precedence than OR, we can consider a qualifier to be composed of one or more AND-groups which are OR-ed together.
- If any AND-group contains only compares with “~=” relational operators, then do a linear scan, accepting tuples that pass the AND-group portion of the test and applying the whole test to any tuples which do not.
- For each AND-group do:
- If the AND group contains any compare with an “= ” Relational operator and the compare attribute has either an TREE or HASH storage structure, then use the index to find matching tuples and apply the whole qualification test to them.
- Set LOWER BOUND and UPPER BOUND = “undefined”. If the AND group contains a compare with “>” or “>=” and an TREE index on the compare attribute, set LOWER BOUND to the compare value. If the AND group contains a compare-with a “<” or “<=” on the same attribute, set UPPER BOUND to the compare value. If LOWER BOUND and UPPER BOUND are both defined and LOWER BOUND > UPPER BOUND, go to next AND group. (E.g., IAGE<50 AND AGE>701 can’t be true of any tuple.) Otherwise, apply the whole qualification test to all the tuples between LOWER BOUND and UPPER BOUND. (If either is undefined, start at the beginning –or–go to the end.)
- Give up examining successive AND-groups and do a linear scan. Since no helpful index exists for the current AND-group, we need to look at every tuple anyway.
If at any point in step 3, the number of tuples inserted into the new relation is greater than say 50% of the number of tuples in the input relation –or –if the number of tuples in the input relation is less than say 25 to start with, then give up and do a linear scan, testing every tuple. (This implies that you associate a cardinality counter with each relation in the RELATION table, and update it inside of INSERT and DELETE.)
Like PROJECT and JOIN, SELECT first creates a new temporary relation in the RELATION table, and a new index, then the SELECT logic inserts the selected tuples into the new relation, and finally the input relation is destroyed if it was a temporary.
Example.
S1 = SELECT PREREQ WHERE CNUM1 = CMPS439 OR CNUM1 > CMPS200
AND CNUM2 != MATH200;
E. DELETE WHERE
For simplicity in Part I, we ignored the “WHERE (qualifier)” clause of the DELETE command. DELETE with the WHERE clause is obviously similar to SELECT in that qualifying tuples are first selected, then deleted. So, add the WHERE clause to DELETE.
One way to implement DELETE WHERE is to mark tuples but not actually delete them. This has the benefit that secondary indices need not be updated on DELETE. Also, much of the SELECT WHERE logic can be used directly. Two drawbacks of this scheme are:
1) storage for deleted list nodes in the relations heap cannot be reclaimed.
2) access algorithms require extra logic to check whether a tuple has been deleted.
A simple way to mark heap nodes is to set the TUPLEPTR field to 0. Storage for the tuple itself can be garbage collected.
Example.
DELETE OFFERING WHERE STARTHOUR != 8:55 AND INSTRUCTOR = BURRIS;
F. Intermediate Results – PRINT and ASSIGN
relexpr ::= PRINT relexpr
relexpr ::= ASSIGN relationname = relexpr
Example:
Tl = PROJECT
SELECT OFFERING WHERE (CNUM < CS5OOO) OVER INSTRUCTOR;
T2= <same example with ‘ASSIGN MYTEMP=’ replacing ‘PRINT’>;
Description. Both PRINT and ASSIGN are identity functions in that each returns its evaluated argument relexpr unchanged. PRINT has the side-effect of printing the relexpr. This allows the user to check intermediate results. In the above example, the PRINT allows the user to verify that he is getting all the INSTRUCTOR’s that he requested with the SELECT.
ASSIGN has the side-effect of saving intermediate results in a user named relation. The new relation name can be used immediately, even within the relexpr being evaluated. This allows the user to compute common sub-expressions only once. Notice that allowing ASSIGN implies that the user can assume that relexpr’s are processed in a particular order (left to right).
This may be an undesirable assumption if SURLY is implemented in an environment that allows process spawning/parallel processing. In some implementations of SUR, the keyword ASSIGN is optional, allowing the compound assignment statement:
Rl = R2 = …= Rn = relexpr;
But since SURLY’s assignment involves re-naming a relation’s descriptor instead of creating multiple descriptors, this statement first creates the temporary relation resulting from evaluating the relexpr, then successively renames it RN …then R2 then Rl (at which point RN thru R2 are undefined). What are the repercussions in SURLY of using a descriptor-copying instead of a descriptor-renaming semantics for relational assignment?
Clearly both the PRINT and ASSIGN relexpr’s are redundant to SURLY’s PRINT command and assignment statement. But both commands are useful and require only a little code to implement.
G. UNION, INTERSECTION, SET-DI.FFERENCE, and COPY –optional
relexpr ::= UNION relexprl AND relexpr2
relexpr ::= INTERSECTION relexprl AND relexpr2
relexpr ::= SET DIFFERENCE relexprl AND relexpr2
relexpr ::= COPY relexpr
Example: Tl=SET_DIFFERENCE OFFERING AND
(T2=SELECT OFFERING WHERE (CNUM < CS5OOO));
T3=UNION T2 AND
SELECT OFFERING
WHERE (CNUM >= CS5000); /* EQUAL?(T1,T3) is TRUE */
T4=INTERSECT Tl AND T2;
T5=COPY Tl;
Description: UNION and SET_DIFFERENCE operate on union compatible relations (where corresponding components of tuples agree in type (CHAR/NUM) and degree (-arity)) to return a new relation. Compare this to INSERT and DELETE which modify their relational arguments. COPY returns a COPY of its relational argument.
H. Predicates EQUAL? arid SUBSET? –optional
relexpr :: EQUAL?(relexprl, relexpr2}
relexpr ::= SUBSET?(relexprl, relexpr2}
Descritpion. Given
RELATION TRUE (TRUTH VAL CHAR 9};
RELATION FALSE(TRUTH-VAL CHAR 9};
INPUT TRUE T; * FALSE F; * END_INPUT;
EQUAL? and SUBSET? take two union compatible arguments and return a temporary relation EQUAL? to (a copy of} TRUE and FALSE. For example,
Tl = SUBSET? (OFFERING, SELECT OFFERING WHERE (COURSE < CS4000}};
PRINT Tl, EQUAL?(Tl, OFFERING};
prints:
T1 |
TRUTH_VAL |
T |
TEMPORARY |
TRUTH_VAL |
F |
I. Pseudo-Relation EMPTY – optional
relexpr ::= EMPTY
Description. Pseudo relation EMPTY is union-compatible with any relation by definition. It is mainly useful in comparing a relation to the empty relation in loops in ESURLY (see below} -e.g. EQUAL?(relexpr,EMPTY} is true if relexpr has cardinality 0. The semantics of EMPTY are:
- if are argument of EQUAL? is EMPTY, EQUAL? returns true iff the other argument is EMPTY or a relation with cardinality 0.
- SUBSET?(relexpr, EMPTY} = True
- SUBSET? (EMPTY, relexpr} = EQUAL?(relexpr, EMPTY}
- if exactly one argument of UNION or if the second argument of SET_DIFFERENCE is EMPTY then return a COPY of the other relexpr
- if both arguments of UNION -or- the first argument of SET DIFFERENCE -or- any argument of JOIN, PROJECT, SELECT, COPY, or PRINT is EMPTY, then return EMPTY
- ‘PRINT EMPTY’ prints EMFTY as the relation name but no table is printed
- ‘ASSIGN relname = EMPTY’ is an error (so INSERT, DELETE, …are undefined on the relation EMPTY}.
- CARDINALITY(RELNUM(‘EMPTY’ }} = 0
J. Executable SURLY (ESURLY)
So far, SURLY is an interpreted language to be read from some file. With a little care, you can write ESURLY, or executable SURLY, which allows a SUR user to write programs containing SUR commands. Consider the following program segment:
CALL ASSIGN (‘OLD EMPLOYEES’,
PROJECT (SELECT (‘EMPLOYEES’, ‘AGE>30 AND AGE<70’)
‘NAME ADDRESS PHONE AGE’));
CALL PRINT (‘OLD_EMPLOYEES’);
Here we are printing the NAME, ADDRESS, PHONE, and AGE of all employees between 30 and 70 years old. Here, the ESURLY commands, JOIN, SELECT and PROJECT are functions which return relation names of temporary relations. The other SURLY commands are subroutines. All arguments are character strings.
The advantage of ESURLY is that now control structures in the host language can be used to control execution of ESURLY statements. Consider the problem of finding all prerequisites of some course, say CS45l0 (transitive closure). This is a sticky problem in SURLY since you don’t know in advance how many JOIN’s will be involved. But in ESURLY you might write:
CALL RELATION (‘RESULT’ , ‘CNUM CHAR 8’);
CALL ASSIGN (‘TEMP’,
PROJECT (SELECT(‘PREREQ’, ‘CNUMl = CS45l0’), CNUM2));
DO WHILE (CARDINALITY (‘TEMP’ ) > 0);
CALL UNION (‘RESULT’, ‘TEMP’);
CALL ASSIGN (‘TEMP’ ,
PROJECT (GIVING(JOIN (‘TEMP’,’PREREQ’,’CNUM2′ ,’CNUM1′ ),
‘CNUMl CNUM2’)
CNUM2));
END;
CALL PRINT (‘RESULT’);
Here UNION takes two (union compatible) relation names as arguments, and copies (using INSERT) all tuples in the second relation into the first. CARDINALITY is a function that takes a relationname as its only argument and returns the number of tuples (the cardinality) currently in the relation.
K. Interactive SURLY
Depending on your host language and its host machine, you may wish to implement interactive SURLY, where the input/output files are both the terminal. Unless your interpreter has good error recovery, be careful typing.
You may wish to make some character (say ‘$’) the break character such that if NEXTSYMBOL reads that character it returns to the start of the interpreter loop. Also you will need to define a log out command (say STOP) to gracefully end the interpreter loop.
L. Example Queries
Set up the sample database as in Part I. Then try the following:
/* TO TEST PROJECTION */
I = PROJECT INTEREST FROM INTERESTS;
PRINT I;
/* TO TEST SELECTION */
OFFl = SELECT OFFERING
WHERE CNUM >= CMPS200 AND
CNUM <= CMPS40O AND
START_HOUR = 9:00 AND
DAYS = MW;
OFF2 = SELECT OFFERING
WHERE INSTRUCTOR = DENEKE OR
INSTRUCTOR = BURRIS AND
START_HOUR = 8:00;
PRINT OFF1, OFF2;
/* TO TEST JOIN */
DEPT_INT = JOIN DEPT, INTERESTS OVER NAME = NAME;
PRINT DEPT_INT;
/* JOINS, PROJECTS, and SELECTS can be nested – they return tables */
/* Try the following queries: */
Find all courses taught by a given instructor.
Find all faculty interested in ‘AI’ or ‘DBMS’.
Find all faculty and the section numbers of the courses they teach.
Find all times between 8 to 5 on MTWR when no faculty teach this semester.
Find all prerequisites for a given course.
III. Extensions to SURLY
The following are extensions of basic SURLY, beyond the scope of Assignments I, II and III. You should read and understand this section but you are not required to implement any of these commands.
A. VIEW
command ::= VIEW viewname = relexpr END_VIEW;
relname ::= relationname | indexname | viewname
The VIEW command allows a user to create a “virtual” relation by specifying how to build the relation without actually building it. The relation is virtual in the sense that it does not actually exist. though it appears to exist to the user. The user does see a difference between real and virtual relations though in that virtual relations cannot be updated – since views don’t actually exist, insertion and deletion are undefined operations on them. Views do have an advantage over explicit relations in that they are always up to date. Consider an EMPLOYEES relation and
VIEW FEMALE_EMPLOYEES = SELECT EMPLOYEES WHERE SEX = FEMALE; END_VIEW;
Now if we insert a few more female employees into EMPLOYEE and PRINT-FEMALE EMPLOYEES; an up to date view of the FEMALE EMPLOYEES is the result. It we had just used the relation assignment statement FEMALE_EMPLOYEES = SELICT EMPLOYEES WHERE SEX = FEMALE; then inserted the new female employees into EMPLOYEES. And then printed FEMALE_EMPLOYEES. the result would not have included the new employees. The binding time of a view is immediate. whereas derived relations (real. non-base relations) may have been bound arbitrarily many operations ago.
The mechanism whereby views are implemented is very simple. and might be called “deferred evaluation”. Rather than evaluating the relexpr when the view is created, the source string representing the relexpr is saved. This string is evaluated whenever the associated viewname appears in a PRINT or a relation assignment statement. The DESTROY command can destroy views.
B. TRIGGER
command ::= TRIGGER relationname ON operation commands END TRIGGER;
operation ::= INSERT | DELETE
Derived relations are efficient in that they are not constantly recreated, but they may easily become out of date if the underlying base relations are modified. Views are up-to-date but are expensive to re-create. The TRIGGER command gives the user the advantages of both derived relations and views. A trigger is a sequence of commands to be invoked whenever some operation on a relation occurs. For instance, given the base relation BUY (buyer, thing-sold. seller, time), and the derived relation OWN (person, thing), a trigger can be defined to automatically update the OWN relation whenever an insertion is made into the BUY relation:
TRIGGER BUY ON INSERT
DELETE OWN WHERE PERSON = $SELLER$ AND THING = $THINGSOLD$;
INSERT OWN $BUYER$ $THINGSOLD$;
END_TRIGGER;
Now if “INSERT BUY JOHN CAR5l MARY 8-3-79;” is executed, the trigger automatically deletes the fact that Mary owns car5l and inserts the fact that now John owns it.
Probably the best way to implement triggers is to store the trigger command body as a string associated with the triggered relation. Modify INSERT and DELETE to check for triggers when inserting/deleting a tuple into a relation. The $attrib$ notation is used in triggers to mean: “replace $attrib$ by the corresponding tuple field before executing the code”. A relation need have only one trigger (of arbitrary complexity) for insertion and one for deletion, so to destroy a trigger, replace it by a null trigger:
TRIGGER BUY ON INSERT ” END TRIGGER.
C. INTEGRITY CONSTRAINT
command ::= INTEGRITY_CONSTRAINT relationname WHERE (qualifier);
The INTEGRITY_CONSTRAINT command associates with a relation a membership test (the qualification) that new tuples must pass before they can be inserted. For example:
INTEGRITY_CONSTRAINT INSTRUCTOR WHERE (IAGE > 17 AND IAGE < 75);
only allows a user to create new instructor tuples when the instructor’s age is in a legal range –tuples like <Thompson 92> should be printed as errors and should not be inserted. To implement the INTEGRITY_CONSTRAINT command, store the qualification with the relation in the RELATION table and change INSERT to check for and evaluate a qualification if it’s present, before inserting the tuple. Note: there may be several integrity statements per relation. If you want to support destroying integrity statements, then add a name field to name each integrity statement.
D. RETRIEVE – a calculus operator for SURLY
command ::= RANGE tuplevar IS relationname;
relexpr ::= RETRIEVE (<rel attrib>) WHERE (qual)
tuplevar ::= identifier
relattrib ::= tuplevar.attrib
qual ::= cf | cf AND qual | cf OR qual
cf ::= relattrib relop value |
relattrib = relattrib
Example: Given relations COURSE, OFFERING, and STAFF from Figure 2:
PRINT Tl =(RETRIEVE (STAFF.RANK,COURSE.TITLE)
WHERE (COURSE.CNUM = OFFERING.CNUM AND
OFFERING.CNUM < CS5OOO AND
OFFERING.INSTRUCTOR = STAFF.NAME)
GIVING (RANK,COURSE_TITLE));
EQUAL? (Tl, PROJECT (JOIN (JOIN COURSE
AND SELECT OFFERING WHERE (CNUM < CS5OO0)
OVER CNUM AND CNUM)
AND STAFF
OVER INSTRUCTOR AND NAME)
OVER (RANK,TITLE)
GIVING (RANK,COURSE_TITLE));
would print
T1 | |
RANK_ | COURSE_TITLE |
… | … |
T2 |
TRUTH_VAL |
T |
Description. Note from the above example, in the qualification,
relattrib = relattrib is a JOIN
relattrib relop value is a SELECT
RETRIEVE(<rel attrib>) is a PROJECT
AND is similar to INTERSECTION
OR is similar to UNION
Before implementing RETRIEVE, read the chapter on query optimization. Providing the user with both an algebra and a calculus query ability is really a more user-oriented so1ution to the usual problem of which sort of query language is best for the user.
An interesting extension to SURLY relating to RETRIEVE might be to translate RETRIEVE into a sequence of SELECT, PROJECT, JOIN, …’s and then allow the user to examine the translation.
E. Hierarchical Formatted PRINT
One problem with relational databases is that a database consists of a flat collection of tables. In hierarchical databases, data is stored in tree form and viewed by the user in tree form. Several well known problems occur when data is stored in trees –the insert/delete/update anomalies, data redundancy, query asymmetry –and these problems provide the motivation behind Codd’s normal forms which require that domains of relations be simple or atomic and not structured (subtrees).[1]
Nevertheless, hierarchies are a very natural way to view some kinds of data. Because hierarchies are a useful way to view data, an extension to SURLY’s PRINT command provides a formatted way to print hierarchies of relations. Figure 5 shows the form of the output for a hierarchical PRINT command given the sample data shown.
command ::= PRINT (<printobj>) FORMAT (<fspec>);
printobj ::= relexpr | hspec | value
hspec ::= relname (<attribspec>)
[WHERE (qualifier)]
[KEY (<attrib [direction]>)]
attribspec ::= attrib | value | INDEX |
attriblist = attriblist OF hspec
fspec ::= A[(length)] | I(length) |
SKIP[(length)] | X[(length)] |
(fspec) | n fspec
The following algorithm approximates how a hierarchical print can be interpreted.
PRINTSPEC (hspec, format of the form(fspec), level initially 1) RECURSIVE;
1) TEMP = relationname from hspec
2) if the WHERE qualifier is present
then TEMP = SELECT relname WHERE (qualifier)
3) if the KEY clause is present
then TEMP = order TEMP by the KEY
4) for each tuple in TEMP (letting INDEX (current hspec level) vary
from 1 to CARDINALITY(TEMP)) do:
CASE next symbol is a
attrib –print the value of attrib (current tuple-id)
value –print the value (strings must be in quotes)
INDEX –print INDEX(level)
attriblist = attriblist2 of hspec2 –a JOIN-like recursive step
PRINTSPEC (Select from relationname2 just those tuples
whose attriblist2 JOIN fields match attriblistl JOIN fields,
FORMAT = next unlimited repeating group,
level -= level + 1)
END PRINTSPEC
Good old recursion! Do you see the SELECT, PROJECT and JOIN-like operations that make up this PRINT?
A note about interpreting FORMATS: Since hspec’s are variable length, formats for them are specified using ‘unlimited repeating groups.’ When an hspec has been printed, the current unlimited repeating group is exited.
INSTRUCTOR | |
NAME | AGE |
Thompson | 30 |
Evans | 30 |
Hughes | 20 |
COURSE OFFERING | |||
INAME | TIME | PLACE | CNUM |
Thompson | MWF8 | A101 | CS4570 |
Thompson | TR9 | A101 | CS5570 |
Evans | TR11 | A103 | CS4510 |
Hughes | TR5 | A102 | CS6050 |
COURSE | |
CNUM | DESCR |
CS4570 | ‘DBMS1’ |
CS5570 | ‘DBMS2’ |
CS4510 | ‘Data Structures’ |
PRINT (“COURSE OFFERING BY AGE BY PROFESSOR
INSTRUCTOR (‘AGE — ’, AGE, NAME = INAME OF
COURSE_OFFERING (INDEX, INAME, CNUM = CNUM OF
COURSE (‘COURSE=’, CNUM, ‘DESCRIPTION=’, DESCR))
WHERE (CNUM < ‘CS6000’)
ORDER (INAME ASCENDING, CNUM DESCENDING))
ORDER (AGE ASCENDING))
FORMAT (A, SKIP, <= title
(SKIP, A, I(3), <= ‘age—‘, age
SKIP, I(3), X(3), A, <= index, instructor’s name
SKIP, X(8) 2 A, X(5), 2 A)))); <= ‘course=’, cnum,
‘description=‘, descr
prints
COURSE OFFERING BY AGE BY PROFESSOR
AGE — 20
1 HUGHES <= no tuples where CNUM < ‘CS6000’
AGE — 30
1 EVANS
COURSE = CS4510 DESCRIPTION = DATA STRUCTURES
2 THOMPSON
COURSE = CS5570 DESCRIPTION = DBMS2
COURSE = CS4570 DESCRIPTION = DBMS1
Figure 4: Sample Data, A Hierarchical, Formatted PRINT Command, and the Corresponding Output
F. LINK
command ::= LINK relationnamel AND relationname2
OVER attriblist1 AND attriblist2
Since some relations will often be joined over join domains, it may speed things up if we maintain an ‘index” which relates tuples in the first relation to the tuples they will join with in the second relation. To this end, implement the LINK command. You will have to modify JOIN to take advantage of this new associative index. Also, be sure to update the LINK when INSERT’s and DELETE’s occur. How does LINK compare with the DBTG SET? Links are non-information bearing or automatic associations (that is, like indices, they are there for efficiency and add no new information).
G. INDEX Revisited
Modify the SURLY index statement to:
INDEX indexname ON relationname
ORDER (attrib [direction] {attrib [direction]})
STORAGE STRUCTURE storage structure;
where
direction ::= ASCENDING | DESCENDING
storagestructure ::= HEAP | TREE | HASH [(tablesize)]
If the storage structure is TREE, the “direction” of ORDER attributes may now be given (the default is ASCENDING); if the storage structure is HASH, the user may specify the size of the primary hash table for this index (or it may be defaulted to, say 25). This will involve dynamic memory management.
H. B-TREE
Replace binary trees by B-trees as SURLY’s TREE storage structure. This change is transparent to the SUR user (physical data independence) except that it may speed up access.
I. LOG, UNDO, DUMP, RESTORE
Implement a LOG of all SURLY transactions (commands) recording what tuples are inserted/deleted for a given command (INSERT’s, DELETE’s, INPUT’s). Try to figure out how to implement an UNDO statement which has the effect of restoring a SUR database to a previous state. For example:
DELETE STAFF WHERE RANK = GTA;
PRINT STAFF;
UNDO;
Here, all GTA tuples are deleted from the STAFF relation, then the relation is printed, and finally the relation is restored to its previous state. UNDO should optionally print the transaction being “undone”; and the user should be able to call it several times in a row (UNDO; UNDO; …). , .
Periodically dump/export the data base (RELATION, INDEX, NAME_SPACE, LIST_SPACE, and any system variables) along with the current log. A new log should be started. A measure of how often to dump the database might be in terms of the number of operations (INSERT’s and DELETE’s) since the last dump. Implement this second method by allowing the database user to use the “DUMP [(number-of-operations)];” command (DUMP aka EXPORT).
If “number-of-operations” is not present, the database is dumped immediately. If “number-of-operations” is present, then dump the database automatically after the specified number of operations. Be sure to print a message telling, the user that a DATABASE DUMP has occurred (is occurring). Also implement a restore operation which reads a database dump file and restores the database to its state at the time the dump occurred.
The DUMP/EXPORT and RESTORE/IMPORT operations should be useful to anyone wishing to save the current state of the world at one point, then return to it at some later point (like after lunch, for an interactive SURLY user).
J. READ/WRITE
command ::= READ filename;
command ::= WRITE filename <relation>;
filename ::= identifier implementation dependent
Description. READ reads a file formatted as a sequence of SURLY commands. WRITE writes a set of <RELATION descriptor, INDEX descriptor(s), INPUT-tuple descriptor> onto the specified file in a form that can be read by READ.
If DUMP is implemented by READ and RESTORE by WRITE (instead of saving NAME_SPACE and POINTER_SPACE), then a user can gain two benefits:
- A DUMP can clear NAME_SPACE, which can become clogged with strings from relations not in use.- So another measure of when. to DUMP is when NAME_SPACE is getting full.
- If DELETE WHERE works by marking deleted tuples, then this wasted space can be reclaimed if POINTER_SPACE is re-initialized after a DUMP.
[1] Note: XPATH and XQUERY are modern query languages for querying tree-structured XML data.
Answer:
Introduction
The work is based on the single user relational DBMS which is set for the database setup which includes the host languages for the Java, C++ and the LISP on the personal computer. The setup is based on the commands for the utility purpose as well as SURLY designing the implementation in a modularised manner.
SUR
The work is based on the data which could be recognized in the form of a long term asset where the arrangements are based on the data availability and the control in redundancy, with proper documentation of the database management system. (Harrington et al., 2016). There have been other smaller number of the data objects where the entity relationship is set to hold the unified modelling language with a proper relational query setting. The purpose is based on the design which is for facilitating the host language in the DBMS environment. The larger relations are set to take hold of the facility and then allow the insertion process with printing and deleting the tuples. It offers the standards with the NAME SPACE along with handling the section covers for the storage structure, TREE and HASH to query in an efficient manner.
Commands
The standards are for the INDEX which is for creating the secondary indices for the existing relations. It is based on integrity of the index where the users are not able to work on updating the secondary indices where the relations are set for the secondary index. (Connell et al., 2016). The RELATION enters into the form with additional relations that set for the CATALOG table. The relations are set to order the attributes with the positional ordering of the values. The character strings are longer than the length characters with truncates at the right. The INSERT is based on new tuple to the relation name where the relations are set for the tuple values with the corresponding attribute list which is set by accounting in the tuple and the error message is generated. (McMinn et al., 2016).
Conclusion
The relational database works on the flat table collection where the data is set in the form of tree. It is mainly for the users in the tree form where the data is stored in the form of trees with the insertion, deletion and the update on the anomalies, data redundancy and the query asymmetry. The relations are set for the simple or the atomic structure.
Reference
Harrington, Jan L. Relational database design and implementation. Morgan Kaufmann, 2016.
Connell, Kevin Patrick, Andrew Mark Winterbauer, Kari Danielle Wittgens, and Jason Alex Haley. “Maintaining a relational database and its schema in response to a stream of XML messages based on one or more arbitrary and evolving XML schemas.” U.S. Patent 9,361,398, issued June 7, 2016.
McMinn, Phil, C. J. Wright, C. Kinneer, C. J. McCurdy, M. Camara, and G. M. Kapfhammer. “SchemaAnalyst: Search-based test data generation for relational database schemas.” In Proceedings of the 32nd International Conference on Software Maintenance and Evolution. 2016.