QUESTION
EEE2025 Data Structures and Computer Algorithms:
2011/2012 Assignment
Data storage and retrieval: a comparison of hashing and
directed search of sorted data
This assignment involves writing C programs and functions which will store
lists of integers in (a) a hash table and (b) a sorted array, and then will find
given values within either the hash table or the sorted array. An evaluation
and comparison of the run-time and complexity of these algorithms will be
performed. The particular algorithms chosen for study are:
• Hashing
• Insertion sort
• Binary search
You should produce 2 main programs: hash.c and sorted.c. One program will
implement the storage and retrieval using hashing. The other will use insertion
sort to put the data sorted into an array, and then binary search to find data.
Their functionality is defined as follows:
Command line parameters
Both of your programs should read data from a file specified on the
command line with parameter “-i”. This file will contain a list of integers
to be stored, with one integer per line, as in the example file shown
later, input.txt. Both programs should read another set of integers from
a similar file specified on the command line with parameter “-r”; these
integers are values that should be retrieved from the data-store. In the
case of hash.c, the program should take an input parameter, “-s”,
specifying the size of the hash table.
Data storage
The program hash.c should initialise a hash table that uses quadratic
probing for collision handling. An example hash table definition is given
later. You may use this in your program. It should then store the
integers read in from the input file in the hash table using the hash
function: f(k) = k mod TableSize and quadratic probing. It is not
necessary for you to implement rehashing.
The program sorted.c should store the integers in an array in the order
in which they are read from the file, and then sort the array using the
insertion sort algorithm.
Data retrieval
[1/ 5 ]
The program hash.c should perform a find operation against the hash
table for each item read in from the file containing values to be
retrieved. Again, it should use quadratic probing to find items where
there might have been a collision.
The program sorted.c should use a binary search algorithm against the
data in the sorted array for each item read in from the file containing
values to be retrieved.
The programs should be structured so that they can produce statistics on the
run-time of functions for storing and retrieving data. The programs should print
out results which consists of a summary header, a main body of results if
debug mode is turned on, and a summary footer. The header should provide
details of the input parameters – input filename and retrieval filename, number
of items stored in the list, number of items searched for and number found,
and which method of storage and retrieval has been used (i.e. hashing or
sorting).
The program should have a debug mode in which the storage details (i.e. for
hashing, the details of collisions that have occurred, and for insertion sort,
number of comparisons and swaps performed) and retrieval results (i.e. which
items were found and which not found) are printed. The summary footer
should give details of run-time for both time to store the data and time to
retrieve the data. In the case of hashing it should specify as a percentage how
full the hash table is.
A typical output from a single run of the programme should look something
like:
Data storage and retrieval: a comparison of hashing and
directed search of sorted data
================================
Storage Method: Hashing
Input data loaded from file input.txt.
Retrieval data loaded from file find.txt
Number of items stored in list: 500
Number of items searched: 300
Number of items found: 280
Debug Output:
Storage details:
Collision occurred saving item with value 38 at hash table location 08
Collision occurred saving item with value 38 at hash table location 09
Retrieval details:
[2/ 5 ]
Value 38 found in table
Value 45 not found in table
Execution times:
Time to store data: 1 ms
Time to retrieve data: 2 ms
Hash table is 50% full.
================================
The final objective of the exercise is to explore and comment on the relative
computational complexity of each algorithm. As a minimum you should
produce a plot that has a curve for each storage method and shows the
growth of execution time versus list length for data storage and growth of
execution time versus list length for data retrieval. You should do this for three
types of data: random, sorted ascending and sorted descending. You should
comment on your measurements and make some considered observations
about the relative advantages of each method.
For this exercise you will need to generate files containing lists of random
numbers to be stored. The function drand48 should be a suitable random
number generator. To create input data that is already sorted, you could use a
spreadsheet, e.g. Excel or OpenOffice to sort your random data.
You will also have to consider how to measure the execution time of a
function. To do this accurately is not as simple as it first appears. C has a set
of functions contained in <timer.h> to measure elapsed CPU time. The
function “clock” can be used as shown at
http://www.aquaphoenix.com/ref/gnu_c_library/libc_291.html – SEC291.
However, this only has a resolution of 10 milliseconds and for small
sequences the execution time may be less than 10 milliseconds. In such
cases you will need to structure your code so that the same operation is
performed many times (1,000 or more?) and the total time is measured. The
time for a single execution will then have to be derived from knowledge of the
total time and the number of repetitions.
All programs should be clearly structured with good indentation to indicate the
structure. Aim to make all code largely self-documenting. Comments should
indicate the purpose and means of implementation of each program section.
Procedures should be kept self-contained with well-defined interfaces to other
procedures. Each procedure should contain a commented header explaining:
• Purpose of the function
• Description of all input and output parameter
• List of all functions which are called by the function
[3/ 5 ]
• Revision number, date and name of author
Example type definition for hash table:
struct HshTble;
typedef struct HshTble *HashTable;
enum EntryStatus { Occupied, Empty, Deleted };
struct HashEntry
{
int Element;
enum EntryStatus Info;
};
typedef struct HashEntry Entry;
struct HshTble
{
int TableSize;
Entry *Array;
}
Assessment
When trying to solve a problem using a computer the major effort should be
directed at design of the algorithm and choice of the best data structures.
Actual coding is only a small part of the exercise. The deliverables for the
project will be:
A short written report which details:
The main design of the programs. List the main functions and describe their
purpose. Indicate the relationship between functions using a structure
diagram that shows the invocation hierarchy.
1. A section presenting the data on the execution times of the
algorithms and commenting on the accuracy of the results. It should
also include a statement of the big-oh complexity of the various
methods.
The report should not exceed 5 pages in length. Any pages beyond 5 will
be ignored for assessment purposes. The report should be handed into the
Undergraduate Office.
A C source code program:
This should be submitted in two ways:
[4/ 5 ]
1. As a paper listing which should be appended to the written report.
Such a listing can be printed off using the Unix a2ps command.
2. The program must be capable of being compiled on Penguin EE
machines. If students develop code elsewhere then they need to
ensure it meets this requirement prior to submitting it.
Program will involve an assessment of the code and the reported results.
The assignment must send in the report and submit the code prior to the
deadline.
* Source code will be compared to check that it is original code and has not
been written in collaboration with other sources.
* Remember the steps in problem solving using computers: defining the
problem, outlining a solution, developing an algorithm, programming the
algorithm and testing the program.
* make sure that your algorithm is correct, and familiarise yourself with its
behaviour, before you start to code it.
* Design your program on paper before starting to work on the computer. Only
with considerable experience is it possible to hand-craft code directly into the
editor, and even then the probability of success is considerably diminished.
* If you find you are running inordinate numbers of programs and not
converging to the result that you expect then stop, go away and think about
the problem. Re-running a failing program will not change its results – only
listing, understanding what you have told the computer to do and then
correcting it will make a difference!
* In the early stages of testing include extra diagnostic statements in your
code so that you can confirm in the initial stages that your algorithm is working
as expected.
[5/ 5 ]
SOLUTION
Hashing
Hashing is a process to shorten a string against a string value. It is a table with two data values.
The bigger value is located by small value. For example a phone database with phone number and
Persons’s name.
Some people may hash function. Here I am using a value supplied by the user to identify the bigger
string.
123 swapan
342 debnath
563 sourav
101 vivek
323 jamil
So 123 is the id you have to remember to invoke a person swapan.
So we take input code and name and storing them in a test.txt data file. This operation is being done by
Hash.cpp
Initially I have reserved a space for 5 person. If you need big size change the program array.
Run hash,exe to create data in test.txt.
The sorted.cpp reads the test.txt and sorting it. When you supply a no. it is searching the existence
Of that no. in the array.
Better I say if you give 101 you will get vivek
The hash.cpp
#include <stdio.h>
#include <conio.h>
#include <stdlib.h>
void main(void)
{
int code1;
int c, first, last, middle, m, n, search, x[5], norec, i;
int room;
char name[40];
FILE* data_in;
clrscr();
// the no you have to supply. Keep maximum 5 unless you increase array.
printf(“Enter No. of Record to Put\n”);
scanf(“%d”,&norec);
printf(“Total Rec %d\n”, norec);
if (norec > 0)
{
data_in =fopen(“test.txt”, “w”);
for (i=1; i<= norec; i++)
{
// reading and transferring into file.
printf(“Enter Code & Name\n “);
fscanf(stdin, “%d %20s”, &code1, name);
fprintf(data_in , “%d %s\n” , code1, name );
}
}
fclose (data_in);
}
The sorted.cpp
#include <stdio.h>
#include <conio.h>
#include <stdlib.h>
#include <string.h>
void main(void)
{
int code1;
int c, first, last, middle, m, n, search, x[5], norec, i;
int room, k, d, swap;
char *aname[5][40], *swapname;
FILE* data_in;
clrscr();
m=0;
data_in =fopen(“test.txt”, “r”);
while (!feof(data_in)){
fscanf(data_in , “%d %s\n” , &x[m], &aname[m][40] );
printf(“%d %s\n”, x[m], aname[m][40]);
m++;
}
fclose (data_in);
printf(“Total Rec %d\n”, m);
// insertion sort is performing here
for ( c = 1 ; c <= m – 1 ; c++ )
{
for ( d = 0 ; d <= c – 1 ; d++ )
{
if ( x[c] < x[d] )
{
swap = x[d];
x[d] = x[c];
strcpy(swapname,aname[d][40]);
strcpy(aname[d][40],aname[c][40]);
for ( k = c ; k > d ; k– )
{ x[k] = x[k-1];
strcpy(aname[k][40], aname[k-1][40]);
}
x[k+1] = swap;
strcpy(aname[k+1][40], swapname);
}
}
}
for (i=0;i<m;i++)
{printf(“%d %s\n”, x[i], &aname[i][40] );
}
printf(“Enter value to find\n”);
scanf(“%d”,&search);
first = 0;
last = m – 1;
middle = (first+last)/2;
while( first <= last )
{
if ( x[middle] < search )
first = middle + 1;
else if ( x[middle] == search )
{
printf(“%d hashvalue found at location %d against %s.\n”, search, middle+1, &aname[middle][40] );
break;
}
else
last = middle – 1;
middle = (first + last)/2;
}
if ( first > last )
printf(“Not found! %d is not present in the list.\n”, search);
}
JC65
But you can order it from our service and receive complete high-quality custom paper. Our service offers “Technology” essay sample that was written by professional writer. If you like one, you have an opportunity to buy a similar paper. Any of the academic papers will be written from scratch, according to all customers’ specifications, expectations and highest standards.”