Description
Overview
A relational database is the most common
means of data storage and retrieval in the
modern age. A relational database model
stores data into one or more tables, where
each table has columns and rows. The
columns represent a value that can be
attributed to an entry (such as “color”,
“name”, or “ID”) and the rows represent
individual entries or records (such as
“Paoletti”, “my cat”, or “BBB 1695”). You
may find it helpful to think about rows as
objects and columns as descriptors for
those objects. For instance, the table
pictured to the right has each row
corresponding to a car (a data type), and the columns group a car’s vendors, model, etc. such that this
information can be easily retrieved. Rows in relational databases are also called records or tuples, but
in this project we will use the terminology “row” for consistency. Rows in Project 3 are not guaranteed
to be unique.
However, a database can do much more than simply store information. You must be able to retrieve
specific information in an efficient manner. For relational databases, the structured query language
(SQL) is a defined method of retrieving information programmatically. It looks like real English, where a
valid command for the above table could be “SELECT Model from Cars marketplace where Price
< 30000”, which would return a list of models [Corvette, Corvette, Malibu, Malibu, Malibu]. Note that
the version of SQL used in this project is simplified and modified somewhat from a typical query
language.
For this project, you will write a program to emulate a basic relational database with an interface based
on a subset of a standard query language. Your executable will be called silly. You will gain
experience writing code that makes use of multiple interacting data structures. The design portion of
this project is significant; spend plenty of time thinking about what data structures you will use and how
they will interact.
Version 2020-10-25 1
Current version by: Cris Noujaim, Shahab Tajik, Ankit Shah, and David Paoletti
Originally composed by: David Snider and Genevieve Flaspohler
Special thanks to Waleed Khan
Table of Contents
Overview 1
Table of Contents 2
Learning Goals 3
Project Specification 3
Program Arguments 3
User Commands 3
Table Manipulation Commands 4
CREATE – add a new table to the database 4
INSERT INTO – insert new data into table 4
DELETE FROM – delete specific data from table 5
GENERATE INDEX – create a search index on the specified column 6
PRINT – print specified rows 7
JOIN – join two tables and print result 8
REMOVE – remove existing table from the database 10
QUIT – exit the program 10
# – comment / no operation (useful for adding comments to command files) 10
Error Checking 11
Testing and Debugging 11
Submission to the autograder 12
Libraries and Restrictions 13
Grading 13
Checkpoint 13
Test Cases 14
Appendix A: Example from the Spec 16
Appendix B: Variant Types and You 17
Appendix C: Project Tips, Tricks, and Things to Avoid 18
2 EECS 281
Learning Goals
● Selecting appropriate data structures for a given problem. Now that you know how to use
various abstract data types, it’s important that you are able to evaluate which will be the optimal
for a set of tasks.
● Evaluating the runtime and storage tradeoffs for storing and accessing data contained in
multiple data structures.
● Designing a range of algorithms to handle different situations
● Evaluating different representations of the same data.
Project Specification
Program Arguments
silly should accept the following (both optional) command line arguments:
● -h, –help
This causes the program to print a helpful message about how to use the program and then
immediately exit(0).
● -q, –quiet
This causes the program to run in quiet mode. In quiet mode, your program will run very
similarly to normal, except that it will print only numerical summaries of the rows affected by the
command, not any of the actual data. Quiet mode exists so that we may stress test your
program without overloading the autograder with too much output. This flag only affects the
JOIN and PRINT commands and specific instructions for quiet mode output is given for these
commands. Otherwise, if there is no mention of quiet mode with respect to a given piece of
output, you may assume it is always printed. You can implement this feature last; with a
well-built project, adding this functionality should be very simple.
You may find it simpler to check for these flags directly rather than use getopt.
User Commands
On startup, your program should prepare to accept commands from the user. These commands may or
may not take the form of redirected input. Your program should print “% ” as a prompt to the user
before reading each command. Commands are specified by a command keyword followed by a
required space character and then the command-specific arguments where applicable. For example, to
delete the rows from table1 where the entries in column name equal “John”, one would input:
DELETE FROM table1 WHERE name = John. Appendix A contains an example program illustrating the
use of these commands.
3 EECS 281
Table Manipulation Commands
In the following commands, output examples are given. These examples are cumulative throughout the
table manipulation command part of the spec (i.e. the table created in the example output for the
CREATE command is the same table that DELETE is performed on during the delete command, and
so on, so you can follow the state of the table throughout the spec).
CREATE – add a new table to the database
Syntax: CREATE … … Creates a new table with columns (where N > 0). Each column contains data of type and is accessed with the name . Table names and column names are guaranteed to be
space-free. No two columns in the same table will have the same name (you do not need to check).
Valid data types for coltype are {double, int, bool, string}. This table is initially empty.
Output: Print the following on a single line followed by a newline:
New table with column(s) … created
Possible errors:
1. A table named already exists in the database
Given the following as input:
CREATE 281class 3 string string bool emotion person Y/N
The output should be:
New table 281class with column(s) emotion person Y/N created
INSERT INTO – insert new data into table
Syntax:
INSERT INTO ROWS
…
…
…
…
Inserts new rows (where N > 0) into the table specified by . The number of values in
each line after the first, or M in the example, is guaranteed to be equal to the number of columns in the
table. The first value, , should be inserted into the first column of the table in the first
inserted row, the second value, , into the second column of the table in the first inserted row,
and so on. Additionally, the types of the values are guaranteed to be the same as the types of the
4 EECS 281
columns they are inserted into. For example, if the second column of the table contains integers,
is guaranteed to be an int. Further, string items are guaranteed to be a single string of
whitespace delimited characters (i.e. “foo bar” is invalid, but “foo_bar” is acceptable).
Output: Print the message shown below, followed by a newline, where is the number of rows
inserted, is the index of the first row added in the table, and is the index of the last
row added to the table, 0 based. So, if there were K rows in the table prior to insertion, = K,
and = K + N – 1.
Added rows to from position to
Possible errors:
1. is not the name of a table in the database
Given the following as input:
INSERT INTO 281class 8 ROWS
happy Darden true
stressed students false
busy office_hours true
stressed students true
stressed Paoletti true
happy Darden true
happy Sith true
victorious Sith true
The output should be:
Added 8 rows to 281class from position 0 to 7
DELETE FROM – delete specific data from table
Syntax: DELETE FROM WHERE
Deletes all rows from the table specified by where the value of the entry in satisfies the operation with the given value . You can assume that will always
be of the same type as . For example, to delete all rows from table1 where the entries in
column name equal “John”, the command would be: DELETE FROM table1 WHERE name = John. Or,
to delete all rows from tableSmall where the entries in column size are greater than 15, the
command would be: DELETE FROM tableSmall WHERE size > 15. For simplicity, is strictly
limited to the set { <, > , = }.
Output (with example): Print the number of rows deleted from the table as shown below, followed by a
newline:
Deleted rows from 5
Possible errors:
1. is not the name of a table in the database.
2. is not the name of a column in the table specified by Given the following as input:
DELETE FROM 281class WHERE person = Darden
The output should be:
Deleted 2 rows from 281class
The search is case sensitive (which makes it easier to code): if we had deleted WHERE person =
darden, no rows would have been removed.
GENERATE INDEX – create a search index on the specified column
Syntax: GENERATE FOR INDEX ON Directs the program to create an index of the type on the column in the table
, where is strictly limited to the set {hash, bst}, denoting a hash table
index and a binary search tree index respectively. Given the hash on column
, the program should create a hash table that allows a row in the table to be found rapidly
given a particular value in the column . Given the bst on column ,
the program should create an binary search tree that allows rows in the table to be found rapidly given
a particular value in the column . Only one user-generated Index may exist per table, at
any one time. If an index is requested on a table that already has one, discard the old index before
building the new one.
When bst is the specified index type, you should make use of a std::map<>; when hash is the
specified index type, you should utilize a std::unordered_map<>. It is acceptable for both types to
exist at the same time, but only one (at most) should be in use at any given time (i.e. contain data).
An index is a tool used in databases to speed up future commands. A hash index creates a hash table,
which associates values in a specified column with a collection of rows in the table for which the index
was created. Similarly, a bst index creates a binary search tree which associates values in a specified
column with a collection of rows. Both are useful for different types of commands. In order to get the
correct output in all cases, you must use an index when it is appropriate. Further, you must remember
to update your indices upon edits to the table. bst indices are ordered by operator< for the type in
the specified column.
Output: Print the message shown below, followed by a newline
Created index for table on column Possible errors:
6 EECS 281
1. is not the name of a table in the database.
2. is not the name of a column in the table specified by Given the following as input:
GENERATE FOR 281class hash INDEX ON emotion
The output should be:
Created hash index for table 281class on column emotion
Note that in this example, the user program should now have created an index of type hash table that
maps from specific emotions of type string to rows in the table 281class,allowing the program to
search for all rows where emotion = happy quickly, among other things.
PRINT – print specified rows
Syntax: PRINT FROM … [WHERE | ALL ]
Directs the program to print the columns specified by , , …
from some/all rows in . If there is no condition (i.e. statement is of the
form PRINT … ALL), the matching columns from all rows of the table are printed. If there is a
condition (i.e. statement is of the form PRINT … WHERE ), only rows,
whose value pass the condition, are printed. The rules for the conditional portion are the
same as for the DELETE FROM statement. It is not guaranteed that the columns in the command are
listed in the same order as they exist in the table, nor is it guaranteed that all columns will be listed.
The table must be searched as follows to ensure compatibility with the autograder:
1. If no index exists or there is a hash index on the conditional column, the results should be
printed in order of insertion into the table.
2. If a bst index exists on the conditional column, the results should be printed in the order in the
BST (least item to greatest item for std::map<> constructed with the default std::less<>
operator), with ties broken by order of insertion into the table.
Output : Print the names of the specified columns, followed by the values of each of the specified
columns in each row, separated by space. Every line should be followed by a newline.
… …
…
…
Once all the data has been printed, print the following, followed by a newline, where is the number
of rows printed:
7 EECS 281
Printed matching rows from In quiet mode, do not print the s or any of the values. Print only the following:
Printed matching rows from Possible errors:
1. is not the name of a table in the database.
2. is not the name of a column in the table specified by 3. One (or more ) of the s are not the name of a column in the table specified by
(only print the name of the first such column encountered)
Given the following as input:
PRINT FROM 281class 2 person emotion WHERE Y/N = true
The output should be:
person emotion
office_hours busy
students stressed
Paoletti stressed
Sith happy
Sith victorious
Printed 5 matching rows from 281class
Or in quiet mode:
Printed 5 matching rows from 281class
JOIN – join two tables and print result
Syntax: JOIN AND WHERE = AND PRINT
<1|2> <1|2> … <1|2>
Directs the program to print the data in columns, specified by ,
, … . The s will be the names of
columns in either the first table or the second table , as specified by the
<1/2> argument directly following each .
The JOIN command is unique in that it accesses data from multiple tables. The rules for the conditional
portion are the same as for the DELETE FROM statement except that for the JOIN command, will
be strictly limited to {=} for simplicity. It is not guaranteed that the columns are listed in the same
order as they exist in the table, nor is it guaranteed that all columns will be listed.
The JOIN must be accomplished as follows in order to insure compatibility with the autograder:
1. Iterate through the first table from beginning to end
8 EECS 281
2. For each row’s respective value in , find matching values in , if any exist
3. For each match found, print the column values in the matching rows in the order specified by
the JOIN command
4. The matching rows from the second table must be selected in the order of insertion into that
table.
5. If no rows in the second table match a row in the first table, that row is ignored from the join.
Output : Print the names of the specified columns, followed by the values of each of the specified
columns in each row, separated by space. Every line should be followed by a newline.
… …
…
…
Once all the data has been printed, print the following, followed by a newline, where N is the number of
rows printed
Printed rows from joining to In quiet mode, do not print the s or any of the values. Print only the following:
Printed rows from joining to Possible errors:
1. is not the name of a table in the database.
2. One (or more ) of the s or s are not the name of a column in the
table specified by (only print the name of the first such column encountered)
Given the following as input:
CREATE pets 3 string bool bool Name likes_cats? likes_dogs?
INSERT INTO pets 2 ROWS
Sith true true
Paoletti true false
JOIN pets AND 281class WHERE Name = person AND PRINT 3 Name 1 emotion 2 likes_dogs? 1
The join specific output should be (Note: the CREATE and INSERT INTO commands will generate their
own output, but for simplicity, they are not included in this example):
Name emotion likes_dogs?
Sith happy true
Sith victorious true
Paoletti stressed false
Printed 3 rows from joining pets to 281class
Or in quiet mode:
Printed 3 rows from joining pets to 281class
9 EECS 281
Note that the JOIN is case sensitive and does not create a new table.
REMOVE – remove existing table from the database
Syntax:
REMOVE Removes the table specified by and all associated data from the database, including any
created index.
Output: Print a confirmation of table deletion, followed by a newline, as follows:
Table deleted
Possible errors:
1. is not the name of a table in the database
Given the following as input:
REMOVE pets
REMOVE 281class
The output should be:
Table pets deleted
Table 281class deleted
QUIT – exit the program
Syntax:
QUIT
Cleans up all internal data (i.e. no memory leaks) and exits the program. Note that the program must exit
with a return 0 from main().
Output: Print a goodbye message, followed by a newline.
Thanks for being silly!
Possible errors: None, except for lacking a QUIT command. Every interactive session or redirected input
file should end with a QUIT command.
# – comment / no operation (useful for adding comments to command files)
10 EECS 281
Syntax:
# Any text on a line that begins with # is ignored
Discard any lines beginning with #. This command does not produce any output nor can it generate any
errors.
Error Checking
Except for the errors specifically noted above and below, we will not test your error handling. However,
we recommend that you implement a robust error handling and reporting mechanism to make your own
testing and debugging easier. Normally, you would print error messages to the standard error stream
(stderr via cerr), rather than cout. For this project, however, you must print the specified error
messages to stdout via cout so that they may be tested in conjunction with the rest of the
project. DO NOT exit() when one of these errors occurs, display the error and keep running.
As stated in the Table Manipulation Commands section, there are a few specific errors that your code
must check for and print the appropriate output. They are as follows:
Error 1: A table named already exists in the database
Output: Error: Cannot create already existing table Error 2: is not the name of a table in the database
Output: Error: does not name a table in the database
Error 3: is not the name of a column in the table specified by Output: Error: does not name a column in Error 4: Unrecognized first letter of command (i.e. not one of CREATE, PRINT, REMOVE, #, etc)
Output: Error: unrecognized command
For all input errors, you should print the matching response, with the braced variables replaced with the
items from the offending command and followed by a newline, clear the rest of the command input, and
reprompt the user (“% “) as usual. For most commands, clearing the rest of the input line should
suffice. If there is an error in the insert command, however, the program should clear as many lines as
there are rows to be inserted so that the command is fully flushed. Do not terminate the program.
Other than the errors noted above, commands will be well formed. For simplicity, this also holds true
when clearing away an erroneous command.
Testing and Debugging
A major part of this project is to prepare a suite of test files that will help expose defects in your
program. Each test file should be a text file containing a series of table manipulation commands to run.
Your test files will be run against several buggy project solutions. If your test file causes the correct
program and an incorrect program to produce different output, the test file is said to expose that bug.
11 EECS 281
Test files should contain valid SillyQL commands, and should be named
test-n-table-commands.txt, where 0 < n ≤ 15. Make sure that every test file ends in a QUIT
command.
Your test files may contain no more than 35 lines in any one file. You may submit up to 15 test files
(though it is possible to get full credit with fewer test files). Note that the tests the autograder runs on
your solution are NOT limited to 35 lines in a file; your solution should not impose any size limits (as
long as sufficient system memory is available).
Submitting to the autograder
Do all of your work (with all needed files, as well as test files) in some directory other than your home
directory. This will be your “submit directory”. Before you turn in your code, be sure that:
● Every source code and header file contains the following project identifier in a comment at the
top of the file:
● // Project Identifier: C0F4DFE8B340D81183C208F70F9D2D797908754D
● The Makefile must also have this identifier (in the first TODO block).
● You have deleted all .o files and your executable(s). Your Makefile should include a rule or
rules that cause make clean to accomplish this.
● Your makefile is called Makefile. To confirm that your Makefile is behaving appropriately, check
that “make -R -r” builds your code without compiler errors and generates an executable file
called silly. (Note that the command line options -R and -r disable automatic build rules,
which will not work on the autograder).
● Your Makefile specifies that you are compiling with the gcc optimization option -O3 (This is the
letter “O,” not the number “0”). This is extremely important for getting all of the performance
points, as -O3 can speed up code by an order of magnitude.
● Your test files have names of the form test-n-table-commands.txt, and no other project
file names begin with “test.” Up to 15 test files may be submitted.
● The total size of your program and test files does not exceed 2MB.
● You don’t have any unneeded files in your submit directory.
● Your code compiles and runs correctly using version 6.2.0 of the g++ compiler. This is available
on the CAEN Linux systems (that you can access via login.engin.umich.edu). Even if everything
seems to work on another operating system or with different versions of gcc, the course staff will
not support anything other than gcc 6.2.0 running on Linux. To compile with g++ version 6.2.0
on CAEN you must put the following at the top of your Makefile (or use our provided
Makefile):
PATH := /usr/um/gcc-6.2.0/bin:$(PATH)
LD_LIBRARY_PATH := /usr/um/gcc-6.2.0/lib64
LD_RUN_PATH := /usr/um/gcc-6.2.0/lib64
Turn in the following files:
● All of your .h and .cpp files for the project
● Your Makefile
● Your test files
12 EECS 281
You must prepare a compressed tar archive (.tar.gz file) of all of your files to submit to the autograder.
One way to do this is to have all of your files for submission (and nothing else) in one directory. Go into
this directory and run this command:
% dos2unix *; tar cvzf submit.tar.gz *.cpp *.h Makefile test*.txt
to prepare a suitable file in your working directory.
Submit your project files directly to either of the two autograders at: https://g281-1.eecs.umich.edu or
https://g281-2.eecs.umich.edu. Note that when the autograders are turned on and accepting
submissions, there will be an announcement on Piazza. The auto graders are identical and your daily
submission limit will be shared (and kept track of) between them. You may submit up to two times per
calendar day with autograder feedback. For this purpose, days begin and end at midnight (Ann Arbor
local time). We will count only your best submission for your grade. If you would instead like us to
use your LAST submission, see the autograder FAQ page, or use this form.
We strongly recommend that you use some form of revision control (ie: SVN, GIT, etc) and that you
“commit” your files every time you upload to the autograder so that you can always retrieve an older
version of the code as needed. Please refer to your discussion slides and CTools regarding the use of
version control.
Please make sure that you read all messages shown at the top section of your autograder results!
These messages often help explain some of the issues you are having (such as losing points for having
a bad Makefile or why you are segfaulting). Also be sure to note if the autograder shows that one of
your own test files exposes a bug in your solution (at the bottom).
Libraries and Restrictions
The use of the C/C++ standard libraries is highly encouraged for this project, especially functions in the
header and container data structures. The smart pointer facilities, and thread/atomics
libraries are prohibited. As always, the use of libraries not included in the C/C++ standard libraries is
forbidden.
Grading
● 80 points — Your grade will be derived from correctness and performance (runtime). Details will
be determined by the autograder.
● 10 points — Test file coverage (effectiveness at exposing buggy solutions).
● 10 points — No memory leaks. Make sure to run your code under valgrind before each submit.
(This is also a good idea because it will let you know if you have undefined behavior, which will
cause your code to crash on the autograder.)
When you start submitting test files to the autograder, it will tell you (in the section called “Scoring
student test files”) how many bugs exist, the number needed to start earning points, and the number
needed for full points. It will also tell you how many are needed to start earning an extra submit/day!
13 EECS 281
Checkpoint
This project has a few test cases named CP* that represent a checkpoint. The checkpoint involves
implementing base functionality for several commands, and you should have it completed within 10
days of receiving the project specification (less in the Spring; about halfway between receiving this
document and the final due date).
The checkpoint test cases will only be testing a subset of the functionality of your code. The
functionality we will be testing are listed as follows:
● # (comment) Used in all 3 checkpoints
● CREATE Used in all 3 checkpoints
● INSERT INTO Used in checkpoints 2 and 3
● PRINT FROM … ALL Used in checkpoints 2 and 3
● REMOVE Used in all 3 checkpoints
● QUIT Used in all 3 checkpoints
For example, checkpoint 1 has a comment, a few create and remove commands, and quits. None of
the commands produce errors.
Test Cases
All test cases on the autograder test the QUIT command, because all valid input files must end with it.
Many include comment(s) so that we know what the test case is doing. Most of the test case names on
the autograder are fairly self-explanatory, but here’s a guide:
Appendix: The test case from Appendix A (see below)
CP*: Checkpoint cases (see above)
CREATE*: Primarily tests the CREATE command, but also PRINT and REMOVE
INSERT*: Primarily tests INSERT, but also needs CREATE, PRINT, and REMOVE
DELETE*: Primarily DELETE; also needs CREATE, INSERT; sometimes REMOVE and GENERATE
GENERATE*: Primarily GENERATE, needs CREATE, INSERT, PRINT; sometimes REMOVE,
DELETE
JOIN*: Primarily JOIN, but also needs CREATE, and INSERT; some test REMOVE and GENERATE
PRINT*: Primarily tests PRINT, but also needs CREATE, INSERT, and DELETE
REMOVE*: Primarily tests REMOVE, but also needs CREATE, INSERT, and sometimes GENERATE
Short*: Tests all commands, “short” only with respect to Medium* and Long*; always run in quiet mode
Medium*: Tests all commands; always run in quiet mode
Long*: Tests all commands; always run in quiet mode
CP*: Tests the commands listed in the checkpoint section above.
14 EECS 281
Appendix A: Example from the Spec
Given the following as input:
#Awesome Spec Example!
CREATE 281class 3 string string bool emotion person Y/N
INSERT INTO 281class 8 ROWS
happy Darden true
stressed students false
busy office_hours true
stressed students true
stressed Paoletti true
happy Darden true
happy Sith true
victorious Sith true
DELETE FROM 281class WHERE person = Darden
GENERATE FOR 281class hash INDEX ON emotion
PRINT FROM 281class 2 person emotion WHERE Y/N = true
CREATE pets 3 string bool bool Name likes_cats? likes_dogs?
INSERT INTO pets 2 ROWS
Sith true true
Paoletti true false
JOIN pets AND 281class WHERE Name = person AND PRINT 3 Name 1 emotion 2 likes_dogs? 1
REMOVE pets
REMOVE 281class
QUIT
The output should be: (prompt characters and commands not included)
New table 281class with column(s) emotion person Y/N created
Added 8 rows to 281class from position 0 to 7
Deleted 2 rows from 281class
Created hash index for table 281class on column emotion
person emotion
office_hours busy
students stressed
Paoletti stressed
Sith happy
Sith victorious
Printed 5 matching rows from 281class
New table pets with column(s) Name likes_cats? likes_dogs? created
Added 2 rows to pets from position 0 to 1
Name emotion likes_dogs?
Sith happy true
Sith victorious true
Paoletti stressed false
Printed 3 rows from joining pets to 281class
Table pets deleted
Table 281class deleted
Thanks for being silly!
15 EECS 281
Appendix B: Variant Types and You
One of the most difficult parts of implementing an efficient database is handling the problem of
heterogenous rows, or the fact that rows in the tables contain multiple types. If you wanted to store your
rows as vector, what would T be? int? string? bool? The answer, in reality, is that you must be
able to store any of the valid types in that row. Unfortunately the current C++ standard does not give us
a good way of implementing these types of heterogeneous containers when the types of each column
aren’t known at compile time (check out std::tuple<> for when you do!).
To remedy this, the course staff has created a special class for you to store in your tables, aptly named
TableEntry. This TableEntry is an implementation of what is known as a variant type, as it can be
a variety of different types. While we could modify the TableEntry class to contain arbitrary types, for
simplicity we have limited the types that can be represented to the set of those that are valid to appear
in this project. Most of how to use the TableEntry should be intuitive from the comments in the
header file. In addition to that, see the below example for further instructions.
The one major downside to our implementation of TableEntry is that it doesn’t play nicely when you
try to compare two TableEntry objects that contain different types, or try to compare a TableEntry
with a type other than the one it represents internally. To combat that we’ve placed assertions in key
locations such that if you do not compile with -DNDEBUG (by using make debug with the provided
Makefile), these assertions will fire rather than let you have undefined behavior. If you’re getting
weird results with this type, try running under debug conditions and see if an assertion fires. We
attached a comment to the assertions so that you can see what the issue is. In order to see where the
issue stems from, you must use a debugger and read the stack trace to find the line of your code that
calls the offending method. We suggest against delving into the implementation of this class; it’s pretty
complex.
An example:
#include “TableEntry.h”
#include
#include


