Description
CSCI 330 Assignment 1 (Java Exercise: Data Mining) Solved
Goals of the Assignment:
The purpose of this assignment is to:
1. understand how to process data in a regular text file using Java (this is a simple
example of data mining) and
2. make sure that you did not forget how to use Java
Description of Input File and Problem Statement
The name of the input file is StockmarketInput.txt, which is available in the Assignment
section of your Canvas. This input file contains stock information for the last 25 years
(1990-2015) for the following seven big technology companies:
1. Apple (AAPL)
2. Facebook (FB)
3. Google (GOOG)
4. International Business Machine (IBM)
5. Intel Corporation (INTC)
6. Microsoft (MSFT)
7. Yahoo (YHOO)
This data is publicly available. The data is sorted by ticker symbol and date.
Each line of the StockmarketInput.txt file contains the data for a particular stock (ticker
symbol) for one day. The values are separated by a tab(‘\t’).
Here is an example from the data for Microsoft (ticker symbol MSFT):
MSFT 8/18/2004 26.93 27.5 26.89 27.46 58844000 19.46
MSFT 8/17/2004 27.22 27.38 26.98 27.05 56879700 19.17
MSFT 8/16/2004 27.03 27.2 26.96 27.09 54347200 19.19
MSFT 8/13/2004 27.01 27.25 26.98 27.02 43333200 19.14
MSFT 8/12/2004 27.23 27.31 26.86 26.88 50279700 19.05
MSFT 8/11/2004 27.39 27.51 27.2 27.41 53097300 19.42
There are eight fields in each line. A description of each value of the first line (yellow
marked) is given below. All other lines of this example, as well as the entire
StockmarketInput.txt, will have similar meanings.
1. ticker symbol (MSFT),
2. date (8/18/2004),
3. opening price (26.93),
4. high price (27.50),
5. low price (26.89),
6. closing price (27.46),
7. volume or number of shares traded on that day (58844000 shares), and
8. adjusted closing price (19.46).
The opening and closing prices are the prices of the first and last trade, respectively, on the
given date. The high and low prices are the highest and lowest prices of any trade during
the day. The adjusted price includes corrections to the price to make comparisons across
days easier.
What to do
1. For each company, identify the crazy days in terms of stock price fluctuation. Here is
how to calculate the crazy days.
Suppose at Date D, the high price of the share of company X is Hx, and Low price is Lx. Then D
will be called crazy day if
(Hx-Lx)/Hx >= 15%
Let’s consider the example below (yellow marked), which is taken from
StockmarketInput.txt.
YHOO 4/5/2000 162 169.88 158.5 165.56 27371800 82.78
YHOO 4/4/2000 165 171 132.75 167.38 42528400 83.69
YHOO 4/3/2000 168.75 173 159.38 160.13 19322800 80.06
Notice that, for 4/4/200, the high price of the share for Yahoo was $171, and the low price
was $132.75. The fluctuation of the stock price is (171-132.75)/171=22.37%, which is
more than 15%. Hence, 4/4/2000 is considered as a crazy day for Yahoo.
For this assignment, identity all crazy days for each company. Also, identify the craziest
day for that particular company, i.e., on which day of the entire history of the stock price,
the stock price fluctuated most.
2. Stock split:
Occasionally, a stock may split by some ratio, say 2-for-1. This means that every old share is
replaced by some number of new shares, in these case two. Each new share is worth some
fraction of the value of the old share. In this case, each is worth half the old value. Thus, the
value of each person’s holding will remain unchanged.
We recognize such a stock split in the data when the closing value of day “x” is twice the
opening value on the day “x + 1”, the next trading day. There is a complication at this point
because the price of the first trade of the new day does not have to be the same as the price
of the last trade of the prior day. The new opening price does not have to be exactly half of
the previous day’s closing price.
Let’s consider the example below (yellow marked), which is taken from
StockmarketInput.txt.
AAPL 3/1/2005 44.99 45.11 44.16 44.5 117047000 5.94
AAPL 2/28/2005 44.68 45.14 43.96 44.86 162902600 5.99
AAPL 2/25/2005 89.62 89.91 88.19 88.99 228877600 5.94
AAPL 2/24/2005 88.48 89.31 87.73 88.93 379757000 5.94
Notice that the closing price on 2/25/2005 was $88.99, and the opening price on
2/28/2005 was $44.68, giving a ratio of 88.99/44.68 = 1.991, which is slightly lower than 2.
Nevertheless, AAPL stock experienced a 2:1 stock split on 1997.04.02, and the small
variation is due to the initial trading activity on 2/28/2005as described above.
To accommodate such variations, we use the following definition, where Cx is the closing
price on day x and Ox+1 is the opening price on the next trading day x + 1.
Definitions:
A 2:1 stock split occurs if | Cx / Ox+1 – 2.0| < 0.20
A 3:1 stock split occurs if | Cx / Ox+1 – 3.0| < 0.30
A 3:2 stock split occurs if | Cx / Ox+1 – 1.5| < 0.15
Reminder: |x| represents the absolute value of the number x.
For this assignment, you have to detect all splits of ratios 2:1, 3:1, and 3:2 using the
above formulas.
Note that “next day” when computing the closing/opening ratio is the next trading day,
which may not be the next day on the calendar due to weekends, holidays, etc. For this
assignment, adjacent lines in StockmarketInput.txt represent adjacent trading days.
Output
Your java program should read StockmarketInput.txt and output all splits, as defined
above, to System.out. The output file named StockmarketOutput.txt is available on canvas.
You have to follow the same format, as shown in the output file.
Note that the split is attributed to the day with the higher closing price. The two prices
are the closing price for the day of the split and the opening price for the next trading day.
StockmarketInput.txt is ordered in descending chronological order, which means that the
data for tomorrow comes before the data for today.
Good Design and Style
Although bad design and style will not hurt the performance of your program, it is of
utmost importance, especially when you will work in the real world with other
professionals, to follow good design and style for coding.
It will be helpful not only for your
colleagues to understand your code better but also for you as well when you will revisit
your code later.
Here are some guidelines for good design and style:
The program is logically organized so that a reader can quickly and easily
understand the program structure.
Functions and variables have meaningful names.
The program has a proper indentation.
The program follows an object-oriented approach.
Point Distribution
Grading Criteria Points
The program gives the correct output 14
The program has good design and style 2
The write up is complete (YourLastName-Assignment1.pdf) 2
The output format is correct 1
The filename format is correct 1
Submission Instructions
You have to submit the following on Canvas.
1. Java source code.
▪ The file name containing the main method should be
YourLastNameAssignment1.java.
▪ If you have more than one java files:
• Please keep all java files in a single folder.
• Zip the folder and name it YourLastName-Assignment1.zip.
2. A pdf file (Filename format: YourLastName-Assignment1.pdf) with the following
write up (If you have zip file for your java source code in the previous step,
please DON’T include PDF file in your zip file):
▪ Your full name
▪ Does your program compile correctly? Yes/No
▪ Does your program run correctly? Yes/No
▪ An acknowledgment and discussion of any parts of the program that are not
working. Failure to disclose obvious problems will result in additional penalties.
▪ An acknowledgment and discussion of any parts of the program that appear to
be inefficient (in either time or space complexity).
▪ How many hours did you spend on completing this assignment?
▪ Your expected score (please fill out the following table)
Grading Criteria Points Possible Your Expected Score
The program gives the correct output 14
The program has good design and style 2
The write up is complete (YourLastNameAssignment1.pdf)
2
The output format is correct 1
The filename format is correct 1
Upload the following two files on canvas
1. YourLastNameAssignment1.java or YourLastName-Assignment1.zip
2. YourLastName-Assignment1.pdf
Late Policy:
No late work will be accepted.
Academic Dishonesty
Copying files from someone else and claiming they are your own is plagiarism.
Providing files that you created to another student or being party to such actions also
amounts to academic dishonesty.
CSCI 330 Assignment 2 (Stock Investment Strategy) Solved
1. Goal of the assignment
The purpose of this assignment is to use SQL to access a database from a Java program.
2. Problem Statement
Using a database of stock price data, write a Java program that computes the gain or loss
from the trading strategy described below (see the section titled “Processing”).
2.1 ConnectionParameters.txt
For this assignment, you will need to access the database called johnson330, hosted at
mysql.cs.wwu.edu. You should have read-only access to the johnson330 database. We
have already provided your credentials to access the database on that server.
Section 5.1 of
our book discussed how to access the database from java. To connect to the database, you
need an external text file named ConnectionParameters.txt.
There are two good reasons for moving the connection parameters to an external text file:
1. Security. If you hardcode the connection parameters into your program, the
password is vulnerable to some clever person decoding the executable. Also,
disassembler programs (translate machine language to assembly language) do all
the hard work of doing this for you.
2. Testing. You do not have to change a program to move it from a test environment to
an operational environment. Moving the connection parameters to an external file
enables this.
If you are running your program from a lab computer, ConnectionParameters.txt should
look as follows:
dburl=jdbc:mysql://mysql.cs.wwu.edu/johnson330
user=your_user_name
password=your_password_we_provided
Please ensure that this external text file (ConnectionParameters.txt) doesn’t contain any
extra lines at the end.
2.2 Remote Database Connection
To connect to the remote database server (mysql.cs.wwu.edu) from off-campus computers
(e.g., your laptop), first, you will have to install VPN (Virtual Private Network). Here are the
details on how to do that: https://atus.wwu.edu/kb/vpn-virtual-private-network
Second, you will have to set up SSH Tunneling using the following command in the
terminal/command prompt/PowerShell.
ssh -N -p922 –L4321:mysql.cs.wwu.edu:3306 your_user_name@proxy.cs.wwu.edu
It will ask for a password; please use your CS password (which you use to log in to any lab
computer). Here is a screenshot from Windows Command prompt.
Please notice that after you enter the password, the cursor will go to the next line and
start blinking. It means that the connection is successful. There will be no additional
message.
Since you are running your program from an off-campus computer, you need to change
ConnectionParameters.txt in the following way:
dburl=jdbc:mysql://127.0.0.1:4321/johnson330
user=your_user_name
password=your_password_we_provided
3. Database Schema
The database schema is as follows:
Tables Columns Primary Key Foreign keys
company Ticker
Name
Industry
Location
Ticker
pricevolume Ticker
TransDate
OpenPrice
Ticker
TransDate
Ticker
HighPrice
LowPrice
ClosePrice
Volume
AdjustedClose
dividend Ticker
DivDate
Amount
Ticker
DivDate
Ticker
We only need to use the first two tables (company and pricevolume) for this
assignment.
4. Processing
Your program should proceed as described below.
1 Connect to the database.
2 Repeat
2.1 Request a ticker symbol and optional start and end dates from System.in. The
loop (step 2) exits, and the program terminates when an empty string, or a string
containing only spaces, is submitted as input.
2.2 Retrieve the full company name from the company table and print it on the
console. If the company is not found, indicate that the stock is not in the database
and start the loop again by requesting user input.
2.3 Retrieve all the pricevolume data in the input data range for the ticker. If no
dates were specified, retrieve all pricevolume data. Because the first analysis phase
involves adjusting for splits, it is useful to request the data in reverse chronological
order. For example, to retrieve the data for all dates for a ticker symbol INTC, you
could use the following SQL:
select * from pricevolume where Ticker = ‘INTC’ order by TransDate DESC
2.4 To prepare for the investment strategy computation (2.6 and following), scan
the data in reverse chronological order, and identify stock splits:
• A 2:1 stock split occurs if | Cx / Ox+1 – 2.0| < 0.20
• A 3:1 stock split occurs if | Cx / Ox+1 – 3.0| < 0.30
• A 3:2 stock split occurs if | Cx / Ox+1 – 1.5| < 0.15
Where Cx is the closing price on day x, and Ox+1 is the opening price on the next
trading day x + 1.
2.5 To adjust for splits on a given day (meaning the split occurs between that day
and the next day), all price data for the given day and earlier must be divided by 2
(or 3 or 1.5 depending on the split ratio).
Each row of the pricevolume table
represents one trading day, so the open, high, low, and close prices for that day must
be adjusted.
Note that after adjusting all price data on the given day, the algorithm must continue
scanning to detect splits in the adjusted data. If another 2:1 split appears, for
example, then earlier data, already adjusted for the first split, would again be divided
by 2.
You should be able to accomplish all adjustments in one pass over the data by keeping
track of the total divisor. Initialize the divisor to one and adjust it upward as you
encounter splits.
2.6 From this point forward, all references to price data refer to the adjusted data
from the previous step. With the adjusted data stored in your program, scan forward
in time to implement the following investment strategy. In the remaining steps,
• d will refer to a trading day
• (d+1) will refer to the next trading day
• (d-1) will refer to the prior trading day
• close(d) closing price for day d.
• open(d) opening price for day d and so on
2.7 Maintain a moving average of the closing prices over a 50-day window. So for
a given trading day d, the 50-day average is the average closing price for the 50
previous trading days (days d-50 to d-1).
2.8 If there are less than 51 days of data, do no trading and report a net gain of
zero and repeat from step 2 to get the next user input.
2.9 If there are more than 51 days of data, compute 50-day average for the first
fifty days. Proceeding forward from day 51 through the second-to-last trading day in
the data set, execute the following strategy:
• Track current cash and shares, both of which start at zero. When buying stock,
cash decreases, and shares increase. When selling stock, cash increases, and
shares decrease. Since cash starts at zero, we must borrow money to buy the
initial shares. Disregard this complication.
• (Buy criterion) If the close(d) < 50-day average and close(d) is less than open(d)
by 3% or more (close(d) / open(d) <= 0.97), buy 100 shares of the stock at price
open(d+1).
• (Sell criterion) If the buy criterion is not met, then if shares >= 100 and open(d) >
50-day average and open(d) exceeds close(d-1) by 1% or more (open(d) / close(d1) >= 1.01), sell 100 shares at price (open(d) + close(d))/2.
• (Transaction Fee) For either a buy or sell transaction, cash is reduced by a
transaction fee of $8.00.
• If neither the buy nor the sell criterion is met, do not trade on that day.
• Regardless of trading activity, update the 50-day average to reflect the average
over the last 50 days, and continue with day d+1.
2.10 After having processed the data through the second-to-last day, if there are
any shares remaining, on the last day, add open(d) * shares remaining to cash to
account for the value of those remaining shares (No transaction fee applies to this).
5. Sample Output
Database connection is established.
Enter a ticker symbol [start/end dates]: INTC
Intel Corp.
2:1 split on 2000.07.28 129.12 –> 65.44
2:1 split on 1999.04.09 130.81 –> 61.62
2:1 split on 1997.07.11 153.81 –> 77.25
2:1 split on 1995.06.16 116.12 –> 58.50
2:1 split on 1993.06.04 112.75 –> 60.12
3:2 split on 1987.10.28 31.75 –> 21.75
6 splits in 7470 trading days
Executing investment strategy
Transactions executed: 690
Net cash: 14717.72
Enter ticker symbol [start/end dates]: INTC 1980.01.01 1999.12.31
Intel Corp.
2:1 split on 1999.04.09 130.81 –> 61.62
2:1 split on 1997.07.11 153.81 –> 77.25
2:1 split on 1995.06.16 116.12 –> 58.50
2:1 split on 1993.06.04 112.75 –> 60.12
3:2 split on 1987.10.28 31.75 –> 21.75
5 splits in 3791 trading days
Executing investment strategy
Transactions executed: 358
Net cash: 44953.95
Enter ticker symbol [start/end dates]: T
AT&T Inc
2:1 split on 1998.03.19 83.75 –> 42.12
2:1 split on 1993.05.25 74.75 –> 37.62
3:1 split on 1987.05.22 102.50 –> 36.00
3 splits in 7470 trading days
Executing investment strategy
Transactions executed: 260
Net cash: 2028.67
Enter ticker symbol [start/end dates]: T 2000.01.01 2014.08.18
AT&T Inc
0 splits in 3679 trading days
Executing investment strategy
Transactions executed: 148
Net cash: -1568.00
Enter ticker symbol [start/end dates]: BAC
Bank of America Corp
2:1 split on 2004.08.27 89.01 –> 44.79
2:1 split on 1997.02.27 122.50 –> 61.25
2:1 split on 1986.11.20 42.62 –> 21.50
3 splits in 7116 trading days
Executing investment strategy
Transactions executed: 534
Net cash: 41846.00
Enter ticker symbol [start/end dates]: XX
XX not found in database.
Enter ticker symbol [start/end dates]:
Database connection closed
6. Constraints
1. You must use PreparedStatements when you are using SQL statements where values
are filled in. This is a good practice for avoiding SQL injection attacks (A HUGE security
issue).
2. You will likely need two SQL statements for reading the stock trading data from the
database (step 2.3), one for no date range specified and one for a specified date range.
You do not need to duplicate any remaining logic (2.4 and following from section 4) to
handle those two separate conditions.
3. To help with round-off-error discrepancies, (a) do all computations with doubles, not
floats, and (b) use the code in the following table:
When the description says: Use the following:
value <= 0.97 value < 0.97000001
value >= 1.01 value > 1.00999999
7. Hints
1. If you are having problems getting a connection established, here are some possible
problems to check:
a) If your program terminates with an SQLException with a message of the form
“Access denied for user … (using password: YES),” that is most likely a problem
with the userid or password. Check to ensure that these are correctly specified in
your connection parameters file.
b) If your program terminates with an SQLException with a message of the form
“Access denied for user … to database …”, that most likely means that the userid
and password are valid, but the database name (johnson330) is wrong, or the
given userid doesn’t have permission to access that database. Make sure you’re
using your userid.
c) If your program terminates with an SQLException with a message of the form
“Communications link failure,” that is a problem communicating with the
database server. Check to ensure that the server is correctly specified and that
you have the needed Internet connectivity.
2. Dates are stored in the database as character strings, not SQL date types. However,
the date format (YYYY.MM.DD) means that string comparisons also give the correct
answer as date comparisons. Therefore, for this assignment, you don’t need to
decompose database dates; just continue to treat them as strings.
8. Good Design and Style
Although bad design and style will not hurt the performance of your program, it is of
utmost importance, especially when you will work in the real world with other
professionals, to follow good design and style for coding. It will be helpful not only for your
colleagues to understand your code better but also for you as well when you will revisit
your code later.
Here are some guidelines for good design and style:
• The program is logically organized so that a reader can quickly and easily
understand the program structure.
• Functions and variables have meaningful names.
• The program has a proper indentation.
• The program follows an object-oriented approach.
9. Point Distribution
Grading Criteria Points
The program gives the correct output 48
The program obeys the constraints 5
The program has good design and style 2.5
The write up is complete (YourLastName-Assignment2.pdf) 2.5
The output format is correct 1
The filename format is correct 1
For the correct output (48 points), we will check the output of your program with six
separate companies (8 points for each company). Suppose, if we check your program with
INTC, then we will distribute the 8 points in the following way:
10. Submission Instructions
• You have to submit the following on Canvas.
1. Java source code.
▪ The file name containing the main method should be
YourLastNameAssignment2.java.
▪ If you have more than one java file:
• Please keep all java files in a single folder.
• Zip the folder and name it YourLastName-Assignment2.zip.
2. A pdf file (Filename format: YourLastName-Assignment2.pdf) with the following
write-up (If you have a zip file for your java source code in the previous step,
please DON’T include the PDF file in your zip file):
▪ Your full name
▪ Does your program compile correctly? Yes/No
▪ Does your program run correctly? Yes/No
▪ An acknowledgment and discussion of any parts of the program that are not
working. Failure to disclose obvious problems will result in additional penalties.
▪ An acknowledgment and discussion of any parts of the program that appear to
be inefficient (in either time or space complexity).
▪ How many hours did you spend on completing this assignment?
▪ Your expected score (please fill out the following table)
Grading Criteria Points Your Expected Score
The program gives the correct output 48
The program obeys the constraints 5
The program has good design and style 2.5
The write up is complete (YourLastNameAssignment2.pdf)
2.5
The output format is correct 1
The filename format is correct 1
• Upload the following two files on canvas
1. YourLastNameAssignment2.java or YourLastName-Assignment2.zip
2. YourLastName-Assignment2.pdf
11. Debugging Tips
1. Since this assignment deals with millions of data points, it is tough to determine the
cause of the error if you get an incorrect output. Hence, for this assignment (and for
the next assignment as well), you need to know how to debug the java code
efficiently. Printing messages on the console might not be sufficient to find the
problem.
There are tons of tutorials and YouTube videos available on the web to learn how to
use the debugger for Eclipse, jGrasp, IntelliJ, and jdb. Please feel free to look at the
tutorials to learn how to do the debugging using your favorite java editor.
2. There are two log files (in .txt format) available on canvas. Besides using the
debugging tool, you may look at these log files to find the problem if your program
does not provide the correct output. These two log files are as follows:
• transLog-INTC-1980.01.01-1999.12.31.txt
• transLog-T-All Dates.txt
The first log file contains the stock data for Intel Corporation from Jan 1, 1980 to
December 31, 1999 and the second log file contains the stock data for AT&T for all
dates available. Each line of these two log files has the following information:
1. date
• Example: 1985.01.02
2. opening price
• Example: open: 1.1666667
3. high price
• Example: high: 1.1770833
4. low price
• Example: low: 1.1354167
5. closing price
• Example: close: 1.1458333
6. 50-day average [After first fifty days]
• Example: average 1.2229167
7. Buy stocks [If buy condition fulfills]
• Example: Buy: 1985.03.26 100 shares @ 1.0675000, total shares = 100, cash
= -114.7500000
8. Sell stocks [If sell condition fulfills]
• Example: Sell: 1999.11.22 100 shares @ 80.6900000, total shares = 0, cash =
44319.9479167
9. Final Sale [At the end of the file]
• Example: Final sale: 1999.12.31 0 shares @ 83.8100000, cash =
44953.9479167 (average = 77.8544000)
12. Late Policy
No late work will be accepted.
13. Academic Dishonesty
• Copying files from someone else and claiming they are your own is plagiarism.
• Providing files you created to another student or being party to such actions also
amounts to academic dishonesty.
CSCI 330 Homework 2 (Basic SQL) Solved
Goals
The goals of this homework are as follows:
To become familiar with basic SQL commands in MySQL.
o To create a database
o To insert data into tables
o To write basic SQL queries
The SQL commands from chapter 3 will be sufficient for this homework.
We will use a software called “MySQL Workbench” to write the SQL query in our
class. We recommend installing MySQL workbench on your computer. There are
numerous online tutorials and YouTube videos available to learn how to install and
use MySQL workbench. Some examples:
o https://www.youtube.com/watch?v=OM4aZJW_Ojs&ab_channel=WebDevSi
mplified (For Windows)
o https://www.youtube.com/watch?v=-BDbOOY9jsc&ab_channel=AmitThinks
(For Mac)
o https://www.youtube.com/watch?v=iLRY-NfXXsA&ab_channel=AmitThinks
(for Ubuntu)
If you prefer not to install MySQL workbench on your computer, you may use a freely
available online editor. Some examples:
o https://paiza.io/en/languages/mysql
o https://extendsclass.com/mysql-online.html
What to do
1. Create a University Database: Use the SQL script (create_university.sql), which is
available on canvas, to create the following university database (see figure 1).
Figure 1: University database
2. Populate the university database with Data: Use the SQL script
(insert_university.sql), which is available on canvas, to populate the seven tables (see
below) of the university database
department
instructor
course
student
section
teaches
Takes
3. Write SQL queries for the following (2*8 = 16 points)
a. Find courses that taught either in Fall 2009 or in Spring 2010.
b. Find all instructors earning the highest salary (there may be more than one with the
same salary).
c. Find names and average salaries of all departments whose average salary > 42000
d. For each department, find the maximum salary of instructors in that department.
You may assume that every department has at least one instructor.
e. Find the names of all students who have taken any Comp. Sci. course ever (there
should be no duplicate names)
f. Find the enrollment of each section that was offered in Spring 2009.
g. Find the maximum enrollment, across all sections, in Spring 2009.
h. Delete all courses that have never been offered (that is, do not occur in the section
relation).
Submission Instructions
Please put all of your SQL queries and their corresponding results (in table format)
in one PDF file.
The file name should be YourLastName-CSCI330-HW2.pdf.
Upload the pdf file on canvas.
Late Policy
No late work will be accepted.
CSCI 330 Database Systems Homework 3 Solved
Goal
The goal of this homework is to learn and practice more SQL commands (both basic and
intermediate) for MySQL.
What to do
1. We will use a publically available database named Chinook
https://github.com/lerocha/chinook-database
Below is the schema of this database:
2. Download the SQL script (Chinook_MySql.sql) available on canvas. We collected this SQL
script from the website. Execute this script to create the Chinook database and insert
values for the tables. It might take several minutes to finish executing the entire script
(~16K lines of SQL code).
3. Write SQL queries for the following.
a. Find distinct track names that start with “Z.” Sort the output alphabetically. (3
points)
b. Find the first names of the employees who are older than their supervisor. Hint:
ReportsTo attribute in Employee table stores the EmployeeId of the supervisor.
Sort the output alphabetically. (3 points)
c. Find the name of the highest-priced track. If more than one track has the highest
price, return the names of all such tracks. Sort the output alphabetically based on
the track name. (3 points)
d. Find a list containing the total amount spent by a customer. Include the
customer’s id and the last names along with the total amount. For customers
who did not make any purchase, make sure to include them as well. (3 points)
e. Find the title of the highest-priced album. (3 points)
f. Find a distinct list containing the titles of albums that are never sold. Consider an
album never sold if none of its tracks are sold. Sort the output alphabetically. (3
points)
g. Create a view that returns customers’ first and last names along with
corresponding sums of all their invoice totals. Name the view as
“CustomerInvoices.” (2 points)
What to submit
You have to submit all the SQL queries and the output of the query. If the output contains
more than six tuples, then give the top six tuples.
Submission Instructions
Put all SQL queries and output in one single doc/docx file.
Convert the file to a pdf file. The file name should be YourLastName-330-HW3.pdf.
Upload the pdf file on canvas.
Late Policy:
No late work will be accepted.
Questions?
If you have any questions, please first check the FAQ page on canvas to see if the question is
already answered.
CSCI 330 Database Systems Homework 4 Solved
Goal
To design databases (E-R diagram and schemas) discussed in chapter 7.
Q1 (10 points)
Suppose an automobile company (e.g., Toyota) hired you to design a database to assist
• Its dealers for maintaining customer records and dealer inventory
• Sales staff in ordering cars.
The database needs to store information about
• Brands (e.g., Toyota, Lexus)
• Models (e.g., Camry, Rav4)
• Options (e.g., Basic, premium, Prestige)
• Individual dealers (e.g., Wilson Motors)
• Customers
• Vehicles/Cars
You need to remember the following information:
• Each brand is identified by brand name (e.g., Toyota, Lexus).
• Each model has model_id and name. Each model is identified by the model_id.
• Each model must be associated with a brand. i.e., there is no model without a brand.
• Each vehicle is identified by VIN (vehicle identification number).
• Each vehicle must be associated with a model, i.e., there is no vehicle without a
model.
• Each model may have many options. For instance, the Rav4 model has three options
(e.g., LE, XLE, LE Hybrid).
• Each option can have option_id and specification. Each option is identified by
option_id.
• Each dealer has dealer_id, name, and address. Each dealer is identified by dealer_id.
• Each customer has customer_id, name, and address. Each customer is identified by
customer_id.
• Each vehicle may be associated with a dealer.
• Each vehicle may be associated with a customer.
(7 points) Draw an E-R diagram to represent the database mentioned above. You must
keep the following in mind.
• There are 6 strong entity sets and NO weak entity sets. Some entity sets might
have a single attribute.
• There are 12 attributes and all of them are simple and single-valued attributes.
• There are 5 binary relationship sets. There are no other types of relationship sets
(e.g., ternary).
• NO relationship set has any descriptive attributes.
• There are 2 total participation.
• In your ER diagram, you will also have to do the following:
o Identify the attributes for each entity set. Please underline the primary key.
o Specify the cardinality (one to one, one to many, many to one, or many-many)
of the diagram.
(3 points) Determine the Relation Schemas from the E-R diagram. No optimization is
required.
Q2 (10 points)
Suppose a worldwide packet delivery company (e.g., FedEx) hired you to design their
database. The database needs to store information about
• customer
• packet
• place
You need to remember the following information:
• Each customer has a unique customer_id. Each customer also contains their name
and address.
• Each packet has packet_id and weight. Each packet is identified by its packet_id.
• Each place or location has place_id, city, country, and address. Each place can be
identified by place_id.
• Each packet must be associated with a place, i.e., there is no packet without a place.
• Each packet must be associated with a customer, i.e., there is no packet without a
customer. The association between customers and packets can be in one of the
following ways:
o Customer who sends packets. In this case, the customer provides the
time_sent information, i.e., when the packet was sent.
o Customer who receives packets. In this case, the customer provides the
time_recieved information, i.e., when the packet was received).
o Some customers may play the role of both sender and receiver.
(7 points) Draw an E-R diagram to represent the database mentioned above. You must
keep the following in mind:
• There are 3 strong entity sets and NO weak entity sets.
• There are 9 attributes for the entity sets and all of them are simple and singlevalued attributes.
• There are 3 binary relationship sets. There are no other types of relationship sets
(e.g., ternary).
• Two relationship has two descriptive attributes
• Two entity sets may be related by two separate relationship sets.
• There are 3 total participation.
• In your ER diagram, you will also have to do the following:
o Identify the attributes for each entity set. Please underline the primary key.
o Specify the cardinality (one to one, one to many, many to one, or many-many)
of the diagram.
(3 points) Determine the Relation Schemas from the E-R diagram. No optimization is
required.
What to submit
All E-R diagrams and Relation schemas.
Submission Instructions
• Put all E-R diagrams and Relation schemas in one single doc/docx file.
• Convert the file to a pdf file. The file name should be YourLastName-330-HW4.pdf.
• Upload the pdf file on canvas.
Late Policy
• No late work will be accepted




