## Description

The purpose of this assignment is to have you run R code and produce the numerical and graphical
summaries discussed in Chapter 1 of the Course Notes for randomly generated data.
Follow the steps in the Introduction to R and RStudio posted on Learn to install the software needed
for this course (see Section 1 – Introduction). To learn how to run R code, see Section 2 – Getting Started.

The MASS package can be installed using RStudio or by the commands given in the code below. (See
Section 4 – Summary Statistics.)
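
The Problem 1 listing below only loads MASS with library(MASS). A minimal sketch of a one-time install from the console is shown here; it assumes an internet connection and that the CRAN cloud mirror named in the code is reachable. MASS ships with most R distributions, so the install step is often skipped.

```r
# Install MASS only if it is not already available, then load it.
# Assumption: the CRAN cloud mirror below is reachable from your machine.
if (!requireNamespace("MASS", quietly = TRUE)) {
  install.packages("MASS", repos = "https://cloud.r-project.org")
}
library(MASS)  # provides truehist()
```
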
The code for this assignment is available both as a text file, RCodeAssignment1.txt, and as an R file,
RCodeAssignment1R.R, in the Assignment 1 folder in the Assignments folder
under Content on Learn.

Please see the instructions on the last page of this assignment
before you begin.

## Problem 1: Run the following R code.

###################################################################################
# Run this code only once
skewness<-function(x) {(sum((x-mean(x))^3)/length(x))/(sum((x-mean(x))^2)/length(x))^(3/2)}
kurtosis<- function(x) {(sum((x-mean(x))^4)/length(x))/(sum((x-mean(x))^2)/length(x))^2}
library(MASS) # truehist is in the library MASS
###################################################################################
###################################################################################
# Problem 1: R code for Gaussian data

id<-20456458
mu<-id-10*trunc(id/10) # mu = last digit of ID
sig<-max(1,trunc(id/10)-10*trunc(id/100)) # sig = second last digit of ID unless it is zero, in which case sig = 1
cat("mu = ", mu, ", sigma = ", sig) # display values of mu and sigma
set.seed(id)

yn<-sort(round(rnorm(200,mu,sig),digits=2)) # 200 observations from G(mu,sig)
yn[1:5] # display first 5 numbers in the data set
# display sample mean and standard deviation
cat("sample mean = ", mean(yn), ", sample standard deviation = ", sd(yn))
cat("five number summary: ",fivenum(yn)) # five number summary
cat("sample skewness = ", skewness(yn)) # sample skewness
cat("sample kurtosis = ", kurtosis(yn)) # sample kurtosis
# plot relative frequency histogram and superimpose Gaussian pdf
truehist(yn,main="Relative Frequency Histogram of Data")
# plot Empirical and Gaussian cdf's
plot(ecdf(yn),verticals=T,do.points=F,xlab="y",ylab="ecdf",main="")
title(main="Empirical and Gaussian C.D.F.'s")
#############################################################################
Verify that you obtain the following output and plots:

> yn[1:5] # display first 5 numbers in the data set
[1] -12.89 -5.67 -2.60 -1.54 -0.31
> # display sample mean and standard deviation
> cat("sample mean = ", mean(yn), ", sample standard deviation = ", sd(yn))
sample mean = 8.11465 , sample standard deviation = 4.812293
> cat("five number summary: ",fivenum(yn)) # five number summary
five number summary: -12.89 5.36 7.815 11.32 20.77
> cat("sample skewness = ", skewness(yn)) # sample skewness
sample skewness = -0.2029152
> cat("sample kurtosis = ", kurtosis(yn)) # sample kurtosis
sample kurtosis = 4.486426
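
One way to check that the skewness and kurtosis functions from the setup code were entered correctly is to apply them to a small symmetric vector where the answer can be worked by hand. This is only an illustrative check and not part of the assignment; the vector x_check is a made-up example. For 1, 2, 3, 4, 5 the deviations from the mean are -2, -1, 0, 1, 2, so the third central moment is 0, the second is 10/5 = 2, and the fourth is 34/5 = 6.8, which gives a kurtosis of 6.8/2^2 = 1.7.

```r
# Same definitions as in the setup code (repeated so this runs on its own)
skewness <- function(x) {(sum((x-mean(x))^3)/length(x))/(sum((x-mean(x))^2)/length(x))^(3/2)}
kurtosis <- function(x) {(sum((x-mean(x))^4)/length(x))/(sum((x-mean(x))^2)/length(x))^2}

x_check <- c(1, 2, 3, 4, 5)   # hypothetical test vector, symmetric about 3
skewness(x_check)             # 0 for this symmetric sample
kurtosis(x_check)             # 6.8 / 2^2 = 1.7
```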

## Problem 2: Run the following R code.

#################################################################################
# Problem 2: R code for Exponential data
set.seed(id)
mu<-max(1,id-10*trunc(id/10)) # mu = last digit of ID unless it is zero
ye<-sort(round(rexp(200,1/mu),digits=2)) # 200 observations from Exponential(1/mu)
ye[1:5] # display first 5 numbers in the data set

# display sample mean and standard deviation
cat("sample mean = ", mean(ye), ", sample standard deviation = ", sd(ye))
cat("five number summary: ",fivenum(ye)) # five number summary
cat("sample skewness = ", skewness(ye)) # sample skewness
cat("sample kurtosis = ", kurtosis(ye)) # sample kurtosis

# plot relative frequency histogram and superimpose Exponential pdf
truehist(ye,ymax=1/mean(ye),main="Relative Frequency Histogram of Data")
# plot Empirical and Exponential cdf's
plot(ecdf(ye),verticals=T,do.points=F,xlab="y",ylab="ecdf",main="")
title(main="Empirical and Exponential C.D.F.'s")
#Plot side by side boxplots
boxplot(yn,ye,col="cyan",names=c("Gaussian Data","Exponential Data"))
###############################################################################
Verify that you obtain the following output and plots.
> ye[1:5] # display first 5 numbers in the data set
[1] 0.01 0.13 0.18 0.24 0.26
> # display sample mean and standard deviation
> cat("sample mean = ", mean(ye), ", sample standard deviation = ", sd(ye))
sample mean = 7.9169 , sample standard deviation = 9.249768
> cat("five number summary: ",fivenum(ye)) # five number summary
five number summary: 0.01 2.07 5.095 11.12 90.52
> cat("sample skewness = ", skewness(ye)) # sample skewness
sample skewness = 4.198336
> cat("sample kurtosis = ", kurtosis(ye)) # sample kurtosis
sample kurtosis = 33.82573
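
The comments in the Problem 1 and Problem 2 code mention superimposing the theoretical pdf and cdf, but no such call appears in the listings as reproduced here. One way this could be done is sketched below, under the assumption that base R's curve() with add=TRUE is acceptable. The seed, parameters, colour, and line width are arbitrary illustrative choices, and the resulting plots may not match the ones posted for the assignment.

```r
library(MASS)                     # truehist()

set.seed(1)                       # hypothetical seed and parameters, for illustration only
mu_demo <- 8
sig_demo <- 5
y_demo <- rnorm(200, mu_demo, sig_demo)

# Relative frequency histogram with the Gaussian pdf drawn on top
truehist(y_demo, main = "Relative Frequency Histogram of Data")
curve(dnorm(x, mu_demo, sig_demo), add = TRUE, col = "red", lwd = 2)

# Empirical cdf with the Gaussian cdf drawn on top
plot(ecdf(y_demo), verticals = TRUE, do.points = FALSE,
     xlab = "y", ylab = "ecdf", main = "")
curve(pnorm(x, mu_demo, sig_demo), add = TRUE, col = "red", lwd = 2)
title(main = "Empirical and Gaussian C.D.F.'s")
```

For Exponential data, dexp(x, 1/mu) and pexp(x, 1/mu) would take the place of dnorm and pnorm.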

## Problem 3: Run the following R code.

#################################################################################
# Problem 3: R code for bivariate data
set.seed(id)
x<-round(runif(100,0,20),digits=1)
alpha<-mean(yn)
beta<-mean(ye)

# display values of alpha and beta
cat("alpha = ", alpha, ", beta = ", beta)
y<-round(alpha+beta*x+rnorm(100,0,beta*2),digits=1)
# display first 5 pairs of data
matrix(c(x[1:5],y[1:5]),nrow=5,ncol=2,byrow=F)
# display sample correlation

cat("sample correlation = ", cor(x,y))
plot(x,y,col="blue",main="Scatterplot of Data")
#################################################################################
Verify that you obtain the following output and plots:
> cat("alpha = ", alpha, ", beta = ", beta)
alpha = 8.11465 , beta = 7.9169
> y<-round(alpha+beta*x+rnorm(100,0,beta*2),digits=1)
> # display first 5 pairs of data

> matrix(c(x[1:5],y[1:5]),nrow=5,ncol=2,byrow=F)
     [,1]  [,2]
[1,]  1.1  24.1
[2,]  1.9  14.9
[3,]  8.5  64.1
[4,] 15.5 136.9
[5,] 19.6 156.0

> # display sample correlation
> cat("sample correlation = ", cor(x,y))
sample correlation = 0.9365159

Run the R code for the three problems above again, but first modify the line
"id<-20456458"
in Problem 1 by replacing the number 20456458 with your UWaterloo ID
number.
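
Before re-running the code, it can help to see which values of mu and sigma your ID will produce in Problem 1. The sketch below uses a hypothetical helper, id_params, which is not part of the assignment code; it computes the same digits as the original expressions using R's %% and %/% operators.

```r
# Hypothetical helper (not part of the assignment code): reports the mu and
# sigma that the Problem 1 code derives from a given ID number.
id_params <- function(id) {
  last_digit  <- id %% 10            # same value as id - 10*trunc(id/10)
  second_last <- (id %/% 10) %% 10   # same value as trunc(id/10) - 10*trunc(id/100)
  c(mu = last_digit, sig = max(1, second_last))
}

id_params(20456458)   # mu = 8, sig = 5 for the sample ID used above
id_params(20456400)   # made-up ID: mu = 0, sig = 1 (sig is never 0)
```

Problem 2 recomputes mu as max(1, last digit), so an ID ending in 0 uses mu = 1 there.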

When you run the R code with your ID number you will generate 6 new plots.
Export these 6 plots as .png files using RStudio (see Introduction to R and
RStudio, Section 6).
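
As an alternative to the RStudio Export menu, a plot can also be written to a .png file from code. The following is a minimal sketch, assuming your R build supports the png graphics device; the file name, image size, and demo data are hypothetical. If the course expects the RStudio export described in Section 6, follow that instead.

```r
# Write a plot to a .png file from code instead of the RStudio Export menu.
# The file name and the demo data are hypothetical.
out_file <- file.path(tempdir(), "problem1_histogram.png")

set.seed(1)
y_demo <- rnorm(200, 8, 5)

png(out_file, width = 800, height = 600)   # open a PNG graphics device
hist(y_demo, freq = FALSE, main = "Relative Frequency Histogram of Data")
dev.off()                                  # close the device so the file is written

file.exists(out_file)                      # confirm the file was created
```

The same pattern applies to the assignment's own plotting calls: open the device with png(), run the plotting code, then call dev.off().
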
Download the Assignment 1 Template which is posted as a Word document on
Learn.

Fill in the required information and plots in the template based on your output,
following the template exactly. See Assignment 1 Example posted on Learn.
Create a .pdf file for the answer to EACH problem.

### Here are some options for creating PDF files:

Most word processing software will allow you to save your file as a PDF;
however, if you require software to create PDFs, some free options are listed
below:
- Use a free word processing program that can export directly to PDF, such
  as OpenOffice.org.