Data Wrangling in R MATH 8050 Homework 2 solution

Data Wrangling in R

1. (20 pts total, equally weighted) The diamonds dataset
a. Replicate the following scatter plot
[Figure: "Diamond Price x Carat". Scatter plot of Carat (y-axis, 0 to 5) against Price (x-axis, 0 to 15000), one panel per color (D, E, F, G, H, I, J), points colored by clarity (I1, SI2, SI1, VS2, VS1, VVS2, VVS1, IF).]
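One possible starting point for part (a) is sketched below. It is only a sketch: the aesthetic mapping (price on x, carat on y, clarity as colour, one panel per color grade) is read off the figure's axis and legend labels, and the object name p_scatter and the choice of facet_grid() are assumptions, not the intended solution.

```r
library(ggplot2)

# Assumed mapping: price on x, carat on y, clarity as colour,
# one panel per diamond colour grade (D to J).
p_scatter <- ggplot(diamonds, aes(x = price, y = carat, colour = clarity)) +
  geom_point() +
  facet_grid(cols = vars(color)) +  # facet_wrap(~ color, nrow = 1) is a similar alternative
  labs(title = "Diamond Price x Carat", x = "Price", y = "Carat")

p_scatter
```

Point size, transparency and theme are left at their defaults here and would need tuning to match the target figure.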

b. Replicate the following plot
[Figure: "Histogram of Prices by Clarity". Histogram of price (x-axis, 0 to 20000) against count (y-axis), filled by clarity (I1, SI2, SI1, VS2, VS1, VVS2, VVS1, IF).]
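A minimal sketch for part (b), assuming a stacked histogram of price filled by clarity. The bin count is a guess and the name p_hist is hypothetical; neither comes from the assignment.

```r
library(ggplot2)

# Stacked histogram of price, one fill colour per clarity grade.
# bins = 30 is an assumption; adjust it until the bar heights match the target.
p_hist <- ggplot(diamonds, aes(x = price, fill = clarity)) +
  geom_histogram(bins = 30) +
  labs(title = "Histogram of Prices by Clarity", x = "price", y = "count")

p_hist
```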

c. Replicate the following plot
[Figure: "Diamond Cut ~ Price". Plot of price (x-axis, 0e+00 to 6e+07) by cut (y-axis: Fair, Good, Very Good, Premium, Ideal); legend: cut.]

d. For the diamonds dataset, replicate the following plot.
[Figure: "Mean diamond price". Labels for each cut (Fair, Good, Very Good, Premium, Ideal) repeated within each color; legend: color (D, E, F, G, H, I, J).]

2. (35 pts total, equally weighted) We use the tidyverse package to generate various plots with the iris dataset.
a. For the iris dataset, replicate the following plot
[Figure: eight panels showing the same plot of Sepal.Width (y-axis, 2.0 to 4.5) against Sepal.Length (x-axis, 5 to 8), one per theme: Default (theme_grey), theme_bw, theme_linedraw, theme_light, theme_dark, theme_minimal, theme_classic, theme_void.]
b. For the iris dataset, replicate the following plot.
[Figure: "IRIS". Plot of Sepal.Width (y-axis, 2.0 to 4.5) against Sepal.Length (x-axis, 5 to 8); legend: Species (setosa, versicolor, virginica).]
c. Compute the mean Petal Length under each species and then replicate the following plot. Make sure
that you only use the tidyverse package for this problem.
[Figure: Petal Length (y-axis, 0 to 6) by Species (x-axis: setosa, versicolor, virginica).]
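The summary step in part (c) can be sketched with dplyr as follows. The names petal_means, mean_petal_length and p_bar are hypothetical, and the use of geom_col() for the plot is an assumption about the figure type.

```r
library(tidyverse)

# Mean Petal.Length within each species.
petal_means <- iris %>%
  group_by(Species) %>%
  summarise(mean_petal_length = mean(Petal.Length))

# One bar per species, with bar height equal to the mean.
p_bar <- ggplot(petal_means, aes(x = Species, y = mean_petal_length)) +
  geom_col() +
  labs(x = "Species", y = "Petal Length")

p_bar
```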
d. Combine variables by species and then replicate the following plot. Make sure that you only use the
tidyverse package for this problem.
[Figure: Values (y-axis, 0 to 6) by Species (x-axis: setosa, versicolor, virginica); legend: Petal.Length, Petal.Width, Sepal.Length, Sepal.Width.]

e. Order the species according to the order virginica, setosa, and versicolor, and replicate the
following plot. Make sure that you only use the tidyverse package for this problem.
[Figure: Values (y-axis, 0 to 6) by Species (x-axis as extracted: versicolor, virginica, setosa); legend: Petal Length, Petal Width, Sepal Length, Sepal Width. Value labels on the plot: 5.006, 3.428, 1.462, 0.246; 5.936, 2.77, 4.26, 1.326; 6.588, 2.974, 5.552, 2.026.]
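A hedged sketch of the reshaping and reordering that part (e) asks for: pivot the four measurements into long format, average them by species, and reorder the Species factor with forcats::fct_relevel(). The names iris_means and p_ordered, the dodged-bar layout, and the rounding of the labels are assumptions.

```r
library(tidyverse)

iris_means <- iris %>%
  # One row per (flower, measurement) pair.
  pivot_longer(cols = -Species, names_to = "variable", values_to = "value") %>%
  group_by(Species, variable) %>%
  summarise(value = mean(value), .groups = "drop") %>%
  # Order requested in the text: virginica, setosa, versicolor.
  mutate(Species = fct_relevel(Species, "virginica", "setosa", "versicolor"))

p_ordered <- ggplot(iris_means, aes(x = Species, y = value, fill = variable)) +
  geom_col(position = position_dodge(width = 0.9)) +
  # Print each mean above its bar; rounding avoids long floating-point labels.
  geom_text(aes(label = round(value, 3)),
            position = position_dodge(width = 0.9), vjust = -0.3) +
  labs(x = "Species", y = "Values", fill = NULL)

p_ordered
```

Legend labels such as "Petal Length" (with a space) would still need to be set, for example through the labels argument of scale_fill_discrete().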

f. Add a small amount of random variation to the location of each point using geom_jitter and replicate
the following boxplot, where each characteristic of a species corresponds to a boxplot and these boxplots
are grouped by species.
[Figure: boxplots of Values (y-axis, 0 to 8) by Species (x-axis: setosa, versicolor, virginica); legend: Petal Length, Petal Width, Sepal Length, Sepal Width.]
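For part (f), one way to combine grouped boxplots with jittered points is sketched below. Passing position_jitterdodge() to geom_jitter() keeps the points aligned with their dodged boxes; that choice, the jitter width, and the names iris_long and p_box are assumptions rather than the intended solution.

```r
library(tidyverse)

# Long format: one row per (flower, measurement) pair.
iris_long <- iris %>%
  pivot_longer(cols = -Species, names_to = "variable", values_to = "value")

p_box <- ggplot(iris_long, aes(x = Species, y = value, fill = variable)) +
  # Hide the boxplot outliers, since every point is drawn by geom_jitter.
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(position = position_jitterdodge(jitter.width = 0.1), size = 0.5) +
  labs(x = "Species", y = "Values", fill = NULL)

p_box
```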

g. Generate the boxplots faceted for each species and replicate the following plot.
[Figure: boxplots of Values (y-axis, 0 to 8) by Species (x-axis: setosa, versicolor, virginica), in four panels: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width.]

3. (10 pts total, equally weighted) Use the economics dataset from the ggplot2 package to answer the
following questions.
a. Replicate the following figure mentioned in Lecture 2 for the ggplot2 package
[Figure: unemployment and savings (y-axis, 5 to 25) against date (x-axis, 1970 to 2010).]

b. Replicate the following figure, where the date starts from the year 1990.
[Figure: unemployment and savings (y-axis, 5 to 25) against date, with x-axis labels every two years from Jan 1990 to Jan 2016.]
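The date restriction in part (b) can be sketched as a filter followed by a date scale with two-year breaks. Which columns of economics the figure shows is not stated in the text; the sketch assumes uempmed for "unemployment" and psavert for "savings", since both fall in the 5 to 25 range of the y-axis. The names econ_long and p_econ are hypothetical.

```r
library(tidyverse)

econ_long <- economics %>%
  filter(date >= as.Date("1990-01-01")) %>%
  # Assumed columns: uempmed (median unemployment duration), psavert (savings rate).
  select(date, unemployment = uempmed, savings = psavert) %>%
  pivot_longer(cols = -date, names_to = "series", values_to = "value")

p_econ <- ggplot(econ_long, aes(x = date, y = value, colour = series)) +
  geom_line() +
  # Labels such as "Jan 1990"; month names depend on the session locale.
  scale_x_date(date_breaks = "2 years", date_labels = "%b %Y") +
  theme(axis.text.x = element_text(angle = 90)) +
  labs(x = "date", y = NULL, colour = NULL)

p_econ
```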

4. (25 pts total) Work with the GOES-R dataset mentioned in class
a. (4pts) Load the DMWC_G16.nc dataset in R, extract the variables wind_speed, wind_direction, lat, lon,
time, pressure, temperature, local_zenith_angle, solar_zenith_angle, and DQF, and save them into a data
frame as shown below

b. (8pts) Convert the data frame dat into an sf object named df, where only observations with DQF equal
to 0 are kept as in Lecture 3, and then replicate the following figure with the following requirements:
• using the filled square shape with size .1
• using the scico::vik color palette
• using the wrap_plots() function or the “+” operator to arrange the columns

[Figure: three map panels (latitude 15°N to 50°N, longitude 80°W to 55°W), each with its own color scale: ws (3, 5, 8), press (700, 850, 1000), temp (276, 288, 301).]
c. (5pts) In the df data frame, pivot the variables ws, press, temp into longer format, storing their
names in a new column variable and their values in a new column value. Then save this new dataset into a
tibble p and print out the first 6 observations in this new data frame. You should obtain the following
output

## Simple feature collection with 6 features and 7 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -63.0746 ymin: 50.24621 xmax: -60.31003 ymax: 50.88714
## Geodetic CRS: WGS 84
## # A tibble: 6 x 8
##      wd       time   lza   sza   DQF             geometry variable  value
##
## 1  209. 656121674.  60.0  77.1     0 (-60.31003 50.88714) ws        29.6
## 2  209. 656121674.  60.0  77.1     0 (-60.31003 50.88714) press    746.
## 3  209. 656121674.  60.0  77.1     0 (-60.31003 50.88714) temp     280.
## 4  263. 656121674.  58.7  78.3     0  (-63.0746 50.24621) ws        3.42
## 5  263. 656121674.  58.7  78.3     0  (-63.0746 50.24621) press    989.
## 6  263. 656121674.  58.7  78.3     0  (-63.0746 50.24621) temp     278.
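The pivot in part (c) can be illustrated on a two-row toy object, sketched here under the assumption that the installed sf version provides tidyr methods for sf objects. The toy values are copied (rounded) from the printed output above; the names toy, df_toy and p_toy are hypothetical, and the real df would come from the DMWC_G16.nc file.

```r
library(tidyverse)
library(sf)

# Toy stand-in for the real data: two observations with DQF equal to 0.
toy <- tibble(
  ws    = c(29.6, 3.42),
  press = c(746, 989),
  temp  = c(280, 278),
  wd    = c(209, 263),
  DQF   = c(0, 0),
  lon   = c(-60.31003, -63.0746),
  lat   = c(50.88714, 50.24621)
)

# Point geometries in WGS 84 (EPSG:4326).
df_toy <- st_as_sf(toy, coords = c("lon", "lat"), crs = 4326)

# ws, press, temp become rows: names go to `variable`, numbers to `value`.
p_toy <- df_toy %>%
  pivot_longer(cols = c(ws, press, temp),
               names_to = "variable", values_to = "value")

head(p_toy)
```

Each original observation yields three rows (ws, press, temp) that share the same geometry, which matches the row pattern in the expected output.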
d. (8pts) Replicate the exact figure with the following requirements:
• using the filled square shape with size .1
• using the scico::vik color palette

[Figure: three map panels (latitude 15°N to 50°N, longitude 80°W to 55°W): press (color scale 800, 900), temp (color scale 280, 290, 300), ws (color scale 10, 20).]