## Description

You have been hired as a Building Energy Data Scientist. Congratulations! What an exciting new opportunity! In your first job assignment, you must construct algorithms to forecast residential electricity

consumption with data and machine learning (ML). The assignment is organized in a tutorial fashion,

thereby allowing you to practice ML on a relevant real-world energy system example.

## Reading

[Optional] Read “Gated Ensemble Learning Method for Demand-Side Electricity Load Forecasting” first

authored by Eric Burger, available at this URL.

## Background

Several years ago we collected Green Button Data <http://www.greenbuttondata.org/> from the CE 295

students. This data includes hourly electricity consumption from over 20 residences in the East Bay. The

data has been (mostly) cleansed for your convenience including (i) aligning time-stamps, (ii) filling-in missing

data, and (iii) organizing it into an easily readable format.

## Problem 1: Exploratory Data Analysis

Download the data file HW4_Train_Data.csv. Load the CSV data into Matlab or Python

(a) Create a bar plot of the average hourly energy consumption [kWh] for each building. Make the x-axis

the building index number. Make the y-axis the average hourly energy consumption. In red, super

impose error bars on your bars that indicate the standard deviation of hourly energy consumption.

Provide this plot in your report.

(b) Which building has abnormally high variance? Do any buildings have moments of NEGATIVE power

consumption? If so, then we will remove this building from our analysis. This building likely installed solar during the year and the smart meter data aggregates power consumption and solar power

generation, so it’s un-usable for this homework.

(c) Re-organize your energy data set into a 4-D array. In order, the dimensions correspond to (i) building

index, (ii) week-of-year, (iii) day-of-week, and (iv) hour-of-day. For each element in the 4-D vector,

store the normalized energy. That is, divide kWh by the maximum hourly energy consumption for

that building.

Normalize the energy for each building. That is, divide kWh by the maximum hourly energy consumption for that building. In industry and academia, we sometimes refer to these normalized building

electricity consumption trajectories as “load shapes”. Now, generate the following plots:

• In seven separate figures (one for each day-of-week), plot the hourly energy consumption load

shapes vs. hour (x-axis) for each building – all super-imposed.

Page 1 of 5

CE 295: Energy Systems and Control

• In each of the seven figures, plot the average hourly energy consumption in a thick black line.

How to do this? In Matlab, you can re-organize your energy data set into a 4-D array. In order, the

dimensions can correspond to (i) building index, (ii) week-of-year, (iii) day-of-week, and (iv) hour-ofday. For each element in the 4-D vector, store the normalized energy. In Python, you can create a

DataFrame with the Pandas library, where the first three columns are week-of-year, day-of-week, and

hour-of-day. The next columns would be normalized energy for all the buildings. Then you can use

the groupby function in Pandas to help you generate the plots.

Provide the seven plots in your report. They should look similar to Fig. 1 below. Comment on any

observations you might have.

Figure 1: Example of plot you must produce for Problem 1(c).

## Problem 2: Average Model

In this problem, we design an extremely simple forecasting model that is often effective. We call it the

“Average Model”. The average model forecasts the building power to be the average value from historical

data at the HoD and DoW. Mathematically:

Pˆ

avg(k) = X

7

i=1

X

24

j=1

P¯

ij · Uij (k) (1)

where

• P¯ ∈ R

7×24 is the average value from historical data, for each (DoW, HoD) pair.

• U(k) ∈ {0, 1}

7×24. All elements are zero, except the one element which corresponds to (DoW,HoD) at

time step k. For example,

Page 2 of 5

CE 295: Energy Systems and Control

– if k corresponds to 12am on Sunday, then U1,1(k) = 1.

– if k corresponds to 3am on Monday, then U2,4(k) = 1.

– if k corresponds to 11pm on Saturday, then U7,23(k) = 1.

Answer the following questions.

(a) Download test data file HW4_Test_Data.csv. Load the CSV data into Matlab or Python. The test

data includes one week of normalized hourly load for Sunday Sept 7, 2014 00:00 – Saturday Sept 13,

2014 23:00. Generate seven plots (one for each DoW) which visualize HoD (x-axis) and the normalized

hourly energy for the Test Data and Average Model.

(b) Compute the mean absolute error (MAE)

MAE =

1

m

Xm

i=1

|P(k) − Pˆ(k)| (2)

for each DoW and the entire week. Provide these error metrics in your report. Which DoW has the

largest and smallest MAE?

## Problem 3: Autoregressive with eXogeneous Inputs Model (ARX)

In this problem, we will design a forecasting model based on the Autoregressive with eXogenous inputs model

(ARX). We consider the ARX model given by:

Pˆ

arx(k) = X

L

`=1

α` · P(k − `) + Pˆ

avg(k) (3)

where Pˆ(k) is the normalized hourly energy consumption we aim to forecast in the target hour. The inputs

include the previous hourly energy consumption: P(k − 1), P(k − 2), · · · , P(k − L). We will also consider an

exogenous term corresponding to the average power: Pˆ

avg(k). Consequently, the autoregressive terms adjust

the average model based upon temporally local power, over a short retrospective horizon. In total, we must

fit L parameters. Answer the following questions.

(a) Write the ARX model in linear-in-the-parameters form: Y = Φθ as described in the video lectures.

Define the vector Y and matrix Φ.

(b) Formulate a least squares optimization problem to find the optimal parameters α`, ` = 1, 2, · · ·L which

fit your data. Write down the objective function. Define your notation. Is this a convex program?

Why?

(c) Solve your optimization problem with L = 3. In your report, give the optimal values of α

?

1

, α?

2

, α?

3

.

(d) Test your ARX model on the test data set. Generate seven plots (one for each DoW) which visualize

HoD (x-axis) and the normalized hourly energy for the Test Data, the Average Model, and the ARX

model. Report the MAE for each DoW, and the entire week.

Page 3 of 5

CE 295: Energy Systems and Control

Problem 4: Neural Network Model

Next, we consider a neural network (NN) model, and more specifically a single neuron model. As detailed

below, the NN model is just a nonlinear function with many parameters (a.k.a. “weights” in the jargon of

ML). Our NN model is simply: Pˆ

nn(k) = fnn(x; θ) +Pˆ

avg(k) where the NN makes adjustments to the power

forecast relative to the mean.

z =

X

3

`=1

w`x` = w

T x (4)

y = f(z) = tanh(z) (5)

(a) Consider the least squares error:

J =

1

2

Xm

i=1

δ

2

i =

1

2

Xm

i=1

h

y

(i) − y(w

T x

(i)

)

i2

=

1

2

Xm

i=1

h

y

(i) − tanh(w

T x

(i)

)

i2

(6)

Our objective is to fit neural network weights w to the example training data,

x

(i) = [P(k − 1), P(k − 2), P(k − 3)]T

, y(i) = P(k) − P¯TU(k) (7)

where training example index (i) = 1, 2, · · · , m corresponds to time index k = 4, 5, · · · , K. We do this

by solving a nonlinear least squares optimization problem:

minimize J =

1

2

Xm

i=1

h

y

(i) − tanh(w

T x

(i)

)

i2

(8)

which is a nonlinear program because the output y is nonlinear w.r.t. weights w. What are the

optimization variables?

(b) A simple fitting/optimization algorithm is called “gradient descent”:

w

k+1 = w

k − γ ·

∂J

∂w

w

k

(9)

where superscript (·)

k

represents the iterations. Armed with the gradient descent method, our next

goal is to compute the gradient in (9). Use chain rule to expand ∂J/∂w, as done in the lecture videos.

After expanding with chain rule, then express each term analytically.

(c) Implement the gradient descent algorithm yourself. Select an initial guess of w

0 = 0 ∈ R

3

. Use a small

step size/learning rate, such as γ = 10−3 or 10−4

. Perform at least three steps of gradient descent. In

Page 4 of

CE 295: Energy Systems and Control

your report, give the final values of w

?

1

, w?

2

, w?

3

.

(d) Test your NN model on the test data set. Generate seven plots (one for each DoW) which visualize

HoD (x-axis) and the normalized hourly energy for the Test Data, the Average Model, the ARX model,

and the NN model. Report the MAE for each DoW, and the entire week.

## Interesting Remarks

• Stochastic gradient descent (SDG) is a popular alternative to the gradient descent algorithm in (9) in

ML. In gradient descent, we update weights by computing the gradient with ALL training examples.

This can be a computationally heavy task. In SDG, we update a weights based on a RANDOM

SUBSET (e.g. 10%) of training examples within each iteration. This is computationally lighter, since

we significantly reduce the number of forward propagations and backward propagations through the

neural network.

## Deliverables

Submit the following on bCourses. Be sure that the function files are named exactly as specified (including

spelling and case). Please do NOT zip files.

LASTNAME_FIRSTNAME_HW4.PDF

LASTNAME_FIRSTNAME_HW4.xyz which contains your respective Matlab or Python files, i.e. xyz ∈ {m, ipynb}.

Page 5 of 5