THEORY
Problem 5.1 (SVDs and ellipsoids)
This problem concerns the matrix
\[
A = \frac{1}{\sqrt{10}} \begin{bmatrix} 5 & 0 \\ 3 & 4 \end{bmatrix}.
\]
(a) Find the SVD of $A$, $A = U \Sigma V^T$, specifying the matrix of normalized left-singular vectors $U$, the matrix of normalized right-singular vectors $V$, and the matrix of (non-negative) singular values $\Sigma$. (Since $A$ is square, $\Sigma = \tilde{\Sigma}$.)
(b) Use your results from part (a) to write the SVD in outer-product form, as $A = \sigma_1 u^{(1)} (v^{(1)})^T + \sigma_2 u^{(2)} (v^{(2)})^T$.
Now consider the action of $A$ on a unit vector $x$ such that $\|x\|_2 = 1$ (or, in part (e), $\|x\|_2 \le 1$). In answering parts (c)–(e) it may help to use the outer-product form from part (b) to write
\[
Ax = U \Sigma V^T x = U \Sigma \bar{x} = U \begin{bmatrix} \sigma_1 \bar{x}_1 \\ \sigma_2 \bar{x}_2 \end{bmatrix} = u^{(1)} \sigma_1 \bar{x}_1 + u^{(2)} \sigma_2 \bar{x}_2, \tag{1}
\]
where $\bar{x} = V^T x$ and $\|\bar{x}\|_2 = 1$ since $\|x\|_2 = 1$ and $V$ is an orthogonal matrix.
(c) What is the unit input direction $x$, such that $\|x\|_2 = 1$, that leads to the greatest amplification (the largest $\|Ax\|_2$)? What is the amplification? What output direction $Ax$ results from setting $x$ equal to the input direction that yields the greatest amplification?
(d) What is the unit input direction $x$, such that $\|x\|_2 = 1$, that leads to the least amplification (the smallest $\|Ax\|_2$)? What is the amplification? What output direction $Ax$ results from setting $x$ equal to the input direction that yields the least amplification?
(e) Sketch the set $\{Ax \in \mathbb{R}^2 : \|x\|_2 \le 1,\ x \in \mathbb{R}^2\}$. This set is the image of the unit ball under the linear map $f(x) = Ax$.
(f) Now, sketch the set $\{x \in \mathbb{R}^2 : \|Ax\|_2 \le 1\}$. The logic is related to that in (c)–(e), but note that here you are asked to identify the inputs for which the corresponding outputs are constrained to the unit ball. This set is the pre-image of the unit ball under the linear map $f(x) = Ax$.
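If you want to check your hand calculations in parts (a)–(d) numerically, a short sketch (Python/NumPy here, though any tool works) computes the SVD of this particular $A$ and confirms that the top right-singular vector is the input direction attaining the largest amplification:

```python
import numpy as np

A = np.array([[5.0, 0.0], [3.0, 4.0]]) / np.sqrt(10)

# numpy returns A = U @ np.diag(s) @ Vt with s sorted in decreasing order
U, s, Vt = np.linalg.svd(A)

print("singular values:", s)   # the largest is the maximum amplification
v1 = Vt[0]                     # top right-singular vector (best input direction)
print("||A v1||_2 =", np.linalg.norm(A @ v1))  # equals s[0]
```

Note that `Vt` holds the right-singular vectors as rows, so `Vt[0]` is $v^{(1)}$.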
Problem 5.2 (Fitting functions: LS, weighted LS, regularized LS)
This problem concerns using least squares to solve the problem of fitting a function to data. In this problem we use variants of the least squares method to fit a variety of functions to a set of four data points:
\[
\{(x_i, y_i)\}_{i=1}^{4} = \{(0, 4),\ (1, 1),\ (2, 4),\ (3, 2)\}.
\]
ECE367 Problem Sets
In all these parts it is recommended to use a computer to calculate and plot the fits; it is also okay to sketch by hand. Please show your work setting up the various least squares problems that you will use the computer to solve numerically (and then to plot).
(a) Use least squares to fit the “best” affine function $y = \alpha_0 + \alpha_1 x$ to the data. (Note that here we are using the $\alpha_i$ as the parameters you want to solve for and $(x_i, y_i)$ for the data; in lecture we used $x_i$ to denote the parameters we wanted to solve for.) First, according to (unweighted) least squares, what are the best choices for $\alpha_0$ and $\alpha_1$? What are the resulting residuals $r_i = y_i - (\alpha_0 + \alpha_1 x_i)$ for $i \in [4]$? What is the norm-squared of the residual vector, $\|r\|_2^2$, where $r = (r_1, r_2, r_3, r_4)$? Plot your affine function $f_{\text{affine}}(x) = \alpha_0 + \alpha_1 x$ over the range $-1 \le x \le 5$ and also plot the four data points $(x_i, y_i)$ for comparison.
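As a sketch of the setup (Python/NumPy here; MATLAB's backslash operator plays the same role), stack the affine model into a matrix equation $X\alpha \approx y$ and let a least-squares solver do the rest:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([4.0, 1.0, 4.0, 2.0])

# design matrix for the affine model y = a0 + a1*x: columns [1, x]
X = np.column_stack([np.ones_like(x), x])

# unweighted least squares: minimize ||X a - y||_2^2
alpha, *_ = np.linalg.lstsq(X, y, rcond=None)

r = y - X @ alpha                 # residual vector
print("alpha =", alpha, " ||r||^2 =", r @ r)
```

A useful sanity check on any least-squares fit is that the residual is orthogonal to the columns of the design matrix, i.e., $X^T r = 0$.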
In parts (b)–(d) we consider fitting the data using the Weighted Least Squares criterion, where the PSD (not necessarily diagonal) weighting matrix is
\[
W = \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & \frac{a+b}{2} & \frac{a-b}{2} & 0 \\
0 & \frac{a-b}{2} & \frac{a+b}{2} & 0 \\
0 & 0 & 0 & 1
\end{bmatrix},
\]
and where various choices for $a$ and $b$ are specified in the ensuing parts.
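A sketch of the weighted setup (Python/NumPy; the helper names `weighted_ls` and `make_W` are mine, not part of the problem): weighted least squares minimizes $(y - X\alpha)^T W (y - X\alpha)$, which leads to the weighted normal equations $X^T W X \alpha = X^T W y$.

```python
import numpy as np

def weighted_ls(X, y, W):
    """Solve min_a (y - X a)^T W (y - X a) via the weighted normal equations."""
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

def make_W(a, b):
    """The 4x4 weighting matrix from the problem, parameterized by a and b."""
    W = np.eye(4)
    W[1, 1] = W[2, 2] = (a + b) / 2
    W[1, 2] = W[2, 1] = (a - b) / 2
    return W

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([4.0, 1.0, 4.0, 2.0])
X = np.column_stack([np.ones_like(x), x])

alpha_b = weighted_ls(X, y, make_W(3, 3))  # part (b): diagonal case
print("alpha (a=b=3):", alpha_b)
```

For $a = b$ the off-diagonal entries vanish and $W$ is diagonal, which is why part (b) is called the diagonally-weighted case.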
(b) Find the best fit line for $a = b = 3$. This is the diagonally-weighted case. What are the best choices for $\alpha_0$ and $\alpha_1$? What are the resulting residuals $r_i = y_i - (\alpha_0 + \alpha_1 x_i)$ for $i \in [4]$? What is the norm-squared of the residual vector, $\|r\|_2^2$? Plot your affine function over the range $-1 \le x \le 5$ and also plot the four data points $(x_i, y_i)$ for comparison. Finally, comment on the differences versus the line you found in (a).
In the following two parts we use a non-diagonal PSD weighting matrix $W$. To help develop an interpretation, note that the eigendecomposition of the $2 \times 2$ matrix in the middle of $W$ is
\[
\begin{bmatrix} \frac{a+b}{2} & \frac{a-b}{2} \\[2pt] \frac{a-b}{2} & \frac{a+b}{2} \end{bmatrix}
= \frac{1}{2}
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}.
\]
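This identity is easy to check numerically; the sketch below (Python/NumPy) verifies it for the $(a, b)$ values used in parts (c) and (d) and confirms that the eigenvalues are exactly $a$ and $b$:

```python
import numpy as np

def middle_block(a, b):
    """The 2x2 matrix sitting in the middle of W."""
    return np.array([[(a + b) / 2, (a - b) / 2],
                     [(a - b) / 2, (a + b) / 2]])

Q = np.array([[1.0, 1.0], [1.0, -1.0]])  # unnormalized eigenvector matrix

for a, b in [(4, 2), (2, 4)]:
    lhs = middle_block(a, b)
    rhs = 0.5 * Q @ np.diag([a, b]) @ Q   # the claimed decomposition
    assert np.allclose(lhs, rhs)
    print(sorted(np.linalg.eigvalsh(lhs)))
```

The eigenvectors $(1, 1)$ and $(1, -1)$ are the directions in which the middle two residuals are weighted by $a$ and $b$, respectively.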
(c) Find the best fit line for $a = 4$ and $b = 2$. What are the best choices for $\alpha_0$ and $\alpha_1$? What are the resulting residuals $r_i = y_i - (\alpha_0 + \alpha_1 x_i)$ for $i \in [4]$? What is the norm-squared of the residual vector, $\|r\|_2^2$? Plot your affine function over the range $-1 \le x \le 5$ and also plot the four data points $(x_i, y_i)$ for comparison. Comment on the differences versus the line you found in (b), focusing on how the current choice of $a$ and $b$ changes the residuals. (Note that the diagonal entries of the $W$ matrix in part (c) match those of the $W$ matrix in part (b); it is only the off-diagonals that have changed.)
ECE367 Problem Sets © S. Draper, University of Toronto, 2021 page 4 of 11
University of Toronto ECE367 – Matrix Algebra & Optimization
(d) Find the best fit line for $a = 2$ and $b = 4$. What are the best choices for $\alpha_0$ and $\alpha_1$? What are the resulting residuals $r_i = y_i - (\alpha_0 + \alpha_1 x_i)$ for $i \in [4]$? What is the norm-squared of the residual vector, $\|r\|_2^2$? Plot your affine function over the range $-1 \le x \le 5$ and also plot the four data points $(x_i, y_i)$ for comparison. Comment on the differences versus the line you found in (c), again focusing on the role played by the off-diagonal entries of $W$.
As was mentioned in class, to fit a function to data using the least-squares method, the function need not be linear (or affine); what is important is that the function is linear in its coefficients. In this part you use least squares to fit quadratic and cubic functions to the data.
(e) Use the least-squares method to fit the quadratic function $f_{\text{quad}}(x) = \alpha_0 + \alpha_1 x + \alpha_2 x^2$ and the cubic function $f_{\text{cubic}}(x) = \alpha_0 + \alpha_1 x + \alpha_2 x^2 + \alpha_3 x^3$ to the data. For each function, what are the best choices for the $\alpha_i$, what are the resulting residuals, and what is $\|r\|_2^2$? Compare the norms of the residuals to each other and to what you found in part (a).
Finally, we turn to regularized least squares, concentrating on a cubic fit. Here
\[
\alpha^* = \arg\min_{\alpha \in \mathbb{R}^4} \sum_{i=1}^{4} \left( y_i - (\alpha_0 + \alpha_1 x_i + \alpha_2 x_i^2 + \alpha_3 x_i^3) \right)^2 + \gamma \|\alpha\|_2^2,
\]
where $f_{\text{cubic},\gamma}(x) = \alpha_0^* + \alpha_1^* x + \alpha_2^* x^2 + \alpha_3^* x^3$.
(f) Find the regularized cubic fit $f_{\text{cubic},\gamma}(x)$ for $\gamma = 0.05$ and $\gamma = 0.5$. Sketch or plot $f_{\text{cubic},\gamma}$ over the interval $-2 \le x \le 6$ for $\gamma = 0$, $\gamma = 0.05$, and $\gamma = 0.5$. Note that the plot of $f_{\text{cubic},0}$ is the same as the one you plotted in part (e).
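A sketch of the regularized solve (Python/NumPy; setting the gradient of the objective to zero gives the closed form $\alpha^* = (X^T X + \gamma I)^{-1} X^T y$, with every coefficient penalized, $\alpha_0$ included):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([4.0, 1.0, 4.0, 2.0])

# cubic design matrix: columns 1, x, x^2, x^3
X = np.vander(x, N=4, increasing=True)

def ridge(X, y, gamma):
    """Closed-form regularized least squares (all coefficients penalized)."""
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + gamma * np.eye(n), X.T @ y)

for gamma in [0.0, 0.05, 0.5]:
    alpha = ridge(X, y, gamma)
    print(f"gamma={gamma}: ||alpha||_2 = {np.linalg.norm(alpha):.4f}")
```

With four data points and four cubic coefficients, the $\gamma = 0$ fit interpolates the data exactly; increasing $\gamma$ trades fit quality for a smaller $\|\alpha\|_2$.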
Problem 5.3 (Lotka’s law and least squares)
OptM Problem 6.3. (Hint: Think about doing a variable transformation to make the problem look more like a least-squares problem.)
Problem 5.4 ($\ell_2$ regularized least squares), from a previous exam
This problem considers optimization problems of the form
\[
\min_{x \in \mathbb{R}^2} \|Ax - y\|_2^2 + \gamma \|x\|_2^2 \tag{2}
\]
where $\gamma \in \mathbb{R}$ and $\gamma \ge 0$. In this problem we denote the $x$ that minimizes (2) as $x^*_\gamma$.
(a) First consider the case where
\[
A = \begin{bmatrix} 1 & 1 \\ 1 & -1 \\ 2 & 1 \end{bmatrix}, \qquad
y = \begin{bmatrix} 4 \\ -2 \\ 5 \end{bmatrix}, \qquad
\gamma = 0.
\]
Find $x^*_0$, the optimal variable that solves (2) for the parameters given. (To be doubly clear, $x^*_0 = x^*_\gamma|_{\gamma=0}$, which is read as “$x^*_\gamma$ evaluated at $\gamma = 0$”.)
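As a numerical check of your algebra (Python/NumPy sketch), the $\gamma = 0$ case is an ordinary least-squares solve, and for $\gamma > 0$ the minimizer satisfies $(A^T A + \gamma I)x = A^T y$:

```python
import numpy as np

A = np.array([[1.0, 1.0], [1.0, -1.0], [2.0, 1.0]])
y = np.array([4.0, -2.0, 5.0])

# gamma = 0: plain least squares
x0, *_ = np.linalg.lstsq(A, y, rcond=None)
print("x*_0 =", x0)

def x_star(gamma):
    """Minimizer of ||Ax - y||^2 + gamma ||x||^2."""
    return np.linalg.solve(A.T @ A + gamma * np.eye(2), A.T @ y)
```

Evaluating `x_star` on a grid of $\gamma$ values and plotting the resulting points is one way to see which path in Figure 1 is the right one, though parts (c) and (d) ask you to justify the answer geometrically.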
(b) Re-derive the following relation derived in class, fully justifying your steps:
\[
\|Ax - y\|_2^2 = \|Ax^*_0 - y\|_2^2 + \|A(x - x^*_0)\|_2^2,
\]
again noting that $x^*_0 = x^*_\gamma|_{\gamma=0}$.
Parts (c) and (d) of this problem concern Figure 1. In Figure 1 we plot two possible paths for $x^*_\gamma$ as a function of $\gamma$, for the same values of $A$ and $y$ that are specified in part (a).

Figure 1: Path followed by $x^*_\gamma$ as a function of $\gamma$. (The figure shows two candidate paths leaving the point labeled “Optimum if $\gamma = 0$”: an upper, dashed “Path A” and a lower, solid “Path B”.)
(c) Make a sketch of Fig. 1 in your solutions. To your sketch add some level sets of the objective of (2) for the case where $\gamma = 0$. You must show work that mathematically justifies your sketches of the level sets.
(d) You must base your answer to this part on your answer to part (c): Which path does $x^*_\gamma$ follow as one increases $\gamma$? Clearly indicate which path (either the upper, dashed “Path A” or the lower, solid “Path B”) you think $x^*_\gamma$ follows, and provide an explanation for your choice based on your answer to part (c).
Problem 5.5 (Solving least squares problems using the Moore-Penrose pseudoinverse)
In this problem you derive the result that the Moore-Penrose pseudoinverse can be used to solve least squares problems for overdetermined, underdetermined, and rank-deficient systems. Recall that least squares problems consider the system of linear equations $Ax = y$ where $A \in \mathbb{R}^{m,n}$ and $\mathrm{rank}(A) = r$. If the compact SVD of $A$ is $A = U_r \Sigma V_r^T$, where $\Sigma \in \mathbb{R}^{r,r}$ is a positive definite diagonal matrix of (non-zero) singular values and $U_r \in \mathbb{R}^{m,r}$ and $V_r \in \mathbb{R}^{n,r}$ each contain orthonormal columns, then the Moore-Penrose pseudoinverse is $A^\dagger = V_r \Sigma^{-1} U_r^T$.
(a) Recall that the overdetermined least squares problem considers an $A \in \mathbb{R}^{m,n}$ with $m > n$ (more constraints in the $y$ vector than parameters in the $x$ vector). Here the objective is to find an $x$ that minimizes $\|Ax - y\|_2$. Show that an optimal solution is $x^* = A^\dagger y$. To show this, recall that an optimal solution $x^*$ must satisfy the “normal” equations $A^T A x^* = A^T y$. Verify that $x^* = A^\dagger y$ satisfies the normal equations.
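A quick numerical sanity check of the claim (Python/NumPy sketch; `np.linalg.pinv` computes $A^\dagger$ via the SVD, matching the definition above; the random matrix is just stand-in data):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))   # overdetermined: m > n
y = rng.standard_normal(6)

x_star = np.linalg.pinv(A) @ y    # candidate solution A† y

# x* satisfies the normal equations A^T A x* = A^T y
print(np.allclose(A.T @ A @ x_star, A.T @ y))
```

This check does not replace the derivation asked for above; it only confirms the result on one random instance.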
(b) Recall that the underdetermined least squares problem considers an $A \in \mathbb{R}^{m,n}$ with $m < n$ (fewer constraints in the $y$ vector than parameters in the $x$ vector). Here the objective is to find the $x$ that minimizes $\|x\|_2$ while satisfying $Ax = y$ (equivalently, satisfying $\|Ax - y\|_2 = 0$). Show that the optimal solution is $x^* = A^\dagger y$. To show this, recall that the optimal solution $x^*$ must satisfy two conditions: (i) $x^* \in \mathcal{R}(A^T)$ and (ii) $Ax^* = y$. Verify that $x^* = A^\dagger y$ satisfies (i) all the time. Under what conditions does $x^*$ satisfy condition (ii)? (Hint: think about the rank of $A$.)
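The underdetermined case can be probed the same way (Python/NumPy sketch with random stand-in data; a random $3 \times 6$ matrix has full row rank with probability one, so condition (ii) holds here):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 6))   # underdetermined: m < n, full row rank
y = rng.standard_normal(3)

x_star = np.linalg.pinv(A) @ y

# (ii): Ax* = y holds because A has full row rank
print(np.allclose(A @ x_star, y))

# (i): x* lies in the row space R(A^T), i.e., x* = A^T z for some z
z, *_ = np.linalg.lstsq(A.T, x_star, rcond=None)
print(np.allclose(A.T @ z, x_star))
```

Adding any nullspace vector of $A$ to $x^*$ gives another solution of $Ax = y$ with strictly larger norm, which is the minimum-norm property you are asked to use.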
In the above two parts we haven’t explicitly considered the role of the rank $r$ of the $A$ matrix. But we also note that the only place where the rank of $A$ comes into the discussion of parts (a) and (b) is in the discussion of condition (ii) of part (b).

Recall that $r \le \min\{m, n\}$. When $r = \min\{m, n\}$, the $A$ matrix is full column rank in the overdetermined problem and full row rank in the underdetermined problem. In both these full-rank cases $x^*$ has the simple expression presented in class. Now we consider what happens when $r < \min\{m, n\}$.
(c) In this part consider the situation where $\mathrm{rank}(A) = r < m < n$. This is a “rank-deficient” underdetermined least squares problem. If we set $x^* = A^\dagger y$, what characteristics do $Ax^*$ and $\|x^*\|_2$ satisfy? (Hint: this is a type of hybrid problem that can simultaneously have characteristics of both overdetermined and underdetermined least squares.)
APPLICATIONS
NOTE: The requirement is to complete any ONE of the three following “application” problems.
Problem 5.6 (Eigenfaces and $\ell_2$ projection)
In this problem you will familiarize yourself with the concept of eigenfaces and its uses. Download the dataset yalefaces.mat from the course website. This dataset consists of 32 × 32 grayscale images of faces taken from the Extended Yale Face Database B (http://vision.ucsd.edu/~leekc/ExtYaleDatabase/ExtYaleB.html). Load the dataset into your MATLAB environment by executing the command load('yalefaces.mat'). You will see a new variable M of dimension 32 × 32 × 2414, which consists of 2414 grayscale images, 32 × 32 pixels each. The pixel values of the images range from 0 to 255. You can view the loaded images by making use of the MATLAB built-in functions imshow or imagesc. As an example, the first image of the dataset can be displayed by executing imshow(M(:,:,1)/255).
Let $N$ be the number of images in the dataset and let $d = 1024$, the total number of pixels in each image. An image can be thought of as a matrix with 32 columns and 32 rows consisting of entries in the range $[0, 255]$. In this exercise we consider the images in vector form. Let $J_i$ be the $i$th image in the dataset, where $i \in [N]$. We formulate a column vector $x^{(i)} \in \mathbb{R}^d$ by flattening the matrix that makes up $J_i$. In our case we stack all 32 columns vertically to form a $d$-dimensional vector $x^{(i)}$. In MATLAB you can do this efficiently using the command reshape. The ‘average face’ vector of the dataset, $\bar{x}$, can be computed as $\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x^{(i)}$. We can (roughly) visualize the $x^{(i)}$s as forming a cloud of data points where the cloud is centered at $\bar{x}$ in $d$-dimensional space. Translate the cloud center to the origin by subtracting $\bar{x}$ from each $x^{(i)}$. Let the resulting vectors be denoted as $\bar{x}^{(i)} = x^{(i)} - \bar{x}$, which we will refer to as centered image vectors. Construct a matrix $X \in \mathbb{R}^{d \times N}$ by concatenating all the centered vectors $\bar{x}^{(i)}$, i.e., $X = [\bar{x}^{(1)}, \bar{x}^{(2)}, \ldots, \bar{x}^{(N)}]$. The matrix $C = XX^T$ is $N$ times the covariance matrix of the data samples $x^{(i)}$, $i \in [N]$.
(a) In class you learned about singular value decomposition (SVD) and eigendecomposition.
What is the connection between the singular values of X and the eigenvalues of C? What
is the connection between the left-singular vectors of X and the eigenvectors of C? Make
sure to describe the reasoning behind your answers, e.g., by describing the singular or eigen
decompositions of each matrix.
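The relationship asked about in (a) is easy to probe on a small synthetic example (Python/NumPy sketch; the Yale data itself is not needed to see the pattern, and the random $8 \times 5$ matrix is a stand-in for the $d \times N$ centered data matrix):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((8, 5))        # stand-in for the d x N data matrix
C = X @ X.T

U, s, Vt = np.linalg.svd(X)
eigvals = np.linalg.eigvalsh(C)[::-1]  # eigvalsh returns ascending order

# nonzero eigenvalues of C = X X^T are the squared singular values of X
print(np.allclose(eigvals[:5], s**2))

# left-singular vectors of X are eigenvectors of C
print(np.allclose(C @ U[:, 0], s[0]**2 * U[:, 0]))
```

Here $C$ is $8 \times 8$ but has rank at most 5, so its remaining eigenvalues are zero, which foreshadows the eigenvalue plot asked for in part (b).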
(b) Compute the eigenvalue/eigenvector pairs of $C$ and arrange them in decreasing order of the magnitude of the eigenvalues. Let the $j$th pair in the ordered set be denoted as $\lambda_j \in \mathbb{R}$, $v^{(j)} \in \mathbb{R}^d$ for $j \in [d]$. You may find the MATLAB command svd or eig helpful. Comment on whether the eigenvalues are real and briefly explain why. Plot $\log \lambda_j$ against $j \in [d]$. Include your plot in your solution.
(c) The set of eigenvectors you computed in the previous part forms a basis for the centered image vectors. Reshape each eigenvector $v^{(j)}$ to obtain a 32 × 32 matrix. These matrices can be considered as images, and they are known as eigenfaces. Include plots of two sets of eigenfaces: those corresponding to the largest 10, and those corresponding to the smallest 10, eigenvalues. Do you observe any difference between the two sets of eigenfaces you plotted? If so, briefly explain the reason for this difference.
(d) In this part, consider the images $J_i$ for $i \in \{1, 1076, 2043\}$. Let $B_j = \{v^{(1)}, v^{(2)}, \ldots, v^{(j)}\}$, where $j \in \{2^1, 2^2, 2^3, \ldots, 2^{10}\}$, i.e., $j$ indexes sets each consisting of a different number of the eigenvectors. Let us denote the $\ell_2$ projection of $\bar{x}^{(i)}$ onto the basis vector set $B_j$ as $\bar{y}^{(i,j)}$. Compute $\bar{y}^{(i,j)}$ for the given $i, j$ pairs using MATLAB. (I.e., do this using your numerical tool and not by hand.) Note that the $\bar{y}^{(i,j)}$ vectors are computed using the centered image vectors. Translate these vectors back (un-center them) by the cloud-center shift $\bar{x}$ to get the image vectors $y^{(i,j)} = \bar{y}^{(i,j)} + \bar{x}$. Reshape the $y^{(i,j)}$ vectors into 32 × 32 matrices and plot them as images. Note that you will need to plot 30 images; these can be compactly plotted using the subplot command in MATLAB.
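The projection step in (d) is just a matrix product when the $v^{(j)}$ are orthonormal: stacking the first $j$ eigenvectors as columns of $V_j$, the projection is $\bar{y} = V_j V_j^T \bar{x}$. A small self-contained sketch (Python/NumPy, with a random orthonormal basis and vector standing in for the eigenfaces and a centered image):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 16
# orthonormal basis (stand-in for the eigenvectors v^(1), ..., v^(d))
V, _ = np.linalg.qr(rng.standard_normal((d, d)))
x_bar = rng.standard_normal(d)        # a centered "image" vector

def project(x, V, j):
    """l2 projection of x onto the span of the first j columns of V."""
    Vj = V[:, :j]
    return Vj @ (Vj.T @ x)

y2 = project(x_bar, V, 2)             # coarse approximation
yd = project(x_bar, V, d)             # full basis
print(np.allclose(yd, x_bar))         # full basis reconstructs x exactly
```

As $j$ grows the projections $\bar{y}^{(i,j)}$ should look progressively more like the original centered image, which is exactly what the 30-image grid in part (d) is meant to show.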
(e) In this part you will learn how the eigenfaces can be used for the task of face recognition.


