## Description

1 Image Filtering [50 pts]

In this first section, you will explore different ways to filter images. Through these tasks you will build

up a toolkit of image filtering techniques. By the end of this problem, you should understand how the

development of image filtering techniques has led to convolution.

Image Patches [8 pts] A patch is a small piece of an image. Sometimes we will focus on the patches of

an image instead of operating on the entire image itself.

1. (5 pts) Take the image ’grace hopper.png’, load it as grayscale, and divide the image into 16

by 16 pixel image patches. Normalize each patch to have zero mean and unit variance. Complete the

function image patches in filters.py. Plot three of the 16×16 image patches in your report.

2. (3 pts) Early work in computer vision used unique images patches as descriptors or features of images

for applications ranging from image alignment and stitching to object classification and detection.

Inspect the patches extracted in the previous question, and discuss, in a few sentences, why they

would be good or bad descriptors. Consider how those patches would look like if we changed the

object’s pose, scale, illumination, etc.

Gaussian Filter [16 pts] A Gaussian filter is a filter whose impulse response is a Gaussian function. Here

we only talk about the discrete kernel and assume 2D Gaussian distribution is circularly symmetric.

1D kernel : G(x) = 1

√

2πσ2

e

− x

2

2σ2 2D kernel : G(x, y) = 1

2πσ2

e

−

x

2+y

2

2σ2

1. (5 pts) For a 2D Gaussian filter with a given variance σ

2

, the convolution can be reduced by sequential

operations of a 1D kernel. Prove that a convolution by a 2D Gaussian filter is equivalent to sequential

convolutions of a vertical and a horizontal 1D Gaussian filter. Specify the relationship between the

2D and 1D Gaussian filter, especially the relationship between their variances.

2. (4 pts) Take the image ’grace hopper.png’ as the input. Complete the function convolve()

and other related parts in filters.py. Use a Gaussian kernel with size 3 × 3 and σ

2 ≈

1

2 ln 2 . Plot

the output images in your report. Describe what Gaussian filtering does to the image in one sentence.

Be sure to implement convolution and not cross-correlation.

3. (3 pts) Consider the image as a function I(x, y) and I : R

2 → R. When working on edge detection,

we often pay a lot of attention to the derivatives. Denote the derivatives:

Ix(x, y) = ∂I

∂x(x, y) ≈

1

2

(I(x + 1, y) − I(x − 1, y))

Iy(x, y) = ∂I

∂y (x, y) ≈

1

2

(I(x, y + 1) − I(x, y − 1))

Derive the convolution kernels for derivatives: (i) kx ∈ R

1×3

: Ix = I∗kx; (ii) ky ∈ R

3×1

: Iy = I∗ky.

Follow the detailed instructions in filters.py and complete the function edge detection()

in filters.py, whose output is the gradient magnitude.

4. (4 pts) Use the original image and the Gaussian-filtered image as inputs respectively. Plot both outputs

in your report. Discuss the difference between the two images in no more than three sentences.

2 CSE 803

Sobel Operator [18 pts] The Sobel operator is often used in image processing and computer vision.

Technically, it is a discrete differentiation operator, computing an approximation of the derivative of the

image intensity function.

1. (5 pts) Focus on the derivatives of the Gaussian-filtered image. Suppose we use this Gaussian kernel:

kGaussian =

1 2 1

2 4 2

1 2 1

Denote the input image as I. Prove that the derivatives of the Gaussian-filtered image (I ∗ kGaussian)

can be approximated by :

Gx = I ∗

1 0 −1

2 0 −2

1 0 −1

Gy = I ∗

1 2 1

0 0 0

−1 −2 −1

These are the sobel operators.

Hint: To derive the derivatives of an image, you can use the conclusion in Gaussian Filter – Q3, i.e.

Gx can be calculated by (I ∗ kGaussian)x using the kx derived before.

2. (4 pts) Take the image ’grace hopper.png’ as the original image I. Complete the corresponding

part of function sobel operator() in filters.py with the kernels given previously. Plot the

Gx, Gy, and the gradient magnitude.

3. Now we want to explore what will happen if we use a linear combination of the two Sobel operators.

Here, the linear combination of the original kernels is called a steerable filter.

(a) (3 pt) We want to use 3×3 kernel to approximate S(I, α) = Gx cos α + Gy sin α. Derive the

kernel in terms of α, i.e. the K(α) which makes S(I, α) = I ∗ K(α).

(b) (3 pts) Complete the function steerable filter() in filters.py.

Take α = 0,

π

6

,

π

3

,

π

2

,

2π

3

,

5π

6

. Plot the output images in your report.

(c) (3 pts) Observe the plotting results. What do these kernels detect? Discuss how the outputs

change with α.

LoG Filter [8 pts] Laplacian is a differential operator: ∇2

I(x, y) = ∂

2

I(x,y)

∂x2 +

∂

2

I(x,y)

∂y2 . And the Laplacian

of Gaussian (LoG) operation is very useful in computer vision.

1. (5 pts) In filters.py, you are given two LoG filters. You are not required to prove that they are

LoG, but you are encouraged to know what an LoG filter looks like. Complete the corresponding part

in filters.py. Plot the outputs of these two LoG filters in your report. Compare the two results.

Explain the reasons for the difference. Discuss whether these filters can detect edges. Can they detect

anything else?

Hint: We can take the high-value pixels in the outputs as the detected parts.

2. (3 pts) Instead of calculating LoG, we can often approximate it with a simple Difference of Gaussians

(DoG). Try to explain why this approximation works.

Hint: Try visualizing the following functions: two Gaussian functions with different variances, the

difference between the two functions, Laplacian of a Gaussian function.

3 CSE 803

2 Feature Extraction [15 pts]

While edges can be useful, corners are often more informative features as they are less common. In this

section, we implement a Harris Corner Detector (see: https://en.wikipedia.org/wiki/Harris Corner Detector)

to detect corners. Corners are defined as locations (x, y) in the image where a small change any direction

results in a large change in intensity if one considers a small window centered on (x, y) (or, intuitively,

one can imagine looking at the image through a tiny hole that’s centered at (x, y)). This can be contrasted

with edges where a large intensity change occurs in only one direction, or flat regions where moving in any

direction will result in small or no intensity changes. Hence, the Harris Corner Detector considers small

windows (or patches) where a small change in location leads large variation in multiple directions (hence

corner detector).

This question looks a bit long, but that is only because there is a fairly large amount of hand-holding

involved. The resulting solution, if done properly, is certainly under 10 lines.

2.1 Corner Score [5pt]

Let’s consider a grayscale image where I(x, y) is the intensity value at image location (x, y). We can

calculate the corner score for every pixel (i, j) in the image by comparing a window W centered on (i, j)

with that same window centered at (i+u, j+v). Specifically, we will compute the sum of square differences

between the two,

E(u, v) = X

x,y∈W

[I(x + u, y + v) − I(x, y)]2

or, for every pixel (x, y) in the window W centered at i, j, how different is it from the same window, shifted

over (u, v). This formalizes the intuitions above:

• If moving (u, v) leads to no change for all (u, v), then (x, y) is probably flat.

• If moving (u, v) in one direction leads to a big change and adding (u, v) in another direction leads to

a small change in the appearance of the window, then (x, y) is probably on an edge.

• If moving any (u, v) leads to a big change in appearance of the window, then (x, y) is a corner.

You can compute this E(u, v) for all (u, v) and at all (i, j).

Your first task is to write a function that calculates this function for all pixels (i, j) with a fixed offset (u, v)

and window size W. In other words, if we calculate S = cornerscore(u, v), S is an image such that Sij is

the SSD between the window centered on (i, j) in I and the window centered on (i + u, j + v) in I. The

function will need to calculate this function to every location in the image. This is doable via a quadruple

for-loop (for every pixel (i, j), for every pixel (x, y) in the window centered at (i, j), compare the two). Use

same padding for offset-window values that lie outside of the image.

Complete the function corner score() in corners.py which takes as input an image, offset values

(u, v), and window size W. The function computes the response E(u, v) for every pixel. We can look at,

for instance the image of E(0, y) to see how moving down y pixels would change things and the image of

E(−x, 0) to see how moving left x pixels would change things.

Plot your output for u, v that shift things left/right/up/down by 5 pixels.

Early work by Moravec [1980] used this function to find corners by computing E(u, v) for a range of offsets

and then select the pixels where the corner score why high for all offsets. In a few sentences, discuss why

this might be impractical.

4 CSE 803

2.2 Harris Corner Detector [10pt]

For every single pixel (i, j), you now have a way of computing how much changing by (u, v) changes

the appearance of a window (i.e., E(u, v) at (i, j)). But in the end, we really want a single number of

“cornerness” per pixel and don’t want to handle checking all the (u, v) values at every single pixel (i, j).

You’ll implement the cornerness score invented by Harris and Stephens [1988].

Harris and Stephens recognized that if you do a Taylor series of the image, you can build an approximation

of E(u, v) at a pixel (i, j). Specifically, if Ix and Iy denote the image of the partial derivatives of I with

respect to x and y, then

E(u, v) ≈

X

W

I

2

xu

2 + 2IxIyuv + I

2

y

v

2

= [u, v]

P

W I

2

x

P

P

W IxIy

W IxIy

P

W I

2

y

[u, v]

T = [u, v]M[u, v]

T

where M is called the structure tensor. To avoid extreme notation clutter, the sums are over x, y in a window

W centered at i, j and any image (e.g., Ix) is assumed to be indexed by x, y. In other words: the top-left

entry of M is P

(x,y)∈W (I

2

x

)x,y where W is the set of pixels in a window centered on (i, j). This matrix M

is a reasonable approximation of how the window changes at each pixel and you can compute M at every

single pixel (i, j) in the image.

What does this do for our lives? We can decompose the M we compute at each pixel into a rotation matrix

R and diagonal matrix diag([λ1, λ2) such that (specifically an eigen-decomposition):

M = R−1diag([λ1, λ2])R

where the columns of R tell us the directions that E(u, v) most and least rapidly changes, and λ1, λ2 tell us

the maximum and minimum amount it changes. In other words, if both λ1 and λ2 are big, then we have a

corner; if only one is big, then we have an edge; if neither are big, then we are on a flat part of the image.

Unfortunately, doing this decomposition is slow, and Harris and Stephens were doing this over 30 years ago.

Harris and Stephens had two other tricks up their sleeve. First, rather than calculate the eigenvalues directly, for a 2×2 matrix, one can compute the following score, which is a reasonable measure of what the

eigenvalues are like:

R = λ1λ2 − α(λ1 + λ2) = det(M) − αtrace(M)

2

which is far easier since the determinants and traces of a 2×2 matrix can be calculated very easily (look this

up). Pixels with large positive R are corners; pixels with large negative R are edges; and pixels with low R

are flat. In practice α is set to something between 0.04 and 0.06. Second, the sum that’s being done weights

pixels across the window equally, when we know this can cause trouble. So instead, Harris and Stephens

computed a M where the contributions of Ix and Iy for each pixel (i, j) were weighted by a Gaussian kernel.

Ok, so how do I implement it?

1. In your implementation, you should first figure out how to calculate M for all pixels just using a

straight-forward sum.

You can compute it by brute force (quadruple for-loop) or convolution (just summing over a window).

In general, it’s usually far easier to write a slow-and-not-particularly-clever version that does it brute

force. This is often a handful of lines and requires not so much thinking. You then write a version that

is convolutional and faster but requires some thought. This way, if you have a bug, you can compare

with the brute-force version that are pretty sure has no issues.

You can store M as a 3-channel image where, for each pixel (i, j) you store M1,1 in the first channel,

M1,2 in the second and M2,2 in the third. Storing M2,1 is unnecessary since it is the same as M1,2.