## Description

Part 1: Camera Projection Matrix Estimation

Learning Objective: (1) Understanding the the camera projection matrix and (2) estimating it using

fiducial objects for camera projection matrix estimation and pose estimation.

Introduction

In this first part you will perform pose estimation in an image taken by an uncalibrated camera. As

we saw in class, pose estimation is incredibly useful; it is used in VR, AR, controller tracking,

autonomous driving, and even satellite docking. Recall that for a pinhole camera model, the camera

matrix is a projective mapping from world (3D) to pixel (2D) coordinates defined up to a

scale.

Above is an arbitrary scale factor. The camera matrix can also be decomposed into intrinsic

parameters and extrinsic parameters .

Let’s look more carefully into what each of the individual parts of the decomposed matrix mean. The

homogenous vector coordinates of indicate the position of a point in 3D space in

the world coordinate system. The matrix represents a translation and the matrix

represents a rotation. When combined they convert points from the world to the camera coordinate

system. An intuitive way to understand this is to think about how aligning the axes of the world

coordinate system to the ones of the camera coordinate system can be done with a rotation and a

translation.

The distinction between camera coordinate and world coordinate systems is a rotation and a

translation.

In this part of the project you will learn how to estimate the projection matrix using objective

function minimization, how you can decompose the camera matrix, and what knowing these lets you

do.

Part 1.1: Implement Camera Projection

In projection_matrix.py you will implement camera projection in the projection(P,

points_3d) from homogenous world coordinates to non-homogenous image

coordinates .

Given the projection matrix , the equation that accomplish this are:

Part 1.2: Implement Objective Function

A camera projection matrix maps points from 3D into 2D. How can we use this to estimate its

parameters? Assume that we have known 2D-3D correspondences for a set of points, that is, for

points with index we have both access to the respective 3D coordinates and 2D

coordinates . Let be an estimation for the camera projection matrix. We can determine how

accurate the estimation is by measuring the reprojection error

between the 3D points projected into 2D and the known 2D points , both in nonhomogeneous coordinates. Therefore we can estimate the projection matrix itself by minimizing the

reprojection error with respect to the projection matrix

In this part, in projection_matrix.py you will implement the objective function

objective_function() that will be passed to scipy.optimize.least_squares for minimization

with the Levenberg-Marquardt algorithm.

Part 1.3: Estimating the Projection Matrix Given Point

Correspondences

Optimizing the reprojection loss using Levenberg-Marquardt requires a good initial estimate for .

This can be done by having good initial estimates for and and which you can multiply to then

generate your estimated . In this part, to make sure that you have the least squares optimization

working properly we will provide you with an initial estimate. In the function you will have to

implement in this part, estimate_projection_matrix() , you will have to pass the initial guess to

scipy.optimize.least_squares and get the appropriate output.

Note: because has only 11 degrees of freedom, we fix .

Part 1.4: Decomposing the Projection Matrix

Recall that

.

Rewriting this gives us:

Where is the first 3 columns of . An operation known as RQ decomposition which will

decompose into an upper triangular matrix and an orthonormal matrix such that ,

where the upper triangular matrix will correspond to and the orthonormal matrix to . In this

part you will implement decompose_camera_matrix(P) where you will need to get the appropriate

matrix elements of to perform the RQ decomposition, and make the appropriate function call to

scipy.linalg.rq() .

Part 1.5: Calculating the Camera Center

In this part in projection_matrix.py you will implement calculate_camera_center(P, K, R)

that takes as input the

projection , intrinsic and rotation matrix and outputs the camera position in world

coordinates.

Part 1.6: Taking Your Own Images and Estimating the

Projection Matrix + Camera Pose

In part 1.3 you were given a set of known points in world coordinates. In this part of the assignment

you will learn how to use a fiducial—an object of known size that can be used as a reference. Any

object for which you have measured the size given some unit (in this project you should use

centimeters).

Example of how one cuboid of known dimensions can be used as a fiducial to create multiple world

coordinate systems

The figure above illustrates how a cuboid of known dimension can be used to create a world

coordinate system and a set of points with known 3D location. Choose an object that you will use as

a fiducial (we recommend using a thick textbook) and measure it. Using a camera capture two

images of the object (you will estimate the camera parameters for both images) keeping in mind the

considerations discussed in class for part 2 and fundamental matrix estimation if you want to reuse

these images. When taking the images, try to estimate the pose of the camera lens of your phone in

the world coordinate system.

Now that you have the dimension of your object (3D points), you can use the Jupyter notebook to

find the image coordinates of the 3D points create your own 2D-3D correspondences for each image.

For each of your 2 images, make initial estimates for and if your estimate is good, using your code

from the previous part you should be able to estimate both the the projection matrix and the camera

pose. Use the code available in the Jupyter notebook to visualize your findings.

Report

Put these answers in your report!

What would happen to the projected points if you increased/decreased the coordinate, or the

other coordinates of the camera center ? Write down a description of your expectations in the

appropriate part of your writeup submission.

Perform this shift for each of the camera coordinates and then recompose the projection matrix

and visualize the result in your Jupyter notebook. Was the visualized result what you expected?

Part 2: Fundamental Matrix Estimation

Diagram of epipolar lines and epipoles. The fundamental matrix maps points from one image to an

epipolar line on the other.

Learning Objective: (1) Understanding the fundamental matrix and (2) estimating it using selfcaptured images to estimate your own fundamental matrix.

In this part, given a set of corresponding 2D points, we will estimate the fundamental matrix. Now

that we know how to project a point from a 3D coordinate to a 2D coordinate, next we’ll look at how

to map corresponding 2D points from two images of the same scene. You can think of the

fundamental matrix as something that maps points from one view into a line in the other view. We

get a line because a point in one image is only a projection to 2D, which means we can’t actually

know the “depth” of that point. As such, from the viewpoint of the other camera, we can see the

entire “line” that our first point could exist on.

The fundamental matrix constraint between two points and in the left and right views,

respectively, is given by the following equation:

Above and are homogenous coordinates in the two views, and is the fundamental

matrix. We can write out the equation above as:

Since takes points in one image and maps them to a line in another image, the constraint says that

we want the point to be on the line represented by . If the point is exactly on the line, then the

constraint and the line-to-point distance between them is also zero.

Hence to estimate we can minimize the line-to-point distance between the point and the line

, for all matching points . This makes estimating the fundamental matrix a least squares

problem.

Part 2.1: Distance Formula

The line-to-point distance formula is defined by the following equation:

where is a line and is the homogenous coordinate for the point. You will

need to implement this in point_line_distance() in fundamental_matrix.py .

Part 2.2: Symmetric line-to-point error

Given a set of matching points we can set up the following symmetric line-to-point error

function for every match .

where is the distance formula between a line and a point, is one of the homogeneous points,

is the fundamental matrix, and is the other homogeneous point. In the first term we use , which

maps from the second to the first view, and in second term we use , which maps from the first to

the second view. SciPy handles the squaring and summing for you so you just need to implement

point_line_distance() .

By applying this symmetric line-to-point error calculation to every pair of points, we get this equation

for the objective function to minimize, as a function of :

This is the equation you will be implementing in signed_point_line_errors() which is in the file

fundamental_matrix.py .

Part 2.3: Least Squares Optimization

Since this is a least squares optimization problem, you will also be making the call to SciPy’s least

squares optimizer. The documentation for this is available here.

You’ll need to give as input: the objective function, your initial estimate of the fundamental matrix,

optimization algorithm, method for computing the Jacobian, and the point pairs. There are more

details on this in the code.

Part 2.4: Try it Yourself

Similar to Part 1, you’ll have to take two images of the same scene and estimate the fundamental

matrix between the two images. Recall that these two images must be from different positions, and

you cannot simply just rotate the camera or zoom the image. You’ll have to save the images in the

same project folder and use the Jupyter notebook to run your fundamental matrix estimator on your

images.

Report

Put these answers in your report!

Why is it that when you take your own images, you can’t just rotate the camera or zoom the

image for your two images of the same scene?

Why is it that points in one image are projected by the fundamental matrix onto epipolar lines in

the other image?

What happens to the epipoles and epipolar lines when you take two images where the camera

centers are within the images? Why?

What does it mean when your epipolar lines are all horizontal across the two images?

Why is the fundamental matrix defined up to a scale?

Why is the fundamental matrix rank 2?

Part 3: RANSAC

Learning objective: (1) Understand the parameters of the RANSAC algorithm. (2) Apply RANSAC to

find the fundamental matrix for a pair of images given imperfect point correspondences (ie outliers).

Now you have a function which can calculate the fundamental matrix from matching pairs of

points in two different images. However, having to manually extract the matching points is

undesirable. Later in the course, you will learn to to automate the process of identifying matching

points in two images. For now we will assume that someone has already run code to find the

matches for you; however, the automated matching is not perfect, and we will use RANSAC to find a

set of true matches (inliers) to calculate . Fortunately, to calculate the fundamental matrix , we

need only 9 matching points, but we want to automate the process of finding them. Below is a highlevel description of how we will implement this section. See the Jupyter notebook and code

documentation for implementation details.

Part 3.1: RANSAC Iterations

We will use a method called RANdom SAmple Consensus (RANSAC) to search through the points

returned by SIFT and find true matches to use for calculating the fundamental matrix . Please

review the lecture slides on RANSAC to refresh your memory on how this algorithm works.

Additionally, you can find a simple explanation of RANSAC at

https://www.mathworks.com/discovery/ransac.html. See section 6.1.4 in the textbook for a more

thorough explanation of how RANSAC works.

You may wonder how many iterations of this algorithm we need to run in order to acheive success. It

turns out RANSAC actually provides probabilistic guarentees of finding the correct model for our

dataset given the number of iterations we run, the size of our sample, and the proportion of true

matches in our dataset. You will right a function to determine how many iterations of RANSAC to run

in order to find the fundamental matrix with a high probability of success. The jupyter notebook will

guide you through this derivation.

Part 3.2: RANSAC Implementation

Next we will implement the RANSAC algorithm. Remember the steps from the link above:

1. Randomly selecting a subset (k=9) of the data set

2. Fitting a model to the selected subset

3. Determining the number of outliers

4. Repeating steps 1-3 for a prescribed number of iterations

For the application of finding true point pair matches and using them to calculate the fundamental

matrix, our subset of the data will be the minimum number of point pairs needed to calculate the

fundamental matrix.

The model we are fitting is the fundamental matrix.

Outliers will be found by using the point_line_distance() error function from part 2 and

thresholding with a certain margin of error.

Part 3.3: RANSAC Visualization

Finally, we will demonstrate running RANSAC on a real image pair, and plot the epipolar lines from

the resulting fundamental matrix. You do not need to write any new code for this section, but use it

as an opportunity in addition to the unit tests to check your results and better understand epipolar

geometry and RANSAC.

Report

Put these answers in your report!

What is the minimum number of RANSAC iterations we would we need to find the fundamental

matrix with 99.9% certainty from a set of proposed matches that have a 90% point

correspondence accuracy? Keep in mind that we need at least 9 point correspondences for our

optimization to find the fundamental matrix in part 2

One might imagine that if we had more than 9 point correspondences, it would be better to use

more of them to solve for the fundamental matrix. Investigate this by finding the number of

RANSAC iterations you would need to run for the above situation with 18 points.

If our dataset had a lower point correspondence accuracy, say 70%, what is the minimum

number of iterations needed to find the fundamental matrix with 99.9% certainty?

Part 4: Recovering relative camera pose from epipolar

geometry

Note: This section is compulsory for graduate students. Undergraduate students are welcome to

attempt this section for extra credits.

Once we have the fundamental matrix, we will try and recover the relative rotation and translation

between two camera poses.

4.1 Recovering essential matrix from the fundamental matrix

Let’s work out some of the math and introduce the essential matrix.

Recall the epipolar constraint . If we assume that both these camera share the same

calibration matrix , we can write the projection equation for and . Let and be the

extrinsic matrix for the two cameras. Without loss of generality, we can fix the world coordinate

system such that is . We can then represent as where and are the relative

rotation and translation between the two cameras.

where and are called normalized image coordinates (image coordinates with the effect of

camera intrinsics removed), and are called normalized camera matrices.

We define the essential matrix as the fundamental matrix corresponding to the a pair of

normalized cameras. We can write the epipolar constraint as :

You can now write the essential matrix in terms of . Derive the equation and write the code in

recover_E_from_F() function in recover_rot_translation.py .

Please note that this writeup is not thorough and we expect you to refer to the lecture slides. Section

9.6 from the book Multi-view geometry by Hartley or Zisserman (or any other material of your

preference) is also a good reference

4.2 Recovering relative rotation and translation

We have now computed the essential matrix. Please follow the derivation in the lecture slides for the

essential matrix equation in terms of and : .

We will follow section 9.6.2 from the book multi-view geometry for this part. Note that the proof in

this writeup is not comprehensive and we expect you to go through the detailed proof.

Let us create two constant matrices to help with the notation:

Note that W is an orthogonal matrix, and Z is a skew-symmetric matrix. We can use SVD to write

.

Our goal is to decompose as . where is a skew symmetric matrix and is a rotation

matrix.

As and have the same left null space, we can write . As rotation matrices are

orthonormal, we can write in terms of another rotation matrix , using and which are

orthonormal: .

Plugging in the values of and in the SVD decomposition of , we get two possible values of :

or .

We have to derive the value of from . As we can only derive upto scale, we will restrict the l2-

norm of to be 1. As is a proxy for , we can use the relationship to calculate as the the

third column of .

Hence we have two possible values of rotation, and two possible values of normalized translation

(positive and negative sign).

Complete the function recover_rot_translation_from_E in recover_rot_translation.py .

There is no deliverable in report for this section.

Testing

We have provided a set of tests for you to evaluate your implementation. We have included tests

inside proj2.ipynb so you can check your progress as you implement each section. When you’re

done with the entire project, you can call additional tests by running pytest inside the

“./proj2_unit_tests” directory of the project. Your grade on the coding portion of the project will be

further evaluated with a set of tests not provided to you.

Bells & Whistles (Extra Points)

This assignment includes 10 points of extra for implementing SVD to decompose the essential matrix

as described above. All grad students are required to complete the extra credit.

Writeup

For this project (and all other projects), you must do a project report using the template slides

provided to you. Do not change the order of the slides or remove any slides, as this will affect the

grading process on Gradescope and you will be deducted points. In the report you will describe your

algorithm and any decisions you made to write your algorithm a particular way. Then you will show

and discuss the results of your algorithm. The template slides provide guidance for what you should

include in your report. A good writeup doesn’t just show results–it tries to draw some conclusions

from the experiments. You must convert the slide deck into a PDF for your submission.

If you choose to do anything extra, add slides after the slides given in the template deck to describe

your implementation, results, and analysis. Adding slides in between the report template will cause

issues with Gradescope, and you will be deducted points. You will not receive full credit for your extra

credit implementations if they are not described adequately in your writeup.

Rubric

+66 (+10) pts: Code

22 pts: Part 1

22 pts: Part 2

22 pts: Part 3

10 pts: Part 4

+34 pts: PDF report

10 pts: Part 1

10 pts: Part 2

10 pts: Part 3

4 pts: Conclusions

-5*n pts: Lose 5 points for every time you do not follow the instructions for the hand-in format.

Submission Format

This is very important as you will lose 5 points for every time you do not follow the instructions. You

will have two submission files for this project:

1. .zip via Gradescope containing:

proj2_code/ – directory containing all your code for this assignment

2. _proj2.pdf via Gradescope – your report

Do not install any additional packages inside the conda environment. The TAs will use the same

environment as defined in the config files we provide you, so anything that’s not in there by default

will probably cause your code to break during grading. Do not use absolute paths in your code or

your code will break. Use relative paths like the starter code already does. Failure to follow any of

these instructions will lead to point deductions. Create the zip file using python zip_submission.py

–gt_username (it will zip up the appropriate directories/files for you!) and

hand it through Canvas. Remember to submit your report as a PDF to Gradescope as well.

Credit

Assignment developed by Jacob Knaup, Julia Chen, Stefan Stojanov, Frank Dellaert, and James Hays

based on a similar project by Aaron Bobick; updated by Ayush Baid, Jacob Knaup, and Judy Hoffman.