
CS 576 Assignments 1–3 Solutions




CS 576 Assignment 1

Theory Part (30 points)

Question 1: (5 points)

 

The following sequence of real numbers has been obtained by sampling an audio signal: 1.8, 2.2, 2.2, 3.2, 3.3, 3.3, 2.5, 2.8, 2.8, 2.8, 1.5, 1.0, 1.2, 1.2, 1.8, 2.2, 2.2, 2.2, 1.9, 2.3, 1.2, 0.2, -1.2, -1.2, -1.7, -1.1, -2.2, -1.5, -1.5, -0.7, 0.1, 0.9

Quantize this sequence by dividing the interval [-4, 4] into 32 uniformly distributed levels, placing level 0 at -3.75, level 1 at -3.5, and so on, up to level 31 at 4.00. Also, remember that quantization should result in the least error.

• Write down the quantized sequence. (4 points)
• How many bits do you need to transmit it? (1 point)
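The level placement above describes a uniform quantizer with step 0.25. A minimal C++ sketch of nearest-level rounding (the function names and clamping behavior are our own illustration, not part of the assignment):

```cpp
#include <algorithm>
#include <cmath>

// Uniform quantizer for [-4, 4] with 32 levels, level i sitting at
// -3.75 + 0.25*i (level 0 = -3.75, level 31 = 4.00). Rounding to the
// nearest level is what keeps the quantization error minimal.
double quantize(double x) {
    int level = static_cast<int>(std::lround((x + 3.75) / 0.25));
    level = std::clamp(level, 0, 31);   // stay inside the level range
    return -3.75 + 0.25 * level;
}

// With 32 levels, each sample needs log2(32) = 5 bits.
int bitsPerSample() { return 5; }
```

Rounding to the nearest level, rather than truncating, bounds the error by half a step (0.125), which is the "least error" the question asks for.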

 

Question 2: (10 points)

A high-definition film color camera has 1080 lines per frame, 1920 pixels per line, with a
24 Hz capture frame rate. Each pixel is quantized with 12 bits per channel during the
quantization process.

The capture pipeline employs the following sequence:

1. A YUV 4:2:0 color subsampling scheme
2. An optional feature to downsample the signal to standard definition CIF (352×288)
3. An obligatory MPEG2 compression phase
4. A disk write with a varying disk write speed (12 to 36 Mbytes per second)

Answer the following questions:

• If the second (optional) feature is off, what minimal compression ratio needs to be achieved by the third step (MPEG2 compression)? (4 points)
• If the second (optional) feature is turned on to produce CIF format, how does your previous answer change? (3 points)
• If the original pixels were square, how do the pixels stretch with the second (optional) feature turned on? (3 points)
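The first bullet reduces to comparing the post-4:2:0 raw data rate against the disk speed. A sketch of that arithmetic (the 1.5-samples-per-pixel average for 4:2:0 and the choice of which disk speed to divide by are our reading of the problem, not given in it):

```cpp
// Raw data rate after YUV 4:2:0 subsampling: a full-resolution Y plane plus
// quarter-resolution U and V planes averages 1.5 samples per pixel, each
// sample carrying `bitsPerChannel` bits.
double rawBytesPerSecond(int width, int height, double fps, int bitsPerChannel) {
    double bitsPerPixel = bitsPerChannel * 1.5;   // 4:2:0 average
    return width * height * fps * bitsPerPixel / 8.0;
}

// The minimal compression ratio is the raw rate divided by whatever disk
// speed you assume; the worst case (12 MB/s) is the safe reading.
double minCompressionRatio(double rawBytesPerSec, double diskBytesPerSec) {
    return rawBytesPerSec / diskBytesPerSec;
}
```

For the given camera (1920×1080, 24 Hz, 12 bits/channel) this yields about 112 Mbytes per second of raw 4:2:0 data.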

Question 3: (15 points)

Temporal aliasing can be observed when you attempt to record a rotating wheel with a video camera. In this problem, you will analyze such effects. Assume a car is moving at 36 km/hr and you record it on film, which traditionally records at 24 frames per second. The tires have a diameter of 0.4244 meters, and each tire has a white mark to gauge the speed of rotation.

• If you are watching this projected movie in a theater, what do you perceive the rate of tire rotation to be, in rotations/sec? (3 points)

• If you use your camcorder to record the movie in the theater, and your camcorder records at one third of the film rate (i.e., 8 fps), at what rate (rotations/sec) does the tire rotate in your video recording? (6 points)

• The driver decides to participate in a race and buys tires that safely allow a maximum speed of 180 km/hr. What must the diameter of the tires be if no temporal aliasing is to be witnessed in the recording? (6 points)
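For orientation: 36 km/hr is 10 m/s, and a 0.4244 m tire has circumference of roughly 1.333 m, i.e. about 7.5 rotations/sec. The aliasing itself can be sketched as folding the per-frame rotation into the nearest alias (this assumes a single mark on the wheel and that the eye picks the smallest consistent motion):

```cpp
#include <cmath>

// A mark rotating at rotPerSec, sampled at fps, advances rotPerSec/fps of a
// turn per frame. The perceived motion is that fractional turn folded into
// (-0.5, 0.5] turns per frame, scaled back to rotations per second.
double perceivedRotPerSec(double rotPerSec, double fps) {
    double turnsPerFrame = rotPerSec / fps;
    double folded = turnsPerFrame - std::round(turnsPerFrame);
    return folded * fps;   // negative means apparent backward rotation
}
```

Aliasing appears exactly when the wheel turns more than half a rotation per frame, which is the condition to work with in the last bullet.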

Programming Part (120 points)

This assignment will help you gain a practical understanding of quantization and subsampling, and of how they affect visual media types like images and video. We have provided you with a Microsoft Visual C++ project and a Java class that display two images side by side (original on the left and a processed output on the right). Currently both left and right correspond to the same input image. You are free to use either of these as a start.

Your program will be invoked using seven parameters:

YourProgram.exe C:/myDir/myImage.rgb Y U V Sw Sh A

• The first parameter is the name of the image, which will be provided in an 8-bit-per-channel RGB format (24 bits per pixel in total). You may assume that all images will be of the same size for this assignment (HD size = 1920w × 1080h); more information on the image format will be placed on the class website.

• The next three parameters are integers that control the subsampling of your Y, U and V spaces respectively. For the sake of simplicity, we will follow the convention that subsampling occurs only along the width dimension and not the height. Each of these parameters can take on values from 1 to n for some n, with 1 suggesting no subsampling and n suggesting a subsampling by n.

• The next two parameters are single-precision floats Sw and Sh, which take positive values ≤ 1.0 and control the scaled output image width and height independently.

• Finally, an integer A (0 or 1) indicates whether antialiasing (prefiltering) needs to be performed: 0 indicates no antialiasing, 1 indicates antialiasing.

Example invocations shown below should give you a fair idea of what your input parameters do and how your program will be tested.

1. YourProgram.exe image1.rgb 1 1 1 1.0 1.0 0
No subsampling in Y, U or V, no scaling in w and h, and no antialiasing, which implies that the output is the same as the input.

2. YourProgram.exe image1.rgb 1 1 1 0.5 0.5 1
No subsampling in Y, U or V, but the image is one fourth its original size (antialiased).

3. YourProgram.exe image1.rgb 1 2 2 1.0 1.0 0
The output is not scaled in size, but the U and V channels are subsampled by 2. No subsampling in the Y channel.

Now for the details. In order to display an image on a display device, the normal choice is an RGB representation. The dataflow pipeline below illustrates all the steps (shown as a diagram in the original handout):

1. Read the input image (RGB) and display it. This code is already provided to you, if you choose to make use of it.
2. Convert to YUV space. The RGB to YUV conversion matrix is given below.
3. Process YUV subsampling: subsample Y, U and V separately according to the input parameters.
4. Adjust up-sampling for display: although samples are lost, all values must be interpolated in place prior to further processing.
5. Convert back to RGB space by applying the inverse matrix.
6. Scale the RGB image with Sw and Sh, respecting the choice of A, and display the final output.

Conversion of RGB to YUV

Given R, G and B values, the conversion from RGB to YUV is given by a matrix multiplication:

| Y |   |  0.299   0.587   0.114 |   | R |
| U | = |  0.596  -0.274  -0.322 | * | G |
| V |   |  0.211  -0.523   0.312 |   | B |

Remember that if the RGB channels are represented by n bits each, then the YUV channels are also represented by the same number of bits.
Conversion of YUV to RGB

Given Y, U and V values, the conversion from YUV back to RGB is given by the inverse matrix multiplication:

| R |   |  1.000   0.956   0.621 |   | Y |
| G | = |  1.000  -0.272  -0.647 | * | U |
| B |   |  1.000  -1.106   1.703 |   | V |
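The two matrices above can be written directly as functions. (These coefficients are a YIQ-style pair; the pipeline only needs them to be approximate inverses of each other, so a round trip reproduces the pixel to within rounding.)

```cpp
#include <cmath>

struct RGB { double r, g, b; };
struct YUV { double y, u, v; };

// Forward matrix from the handout.
YUV rgbToYuv(const RGB& p) {
    return { 0.299 * p.r + 0.587 * p.g + 0.114 * p.b,
             0.596 * p.r - 0.274 * p.g - 0.322 * p.b,
             0.211 * p.r - 0.523 * p.g + 0.312 * p.b };
}

// Inverse matrix from the handout.
RGB yuvToRgb(const YUV& p) {
    return { 1.000 * p.y + 0.956 * p.u + 0.621 * p.v,
             1.000 * p.y - 0.272 * p.u - 0.647 * p.v,
             1.000 * p.y - 1.106 * p.u + 1.703 * p.v };
}
```

A quick sanity check on your implementation: a gray pixel (R = G = B) should map to U = V = 0 with Y unchanged, since each chroma row sums to zero.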

Sub sampling of YUV & processing

Subsampling, as you know, will reduce the number of samples for a channel. For example, for the input parameters

YourProgram.exe image1.rgb 1 2 2 1.0 1.0 0

the YUV image is not subsampled in Y, but by 2 in U and by 2 in V. When converting back to the RGB space, all the YUV channels have to be of the same size.

However, the subsampling throws away samples, which have to be filled in appropriately by averaging the neighborhood values. For example, for the above case a local image area would look like:

Y11U11V11 Y12 Y13U13V13 Y14 . . . . . line 1
Y21U21V21 Y22 Y23U23V23 Y24 . . . . . line 2
Y31U31V31 Y32 Y33U33V33 Y34 . . . . . line 3
Y41U41V41 Y42 Y43U43V43 Y44 . . . . . line 4

The missing values may be filled in using filters. Here is an example:

U12 = (U11 + U13)/2    V12 = (V11 + V13)/2
U14 = (U13 + U15)/2    V14 = (V13 + V15)/2

Or you may choose to invent your own filter using appropriate valid neighborhood samples, so that every position again carries a full triple:

Y11U11V11 Y12U12V12 Y13U13V13 Y14U14V14 . . . . . line 1
Y21U21V21 Y22U22V22 Y23U23V23 Y24U24V24 . . . . . line 2
Y31U31V31 Y32U32V32 Y33U33V33 Y34U34V34 . . . . . line 3
Y41U41V41 Y42U42V42 Y43U43V43 Y44U44V44 . . . . . line 4

Note that the samples you take to fill in values will change depending on the subsampling parameters. The YUV components can now be converted to RGB space.
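Width-only subsampling followed by the fill-in step can be sketched on a single row of one channel (the border handling, falling back to the one neighbor that exists, is our choice; any valid-neighborhood filter is acceptable per the text):

```cpp
#include <vector>

// Subsample a row by n and immediately fill in the discarded samples:
// samples at indices that are multiples of n are kept; every other index is
// rebuilt as the average of its nearest kept neighbours, or the single
// neighbour that exists at the right border.
std::vector<double> subsampleAndFill(const std::vector<double>& row, int n) {
    std::vector<double> out(row.size());
    int last = static_cast<int>(row.size()) - 1;
    for (int i = 0; i <= last; ++i) {
        if (i % n == 0) { out[i] = row[i]; continue; }
        int left = (i / n) * n;                  // nearest kept sample to the left
        int right = left + n;                    // nearest kept sample to the right
        if (right > last) out[i] = row[left];    // border: only one neighbour
        else out[i] = (row[left] + row[right]) / 2.0;
    }
    return out;
}
```

With n = 2 this reproduces exactly the U12 = (U11 + U13)/2 rule shown above.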

Scaling with Antialiasing

Your output image width and height will change to new (smaller) values depending on the scale factors Sw and Sh. You will need to create the output image by resampling the input image. This can be achieved by inverse mapping each destination pixel index [i, j] to its source location index. Depending on whether you need to perform antialiasing, the destination resampled pixel value can be either the value of your inverse-mapped source pixel (A=0) or the average of a small neighborhood around the inverse-mapped source pixel (A=1). To compute the average, you may use a 3×3 kernel.
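The inverse-mapping step for one channel can be sketched as follows (nearest-pixel mapping and edge clipping of the 3×3 kernel are our choices; the handout leaves them open):

```cpp
#include <vector>

// Resample one channel by inverse mapping: destination [i][j] maps back to
// source [i/Sh][j/Sw]. With antialias set, the value is instead the average
// of the valid 3x3 neighbourhood around that source pixel.
std::vector<std::vector<double>>
scaleChannel(const std::vector<std::vector<double>>& src,
             double Sw, double Sh, bool antialias) {
    int srcH = static_cast<int>(src.size());
    int srcW = static_cast<int>(src[0].size());
    int dstH = static_cast<int>(srcH * Sh);
    int dstW = static_cast<int>(srcW * Sw);
    std::vector<std::vector<double>> dst(dstH, std::vector<double>(dstW));
    for (int i = 0; i < dstH; ++i)
        for (int j = 0; j < dstW; ++j) {
            int si = static_cast<int>(i / Sh);   // inverse-mapped source row
            int sj = static_cast<int>(j / Sw);   // inverse-mapped source column
            if (!antialias) { dst[i][j] = src[si][sj]; continue; }
            double sum = 0.0; int cnt = 0;       // 3x3 box average, clipped at edges
            for (int di = -1; di <= 1; ++di)
                for (int dj = -1; dj <= 1; ++dj) {
                    int r = si + di, c = sj + dj;
                    if (r < 0 || r >= srcH || c < 0 || c >= srcW) continue;
                    sum += src[r][c]; ++cnt;
                }
            dst[i][j] = sum / cnt;
        }
    return dst;
}
```

The prefilter matters because plain decimation (A=0) drops high frequencies on the floor, which shows up as jagged edges and moiré in the scaled output.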

What should you submit?

• Your source code, and your project file or makefile, if any, using the submit program. Please do not submit any binaries or data files. We will compile your program and execute our tests accordingly.

• Along with the program, also submit an electronic document (Word, PDF, PageMaker etc. format) using the submit program that answers the aforementioned analysis questions. You may use any (or all) input images for this analysis.

 

CS 576 Assignment 2 Solved

Question 1: Color Theory – 10 points (courtesy of TA Jared Hwang)

A Rubik’s cube is a cube-shaped puzzle with 6 different 3×3 colored tiled sides: white,
green, red, blue, orange, and yellow. The goal of the puzzle is to rotate sides and make
each face have 3×3 tiles of the same color. When held under different colored lights
(white, red, green, blue) the cube looks very interesting and vivid, see below:

Answer the following questions:

• Explain why this happens. Why do some tiles look bright, almost glowing, while others appear muted and devoid of their original color? (4 points)
• Assuming ideal conditions, you have the following lighting conditions under which to solve the puzzle: pure yellow light or red light. Which of these two light choices makes the puzzle harder to solve? Give reasons for your choice of answer. (6 points)

Question 2: Color Theory (10 points)

• The chromaticity diagram in (x, y) represents the normalized color matching functions X, Y and Z. Prove that (2 points)

Z = [(1 - x - y)/y] Y

Here you are tasked with mapping the gamut of a printer to that of a color CRT monitor. Assume that the gamuts are not the same; that is, there are colors in the printer's gamut that do not appear in the monitor's gamut and vice versa. So, in order to print a color seen on the monitor, you choose the nearest color in the gamut of the printer.

Answer the following questions:

• Comment (giving reasons) on whether this algorithm will work effectively. (2 points)
• You have two images: a cartoon image with constant color tones and a real image with varying color tones. On which image will this algorithm perform better? Give reasons. (2 points)
• Can you suggest improvements rather than just choosing the nearest color? (4 points)

Question 3: Entropy Coding (10 points)

Consider a communication system that gives out only two symbols, X and Y. Assume that the parameterization followed by the probabilities is P(X) = x^k and P(Y) = 1 - x^k.

• Write down the entropy function and plot it as a function of x for k = 2. (1 point)
• From your plot, for what value of x with k = 2 does H become a minimum? (1 point)
• Your plot visually gives you the minimum value of x for k = 2; find a generalized formula for x in terms of k for which H is a minimum. (3 points)
• From your plot, for what value of x with k = 2 does H become a maximum? (1 point)
• Your plot visually gives you the maximum value of x for k = 2; find a generalized formula for x in terms of k for which H is a maximum. (4 points)
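The function to be plotted is the binary entropy with p = x^k. A sketch for generating the plot values (treating the endpoints as the p log p → 0 limit is our convention):

```cpp
#include <cmath>

// Entropy of the two-symbol source with P(X) = x^k, P(Y) = 1 - x^k:
// H(x) = -p*log2(p) - (1-p)*log2(1-p), where p = x^k.
double entropyH(double x, double k) {
    double p = std::pow(x, k);
    if (p <= 0.0 || p >= 1.0) return 0.0;   // limit value at both ends
    return -p * std::log2(p) - (1.0 - p) * std::log2(1.0 - p);
}
```

Evaluating this over x in [0, 1] produces the requested plot; the shape suggests the maximum sits where x^k = 1/2, which is the hint for the generalized-formula bullets.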

Question 4: Huffman Coding/Entropy (10 points) – (courtesy of TA Jared Hwang)

Bob has a pen pal, Alice, who has been learning about information theory and compression techniques. Alice decides that from now on they should exchange letters as encoded signals so they can save on ink. The following is a letter that Alice sends to Bob on her trip to Paris:

Dear Bob,
Hello from Paris!
I got this postcard from the Louvre. You
would love Paris! I hope to hear from you.
Alice

• Find and show a Huffman code for the body of Alice's postcard (i.e. exclude "Dear Bob" and "Alice"). Treat each word as a symbol, and don't include punctuation. What is the average code length? (3 points)

Bob, having just learned about the telegram in history class, suggests to Alice that they can try writing their letters as telegram messages to shorten them even more. He sends Alice what her postcard might look like as a telegram:

IN PARIS POSTCARD FROM LOUVRE STOP
YOU WOULD LOVE STOP
HOPE HEAR FROM YOU STOP

• Find a Huffman code for the telegram message. What is the average code length? How does it compare to the original letter? (3 points)
• Which version of the message, postcard or telegram, contains more information? Show quantitatively and explain qualitatively where the difference (if any) comes from. (4 points)

Programming Part (160 points)

This assignment will help you gain a practical understanding of analyzing color channels, especially as it pertains to image segmentation. While image segmentation is a challenging task in the field of image processing and computer vision, the process has been made simpler via the use of green screen and chroma keying techniques. I am sure you are all too familiar with online video conferencing applications such as Zoom and WebEx, where you can change your background with or without a green screen. Here you will implement similar functionality and hopefully get an opportunity to explore color and color spaces.

You will be given two input videos in the same rgb format. Each video will be a 640×480 video that plays at 24 fps for a total of 20 seconds (480 frames). The frames are named with a base name and indexed by frame number, e.g. basename.0001.rgb, basename.0002.rgb … basename.0480.rgb. You are free to use or extend the display code sample given with assignment 1 to display a sequence of images at the display frame rate and implement the color algorithms needed in this video. (No scripting languages such as MATLAB or Python, please!)

To invoke your program, we will compile it and run it at the command line as

YourProgram.exe C:/myDir/foreGroundVideo C:/myDir/backGroundVideo mode

where
• foreGroundVideo is the base name for a green-screened foreground video, which has a foreground element (actor, object) captured in front of a green screen.
• backGroundVideo is any normal video.
• mode can take the value 1 or 0: 1 indicating that the foreground video has a green screen, and 0 indicating there is no green screen.
Implementation details for mode 1:
In this mode you have a green screen used in the foreground video. While the color of the screen used is intended to be constant, practically it never is, and it has slight variations that come across in the captured video. While a specific color might have been used, the actual RGB pixel values of the screen can vary depending on lighting conditions, shadows cast, noise and quantization in the capture process, etc. Normally, thresholds may be used to decide how to detect green screen pixels. In your implementation you need to arrive at these thresholds by analyzing your video.

For all frames, write a process that will detect the green screen pixels in the foreground video and replace them with the corresponding background video pixels in all the frames.
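One possible shape for the per-pixel test (not the required method, and the hue band and saturation threshold here are placeholders you would tune by analyzing your own footage): convert RGB to hue and call a pixel "screen" when its hue falls in a green band and it is saturated enough.

```cpp
#include <algorithm>
#include <cmath>

// Classify a pixel as green-screen using a hue band in HSV space.
// The band (90..150 degrees) and saturation cutoff (0.4) are illustrative
// starting points, not values given by the assignment.
bool isGreenScreen(int r, int g, int b) {
    double rf = r / 255.0, gf = g / 255.0, bf = b / 255.0;
    double mx = std::max({rf, gf, bf}), mn = std::min({rf, gf, bf});
    double delta = mx - mn;
    if (mx == 0.0 || delta == 0.0) return false;      // black or gray: not screen
    double sat = delta / mx;
    double hue;                                        // standard RGB -> HSV hue
    if (mx == rf)      hue = 60.0 * std::fmod((gf - bf) / delta, 6.0);
    else if (mx == gf) hue = 60.0 * ((bf - rf) / delta + 2.0);
    else               hue = 60.0 * ((rf - gf) / delta + 4.0);
    if (hue < 0.0) hue += 360.0;
    return hue >= 90.0 && hue <= 150.0 && sat > 0.4;   // tunable thresholds
}
```

Working in hue rather than raw RGB is what lets a shadowed or unevenly lit patch of screen still read as "green": brightness changes move value, not hue.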

Image taken from https://en.wikipedia.org/wiki/Chroma_key
Some thoughts that you might want to consider:
• How do you detect the thresholds to label a pixel as a green screen pixel, given that the screen pixels might have noise, shadows, etc.?
• You can certainly make this work by processing in the RGB color space, but other color spaces, like HSV, might work better. Here are references that were shared in class and should give you an understanding of these spaces:
https://en.wikipedia.org/wiki/HSL_and_HSV
https://www.cs.rit.edu/~ncs/color/
• To have good quality at the boundaries of the composition (where foreground regions and the background video meet), can you think of how to blend boundary pixels correctly, so that the foreground element does not appear stuck onto the background but blends in nicely?

Implementation details for mode 0:

In this mode your foreground video does not have any constant-colored green screen, and while finding the foreground automatically is a hard problem, the foreground videos we give you will have the foreground element (actor, object) moving in every frame while the camera is static.

In other words, you should be able to arrive at your "green screen" pixels by comparing two frames: pixels that are constant (not changing within some threshold) can be assumed to be "green screen" pixels and hence can be replaced by the corresponding pixels in the background video. This algorithm is known as background subtraction.

For example, shown below are two frames from a static camera. Comparing corresponding pixels in frame 1 and frame 2 (for each x, y), you should be able to assess which pixels have "not changed" and hence can serve as "green screen" pixels, and which pixels have changed and hence are foreground pixels. You may then proceed to composite the other video's corresponding frame with this extracted green screen. Note: while the camera may be static, under changing conditions of lighting, motion etc., you might not get "perfect" results, especially at the boundaries of the areas in motion. Also, if there is no motion, then you will not be able to extract the foreground.
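The frame-differencing step described above can be sketched per channel as follows (the threshold is a placeholder to be tuned against your footage, and frames are shown as flat arrays for brevity):

```cpp
#include <cstdlib>
#include <vector>

// Background subtraction sketch: a pixel whose value changes by at most
// `threshold` between two frames is treated as background ("green screen");
// anything that moved more than that is foreground.
std::vector<bool> staticPixelMask(const std::vector<int>& frame1,
                                  const std::vector<int>& frame2,
                                  int threshold) {
    std::vector<bool> isBackground(frame1.size());
    for (size_t i = 0; i < frame1.size(); ++i)
        isBackground[i] = std::abs(frame1[i] - frame2[i]) <= threshold;
    return isBackground;
}
```

In practice you would combine the three channel differences (or difference in a luminance/HSV representation) and possibly compare against more than one pair of frames to make the mask robust to noise.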

(Figure: Frame 1, Frame 2, extracted Foreground)

What should you submit?

• Your source code, and your project file or makefile, if any, using the submit
program. Please do not submit any binaries or images. We will compile your
program and execute our tests accordingly.
• If you need to include a readme.txt file with any special instructions on
compilation, that is fine too.

 

CSCI 576 Assignment 3 Solved

Question 1: DCT Coding (20 points)

In this question you will try to understand the working of DCT in the context of JPEG. Below is
an 8×8 luminance block of pixel values and its corresponding DCT coefficients.
188 180 155 149 179 116 86 96
168 179 168 174 180 111 86 95
150 166 175 189 165 101 88 97
163 165 179 184 135 90 91 96
170 180 178 144 102 87 91 98
175 174 141 104 85 83 88 96
153 134 105 82 83 87 92 96
117 104 86 80 86 90 92 103

• Using the 2D DCT formula, compute the 64 DCT values. Assume that you quantize your DCT
coefficients using the luminance quantization table K1 on page 143 of the uploaded ITU-T JPEG
standard. What does your table look like after quantization? (5 points)
• In the JPEG pipeline, the quantized DCT values are then further scanned in a zigzag order. Ignoring your DC value, show the resulting zigzag scan of AC values. (2 points)
• For this zigzag AC sequence, write down the intermediary notation. (5 points)
• Since these are luminance values, write down the resulting JPEG bit stream. You will need to consult the standard luminance code tables on page 150 of the ITU-T JPEG standard. (6 points)
• What compression ratio do you get for this luminance block? (2 points)
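The "2D DCT formula" referred to above is the standard JPEG forward DCT-II, applied after level-shifting samples by 128. A direct (unoptimized) sketch, with no quantization applied:

```cpp
#include <cmath>

// Textbook 8x8 forward DCT-II as used in JPEG: level-shift by 128, then
// F(u,v) = 1/4 * C(u) * C(v) * sum over x,y of
//          f(x,y) * cos((2x+1)u*pi/16) * cos((2y+1)v*pi/16),
// where C(0) = 1/sqrt(2) and C(k) = 1 otherwise.
void dct8x8(const double in[8][8], double out[8][8]) {
    const double PI = 3.14159265358979323846;
    for (int u = 0; u < 8; ++u)
        for (int v = 0; v < 8; ++v) {
            double cu = (u == 0) ? 1.0 / std::sqrt(2.0) : 1.0;
            double cv = (v == 0) ? 1.0 / std::sqrt(2.0) : 1.0;
            double sum = 0.0;
            for (int x = 0; x < 8; ++x)
                for (int y = 0; y < 8; ++y)
                    sum += (in[x][y] - 128.0)
                         * std::cos((2 * x + 1) * u * PI / 16.0)
                         * std::cos((2 * y + 1) * v * PI / 16.0);
            out[u][v] = 0.25 * cu * cv * sum;
        }
}
```

A useful self-check: a constant block produces a single DC coefficient (one eighth of the summed level-shifted values) and zero everywhere else.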

Programming on DWT Compression (80 points)

This programming assignment will help you gain an understanding of issues that relate to image compression using wavelets. You will read an RGB file and convert the image pixels to a DWT representation (as used in the JPEG2000 implementation) for each channel. Depending on the second parameter n, you will decode the representation using only n levels of the low pass coefficients and display the output. Remember that all input files will have the same format as explained on the class website. They will be of size 512×512 (intentionally square and a power of 2 to facilitate easy encoding and decoding). Your algorithm, whether encoding or decoding, should work on each channel independently.

Input to your program will be 2 parameters, where:

• The first parameter is the name of the input image rgb file (the file format is similar to previous assignments).
• The second parameter n is an integral number from 0 to 9 that defines the low pass level to be used in your decoding. For a given n, this translates to using 2^n low pass coefficients in rows and columns respectively in the decoding process. Additionally, n can also take a value of -1 to show progressive decoding. Please see the implementation section for an explanation.

Typical invocations of your program would look like

MyExe Image.rgb 9

This is level 9, the highest level, and as such corresponds to your entire image. Here you are making use of 2^9 = 512 coefficients in rows and 512 coefficients in columns, which essentially is the input image itself, and so the output should look just like the input.

MyExe Image.rgb 8

This is level 8, the first level of the wavelet encoded hierarchy in rows and columns. Here you are making use of 2^8 = 256 low pass coefficients in rows and 256 low pass coefficients in columns.

MyExe Image.rgb 1

This is level 1, the eighth level in the wavelet encoded hierarchy in rows and columns. Here you are making use of 2^1 = 2 low pass coefficients in rows and 2 low pass coefficients in columns.

Encoding Implementation

For the DWT encoding process, convert each row (for each channel) into low pass and high pass coefficients, followed by the same for each column applied to the output of the row processing. Recurse through the process as explained in class, rows first and then columns at each recursive iteration, each time operating on the low pass section, until you reach the appropriate level.
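One recursion level of the rows-then-columns pass can be sketched with the simple averaging filter pair (low = average, high = half-difference); if the class presented a different filter, swap it in, as the structure is identical:

```cpp
#include <vector>

// One DWT level on one channel stored row-major in a square array:
// transform the leading len x len low-pass block in place, rows first,
// then columns. Call repeatedly with len = 512, 256, ... to recurse.
void dwtLevel(std::vector<std::vector<double>>& ch, int len) {
    int half = len / 2;
    std::vector<double> tmp(len);
    for (int r = 0; r < len; ++r) {              // rows
        for (int i = 0; i < half; ++i) {
            tmp[i]        = (ch[r][2 * i] + ch[r][2 * i + 1]) / 2.0;  // low pass
            tmp[half + i] = (ch[r][2 * i] - ch[r][2 * i + 1]) / 2.0;  // high pass
        }
        for (int i = 0; i < len; ++i) ch[r][i] = tmp[i];
    }
    for (int c = 0; c < len; ++c) {              // then columns
        for (int i = 0; i < half; ++i) {
            tmp[i]        = (ch[2 * i][c] + ch[2 * i + 1][c]) / 2.0;
            tmp[half + i] = (ch[2 * i][c] - ch[2 * i + 1][c]) / 2.0;
        }
        for (int i = 0; i < len; ++i) ch[i][c] = tmp[i];
    }
}
```

After each call, the low-pass content occupies the top-left half-size block, which is exactly the region the next recursive call operates on.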

Decoding Implementation:

Once you reach the appropriate level, zero out all the high pass coefficients. Then perform a recursive IDWT from the encoded level up to level 9, which is the image level. You need to appropriately decode by zeroing out the unrequested coefficients (just setting the coefficients to zero) and then performing an IDWT.
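The matching inverse step, assuming the forward transform used the simple averaging pair low = (a+b)/2, high = (a-b)/2, undoes columns first and then rows (the reverse of the forward order). Zeroing the unrequested high-pass coefficients before running this, level by level up to 9, produces the requested reconstruction:

```cpp
#include <vector>

// One IDWT level on one channel: reconstruct the leading len x len block in
// place from its half-size low-pass and high-pass halves, columns first,
// then rows. Inverse of the averaging pair: a = low + high, b = low - high.
void idwtLevel(std::vector<std::vector<double>>& ch, int len) {
    int half = len / 2;
    std::vector<double> tmp(len);
    for (int c = 0; c < len; ++c) {              // columns first on decode
        for (int i = 0; i < half; ++i) {
            tmp[2 * i]     = ch[i][c] + ch[half + i][c];
            tmp[2 * i + 1] = ch[i][c] - ch[half + i][c];
        }
        for (int i = 0; i < len; ++i) ch[i][c] = tmp[i];
    }
    for (int r = 0; r < len; ++r) {              // then rows
        for (int i = 0; i < half; ++i) {
            tmp[2 * i]     = ch[r][i] + ch[r][half + i];
            tmp[2 * i + 1] = ch[r][i] - ch[r][half + i];
        }
        for (int i = 0; i < len; ++i) ch[r][i] = tmp[i];
    }
}
```

A quick way to validate your own encoder/decoder pair is the round trip: encoding a block and decoding it with no coefficients zeroed must reproduce the input exactly.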
Progressive Encoding-Decoding Implementation

This is when n = -1. In this case you will go through the creation of the entire DWT representation down to level 0. Then decode each level recursively and display the output. The first display will be at level 0, then level 1, and so on till you reach level 9. You should see the image progressively improving in detail.

What should you submit?

• Your source code, and your project file or makefile. Please confirm the submission procedure with the TAs. Please do not submit any binaries or data sets. We will compile your program and execute our tests accordingly.
• Along with the program, also submit an electronic document (Word, PDF, PageMaker etc.) for the written part and any other extra credit explanations.