# CSC 4760/6760 Big Data Programming Assignment 4 solution

## Description

1. (100 points) (Computing PageRank in Spark)
Dataset:
The toy dataset is the following graph. The PageRank values are already known. We can use it to
Figure 1: A toy graph for computing PageRank. The number on the edge represents the
transition probability from one node to another.
The PageRank values are given in the following table (with the decay factor set to 0.85):

| Node | PageRank value |
|------|----------------|
| 1 | 0.1556 |
| 2 | 0.1622 |
| 3 | 0.2312 |
| 4 | 0.2955 |
| 5 | 0.1556 |

PageRank:
Compute the PageRank value of each node in the graph. Please refer to the slides for more details
about the PageRank method. The key PageRank equation is as follows.
π« = ππ
β€π« + (1 β π)π/π
where π« represents the π Γ 1 PageRank vector with each element π«π
representing the PageRank
value of node π, π represents the number of nodes in the graph, π represents the π Γ π transition
probability matrix with each element ππ,π = ππ,π =
1
ππ
representing the transition probability from
node π to node π, ππ
represents the degree of node π, π
β€ represents the transpose of π, π β (0,1)
represents a decay factor, π represents a π Γ 1 vector of all 1βs, and π represents the number of
nodes in the graph.
Please see the slides for more details.
In this assignment, we set the decay factor π = 0.85 and set the number of iterations to 30.
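
As a rough illustration of the update rule above, the following pure-Python sketch runs 30 iterations with a decay factor of 0.85 (written `alpha` here). The graph `adj_list` is a hypothetical example, not the toy graph from Figure 1, and the function name `pagerank` is made up for this sketch. It also assumes that the degree of a node means its number of outgoing edges and that every node has at least one outgoing edge.

```python
# Hypothetical 4-node graph (NOT the toy graph from Figure 1).
# Each node maps to the list of nodes it links to.
adj_list = {
    1: [2, 3],
    2: [3],
    3: [1],
    4: [1, 3],
}


def pagerank(adj_list, alpha=0.85, num_iterations=30):
    """Iterate r = alpha * P^T r + (1 - alpha) * e / n a fixed number of times."""
    nodes = sorted(adj_list)
    n = len(nodes)
    ranks = {node: 1.0 / n for node in nodes}  # uniform starting vector
    for _ in range(num_iterations):
        # Every node starts each round with the term (1 - alpha) / n.
        new_ranks = {node: (1.0 - alpha) / n for node in nodes}
        for i, neighbors in adj_list.items():
            # Assumption: P[i][j] = 1 / d_i for each out-neighbor j of node i.
            share = ranks[i] / len(neighbors)
            for j in neighbors:
                new_ranks[j] += alpha * share
        ranks = new_ranks
    return ranks


ranks = pagerank(adj_list)
for node in sorted(ranks):
    print(node, round(ranks[node], 4))
```

Because every node in this hypothetical graph has at least one outgoing edge, the values keep summing to 1 after each iteration. A PySpark version would express the inner loops with RDD transformations instead of Python dictionaries.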
Implementation:
Design and implement a PySpark program to compute the PageRank values. A template file, "PageRank_Spark_Incomplete.py", is given. You need to add 6 lambda functions to the file.
For example:
Line 13: AdjList2 = AdjList1.map(lambda line : line) # 1. Replace the lambda function with yours
You need to replace "lambda line : line" with your own lambda function. The inputs to the lambda function should not be changed.
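
To show what replacing a placeholder lambda can look like, here is a minimal sketch that keeps the single input `line` and changes only the output. The input format ("source neighbor1 neighbor2 ...") and the name `parse` are assumptions made for illustration; the actual template may read a different format and expect a different output. Python's built-in `map()` stands in for the RDD `map()` transformation so the sketch runs without Spark.

```python
# Hypothetical input records in the assumed format "source neighbor1 neighbor2 ...".
# The real template's input format may differ.
lines = ["1 2 3", "2 3", "3 1"]

# Placeholder in the style of the template: returns each record unchanged.
identity = lambda line: line

# Example replacement: same input `line`, new output (node, [neighbors]).
parse = lambda line: (int(line.split()[0]), [int(x) for x in line.split()[1:]])

# Built-in map() is used here in place of RDD.map() so no Spark session is needed.
adj_list = list(map(parse, lines))
print(adj_list)
```

With an RDD, the same lambda would be passed as `AdjList1.map(parse)`, which applies it to every record in the same way.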
The terminal outputs of the ground-truth solution are given in the file "TerminalOutputs.txt".
You may use them to understand the source code, debug your code, and verify your solution.
Example command to run the ".py" file:
\$ spark-submit PageRank_Spark_Incomplete.py
The files can be stored in the local file system.
Report: