Sale!

COMP9311 Assignment 3 solved

Original price was: $35.00.Current price is: $28.00.

Category:

Description

5/5 - (1 vote)

Question 1 (5 marks)

Given a graph database D containing following graphs:
1) Suppose minFreq = 3, draw at least 4 frequent patterns/fragments in the graph database
D. A graph/pattern g is frequent if its occurrence frequency is no less than minFreq. (5
marks)

Question 2 (10 marks)

Given the following query q and data graph G.
1) Please draw a Neighborhood Equivalence Class tree (NEC tree) of query q. (5 marks)
q G
The Neighborhood Equivalence Class(NEC) of a query vertex u is a set of query vertices,
which are equivalent to u.

The equivalence is defined as follows:

Let ≅ be an equivalence relation over all query vertices in q such that,
ui (∈V(q))≅uj (∈V (q)) if for every embedding m that contains (ui, vx) and (uj, vy) (vx,
vy∈ V(g)), there exists an embedding m’ such that m’ = m−{(ui, vx), (uj, vy)}∪{(ui, vy),
(uj, vx)}.

Please read the following paper for more detail:

Han, W. S., Lee, J., & Lee, J. H. (2013, June). Turbo iso: towards ultrafast and robust
subgraph isomorphism search in large graph databases. In Proceedings of the 2013 ACM
SIGMOD International Conference on Management of Data (pp. 337-348). ACM.
2) Please decompose the vertex set of query q according to Core-Forest-Leaf decomposition.
That is, decompose the vertex set of q into three sets including the core-set, the forest-set
and the leaf-set. (5 marks)

Given a query q, the Core-Forest-Leaf decomposition consists of core-forest
decomposition and forest-leaf decomposition.
Core-Forest Decomposition
Edges of q can be categorized into two categories regarding a spanning tree qT of q:
edges in qT are called tree edges while edges of q that are not in qT are called non-tree
edges regarding qT.

Our core-forest decomposition is to compute a small dense subgraph containing all
non-tree edges regarding any spanning tree, which is defined as follows. Given a query q,
the core-forest decomposition of q is to compute the minimal connected subgraph g of q
that contains all non-tree edges of q regarding any spanning tree of q; g is called the
core-structure of q.

The subgraph of q consisting of all other edges not in the
core-structure called the forest-structure of q, denoted T. We call the vertex set of the
core-structure as the core-set VC and the forest-structure of q doesn’t contain any vertices
in VC.

Forest-Leaf Decomposition

Given the forest-structure T, rooting each tree in forest-structure at its connection
vertex with core-structure. The set VI is called the leaf set which contains all the
degree-one vertices in the trees of forest-structure. The set of vertices not in VC U VI is
called the forest set VT.

Let V(q) denotes the vertex set of q, V(q) = VC ∪VT ∪VI and VC ∩VT = VC ∩VI = VT
∩VI = ∅.
Please read the following paper for more detail:
Bi, F., Chang, L., Lin, X., Qin, L., & Zhang, W. (2016, June). Efficient subgraph
matching by postponing cartesian products. In Proceedings of the 2016 International
Conference on Management of Data (pp. 1199-1214). ACM.

Considering Figure 4 in the above paper, we can decompose the vertex set of q into:

The core set: u0, u1, u2
The forest set: u3, u4, u5, u6
The leaf set: u7, u8, u9, u10

Question 3 (5 marks)

Given a social influence graph G1 as following:
1) Choose one activated seed s from v0 ~ v9 which can generate the largest influence spreads
(i.e., let w(s) = 1, maximize ∑ 𝑤(𝑣𝑖
9
𝑖=0
)). (5 marks)

Initially, all the vertices are inactivated. We define w(u) as the probability of a vertex u
which can be activated. In graph G1, p(u, v) on a directed link from u to v is the
probability that v is activated by u after u is activated (e.g., p(v0, v1) = 0.3). For example,
w(v0) = 1, w(v2) = 0.2, and w(v3) = 0.2 * 0.1 = 0.02 if we choose v0 as the activated seed.