## Description

1. Background

Graphs arise in many applications in science, engineering, and the social sciences. Here, we are

particularly interested in a class of graphs known as scale-free networks. An important

characteristic of scale-free network is they include some number of hub nodes that contain a

relatively large number of links, but the vast majority of nodes contain relatively few links. The

number of links attached to a node is referred to as the node’s degree. Technically, a scale-free

network is defined as a network whose node degree distribution follows a power-law. Scale free

networks are interesting because there is empirical evidence that they arise in many applications

of current interest.

This assignment involves generating scale free network graphs and performing an analysis of the

maximum path length through the graph. The problem is divided into two parts. The first part

involves writing a program to generate a graph, storing it in a file, and considering if the

generated topology is scale-free. The second involves writing another separate program that

reads the graph file created by the graph generation program and computing the diameter of the

graph to evaluate how the maximum distance between nodes increases with network size.

You will work in teams of two persons each to attack this problem. For each team one person

will develop a program to generate a network graph and completing associated tasks. The second

person will develop a program that reads the graph file and performs an analysis of the graph.

The two programs should be completely separate “stand-alone” programs. The interface between

the two programs will be the format of the file specifying the graph. You and your partner must

define and document a common file format for the graph, and write up all of your results into a

single, combined report.

2. Part 1: Random Graph Generation

To complete this part of the assignment, write a program that creates a scale-free network graph,

stores the result into a file, and generates a histogram of the node degree distribution. Your

program should take three command line parameters. The first is the number of nodes in the

network to be generated. The second is the name of the file where the resulting topology is to be

written. The third is the name of another file where the histogram of the node degree distribution

is to be written. For example, the command line (in Unix):

% graphgen 2000 topology –h histogram

executes the program called “graphgen” (which you will write) to create a scale-free network

topology containing 2000 nodes, and writes the resulting topology into a file called topology.

The “-h histogram” part of the command line is optional. If specified, a histogram is created

and written into a file called histogram.

2 CX 4010

The network topology is constructed using a well-known principle called preferential attachment

[1]. The key idea behind preferential attachment is when a node is added to an existing network,

it is more likely to establish a link to another node that already has a high degree (a large number

of edges) compared to nodes with fewer edges. This construction mechanism is also referred to

as “the rich get richer” in the sense that the “richness” of a node is based on its node degree.

More precisely, when a new node is added to an existing network, it is attached via an edge to a

randomly selected node i with probability p(i). p(i) is defined as: d(i) / D where d(i) is the degree

of node i, and D is the sum of node degrees of all nodes in the existing network prior to adding

the new node.

The algorithm to generate the topology with N nodes is as follows:

1. Construct a fully connected topology containing n0 nodes, where n0 << N. A fully

connected network means there is a single edge between every pair of nodes (so a fully

connected network with n0 nodes has n0*(n0-1)/2 edges). For this assignment, assume n0

is equal to 3 nodes.

2. Assuming the network now contains k nodes (labeled 0, 1, 2, … k-1) add one additional

node k and one edge between k and one randomly selected node j with probability p(j) as

defined above using the preferential attachment rule.

3. Repeat the previous step until the network contains N nodes.

Your program must include at least three functions:

1. A function that creates the network and stores it in memory

2. A function that outputs a histogram of the node degree distribution for the created

network to a file; each line of the file should contain three values: i, num, list,

where i is the node degree (1, 2, 3, …), num is the number of nodes in your network

with degree i, and list is a list of nodes with degree i. The first line of the generated

file should have information for nodes of degree 1, the second line has information for

nodes of degree 2, etc.

3. A function that writes the network topology stored in memory into a file in the format

agreed to with your partner.

You should complete an analysis of the topologies generated by your program and argue that the

created network is indeed scale-free. In this plot, the horizontal axis is the node degree, and the

vertical axis indicates the number of times a node with that degree appeared in the network you

generated. An easy way to qualitatively determine if the node-degree histogram follows a power

law is to create a log-log plot of the node degree histogram, i.e., plot log(d) on the horizontal

axis, and log(c) on the vertical axis, where d is the degree, and c is a count of the number of

nodes with degree d. If the distribution follows a power law, the plot should produce

(approximately) a straight line. Do this for as large a network as your program can generate in a

reasonable amount of time, say several minutes.

In addition to documenting your software you will need to define the format of the file

containing the graph. This file format defines the interface between the graph generation and the

gra..

3. Part 2: Network Analysis

We are interested in understanding how the path length between vertices grows as the size of the

scale-free network grows. The metric of interest here is called the network diameter. Diameter is

defined as follows. A path from a source to a destination vertex is a sequence of edges leading

from the source to the destination. The length of this path is the sum of the weights assigned to

the edges making up the path. The distance between any two vertices in a graph is the length of

the shortest path between those two vertices. The diameter of the graph is defined as the

maximum distance among all pairs of vertices in the graph. The objective of this part of the

assignment is to understand how the graph diameter grows as N is increased in a scale free

network with N vertices. Here, you will assume that each edge of the network is bidirectional,

and has weight (length) of 1.

The graph analysis program reads a file containing the graph topology (produced by the graph

generation program) and computes the diameter of the graph. To compute the diameter, use a

breadth-first search algorithm to compute the minimum length path between a source vertex s

and every other vertex in the network. Write your program to be as efficient as possible. You

may reuse code, e.g., a FIFO queue, that you or your partner developed in the previous

assignment, but be sure to cite the source of this software in your report and through a comment

in the software itself.

Use the following command line to execute your program:

% analysis topology –o output

The first argument specifies the name of the file that holds the topology of the network you are

analyzing. The second (-o output) is optional. If specified, it indicates the name of the file

your program will create to hold the output. The output file should provide the following

information:

• The first line indicates the diameter of the network that was analyzed

• The next N lines contain information on each of the N nodes. Specifically, each line

should contain (1) the node number s, and (2) the maximum distance from node s to any

other node in the network, and (3) a list of the nodes that are this maximum distance from

node s.

If the output file is not specified in the command line via the –o flag, the program should simply

print the computer network diameter to the screen.

The report for this part of the assignment should plot the size of the network diameter as a

function of N. Use as large a value of N as your program can process in a reasonable amount of

time (e.g., a few minutes).

4.

Students enrolled in CX 4010 must complete all of the tasks described above to create the two

programs. A report should describe both the software as well as the results produced by the

software, as discussed above, as well as your analyses. Although there are two distinct parts to

this assignment, your report should be written as one document, not two separate documents. Be

sure to include proper citations and a bibliography in your report.

4 CX 4010

5. CSE 6010 Students

For graduate students, in addition to the above:

1. Perform a literature search on scale free networks, focusing on the applications where

scale-free networks are believed to arise. The network need not exactly match the

preferential attachment model, but there should be empirical evidence suggesting the

topology has properties such as hub and leaf nodes. What do the nodes and links

represent, and what kinds of questions might a graph analysis answer? Write up your

findings in the report. You should discuss at least three different applications involving

scale-free networks and provide some intuition as to why they might be scale-free. This

part should be completed by the individual who wrote the graph generation program.

2. Search the literature for information on the expected diameter of scale free networks.

Compare your results with what is expected, as described in the literature. Write up your

findings in the report.

6. CX 4010 Students (Extra Credit)

Complete the assignment required for CSE 6010 students described above.

7. Reminder: Collaboration Policy

A reminder you must adhere to the Georgia Tech honor code, and the collaboration policy stated

in the course syllabus. Specifically, you are encouraged to discuss the problem and possible

solutions with other students (as well as the TA/instructor), however, all code that you turn in

must be completely your own work, except as noted above (in which case you must clearly

document the source of the code). Disseminating your code to other students (not in your team)

is strictly prohibited. Further, downloading code from the web or other sources other than

examples provided in the class is also prohibited.

References

1. Barabasi, A.L. and R. Albert, Emergence of Scaling in Random Networks. Science, 1999.

286(5439): p. 509-512.