## Description

## Parallel TSP using MPI

Your assignment is to compare the serial and threaded TSP solutions from Assignment-1 with the MPI-based TSP solution from Assignment-2. In your report, compare the performance of these implementations in terms of both execution time and accuracy.

## Requirements

1) You will generate the same number of points (x,y pairs) for each block separately. Make sure the points you generate for each block are within the x,y boundaries for that block. This ensures an even distribution of points across the blocks.

2) You MUST use a dynamic programming (DP) solution to solve the individual TSP subproblems. Instead of creating an open TSP as in Assignment-1, we will now directly use the optimal closed loops from the serial TSP solution.

3) Use an MPI Cartesian topology to set up your b² blocks in a 2-D grid as in Assignment-1.

4) Use the TSP-merge technique discussed in the hand-out for merging two closed TSP solutions from two different blocks.

5) At the block level, in essence, we are following the Euclidean TSP approach using nearest neighbors. This problem is solved by computing the minimum spanning tree (MST) of the nearest-neighbor graph, and this approach has a performance guarantee within a factor of 2 of the optimal tour (2-OPT). For a proper parallelization of the MST concept, we will use Boruvka's nearest-neighbor algorithm, which takes O(log n) steps, where n is the number of blocks (i.e., n × K points in total, where each block has K points and n = b²). Check the hand-out for implementing the MST idea.

6) Using the MPI Cartesian coordinates, we will first solve the TSP-merge problem along each row. Since we have b columns, this takes O(log b) steps (not counting the time taken for merging two TSPs; that's an exercise for you). Note that all rows of the 2-D grid can merge their TSPs in parallel in O(log b) steps. Once this is completed (use MPI_Barrier to check for completion), we only have to stitch the resulting b TSPs along the column, which takes another O(log b) steps. Hence the total number of stitching steps is 2 log b, multiplied by the worst-case time of stitching two TSPs together. This could also have been implemented in a 1-D scenario (taking O(log n) steps); however, the 2-D block partition scheme provides improvements from an implementation perspective due to the 2 log b steps.

7) We should not get any inversions using this approach, as we did in Assignment-1. But make sure that is the case. If we do get inversions, we need to use the same strategy as before to switch edges and resolve them.

### Results and Report

I expect that you will execute timing runs. From these you can prepare plots showing speedup and accuracy (in terms of tour length) for the parallel versions across several problem sizes.

### Deliverables

1. Source code(s)

2. Written report describing results. I expect a few pages with graphs or tables along with paragraphs describing your results and conclusions.

3. Show theoretical analyses of your parallel algorithm in terms of run-time complexity, efficiency, scalability (isoefficiency), cost, and memory optimality, and calculate the optimal number of processors needed for a problem of size N points.