## Description

You have been contracted by the Department of Fire and Emergency Services to develop an algorithm

they can use to place temporary fire stations in the case of extensive bushfires. They would provide

you with a map of hotspots affected by a bushfire and a number of temporary fire stations they are able

to support at that time.

The temporary fire stations will each be responsible several hotspots, with no double-handling of

hotspots (i.e., each hotspot will be handled by only one temporary fire station).

The fire department have two important considerations:

๏ท On one hand, they want to ensure that hotspots handled by different fire stations are as far from each

other as possible. This is important as fire can easily spread and in general it is not possible to fully

synchronise the efforts of all neighbouring fire stations.

๏ท On the other hand, it is also important that hotspots handled by the same fire station are as close together

as possible so that fire brigade can move from one hotspot to another.

Therefore your program should suggest the number of fire stations to maximize the ratio of distances

between hotspots handled by different fire stations and those handled by the same station.

Your task is to solve this problem with an efficient algorithm for solving a clustering problem.

CLUSTERING BACKGROUND

The clustering problem is defined as follows:

Given a set of items, ๐1, ๐2, โฆ , ๐๐, and a set of distances between those items,

๐๐๐ = ๐(๐๐

, ๐๐) where ๐,๐ โ [1, ๐]

then find the optimal clustering of the points.

We explain/define the important concepts as follows.

๏ท Distance between two items is a measure of dissimilarity between the two items. If the items are points

in a two-dimensional plane, distance can be defined as the Euclidean distance.

๏ท Inter-clustering distance (InterCD) is the minimum distance between any two items belonging to

different clusters. Formally:

InterCD = min{๐(๐๐

, ๐๐) | ๐,๐ โ [1, ๐] and ๐๐

is in a different cluster to ๐๐

}

๏ท Similarly, intra-clustering distance (IntraCD) is the maximum distance between any two items

belonging to the same cluster. Formally:

IntraCD = max{๐(๐๐

, ๐๐) | ๐,๐ โ [1, ๐] and ๐๐

is in the same cluster as ๐๐

}

๏ท A clustering is optimal if it maximizes the inter-clustering distance for the given number k of clusters.

(Note that, in general, there are many different ways to define an optimal clustering, and we are using

this particular one).

๏ท The centroid of the cluster is the โaverageโ point in the cluster. The x-coordinate of the centroid is the

average of the ๐ฅ-coordinates of all the points in the cluster; similarly, the ๐ฆ-coordinate of the centroid

is the average of the ๐ฆ-coordinates of all the points in the cluster.

TASKS

Task 1 โ Individual and pairs assignment: Use Kruskalโs algorithm for finding a minimum spanning

tree to find an optimal clustering (clustering that maximizes the inter-clustering distance) of the given

set of n points in a two-dimensional plane, for the given number k of clusters.

Task 2 โ Pairs assignment only: Among the optimal clusterings obtained by Kruskalโs algorithm for

different values of ๐ (where 2 โค ๐ โค ๐ โ 1), find the one that maximizes the ratio InterCD

IntraCD

between

the inter-clustering distance and intra-clustering distance.

INPUT

1. Your program must be executable from the command line. It must accept input as command line

arguments (a.k.a., parameters), and write output to standard output (e.g., cout in C++,

System.out in Java). Errors should be printed to standard error (e.g., cerr in C++,

System.error in Java).

2. Your program executable (or your main Java class) must be called kcluster. The first command

line arguments it accepts must be a path to the input file, and the second must be the number of

temporary fire stations.

For example, to run the program using a file named โinput.txtโ in the current folder, and 2

temporary fire stations, we would run:

kcluster ./input.txt 2

or

java kcluster ./input.txt 2

3. If you are working in pairs, then the program must allow for the 2nd argument to be the word

โautoโ. In this case the program will automatically calculate number of clusters so as to maximize

the inter-clustering to intra-clustering distance ratio.

4. The input file will be a comma separated value file, with the first column being the ID of the

hotspot, the 2nd column is the ๐ฅ-coordinate of the hotspot, and the 3rd column is the ๐ฆ-coordinate

of the hotspot. The ID will be a unique integer, and the ๐ฅ- and ๐ฆ-coordinates will be floating point

numbers.

For example, an input file might look like:

1,1.0,1.0

2,2.0,2.0

3,3.0,5.0

4,7.0,8.0

5,8.0,7.0

5. The program should handle obvious error conditions (e.g., incorrectly formatted or empty input

files, negative numbers of temporary fire stations, etc) by printing an error message to standard

error and immediately stopping execution.

OUTPUT

The output must look like the following example.

INDIVIDUAL ASSIGNMENT:

Suppose the program is run for an input file named โinput.txtโ, requesting two stations by running

java kcluster ./input.txt 2 then the output would be:

Hello and welcome to Kruskalโs Clustering!

The weighted graph of hotspots:

0 1.41 4.47 9.22 9.22

1.41 0 3.16 7.81 7.81

4.47 3.16 0 5 5.39

9.22 7.81 5 0 1.41

9.22 7.81 5.39 1.41 0

There are 5 hotspots.

You have requested 2 temporary fire stations.

Station 1:

Coordinates: (2.00, 2.67)

Hotspots: {1,2,3}

Station 2:

Coordinates: (7.50, 7.50)

Hotspots: {4,5}

Inter-clustering distance: 5.00

Thank you for using Kruskalโs Clustering. Bye.

PAIRS ASSIGNMENT:

The assignment should look the same as the individual assignment if the number of requested

temporary fire stations is a number.

In addition: if the number of fire stations is given as auto, then the output should be as below. For

example, suppose the program is run for an input file named โinput.txtโ, requesting automatic station

calculation by running java kcluster ./input.txt auto then the output would be:

There are 5 hotspots.

Automatically calculating number of temporary fire stations.

The weighted graph of hotspots:

0 1.41 4.47 9.22 9.22

1.41 0 3.16 7.81 7.81

4.47 3.16 0 5 5.39

9.22 7.81 5 0 1.41

9.22 7.81 5.39 1.41 0

Number of stations: 3

Station 1:

Coordinates: (1.50, 1.50)

Hotspots: {1,2}

Station 2:

Coordinates: (3.00,5.00)

Hotspots: {3}

Station 3:

Coordinates: (7.50, 7.50)

Hotspots: {4,5}

InterCD/IntraCD = 2.24

Thank you for using Kruskalโs Clustering. Bye.

ASSESSMENT CRITERIA

Your code must compile and run. A program that does not compile or run can only earn marks from

the โGraph constructionโ and โKruskalโs Algorithmโ categories (see below). The marker will not

modify your code to make it work.

Marks awarded for the assignments depend on whether the assignment is an individual or pairs

assignment, as follows.

INDIVIDUAL ASSIGNMENT

1. Graph construction: 12 marks.

1.1. (10 marks) for the correct algorithm and graph

1.2. (2 marks) for the well-commented and clear implementation

2. Kruskalโs Algorithm: 45 marks.

2.1. (40 marks) for the correct algorithm

2.2. (5 marks) for a well-commented and clear implementation

3. Positioning of fire stations: 15 marks.

4. Calculation of inter-clustering distance: 10 marks.

5. Input and output as specified: 18 marks.

5.1. (4 marks) Program accepts command line parameters.

5.2. (4 marks) Program outputs to standard output and writes errors to standard error.

5.3. (10 marks) Output formatted as requested.

PAIRS ASSIGNMENT

1. Graph construction: 7 marks.

1.1. (5 marks) for the correct algorithm and graph

1.2. (2 marks) for the well-commented and clear implementation

2. Kruskalโs Algorithm: 35 marks.

2.1. (30 marks) for the correct algorithm

2.2. (5 marks) for a well-commented and clear implementation

3. Positioning of emergency stations: 10 marks.

4. Calculation of inter-clustering distance: 10 marks.

5. Finding the number of clusters to maximize InterCD

IntraCD

: 20 marks.

6. Input and output as specified: 18 marks.

6.1. (4 marks) Program accepts command line parameters correctly.

6.2. (4 marks) Program outputs to standard output and writes errors to standard error.

6.3. (10 marks) Output formatted as requested.