Wednesday, July 21, 2010

Gould Index - Matrix Application to Geography

Figure 1:
Map of 7 towns connected by 9 roads
The Gould Index has a nice visual that students enjoy, but it also is used to answer a question (important for applications) and can lead to discussions of the PageRank function used by Google.  Much of this post is adapted from a paper by Patrick Carlson, one of my linear algebra students.  He was interested in networks, and did a great job of understanding and explaining the general nature of the mathematics behind the Gould Index.

Example 1:  Consider a graph that respresents a set of towns (the vertices) and the travel routes between those towns (the edges) like in Figure 1.  Historical geographers were interested in which town would become the trade center for this region.  Now, you are saying to yourself that this is obvious in this picture, but let's see if some mathematics can come to the same result we do.  If so, then let's apply it to a less obvious graph.

Figure 2:  Adjacency matrix for Figure 1
First, make the adjacency matrix for the graph (Figure 2).  Place a 1 in position (U, V) when U and V have a travel route between them and a 0 in the (W, T) position when there is no direct travel route between W and T.  We will also place a 1 in each diagonal position, such as (Q, Q) as if there was a loop there, since clearly if you are in Town Q you can obviously get to Town Q.  Notice that this matrix is symmetric, which will make some of the mathematics work but is also a way of checking to make sure you created the matrix correctly.

Now, here's the math.  Find the eigenvalues of this matrix and find the largest one in absolute value.  We get eigenvalues (2, 0, 4, -1, 0, 2, 0).  The third one, 4, has the largest absolute value.  Now, find the eigenvector associated with the eigenvalue of 4: 
(0.3162, 0.3162, 0.3162, 0.3162, 0.6325, 0.3162, 0.3162).  Normalize this vector by dividing by the sum of the entries, 2.5297, to get

(Q, R, S, T, U, V, W) =
(0.125, 0.125, 0.125, 0.125, 0.25, 1.25, 1.25).

These are the Gould indices of each of the vertices, which describes how strongly each vertex is connected to the other vertices.  As we knew, U is the most strongly connected with a Gould Index of 0.25, and the others are equally connected because of the symmetry of the graph.

Figure 3:  Another graph
Example 2:  What if we looked at a graph that more realistically represents a set of towns, which may have rivers, mountains or other geographical features preventing travel routes between towns, as in the graph in Figure 3.  In this case, would G be the trade center because it is directly connected to the most towns, or would it be C because between C and every other town is at most one town?  Create the adjacency matrix (with ones along the diagonal), and find the eigenvalues which are
(4.01, -1.37, 1.71, 1, -0.37, -0.56, 2.58, 1, 1).  The first is the largest and the eigenvector associated with it is
(0.2941, 0.4097, 0.5023, 0.3249, 0.3026, 0.1359, 0.4770, 0.1583, 0.1583) and the sum of the entries is 2.7631.  Normalize by dividing by the sum of the entries to get the Gould indices:

(A, B, C, D, E, F, G, H, I) =
(0.1064, 0.1482, 0.1818, 0.1176, 0.1095, 0.0492, 0.1726, 0.0573, 0.0573).

Of course, we can see which Gould index will be largest before normalizing, but there are other applications for which normalized values of the eigenvector will have meaning, so it doesn't hurt to get used to normalizing.  We see that C has the largest Gould Index with 0.18, but G comes in a close second with 0.17.  In this calculation, more value is given to vertices that are closer to others along paths of length more than 1 than to vertices that connect with a path of length 1 to more vertices.  Note that I and H have the same smallest index, as expected.

What is going on?  Although we have some empirical evidence that the Gould index does what is expected, what is the theory behind it?  A version of the Perron-Frebonius Theorem says that if a matrix is nonnegative and primitive, then there will be a real eigenvector that dominates the others in absolute value, it will have multiplicity 1, and the eigenvector associated with it will be real and positive [C].  The adjacency matrices above are nonnegative since all the entries are 0 or 1, and for the two above I checked that they are primitive; in Example 1, the square of the matrix was positive, and in Example 2, it took the fourth power of the matrix before getting a positive matrix.  We don't use the adjacency A, per se, but instead we use B = A + I when we put 1s in the diagonal.  Try a bipartite graph and see that the powers of A will never be positive, and thus A is not primitive in that case, but B = A + I will be.

Straffin discusses the how of the Gould index in more detail, and discusses its connection to finding the dominant team in a round robin tournament and measuring the importance in nodes in a communications network.  He has a significant list of references to examples of this method along with references to other uses of linear algebra and graphs in geography.  This method is a beginning for the mathematics behind the algorithm used in PageRank used by Google; however, the adjacency matrix (with 1s on the diagonal) is not primitive and is too large to find the eigenvalues and eigenvectors directly.  But that is a discussion for another day.

Questions:  When this model is applied to historical travel routes, the trade center predicted by the model does not always match the trade center that develops.  Non-geographical issues may be involved here, such as politics and wealth, but it is clear that the model does not take into account the time of travel.  Maybe there is a train between two towns for quick access, but two others are only accessible on foot through a mountain pass.  Can we change the results for Example 2 above by considering a weighted graph instead?  If so, should we weight with time of travel or the reciprocal of time of travel?  How would the diagonal elements be weighted?

The nth power of an adjacency graph gives the number of paths of length n between vertices x and y in the (x, y) position.  Could this be used in someway to predict the trade center?

In Gould's article is a study of the travel routes of Uganda in 1921 and 1935, and Straffin's paper has references to other studies.  Can you reproduce these results?

References:

[C] Hal Caswell, Matrix Popoulation Models, Sinauer, 1989.

[G] P. R. Gould, "On the geographical interpretation of eigenvectors,"  Transactions of the Institute of British Geographers, No. 42 (Dec., 1967), pp. 53-86.

[S] Phillip D. Straffin, "Linear Algebra in Geography: Eigenvectors of Networks," Mathematics Magazine, Vol. 53, No. 5 (Nov., 1980), pp. 269-276.

1 comment:

  1. Really nice explanation...Thanks for writing.

    ReplyDelete