An Introduction to Information Retrieval. Manning, Raghavan and Schütze (2009)
What is the overall complexity of single-link clustering for a set of points on a line?

Exercise 17.11
Prove that single-link, complete-link, and group-average clustering are monotonic in the sense defined on page 378.

Exercise 17.12
For N points, there are ≤ K^N different flat clusterings into K clusters (Section 16.2, page 356). What is the number of different hierarchical clusterings (or dendrograms) of N documents? Are there more flat clusterings or more hierarchical clusterings for given K and N?
18 Matrix decompositions and latent semantic indexing

On page 123 we introduced the notion of a term-document matrix: an M × N matrix C, each of whose rows represents a term and each of whose columns represents a document in the collection. Even for a collection of modest size, the term-document matrix C is likely to have several tens of thousands of rows and columns. In Section 18.1.1 we first develop a class of operations from linear algebra, known as matrix decomposition.
In Section 18.2 we use a special form of matrix decomposition to construct a low-rank approximation to the term-document matrix. In Section 18.3 we examine the application of such low-rank approximations to indexing and retrieving documents, a technique referred to as latent semantic indexing. While latent semantic indexing has not been established as a significant force in scoring and ranking for information retrieval, it remains an intriguing approach to clustering in a number of domains, including collections of text documents (Section 16.6, page 372). Understanding its full potential remains an area of active research.

Readers who do not require a refresher on linear algebra may skip Section 18.1, although Example 18.1 is especially recommended as it highlights a property of eigenvalues that we exploit later in the chapter.

18.1 Linear algebra review

We briefly review some necessary background in linear algebra. Let C be an M × N matrix with real-valued entries; for a term-document matrix, all entries are in fact non-negative.
The rank of a matrix is the number of linearly independent rows (or columns) in it; thus, rank(C) ≤ min{M, N}. A square r × r matrix all of whose off-diagonal entries are zero is called a diagonal matrix; its rank is equal to the number of non-zero diagonal entries. If all r diagonal entries of such a diagonal matrix are 1, it is called the identity matrix of dimension r and represented by I_r.
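To make these rank facts concrete, here is a small NumPy sketch (an illustration added in editing, not from the book); the matrices are arbitrary examples:

    import numpy as np

    # A diagonal matrix: its rank equals the number of
    # non-zero entries on the diagonal.
    D = np.diag([5.0, 2.0, 0.0])
    print(np.linalg.matrix_rank(D))    # 2

    # For a general M x N matrix, rank(C) <= min(M, N).
    C = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0]])    # second row = 2 * first row
    print(np.linalg.matrix_rank(C))    # 1, which is <= min(2, 3)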
For a square M × M matrix C and a vector \vec{x} that is not all zeros, the values of λ satisfying

(18.1)  C\vec{x} = λ\vec{x}

are called the eigenvalues of C. The M-vector \vec{x} satisfying Equation (18.1) for an eigenvalue λ is the corresponding right eigenvector. The eigenvector corresponding to the eigenvalue of largest magnitude is called the principal eigenvector. In a similar fashion, the left eigenvectors of C are the M-vectors \vec{y} such that

(18.2)  \vec{y}^T C = λ\vec{y}^T.

The number of non-zero eigenvalues of C is at most rank(C).

The eigenvalues of a matrix are found by solving the characteristic equation, which is obtained by rewriting Equation (18.1) in the form (C − λI_M)\vec{x} = 0. The eigenvalues of C are then the solutions of |C − λI_M| = 0, where |S| denotes the determinant of a square matrix S. The equation |C − λI_M| = 0 is an Mth order polynomial equation in λ and can have at most M roots, which are the eigenvalues of C. These eigenvalues can in general be complex, even if all entries of C are real.
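As a concrete illustration (added here, not part of the original text), the following NumPy sketch finds the eigenvalues of a small arbitrary matrix both ways: by solving the characteristic polynomial and with a standard eigenvalue routine:

    import numpy as np

    C = np.array([[3.0, 1.0],
                  [1.0, 3.0]])

    # np.poly applied to a square matrix returns the coefficients of
    # the characteristic polynomial |λI - C|: here λ^2 - 6λ + 8.
    coefficients = np.poly(C)
    print(np.roots(coefficients))    # [4. 2.]

    # The same eigenvalues from a standard solver
    # (same set; the order may differ).
    print(np.linalg.eigvals(C))      # [4. 2.]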
We now examine some further properties of eigenvalues and eigenvectors, to set up the central idea of singular value decompositions in Section 18.2 below. First, we look at the relationship between matrix-vector multiplication and eigenvalues.

✎ Example 18.1: Consider the matrix

S = \begin{pmatrix} 30 & 0 & 0 \\ 0 & 20 & 0 \\ 0 & 0 & 1 \end{pmatrix}.

Clearly the matrix has rank 3, and has 3 non-zero eigenvalues λ_1 = 30, λ_2 = 20 and λ_3 = 1, with the three corresponding eigenvectors \vec{x}_1 = (1, 0, 0)^T, \vec{x}_2 = (0, 1, 0)^T and \vec{x}_3 = (0, 0, 1)^T. For each of the eigenvectors, multiplication by S acts as if we were multiplying the eigenvector by a multiple of the identity matrix; the multiple is different for each eigenvector.
Now, consider an arbitrary vector, such as \vec{v} = (2, 4, 6)^T. We can always express \vec{v} as a linear combination of the three eigenvectors of S; in the current example we have

\vec{v} = (2, 4, 6)^T = 2\vec{x}_1 + 4\vec{x}_2 + 6\vec{x}_3.

Suppose we multiply \vec{v} by S:

(18.3)  S\vec{v} = S(2\vec{x}_1 + 4\vec{x}_2 + 6\vec{x}_3)
               = 2S\vec{x}_1 + 4S\vec{x}_2 + 6S\vec{x}_3
               = 2λ_1\vec{x}_1 + 4λ_2\vec{x}_2 + 6λ_3\vec{x}_3
               = 60\vec{x}_1 + 80\vec{x}_2 + 6\vec{x}_3.

Example 18.1 shows that even though \vec{v} is an arbitrary vector, the effect of multiplication by S is determined by the eigenvalues and eigenvectors of S. Furthermore, it is intuitively apparent from Equation (18.3) that the product S\vec{v} is relatively unaffected by terms arising from the small eigenvalues of S; in our example, since λ_3 = 1, the contribution of the third term on the right-hand side of Equation (18.3) is small. In fact, if we were to completely ignore the contribution in Equation (18.3) from the third eigenvector corresponding to λ_3 = 1, then the product S\vec{v} would be computed to be (60, 80, 0)^T rather than the correct product, which is (60, 80, 6)^T; these two vectors are relatively close to each other by any of various metrics one could apply (such as the length of their vector difference).

This suggests that the effect of small eigenvalues (and their eigenvectors) on a matrix-vector product is small.
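The following NumPy sketch (an added illustration, not from the book) reproduces this computation and confirms that dropping the term for the smallest eigenvalue changes the product only slightly:

    import numpy as np

    S = np.diag([30.0, 20.0, 1.0])
    v = np.array([2.0, 4.0, 6.0])

    exact = S @ v                      # [60. 80. 6.]

    # Ignore the contribution of the smallest eigenvalue λ_3 = 1
    # by zeroing it out before multiplying.
    S_truncated = np.diag([30.0, 20.0, 0.0])
    approx = S_truncated @ v           # [60. 80. 0.]

    # The length of the vector difference is small relative to
    # the length of the exact product (about 100).
    print(np.linalg.norm(exact - approx))   # 6.0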
We will carry forward this intuition when studying matrix decompositions and low-rank approximations in Section 18.2. Before doing so, we examine the eigenvectors and eigenvalues of special forms of matrices that will be of particular interest to us. For a symmetric matrix S, the eigenvectors corresponding to distinct eigenvalues are orthogonal. Further, if S is both real and symmetric, the eigenvalues are all real.

✎ Example 18.2: Consider the real, symmetric matrix

(18.4)  S = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}.

From the characteristic equation |S − λI| = 0, we have the quadratic (2 − λ)^2 − 1 = 0, whose solutions yield the eigenvalues 3 and 1. The corresponding eigenvectors (1, 1)^T and (1, −1)^T are orthogonal.
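As a numerical check (an illustration added in editing), NumPy's symmetric eigensolver confirms both properties for this matrix:

    import numpy as np

    S = np.array([[2.0, 1.0],
                  [1.0, 2.0]])

    # eigh is specialized for symmetric matrices; it returns real
    # eigenvalues in ascending order and unit-length eigenvectors
    # as the columns of the second return value.
    eigenvalues, eigenvectors = np.linalg.eigh(S)
    print(eigenvalues)    # [1. 3.], all real

    # Eigenvectors for distinct eigenvalues are orthogonal:
    # their dot product is (numerically) zero.
    print(np.dot(eigenvectors[:, 0], eigenvectors[:, 1]))   # ~0.0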
18.1.1 Matrix decompositions

In this section we examine ways in which a square matrix can be factored into the product of matrices derived from its eigenvectors; we refer to this process as matrix decomposition. Matrix decompositions similar to the ones in this section will form the basis of our principal text-analysis technique in Section 18.3, where we will look at decompositions of non-square term-document matrices. The square decompositions in this section are simpler and can be treated with sufficient mathematical rigor to help the reader understand how such decompositions work. The detailed mathematical derivation of the more complex decompositions in Section 18.2 is beyond the scope of this book.

We begin by giving two theorems on the decomposition of a square matrix into the product of three matrices of a special form. The first of these, Theorem 18.1, gives the basic factorization of a square real-valued matrix into three factors.
The second, Theorem 18.2, applies to square symmetric matrices and is the basis of the singular value decomposition described in Theorem 18.3.

Theorem 18.1. (Matrix diagonalization theorem) Let S be a square real-valued M × M matrix with M linearly independent eigenvectors. Then there exists an eigen decomposition

(18.5)  S = UΛU^{−1},

where the columns of U are the eigenvectors of S and Λ is a diagonal matrix whose diagonal entries are the eigenvalues of S in decreasing order,

(18.6)  Λ = diag(λ_1, λ_2, . . . , λ_M),  λ_i ≥ λ_{i+1}.

If the eigenvalues are distinct, then this decomposition is unique.

To understand how Theorem 18.1 works, we note that U has the eigenvectors of S as columns,

(18.7)  U = (\vec{u}_1 \vec{u}_2 · · · \vec{u}_M).

Then we have

SU = S(\vec{u}_1 \vec{u}_2 · · · \vec{u}_M)
   = (λ_1\vec{u}_1 λ_2\vec{u}_2 · · · λ_M\vec{u}_M)
   = (\vec{u}_1 \vec{u}_2 · · · \vec{u}_M) Λ.

Thus, we have SU = UΛ, or S = UΛU^{−1}.

We next state a closely related decomposition of a symmetric square matrix into the product of matrices derived from its eigenvectors.
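A short NumPy sketch (an added illustration; the matrix is an arbitrary example with M = 2 linearly independent eigenvectors) verifies the factorization numerically:

    import numpy as np

    S = np.array([[2.0, 1.0],
                  [0.0, 3.0]])

    # The columns of U are the (right) eigenvectors of S. Note that
    # np.linalg.eig does not sort the eigenvalues; any consistent
    # column order works for the reconstruction.
    eigenvalues, U = np.linalg.eig(S)
    Lambda = np.diag(eigenvalues)

    # Reconstruct S as U Λ U^{-1}.
    S_reconstructed = U @ Lambda @ np.linalg.inv(U)
    print(np.allclose(S, S_reconstructed))   # True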