This will pave the way for the development of our main tool for text analysis, the singular value decomposition (Section 18.2).

Theorem 18.2. (Symmetric diagonalization theorem) Let S be a square, symmetric real-valued M × M matrix with M linearly independent eigenvectors. Then there exists a symmetric diagonal decomposition

    S = QΛQ^T,                                                        (18.8)

where the columns of Q are the orthogonal and normalized (unit length, real) eigenvectors of S, and Λ is the diagonal matrix whose entries are the eigenvalues of S. Further, all entries of Q are real and we have Q^{-1} = Q^T.

We will build on this symmetric diagonal decomposition to construct low-rank approximations to term-document matrices.

Exercise 18.1
What is the rank of the 3 × 3 matrix below?

    \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 2 & 1 \end{pmatrix}

Exercise 18.2
Show that λ = 2 is an eigenvalue of

    C = \begin{pmatrix} 6 & -2 \\ 4 & 0 \end{pmatrix}.

Find the corresponding eigenvector.

Exercise 18.3
Compute the unique eigen decomposition of the 2 × 2 matrix in (18.4).
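Theorem 18.2 is easy to check numerically. Here is a minimal sketch, assuming NumPy is available; the small symmetric matrix S is chosen purely for illustration.

import numpy as np

# A small symmetric real-valued matrix, used only to illustrate Theorem 18.2.
S = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is NumPy's eigensolver for symmetric (Hermitian) matrices; it returns
# real eigenvalues and orthonormal eigenvectors (the columns of Q).
eigenvalues, Q = np.linalg.eigh(S)
Lambda = np.diag(eigenvalues)

# The symmetric diagonal decomposition S = Q Lambda Q^T of (18.8) ...
assert np.allclose(S, Q @ Lambda @ Q.T)
# ... and the orthogonality property Q^{-1} = Q^T.
assert np.allclose(Q.T @ Q, np.eye(2))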
18.2 Term-document matrices and singular value decompositions

The decompositions we have been studying thus far apply to square matrices. However, the matrix we are interested in is the M × N term-document matrix C where (barring a rare coincidence) M ≠ N; furthermore, C is very unlikely to be symmetric. To this end we first describe an extension of the symmetric diagonal decomposition known as the singular value decomposition. We then show in Section 18.3 how this can be used to construct an approximate version of C. It is beyond the scope of this book to develop a full treatment of the mathematics underlying singular value decompositions; following the statement of Theorem 18.3 we relate the singular value decomposition to the symmetric diagonal decompositions from Section 18.1.1.
Given C, let U be the M × M matrix whose columns are the orthogonal eigenvectors of CC^T, and V be the N × N matrix whose columns are the orthogonal eigenvectors of C^TC. Denote by C^T the transpose of a matrix C.

Theorem 18.3. Let r be the rank of the M × N matrix C. Then, there is a singular value decomposition (SVD for short) of C of the form

    C = UΣV^T,                                                        (18.9)

where

1. The eigenvalues λ1, . . ., λr of CC^T are the same as the eigenvalues of C^TC;

2. For 1 ≤ i ≤ r, let σi = √λi, with λi ≥ λi+1. Then the M × N matrix Σ is composed by setting Σii = σi for 1 ≤ i ≤ r, and zero otherwise.

The values σi are referred to as the singular values of C. It is instructive to examine the relationship of Theorem 18.3 to Theorem 18.2; we do this rather than derive the general proof of Theorem 18.3, which is beyond the scope of this book.

By multiplying Equation (18.9) by its transposed version, we have

    CC^T = UΣV^T VΣU^T = UΣ²U^T.                                      (18.10)

Note now that in Equation (18.10), the left-hand side is a square, symmetric real-valued matrix, and the right-hand side represents its symmetric diagonal decomposition as in Theorem 18.2.
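As a quick numerical check of Theorem 18.3 and Equation (18.10) (a minimal sketch assuming NumPy; the small matrix C is arbitrary), the squared singular values of C coincide with the nonzero eigenvalues of CC^T and of C^TC:

import numpy as np

# An arbitrary small M x N matrix (M = 3, N = 2), used only for illustration.
C = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [0.0, 1.0]])

U, sigma, Vt = np.linalg.svd(C)
r = np.linalg.matrix_rank(C)

# Eigenvalues of C C^T and C^T C, sorted in decreasing order.
eig_CCt = np.sort(np.linalg.eigvalsh(C @ C.T))[::-1]
eig_CtC = np.sort(np.linalg.eigvalsh(C.T @ C))[::-1]

# sigma_i = sqrt(lambda_i) for i = 1, ..., r, and the nonzero eigenvalues
# of C C^T and C^T C agree, as stated in Theorem 18.3.
assert np.allclose(sigma[:r] ** 2, eig_CCt[:r])
assert np.allclose(sigma[:r] ** 2, eig_CtC[:r])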
What does the left-hand side CC^T represent? It is a square matrix with a row and a column corresponding to each of the M terms. The entry (i, j) in the matrix is a measure of the overlap between the ith and jth terms, based on their co-occurrence in documents. The precise mathematical meaning depends on the manner in which C is constructed based on term weighting. Consider the case where C is the term-document incidence matrix of page 3, illustrated in Figure 1.1. Then the entry (i, j) in CC^T is the number of documents in which both term i and term j occur.
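The following sketch makes this co-occurrence reading of CC^T concrete (assuming NumPy; the incidence matrix here is a made-up toy example, not the matrix of Figure 1.1):

import numpy as np

# Toy term-document incidence matrix: 3 terms x 4 documents.
# C[i, j] = 1 if term i occurs in document j, and 0 otherwise.
C = np.array([[1, 1, 0, 1],   # term 0 occurs in documents 0, 1 and 3
              [0, 1, 1, 1],   # term 1 occurs in documents 1, 2 and 3
              [1, 0, 0, 0]])  # term 2 occurs in document 0 only

# Entry (i, j) counts the documents in which both term i and term j occur.
cooccurrence = C @ C.T
print(cooccurrence)
# [[3 2 1]
#  [2 3 0]
#  [1 0 1]]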
When writing down the numerical values of the SVD, it is conventional to represent Σ as an r × r matrix with the singular values on the diagonal, since all its entries outside this sub-matrix are zeros. Accordingly, it is conventional to omit the rightmost M − r columns of U corresponding to these omitted rows of Σ; likewise the rightmost N − r columns of V are omitted since they correspond in V^T to the rows that will be multiplied by the N − r columns of zeros in Σ. This written form of the SVD is sometimes known as the reduced SVD or truncated SVD and we will encounter it again in Exercise 18.9. Henceforth, our numerical examples and exercises will use this reduced form.

Figure 18.1  Illustration of the singular-value decomposition. In this schematic illustration of (18.9), we see two cases illustrated. In the top half of the figure, we have a matrix C for which M > N. The lower half illustrates the case M < N.

Example 18.3: We now illustrate the singular-value decomposition of a 4 × 2 matrix of rank 2; the singular values are Σ11 = 2.236 and Σ22 = 1.

    C = \begin{pmatrix} 1 & -1 \\ 0 & 1 \\ 1 & 0 \\ -1 & 1 \end{pmatrix}
      = \begin{pmatrix} -0.632 & 0.000 \\ 0.316 & -0.707 \\ -0.316 & -0.707 \\ 0.632 & 0.000 \end{pmatrix}
        \begin{pmatrix} 2.236 & 0.000 \\ 0.000 & 1.000 \end{pmatrix}
        \begin{pmatrix} -0.707 & 0.707 \\ -0.707 & -0.707 \end{pmatrix}                    (18.11)

As with the matrix decompositions defined in Section 18.1.1, the singular value decomposition of a matrix can be computed by a variety of algorithms, many of which are available as public software implementations; pointers to these are given in Section 18.5.
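For instance, the decomposition of Example 18.3 can be reproduced with NumPy's SVD routine (a minimal sketch; full_matrices=False requests the reduced form discussed above, and the signs of corresponding columns of U and V may differ from those printed in (18.11), since singular vectors are only determined up to sign):

import numpy as np

C = np.array([[ 1.0, -1.0],
              [ 0.0,  1.0],
              [ 1.0,  0.0],
              [-1.0,  1.0]])

# Reduced (truncated) SVD: U is 4 x 2, sigma holds the r = 2 singular
# values, and Vt is the 2 x 2 matrix V^T.
U, sigma, Vt = np.linalg.svd(C, full_matrices=False)

print(np.round(sigma, 3))   # the singular values 2.236 and 1.0
assert np.allclose(C, U @ np.diag(sigma) @ Vt)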
Exercise 18.4
Let

    C = \begin{pmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{pmatrix}                              (18.12)

be the term-document incidence matrix for a collection. Compute the co-occurrence matrix CC^T. What is the interpretation of the diagonal entries of CC^T when C is a term-document incidence matrix?

Exercise 18.5
Verify that the SVD of the matrix in Equation (18.12) is

    U = \begin{pmatrix} -0.816 & 0.000 \\ -0.408 & -0.707 \\ -0.408 & 0.707 \end{pmatrix}, \quad
    Σ = \begin{pmatrix} 1.732 & 0.000 \\ 0.000 & 1.000 \end{pmatrix}, \quad
    V^T = \begin{pmatrix} -0.707 & -0.707 \\ 0.707 & -0.707 \end{pmatrix}                  (18.13)

by verifying all of the properties in the statement of Theorem 18.3.

Exercise 18.6
Suppose that C is a binary term-document incidence matrix. What do the entries of C^TC represent?

Exercise 18.7
Let

    C = \begin{pmatrix} 0 & 2 & 1 \\ 0 & 3 & 0 \\ 2 & 1 & 0 \end{pmatrix}                  (18.14)

be a term-document matrix whose entries are term frequencies; thus term 1 occurs 2 times in document 2 and once in document 3. Compute CC^T; observe that its entries are largest where two terms have their most frequent occurrences together in the same document.

18.3 Low-rank approximations

We next state a matrix approximation problem that at first seems to have little to do with information retrieval.
We describe a solution to this matrix problem using singular-value decompositions, then develop its application to information retrieval.

Given an M × N matrix C and a positive integer k, we wish to find an M × N matrix Ck of rank at most k, so as to minimize the Frobenius norm of the matrix difference X = C − Ck, defined to be

    \|X\|_F = \sqrt{\sum_{i=1}^{M} \sum_{j=1}^{N} X_{ij}^2}.                               (18.15)

Thus, the Frobenius norm of X measures the discrepancy between Ck and C; our goal is to find a matrix Ck that minimizes this discrepancy, while constraining Ck to have rank at most k.
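In code, (18.15) is a one-liner (a minimal sketch assuming NumPy; X stands for any matrix difference C − Ck):

import numpy as np

def frobenius_norm(X):
    # ||X||_F: the square root of the sum of squared entries, as in (18.15).
    return np.sqrt(np.sum(X ** 2))

# NumPy's built-in norm with ord='fro' computes the same quantity.
X = np.array([[1.0, -2.0],
              [0.0,  3.0]])
assert np.isclose(frobenius_norm(X), np.linalg.norm(X, 'fro'))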
If r is the rank of C, clearly Cr = C and the Frobenius norm of the discrepancy is zero in this case. When k is far smaller than r, we refer to Ck as a low-rank approximation.

The singular value decomposition can be used to solve the low-rank matrix approximation problem. We then derive from it an application to approximating term-document matrices. We invoke the following three-step procedure to this end:

1. Given C, construct its SVD in the form shown in (18.9); thus, C = UΣV^T.

2. Derive from Σ the matrix Σk formed by replacing by zeros the r − k smallest singular values on the diagonal of Σ.

3. Compute and output Ck = UΣkV^T as the rank-k approximation to C.

Figure 18.2  Illustration of low-rank approximation using the singular-value decomposition. The dashed boxes indicate the matrix entries affected by “zeroing out” the smallest singular values.

The rank of Ck is at most k: this follows from the fact that Σk has at most k non-zero values. Next, we recall the intuition of Example 18.1: the effect of small eigenvalues on matrix products is small. Thus, it seems plausible that replacing these small eigenvalues by zero will not substantially alter the product, leaving it “close” to C. The following theorem due to Eckart and Young tells us that, in fact, this procedure yields the matrix of rank k with the lowest possible Frobenius error.

Theorem 18.4.

    \min_{Z \mid \mathrm{rank}(Z) = k} \|C − Z\|_F = \|C − C_k\|_F = \sigma_{k+1}.          (18.16)

Recalling that the singular values are in decreasing order σ1 ≥ σ2 ≥ · · ·, we learn from Theorem 18.4 that Ck is the best rank-k approximation to C, incurring an error (measured by the Frobenius norm of C − Ck) equal to σk+1. Thus the larger k is, the smaller this error (and in particular, for k = r, the error is zero since Σr = Σ; provided r < M, N, then σr+1 = 0 and thus Cr = C).

To derive further insight into why the process of truncating the smallest r − k singular values in Σ helps generate a rank-k approximation of low error, we examine the form of Ck:

    C_k = U \Sigma_k V^T                                                                    (18.17)
        = U \begin{pmatrix} \sigma_1 & 0 & \cdots & 0 & 0 \\ 0 & \ddots & & & \\ \vdots & & \sigma_k & & \\ & & & 0 & \\ 0 & & & & \ddots \end{pmatrix} V^T                  (18.18)
        = \sum_{i=1}^{k} \sigma_i u_i v_i^T,                                                (18.19)

where u_i and v_i are the ith columns of U and V, respectively. Thus, u_i v_i^T is a rank-1 matrix, so that we have just expressed Ck as the sum of k rank-1 matrices, each weighted by a singular value. As i increases, the contribution of the rank-1 matrix u_i v_i^T is weighted by a sequence of shrinking singular values σi.
Exercise 18.8
Compute a rank 1 approximation C1 to the matrix C in Equation (18.12), using the SVD given in Equation (18.13). What is the Frobenius norm of the error of this approximation?

Exercise 18.9
Consider now the computation in Exercise 18.8. Following the schematic in Figure 18.2, notice that for a rank 1 approximation we have σ1 being a scalar. Denote by U1 the first column of U and by V1 the first column of V. Show that the rank-1 approximation to C can then be written as U1σ1V1^T = σ1U1V1^T.

Exercise 18.10
Exercise 18.9 can be generalized to rank k approximations: we let U'_k and V'_k denote the “reduced” matrices formed by retaining only the first k columns of U and V, respectively. Thus U'_k is an M × k matrix while V'^T_k is a k × N matrix.
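The three-step procedure, Theorem 18.4 and the outer-product form (18.19) can all be checked with a few lines of NumPy (a minimal sketch; the matrix is the one from Example 18.3, and k = 1 is chosen for illustration only):

import numpy as np

def low_rank_approximation(C, k):
    # Step 1: compute the SVD, C = U Sigma V^T (reduced form).
    U, sigma, Vt = np.linalg.svd(C, full_matrices=False)
    # Step 2: replace the r - k smallest singular values by zeros.
    sigma_k = np.where(np.arange(len(sigma)) < k, sigma, 0.0)
    # Step 3: recombine to obtain C_k = U Sigma_k V^T.
    return U @ np.diag(sigma_k) @ Vt

# The 4 x 2 rank-2 matrix of Example 18.3, approximated with k = 1.
C = np.array([[ 1.0, -1.0],
              [ 0.0,  1.0],
              [ 1.0,  0.0],
              [-1.0,  1.0]])
k = 1
C_k = low_rank_approximation(C, k)

U, sigma, Vt = np.linalg.svd(C, full_matrices=False)

# Equation (18.19): C_k is also the sum of the k leading rank-1 matrices
# sigma_i * u_i * v_i^T (np.outer forms the outer product u_i v_i^T).
C_k_outer = sum(sigma[i] * np.outer(U[:, i], Vt[i, :]) for i in range(k))
assert np.allclose(C_k, C_k_outer)

# Theorem 18.4: since this C has rank 2 and k = 1, the Frobenius error of
# the approximation is the discarded singular value sigma_2 = 1.
assert np.isclose(np.linalg.norm(C - C_k, 'fro'), sigma[k])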