If we want it to drop to, say, $10^{-m}$ times its original value, then we can expect to need no more than about $m(\ln 10)\,n(n-1)/2$ rotations. To put it in very concrete terms, suppose we're working in double precision (12-digit) arithmetic and we are willing to decree that convergence has taken place if $\mathrm{Od}$ has been reduced by a factor of $10^{-12}$. Then at most $12(\ln 10)\,n(n-1)/2 < 6(\ln 10)\,n^2 \approx 13.8\,n^2$ rotations will have to be done.
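Where does this estimate come from? Recall that each rotation on a largest off-diagonal entry multiplies $\mathrm{Od}$ by at most $1 - 2/(n(n-1))$, so after $k$ rotations (a quick sketch of the arithmetic, starting from the initial value $\mathrm{Od}_0$):

$$\mathrm{Od} \le \left(1 - \frac{2}{n(n-1)}\right)^{k} \mathrm{Od}_0 \le e^{-2k/(n(n-1))}\,\mathrm{Od}_0,$$

and the right-hand side falls below $10^{-m}\,\mathrm{Od}_0$ as soon as $k \ge m(\ln 10)\,n(n-1)/2$.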
Of course in practice we will be watching the function $\mathrm{Od}(A)$ as it drops, so there won't be any need to know in advance how many iterations are needed. We can stop when the actual observed value is small enough. Still, it's comforting to know that at most $O(n^2)$ iterations will be enough to do the job.

Now let's redirect our thoughts to the grand iteration process itself. At each step we apply a rotation matrix to the current symmetric matrix in order to make it "more diagonal". At the same time, of course, we must keep track of the product of all of the rotation matrices that we have used so far, because that is the matrix that ultimately will be an orthogonal matrix with the eigenvectors of $A$ across its rows.

Let's watch this happen. Begin with $A$.
After one rotation we have $J_1 A J_1^T$; after two iterations we have $J_2 J_1 A J_1^T J_2^T$; after three, $J_3 J_2 J_1 A J_1^T J_2^T J_3^T$; etc. After all iterations have been done, and we are looking at a matrix that is "diagonal enough" for our purposes, the matrix we see is $P A P^T = D$, where $P$ is obtained by starting with the identity matrix and multiplying successively on the left by the rotation matrices $J$ that are used, and $D$ is (virtually) diagonal.

Since $P A P^T = D$, we have $A P^T = P^T D$, so the columns of $P^T$, or equivalently the rows of $P$, are the eigenvectors of $A$.
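As a concrete illustration, here is a minimal numpy sketch of a single step (the function name and the explicit formation of the full matrix $J$ are ours; a real implementation would update only rows and columns $p$ and $q$, at $O(n)$ cost, instead of forming full matrix products):

```python
import numpy as np

def jacobi_rotate(A, P, p, q):
    """Apply one Jacobi rotation J, chosen to zero out A[p, q] (and
    A[q, p]); replaces A by J A J^T and accumulates P -> J P."""
    if A[p, q] == 0.0:
        return
    # Pick theta so that the (p, q) entry of J A J^T vanishes:
    # tan(2*theta) = 2*A[p, q] / (A[p, p] - A[q, q]).
    theta = 0.5 * np.arctan2(2.0 * A[p, q], A[p, p] - A[q, q])
    c, s = np.cos(theta), np.sin(theta)
    J = np.eye(A.shape[0])
    J[p, p] = J[q, q] = c
    J[p, q], J[q, p] = s, -s
    A[:, :] = J @ A @ J.T    # the step A -> J A J^T described above
    P[:, :] = J @ P          # multiply successively on the left
```

For a $2 \times 2$ symmetric matrix a single such rotation already diagonalizes $A$ exactly; in general it only makes $A$ "more diagonal".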
Now, we have indeed proved that the repeated rotations will diagonalize $A$. We have not proved that the matrices $P$ themselves converge to a certain fixed matrix. This is true, but we omit the proof. One thing we do want to do, however, is to prove the spectral theorem, Theorem 3.7.1, itself, since we have long since done all of the work.

Proof (of the Spectral Theorem 3.7.1): Consider the mapping $f$ that associates with every orthogonal matrix $P$ the matrix $f(P) = P^T A P$. The set of orthogonal matrices is compact, and the mapping $f$ is continuous. Hence the image of the set of orthogonal matrices under $f$ is compact, and so there is a matrix $D$ in that image that minimizes the continuous function $\mathrm{Od}(f(P)) = \mathrm{Od}(P^T A P)$. Suppose $D$ is not diagonal. Then we could find a Jacobi rotation that would produce another matrix in the image whose $\mathrm{Od}$ would be lower, which is a contradiction of the fact that $\mathrm{Od}(D)$ was minimal.
Hence $D$ is diagonal. So there is an orthogonal matrix $P$ such that $P^T A P = D$, i.e., $A P = P D$. Hence the columns of $P$ are $n$ pairwise orthogonal eigenvectors of $A$, and the proof of the spectral theorem is complete.

Now let's get on with the implementation of the algorithm.

3.10 Corbató's idea and the implementation of the Jacobi algorithm

It's time to sit down with our accountants and add up the costs of the Jacobi method. First, we have seen that $O(n^2)$ rotations will be sufficient to reduce the off-diagonal sum of squares below some pre-assigned threshold level.
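In code, the quantity being driven down is just this (a one-line numpy sketch; the name od is ours):

```python
import numpy as np

def od(A):
    """Off-diagonal sum of squares Od(A) of a square matrix A."""
    return np.sum(A * A) - np.sum(np.diag(A) ** 2)
```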
Now, what is the price of a single rotation? Here are the steps:

(i) Search for the off-diagonal element having the largest absolute value. The cost seems to be equal to the number of elements that have to be looked at, namely $n(n-1)/2$, which we abbreviate as $O(n^2)$.

(ii) Calculate $\theta$, $\sin\theta$ and $\cos\theta$, and then carry out a rotation on the matrix $A$. This costs $O(n)$, since only four lines (two rows and two columns) of $A$ are changed.

(iii) Update the matrix $P$ of eigenvectors by multiplying it by the rotation matrix. Since only two rows of $P$ change, this cost is $O(n)$ also.
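In code, step (i) looks like this (a sketch; the function name is ours, and by symmetry it is enough to scan the upper triangle):

```python
def largest_off_diagonal(A):
    """Scan the n(n-1)/2 entries above the diagonal for the one of
    largest absolute value -- the O(n^2) part of each rotation."""
    n = A.shape[0]
    p, q = 0, 1
    for i in range(n):
        for j in range(i + 1, n):
            if abs(A[i, j]) > abs(A[p, q]):
                p, q = i, j
    return p, q
```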
The longest part of the job is the search for the largest off-diagonal element. The search is $n$ times as expensive in time as either the rotation of $A$ or the update of the eigenvector matrix.

For this reason, in the years since Jacobi first described his algorithm, a number of other strategies for dealing with the eigenvalue problem have been worked out. One of these is called the cyclic Jacobi method. In that variation, one does not search, but instead goes marching through the matrix one element at a time. That is to say, first do a rotation that reduces $A_{12}$ to zero. Next do a rotation that reduces $A_{13}$ to zero (of course, $A_{12}$ doesn't stay put, but becomes nonzero again!). Then do $A_{14}$ and so forth, returning to $A_{12}$ again after $A_{n-1,n}$, cycling as long as necessary. This method avoids the search, but the proof that it converges at all is quite complex, and the exact rate of convergence is unknown.

A variation on the cyclic method is called the threshold Jacobi method, in which we go through the entries cyclically as above, but we do not carry out a rotation unless the magnitude of the current matrix entry exceeds a certain threshold ("throw it back, it's too small"). This method also has an uncertain rate of convergence.
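Both variants fit in a few lines; here is a sketch of the cyclic sweep, reusing jacobi_rotate and od from the sketches above (the name, stopping rule, and tolerance are ours; a positive threshold gives the threshold variant):

```python
def cyclic_jacobi(A, P, tol=1e-12, max_sweeps=50, threshold=0.0):
    """Cyclic Jacobi: rotate away A[1,2], A[1,3], ..., A[n-1,n] in a
    fixed order, sweep after sweep, until Od(A) falls below tol."""
    n = A.shape[0]
    for _ in range(max_sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                # threshold variant: skip entries that are "too small"
                if abs(A[p, q]) > threshold:
                    jacobi_rotate(A, P, p, q)
        if od(A) < tol:
            break
```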
At a deeper level, two newer methods, due to Givens and Householder, have been developed. These methods work not by trying to diagonalize $A$ by rotations, but by tri-diagonalizing $A$ by rotations. A tri-diagonal matrix is one whose entries are all zero except for those on the diagonal, the sub-diagonal and the super-diagonal (i.e., $A_{ij} = 0$ unless $|i - j| \le 1$).

The advantage of tri-diagonalization is that it is a finite process: it can be done in such a way that elements, once reduced to zero, stay zero instead of bouncing back again as they do in the Jacobi method. The disadvantage is that, having arrived at a tri-diagonal matrix, one is not finished, but instead must then confront the question of obtaining the eigenvalues and eigenvectors of a tri-diagonal matrix, a nontrivial operation.

One of the reasons for the wide use of the Givens and Householder methods has been that they get the answers in just $O(n^3)$ time, instead of the $O(n^4)$ time in which the original Jacobi method operates.

Thanks to a suggestion of Corbató (F. J. Corbató, On the coding of Jacobi's method for computing eigenvalues and eigenvectors of symmetric matrices, JACM 10 (1963), 123-125), however, it is now easy to run the original Jacobi method in $O(n^3)$ time also. The suggestion is all the more remarkable because of its simplicity and the fact that it lives at the software level, rather than at the mathematical level. What it does is just this: it allows us to use the largest off-diagonal entry at each stage while paying a price of a mere $O(n)$ for the privilege, instead of the $O(n^2)$ billing mentioned above.

We can do this by carrying along an additional linear array during the calculation.
Theith entry of this array, say loc[i], contains the number of a column in which an off-diagonal120Numerical linear algebraelement of largest absolute value in row i lives, i.e., Ai,loc(i) is an entry in row i of A oflargest absolute value in that row.Now of course if some benefactor is kind enough to hand us this array, then it would bea simple matter to find the biggest off-diagonal element of the entire matrix A. We wouldjust look through the n − 1 numbers |Ai,loc(i) | (i = 1, 2, . . . , n − 1) to find the largest one.If the largest one is the one where i = p, say, then the desired matrix element would beAp,loc(p) . Hence the cost of using this array is O(n).Since there are no such benefactors as described above, we are going to have to pay aprice for the care and feeding of this array. How much does it cost to create and to maintainit?Initially, we just search through the whole matrix and set it up.
Since there are no such benefactors as described above, we are going to have to pay a price for the care and feeding of this array. How much does it cost to create and to maintain it?

Initially, we just search through the whole matrix and set it up. This clearly costs $O(n^2)$ operations, but we pay just once. Now let's turn to a typical intermediate stage in the calculation, and see what the price is for updating the loc array.

Given an array loc, suppose now that we carry out a single rotation on the matrix $A$. Precisely how do we go about modifying loc so it will correspond to the new matrix? The rotated matrix differs from the previous matrix in exactly two rows and two columns. Certainly the two rows, $p$ and $q$, that have been completely changed will simply have to be searched again in order to find the new values of loc. This costs $O(n)$ operations.

What about the other $n-2$ rows of the matrix? In the $i$th one of those rows exactly two entries were changed, namely the ones in the $p$th column and in the $q$th column.
Suppose the largest element that was previously in row $i$ was not in either the $p$th or the $q$th column, i.e., suppose $\mathrm{loc}[i] \notin \{p, q\}$. Then that previous largest element will still be there in the rotated matrix. In that case, in order to discover the new entry of largest absolute value in row $i$ we need only compare at most three numbers: the new $|A_{ip}|$, the new $|A_{iq}|$, and the old $|A_{i,\mathrm{loc}[i]}|$. The column in which the largest of these three numbers is found will be the new loc[i].
The price paid is at most three comparisons, and this does the updating job in every row except those that happen to have had $\mathrm{loc}[i] \in \{p, q\}$.

In the latter case we can still salvage something. If we are replacing the entry of previously largest absolute value in the row $i$, we might after all get lucky and replace it with an even larger number, in which case we would again know the new loc[i].
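Assembled into code, the bookkeeping described so far looks roughly like this (a sketch under our own naming, including the auxiliary array old_max that remembers each row's previous maximum; the final else branch, which re-searches a row when we are not lucky, anticipates the case the discussion turns to next):

```python
def update_loc(A, loc, p, q, old_max):
    """Update loc after a rotation that changed rows/columns p and q.
    old_max[i] holds |A[i, loc[i]]| as it was before the rotation
    (keeping this extra array is our own device for the lucky test)."""
    n = A.shape[0]

    def search(i):
        return max(range(i + 1, n), key=lambda j: abs(A[i, j]))

    for i in range(n - 1):
        if i == p or i == q:
            loc[i] = search(i)        # fully changed row: re-search, O(n)
        elif loc[i] not in (p, q):
            # only A[i,p] and A[i,q] changed: at most three comparisons
            cands = [loc[i]] + [j for j in (p, q) if j > i]
            loc[i] = max(cands, key=lambda j: abs(A[i, j]))
        elif abs(A[i, loc[i]]) >= old_max[i]:
            # lucky: the old maximum was overwritten by something even
            # bigger, so the new maximum sits in column p or q
            cands = [j for j in (p, q) if j > i]
            loc[i] = max(cands, key=lambda j: abs(A[i, j]))
        else:
            loc[i] = search(i)        # unlucky: re-search this row too
        old_max[i] = abs(A[i, loc[i]])
```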