Related methods have been developed in particular by Paige and Saunders (1975) who relate the conjugate gradients methods to the Lanczos algorithm for the algebraic eigenproblem. The Lanczos algorithm has been left out of this work because I feel it to be a tool best used by someone prepared to tolerate its quirks. This sentiment accords with the statement of Kahan and Parlett (1976): 'The urge to write a universal Lanczos program should be resisted, at least until the process is better understood.' However, in the hands of an expert, it is a very powerful method for finding the eigenvalues of a large symmetric matrix.
For indefinite coefficient matrices, however, I would expect the Paige-Saunders method to be preferred, by virtue of its design. In preparing the first edition of this book, I experimented briefly with some FORTRAN codes for several methods for iterative solution of linear equations and least-squares problems, finding no clear advantage for any one approach, though I did not focus on indefinite matrices.
Therefore, the treatment which follows will stay with conjugate gradients, which has the advantage of introducing no fundamentally new ideas. It must be pointed out that the general-purpose minimisation algorithm 22 does not perform very well on linear least-squares or Rayleigh quotient minimisations. In some tests run by S G Nash and myself, the inexact line searches led to very slow convergence in a number of cases, even though early progress may have been rapid (Nash and Nash 1977).

19.2. SOLUTION OF LINEAR EQUATIONS AND LEAST-SQUARES PROBLEMS BY CONJUGATE GRADIENTS

The conjugate gradients algorithm 22 can be modified to solve systems of linear equations and linear least-squares problems.
Indeed, the conjugate gradients methods have been derived by considering their application to the quadratic form

   S(b) = ½ b^T H b - c^T b + (any scalar)                      (15.11)

where H is symmetric and positive definite. The minimum at b' has the zero gradient

   g = H b' - c = 0                                             (15.17)

so that b' solves the linear equations

   H b' = c.                                                    (19.1)

The linear least-squares problem

   A b ≅ f                                                      (19.2)

can be solved by noting that A^T A is non-negative definite. For the present purpose we shall assume that it is positive definite. Hestenes (1975) shows that it is only necessary for H to be non-negative definite to produce a least-squares solution of (19.1).
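As a small numerical check of (15.11), (15.17) and (19.1) (the numbers here are purely illustrative and are not taken from the text), take

   H = diag(2, 4)       c = (2, 4)^T

so that

   S(b) = b_1^2 + 2 b_2^2 - 2 b_1 - 4 b_2 + (any scalar).

Setting the gradient H b - c to zero gives b' = (1, 1)^T, and indeed H b' = (2, 4)^T = c.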
Thus identification of

   H = A^T A                                                    (19.3a)
   c = A^T f                                                    (19.3b)

permits least-squares problems to be solved via the normal equations. The particular changes which can be made to the general-purpose minimising algorithm of §§16.1 and 16.2 to suit these linear systems of equations better are that (a) the linear search and (b) the gradient calculation can be performed explicitly and simply. The latter is accomplished as follows. First, if the initial point is taken to be

   b_0 = 0                                                      (19.4)

then the initial gradient, from (15.17), is

   g_0 = -c.                                                    (19.5)

If the linear search along the search direction t_j yields a step-length factor k_j, then from (15.18)

   g_{j+1} = g_j + k_j H t_j.                                   (19.6)

This with (19.5) defines a recurrence scheme which avoids the necessity of multiplication of b by H to compute the gradient, though H t_j must be formed. However, this is needed in any case in the linear search to which attention will now be directed. Substitution of

   b_{j+1} = b_j + k_j t_j                                      (19.7)

into (15.11) gives an equation for k_j, permitting the optimal step length to be computed.
For convenience in explaining this, the subscript j will be omitted. Thus from substituting, we obtain

   φ(k) = ½ (b + k t)^T H (b + k t) - c^T (b + k t) + (any scalar).      (19.8)

The derivative of this with respect to k is

   φ'(k) = t^T H (b + k t) - c^T t
         = t^T (H b - c) + k t^T H t
         = t^T g + k t^T H t                                             (19.9)

so that φ'(k) = 0 implies

   k = -t^T g / t^T H t.                                                 (19.10)

But from the same line of reasoning as that which follows equation (16.6), the accurate line searches imply that the function has been minimised on a hyperplane spanned by the gradient directions which go to make up t except for g itself: thus

   k = +g^T g / t^T H t.                                                 (19.11)

Note that the sign has changed since t will have a component in the direction g with a coefficient of -1. The recurrence formula for this problem can be generated by one of the formulae (16.8), (16.10) or (16.11).
The last of these is the simplest and is the one that has been chosen. Because the problem has a genuine quadratic form it may be taken to have converged after n steps if the gradient does not become small earlier. However, the algorithm which follows restarts the iteration in case rounding errors during the conjugate gradients steps have caused the solution to be missed.

Algorithm 24. Solution of a consistent set of linear equations by conjugate gradients

procedure lecg( n : integer; {order of the problem}
                H : rmatrix; {coefficient matrix in linear equations}
                C : rvector; {right hand side vector}
            var Bvec : rvector; {the solution vector -- on input, we
                        must supply some guess to this vector}
            var itcount : integer; {on input, a limit to the number of
                        conjugate gradients iterations allowed; on output,
                        the number of iterations used (negative if the
                        limit is exceeded)}
            var ssmin : real); {the approximate minimum sum of squared
                        residuals for the solution}
{alg24.pas == linear equations by conjugate gradients
  This implementation uses an explicit matrix H.
  However, we only need the result of multiplying H times a vector, that is,
  where the procedure call
      matmul( n, H, t, v);
  computes the matrix-vector product H t and places it in v, we really only
  need have
      matmulH(n, t, v),
  with the matrix being implicit. In many situations, the elements of H are
  simple to provide or calculate so that its explicit storage is not required.
  Such modifications allow array H to be removed from the calling sequence
  of this procedure.

  Copyright 1988 J.C.Nash
}
var
  count, i, itn, itlimit : integer;
  eps, g2, oldg2, s2, step, steplim, t2, tol : real;
  g, t, v : rvector;
begin
  {writeln('alg24.pas -- linear equations solution by conjugate gradients');}
  {Because alg24.pas is called repeatedly in dr24ii.pas, we do not always
   want to print the algorithm banner.}
  itlimit := itcount; {to save the iteration limit -- STEP 0}
  itcount := 0; {to initialize the iteration count}
  eps := calceps; {machine precision}
  steplim := 1.0/sqrt(eps); {a limit to the size of the step which may
                             be taken in changing the solution Bvec}
  g2 := 0.0; {Compute norm of right hand side of equations.}
  for i := 1 to n do g2 := g2+abs(C[i]);
  tol := g2*eps*eps*n;
  {STEP 1: compute the initial residual vector}
  {Note: we do not count this initial calculation of the residual vector.}
  matmul(n, H, Bvec, g);
  for i := 1 to n do g[i] := g[i]-C[i]; {this loop need not be separate if
      explicit matrix multiplication is used}
  {This loop computes the residual vector.
   Note that it simplifies if the initial guess to the solution Bvec is a null
   vector. However, this would restrict the algorithm to one major cycle
   (steps 4 to 11)}
  g2 := 0.0; {STEP 2}
  for i := 1 to n do
  begin
    g2 := g2+g[i]*g[i]; t[i] := -g[i];
  end; {loop on i}
  {writeln('initial gradient norm squared = ',g2);}
  ssmin := big; {to ensure no accidental halt -- STEP 3}
  while (g2>tol) and (itcount<itlimit) and (ssmin>0.0) do
  {Convergence test -- beginning of main work loop in method.}
  begin {STEP 4 -- removed}
    {STEP 5 -- here in the form of explicit statements}
    itcount := itcount+1; {to count matrix multiplications}
    matmul( n, H, t, v);
    t2 := 0.0; {STEP 6 -- t2 will contain the product t-transpose * H * t}
    for i := 1 to n do t2 := t2+t[i]*v[i];
    step := g2/t2; oldg2 := g2; {STEP 7}
    if abs(step)>steplim then
    begin
      writeln('Step too large -- coefficient matrix indefinite?');
      ssmin := -big; {to indicate this failure}
    end
    else
    begin
      {The step length has been computed and the gradient norm saved.}
      g2 := 0.0; count := 0; {STEP 8}
      for i := 1 to n do
      begin
        g[i] := g[i]+step*v[i]; {to update the gradient/residual}
        t2 := Bvec[i]; Bvec[i] := t2+step*t[i]; {to update the solution}
        if Bvec[i]=t2 then count := count+1;
        g2 := g2+g[i]*g[i]; {accumulate gradient norm}
      end; {loop on i -- updating}
      if count<n then {STEP 9}
      begin
        if g2>tol then
        begin {STEP 10 -- recurrence relation to give next search direction.}
          t2 := g2/oldg2;
          for i := 1 to n do t[i] := t2*t[i]-g[i];
        end; {if g2>tol}
      end; {if count<n}
      {writeln('Iteration ',itcount,' count=',count,' residualss=',g2);}
      ssmin := g2; {to return the value of the estimated sum of squares}
    end; {else on stepsize failure}
  end; {while g2>tol}
  if itcount>=itlimit then itcount := -itcount; {to notify calling program
      of the convergence failure.}
  {After n cycles, the conjugate gradients algorithm will, in exact
   arithmetic, have converged.
   However, in floating-point arithmetic the solution may still need
   refinement. For best results, the residual vector g should be recomputed
   at this point, in the same manner as it was initially computed. There are
   a number of ways this can be programmed. Here we are satisfied to merely
   point out the issue.}
end; {alg24.pas == lecg}

The above algorithm requires five working vectors to store the problem and intermediate results as well as the solution. This is exclusive of any space needed to store or generate the coefficient matrix.

Example 19.1. Solution of Fröberg's differential equation (example 2.2)

Recall that the finite difference approximation of a second-order boundary value problem gives rise to a set of linear equations whose coefficient matrix is easily generated. That is to say, there is no need to build the set of equations explicitly if the coefficient matrix satisfies the other conditions of the conjugate gradients method; an implicit matrix-vector product of the kind sketched below can then take the place of matmul.
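To make this concrete, the following is a minimal sketch (my own illustration, not part of the original algorithm) of a matmulH-style routine of the kind mentioned in the comments of algorithm 24. It multiplies an implicit tridiagonal second-difference matrix (diagonal elements -2, off-diagonal elements 1) by a vector; this particular matrix is only a stand-in and is not necessarily that of example 2.2. The type rvector and the calling sequence matmulH(n, t, v) follow the conventions used above.

procedure matmulH(n : integer; t : rvector; var v : rvector);
{A sketch only: forms v = H t for an implicit tridiagonal matrix with
 diagonal -2 and off-diagonals 1, so that the array H of algorithm 24
 need never be stored.  Replace the body with whatever rule generates
 the elements of the coefficient matrix at hand.}
var
  i : integer;
begin
  for i := 1 to n do
  begin
    v[i] := -2.0*t[i];                  {diagonal element}
    if i>1 then v[i] := v[i]+t[i-1];    {sub-diagonal element}
    if i<n then v[i] := v[i]+t[i+1];    {super-diagonal element}
  end; {loop on i}
end; {matmulH}

Calls of matmul(n, H, Bvec, g) and matmul(n, H, t, v) in algorithm 24 would then be replaced by matmulH(n, Bvec, g) and matmulH(n, t, v), and the parameter H dropped from the calling sequence.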
This coefficient matrix turns out to be negative definite, so that we can solve the equations

   (-A) x = -b                                                  (19.12)

by means of the conjugate gradients algorithm 24. Alternatively, if the coefficient matrix is not definite, the normal equations

   A^T A x = A^T b                                              (19.13)

provide a non-negative definite matrix A^T A. Finally, the problem can be approached in a completely different way.
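Returning briefly to the normal-equations route (19.13): the matrix A^T A need not be formed explicitly either, since a matmulH-style routine can supply the product A^T(A t) directly from A. The sketch below is my own illustration; it assumes that A is stored in an m by n array of the rmatrix type and that the rvector type is large enough to hold an m-element working vector, neither of which is stated in the text.

procedure matmulATA(m, n : integer; A : rmatrix; t : rvector; var v : rvector);
{A sketch only: forms v = (A-transpose)(A t) without ever building the
 normal-equations matrix A-transpose A of (19.13).}
var
  i, j : integer;
  w : rvector; {working vector of length m}
begin
  for i := 1 to m do
  begin {w := A t}
    w[i] := 0.0;
    for j := 1 to n do w[i] := w[i]+A[i,j]*t[j];
  end; {loop on i}
  for j := 1 to n do
  begin {v := A-transpose w}
    v[j] := 0.0;
    for i := 1 to m do v[j] := v[j]+A[i,j]*w[i];
  end; {loop on j}
end; {matmulATA}

The right-hand side of (19.13), A^T b, would then be supplied as the vector C when algorithm 24 is called.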