The use of the symbol q for the tolerance is not coincidental. The previous employment of this symbol in computing the rotation parameters and the norm of the orthogonalised columns of the resulting matrix is finished, and it can be re-used.

Permitting S+ to depend on a user-defined tolerance places upon him/her the responsibility for deciding the degree of linear dependence in his/her data. In an economic modelling situation, for instance, columns of U corresponding to small singular values are almost certain to be largely determined by errors or noise in the original data. On the other hand, the same columns when derived from the tracking of a satellite may contain very significant information about orbit perturbations. Therefore, it is not only difficult to provide an automatic definition for S+, it is inappropriate.
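Equation (3.37) itself precedes this extract; judging from (3.46) and (3.47) below and from the test used in algorithm 2, it presumably defines the elements of S+ in terms of the tolerance q as

    S+_{jj} = 1/S_{jj}    if S_{jj} > q
    S+_{jj} = 0           otherwise.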
Furthermore, the matrix B = US contains the principal components (Kendall and Stuart 1958-66, vol 3, p 286). By appropriate choices of q in equation (3.37), the solutions x corresponding to only a few of the dominant principal components can be computed. Furthermore, at this stage in the calculation U^T b should already have been computed and saved, so that only a simple matrix-vector multiplication is involved in finding each of the solutions.

Another way to look at this is to consider the least-squares problem

    B w ≈ b                                      (3.38)

where B is the matrix having orthogonal columns and is given in equations (3.1) and (3.6). Thus the normal equations corresponding to (3.38) are

    B^T B w = S^2 w = B^T b.                     (3.39)

But S^2 is diagonal, so that the solutions are easily obtained as

    w = S^{-2} B^T b                             (3.40)

and substitution of (3.6) gives

    w = S^{-1} U^T b.                            (3.41)

Should the problem be singular, then

    w = S^+ U^T b                                (3.42)

can be used.
Now note that because

    B V^T = A                                    (3.43)

from (3.1), the solution w allows x to be computed via

    x = V w.                                     (3.44)

The coefficients w are important as the solution of the least-squares problem in terms of the orthogonal combinations of the original variables called the principal components. The normalised components are contained in U. It is easy to rearrange the residual sum of squares so that

    r^T r = (b - A x)^T (b - A x) = (b - B w)^T (b - B w)
          = b^T b - b^T B w                      (3.45)

by virtue of the normal equations (3.39).
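Written out, the rearrangement in (3.45) is a direct expansion:

    r^T r = (b - B w)^T (b - B w)
          = b^T b - 2 b^T B w + w^T B^T B w
          = b^T b - 2 b^T B w + w^T B^T b
          = b^T b - b^T B w

where (3.39) replaces B^T B w by B^T b, and w^T B^T b = b^T B w because a scalar equals its own transpose.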
However, substituting (3.37) in (3.42) and noting the ordering of S, it is obvious that if

    S_{k+1,k+1} ≤ q                              (3.46)

is the first singular value less than or equal to the tolerance, then

    w_i = 0    for i > k.                        (3.47)

The components corresponding to small singular values are thus dropped from the solution. But it is these components which are the least accurately determined, since they arise as differences. Furthermore, from (3.6) and (3.45),

    r^T r = b^T b - b^T U S S^+ U^T b            (3.48)

where the limit of the sum in (3.48) is k, the number of principal components which are included. Thus inclusion of another component cannot increase the residual sum of squares. However, if a component with a very small singular value is introduced, it will contribute a very large amount to the corresponding element of w, and x will acquire large elements also.
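Because S S^+ is diagonal with unit entries for the k retained components and zeros elsewhere, (3.48) can be written as the explicit sum

    r^T r = b^T b - Σ_{j=1}^{k} (u_j^T b)^2

where u_j is the jth column of U.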
From (3.48), however, it is the interaction between the normalised component u_j and b which determines how much a given component reduces the sum of squares. A least-squares problem will therefore be ill conditioned if b is best approximated by a column of U which is associated with a small singular value and thus may be computed inaccurately.

On the other hand, if the components corresponding to ‘large’ singular values are the ones which are responsible for reducing the sum of squares, then the problem has a solution which can be safely computed by leaving out the components which make the elements of w and x large without appreciably reducing the sum of squares. Unless the unwanted components have no part in reducing the sum of squares, that is, unless

    u_i^T b = 0    for i > k                     (3.49)

under the same condition (3.46) for k, solutions which omit these components are not properly termed least-squares solutions but principal-components solutions (written out explicitly below). In many least-squares problems, poorly determined components will not arise, all singular values being of approximately the same magnitude.
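Explicitly, combining (3.42) and (3.44), the principal-components solution retaining k components is

    x = V w = Σ_{j=1}^{k} [(u_j^T b)/S_{jj}] v_j

where v_j is the jth column of V; the divisors S_{jj} show directly how a small retained singular value inflates the elements of x.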
As a rule of thumb for my clients, I suggest they look very carefully at their data, and in particular the matrix A, if the ratio of the largest singular value to the smallest exceeds 1000. Such a distribution of singular values suggests that the columns of A are not truly independent and, regardless of the conditioning of the problem as discussed above, one may wish to redefine the problem by leaving out certain variables (columns of A) from the set used to approximate b.

Algorithm 2. Least-squares solution via singular-value decomposition

procedure svdlss(nRow, nCol: integer; {order of problem}
                 W : wmatrix;         {working array with decomposition}
                 Y : rvector;         {right hand side vector}
                 Z : rvector;         {squares of singular values}
                 A : rmatrix;         {coefficient matrix (for residuals)}
                 var Bvec: rvector);  {solution vector}
{alg02.pas == least squares solution via singular value decomposition.
  On entry, W must have the working matrix resulting from the operation of
  NashSVD on a real matrix A in alg1.pas. Z will have the squares of the
  singular values. Y will have the vector to be approximated. Bvec will be
  the vector of parameters (estimates) returned. Note that A could be
  omitted if residuals were not wanted. However, the user would then lose
  the ability to interact with the problem by changing the tolerance q.
  Because this uses a slightly different decomposition from that in the
  first edition of Compact Numerical Methods, the step numbers are not
  given.
  Copyright 1988 J. C. Nash
}
var
  i, j, k : integer;
  q, s : real;
begin
  writeln('alg02.pas == svdlss');
  repeat
    writeln;
    writeln('Singular values');
    for j := 1 to nCol do
    begin
      write(sqrt(Z[j]):18,' ');
      if j = 4 * (j div 4) then writeln;
    end;
    writeln;
    write('Enter a tolerance for zero singular value (<0 to quit)');
    readln(infile,q);
    if length(infname)>0 then writeln(q);
    if q>=0.0 then
    begin
      q := q*q; {we will work with the Z vector directly}
      for i := 1 to nCol do {loop to generate each element of Bvec}
      begin
        s := 0.0;
        for j := 1 to nCol do {loop over columns of V}
        begin
          for k := 1 to nRow do {loop over elements of Y}
          begin
            if Z[j]>q then
              s := s + W[i+nRow,j]*W[k,j]*Y[k]/Z[j];
            {this is V * S+ * U-transpose * Y = A+ * Y}
            {NOTE: we must use the > sign and not >= in case
             the user enters a value of zero for q, which would
             result in zero-divide.}
          end;
        end;
        Bvec[i] := s;
      end;
      writeln('Least squares solution');
      for j := 1 to nCol do
      begin
        write(Bvec[j]:12,' ');
        if j = 5 * (j div 5) then writeln;
      end;
      writeln;
      s := resids(nRow, nCol, A, Y, Bvec, true);
    end; {if q>=0.0}
  until q<0.0; {this is how we exit from the procedure}
end {alg02.pas == svdlss};

In the above code the residual sum of squares is computed in the separate procedure resids.pas. In alg02.pas, I have not included step numbers because the present code is quite different from the original algorithm.
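The text of resids.pas is not reproduced here. A minimal sketch consistent with the call s := resids(nRow, nCol, A, Y, Bvec, true) above follows; the body, and the reading of the final boolean as a request to print the result, are my assumptions rather than the book's code.

function resids(nRow, nCol : integer; {problem dimensions}
                A : rmatrix;          {coefficient matrix}
                Y : rvector;          {right hand side vector}
                Bvec : rvector;       {solution vector}
                print : boolean) : real; {assumed: print the result}
{Sketch only: returns the residual sum of squares r^T r,
 where r = Y - A*Bvec.}
var
  i, j : integer;
  t, ss : real;
begin
  ss := 0.0;
  for i := 1 to nRow do
  begin
    t := Y[i]; {form residual element r[i] = Y[i] - (A*Bvec)[i]}
    for j := 1 to nCol do
      t := t - A[i,j]*Bvec[j];
    ss := ss + t*t;
  end;
  if print then writeln('Residual sum of squares = ',ss);
  resids := ss;
end;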
Example 3.1. The generalised inverse of a rectangular matrix via the singular-value decomposition

Given the matrices U, V and S of the singular-value decomposition (2.53), then by the product

    A^+ = V S^+ U^T                              (2.56)

the generalised (Moore-Penrose) inverse can be computed directly.
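As an illustration of (2.56) in code, the product can be accumulated element by element using the same storage convention as algorithm 2 (W holding the orthogonalised columns US above V, and Z holding the squared singular values). The procedure name geninv and the output array Aplus are hypothetical, not part of the book's codes:

procedure geninv(nRow, nCol : integer; {dimensions of A}
                 W : wmatrix;  {decomposition as left by NashSVD}
                 Z : rvector;  {squares of singular values}
                 q : real;     {tolerance for a 'zero' singular value}
                 var Aplus : rmatrix); {nCol by nRow generalised inverse}
{Sketch only: accumulates A+ = V * S+ * U-transpose as in (2.56),
 dropping components whose singular value does not exceed q, in the
 same manner as algorithm 2.}
var
  i, j, k : integer;
  s : real;
begin
  for i := 1 to nCol do
    for k := 1 to nRow do
    begin
      s := 0.0;
      for j := 1 to nCol do
        if Z[j] > q*q then {compare squared quantities, as in alg02}
          s := s + W[i+nRow,j]*W[k,j]/Z[j];
      Aplus[i,k] := s;
    end;
end;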
Consider the matrix

    [matrix display not reproduced].

A Hewlett-Packard 9830 operating in 12 decimal digit arithmetic computes the singular values of this matrix via algorithm 1 to six figures as

    13·7530, 1·68961 and 1·18853E-5

with

    U = [display not reproduced]

and

    V = [display not reproduced].

The generalised inverse using the definition (2.57) of S+ is then (to six figures)

    [display not reproduced].

However, we might wonder whether the third singular value is merely an approximation to zero, that is, that the small value computed is a result of rounding errors. Using the new definition (3.37) for S+, assuming this singular value is really zero, gives

    [display not reproduced].

If these generalised inverses are used to solve least-squares problems with

    b = (1, 2, 3, 4)^T

as the right-hand side, the solutions are

    x1 = (1·000000048, -4·79830E-8, -4·00000024)^T

with a residual sum of squares of 3·75892E-20 and

    x2 = (0·222220924, 0·777801787, -0·111121188)^T

with a residual sum of squares of 2·30726E-9.
Both of these solutions are probably acceptable in a majority of applications. Note, however, that the first generalised inverse gives

    [product display not reproduced]

while the second gives

    [product display not reproduced]

in place of

    [display not reproduced].

In the above solutions and products, all figures printed by the HP 9830 have been given rather than the six-figure approximations used earlier in the example.

Example 3.2. Illustration of the use of algorithm 2
The estimation of the coefficients x_i, i = 1, 2, 3, 4, 5, in example 2.3 (p. 23) provides an excellent illustration of the worth of the singular-value decomposition for solving least-squares problems when the data are nearly collinear. The data for the problem are given in table 3.1.

To evaluate the various solutions, the statistic

    R^2 = 1 - r^T r / [Σ_{i=1}^{m} (b_i - \bar{b})^2]        (3.50)

will be used, where

    r = b - A x                                              (2.15)

is the residual vector and \bar{b} is the mean of the elements of b, the dependent variable. The denominator in the second term of (3.50) is often called the total sum of squares, since it is the value of the residual sum of squares for the model

    y = constant = \bar{b}.                                  (3.51)

The statistic R^2 can be corrected for the number of degrees of freedom in the least-squares problem. Thus if there are m observations and k fitted parameters, there are (m - k) degrees of freedom and the corrected R^2 is

    \bar{R}^2 = 1 - [r^T r/(m - k)] / [Σ_{i=1}^{m} (b_i - \bar{b})^2/(m - 1)].   (3.52)

R^2 and \bar{R}^2 provide measures of the goodness of fit of our model which are not dependent on the scale of the data.

TABLE 3.1. Index numbers (1940 = 100) for farm money income and agricultural use of nitrogen, phosphate, potash and petroleum in the United States (courtesy Dr S Chin).

    Income   Nitrogen   Phosphate   Potash   Petroleum
       305        563         262      461         221
       342        658         291      473         222
       331        676         294      513         221
       339        749         302      516         218
       354        834         320      540         217
       369        973         350      596         218
       378       1079         386      650         218
       368       1151         401      676         225
       405       1324         446      769         228
       438       1499         492      870         230
       438       1690         510      907         237
       451       1735         534      932         235
       485       1778         559      956         236

Using the last four columns of table 3.1 together with a column of ones for the matrix A in algorithm 2, with the first column of the table as the dependent variable b, a Data General NOVA operating in 23-bit binary floating-point arithmetic computes the singular values:

    5298·55, 345·511, 36·1125, 21·4208 and 5·13828E-2.

The ratio of the smallest of these to the largest is only very slightly larger than the machine precision, 2^{-22}, and we may therefore expect that a great number of extremely different models may give very similar degrees of approximation to the data.