Let Q = VW^T. Note, incidentally, that P21 = VW^T · WSW^T, so Q = VW^T is the orthonormal polar factor of P21, and hence is the nearest orthonormal matrix to P21 in the 2- and Frobenius norms. For details of the CS decomposition see Golub and Van Loan [470, 1989, pp. 77, 471] and Paige and Wei [814, 1994].)

Problems

18.12. We know that Householder QR factorization of the matrix obtained by stacking an n × n zero block on top of A is equivalent to the MGS method applied to A, and Problem 18.10 shows that the orthonormal matrix Q from MGS is a submatrix of the orthogonal matrix P from the Householder method. Since Householder's method produces a nearly orthogonal P, does it not follow that MGS must also produce a nearly orthonormal Q?
18.13. (Higham [557, 1994]) Let A ∈ R^{m×n} (m ≥ n) have the polar decomposition A = UH. Show that

    ||A^T A - I||_2 / (1 + ||A||_2)  ≤  ||A - U||_2  ≤  ||A^T A - I||_2.

This result shows that the two measures of orthonormality ||A^T A - I||_2 and ||A - U||_2 are essentially equivalent (cf. (18.22)).

Chapter 19
The Least Squares Problem

For some time it has been believed that orthogonalizing methods
did not suffer this squaring of the condition number . . .
It caused something of a shock, therefore,
when in 1966 Golub and Wilkinson . . .
asserted that already the multiplications QA and Qb
may produce errors in the solution containing a factor κ_2(A)^2.
— A. VAN DER SLUIS, Stability of the Solutions of Linear Least Squares Problems (1975)

Most packaged regression programs do compute a cross-products matrix
and solve the normal equations using a matrix inversion subroutine.
All the programs . . . that disagreed
(and some of those that agreed) with the unperturbed solution
tried to solve the normal equations.
— ALBERT E. BEATON, DONALD B. RUBIN, and JOHN L. BARONE,
The Acceptability of Regression Solutions:
Another Look at Computational Accuracy (1976)

On January 1, 1801 Giuseppe Piazzi discovered the asteroid Ceres.
Ceres was only visible for forty days
before it was lost to view behind the sun . . .
Gauss, using three observations, extensive analysis,
and the method of least squares, was able to
determine the orbit with such accuracy that Ceres was
easily found when it reappeared in late 1801.
— DAVID K. KAHANER, CLEVE B. MOLER, and STEPHEN G. NASH,
Numerical Methods and Software (1989)

In this chapter we consider the least squares (LS) problem

    min_x ||b - Ax||_2,

where A ∈ R^{m×n} (m ≥ n) has full rank. We begin by examining the sensitivity of the LS problem to perturbations. Then we examine the stability of methods for solving the LS problem, covering QR factorization methods, the normal equations and seminormal equations methods, and iterative refinement. Finally, we show how to compute the backward error of an approximate LS solution.

We do not develop the basic theory of the LS problem, which can be found in standard textbooks (see, for example, Golub and Van Loan [470, 1989]). However, we recall the fundamental result that any solution of the LS problem satisfies the normal equations A^T A x = A^T b (see Problem 19.1). Therefore if A has full rank there is a unique LS solution.
More generally, whatever the rank of A the vector x_LS = A^+ b is an LS solution, and it is the solution of minimal 2-norm. Here, A^+ is the pseudo-inverse of A (given by A^+ = (A^T A)^{-1} A^T when A has full rank); see Problem 19.3. (For more on the pseudo-inverse see Stewart and Sun [954, 1990, §3.1].)
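As a concrete illustration (not from the original text; it assumes NumPy and uses an arbitrary random test problem), the following sketch checks that for a full-rank A a library LS solver, the pseudo-inverse formula x_LS = A^+ b, and the normal equations all give the same solution, and that the LS residual is orthogonal to the columns of A.

    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 8, 3
    A = rng.standard_normal((m, n))      # arbitrary full-rank test matrix
    b = rng.standard_normal(m)

    x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]   # library LS solver
    x_pinv = np.linalg.pinv(A) @ b                   # x_LS = A^+ b
    x_ne = np.linalg.solve(A.T @ A, A.T @ b)         # normal equations (A^T A)x = A^T b

    print(np.allclose(x_lstsq, x_pinv), np.allclose(x_lstsq, x_ne))
    print(np.allclose(A.T @ (b - A @ x_lstsq), 0, atol=1e-12))  # A^T r = 0

Of course, forming A^T A squares the condition number, which is the issue the quotations above allude to; this comparison is only a sanity check on a well-conditioned example.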
19.1. Perturbation Theory

Perturbation theory for the LS problem is, not surprisingly, more complicated than for linear systems, and there are several forms in which bounds can be stated. We begin with a normwise perturbation theorem that is a restatement of a result of Wedin [1069, 1973, Thm. 5.1]. For an arbitrary rectangular matrix A we define the condition number κ_2(A) = ||A||_2 ||A^+||_2. If A has r = rank(A) nonzero singular values, σ_1 ≥ · · · ≥ σ_r, then κ_2(A) = σ_1/σ_r.

Theorem 19.1 (Wedin). Let A ∈ R^{m×n} (m ≥ n) and A + ∆A both be of full rank, and let x and y be the LS solutions of min ||b - Ax||_2 and min ||(b + ∆b) - (A + ∆A)y||_2, respectively, with ε := max(||∆A||_2/||A||_2, ||∆b||_2/||b||_2). Then, provided that κ_2(A)ε < 1, the bounds (19.1) and (19.2) hold. These bounds are approximately attainable.
Proof. We defer a proof until §19.8, since the techniques used in the proof are not needed elsewhere in this chapter.

The bound (19.1) is usually interpreted as saying that the sensitivity of the LS problem is measured by κ_2(A) when the residual is small or zero and by κ_2(A)^2 otherwise. This means that the sensitivity of the LS problem depends strongly on b as well as A, unlike for a square linear system. The κ_2(A)^2 effect is easy to see in a simple example with κ_2(A) = 1/ε, and it is a simple exercise to verify it directly.
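The effect is also easy to observe numerically. The sketch below is an illustration only, not the example from the text (NumPy is assumed, and the matrix, right-hand sides, and perturbation size are arbitrary choices): it applies the same small perturbation to A for two right-hand sides, one giving a tiny residual and one a large residual, and reports the resulting change in the LS solution.

    import numpy as np

    rng = np.random.default_rng(1)
    m, n = 20, 5
    # Build A with condition number 1e4 (singular values 1, ..., 1e-4).
    U, _ = np.linalg.qr(rng.standard_normal((m, n)))
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))
    A = U @ np.diag(np.logspace(0, -4, n)) @ V.T
    kappa = np.linalg.cond(A)

    x_true = rng.standard_normal(n)
    b_small = A @ x_true                                                 # residual ~ 0
    b_large = b_small + (np.eye(m) - U @ U.T) @ rng.standard_normal(m)   # large residual

    for b in (b_small, b_large):
        x = np.linalg.lstsq(A, b, rcond=None)[0]
        dA = 1e-10 * rng.standard_normal((m, n))          # tiny perturbation of A
        y = np.linalg.lstsq(A + dA, b, rcond=None)[0]
        print(f"residual = {np.linalg.norm(b - A @ x):.1e}, "
              f"relative change in x = {np.linalg.norm(y - x) / np.linalg.norm(x):.1e}")
    print(f"kappa_2(A) = {kappa:.1e}, kappa_2(A)^2 = {kappa**2:.1e}")

Both right-hand sides have the same LS solution, but the change under perturbation is larger for the large-residual problem by roughly a factor κ_2(A)||r||_2/(||A||_2||x||_2), consistent with the interpretation of (19.1) above.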
Surprisingly, it is easier to derive componentwise perturbation bounds than normwise ones for the LS problem. The key idea is to express the LS solution and its residual r = b - Ax as the solution of the augmented system

    [ I    A ] [ r ]   [ b ]
    [ A^T  0 ] [ x ] = [ 0 ],                                          (19.3)

which is simply another way of writing the normal equations, A^T A x = A^T b. This is a square nonsingular system, so standard techniques can be applied. The perturbed system of interest is

    [ I            A + ∆A ] [ s ]   [ b + ∆b ]
    [ (A + ∆A)^T   0      ] [ y ] = [ 0      ],                        (19.4)

where we assume that

    |∆A| ≤ εE,    |∆b| ≤ εf.                                           (19.5)

From (19.3) and (19.4) we obtain

    [ I    A ] [ s - r ]   [ ∆b - ∆A y ]
    [ A^T  0 ] [ y - x ] = [ -∆A^T s   ].

Premultiplying by the inverse of the matrix on the left gives

    [ s - r ]   [ I - AA^+    (A^+)^T      ] [ ∆b - ∆A y ]
    [ y - x ] = [ A^+         -(A^T A)^{-1} ] [ -∆A^T s   ].           (19.6)

Looking at the individual block components we obtain

    y - x = A^+(∆b - ∆A y) + (A^T A)^{-1} ∆A^T s,                      (19.7)
    s - r = (I - AA^+)(∆b - ∆A y) - (A^+)^T ∆A^T s.                    (19.8)

(Note that ||I - AA^+||_2 = min{1, m - n}, as is easily proved using the singular value decomposition (SVD).) On taking norms we obtain the desired perturbation result.
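The augmented system is easy to verify numerically. The following sketch (assuming NumPy; the test data are an arbitrary choice) solves (19.3) as a square linear system and checks that its two block components are the LS residual and the LS solution.

    import numpy as np

    rng = np.random.default_rng(2)
    m, n = 10, 4
    A = rng.standard_normal((m, n))
    b = rng.standard_normal(m)

    # Augmented system (19.3): [[I, A], [A^T, 0]] [r; x] = [b; 0].
    K = np.block([[np.eye(m), A], [A.T, np.zeros((n, n))]])
    rx = np.linalg.solve(K, np.concatenate([b, np.zeros(n)]))
    r_aug, x_aug = rx[:m], rx[m:]

    x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
    print(np.allclose(x_aug, x_ls))          # second block is the LS solution
    print(np.allclose(r_aug, b - A @ x_ls))  # first block is the residual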
Theorem 19.2. Let A ∈ R^{m×n} (m ≥ n) and A + ∆A be of full rank. For the perturbed LS problem described by (19.4) and (19.5) we have

    ||x - y|| ≤ ε || |A^+|(f + E|y|) + |(A^T A)^{-1}| E^T|s| ||,        (19.9)
    ||r - s|| ≤ ε || |I - AA^+|(f + E|y|) + |(A^+)^T| E^T|s| ||,        (19.10)

for any monotonic norm. These bounds are approximately attainable.

For a square system, we have s = 0, and we essentially recover Theorem 7.4. Note, however, that the bounds contain the perturbed vectors y and s. For theoretical analysis it may be preferable to use alternative bounds in which x and r replace y and s, at the cost of an extra factor whose parenthesized term is assumed to be positive.
For practical computation (19.9) is unsatisfactory because we do not know s = b + ∆b - (A + ∆A)y. However, as Stewart and Sun observe [954, 1990, p. 159], r̂ = b - Ay is computable and

    |s| = |r̂ + ∆b - ∆A y| ≤ |r̂| + ε(f + E|y|),

and using this bound in (19.9) makes only a second-order change.

The componentwise bounds enjoy better scaling properties than the normwise ones. If E = |A| and f = |b| then the bounds (19.7) and (19.8), and to a lesser extent (19.9) and (19.10), are invariant under column scalings A → AD (D diagonal). Row scaling does affect the componentwise bounds, since it changes the LS solution, but the componentwise bounds are less sensitive to the row scaling than the normwise bounds, in a way that is difficult to make precise.

19.2. Solution by QR Factorization

Let A ∈ R^{m×n} with m ≥ n and rank(A) = n. If A has the QR factorization

    A = Q [ R ]
          [ 0 ],

where Q ∈ R^{m×m} is orthogonal and R ∈ R^{n×n} is upper triangular, then, writing

    Q^T b = [ c ]
            [ d ],    c ∈ R^n,

we have ||b - Ax||_2^2 = ||c - Rx||_2^2 + ||d||_2^2. It follows that the unique LS solution is x = R^{-1}c, and the residual ||b - Ax||_2 = ||d||_2. Thus the LS problem can be solved with relatively little extra work beyond the computation of a QR factorization. Note that Q is not required explicitly; we just need the ability to apply Q^T to a vector.
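Here is a minimal sketch of the QR approach just described (NumPy and SciPy are assumed; the data are an arbitrary random example): factor A, apply Q^T to b, solve the triangular system for x, and read off the residual norm from d.

    import numpy as np
    from scipy.linalg import qr, solve_triangular

    rng = np.random.default_rng(3)
    m, n = 12, 5
    A = rng.standard_normal((m, n))
    b = rng.standard_normal(m)

    Q, R = qr(A)                          # full QR: Q is m-by-m, R is m-by-n
    Qtb = Q.T @ b
    c, d = Qtb[:n], Qtb[n:]               # Q^T b = [c; d]
    x = solve_triangular(R[:n, :], c)     # x = R^{-1} c

    print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))
    print(np.isclose(np.linalg.norm(b - A @ x), np.linalg.norm(d)))  # ||b - Ax||_2 = ||d||_2

Forming Q explicitly here is purely for clarity; as noted above, all that is really needed is the ability to apply Q^T to a vector, for example by applying the stored Householder transformations directly.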
It is well known that the Givens and Householder QR factorization algorithms provide a normwise backward stable way to solve the LS problem. The next result expresses this fact for the Householder method and also provides componentwise backward error bounds (essentially the same result holds for the Givens method).

As in Chapter 18, we will use the generic constant γ_cm, in which c denotes a small integer. We will assume implicitly that a condition holds of the form mnγ_cm < 1/2.
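The practical import of backward stability can be seen by comparing the QR approach with the normal equations on an ill-conditioned problem. The sketch below is an illustration only (NumPy/SciPy assumed; the data are an arbitrary construction with κ_2(A) = 10^7 and a zero-residual right-hand side): for such a problem a backward stable solver gives a forward error of order κ_2(A)u, whereas the normal equations method typically behaves like κ_2(A)^2 u.

    import numpy as np
    from scipy.linalg import qr, solve_triangular

    rng = np.random.default_rng(5)
    m, n = 30, 6
    U, _ = np.linalg.qr(rng.standard_normal((m, n)))
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))
    A = U @ np.diag(np.logspace(0, -7, n)) @ V.T       # kappa_2(A) = 1e7
    x_true = rng.standard_normal(n)
    b = A @ x_true                                     # consistent: x_true is the LS solution

    Q, R = qr(A, mode='economic')
    x_qr = solve_triangular(R, Q.T @ b)                # Householder QR method
    x_ne = np.linalg.solve(A.T @ A, A.T @ b)           # normal equations

    for name, x in (("QR", x_qr), ("normal equations", x_ne)):
        print(f"{name}: relative error = "
              f"{np.linalg.norm(x - x_true) / np.linalg.norm(x_true):.1e}")

The exact numbers depend on the random data, but the QR solution is typically several orders of magnitude more accurate here.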
We will assume implicitly that a condition holds of the formm nγ cm < 1/2.Theorem 19.3. Let A(m > n) have full rank and suppose theLS problem minx ||b – Ax||2 is solved using the Householder QR factorizationmethod. The computed solution is the exact LS solution towhere the perturbations satisfy the normwise boundsand the componentwise boundswhere ||G||F = 1.Proof. The proof is a straightforward generalization of the proof of Theorem 18.5 and is left as an exercise (Problem 19.2).396T HE LEAST S QUARES P ROBLEMAs for Theorem 18.5 (see (18.7)), Theorem 19.3 remains true if we set∆ b = 0, but in general there is no advantage to restricting the perturbationsto A.Theorem 19.3 is a strong result, but it does not bound the residual of thecomputed solution, which, after all, is what we are trying to minimize.
How close, then, is ||b - Ax̂||_2 to min_x ||b - Ax||_2? We can answer this question using the perturbation theory of §19.1. With s := b + ∆b - (A + ∆A)x̂, where x̂ is the exact LS solution of the perturbed problem in Theorem 19.3, and r := b - Ax, (19.6) yields an expression for s - r, so that ||b - Ax̂||_2 can be bounded in terms of ||r||_2 and the perturbations. Substituting the bounds for ∆A and ∆b from Theorem 19.3, and noting that ||AA^+||_2 = 1, we obtain a bound in which cond_2(A) := || |A^+||A| ||_2 appears. This bound contains two parts. The term mnγ_cm || |b| + |A||x̂| ||_2 is a multiple of the bound for the error in evaluating fl(b - Ax̂), and so is to be expected. The factor 1 + mnγ_cm cond_2(A^T) will be less than 1.1 (say) provided that cond_2(A^T) is not too large. Note that cond_2(A^T) ≤ nκ_2(A) and cond_2(A^T) is invariant under column scaling of A (A → A diag(d_i), d_i ≠ 0).
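The quantities appearing in these remarks are cheap to evaluate for small examples. The sketch below (assuming NumPy; the matrix and the scaling factors are arbitrary choices) computes cond_2(A) = || |A^+||A| ||_2, checks that cond_2(A^T) ≤ nκ_2(A), and checks the invariance of cond_2(A^T) under column scaling of A.

    import numpy as np

    def cond2(A):
        # cond_2(A) = || |A^+| |A| ||_2, the condition number used above.
        return np.linalg.norm(np.abs(np.linalg.pinv(A)) @ np.abs(A), 2)

    rng = np.random.default_rng(4)
    m, n = 15, 4
    A = rng.standard_normal((m, n))
    kappa2 = np.linalg.cond(A)                        # kappa_2(A) = ||A||_2 ||A^+||_2

    print(cond2(A.T) <= n * kappa2)                   # cond_2(A^T) <= n*kappa_2(A)
    D = np.diag(rng.uniform(1.0, 1e3, size=n))        # arbitrary nonzero column scaling
    print(np.isclose(cond2(A.T), cond2((A @ D).T), rtol=1e-6))  # invariance under A -> AD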