Higham, Accuracy and Stability of Numerical Algorithms

SOLUTIONS TO PROBLEMS
Lemma 3.5 shows that (3.2) holds provided that each (1 + δ)^k product is replaced by the corresponding product of error terms from the complex model. Thus we have ŝ_n = x_1y_1(1 + α_1) + ··· + x_ny_n(1 + α_n). It is easy to show that |α_i| ≤ γ_{n+2}, so the only change required in (3.4) is to replace γ_n by γ_{n+2}. The complex analogue of (3.10) is ŷ = (A + ΔA)x, |ΔA| ≤ γ_{n+2}|A|.

3.8. Without loss of generality we can suppose that the columns of the product are computed one at a time. With x_j = A_1 . . . A_k e_j we have, using (3.10), x̂_j = (A_1 + ΔA_1) . . . (A_k + ΔA_k)e_j with |ΔA_i| ≤ γ_n|A_i|, and so, by Lemma 3.6, ||x_j − x̂_j||_2 is bounded in terms of the product ||A_1||_2 . . . ||A_k||_2. Squaring these inequalities and summing over j yields the required bound on ||A_1 . . . A_k − fl(A_1 . . . A_k)||_F, which gives the result.
Note that the product ||A_1||_2 . . . ||A_k||_2 can be much smaller than ||A_1||_F . . . ||A_k||_F; the extreme case occurs when the A_i are orthogonal.

3.9. We have fl((x + y)(x − y)) = (x + y)(x − y)(1 + θ_3), |θ_3| ≤ γ_3 = 3u/(1 − 3u), so the computed result has a small relative error. Moreover, if y/2 ≤ x ≤ 2y then x − y is computed exactly, by Theorem 2.5, hence fl((x + y)(x − y)) = (x + y)(x − y)(1 + θ_2). However, fl(x^2 − y^2) = x^2(1 + θ_2) − y^2(1 + θ'_2), so that the absolute error is bounded by γ_2(x^2 + y^2), and we cannot guarantee a small relative error.
If |x| >> |y| then fl(x^2 − y^2) suffers only two rounding errors, since the error in forming fl(y^2) will not affect the final result, while fl((x + y)(x − y)) suffers three rounding errors; in this case fl(x^2 − y^2) is likely to be the more accurate result.
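The effect is easily seen numerically. The following sketch (ours, not from the book) takes x and y close enough that x − y is exact:

% Compare the two expressions for x^2 - y^2 when x and y are close.
x = 1/3;  y = x*(1 - 1e-12);
p1 = (x + y)*(x - y);    % relative error at most gamma_3; x - y is exact here
p2 = x^2 - y^2;          % the squares are rounded before the cancellation
rel_diff = abs(p1 - p2)/abs(p1)   % typically of order 1e-4, versus u ~ 1.1e-16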
3.10. The proof is by induction: assume the result is true for m − 1; one more application of the recurrence then gives the result for m.

3.11. The computations can be expressed as ŷ_1 = x, ŷ_{k+1} = fl(sqrt(ŷ_k)), k = 1: m, followed by m squarings. We have ŷ_{k+1} = sqrt(ŷ_k)(1 + δ_k), where |δ_k| ≤ u. Solving these recurrences, we find that each error δ_k is damped by the subsequent square roots, which shows that the computed value after the m square roots differs negligibly from y_{m+1} = x^{1/2^m}. For the repeated squarings, each squaring doubles the existing relative error and introduces at most one new rounding error, where we have used Lemma 3.1. Hence the squarings introduce a relative error that can be approximately as large as 2^m u. Since u = 2^{−53} this relative error is of order 0.1 for m = 50, which explains the observed results for m = 50.
For m = 75, the behaviour on the Sun is analogous to that on the HP calculator described in §1.12.2. On the 486DX, however, numbers less than 1 are mapped to 1. The difference is due to the fact that the 486DX uses double rounding and the Sun does not; see Problem 2.9.
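The experiment is easy to reproduce in IEEE double precision arithmetic; this sketch (ours) mirrors the computation analysed above:

% Take m square roots of x, then square the result m times.
x = 100; m = 50;
y = x;
for i = 1:m, y = sqrt(y); end   % y is x^(1/2^m) up to rounding
for i = 1:m, y = y^2; end       % each squaring doubles the relative error
disp(y)   % noticeably different from 100: relative error ~ 2^m * u ~ 0.1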
3.12. The analysis is just a slight extension of that for an inner product. The analogue of (3.3) shows that each term w_i f(x_i) of the computed quadrature sum carries a relative perturbation of order nu. Setting M = max{ |f(x)| : a ≤ x ≤ b }, we obtain the bound (A.2), whose dominant term is proportional to uM Σ_i |w_i|.
Any reasonable quadrature rule designed for polynomial f has Σ_i w_i = b − a, so Σ_i |w_i| ≥ b − a. One implication of (A.2) is that it is best not to have weights of large magnitude and varying sign; ideally, w_i > 0 for all i (as for Gaussian integration rules, for example), so that Σ_i |w_i| = b − a.
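To illustrate the point about weights, here is a contrived comparison (the rules, data, and bound constant are ours, chosen only to mimic the form of (A.2)):

% Two rules on [0,1], both exact for constants (sum(w) = 1), but with
% very different sum(abs(w)): the error bound scales with the latter.
n = 8; u = eps/2;
x  = ((1:n)' - 0.5)/n;
f  = @(t) exp(t);
w1 = ones(n,1)/n;             % composite midpoint: w_i > 0, sum(abs(w1)) = 1
w2 = w1 + 5*(-1).^(1:n)';     % same sum(w), but sum(abs(w2)) is about 40
bound = @(w) (n+1)*u*sum(abs(w.*f(x)));  % first-order bound, in the spirit of (A.2)
[bound(w1) bound(w2)]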
4.1. A condition number is C(x) = max over relative perturbations of the data of the relative change in Σ_i x_i. It is easy to show that C(x) = Σ_i |x_i| / |Σ_i x_i|. The condition number is 1 if the x_i all have the same sign.
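For illustration (a sketch of ours, using the expression for C(x) just derived):

% Condition number of summation: C(x) = sum(abs(x))/abs(sum(x)).
C = @(x) sum(abs(x))/abs(sum(x));
C([1 2 3 4])       % 1: all summands of one sign, perfectly conditioned
C([1e8 1 -1e8])    % 2e8 + 1: heavy cancellation, ill conditioned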
4.2. In the (i − 1)st floating point addition the 2^{k−t} portion of x_i does not propagate into the sum (assuming that the floating point arithmetic uses round to nearest with ties broken by rounding to an even last bit or rounding away from zero), thus there is an error of 2^{k−t} and ŝ_i = i. The total error is (n − 1)2^{k−t}, while the upper bound of (4.4) agrees with the actual error to within a factor 3; thus the smaller upper bound of (4.3) is also correct to within this factor. The example just quoted is, of course, a very special one, and as Wilkinson [1088, 1963, p. 20] explains, "in order to approach the upper bound as closely as this, not only must each error take its maximum value, but all the terms must be almost equal."
4.3. With S_k = Σ_{i=1}^k x_i we have Ŝ_k = (Ŝ_{k−1} + x_k)(1 + δ_k), |δ_k| ≤ u. By repeated use of this relation it follows that Ŝ_n = Σ_{i=1}^n x_i(1 + ε_i), where |ε_1| ≤ γ_{n−1} and |ε_i| ≤ γ_{n+1−i} for i ≥ 2, which yields the required expression for Ŝ_n. The bound on |E_n| is immediate.
The bound is minimized if the x_i are in increasing order of absolute value. This observation is common in the literature and it is sometimes used to conclude that the increasing ordering is the best one to use. This reasoning is fallacious, because minimizing an error bound is not the same as minimizing the error itself. As (4.3) shows, if we know nothing about the signs of the rounding errors then the "best" ordering to choose is one that minimizes the partial sums.
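A small illustration (ours) of why minimizing the partial sums beats the increasing ordering:

% (4.3): the error depends on the partial sums, not on the ordering of
% |x_i| per se. The exact sum is 1 in both cases.
s = @(x) (x(1) + x(2)) + x(3);   % recursive summation, left to right
s([1 1e16 -1e16])   % increasing |x_i|: fl(1 + 1e16) = 1e16, result 0
s([1e16 -1e16 1])   % partial sums after the first are 0 and 1: result 1, exact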
4.4. Any integer between 0 and 10 inclusive can be reproduced. For example, fl(1 + 2 + 3 + 4 + M − M) = 0, fl(M − M + 1 + 2 + 3 + 4) = 10, and fl(2 + 3 + M − M + 1 + 4) = 5.
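In IEEE double precision one convenient choice (ours; the problem leaves M unspecified) is M = 2^57:

% M = 2^57 has ulp 32, so fl(k + M) = M for any integer 0 <= k <= 10,
% while M - M = 0 is exact.
M = 2^57;
1 + 2 + 3 + 4 + M - M    % 0:  the 10 is absorbed by M
M - M + 1 + 2 + 3 + 4    % 10: the cancellation happens first
2 + 3 + M - M + 1 + 4    % 5:  only the 2 + 3 is absorbed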
Moreover, whenmuch larger for the “±” method than for the other methods we have considered.4.6. The main concern is to evaluate the denominator accurately when the xi areclose to convergence. The bound (4.3) tells us to minimize the partial sums; theseare, approximately, for xi ξ, (a) −ξ, 0, (b) 0, 0, (c) 2ξ, 0. Hence the error analysisof summation suggests that (b) is the best expression, with (a) a distant second.That (b) is the best choice is confirmed by Theorem 2.5, which shows there will beonly one rounding error when the xi are close to convergence.
A further reason to prefer (b) is that it is less prone to overflow than (a) and (c).

4.7. This is, of course, not a practical method, not least because it is very prone to overflow and underflow. However, its error analysis is interesting. Ignoring the error in the log evaluation, and assuming that exp is evaluated with relative error bounded by u, we find, with |δ_i| ≤ u for all i, that the absolute error in the computed S_n is of order nu. Hence the best relative error bound we can obtain is of order nu/|S_n|. Clearly, this method of evaluation guarantees a small absolute error, but not a small relative error when |S_n| << 1.
4.8. Method (a) is recursive summation of a, h, h, . . . , h. From (4.3) we have a bound on |a + ih − x̂_i| proportional to the sum of the partial sums |a + kh|, k ≤ i, which grows with i. For (b), using the relative error counter notation (3.9), x̂_i = a<1> + ih<3>, hence |x̂_i − x_i| ≤ γ_3(|a| + i|h|). For (c), x̂_i = a(1 − (i/n)<1>)<2> + (i/n)b<3>, hence a corresponding bound involving only γ_3.
The error bound for (b) is about a factor i smaller than that for (a). Note that method (c) is the only one guaranteed to yield x̂_n = b (assuming fl(n/n) = 1, as holds in IEEE arithmetic), which may be important when integrating a differential equation to a given end-point.
If a > 0 then the bounds imply that (b) and (c) provide high relative accuracy for all i, while the relative accuracy of (a) can be expected to degrade as i increases.
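A sketch (ours; variable names arbitrary) comparing the three methods:

% Three ways to generate grid points on [a, b] with h = (b - a)/n.
a = 0; b = 1; n = 7; h = (b - a)/n; i = (0:n)';
xa = a + [0; cumsum(h*ones(n,1))];   % (a) recursive: x_i = x_{i-1} + h
xb = a + i*h;                        % (b) x_i = a + i*h
xc = a*(1 - i/n) + (i/n)*b;          % (c) x_i = a(1 - i/n) + (i/n)b
[xa(end) xb(end) xc(end)] - b        % only (c) is guaranteed to give 0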
5.1. By differentiating the Horner recurrence q_i = xq_{i+1} + a_i, q_n = a_n, we obtain analogous recurrences for the derivatives. The factors 2, 3, . . . that arise on repeated differentiation can be removed by redefining the iterates as q_i^{(k)}/k!, so that the recurrences generate the normalized derivatives p^{(k)}(x)/k! directly.

5.2. Analysis similar to that for Horner's rule shows that
fl(p(x)) = a_0<n> + a_1x<n + 1> + · · · + a_nx^n<n + 1>.
The total number of rounding errors is the same as for Horner's algorithm, but they are distributed more equally among the terms of the polynomial. Horner's rule can be expected to be more accurate when the terms |a_i x^i| decrease rapidly with i, such as when p(x) is the truncation of a rapidly convergent power series. Of course, this algorithm requires twice as many multiplications as Horner's method.
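For solution 5.1, the simplest instance is the first derivative; here is a sketch (the function name and coefficient ordering are ours):

function [p, dp] = horner_deriv(a, x)
%HORNER_DERIV   Evaluate p and p' at x by Horner's rule, where
%   p(x) = a(1)*x^(n-1) + ... + a(n-1)*x + a(n).
p = a(1); dp = 0;
for k = 2:length(a)
    dp = x*dp + p;     % recurrence obtained by differentiating Horner
    p  = x*p + a(k);   % Horner recurrence q_i = x*q_{i+1} + a_i
end

For example, with p(x) = x^2 − 3x + 2, horner_deriv([1 -3 2], 2) returns p = 0 and dp = 1.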
5.3. Accounting for the error in forming y, we have, using the relative error counter notation (3.9), an expression for fl(p(x)) in which the relative backward perturbations are bounded by (3n/2 + 1)u instead of 2nu for Horner's rule.

5.4. Here is a MATLAB M-file to perform the task.

function [a, perm] = leja(a)
%LEJA    LEJA ordering.
%        [A, PERM] = LEJA(A) reorders the points A by the
%        Leja ordering and returns the permutation vector that
%        effects the ordering in PERM.

n = max(size(a));
perm = (1:n)';

% a(1) = max(abs(a)).
[t, i] = max(abs(a));
if i ~= 1
   a([1 i]) = a([i 1]);
   perm([1 i]) = perm([i 1]);
end

p = ones(n,1);
for k = 2:n-1

    for i = k:n
        p(i) = p(i)*(a(i)-a(k-1));
    end

    [t, i] = max(abs(p(k:n)));
    i = i+k-1;
    if i ~= k
       a([k i]) = a([i k]);
       p([k i]) = p([i k]);
       perm([k i]) = perm([i k]);
    end

end
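For example (our data), the call

[a, perm] = leja([1 -3 2 5]')

returns a = [5 -3 1 2]' and perm = [4 2 1 3]': the first point has maximal modulus, and each subsequent point maximizes the product of its distances to the points already chosen.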
5.5. It is easy to show that the computed p̂ satisfies p̂ = p(x)(1 + θ_{2n+1}), |θ_{2n+1}| ≤ γ_{2n+1}. Thus p̂ has a tiny relative error. Of course, this assumes that the roots x_i are known exactly!

6.1. Using the Cauchy–Schwarz inequality we obtain the two inequalities in question. The first inequality is an equality iff |a_{ij}| = α, and the second inequality is an equality iff A is a multiple of a matrix with orthonormal columns. If A is real and square, these requirements are equivalent to A being a scalar multiple of a Hadamard matrix. If A is complex and square, the requirements are satisfied by the given Vandermonde matrix, which is sqrt(n) times a unitary matrix.

6.2.

6.3. By the Hölder inequality,
|y*Ax| ≤ ||y||_D ||Ax|| ≤ ||y||_D ||A|| ||x||.    (A.3)
We now show that equality is possible throughout (A.3).
Let x satisfy ||A|| = ||Ax||/||x|| and let y be dual to Ax. Then
Re y*Ax = y*Ax = ||y||_D ||Ax|| = ||y||_D ||A|| ||x||,
as required.

6.4. From (6.19) we have ||M_n||_p ≤ µ_n. But by taking x in the definition (6.11) to be the vector of all ones, we see that ||M_n||_p ≥ µ_n.
6.5. If A = PDQ* is an SVD then
||AB||_F = ||PDQ*B||_F = ||DQ*B||_F ≤ ||D||_2 ||Q*B||_F = ||A||_2 ||B||_F.
Similarly, ||BC||_F ≤ ||B||_F ||C||_2, and these two inequalities together imply the required one.

6.6. By (6.6) and (6.8) it suffices to show that ||A^{-1}||_{β,α} = (min_{x≠0} ||Ax||_β/||x||_α)^{-1}. We have
||A^{-1}||_{β,α} = max_{y≠0} ||A^{-1}y||_α/||y||_β = max_{x≠0} ||x||_α/||Ax||_β,
and the result follows on taking reciprocals.

6.7. Let λ be an eigenvalue of A and x the corresponding eigenvector, and form the matrix X = [x, x, . . . , x]. Then AX = λX, so |λ| ||X|| = ||AX|| ≤ ||A|| ||X||, showing that |λ| ≤ ||A||. For a subordinate norm it suffices to take norms in the equation Ax = λx.
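A quick numerical check (the matrix is ours):

% rho(A) <= ||A|| for any consistent norm; compare a few norms numerically.
A = [2 -1 0; 3 4 1; -2 0 5];
rho = max(abs(eig(A)));
[rho, norm(A,1), norm(A,inf), norm(A,'fro')]  % rho does not exceed any norm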
6.8. The following proof is much simpler than the usual proof based on diagonal scaling to make the off-diagonal of the Jordan matrix small (see, e.g., Horn and Johnson [580, 1985, Lem. 5.6.10]). The proof is from Ostrowski [812, 1973, Thm. 19.3].
Let δ^{-1}A have the Jordan canonical form δ^{-1}A = XJX^{-1}. We can write J = δ^{-1}D + N, where D = diag(λ_i), the λ_i are the eigenvalues of A, and N is the off-diagonal part of J. Then A = X(D + δN)X^{-1}, so the norm defined by ||B|| := ||X^{-1}BX||_∞ satisfies ||A|| = ||D + δN||_∞ ≤ ρ(A) + δ.
Note that we actually have ||A|| = ρ(A) + δ if the largest eigenvalue occurs in a Jordan block of size greater than 1.