Arndt - Algorithms for Programmers (523138), страница 49
Текст из файла (страница 49)
Find a starting approximation x0 ≈ √1b then iteratexk+1=xk + xk√1dthen a final multiply with d gives(1 − d x2k )2(13.27)until the desired precision is reached. Convergence is again 2nd order: if xk =xk+1=1√bµ311 − e2 − e3221√(1b+ e) then¶(13.28)4 The asymptotics of the multiplication is set to ∼ N (instead of N log(N )) for the estimates made here, this gives arealistic picture for large N .5 using a second order iterationCHAPTER 13. ARITHMETICAL ALGORITHMS285Similar considerations as above (with squaring consideredas expensive as multiplication6 ) give an oper√1ation count of 4 multiplications for √d or 5 for d.Note that this algorithm is considerably better than the one where xk+1 := 21 (xk + xdk ) is used as iteration,because no long divisions are involved.In hfloat, when the achieved precision is below a certain limit a third order correction is used to assuremaximum precision at the last step:xk+1=xk + xk(1 − d x2k )3 (1 − d x2k )2+ xk28(13.29)An improved versionActually, the ‘simple’ version of √the square root iteration can be√used for practical purposes when rewrittenas a coupled iteration for both d and its inverse.
Using for d the iterationxk+1(x2k − d)2 xk(x2 − d)xk − vk+1 k2=xk −=(13.30)where v ≈ 1/x(13.31)√and for the auxiliary v ≈ 1/ d the iterationvk+1 = vk + vk (1 − xk vk )(13.32)where one starts with approximations√≈d≈ 1/x0x0v0(13.33)(13.34)and the v-iteration step precedes that for x. When carefully implemented this method turns out to besignificantly more efficient than the preceding version. [hfloat: src/hf/itsqrt.cc]TBD: details & analysis13.3.3TBD: last step versions for sqrt and invCube root extractionUse d1/3 = d (d2 )−1/3 , i.e. compute the inverse third root of d2 using the iterationxk+1=xk + xk(1 − d2 x3k )3(13.35)finally multiply with d.Convergence is 2nd order: if xk =1√3 (1bxk+113.4+ e) then=µ¶411 − 2e2 − e3 − e433(13.36)Square root extraction for rationalsFor rational x =pqthe well known iteration for the square root isΦ2 (x)6 Indeed1√3bit costs about23of a multiplication.=p2 + d q 2x2 + d=2x2pq(13.37)CHAPTER 13.
ARITHMETICAL ALGORITHMS286A general formula for an k-th order (k ≥ 2) iteration towardΦk (x)=√√³√ ´k√ ´k ³x+ d + x− d√d³=d´³´√ k√ kx+ d − x− dd is√ ´k√ ´k ³p+q d + p−q d³√ ´k ³√ ´kp+q d − p−q d³(13.38)Obviously, we have:Φm (Φn (x)) =All√Φmn (x)(13.39)d vanish when expanded, e.g. the third and fifth order versions areΦ3 (x)Φ5 (x)x2 + 3dp p2 + 3d q 2=3x2 + dq 3p2 + d q 242x + 10dx + 5d2= x 45x + 10dx2 + d2= x(13.40)(13.41)There is a nice expression for the error behavior of the k-th order iteration:√ 1+eΦk ( d ·)1−e=√d·1 + ek1 − ekAn equivalent form of 13.38 comes from the theory of continued fractions:µ¶√xΦk (x) =d cot k arccot √d(13.42)(13.43)√The iterations can also be obtained using Padé-approximants.
Let P[i,j] (z) be the Padé-expansion of zaround z = 1 of order [i, j]. An iteration of order i + j + 1 is given by x P[i,j] ( xd2 ). For i = j one getsthe iterations of odd orders, for i = j + 1 the even orders are obtained. Different combinations of i andj result in alternative iterations:[i, j] 7→x P[i,j] (d)x2(13.44a)2[1, 0] 7→[0, 1] 7→[1, 1] 7→[2, 0] 7→[0, 2] 7→x +d2x2x33x2 − dx2 + 3dx 23x + d3x4 + 6dx2 − 3d28x38x515x4 − 10dx2 + 3d2(13.44b)(13.44c)(13.44d)(13.44e)(13.44f)CHAPTER 13. ARITHMETICAL ALGORITHMSStill other forms are obtained by usingdx2872P[i,j] ( xd ):[i, j] 7→[1, 0] 7→[0, 1] 7→[1, 1] 7→[2, 0] 7→[0, 2] 7→dx2P[i,j] ( )xdx2 + d2x2d23dx − x3d (d + 3x3 )x (3d + x2 )−x4 + 6dx2 + 3d28xd8d343x − 10dx2 + 15d2(13.45a)(13.45b)(13.45c)(13.45d)(13.45e)(13.45f)√Using the expansion of 1/ x and x P[i,j] (x2 d) we get:[i, j]7→x P[i,j] (x2 d)(13.46a)2[1, 0]7→[0, 1]7→[1, 1]7→[2, 0]7→[0, 2]7→x (3 − d x )22xdx2 − 1dx2 + 3x3dx2 + 1x (3d2 x4 − 10dx + 15)88x−d2 x4 + 6dx2 + 3(13.46b)(13.46c)(13.46d)(13.46e)(13.46f)Extraction of higher roots for rationals√The Padé idea can be adapted for higher√roots: use the expansion of a z around z = 1 then x P[i,j] ( xda )produces an order i + j + 1 iteration for a z.
A second order iteration is given byµ¶d − xa(a − 1) xa + d1dΦ2 (x) = x +==(a − 1) x + a−1(13.47)a xa−1a xa−1ax√A third order iteration for a d isΦ3 (x) =x·p α pa + β q a dα xa + β d=·β xa + α dq β pa + α q a d(13.48)where α = a − 1, β = a + 1 for a even, α = (a − 1)/2, β = (a + 1)/2 for a odd.√With 1/ a x and x P[i,j] (xa d) division-free iterations for the inverse a-th root of d are obtained, seesection 13.5.
If you suspect a general principle behind the Padé idea, yes there is one: read on untilsection 13.8.4.13.5A general procedure for the inverse n-th rootThere is a nice general formula that allows to build iterations with arbitrary order of convergence ford−1/a that involve no long division.CHAPTER 13. ARITHMETICAL ALGORITHMS288One uses the identityd−1/a==−1/ax (1 − (1 − xa d))x (1 − y)−1/a wherey := (1 − xa d)(13.49)(13.50)Taylor expansion givesd−1/a=x∞X(1/a)k̄ y k(13.51)k=0where z k̄ := z(z + 1)(z + 2) .
. . (z + k − 1). Written out:µy (1 + a) y 2(1 + a)(1 + 2a) y 3−1/ad++= x 1+ +a2 a26 a3!Qn−1(1+ka)(1 + a)(1 + 2a)(1 + 3a) y 4n++ · · · + k=1 ny + ...24 a4n! a(13.52)A n-th order iteration for d−1/a is obtained by truncating the above series after the (n − 1)-th term,Φn (a, x):=xn−1X(1/a)k̄ y k(13.53)k=0xk+1=Φn (a, xk )(13.54)Convergence is n-th order:Φn (d−1/a (1 + e)) = d−1/a (1 + O(en ))(13.55)Second order is:Φ2 (a, x) :=Convergence: if x =1√a (1bx+x(1 − dxa )a(13.56)+ e) thenΦ2 (a, x) ==hi´1 ³√(1 + e) (1 + e)a − (a + 1)abµ¶1a+1 23√1−e−O(e)a2b(13.57)(13.58)Example 1: a = 1 (computation of the inverse of d):1=dΦ(1, x) =11−y¡¢x 1 + y + y2 + y3 + y4 + .
. .x(13.59)(13.60)Φ2 (1, x) = x (1 + y) is the iteration 13.14 on page 283.Convergence:1Φk (1, (1 + e)) =d¢1¡1 − (−e)kd(13.61)Composition:Φn m= Φn (Φm )(13.62)CHAPTER 13. ARITHMETICAL ALGORITHMS289There are simple closed forms for this iteration1 − yk1 − yk=xd1−y= x (1 + y) (1 + y 2 ) (1 + y 4 ) (1 + y 8 ) . . .Φk=Φk(13.63)(13.64)Example 2: a = 2 (computation of the inverse square root of d):1√d1= x√1−yÃ!¡2k¢ ky 3 y25 y335 y 4k y= x 1+ ++++ ··· ++ ...28161284k(13.65)(13.66)Φ2 (2, x) = x (1 + y/2) is the iteration 13.27 on page 284.An expression for the error behavior of the n-th order iteration similar to formula 13.42 is¢Pn ¡2n−1¢P2n+1 ¡(−e)k − k=n+1 2n−1(−e)kk=0kk−1/a 1 + e−1/a) = dΦn (d1−e(1 − e)2n−1Pn ¡2n−1¢k1+ck=0 n−k (−e)n¢=where c = e Pn ¡2n−11−c(−e)kk=0k(13.67)(13.68)e.g.
for k = 4 the fraction on the right hand side isΦ4 (d−1/a1+e)/d−1/a1−e1 − 7 e + 21 e2 − 35 e3 − 35 e4 + 21 e5 − 7 e6 + e71 − 7 e + 21 e2 − 35 e3 + 35 e4 − 21 e5 + 7 e6 − e73e − 7 e2 + 21 e − 35e41 − 7 e + 21 e2 − 35 e31 − 70e4 − 448e5 − 1680e6 − 4800e7 − . . .=: F =(13.69)c=(13.70)F=(13.71)The coefficients of the Taylor expansion are always integers.
There is always a partial fraction decomposition like7016814040F = 1−(13.72)4 −5 −6 −7(e − 1)(e − 1)(e − 1)(e − 1)Composition is not as trivial as for the inverse, e.g.:Φ4 − Φ2 (Φ2 )1x (y)416(13.73)x P (y) y n m(13.74)= −In general, one hasΦn m − Φn (Φm ) =2where P is a polynomial in y = 1 − d x . Also, in general Φn (Φm ) 6= Φm (Φn ) for n 6= m, e.g.:1515x (x2 d) y 6 =x (1 − y) y 610241024√of the second-order iteration for 1/ d:µ¶1= x 1+ ywhere y = 1 − d x22¶µ¶µ111 + y 2 (3 + y)= x 1+ y28µ¶1= Φ2 (x) 1 + y 2 (3 + y)8¶µ¡¢1 4= Φ2 (Φ2 (x)) 1 +y (3 + y)2 12 + y 2 (3 + y)512Φ3 (Φ2 ) − Φ2 (Φ3 )Product forms for compositionsΦ2 (x)Φ2 (Φ2 (x))Φ2 (Φ2 (Φ2 (x)))=(13.75)(13.76)(13.77)(13.78)(13.79)CHAPTER 13.
ARITHMETICAL ALGORITHMS13.6290Re-orthogonalization of matricesA task from graphics applications: a rotation matrix A that deviates from being orthogonal7 shall betransformed to the closest orthogonal matrix E. It is well known that (see e.g. [59])E1A (AT A)− 2=(13.80)With the division-free iteration for the inverse square rootµ¶13522 22 3Φ(x) = x 1 + (1 − dx ) + (1 − dx ) +(1 − dx ) + . . .2816(13.81)at hand the given task is pretty easy: As AT A is close to unity (the identity matrix) we can use the(second order) iteration with d = AT A and x = 1µ¶1 − AT AT− 12(A A)≈1+(13.82)2and multiply by A to get a ‘closer-to-orthogonal’ matrix A+ :¶µ1 − AT AA+ = A 1 +2≈E(13.83)The step can be repeated with A+ (or higher orders can be used) if necessary. Note the identical equationwould be obtained when trying to compute the inverse square root of 1:µ¶1 − x2x+ = x 1 +→1(13.84)2It is instructive to write things down in the SVD8 -representationA=U ΩV T(13.85)where U and V are orthogonal and Ω is a diagonal matrix with non-negative entries (cf.
[61]). The SVDis the unique decomposition of the action of the matrix as: rotation – element wise stretching – rotation.NowAT A=¡¢ ¡¢V ΩU TU ΩV T = V Ω2 V Tand positive exponents of A are ‘absorbed’ as powers of Ω¡¢nU ΩV T= U Ωn V T(13.86)(13.87)while for negative exponents¡¢−nU ΩV T=V Ω−n U T(13.88)Thereby1(AT A)− 2³=(V ΩU T ) (U ΩV T )´− 12¡¢− 1= V Ω2 V T 2 = V Ω−1 V T(13.89)and we have1A (AT A)− 27 typically8 singular=¡U ΩV T¢ ¡¢V Ω−1 V T = U V Tdue to cumulative errors from multiplications with many incremental rotationsvalue decomposition(13.90)CHAPTER 13. ARITHMETICAL ALGORITHMS291that is, the ‘stretching part’ was removed.Observe thatE=¶ µ¶µ1 − AT+ A+1 − AT AA· 1+· 1+· .
. . =: A P22(13.91)i.e. P is the accumulated product of the expressions in parenthesis of the right hand side of the iteration 13.83. One has P = V Ω−1 V T and A = P −1 E. This resembles The so called polar decomposition³´³´A = W H = A(AT A)−1/2(AT A)1/2(13.92)where H is the (unique, positive semidefinite) square root H = (AT A)1/2 and W = A(AT A)−1/2 , Wis orthogonal and H = H T (cf. [62]). One has W = E and H = W −1 A = W T A. Compute the polardecomposition asP0=Pk+1=Ek+1=1E0 = A¶µ1 − EkT EkPk 1 +2Ek Pk+1 → E = W(13.93a)→P =H(13.93b)(13.93c)higher orders can be added in the computation of Pk+1 . The polar decomposition can be seen as ananalogue to z = r ei φ for z ∈ C, identify H ∼ r, W ∼ ei φ .Similarly, [62] defines a sign decompositionA=³´³´= A(A2 )−1/2(A2 )1/2SN(13.94)where N = (A2 )−1/2 and S = A (A2 )−1/2 = A N (A, S and N commute pairwise).