Arndt - Algorithms for Programmers (523138), страница 5
Текст из файла (страница 5)
As in the procedure 1.4 the inner loops where swapped to save trigonometriccomputations.Extracting the ldm==1 stage of the outermost loop is again a good idea:Replace the lineforldm:=ldn to 1 step -1forldm:=ldn to 2 step -1byand insertCHAPTER 1. THE FOURIER TRANSFORM19for r:=0 to n-1 step 2{{a[r], a[r+1]} := {a[r]+a[r+1], a[r]-a[r+1]}}before the call of revbin_permute(a[], n).TBD: extraction of the j=0 case1.5Saving trigonometric computationsThe trigonometric (sin()- and cos()-) computations are an expensive part of any FFT. There are twoapparent ways for saving the involved CPU cycles, the use of lookup-tables and recursive methods.1.5.1Using lookup tablesThe idea is to save all necessary sin/cos-values in an array and later looking up the values needed.
This isa good idea if one wants to compute many FFTs of the same (small) length. For FFTs of large sequencesone gets large lookup tables that can introduce a high cache-miss rate. Thereby one is likely experiencinglittle or no speed gain, even a notable slowdown is possible. However, for a length-n FFT one does notneed to store all the (n complex or 2 n real) sin/cos-values exp(2 π i k/n), k = 0, 1, 2, 3, .
. . , n −1. Alreadya table cos(2 π i k/n), k = 0, 1, 2, 3, . . . , n/4 − 1 (of n/4 reals) contains all different trig-values that occurin the computation. The size of the trig-table is thereby cut by a factor of 8. For the lookups one canuse the symmetry relationscos(π + x)sin(π + x)= − cos(x)= − sin(x)(1.52)(1.53)(reducing the interval from 0 .
. . 2π to 0 . . . π),cos(π/2 + x)sin(π/2 + x)= − sin(x)= + cos(x)(1.54)(1.55)(reducing the interval to 0 . . . π/2) andsin(x) =cos(π/2 − x)(1.56)(only cos()-table needed).1.5.2Recursive generation of the sin/cos-valuesIn the computation of FFTs one typically needs the values{exp(i ω 0) = 1,exp(i ω δ),exp(i ω 2 δ),exp(i ω 3 δ),...}in sequence. The naive idea for a recursive computation of these values is to precompute d = exp(i ω δ)and then compute the next following value using the identity exp(i ω k δ)) = d · exp(i ω (k − 1) δ). Thismethod, however, is of no practical value because the numerical error grows (exponentially) in the process.Here is a stable version of a trigonometric recursion for the computation of the sequence: Precomputec = cos ω,(1.57)s=(1.58)α=βsin ω,1 − cos δδ= 2 (sin )22= sin δcancellation!ok.(1.59)(1.60)(1.61)CHAPTER 1.
THE FOURIER TRANSFORM20Then compute the next power from the previous as:cnextsnext= c − (α c + β s);= s − (α s − β c);(1.62)(1.63)(The underlying idea is to use (with e(x) := exp(2 π i x)) the ansatz e(ω + δ) = e(ω) − e(ω) · z which leadsto z = 1 − cos δ − i sin δ = 2 (sin 2δ )2 − i sin δ.)Do not expect to get all the precision you would get with the repeated call of the sin and cos functions,but even for very long FFTs less than 3 bits of precision are lost. When (in C) working with doublesit might be a good idea to use the type long double with the trig recursion: the sin and cos will thenalways be accurate within the double-precision.A real-world example from [FXT: fht dif core in fht/fhtdif.cc], the recursion is used if TRIG_REC is#defined:[...]double tt = M_PI_4/kh;#if defined TRIG_RECdouble s1 = 0.0, c1 = 1.0;double al = sin(0.5*tt);al *= (2.0*al);double be = sin(tt);#endif // TRIG_RECfor (ulong i=1; i<kh; i++){#if defined TRIG_RECc1 -= (al*(tt=c1)+be*s1);s1 -= (al*s1-be*tt);#elsedouble s1, c1;SinCos(tt*i, &s1, &c1);#endif // TRIG_REC[...]1.5.3Using higher radix algorithmsIt may be less apparent, that the use of higher radix FFT algorithms also saves trig-computations.
Theradix-4 FFT algorithms presented in the next sections replace all multiplications with complex factors(0, ±i) by the obvioussimpler operations. Radix-8 algorithms also simplify the special cases where sin(φ)por cos(φ) are ± 1/2. Apart from the trig-savings higher radix also brings a performance gain by theirmore unrolled structure. (Less bookkeeping overhead, less loads/stores.)1.61.6.1Higher radix DIT and DIF algorithmsMore notationAgain some useful notation, again let a be a length-n sequence.Let a(r%m) denote the subsequence of those elements of a that have subscripts x ≡ r (mod m); e.g. a(0%2)is a(even) , a(3%4) = {a3 , a7 , a11 , a15 , .
. . }. The length of a(r%m) is4 n/m.Let a(r/m) denote the subsequence of those elements of a that have indicesis a(right) , a(2/3) is the last third of a. The length of a(r/m) is also n/m.4 Throughoutthis book will m divide n, so the statement is correct.rnmn. . . (r+1)− 1; e.g. a(1/2)mCHAPTER 1.
THE FOURIER TRANSFORM1.6.221Decimation in timeFirst reformulate the radix 2 DIT step (formulas 1.43 and 1.44) in the new notation:hihin/2(0/2)F [a]= S 0/2 F a(0%2) + S 1/2 F a(1%2)hihin/2(1/2)F [a]= S 0/2 F a(0%2) − S 1/2 F a(1%2)(1.64)(1.65)(Note that S 0 is the identity operator).The radix 4 step, whose derivation is analogous to the radix 2 step, it just involves more writing anddoes not give additional insights, isIdea 1.3 (radix 4 DIT step) Radix 4 decimation in time step for the FFT:hihihihin/4(0/4)F [a]= +S 0/4 F a(0%4) + S 1/4 F a(1%4) + S 2/4 F a(2%4) + S 3/4 F a(3%4)hihihihin/4(1/4)F [a]= +S 0/4 F a(0%4) + iσS 1/4 F a(1%4) − S 2/4 F a(2%4) − iσS 3/4 F a(3%4)hihihihin/4(2/4)F [a]= +S 0/4 F a(0%4) − S 1/4 F a(1%4) + S 2/4 F a(2%4) − S 3/4 F a(3%4)hihihihin/4(3/4)F [a]= +S 0/4 F a(0%4) − iσS 1/4 F a(1%4) − S 2/4 F a(2%4) + iσS 3/4 F a(3%4)(1.66)(1.67)(1.68)(1.69)where σ = ±1 is the sign in the exponent.
In contrast to the radix 2 step, that happens to be identicalfor forward and backward transform (with both decimation frequency/time) the sign of the transformappears here.Or, more compactly:F [a]n/4(j/4)=hihi+eσ 2 i π 0 j/4 · S 0/4 F a(0%4) + eσ 2 i π 1 j/4 · S 1/4 F a(1%4)hihi+eσ 2 i π 2 j/4 · S 2/4 F a(2%4) + eσ 2 i π 3 j/4 · S 3/4 F a(3%4)(1.70)where j = 0, 1, 2, 3 and n is a multiple of 4.Still more compactly:(j/4)F [a]n/4=3Xhieσ2 i π k j/4 · S σk/4 F a(k%4)j = 0, 1, 2, 3(1.71)k=0where the summation symbol denotes element-wise summation of the sequences. (The dot indicatesmultiplication of every element of the rhs.
sequence by the lhs. exponential.)The general radix r DIT step, applicable when n is a multiple of r, is:Idea 1.4 (FFT general DIT step) General decimation in time step for the FFT:F [a](j/r)n/r=r−1Xiheσ 2 i π k j/r · S σ k/r F a(k%r)j = 0, 1, 2, .
. . , r − 1(1.72)k=01.6.3Decimation in frequencyThe radix 2 DIF step (formulas 1.50 and 1.51) was´i³hn/2(0%2)F [a]= F S 0/2 a(0/2) + a(1/2)´i³hn/2(1%2)F [a]= F S 1/2 a(0/2) − a(1/2)(1.73)(1.74)CHAPTER 1. THE FOURIER TRANSFORM22The radix 4 DIF step, applicable for n divisible by 4, isIdea 1.5 (radix 4 DIF step) Radix 4 decimation in frequency step for the FFT:(0%4)n/4(1%4)n/4(2%4)n/4(3%4)n/4(j%4)n/4F [a]F [a]F [a]F [a]====h³´iF S 0/4 a(0/4) +a(1/4) + a(2/4) +a(3/4)h³´iF S 1/4 a(0/4) + i σ a(1/4) − a(2/4) − i σ a(3/4)h³´iF S 2/4 a(0/4) −a(1/4) + a(2/4) −a(3/4)h³´iF S 3/4 a(0/4) − i σ a(1/4) − a(2/4) + i σ a(3/4)(1.75)(1.76)(1.77)(1.78)Or, more compactly:"F [a]=F Sσ j/43X#eσ 2 i π k j/4(k/4)·aj = 0, 1, 2, 3(1.79)k=0the sign of the exponent and in the shift operator is the same as in the transform.The general radix r DIF step isIdea 1.6 (FFT general DIF step) General decimation in frequency step for the FFT:#"r−1Xn/r(j%r)σ j/rσ 2 i π k j/r(k/r)F [a]= F Se·aj = 0, 1, 2, .
. . , r − 1(1.80)k=01.6.4Implementation of radix r = px DIF/DIT FFTsIf r = p 6= 2 (p prime) then the revbin_permute() function has to be replaced by its radix-p version:radix_permute(). The reordering now swaps elements x with x̃ where x̃ is obtained from x by reversingits radix-p expansion (see section 7.2).Code 1.7 (radix px DIT FFT) Pseudo code for a radix r:=px decimation in time FFT:procedure fftdit_r(a[], n, is)// complex a[0..n-1] input, result// p (hardcoded)// r == power of p (hardcoded)// n == power of p (not necessarily a power of r){radix_permute(a[], n, p)lx := log(r) / log(p) // r == p ** lxln := log(n) / log(p)ldm := (log(n)/log(p)) % lxif ( ldm != 0 ) // n is not a power of p{xx := p**lxfor z:=0 to n-1 step xx{fft_dit_xx(a[z..z+xx-1], is) // inlined length-xx dit fft}}for ldm:=ldm+lx to ln step lx{m := p**ldmmr := m/rfor j := 0 to mr-1{CHAPTER 1.
THE FOURIER TRANSFORM}}}23e := exp(is*2*PI*I*j/m)for k:=0 to n-1 step m{// all code in this block should be// inlined, unrolled and fused:// temporary u[0..r-1]for z:=0 to r-1{u[z] := a[k+j+mr*z]}radix_permute(u[], r, p)for z:=1 to r-1 // e**0 = 1{u[z] := u[z] * e**z}r_point_fft(u[], is)for z:=0 to r-1{a[k+j+mr*z] := u[z]}}Of course the loops that use the variable z have to be unrolled, the (length-px ) scratch space u[] has tobe replaced by explicit variables (e.g. u0, u1, ...
) and the r_point_fft(u[],is) shall be an inlinedpx -point FFT.With r = px there is a pitfall: if one uses the radix_permute() procedure instead of a radix-pxrevbin permute procedure (e.g. radix-2 revbin permute for a radix-4 FFT), some additional reordering isnecessary in the innermost loop: in the above pseudo code this is indicated by the radix_permute(u[],p)just before the p_point_fft(u[],is) line. One would not really use a call to a procedure, but changeindices in the loops where the a[z] are read/written for the DIT/DIF respectively.
In the code belowthe respective lines have the comment // (!).It is wise to extract the stage of the main loop where the exp()-function always has the value 1, which isthe case when ldm==1 in the outermost loop5 . In order not to restrict the possible array sizes to powersof px but only to powers of p one will supply adapted versions of the ldm==1 -loop: e.g. for a radix-4 DIFFFT append a radix 2 step after the main loop if the array size is not a power of 4.Code 1.8 (radix 4 DIT FFT) C++ code for a radix 4 DIF FFT on the array f[], the data length nmust be a power of 2, is must be +1 or -1:static const ulong RX = 4; // == rstatic const ulong LX = 2; // == log(r)/log(p) == log_2(r)voiddit4l_fft(Complex *f, ulong ldn, int is)// decimation in time radix 4 fft// ldn == log_2(n){double s2pi = ( is>0 ? 2.0*M_PI : -2.0*M_PI );const ulong n = (1<<ldn);revbin_permute(f, n);ulong ldm = (ldn&1); // == (log(n)/log(p)) % LXif ( ldm!=0 ) // n is not a power of 4, need a radix 2 step{for (ulong r=0; r<n; r+=2){Complex a0 = f[r];Complex a1 = f[r+1];5 cf.section 4.3.CHAPTER 1.