Arndt - Algorithms for Programmers (523138), страница 22
Текст из файла (страница 22)
Multiply by nr to get the inverse:i · nri · nri · nr≡≡≡i0 · nc · nr0i · (n − 1 + 1)i0(7.8)(7.9)(7.10)That is, element i will be moved to i0 = i · nr mod (n − 1).[FXT: transpose in aux2/transpose.h][FXT: transpose ba in aux2/transpose ba.h]Note that one should take care of possible overflows in the calculation i · nc .For the case that n is a power of two (and so are both nr and nc ) the multiplications modulo n − 1 arecyclic shifts.
Thus any overflow can be avoided and the computation is also significantly cheaper.[FXT: transpose2 ba in aux2/transpose2 ba.h]3 Asthe last element of the matrix is a fixed point the transposition moves around only the n − 1 elements 0 . . .
n − 2CHAPTER 7. PERMUTATIONS7.4128Revbin permutation vs. transpositionHow would you rotate an (length-n) array by s positions (left or right), without using any scratch space.If you do not know the solution then try to find it before reading on.Rotation by triple reversionThe nice little trick is to use reverse three times as in the following:template <typename Type>void rotate_left(Type *f, ulong n, ulong s)// rotate towards element #0// shift is taken modulo n{if ( s==0 ) return;if ( s>=n ){if (n<2) return;s %= n;}reverse(f,s);reverse(f+s, n-s);reverse(f,n);}Likewise for the other direction:template <typename Type>void rotate_right(Type *f, ulong n, ulong s)// rotate away from element #0// shift is taken modulo n{if ( s==0 ) return;if ( s>=n ){if (n<2) return;s %= n;}reverse(f,n-s);reverse(f+n-s, s);reverse(f,n);}[FXT: rotate left and rotate right in perm/rotate.h]What this has to do with our subject? When transposing an nr × nc matrix whose size is a power of two(thereby both nr and nc are also powers of two) the above mentioned rotation is done with the indices(written in base two) of the elements.
We know how to do a permutation that reverses the completeindices and reversing a few bits at the least significant end is not any harder:template <typename Type>void revbin_permute_rows(Type *f, ulong ldn, ulong ldnc)// revbin_permute the length 2**ldnc rows of f[0..2**ldn-1]// (f[] considered as an 2**(ldn-ldnc) x 2**ldnc matrix){ulong n = 1UL<<ldn;ulong nc = 1UL<<ldnc;for (ulong k=0; k<n; k+=nc) revbin_permute(f+k, nc);}And there we go:template <typename Type>void transpose_by_rbp(Type *f, ulong ldn, ulong ldnc)// transpose f[] considered as an 2**(ldn-ldnc) x 2**ldnc matrix{revbin_permute_rows(f, ldn, ldnc);CHAPTER 7.
PERMUTATIONS}129ulong n = 1UL<<ldn;revbin_permute(f, n);revbin_permute_rows(f, ldn, ldn-ldnc);// ... that is, columnsTBD: revbin-permute by transpositionTBD: 2dim generalization: triple shearing7.50:1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:The zip permutation[ *[*[*[*[*[*[*[*[*[*[*[*[*[*[*[*]]]]]]]]]]]]]]]]0:1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:[ *[*[*[*[*[*[*[*[*[*[*[*[*[*[*[*]]]]]]]]]]]]]]]]Figure 7.3: Permutation matrices of the zip-permutation (left) and its inverse, the unzip-permutation(right). The zip-permutation moves the lower half of the array to the even indices, the upper half to theodd indices.An important special case of the ‘transposition by revbin permutation’ istemplate <typename Type>void zip(Type *f, ulong n)//// lower half --> even indices// higher half --> odd indices//// same as transposing the array as 2 x n/2 - matrix//// useful to combine real/imag part into a Complex array//// n must be a power of two{ulong nh = n/2;revbin_permute(f, nh); revbin_permute(f+nh, nh);revbin_permute(f, n);}[FXT: zip in perm/zip.h]The effect of the code is the same as that oftemplate <typename Type>void zip(const Type * restrict f, Type * restrict g, ulong n)// n must be even{ulong nh = n/2;for (ulong k=0, k2=0; k<nh; ++k, k2+=2) g[k2] = f[k];for (ulong k=nh, k2=1; k<n;++k, k2+=2) g[k2] = f[k];}CHAPTER 7.
PERMUTATIONS130Here the array length does not need to be a power of two.For the type double the ‘zip by revbin-permute’ technique can be optimized slightly.void zip(double *f, long n){revbin_permute(f, n);revbin_permute((Complex *)f, n/2);}Here it is assumed that the type Complex consists of two doubles lying contiguous in memory.The inverse of zip is unzip:template <typename Type>void unzip(Type *f, ulong n)//// inverse of zip():// put part of data with even indices// sorted into the lower half,// odd part into the higher half//// same as transposing the array as n/2 x 2 - matrix//// useful to separate a Complex array into real/imag part//// n must be a power of two{ulong nh = n/2;revbin_permute(f, n);revbin_permute(f, nh); revbin_permute(f+nh, nh);}[FXT: unzip in perm/zip.h] which can for the type double again be optimized asvoid unzip(double *f, long n){revbin_permute((Complex *)f, n/2);revbin_permute(f, n);}[FXT: unzip in perm/zip.cc] TBD: zip for length not a power of two0:1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:[ *[*[*[*[*[*[*[*[*[*[*[*[*[*[*[*]]]]]]]]]]]]]]]]0:1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:[ *[*[*[*[*[*[*[*[*[*[*[*[*[*[*[*]]]]]]]]]]]]]]]]Figure 7.4: Revbin-permutation matrices that, when multiplied together, give the zip-permutation andits inverse.
Let L,R be the permutations given on the left,right side, respectively. Then Z = R L andZ −1 = L R.While the above mentioned technique is usually not a gain for doing a transposition it may be usedto speed up the revbin_permute itself. Let us operator-ize the idea to see how. Let R be theCHAPTER 7. PERMUTATIONS131revbin-permutation revbin_permute, T (nr , nc ) the transposition of the nr × nc matrix and R(nc ) therevbin_permute_rows. ThenT (nr , nc )= R(nr ) · R · R(nc )(7.11)The R-operators are their own inverses while T is in general not self inverse4 .R=R(nr ) · T (nr , nc ) · R(nc )(7.12)There is a degree of freedom in this formula: for fixed n = nr × nc one can choose one of nr and nc (onlytheir product is given).7.60:1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:The reversed zip-permutation[ *][* ][*][*][*][*][*][*][*][*][*][*][*][*][*][*]0:1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:[ *][*][*][*][*][*][*][*][* ][*][*][*][*][*][*][*]Figure 7.5: Permutation matrices of the reversed zip-permutation (left) and its inverse, the reversedunzip-permutation (right).A permutation closely related to the zip permutation istemplate <typename Type>void zip_rev(const Type * restrict x, Type * restrict y, ulong n)// n must be even{const ulong nh = n/2;for (ulong k=0, k2=0;k<nh; k++, k2+=2) y[k2] = x[k];for (ulong k=nh, k2=n-1; k<n;k++, k2-=2) y[k2] = x[k];}[FXT: zip rev in perm/ziprev.h] The in-place version can (if the array length is a power of two) beimplemented astemplate <typename Type>void zip_rev(Type *x, ulong n)// n must be a power of two{const ulong nh = n/2;reverse(x+nh, nh);revbin_permute(x, nh); revbin_permute(x+nh, nh);revbin_permute(x, n);}The inverse permutation, called unzip_rev, [FXT: unzip rev in perm/ziprev.h] can be implemented as4 Fornr = nc it of course is.CHAPTER 7.
PERMUTATIONS132template <typename Type>void unzip_rev(const Type * restrict x, Type * restrict y, ulong n)// n must be even{const ulong nh = n/2;for (ulong k=0, k2=0;k<nh; k++, k2+=2) y[k] = x[k2];for (ulong k=nh, k2=n-1; k<n;k++, k2-=2) y[k] = x[k2];}The in-place version istemplate <typename Type>void unzip_rev(Type *x, ulong n)// n must be a power of two{const ulong nh = n/2;revbin_permute(x, n);revbin_permute(x, nh); revbin_permute(x+nh, nh);reverse(x+nh, nh);}The given permutation is used in an algorithm where the cosine transform is computed using the Hartleytransform (see section 3.6, page 64).7.7The XOR permutation0:1:2:3:4:5:6:7:[ *[*[*[*[*[*[*[*x = 0]]]]]]]][*][ *][*][*][*][*][* ][*]x = 1[*][*][ *][*][*][* ][*][*]x = 2[*][*][*][ *][* ][*][*][*]x = 30:1:2:3:4:5:6:7:[*][*][*][* ][ *][*][*][*]x = 4[*][*][* ][*][*][ *][*][*]x = 5[*][* ][*][*][*][*][ *][*]x = 6[* ][*][*][*][*][*][*][ *]x = 7Figure 7.6: Permutation matrices of the XOR permutation for length 8 with parameter x = 0 .
. . 7.Compare to the table for the dyadic convolution given on page 86.The XOR permutation may be explained most simply by its trivial implementation: [FXT: xor permutein perm/xorpermute.h]:template <typename Type>void xor_permute(Type *f, ulong n, ulong x){if ( 0==x ) return;for (ulong k=0; k<n; ++k){ulong r = k^x;if ( r>k ) swap(f[r], f[k]);}}CHAPTER 7. PERMUTATIONS133The XOR permutation is evidently self-inverse. n must be divisible by the smallest power of two that isgreater than x: for example, n must be even if x = 1, n must be divisible by four if x = 2 of x = 3.
Withn a power of two and x < n one is on the safe side.The XOR permutation contains a few other permutations as important special cases (for simplicity assumethat the array length n is a power of two): when the third argument x equals n − 1 then the permutationis the reversion, with x = 1 neighboring even and odd indexed elements are swapped, with x = n/2 theupper and the lower half of the array are swapped.One hasXa Xb= Xb Xa = Xc(7.13)where c = a XOR b.
For the special case a = b the relation expresses the self-inverse property (asX0 = id).The XOR permutation often occurs in relations between other permutations where we will use the symbol Xx , the subscript denoting the third argument.7.8The Gray code permutation0:1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:[ *][*][*][*][*][*][*][*][* ][*][*][*][*][*][*][*]0:1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:[ *][*][*][*][*][*][*][*][*][*][* ][*][*][*][*][*]Figure 7.7: Permutation matrices of the Gray code permutation (left) and its inverse (right).The Gray code permutation reorders (length-2n ) arrays according to the Gray codestatic inline ulong gray_code(ulong x){return x ^ (x>>1);}which is most easily demonstrated with the according routine that does not work in-place ([FXT: fileperm/graypermute.h]):template <typename Type>inline void gray_permute(const Type *f, Type * restrict g, ulong n)// after this routine// g[gray_code(k)] == f[k]{for (ulong k=0; k<n; ++k) g[gray_code(k)] = f[k];}Its inverse istemplate <typename Type>CHAPTER 7.
PERMUTATIONS134inline void inverse_gray_permute(const Type *f, Type * restrict g, ulong n)// after this routine// g[k] == f[gray_code(k)]// (same as: g[inverse_gray_code(k)] == f[k]){for (ulong k=0; k<n; ++k) g[k] = f[gray_code(k)];}It also uses calls to gray_code() because they are cheaper than the computation ofinverse_gray_code(), cf. 8.12.It is actually possible to write an in-place version of the above routines that offers extremely goodperformance. The underlying observation is that the cycle leaders (cf. 7.13) have an easy pattern and canbe efficiently generated using the ideas from 8.4 (detection of perfect powers of two) and 8.9 (enumerationof bit subsets).template <typename Type>void gray_permute(Type *f, ulong n)// in-place version{ulong z = 1; // mask for cycle maximaulong v = 0; // ~zulong cl = 1; // cycle lengthfor (ulong ldm=1, m=2; m<n; ++ldm, m<<=1){z <<= 1;v <<= 1;if ( is_pow_of_2(ldm) ){++z;cl <<= 1;}else ++v;bit_subset b(v);do{// --- do cycle: --ulong i = z | b.next(); // start of cycleType t = f[i];// save start valueulong g = gray_code(i); // next in cyclefor (ulong k=cl-1; k!=0; --k){Type tt = f[g];f[g] = t;t = tt;g = gray_code(g);}f[g] = t;// --- end (do cycle) --}while ( b.current() );}}The inverse looks similar, the only actual difference is the do cycle block:template <typename Type>void inverse_gray_permute(Type *f, ulong n)// in-place version{ulong z = 1;ulong v = 0;ulong cl = 1;for (ulong ldm=1, m=2; m<n; ++ldm, m<<=1){z <<= 1;v <<= 1;if ( is_pow_of_2(ldm) ){++z;cl <<= 1;}CHAPTER 7.