Arndt - Algorithms for Programmers (523138), страница 16
Текст из файла (страница 16)
Even the one found to be superior due to its more localizedaccess is guaranteed to have a performance problem as soon as the array is long enough: all accesses areseparated by a power-of-two distance and cache misses will occur beyond a certain limit.
Rather bizarreattempts like inserting ‘pad data’ have been reported in order to mitigate the problem. The Gray codepermutation described in section 7.8 allows a very nice and elegant solution where the sub-arrays arealways accessed in mutually reversed order.template <typename Type>void walsh_gray(Type *f, ulong ldn)// decimation in frequency (DIF) algorithm{const ulong n = (1UL<<ldn);CHAPTER 5. THE WALSH TRANSFORM AND ITS RELATIVES}85for (ulong ldm=ldn; ldm>0; --ldm) // dif{const ulong m = (1UL<<ldm);for (ulong r=0; r<n; r+=m){ulong t1 = r;ulong t2 = r + m - 1;for ( ; t1<t2; ++t1,--t2){Type u = f[t1];Type v = f[t2];f[t1] = u + v;f[t2] = u - v;}}}[FXT: walsh gray in walsh/walshgray.h]The transform is not self-inverse, however its inverse can be implemented trivially:template <typename Type>void inverse_walsh_gray(Type *f, ulong ldn)// decimation in time (DIT) algorithm{const ulong n = (1UL<<ldn);for (ulong ldm=1; ldm<=ldn; ++ldm) // dit{const ulong m = (1UL<<ldm);for (ulong r=0; r<n; r+=m){ulong t1 = r;ulong t2 = r + m - 1;for ( ; t1<t2; ++t1,--t2){Type u = f[t1];Type v = f[t2];f[t1] = u + v;f[t2] = u - v;}}}}[FXT: inverse walsh gray in walsh/walshgray.h]The relation between walsh wak() and walsh gray() is thatinverse_gray_permute(f, n);walsh_gray(f, ldn);grs_negate(f, n);is equivalent to the call walsh wak(f, ldn).
The third line is a necessary fix-up for certain elementsthat have the wrong sign if uncorrected:template <typename Type>void grs_negate(Type *f, ulong n)// negate elements at indices where the// Golay-Rudin-Shapiro is negative.{for (ulong k=0; k<n; ++k){if ( grs_negative_q(k) ) f[k] = -f[k];}}[FXT: grs negate in aux1/grsnegate.h] The function grs negative q() is described in section 8.12.Using Q for the grs_negative_q-line we haveWk=Q Wg G−1 = G Wg−1 Q(5.25)CHAPTER 5.
THE WALSH TRANSFORM AND ITS RELATIVES86The same idea can be used with the Fast Fourier Transform. However, the advantage of the improvedaccess pattern is usually more than compensated by the increased number of sin/cos-computations (thetwiddle factors appear reordered so n · log n instead of n computations are necessary) cf. [FXT: filefft/gfft.cc].5.4Dyadic convolutionWalsh’s convolution has XOR where the usual one has plusUsingtemplate <typename Type>void dyadic_convolution(Type * restrict f, Type * restrict g, ulong ldn){walsh_wak(f, ldn);walsh_wak(g, ldn);for (ulong k=0; k<n; ++k) g[k] *= f[k];walsh_wak(g, ldn);}one gets the so called dyadic convolution defined byh=hτ:=a~ bˆXxˆ(5.26)ax by(5.27)y=τwhere the symbol ˆ stands for the XOR operator.The table equivalent to 2.1 is+-|0:1:2:3:01234567012310322301321045675476674576548 9 10 119 8 11 1010 11 8 911 10 9 8121314154:5:6:7:45675476674576540123103223013210121314151312151414151213151413128 9 10 119 8 11 1010 11 8 911 10 9 88:9:10:11:8 9 10 119 8 11 1010 11 8 911 10 9 8121314151312151414151213151413120123103223013210456754766745765412:13:14:15:121314158 9 10 119 8 11 1010 11 8 911 10 9 84567547667457654012310322301321013121514141512131514131289 10 1112 13 14 15131215141415121315141312The nearest equivalent to the acyclic convolution can be computed using a sequence that has bothprepended and appended runs of n/2 zeros:+-|0:1:2:3:01234567012310322301321045675476674576548 9 10 119 8 11 1010 11 8 911 10 9 8121314154:5:6:7:4567547667457654012310322301321012131415151413128 9 10 119 8 11 1010 11 8 911 10 9 824 25 26 2728 29 30 318:16 17 18 1920 21 22 2389 10 11131215141415121312 13 14 15131215141415121315141312CHAPTER 5.
THE WALSH TRANSFORM AND ITS RELATIVES9:10:11:17 16 19 1818 19 16 1719 18 17 1621 20 23 2222 23 20 2123 22 21 2025 24 27 2626 27 24 2527 26 25 2429 28 31 3030 31 28 2931 30 29 2812:13:14:15:2021222316171819282930312425262721202322222320212322212017161918181916171918171629283130303128293130292825242726262724258727262524An scheme similar to that of the weighted convolution can be obtained viawalsh_wal_dif2_core(f, ldn);walsh_wal_dif2_core(g, ldn);ulong n = (1UL<<ldn);for (ulong i=0,j=n-1; i<j; --j,++i)walsh_wal_dit2_core(g, ldn);fht_mul(f[i], f[j], g[i], g[j], 0.5);where fht_mul is the operation used for the convolution with fast Hartley transforms ([FXT: fht mul ininclude/fhtmulsqr.h]):static inline voidfht_mul(double xi, double xj, double &yi, double &yj, double v){double h1p = xi, h1m = xj;double s1 = h1p + h1m, d1 = h1p - h1m;double h2p = yi, h2m = yj;yi = (h2p * s1 + h2m * d1) * v;yj = (h2m * s1 - h2p * d1) * v;}One gets (negative contributions to a bucket have a minus appended):+-|0:1:2:3:01234567012310322301321045675476674576548 9 10 119 8 11 1010 11 8 911 10 9 8121314154:5:6:7:45675476674576540123103223013210121314158 9 10 119 8 11 1010 11 8 911 10 9 88:9:10:11:8 9 10 119 8 11 1010 11 8 911 10 9 8121314151312151414151213151413120123-1032-2301-3210-4567-5476-6745-7654-12:13:14:15:121314158 9 10 119 8 11 1010 11 8 911 10 9 84567-5476-6745-7654-0123-1032-2301-3210-13121514141512131514131289 10 1113121514141512131514131212 13 14 15131215141415121315141312Speedup using the Gray-variantThe walsh gray()-variant and its inverse can be utilized for a faster implementation of the dyadicconvolution:template <typename Type>void dyadic_convolution(Type * restrict f, Type * restrict g, ulong ldn){walsh_gray(f, ldn);walsh_gray(g, ldn);for (ulong k=0; k<n; ++k) g[k] *= f[k];for (ulong k=0; k<n; ++k) if ( grs_negative_q(k) ) g[k] = -g[k];inverse_walsh_gray(g, ldn);}The observed saving is about 25 percent for large arrays:CHAPTER 5.
THE WALSH TRANSFORM AND ITS RELATIVES88ldn=20n=1048576 repetitions: m=5 memsize=16384 kiloBytereverse(f,n2);dt=0.0418339rel=1walsh_wak(f,ldn);dt=0.505863rel= 12.0922walsh_gray(f,ldn);dt=0.378223rel= 9.04108dyadic_convolution(f, g, ldn);dt= 1.54834rel= 37.0117 << wakdyadic_convolution(f, g, ldn);dt= 1.19474rel= 28.5436 << grayldn=21 n=2097152 repetitions: m=5 memsize=32768 kiloBytereverse(f,n2);dt=0.0838011rel=1walsh_wak(f,ldn);dt=1.07741rel= 12.8567walsh_gray(f,ldn);dt=0.796644rel= 9.50636dyadic_convolution(f, g, ldn);dt=3.28062rel= 39.1477 << wakdyadic_convolution(f, g, ldn);dt=2.49583rel= 29.7401 << grayAnalogue tables for matrix multiplicationIt may be interesting to note that the table for matrix multiplication (4x4 matrices) looks like+-|0:1:2:3:0123456789 10 1112 13 14 150123............4567............891011............12131415............4:5:6:7:....0123............4567.........
8. 9. 10. 11............12131415........8:9:10:11:........0123............4567......... 8. 9. 10. 11............12131415....12:13:14:15:............0123............4567......... 8. 9. 10. 11............12131415But when the problem is made symmetric, i.e. the second matrix is indexed in transposed order, we get:+-|0:1:2:3:0123456789 10 1112 13 14 150....0....0....04....4....4....48....8....8....812 . . .. 12 . .. . 12 ..
. . 124:5:6:7:1....1....1....15....5....5....59....9....9....913 . . .. 13 . .. . 13 .. . . 138:9:10:11:2....2....2....26....6....6....610 . . .. 10 . .. . 10 .. . . 1014 . . .. 14 . .. . 14 .. . . 1412:13:14:15:3....3....3....37....7....7....711 . . .. 11 . .. . 11 .. . . 1115 . . .. 15 . .. . 15 .. . . 15Thereby dyadic convolution can be used to compute matrix products. The ‘un-polished’ algorithm is∼ n3 · log n as with the FT (-based correlation).If the above was F · G then G · F isMultiplication of quaternionsQuaternion multiplication can be achieved in eight real multiplication using the dyadic convolution:The scheme in figure 5.2 suggests to use the dyadic convolution with bucket zero negated as a startingCHAPTER 5.
THE WALSH TRANSFORM AND ITS RELATIVES+-- 0 1 2 3|0:0 1 2 31:4 5 6 72:8 9 10 113: 12 13 14 15456789 10 11................................................0 1 2 34 5 6 78 9 10 1112 13 14 15................................0 1 2 34 5 6 78 9 10 1112 13 14 15................4:5:6:7:................8:9:10:11:................................12:13:14:15:................................+-|0:1:2:3:01230123103223013210....+-- 0|0: -01:12:23:3............8912 13 14 150 1 2 34 5 6 78 9 10 1112 13 14 151231-03223-01321-01ijk+-|0:1:2:3:10i1j2k30* 1231 -03 -2*2 -3* -0132 -1* -0Figure 5.2: Scheme for the length-4 dyadic convolution (left), same with bucket zero negated (middle) andthe multiplication table for the units of the quaternions (entries correspond to ‘row unit’ times ‘columnunit’, right).
The asterisks mark those entries where the sign is different from the scheme in the middle.point which costs 4 multiplications. Some entries have to be corrected then which costs four moremultiplications.// f[] == [ re1, i1, j1, k1 ]// g[] == [ re2, i2, j2, k2 ]c0 := f[0] * g[0]c1 := f[3] * g[2]c2 := f[1] * g[3]c3 := f[2] * g[1]// length-4 dyadic convolution:walsh(f[]);walsh(g[]);for i:=0 to 3 g[i] := (f[i] * g[i])walsh(g[]);// normalization and correction:g[0] :=2 * c0 - g[0] / 4g[1] := - 2 * c1 + g[1] / 4g[2] := - 2 * c2 + g[2] / 4g[3] := - 2 * c3 + g[3] / 4The algorithm is taken from [60] which also gives a second variant.The complex multiplication by three real multiplications corresponds to one length-2 Walsh dyadic convolution and the correction for the product of the imaginary units:// f[] == [ re1, im1 ]// g[] == [ re2, im2 ]c0 := f[1] * g[1] // == im1 * im2// length-2 dyadic convolution:{ f[0], f[1] } := { f[0]+f[1], f[0]-f[1] }{ g[0], g[1] } := { g[0]+g[1], g[0]-g[1] }g[0] := f[0] * g[0]g[1] := f[1] * g[1]CHAPTER 5.
THE WALSH TRANSFORM AND ITS RELATIVES90{ g[0], g[1] } := { g[0]+g[1], g[0]-g[1] }// normalization:f[0] := f[0] / 2g[0] := g[0] / 2// correction:g[0] := - 2 * c0 + g[0]// here: g[] == [ re1 * re2 - im1 * im2, re1 * im2 + im1 * re2 ]For complex numbers of high-precision multiplication is asymptotically equivalent to two real multiplications as one FFT based (complex linear) convolution can be used for the computation. Similarly, highprecision quaternion multiplication is as expensive as four real multiplications.5.50:1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:16:17:18:19:20:21:22:23:24:25:26:27:28:29:30:31:The Walsh transform: Walsh-Paley basis[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[********************************************************************************************* ** ********************************** * ** * **** ** ****** * ** * ******** ** ****** ** ********** * * * * * * * * * * * * * ** * * * * ** * * * * * * *** ** * * *** ** * * ** * * * * * * ** * * ** * * * ** ** ** *** ** ** ** ** * * ** ** ** ** ** * ** * * ** * ** ** * * ** * * ** ** ** * * ** * * ******************** *********** **** **** ****** *** ******* **** **** ** ** ** ** *** ** *** *** **** ** ** **** **** **** ** ****** *** ***** *** ** * * * * * * ]]]* * * * * * * ]* * *]* * * * ]* * * * ]* * *]** *]* ** * ]* ** * ]** *]** * ]* * * *]* * * *]** * ]***]**** ]**** ]***]*** ]** **]** **]*** ]* ** ]* ** *]* ** *]* ** ]** *]* *** ]* *** ]** *]( 0)( 1)( 3)( 2)( 7)( 6)( 4)( 5)(15)(14)(12)(13)( 8)( 9)(11)(10)(31)(30)(28)(29)(24)(25)(27)(26)(16)(17)(19)(18)(23)(22)(20)(21)Figure 5.3: Basis functions for the Walsh transform (Walsh-Paley basis).
Asterisks denote the value +1,blank entries denote −1.A Walsh transform with a different (ordering of the) basis can be obtained bytemplate <typename Type>void walsh_pal(Type *f, ulong ldn){const ulong n = 1UL<<ldn;revbin_permute(f, n);walsh_wak(f, ldn);// =^=//walsh_wak(f, ldn);//revbin_permute(f, n);}CHAPTER 5. THE WALSH TRANSFORM AND ITS RELATIVES91[FXT: walsh pal in walsh/walshpal.h] Actually one can also revbin_transform before the transform.That is,Wp=Wk R = R Wk(5.28)G Wp G = G−1 Wp G−1Z Wp Z = Z −1 Wp Z −1(5.29)One has for WpWp==(5.30)A function that computes the k-th base function of the transform istemplate <typename Type>void walsh_pal_basefunc(Type *f, ulong n, ulong k){k = revbin(k, ld(n));for (ulong i=0; i<n; ++i){ulong x = i & k;x = parity(x);f[i] = ( 0==x ? +1 : -1 );}}[FXT: walsh pal basefunc in walsh/walshbasefunc.h]5.6Sequency ordered Walsh transformsA term analogue to the frequency of the Fourier basis functions is the so called sequency of the Walshfunctions, the number1 of the changes of sign of the individual functions.