Arndt - Algorithms for Programmers
The operation count of the CRT implementation as given above is significantly better than that of a straightforward implementation (cf. the extended Euclidean algorithm).

CHAPTER 4. NUMBER THEORETIC TRANSFORMS (NTTS)

4.6 A modular multiplication technique

When implementing a mod class on a 32-bit machine the following trick can be useful: it allows easy multiplication of two integers a, b modulo m even if the product a · b does not fit into a machine integer (which is assumed to have some maximal value z − 1, where z = 2^k).

Let ⟨x⟩_y denote x modulo y, and ⌊x⌋ denote the integer part of x. For 0 ≤ a, b < m:

    a · b = ⌊a·b/m⌋ · m + ⟨a·b⟩_m                                  (4.24)

Rearranging and taking both sides modulo z > m:

    ⟨a·b − ⌊a·b/m⌋ · m⟩_z = ⟨⟨a·b⟩_m⟩_z                            (4.25)

where the rhs. equals ⟨a·b⟩_m because m < z. Hence

    ⟨a·b⟩_m = ⟨⟨a·b⟩_z − ⟨⌊a·b/m⌋ · m⟩_z⟩_z                        (4.26)

The expression on the rhs. can be translated into a few lines of C code. The code given here assumes that one has 64-bit integer types int64 (signed) and uint64 (unsigned) and a floating point type with a 64-bit mantissa, float64 (typically long double).

uint64 mul_mod(uint64 a, uint64 b, uint64 m)
{
    uint64 y = (uint64)((float64)a*(float64)b/m+(float64)1/2);  // floor(a*b/m)
    y = y * m;            // m*floor(a*b/m) mod z
    uint64 x = a * b;     // a*b mod z
    uint64 r = x - y;     // a*b mod z - m*floor(a*b/m) mod z
    if ( (int64)r < 0 )   // normalization needed ?
    {
        r = r + m;
        y = y - 1;        // (a*b)/m quotient, omit line if not needed
    }
    return r;             // (a*b)%m remainder
}

It uses the fact that integer multiplication computes the least significant bits of the result ⟨a·b⟩_z, whereas float multiplication computes the most significant bits of the result.
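The same routine can be sketched in portable modern C++ using the standard fixed-width types; the typedefs below stand in for the book's int64/uint64/float64, and the code assumes long double carries a 64-bit mantissa (true for x86 extended precision, not for every platform):

```cpp
#include <cstdint>

typedef uint64_t    uint64;
typedef int64_t     int64;
typedef long double float64;  // assumed to have a 64-bit mantissa

// Sketch of the mul_mod technique: the float path yields floor(a*b/m),
// the integer path yields a*b mod 2^64; their difference is a*b mod m.
uint64 mul_mod(uint64 a, uint64 b, uint64 m)
{
    uint64 y = (uint64)((float64)a * (float64)b / m + (float64)1/2);  // floor(a*b/m)
    y = y * m;                 // m*floor(a*b/m), taken mod z = 2^64
    uint64 x = a * b;          // a*b mod z
    uint64 r = x - y;          // candidate remainder, may be off by m
    if ( (int64)r < 0 )        // normalization needed ?
        r = r + m;
    return r;                  // (a*b) mod m
}
```

As the text states, this is valid for 0 ≤ a, b < m < 2^63.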
The above routine works if 0 ≤ a, b < m < 2^63 = z/2. The normalization is not necessary if m < 2^62 = z/4.

When working with a fixed modulus the division by m may be replaced by a multiplication with the inverse modulus, which only needs to be computed once. Precompute

    float64 i = (float64)1/m;

and replace the line

    uint64 y = (uint64)((float64)a*(float64)b/m+(float64)1/2);

by

    uint64 y = (uint64)((float64)a*(float64)b*i+(float64)1/2);

so that any division inside the routine is avoided. But beware: the routine then cannot be used for m ≥ 2^62; it very rarely fails for moduli of more than 62 bits. This is due to the additional error when inverting and multiplying as compared to dividing alone.

This trick is ascribed to Peter Montgomery.

TBD: Montgomery multiplication
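The fixed-modulus variant just described can be packaged so that the inverse is computed once per modulus; the class name and interface below are assumptions for illustration, not the book's code, and the same m < 2^62 restriction applies:

```cpp
#include <cstdint>

typedef uint64_t    uint64;
typedef int64_t     int64;
typedef long double float64;  // assumed 64-bit mantissa

// Sketch: multiplication modulo a fixed m using a precomputed 1/m,
// so no float division occurs in the hot path.  Valid for m < 2^62.
struct mod_mul
{
    uint64  m;
    float64 i;  // 1/m, computed once

    explicit mod_mul(uint64 mm) : m(mm), i((float64)1 / (float64)mm) {}

    uint64 operator()(uint64 a, uint64 b) const
    {
        uint64 y = (uint64)((float64)a * (float64)b * i + (float64)1/2);  // ~floor(a*b/m)
        uint64 r = a * b - y * m;   // both products taken mod 2^64
        if ( (int64)r < 0 ) r += m; // normalization
        return r;
    }
};
```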
4.7 Number theoretic Hartley transform *

Let r be an element of order n, i.e. r^n = 1 but there is no k < n so that r^k = 1; we would like to identify r with exp(2 i π/n). Then one can set

    cos(2π/n)   ≡ (r^2 + 1) / (2r)                                 (4.27)
    i sin(2π/n) ≡ (r^2 − 1) / (2r)                                 (4.28)

For this choice of sin and cos the relations exp() = cos() + i sin() and sin()^2 + cos()^2 = 1 should hold. The first check is trivial: (x^2+1)/(2x) + (x^2−1)/(2x) = x.
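Definitions (4.27) and (4.28) can also be tried out numerically in a small modular ring; the sketch below is mine, assuming p = 17 and r = 2, which has order n = 8 modulo 17. Since cos + (i sin) = (r^2+1)/(2r) + (r^2−1)/(2r) = r, this check does not even need an explicit i:

```cpp
#include <cstdint>

// Brute-force modular inverse -- fine for a tiny demonstration prime.
uint64_t inv_mod(uint64_t a, uint64_t p)
{
    for (uint64_t x = 1; x < p; ++x)
        if ( (a * x) % p == 1 ) return x;
    return 0;  // not reached for a != 0 mod a prime p
}

// Compute cos + (i sin) = (r^2+1)/(2r) + (r^2-1)/(2r) in Z/pZ;
// by (4.27), (4.28) this should come out equal to r itself.
uint64_t cos_plus_isin(uint64_t r, uint64_t p)
{
    uint64_t d  = inv_mod((2 * r) % p, p);          // 1/(2r) mod p
    uint64_t c  = ((r * r + 1) % p) * d % p;        // cos
    uint64_t is = ((r * r - 1) % p) * d % p;        // i*sin
    return (c + is) % p;
}
```

For r = 2, p = 17 one gets cos = 14 and i sin = 5, and indeed 14 + 5 ≡ 2 (mod 17).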
The second is also easy if we allow ourselves to write i for some element that is the square root of −1:

    ((x^2+1)/(2x))^2 + ((x^2−1)/(2xi))^2 = ((x^2+1)^2 − (x^2−1)^2) / (4x^2) = 1

Ok, but what is i in the modular ring? Simply r^(n/4); then we have i^2 = −1 and i^4 = 1 as we are used to. This is only true in cyclic rings.

Chapter 5

The Walsh transform and its relatives

How to make a Walsh transform out of your FFT: 'Replace exp(something) by 1, done.'

TBD: complex Walsh transform

5.1 The Walsh transform: Walsh-Kronecker basis

Removing all exp(something) from the radix-2, decimation in time Fourier transform we get

void slow_walsh_wak_dit2(double *f, ulong ldn)
// (this routine has a problem)
{
    ulong n = (1<<(ulong)ldn);
    for (ulong ldm=1; ldm<=ldn; ++ldm)
    {
        const ulong m = (1<<ldm);
        const ulong mh = (m>>1);
        for (ulong j=0; j<mh; ++j)
        {
            for (ulong r=0; r<n; r+=m)
            {
                const ulong t1 = r+j;
                const ulong t2 = t1+mh;
                double v = f[t2];
                double u = f[t1];
                f[t1] = u+v;
                f[t2] = u-v;
            }
        }
    }
}

The transform involves proportional n log2(n) additions (and subtractions) and no multiplication at all. The transform is its own inverse, so there is nothing like the is-parameter of the FFT procedures here. As the slow in the name suggests, the implementation has a problem as given.
The memory access is highly non-local. Let us make a slight improvement. Here we just took the code 1.4 and threw away all trigonometric computations (and multiplications). But the swapping of the inner loops, which we did for the FFT in order to save trigonometric computations, is now of no advantage anymore. So we try this piece of code:

template <typename Type>
void walsh_wak_dit2(Type *f, ulong ldn)

[The table of the 32 length-32 basis function patterns printed at this point is not reproduced.]
Figure 5.1: Basis functions for the Walsh transform (Walsh-Kronecker basis).
Asterisks denote the value +1, blank entries denote −1.

// transform wrt. to walsh-kronecker basis (wak-functions)
// decimation in time (DIT) algorithm
{
    ulong n = (1UL<<ldn);
    for (ulong ldm=1; ldm<=ldn; ++ldm)
    {
        const ulong m = (1UL<<ldm);
        const ulong mh = (m>>1);
        for (ulong r=0; r<n; r+=m)
        {
            ulong t1 = r;
            ulong t2 = r+mh;
            for (ulong j=0; j<mh; ++j, ++t1, ++t2)
            {
                Type u = f[t1];
                Type v = f[t2];
                f[t1] = u + v;
                f[t2] = u - v;
            }
        }
    }
}

[FXT: walsh wak dit2 in walsh/walshwak.h]

What performance impact can this innocent change in the code have? For large n it gave a speedup by a factor of more than three when run on a computer with a main memory clock of 66 Megahertz and a 5.5 times higher CPU clock of 366 Megahertz.
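As a small usage sketch (the driver below is mine, not from FXT): applying the transform twice multiplies the data by n, which is a handy correctness check for any implementation of the unnormalized Walsh transform.

```cpp
typedef unsigned long ulong;

// The DIT Walsh transform as given in the text.
template <typename Type>
void walsh_wak_dit2(Type *f, ulong ldn)
{
    const ulong n = (1UL << ldn);
    for (ulong ldm = 1; ldm <= ldn; ++ldm)
    {
        const ulong m = (1UL << ldm);
        const ulong mh = (m >> 1);
        for (ulong r = 0; r < n; r += m)
        {
            ulong t1 = r, t2 = r + mh;
            for (ulong j = 0; j < mh; ++j, ++t1, ++t2)
            {
                Type u = f[t1], v = f[t2];
                f[t1] = u + v;
                f[t2] = u - v;
            }
        }
    }
}

// Transform twice and verify that f turns into n times the original data;
// returns 0 on success, -1 on mismatch.
long walsh_twice_check()
{
    const ulong ldn = 3, n = 1UL << ldn;  // n = 8
    long f[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    long g[8];
    for (ulong i = 0; i < n; ++i) g[i] = f[i];
    walsh_wak_dit2(f, ldn);
    walsh_wak_dit2(f, ldn);
    for (ulong i = 0; i < n; ++i)
        if ( f[i] != (long)n * g[i] ) return -1;
    return 0;
}
```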
The equivalent code for the decimation in frequency algorithm is

template <typename Type>
void walsh_wak_dif2(Type *f, ulong ldn)
{
    const ulong n = (1UL<<ldn);
    for (ulong ldm=ldn; ldm>=1; --ldm)
    {
        const ulong m = (1UL<<ldm);
        const ulong mh = (m>>1);
        for (ulong r=0; r<n; r+=m)
        {
            ulong t1 = r;
            ulong t2 = r+mh;
            for (ulong j=0; j<mh; ++j, ++t1, ++t2)
            {
                Type u = f[t1];
                Type v = f[t2];
                f[t1] = u + v;
                f[t2] = u - v;
            }
        }
    }
}

[FXT: walsh wak dif2 in walsh/walshwak.h]

The Walsh transform of integer input is integral, cf.
section 6.6.

A function that computes the k-th base function of the transform is

template <typename Type>
void walsh_wak_basefunc(Type *f, ulong n, ulong k)
{
    for (ulong i=0; i<n; ++i)
    {
        ulong x = i & k;
        x = parity(x);
        f[i] = ( 0==x ? +1 : -1 );
    }
}

[FXT: walsh wak basefunc in walsh/walshbasefunc.h]

5.2 The Kronecker product

The length-2 Walsh transform can be seen to be equivalent to the multiplication of a 2-component vector by the matrix

    W2 = [ +1 +1 ]                                                 (5.1)
         [ +1 -1 ]

The length-4 Walsh transform corresponds to

    W4 = [ +1 +1 +1 +1 ]                                           (5.2)
         [ +1 -1 +1 -1 ]
         [ +1 +1 -1 -1 ]
         [ +1 -1 -1 +1 ]

One might be tempted to write

    W4 = [ +W2 +W2 ]                                               (5.3)
         [ +W2 -W2 ]

This idea can indeed be turned into a well-defined notation which turns out to be quite powerful when dealing with orthogonal transforms and their fast algorithms.
Let A be an m × n-matrix

    A = [ a_{0,0}      a_{0,1}      ...  a_{0,n-1}   ]             (5.4)
        [ a_{1,0}      a_{1,1}      ...  a_{1,n-1}   ]
        [ ...                                        ]
        [ a_{m-1,0}    a_{m-1,1}    ...  a_{m-1,n-1} ]

then the Kronecker product (or tensor product) with a matrix B is

    A ⊗ B := [ a_{0,0} B     a_{0,1} B     ...  a_{0,n-1} B   ]    (5.5)
             [ a_{1,0} B     a_{1,1} B     ...  a_{1,n-1} B   ]
             [ ...                                            ]
             [ a_{m-1,0} B   a_{m-1,1} B   ...  a_{m-1,n-1} B ]

There is no restriction on the dimensions of B.
If B is an r × s-matrix then the dimension of the resulting Kronecker product is mr × ns, and c_{k+ir, l+js} = a_{i,j} b_{k,l}.

For a scalar factor α the relations

    (αA) ⊗ B = α(A ⊗ B)                                            (5.6)
    A ⊗ (αB) = α(A ⊗ B)                                            (5.7)

should be immediate.

The Kronecker product is not commutative, that is, A ⊗ B ≠ B ⊗ A in general.
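The definition maps directly to code; the sketch below uses flat row-major arrays, and the function name and interface are assumptions for illustration. Applied to W2 with itself it reproduces W4 from (5.2):

```cpp
#include <vector>

// Kronecker product of an m x n matrix A and an r x s matrix B, both
// stored row-major; the result is (m*r) x (n*s) with
// c[i*r + k][j*s + l] = a[i][j] * b[k][l].
std::vector<long> kron(const std::vector<long>& a, int m, int n,
                       const std::vector<long>& b, int r, int s)
{
    std::vector<long> c((long)m * r * n * s);
    const int cols = n * s;  // number of columns of the result
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j)
            for (int k = 0; k < r; ++k)
                for (int l = 0; l < s; ++l)
                    c[(long)(i*r + k) * cols + (j*s + l)] = a[i*n + j] * b[k*s + l];
    return c;
}
```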
The following relations are the same as for the ordinary matrix product. Bilinearity (the matrices left and right of a plus sign must be of the same dimensions):

    (A + B) ⊗ C = A ⊗ C + B ⊗ C                                    (5.8)
    A ⊗ (B + C) = A ⊗ B + A ⊗ C                                    (5.9)

Associativity:

    A ⊗ (B ⊗ C) = (A ⊗ B) ⊗ C                                      (5.10)

The matrix product (indicated by a dot) of Kronecker products can be rewritten as

    (A ⊗ B) · (C ⊗ D) = (A · C) ⊗ (B · D)                          (5.11)
    (L1 ⊗ R1) · (L2 ⊗ R2) · ... · (Ln ⊗ Rn)
        = (L1 · L2 · ... · Ln) ⊗ (R1 · R2 · ... · Rn)              (5.12)
    (A · B) ⊗ (C · D) = (A ⊗ C) · (B ⊗ D)                          (5.13)
    (L1 · R1) ⊗ (L2 · R2) ⊗ ... ⊗ (Ln · Rn)
        = (L1 ⊗ L2 ⊗ ... ⊗ Ln) · (R1 ⊗ R2 ⊗ ... ⊗ Rn)              (5.14)

Here the matrices left and right of a dot must be compatible for ordinary matrix multiplication. One has

    (A ⊗ B)^T    = A^T ⊗ B^T                                       (5.15)
    (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}                                 (5.16)

Back to the Walsh transform: we have W1 = [1] and for n = 2^k, n > 1:

    Wn = [ +W_{n/2}  +W_{n/2} ] = W2 ⊗ W_{n/2}                     (5.17)
         [ +W_{n/2}  -W_{n/2} ]

In order to see that this relation is the statement of a fast algorithm, split the (to be transformed) vector x into halves

    x = [ x0 ]                                                     (5.18)
        [ x1 ]

and write

    Wn x = [ W_{n/2} x0 + W_{n/2} x1 ] = [ W_{n/2} (x0 + x1) ]     (5.19)
           [ W_{n/2} x0 - W_{n/2} x1 ]   [ W_{n/2} (x0 - x1) ]

That is, a length-n transform can be computed by two length-n/2 transforms of the sum and difference of the first and second half of x.

Now when A = B in relation 5.16 we have (A ⊗ A)^{-1} = A^{-1} ⊗ A^{-1}, (A ⊗ A ⊗ A)^{-1} = A^{-1} ⊗ A^{-1} ⊗ A^{-1}, and so on.
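The splitting of relation (5.19) is itself a recursive algorithm. The sketch below is my code, not the book's (it allocates temporaries rather than working in place like the FXT routines), but it computes the same Walsh-Kronecker transform:

```cpp
#include <vector>
#include <cstddef>

// Recursive Walsh transform following
//   W_n x = [ W_{n/2}(x0 + x1) ; W_{n/2}(x0 - x1) ].
// The length of x must be a power of 2.
void walsh_recursive(std::vector<long>& x)
{
    const std::size_t n = x.size();
    if ( n <= 1 ) return;
    const std::size_t nh = n / 2;
    std::vector<long> s(nh), d(nh);
    for (std::size_t i = 0; i < nh; ++i)
    {
        s[i] = x[i] + x[nh + i];  // x0 + x1
        d[i] = x[i] - x[nh + i];  // x0 - x1
    }
    walsh_recursive(s);           // length-n/2 transform of the sum
    walsh_recursive(d);           // length-n/2 transform of the difference
    for (std::size_t i = 0; i < nh; ++i)
    {
        x[i]      = s[i];
        x[nh + i] = d[i];
    }
}
```

For x = (1, 2, 3, 4) this yields W4 x = (10, −2, −4, 0), in agreement with the matrix W4 of (5.2).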
Using a notation equivalent to the sum (or product) sign this is

    ( ⊗_{k=1}^{n} A )^{-1} = ⊗_{k=1}^{n} A^{-1}                    (5.20)

Using the notation for the Walsh transform

    Wn = ⊗_{k=1}^{log2(n)} W2                                      (5.21)

(where the empty product equals [1] = W1) and

    Wn^{-1} = ⊗_{k=1}^{log2(n)} W2^{-1}                            (5.22)

The latter relation isn't that exciting, as W2^{-1} = W2 (up to normalization); however, it obviously also holds when the inverse transform is different from the forward transform. Thereby, given a fast algorithm for some transform in the form of a Kronecker product, the fast algorithm for the backward transform is immediate.

The direct sum of two matrices is defined as

    A ⊕ B := [ A 0 ]                                               (5.23)
             [ 0 B ]

We have

    ⊕_{k=1}^{n} A := I_n ⊗ A                                       (5.24)

where I_n is the n × n identity matrix.

5.3 Computing the Walsh transform faster

All operations necessary for the Walsh transform are cheap: loads, stores, additions and subtractions. The memory access pattern is a major concern with direct mapped cache, as we have verified comparing the first two implementations in this chapter.