Arndt - Algorithms for Programmers (523138), страница 8

Файл №523138 Arndt - Algorithms for Programmers (Arndt - Algorithms for Programmers) 8 страницаArndt - Algorithms for Programmers (523138) страница 82013-09-152013-09-15СтудИзба

Arndt - Algorithms for Programmers

Просмтор этого файла доступен только зарегистрированным пользователям. Но у нас супер быстрая регистрация: достаточно только электронной почты!

Регистрация/авторизация

Текст из файла (страница 8)

It actually is equivalent to writing an interpreter for the language used plusthe necessary data flow analysis11 .A practical compromise is to write a program that, while theoretically not even close to a meta-generator,creates output that, after a little hand editing, is a usable generator code. The implemented perl script[FXT: file scripts/metagen.pl] is capable of converting a (highly pedantically formatted) piece of C++code12 into something that is reasonable close to a generator.Further one may want to print the current values of the loop variables inside comments at the beginningof a block. Thereby it is possible to locate the corresponding part (both wrt. file and temporal location)of a piece of generated code in the original file.

In addition one may keep the comments of the originalcode.With FFTs it is necessary to identify (‘reverse engineer’) the trigonometric values that occur in the processin terms of the corresponding argument (rational multiples of π). The actual values should be inlinedto some greater precision than actually needed, thereby one avoids the generation of multiple copies ofthe (logically) same value with differences only due to numeric inaccuracies. Printing the arguments,both as they appear and gcd-reduced, inside comments helps to understand (or further optimize) thegenerated code:doubledoubledoubledoublec1=.980785280403230449126182236134;s1=.195090322016128267848284868476;c2=.923879532511286756128183189397;s2=.382683432365089771728459984029;////////========cos(Pi*1/16)sin(Pi*1/16)cos(Pi*2/16)sin(Pi*2/16)========cos(Pi*1/16)sin(Pi*1/16)cos(Pi*1/8)sin(Pi*1/8)Automatic verification of the generated codes against the original is a mandatory part of the process.A level of abstraction for the array indices is of great use: When the print statements in the generatoremit some function of the index instead of its plain value it is easy to generate modified versions of thecode for permuted input.

That is, instead ofcout<<"sumdiff(f0, f2, g["<<k0<<"], g["<<k2<<"]);" <<endl;cout<<"sumdiff(f1, f3, g["<<k1<<"], g["<<k3<<"]);" <<endl;usecout<<"sumdiff(f0, f2, "<<idxf(g,k0)<<", "<<idxf(g,k2)<<");" <<endl;cout<<"sumdiff(f1, f3, "<<idxf(g,k1)<<", "<<idxf(g,k3)<<");" <<endl;where idxf(g, k) can be defined to print a modified (e.g.

the revbin-permuted) index k.Here is the length-8 DIF FHT core as an example of some generated code:template <typename Type>inline void fht_dit_core_8(Type *f)// unrolled version for length 8{{ // start initial loop{ // fi = 0 gi = 1Type g0, f0, f1, g1;sumdiff(f[0], f[1], f0, g0);sumdiff(f[2], f[3], f1, g1);sumdiff(f0, f1);sumdiff(g0, g1);Type s1, c1, s2, c2;sumdiff(f[4], f[5], s1, c1);sumdiff(f[6], f[7], s2, c2);sumdiff(s1, s2);sumdiff(f0, s1, f[0], f[4]);11 Ifyou know how to utilize gcc for that, please let me know.only a small subset of C++.12 ActuallyCHAPTER 1. THE FOURIER TRANSFORM38sumdiff(f1, s2, f[2], f[6]);c1 *= M_SQRT2;c2 *= M_SQRT2;sumdiff(g0, c1, f[1], f[5]);sumdiff(g1, c2, f[3], f[7]);}} // end initial loop}// ------------------------// opcount by generator: #mult=2=0.25/pt#add=22=2.75/ptThe generated codes can be of great use when one wants to spot parts of the original code that need furtheroptimization.

Especially repeated trigonometric values and unused symmetries tend to be apparent inthe unrolled code.It is a good idea to let the generator count the number of operations (e.g. multiplications, additions,load/stores) of the code it emits. Even better if those numbers are compared to the corresponding valuesfound in the compiled assembler code.It is possible to have gcc produce the assembler code with the original source interlaced (which is agreat tool with code optimization, cf.

the target asm in the FXT makefile). The necessary commands are(include- and warning flags omitted)# create assembler code:c++ -S -fverbose-asm -g -O2 test.cc -o test.s# create asm interlaced with source lines:as -alhnd test.s > test.lstAs an example the (generated)template <typename Type>inline void fht_dit_core_4(Type *f)// unrolled version for length 4{{ // start initial loop{ // fi = 0Type f0, f1, f2, f3;sumdiff(f[0], f[1], f0, f1);sumdiff(f[2], f[3], f2, f3);sumdiff(f0, f2, f[0], f[2]);sumdiff(f1, f3, f[1], f[3]);}} // end initial loop}// ------------------------// opcount by generator: #mult=0=0/pt#add=8=2/ptdefined in [FXT: file fht/shortfhtditcore.h] results, using// file test.cc:int main(){double f[4];fht_dit_core_4(f);return 0;}in (some lines deleted plus some editing for readability)11:test.cc@fht_dit_core_4(f);23:shortfhtditcore.h @ fht_dit_core_4(Type *f)24:shortfhtditcore.h @ // unrolled version for length 425:shortfhtditcore.h @ {27:shortfhtditcore.h @ { // start initial loop28:shortfhtditcore.h @{ // fi = 029:shortfhtditcore.h @Type f0, f1, f2, f3;30:shortfhtditcore.h @sumdiff(f[0], f[1], f0, f1);45:sumdiff.h @ template <typename Type>46:sumdiff.h @ static inline void47:sumdiff.h @ sumdiff(Type a, Type b, Type &s, Type &d)CHAPTER 1.

THE FOURIER TRANSFORM3948:sumdiff.h @ // {s, d} <--| {a+b, a-b}49:sumdiff.h @ { s=a+b; d=a-b; }305 0006 DD442408fldl 8(%esp)306 000a DD442410fldl 16(%esp)31:shortfhtditcore.h @sumdiff(f[2], f[3], f2, f3);319 000e DD442418fldl 24(%esp)320 0012 DD442420fldl 32(%esp)32:shortfhtditcore.h @sumdiff(f0, f2, f[0], f[2]);333 0016 D9C3fld %st(3)334 0018 D8C3fadd %st(3),%st335 001a D9C2fld %st(2)336 001c D8C2fadd %st(2),%st339 001e D9C1fld %st(1)340 0020 D8C1fadd %st(1),%st341 0022 DD5C2408fstpl 8(%esp)342 0026 DEE9fsubrp %st,%st(1)343 0028 DD5C2418fstpl 24(%esp)344 002c D9CBfxch %st(3)349 002e DEE2fsubp %st,%st(2)350 0030 DEE2fsubp %st,%st(2)353 0032 D9C0fld %st(0)354 0034 D8C2fadd %st(2),%st355 0036 DD5C2410fstpl 16(%esp)356 003a DEE1fsubp %st,%st(1)357 003c DD5C2420fstpl 32(%esp)33:shortfhtditcore.h @sumdiff(f1, f3, f[1], f[3]);Note that the assembler code is not always in sync with the corresponding source lines which is especiallytrue with higher levels of optimization.1.13Optimization considerations for fast transforms• Reduce operations: use higher radix, at least radix 4 (with high radix algorithms note that the intelx86-architecture is severely register impaired)• Mass storage FFTs: use MFA as described• Trig recursion: loss of precision (not with mod FFTs), use stable versions, use table for initial valuesof recursion.• Trig table: only for small lengths, else cache problem.• Fused routines:combine first/last (few) step(s) in transformsing/normalization/revbin/transposition etc.

e.g. revbin-squaring in convolutions,withsquar-• Use explicit last/first step with radix as high a possible• Write special versions for zero padded data (e.g. for convolutions), also write a special version ofrevbin permute for zero padded data• Integer stuff (e.g.

exact convolutions): consider NTTs but be prepared for work & disappointments• Image processing & effects: also check Walsh transform etc.• Direct mapped cache: Avoid stride-2n access (e.g. use gray-FFTs, gray-Walsh); try to achieve unitstride data access. Use the general prime factor algorithm. Improve memory locality (e.g. use thematrix Fourier algorithm (MFA))• Vectorization: SIMD versions often boost performance• For correlations/convolutions save two revbin permute (or transpose) operations by combining DIFand DIT algorithms.CHAPTER 1.

THE FOURIER TRANSFORM40• Real-valued transforms & convolution: use Hartley transform (also for computation of spectrum).Even use complex FHT for forward step in real convolution.• Reducing multiplications: Winograd FFT, mainly of theoretical interest (today the speed of multiplication is almost that of addition, often multiplies go parallel to adds)• Only general rule for big sizes: better algorithms win.• Do NOT blindly believe that some code is fast without profiling.

Statements that some code is”the fastest” are always bogus.1.14Eigenvectors of the Fourier transform *For aS := a + a, the symmetric part of a sequence a:F [F [aS ]]= aS(1.101)Now let u+ := aS + F [aS ] and u− := aS − F [aS ] thenF [u+ ]F [u− ]= F [aS ] + aS = aS + F [aS ] = +1 · u+= F [aS ] − aS = −(aS − F [aS ]) = −1 · u−(1.102)(1.103)u+ and u− are symmetric.For aA := a − a, the antisymmetric part of a we haveF [F [aA ]] = −aA(1.104)Therefore with v+ := aA + i F [aA ] and v− := aA − i F [aA ]:F [v+ ]F [v− ]==F [aA ] − i aA = −i (aA + i F [aA ]) = −i · v+F [aA ] + i aA = +i (aA − i F [aA ]) = +i · v−(1.105)(1.106)v+ and v− are antisymmetric.u+ , u− , v+ and v− are eigenvectors of the FT, with eigenvalues +1, −1, −i and +i respectively.

Theeigenvectors are pairwise perpendicular.Usinga=1(u+ + u− + v+ + v− )2(1.107)we can, for a given sequence, find a transform that is the ‘square root’ of the FT: Simply compute u+ ,u− , v+ , v− . Then for λ ∈ R one can define a transform F λ [a] asF λ [a] =¢1¡(+1)λ u+ + (−1)λ u− + (−i)λ v+ + (+i)λ v−2(1.108)F 0 [a]£ is the identity,F 1 [a] is the (usual) FT, F 1/2 [a] (which is not unique) is a transform so that¤F 1/2 F 1/2 [a] = F [a], that is, a ‘square root’ of the FT.The eigenvectors of the Hartley Transform are u+ := a + H [a] (with eigenvalue +1) and u+ := a − H [a](with eigenvalue −1).Chapter 2Convolutions2.1Definition and computation via FFTThe cyclic convolution of two sequences a and b is defined as the sequence h with elements hτ as follows:h=hτ:=a~bX(2.1)ax byx+y≡τ (mod n)The last equation may be rewritten ashτ:=n−1Xax bτ −x(2.2)x=0where negative indices τ − x must be understood as n + τ − x, it is a cyclic convolution.Code 2.1 (cyclic convolution by definition) Compute the cyclic convolution of a[] with b[] usingthe definition, result is returned in c[]procedure convolution(a[],b[],c[],n){for tau:=0 to n-1{s := 0for x:=0 to n-1{tx := tau - xif tx<0 then tx := tx + ns := s + a[x] * b[tx]}c[tau] := s}}This procedure uses (for length-n sequences a, b) proportional n2 operations, therefore it is slow for largevalues of n.

The Fourier transform provides us with a more efficient way to compute convolutions thatonly uses proportional n log(n) operations. First we have to establish the convolution property of theFourier transform:F [a ~ b] =F [a] F [b]i.e. convolution in original space is ordinary (element-wise) multiplication in Fourier space.41(2.3)CHAPTER 2. CONVOLUTIONS42Here is the proof:F [a]k F [b]kX=ax z k xXxby z k y(2.4)ywith y := τ − xXX=ax z k xbτ −x z k (τ −x)xXX=τ −xax z k x bτ −x z k (τ −x)x τ −xÃX X=τÃ "=F!xXzk τax bτ −x#!ax bτ −xxk= (F [a ~ b])kRewriting formula 2.3 asa~b= F −1 [F [a] F [b]](2.5)tells us how to proceed:Code 2.2 (cyclic convolution via FFT) Pseudo code for the cyclic convolution of two complex valuedsequences x[] and y[], result is returned in y[]:procedure fft_cyclic_convolution(x[], y[], n){complex x[0..n-1], y[0..n-1]// transform data:fft(x[], n, +1)fft(y[], n, +1)// convolution in transformed domain:for i:=0 to n-1{y[i] := y[i] * x[i]}// transform back:fft(y[], n, -1)// normalise:for i:=0 to n-1{y[i] := y[i] / n}}It is assumed that the procedure fft() does no normalization.

Характеристики

Тип файла

PDF-файл

Размер

1,52 Mb

Материал

Arndt - Algorithms for Programmers

Тип материала

Книга

Предмет

Численные методы

Учебное заведение

Неизвестно

Список файлов книги

arndt-algorithms-for-programmers-1304897859-1379237746.rar

Arndt - Algorithms for Programmers.pdf

Поделитесь ссылкой:

Ставлю 10/10
Все нравится, очень удобный сайт, помогает в учебе. Кроме этого, можно заработать самому, выставляя готовые учебные материалы на продажу здесь. Рейтинги и отзывы на преподавателей очень помогают сориентироваться в начале нового семестра. Спасибо за такую функцию. Ставлю максимальную оценку.

Лучшая платформа для успешной сдачи сессии
Познакомился со СтудИзбой благодаря своему другу, очень нравится интерфейс, количество доступных файлов, цена, в общем, все прекрасно. Даже сам продаю какие-то свои работы.

Студизба ван лав ❤
Очень офигенный сайт для студентов. Много полезных учебных материалов. Пользуюсь студизбой с октября 2021 года. Серьёзных нареканий нет. Хотелось бы, что бы ввели подписочную модель и сделали материалы дешевле 300 рублей в рамках подписки бесплатными.

Отличный сайт
Лично меня всё устраивает - и покупка, и продажа; и цены, и возможность предпросмотра куска файла, и обилие бесплатных файлов (в подборках по авторам, читай, ВУЗам и факультетам). Есть определённые баги, но всё решаемо, да и администраторы реагируют в течение суток.

Маленький отзыв о большом помощнике!
Студизба спасает в те моменты, когда сроки горят, а работ накопилось достаточно. Довольно удобный сайт с простой навигацией и огромным количеством материалов.

Студ. Изба как крупнейший сборник работ для студентов
Тут дофига бывает всего полезного. Печально, что бывают предметы по которым даже одного бесплатного решения нет, но это скорее вопрос к студентам. В остальном всё здорово.

Спасательный островок
Если уже не успеваешь разобраться или застрял на каком-то задание поможет тебе быстро и недорого решить твою проблему.

Всё и так отлично
Всё очень удобно. Особенно круто, что есть система бонусов и можно выводить остатки денег. Очень много качественных бесплатных файлов.

Отзыв о системе "Студизба"
Отличная платформа для распространения работ, востребованных студентами. Хорошо налаженная и качественная работа сайта, огромная база заданий и аудитория.

Отличный помощник
Отличный сайт с кучей полезных файлов, позволяющий найти много методичек / учебников / отзывов о вузах и преподователях.

Отлично помогает студентам в любой момент для решения трудных и незамедлительных задач
Хотелось бы больше конкретной информации о преподавателях. А так в принципе хороший сайт, всегда им пользуюсь и ни разу не было желания прекратить. Хороший сайт для помощи студентам, удобный и приятный интерфейс. Из недостатков можно выделить только отсутствия небольшого количества файлов.

Спасибо за шикарный сайт
Великолепный сайт на котором студент за не большие деньги может найти помощь с дз, проектами курсовыми, лабораторными, а также узнать отзывы на преподавателей и бесплатно скачать пособия.