1994. Compiler Transformations for High-Perforamce Computing (798436), страница 6

Файл №798436 1994. Compiler Transformations for High-Perforamce Computing (1994. Compiler Transformations for High-Perforamce Computing) 6 страница1994. Compiler Transformations for High-Perforamce Computing (798436) страница 62019-09-182019-09-18СтудИзба

1994. Compiler Transformations for High-Perforamce Computing

Просмтор этого файла доступен только зарегистрированным пользователям. Но у нас супер быстрая регистрация: достаточно только электронной почты!

Регистрация/авторизация

Текст из файла (страница 6)

Note that,as withloop-invariantcode motion,if there is any chance thatthe conditionevaluationwillcause anexception,it must be guardedby a testthat the loop will be executed.In a loop nest where the inner loop hasunknownbounds,if code is generatedstraightforwardlythere will be a test before the body of the inner loop to determine whetherit shouldbe executedatall. The test for the innerloop will berepeatedevery time the outer loop is executed.

When the compileruses an intermediaterepresentationof the programthat exposes the test explicitly,unswitching can be appliedto move the test outside of the outer loop. The RS/6000XLC/Fortrancompileruses unswitchingforthis purpose [0’Brienet al. 1990].6.2moregeneraltermreferringto anytransfo~mationthatmoves a- comput~tion to an earlierpoint in the program[Aho et al. 1986].

Whileloop-invariantcode motionis one particularlycommonformof hoisting,thereare others:anexpressioncan be evaluatedearliertoreduce registerpressureor avoid arithmetic unit latency,or a load instructionmightbe movedupwardto reducetheeffect of memoryaccess latency.TransformationsLoopRerxderingIn this section we describetransformations thatchangethe relativeorder ofexecutionof the iterationsof a loop nestor nests. These transformationsare primarilyused to expose parallelismandimprovememorylocality.Some compilersonly apply reorderingtransformationsto perfect loop nests (seeSection 6.2.7). To increase the opportunities for optimization,such a compilercansometimesapply loop distributionto extract perfect loop nests from an imperfectnesting.The compilerdetermineswhetheraloop can be executed in parallelby examiningthe loop-carrieddependence.Theobvious case is when all the dependencedistancesfor the loop are O (direction=), meaningthatthereare no dependencecarriedacross iterationsby theloop.

Figuren(a) is an example;the distance vector for the loop is (O, 1), so theouter loop is parallelizable.ACM ComputingSurveys,Vol. 26, No. 4, December1994362Daviddoi=l,j]= a[i,+ cdo ia[*]1024* stride,= a[i]stride+ cend dodoi=i,loop is parallelizable(a) loop withndoj=l,I etride\ CacheMissesna[i,j]= a[i-l,j]+ a[i-l,j+l]dovaryingITLBstrideIMissesRelative\Speed (%)16421002128483do(b)innerloopDependenceis parallelizable.conditionsforparallelizingloops.More generally,the pth loop in a loopnest is parallelizableif for every distancevector V=(vl,. .

..vP. v~), v~),up =ov3q<p:vq>o.In Figure1 l(b),thedistancevectorsare {(1, 0), (1, – l)}, so the innerloop isparallelizable.Bothreferenceson theright-handside of the expressionreadelementsof a from row i — 1, which waswrittenduringthe previousiterationofthe outer loop. Thereforethe elementsofrow i may be calculatedand writteninany order.Loop interchangeexchangesthe positionof two loops in a perfect loop nest, generally movingone of the outer loops to theinnermostposition[Allenand Kennedy1984; 1987; Wolfe 1989b]. Interchangeisone of the most powerfultransformationsand can improveperformancein manyways.Loop interchangemay be performedto:eenablevectorizationan inner, dependentindependentloop;*improvevectorizationby movingtheindependentloop with the largest rangeinto the innermostposition;ebyimproveparallelperformanceing an in-depende-ntloop outwa~dComputingSurveys,by interchangingloop with an outer,Volmov-in a26, No 4, December1994tI116 I64 I1024 I1024 I32 I128 I2319256 I102451212512 I102410248(b) effect of strideFigure 12.

Predictedeffect of stride on performance of an IBM RS/6000for the above loop.Array elements are double precision (8 bytes): missrates are per 1024 iterations.Beyond stride 16,TLB misses dominate [IBM 1992],nest to increase the granularityofeach iterationand reduce the numberof barriersynchronizations;loop. reduce●6.2.1 Loop InterchangeACM= 1,a[i](a) outerFigure Il.j-1]dodoendprecisiondoublena[i,endendet al.ndoj=2,endF. Baconstride,ideallyto stride1; andincreasethe numberof loop-invariantexpressionsin the inner loop.Care must be taken that these benefitsdo not canceleach otherout, For instance,an interchangethatimprovesregisterreuse may change a stride-1access patternto a stride-naccess patternwith much lower overall performancedueto increasedcache misses.Figure12demonstratesthe dramaticeffect of different strides on an IBM RS /6000.In Figure13(a), the inner l’oop accessesarray a with stride n (recall that we areassumingthe Fortranconventionof column-major storageorder).Byinterchangingthe loops, we convert the innerloopto stride-1access,as showninFigure13(b).For a large array in whichless thanone columnfits in the cache, this opti-Compilerdo i= l,ndo jdoi=2,= l,ntotalTransformationsdo j[i]= total[il+ a[i,jlnj]= a[i-l,j+i]end doend doend do(a) original(a)loop nestJJdo j= l,ndo i = i,ntotal363= 1, n-1a[i,end do●[i]123123= total[il+ a[i,jlend doenddo(b) interchangedFigure 13.loop nestLoop interchange,(b)(c)Figure 14.

Original loop (a); original traversalder (b); traversal order after interchange (c).mizationreducesthe numberof cachemisses on a from n 2 to n 2 X elementsizeilinesize,or n 2/ 16 with4-byte elementsand the 64-byte lines of S-DLX. However,the originalloopallowstotal [i] to beeliminating theplacedin a register,load\ store operationsin the innerloop(see scalar replacementin Section 6.5.4).So the optimizedversionincreasesthenumberof load/storeoperationsfor totalfrom 2n to 2n 2. If a fits in the cache, theoriginalloop is better.On a vectorarchitecture,the transformedloopenablesvectorizationbyeliminatingthe dependenceon total [i] inthe inner loop.Interchangingloops is legal when thealtereddependenceare legal and whenthe loop boundscan be switched.If twoloops p and q in a perfect loop nest of dloops are interchanged,each dependencevectorV=(ul,.

. ..up. vq, ,,u~). .,u~)intheoriginalloopnestbecomesV’ =(vi,...,u~,UP,,,v~)..,v~)in the transformedloop nest. If V’ is lexicographicallypositive,thenthedependencerelationshipsof the originalloop are satisfied.A loop nest of just two loops can beinterchangedunless it has a dependencevector of the form ( < , >). Figure14(a)shows the loop nest with the dependence(1, – 1), givingrise to the loop-carrieddependenceshown in Figure14(b). Theorderin whichthe iterationsare exe-or-cutedis shownby thedottedline.The traversalorder after interchangeisshownin Figure14(c): some iterationsare executedbefore iterationsthat theydepend on, so the interchangeis illegal.Switchingthe loop bounds is straightforwardwhen the iterationspace is rectangular,as in the loop nest in Figure13.In this case the bounds of the inner loopare independentof the indices of the containingloop, and the two can simplybeexchanged.When the iterationspace isnot rectangular,computingthe bounds ismore complex.Triangularspaces are often used by programmers,and trapezoidalspacesare introducedby loopskewing(discussedin the next section).Furthertechniquesarenecessarytomanageimperfectlynestedloops.

Someof the variationsare discussedin detailby Wolfe[ 1989b]and Wolfand Lam[1991].6.2.2 Loop SkewingLoop skewingis an enablingtransformation that is primarilyuseful in combinationwithloopinterchange[Lamport1974;Muraoka1971;Wolfe1989b].Skewingwas inventedto handlewavefront computations,so called because theupdatesto the arraypropagatelike awave across the iterationspace.ACMComputingSurveys,Vol. 26, No.

4, December1994364“doI=David2,F. Baconet al.dex variablesto compensate,skewingdoes not change the meaningof the program and is always legal.The originalloop nest in Figure15(a)can be interchanged,but neitherloop canbe parallelized,because there is a dependence on both the inner loop (O, 1) and onthe outer loop (1, O).The result when f = 1 is shown in Figure15(c–d).The transformedcode isequivalentto the original,but the effecton the iterationspace is to alignthediagonalwave frontsof the originalloopnest so that for a given valueof j, alliterationsin i can be executed in parallel.To expose this parallelism,the skewedloop nest must also be interchanged.After skew and interchange,the loop nesthas distancevectors{(1, O), (1, 1)}. Thefirst dependenceallows the inner loop tobe parallelizedbecause the corresponding dependencedistanceis O.

The seconddependenceallowsthe innerloop to beparallelizedbecause it is a dependenceon previousiterationsof the outer loop.Skewingcan expose parallelismfor anest of two loops witha set of distancevectors {Vk } only ifn-1do J = 2, m-1a[l, Jl = (a[~-t,Jl+ a[i,j-11+a[l+l,jl+ a[l,J+ll)/4end doend do(a) ongmalcode(b) originaldo1 = 2,do Idependencespace{(1, O), (O, 1)}(c) skewed spacen-1= i+2,a[l,~-11i+m-1= (a[l-i,a[l+l,j-d+ a[i,j-1-ii+j-11+ a[l,J+l-ll)/4end doend do(d) skewed codedoI= 4,{(1, 1), (O, 1)}n+n-2do 1 = max(2,a[i,dependencej-dJ-m+l),min(n-l,= (a[i-l,a[i+l,~-2)j-d+ a[l,j-1-1]+j-ii+ a[l,]+l-d)/4end doenddo(e) skewedand interchangedcodedependence{(1,0),(1,1)}Figure 15.Loop skewing.In Figure15(a) we showa typicalwavefrontcomputation,Each elementiscomputedby averagingits four nearestneighbors,Whileneitherof the loops isparallelizablein their originalform, eacharraydiagonal(“ wavefront”)couldbecomputedin parallel.The iterationspaceand dependenceare shownm Figure15(b), with the dotted lines indicatingthewave fronts.Skewingis performedby addingtheouterloop indexmultipliedby a skewfactor,f, to the bounds of the inner iteration variable,and then subtractingthesame quantityfrom every use of the inner iterationvariableinside the loop.

Because it alters the loop bounds but thenalters the uses of the correspondingin-ACMComputmgSurveys,Vol26, No4, December1994distanceWhen skewingby f, the originalvector (vi, vz) becomes (rJl, fvl + Pz). Forany dependencewhere Uz < 0, the goal isf suchthatful + V2 > 1. Theto findcorrect skew factor f is computedby taking the maximumof f, = [(1 – u12)/u,llover all the dependence[Kennedyet al.1993].Interchangingof skewed loops is complicatedby the fact thattheirboundsdepend on the iterationvariablesof theenclosingloops.Fortwoloopswithil = 11, U1 and iz = lZ, Uz whereboundslZ and Uz are expressionsindependentofil, the skewed inner loop has boundsizinterchange,= fi ~ + 12, fi ~ + U2.

Характеристики

Тип файла

PDF-файл

Размер

6,02 Mb

Материал

1994. Compiler Transformations for High-Perforamce Computing

Тип материала

Статья

Предмет

Конструирование компиляторов

Высшее учебное заведение

МГУ им. Ломоносова

Список файлов статьи

1994.-compiler-transformations-for-high-perforamce-computing.rar

1994. Compiler Transformations for High-Perforamce Computing.pdf

Прочти меня!!!.txt

Поделитесь ссылкой:

Ставлю 10/10
Все нравится, очень удобный сайт, помогает в учебе. Кроме этого, можно заработать самому, выставляя готовые учебные материалы на продажу здесь. Рейтинги и отзывы на преподавателей очень помогают сориентироваться в начале нового семестра. Спасибо за такую функцию. Ставлю максимальную оценку.

Лучшая платформа для успешной сдачи сессии
Познакомился со СтудИзбой благодаря своему другу, очень нравится интерфейс, количество доступных файлов, цена, в общем, все прекрасно. Даже сам продаю какие-то свои работы.

Студизба ван лав ❤
Очень офигенный сайт для студентов. Много полезных учебных материалов. Пользуюсь студизбой с октября 2021 года. Серьёзных нареканий нет. Хотелось бы, что бы ввели подписочную модель и сделали материалы дешевле 300 рублей в рамках подписки бесплатными.

Отличный сайт
Лично меня всё устраивает - и покупка, и продажа; и цены, и возможность предпросмотра куска файла, и обилие бесплатных файлов (в подборках по авторам, читай, ВУЗам и факультетам). Есть определённые баги, но всё решаемо, да и администраторы реагируют в течение суток.

Маленький отзыв о большом помощнике!
Студизба спасает в те моменты, когда сроки горят, а работ накопилось достаточно. Довольно удобный сайт с простой навигацией и огромным количеством материалов.

Студ. Изба как крупнейший сборник работ для студентов
Тут дофига бывает всего полезного. Печально, что бывают предметы по которым даже одного бесплатного решения нет, но это скорее вопрос к студентам. В остальном всё здорово.

Спасательный островок
Если уже не успеваешь разобраться или застрял на каком-то задание поможет тебе быстро и недорого решить твою проблему.

Всё и так отлично
Всё очень удобно. Особенно круто, что есть система бонусов и можно выводить остатки денег. Очень много качественных бесплатных файлов.

Отзыв о системе "Студизба"
Отличная платформа для распространения работ, востребованных студентами. Хорошо налаженная и качественная работа сайта, огромная база заданий и аудитория.

Отличный помощник
Отличный сайт с кучей полезных файлов, позволяющий найти много методичек / учебников / отзывов о вузах и преподователях.

Отлично помогает студентам в любой момент для решения трудных и незамедлительных задач
Хотелось бы больше конкретной информации о преподавателях. А так в принципе хороший сайт, всегда им пользуюсь и ни разу не было желания прекратить. Хороший сайт для помощи студентам, удобный и приятный интерфейс. Из недостатков можно выделить только отсутствия небольшого количества файлов.

Спасибо за шикарный сайт
Великолепный сайт на котором студент за не большие деньги может найти помощь с дз, проектами курсовыми, лабораторными, а также узнать отзывы на преподавателей и бесплатно скачать пособия.