1994. Compiler Transformations for High-Performance Computing, page 9
Bacon et al.

Provided that bitwise identical results are not required, the partial sums can be computed in parallel. Maximum parallelism is achieved by computing the reduction with a tree: pairs of elements are summed, then pairs of these results are summed, and so on. The number of serial steps is reduced from O(n) to O(log n). Operations such as and, or, min, and max are truly associative, and their reduction can be parallelized under all circumstances.

    do i = 1, n
      s = s + a[i]
    end do

(a) original loop

    real TS[64]
    TS[1:64] = 0.0
    do TI = 1, n, 64
      TS[1:64] = TS[1:64] + a[TI:TI+63]
    end do
    do TI = 1, 64
      s = s + TS[TI]
    end do

(b) transformed loop

Figure 29. Reduction recognition.

6.4.2 Loop Idiom Recognition

Parallel architectures often provide specialized hardware that the compiler can take advantage of. Frequently, for example, SIMD machines support reduction directly in the processor interconnection network. Some parallel machines, such as the Connection Machine CM-2 [Thinking Machines 1989], include hardware not only for reduction but for parallel prefix operations, allowing a loop with a body of the form a[i] = a[i-1] + a[i] to be parallelized. The parallelization of a more general class of linear recurrences is described by Chen and Kuck [1975], by Kuck [1977; 1978], and by Wolfe [1989b]. Blelloch [1989] describes compilation strategies for recognizing and exploiting parallel prefix operations.

The compiler can recognize and convert other idioms that are specially supported by hardware or software. For instance, the CMAX Fortran preprocessor converts loops implementing vector and matrix operations into calls to assembly-coded BLAS (Basic Linear Algebra Subroutines) [Sabot and Wholey 1993]. The VAX Fortran compiler converts string copy loops into block transfer instructions [Harris and Hobbs 1994].

6.4.3 Array Statement Scalarization

When a loop is expressed in array notation, the compiler can either convert it into vector operations or scalarize it into one or more serial loops that preserve the semantics and the programmer's intent, as described in Section 2.1.

ACM Computing Surveys, Vol. 26, No. 4, December 1994
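The tree reduction described in Section 6.4.1 can be sketched in C (our own code, not the paper's): each sweep sums pairs of partial results in place, and since the pair sums within one sweep are mutually independent, a parallel machine could perform each sweep in one step, giving O(log n) serial steps.

```c
#include <assert.h>

/* Tree reduction sketch: each sweep halves the number of partial
   sums.  The pair sums inside one sweep carry no dependence on one
   another, so they could execute in parallel; only the sweeps
   themselves are serial, O(log n) of them.  Destroys the contents
   of a[]. */
double tree_sum(double *a, int n) {
    for (int len = n; len > 1; len = (len + 1) / 2) {
        int half = (len + 1) / 2;
        for (int i = 0; i < len / 2; i++)   /* independent pair sums */
            a[i] = a[i] + a[i + half];
    }
    return a[0];
}
```

As in the text, this reordering is only valid when bitwise-identical floating-point results are not required; for truly associative operations such as and, or, min, and max it is always safe.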
However, the conversion is not completely straightforward, because the loops must be performed as if every value on the right-hand side and every subexpression were computed before any assignments are performed [Wolfe 1982b]. Figure 30 shows an example: the array notation in (a) specifies that every element of a is to be incremented by the value of the previous element. The "obvious" (but incorrect) serial conversion is shown in (b). The reason (b) is not correct is that in the original code every element of a is incremented by the previous value of the previous array element, whereas in the serial loop each a[i-1] has already been updated when it is read. The general solution is to introduce a temporary array T and to write the computed values back to the original array a in a separate loop, as shown in Figure 30(c). The temporary can then be eliminated if the two loops can legally be fused, namely, when there is no fusion-preventing dependence S2 δ S1 between the assignment to the temporary (S1) and the assignment back to the original array (S2).
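The hazard can be seen concretely in C (a sketch with our own function names, using 0-based indices in place of the Fortran-style ranges):

```c
#include <assert.h>

#define N 8

/* Incorrect scalarization of a[2:n-1] = a[2:n-1] + a[1:n-2]:
   each iteration reads a[i-1] AFTER it has been updated. */
void scalarize_wrong(double a[N]) {
    for (int i = 1; i < N; i++)
        a[i] = a[i] + a[i - 1];
}

/* Reference semantics: every right-hand-side value is taken from the
   original array, as array notation requires, via a temporary. */
void scalarize_temp(double a[N]) {
    double t[N];
    for (int i = 1; i < N; i++)
        t[i] = a[i] + a[i - 1];
    for (int i = 1; i < N; i++)
        a[i] = t[i];
}

/* Reversing the loop reads each a[i-1] before it is overwritten,
   so no temporary is needed. */
void scalarize_reversed(double a[N]) {
    for (int i = N - 1; i >= 1; i--)
        a[i] = a[i] + a[i - 1];
}
```

Starting from an array of all ones, the naive loop computes a running sum instead, while the temporary-array and reversed-loop versions agree with the array semantics.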
    a[2:n-1] = a[2:n-1] + a[1:n-2]

(a) initial array language expression

    do i = 2, n-1
      a[i] = a[i] + a[i-1]
    end do

(b) incorrect scalarization

    do i = 2, n-1
      T[i] = a[i] + a[i-1]
    end do
    do i = 2, n-1
      a[i] = T[i]
    end do

(c) correct scalarization using a temporary array T

    do i = n-1, 2, -1
      a[i] = a[i] + a[i-1]
    end do

(d) scalarization: reversing both loops eliminates the need for a temporary and allows fusion

    a[2:n-1] = a[2:n-1] + a[3:n]

(e) array expression requiring a temporary

Figure 30. Array statement scalarization.

In this case there is an antidependence, but it can be removed by reversing the loops, enabling fusion and eliminating the temporary, as shown in Figure 30(d). However, the array language statement in Figure 30(e) requires a temporary, since an antidependence exists regardless of the direction of the loop.

6.5 Memory Access Transformations

High-performance applications are as frequently memory limited as they are compute limited. In fact, for the last fifteen years CPU speeds have doubled every three to five years, while DRAM speeds have doubled about once every decade (DRAMs, or dynamic random access memory chips, are the low-cost, low-power memory chips used for main memory in most computers). As a result, optimization of the use of the memory system has become steadily more important. Factors affecting memory performance include:

- Reuse, denoted by Q and Qc, the ratio of uses of an item to the number of times it is loaded (described in Section 3);
- Parallelism. Vector machines often divide memory into banks, allowing vector registers to be loaded in a parallel or pipelined fashion. Superscalar machines often support double- or quadword load and store instructions;
- Working Set Size. If all the memory elements accessed inside of a loop do not fit in the data cache, then items that will be accessed in later iterations may be flushed, decreasing Qc. If more variables are simultaneously live than there are available registers (that is, the register pressure is high), then loads and stores will have to spill values into memory, decreasing Q. If more pages are accessed in a loop than there are entries in the TLB, then the TLB may thrash.

Since the registers are the top of the memory hierarchy, efficient register usage is absolutely crucial to high performance.
Until the late seventies, register allocation was considered the single most important problem in compiler optimization for which there was no adequate solution. The introduction of techniques based on graph coloring [Chaitin 1982; Chaitin et al. 1981; Chow and Hennessy 1990] yielded very efficient global (within a procedure) register allocation. Most compilers now make use of some variant of graph coloring in their register allocator.

In general, memory optimizations are inhibited by low-level coding that relies on particular storage layouts. Fortran EQUIVALENCE statements are the most obvious example. C and Fortran 77 both specify the storage layout for each type of declaration. Programmers relying on a particular storage layout may defeat a wide range of important optimizations.
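The graph-coloring idea can be illustrated with a minimal greedy sketch (our own simplification, not the cited algorithms): variables are nodes, simultaneous liveness is an edge, and each node receives the lowest "register" not already held by a colored neighbor.

```c
#include <assert.h>

#define MAXV 8

/* Greedy coloring of an interference graph.  interferes[u][v] is
   nonzero when variables u and v are simultaneously live and so may
   not share a register; on return, color[v] is the register assigned
   to v.  Returns the number of registers used. */
int color_graph(int n, int interferes[MAXV][MAXV], int color[MAXV]) {
    int max_color = 0;
    for (int v = 0; v < n; v++) {
        int used[MAXV] = {0};
        for (int u = 0; u < v; u++)   /* registers taken by colored neighbors */
            if (interferes[v][u])
                used[color[u]] = 1;
        int c = 0;
        while (used[c]) c++;          /* lowest free register */
        color[v] = c;
        if (c + 1 > max_color) max_color = c + 1;
    }
    return max_color;
}
```

Chaitin-style allocators additionally decide which variables to spill to memory when the graph cannot be colored with the available registers, and coalesce copies; those refinements are omitted here.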
Optimizations covered in other sections that also can improve memory system performance are loop interchange (6.2.1), loop tiling (6.2.6), loop unrolling (6.2.8), loop fusion (6.3.1), and various optimizations that eliminate register saves at procedure calls (6.8).

    real a[8,512]
    do i = 1, 512
      a[1,i] = a[1,i] + c
    end do

(a) original code

    real a[9,512]
    do i = 1, 512
      a[1,i] = a[1,i] + c
    end do

(b) padding a eliminates memory bank conflicts on V-DLX

Figure 31. Array padding.

6.5.1 Array Padding

Padding is a transformation whereby unused data locations are inserted between the columns of an array or between arrays.
Padding is used to ameliorate a number of memory system conflicts, in particular:

- bank conflicts on vector machines with banked memory [Burnett and Coffman Jr. 1970];
- cache set or TLB set conflicts, and cache miss jamming [Bacon et al. 1994];
- false sharing of cache lines on shared-memory multiprocessors.

Bank conflicts on vector machines like the Cray and our hypothetical V-DLX can be caused when the program indexes through an array dimension that is not laid out contiguously in memory, leading to a nonunit stride whose value we will define to be s. If s is an exact multiple of the number of banks (B), the bandwidth of the memory system will be reduced by a factor of B (8, on V-DLX) because all memory accesses are to the same bank. The number of memory banks is usually a power of two.

In general, if an array will be accessed with stride s, the array should be padded by the smallest p such that gcd(s + p, B) = 1.
This will ensure that B successive accesses with stride s + p will all address different banks. An example is shown in Figure 31(a): the loop accesses memory with stride 8, so all memory references will be to the first bank. After padding, successive iterations access memory with stride 9, so they go to successive banks, as shown in Figure 31(b).

Cached memory systems, especially those that are set-associative, are less sensitive to low power-of-two strides. However, large power-of-two strides will cause extremely poor performance due to cache set and TLB set conflicts. The problem is that addresses are mapped to sets simply by using a range of bits from the middle of the address. For instance, on S-DLX, the low 6 bits of an address are the byte offset within the line, and the next 8 bits are the cache set.
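The padding rule gcd(s + p, B) = 1 can be checked numerically (a sketch with our own helper names; B = 8 banks as on the hypothetical V-DLX):

```c
#include <assert.h>

/* Greatest common divisor, used to test the padding condition. */
static int gcd(int a, int b) {
    while (b != 0) { int t = a % b; a = b; b = t; }
    return a;
}

/* Smallest pad p such that gcd(s + p, B) = 1, so that B successive
   accesses with stride s + p fall into B different banks. */
int min_pad(int s, int banks) {
    int p = 0;
    while (gcd(s + p, banks) != 1) p++;
    return p;
}

/* Number of distinct banks touched by n successive accesses with the
   given stride, assuming words are interleaved across the banks
   (bank = address mod banks; at most 64 banks in this sketch). */
int banks_touched(int stride, int banks, int n) {
    int seen[64] = {0}, count = 0;
    for (int i = 0; i < n; i++) {
        int b = (i * stride) % banks;
        if (!seen[b]) { seen[b] = 1; count++; }
    }
    return count;
}
```

For Figure 31, s = 8 and B = 8 give p = 1: padding the leading dimension from 8 to 9 makes 8 successive accesses hit all 8 banks instead of one.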
Figure 12 shows the effect of stride on the performance of a cache-based superscalar machine.

A number of researchers have noted that other strides yield reduced cache performance. Bailey [1992] has quantified this by noting that because many words fit into a cache line, if the number of sets times the number of words per set is almost exactly divisible by the stride (say, by n ± e), then for a limited number of references r, every nth memory reference will map to the same set. If ⌈r/n⌉ is greater than the associativity of the cache, then the later references will evict the lines loaded by the earlier references, precluding reuse.

Set and bank conflicts can be caused by a bad stride over a single array, or by a loop that accesses multiple arrays that all align to the same set or bank.
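The S-DLX set mapping described above can be sketched as bit extraction (our own helpers; the 64-byte lines and 256 sets follow from the 6 offset bits and 8 set bits in the text). A power-of-two stride equal to the line size times the set count maps every access to the same set:

```c
#include <assert.h>
#include <stdint.h>

/* Cache set index on the hypothetical S-DLX: the low 6 bits of an
   address are the byte offset within a 64-byte line, and the next
   8 bits select one of 256 sets. */
unsigned cache_set(uint32_t addr) {
    return (addr >> 6) & 0xFF;
}

/* Number of distinct sets touched by n accesses with the given
   byte stride, starting at address 0. */
int sets_touched(uint32_t stride, int n) {
    int seen[256] = {0}, count = 0;
    for (int i = 0; i < n; i++) {
        unsigned s = cache_set((uint32_t)i * stride);
        if (!seen[s]) { seen[s] = 1; count++; }
    }
    return count;
}
```

With a stride of 64 x 256 = 16384 bytes, every reference lands in set 0, so any more references than the cache's associativity evict each other; a 64-byte stride cycles through all 256 sets.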