1994. Compiler Transformations for High-Performance Computing, page 17
The compiler focuses on computations that update array elements based on their neighbors. It finds the pattern of neighbors that are needed to compute a given element, called the stencil. The cross, a common stencil, represents a computation that updates each array element using the values of its neighbors to the north, east, west, and south. Stencils are aggregated into larger patterns, called multistencils, using optimizations analogous to loop unrolling and strip mining. The multistencils are mapped to the hardware so as to minimize communication.

SIMD compilers developed at Compass [Knobe and Natarajan 1993; Knobe et al. 1988; 1990] construct a preference graph based on the computations being performed. The graph represents alignment requests that would yield the least communication overhead. The compiler satisfies every request when possible, using a greedy algorithm to find a solution when preferences are in conflict.

ACM Computing Surveys, Vol. 26, No. 4, December 1994. 402 · David F. Bacon et al.

Wholey [1992] begins by computing a detailed cost function for the computation. The function is the input to a hill-climbing algorithm that searches the space of possible allocations of data to processors. It begins with two processors, then four, and so forth; each time the number of processors is doubled, the algorithm uses the most efficient allocation from the previous stage as a starting point.

In addition to general-purpose mapping algorithms, the compiler can use specific transformations like loop flattening [von Hanxleden and Kennedy 1992] to avoid idle time due to nonuniform computations.

7.6 VLIW Transformations

The Very Long Instruction Word (VLIW) architecture is another strategy for executing instructions in parallel. A VLIW processor [Colwell et al.
1988; Fisher et al. 1984; Floating Point Systems 1979; Rau et al. 1989] exposes its multiple functional units explicitly; each machine instruction is very large (on the order of 512 bits) and controls many independent functional units. Typically the instruction is divided into 32-bit pieces, each of which is routed to one of the functional units.

With traditional processors, there is a clear distinction between low-level scheduling done via instruction selection and high-level scheduling between processors. This distinction is blurred in VLIW architectures. They rely on the compiler to expose the parallel operations explicitly and schedule them at compile time. Thus the task of generating object code incorporates high-level resource allocation and scheduling decisions that occur only between processors on a more conventional parallel architecture.

Trace scheduling [Ellis 1986; Fisher 1981; Fisher et al.
1984] is an analysis strategy that was developed for VLIW architectures. Because VLIW machines require more parallelism than is typically available within a basic block, compilers for these machines must look through branches to find additional instructions to execute. The multiple paths of control require the introduction of speculative execution: computing some results that may turn out to be unused (ideally at no additional cost, by functional units that would otherwise have gone unused). The speculation can be handled dynamically by the hardware [Sohi and Vajapeyam 1990], but that complicates the design significantly. The trace-scheduling alternative is to have the compiler do it by identifying paths (or traces) through the control-flow graph that might be taken. The compiler guesses which way the branches are likely to go and hoists code above them accordingly.

Superblock scheduling [Hwu et al.
1993] is an extension of trace scheduling that relies on program profile information to choose the traces.

8. TRANSFORMATION FRAMEWORKS

Given the many transformations that compiler writers have available, they face a daunting task in determining which ones to apply and in what order. There is no single best order of application; one transformation can permit or prevent a second from being applied, or it can change the effectiveness of subsequent changes. Current compilers generally use a combination of heuristics and a partially fixed order of applying transformations.

There are two basic approaches to this problem: unifying the transformations in a single mechanism, and applying search techniques to the transformation space. In fact there is often a degree of overlap between these two approaches.

8.1 Unified Transformation

A promising strategy is to encode both the characteristics of the code being transformed and the effect of each transformation; then the compiler can quickly search the space of possible sets of transformations to find an efficient solution. One framework that is being actively investigated is based on unimodular matrix theory [Banerjee 1991; Wolf and Lam 1991].
It is applicable to any loop nest whose dependences can be described with a distance vector; a subset of the loops that require a direction vector can also be handled. The transformations that a unimodular matrix can describe are interchange, reversal, and skew.

The basic principle is to encode each transformation of the loop in a matrix and apply it to the dependence vectors of the loop. The effect on the dependence pattern of applying the transformation can be determined by multiplying the matrix and the vector.
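To make the multiply-and-check step concrete, here is a small Python sketch (our illustration, not code from the survey; the helper names are ours): a transformation is legal when every transformed distance vector remains lexicographically positive, i.e., its first nonzero entry is positive.

```python
# Sketch (not from the survey): checking legality of unimodular loop
# transformations by multiplying the matrix into each distance vector.

def lex_positive(v):
    """True if the first nonzero entry of v is positive."""
    for x in v:
        if x != 0:
            return x > 0
    return False

def transform(T, v):
    """Multiply transformation matrix T by distance vector v."""
    return [sum(T[i][j] * v[j] for j in range(len(v))) for i in range(len(T))]

def legal(T, dist_vectors):
    """T is legal if every transformed distance vector stays
    lexicographically positive."""
    return all(lex_positive(transform(T, d)) for d in dist_vectors)

D = [(1, 0)]             # dependence of iteration i on i-1 in the outer loop
R = [[-1, 0], [0, 1]]    # reversal of the outer loop
I = [[0, 1], [1, 0]]     # interchange of the two loops

print(legal(R, D))       # False: reversal is not legal
print(legal(I, D))       # True: interchange is legal
```

The sketch mirrors the framework's structure: each candidate transformation is a matrix, and legality is a purely mechanical check on the transformed dependence vectors.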
The form of the product vector reveals whether the transformation is valid. To model the effect of applying a sequence of transformations, the corresponding matrices are simply multiplied.

Figure 60 shows a loop nest that can be transformed with unimodular matrices:

    do i = 2, 10
      do j = 1, 10
        a[i, j] = a[i-1, j] + a[i, j]
      end do
    end do

    Figure 60. Unimodular transformation example.

The distance vector that describes the loop is D = (1, 0), representing the dependence of iteration i on i-1 in the outer loop.

Because of the dependence, it is not legal to reverse the outer loop of the nest. The reversal transformation is

    R = [ -1  0 ]
        [  0  1 ]

The product is

    R D = P1 = ( -1 )
               (  0 )

This demonstrates that the transformation is not legal, because P1 is not lexicographically positive.

We can also test whether the two loops can be interchanged; the interchange transformation is

    I = [ 0  1 ]
        [ 1  0 ]

Applying I to D yields

    I D = P2 = ( 0 )
               ( 1 )

In this case, the resulting vector is lexicographically positive, showing that the transformation is legal.

Any loop nest whose dependences are all representable by a distance vector can be transformed via skewing into a canonical form called a fully permutable loop nest. In this form, any two loops in the nest can be interchanged without changing the loop semantics. Once in this canonical form, the compiler can decompose the loop into the granularity that matches the target architecture [Wolf and Lam 1991].

Sarkar and Thekkath [1992] describe a framework for transforming perfect loop nests that includes unimodular transformations, tiling, coalescing, and parallel loop execution. The transformations are encoded in an ordered sequence. Rules are provided for mapping the dependence vectors and loop bounds, and the transformation to the loop body is described by a template.

Pugh [1991] describes a more ambitious (and time-consuming) technique that can transform imperfectly nested loops and can do most of the transformations possible through a combination of statement reordering, interchange, fusion, skewing, reversal, distribution, and parallelization. It views the transformation problem as that of finding the best schedule for a set of operations in a loop nest.
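The rule that a sequence of transformations is modeled by the product of their matrices can likewise be sketched in Python (again our illustration, with hypothetical helper names, continuing the Figure 60 nest): reversing the outer loop is illegal on its own, but interchanging first and then reversing is legal, and the combined effect is captured by a single matrix product.

```python
# Sketch (ours, not from the survey): a sequence of unimodular
# transformations is modeled by multiplying their matrices.

def matmul(A, B):
    """Product of two square matrices (applying B first, then A)."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matvec(T, v):
    return [sum(T[i][j] * v[j] for j in range(len(v))) for i in range(len(T))]

def lex_positive(v):
    for x in v:
        if x != 0:
            return x > 0
    return False

D = [1, 0]               # distance vector of the Figure 60 loop nest
I = [[0, 1], [1, 0]]     # interchange
R = [[-1, 0], [0, 1]]    # reversal of the outer loop

combined = matmul(R, I)  # interchange first, then reverse the outer loop
print(lex_positive(matvec(R, D)))         # False: reversal alone is illegal
print(lex_positive(matvec(combined, D)))  # True: the sequence is legal
```

This is why the framework can search sequences of transformations cheaply: testing a whole sequence costs one matrix product plus one vector check per dependence.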
A method is given for generating and testing candidate schedules.

8.2 Searching the Transformation Space

Wang [1991] and Wang and Gannon [1989] describe a parallelization system that uses heuristic search techniques from artificial intelligence to find a program transformation sequence. The target machine is represented by a set of features that describe the type, size, and speed of the processors, memory, and interconnect. The heuristics are organized hierarchically. The main functions are: description of parallelism in the program and in the machine; matching of program parallelism to machine parallelism; and control of restructuring.

9. COMPILER EVALUATION

Researchers are still trying to find a good way to evaluate the effectiveness of compilers. There is no generally agreed upon way to determine the best possible performance of a particular program on a particular machine, so it is difficult to determine how well a compiler is doing. Since some applications are better structured than others for a given architecture or a given compiler, measurements for a particular application or group of applications will not necessarily predict how well another application will fare.

Nevertheless, a wide variety of measurement studies do exist that seek to evaluate how applications behave, how well they are being compiled, and how well they could be compiled. We have divided these studies into several groups.

9.1 Benchmarks

Benchmarks have received by far the most attention since they measure delivered performance, yielding results that are used to market machines. They were originally developed to measure machine speed, not compiler effectiveness. The Livermore Loops [McMahon 1986] is one of the early benchmark suites; it sought to compare the performance of supercomputers.
The suite consists of a set of small loops based on the most time-consuming inner loops of scientific codes.

The SPEC benchmark suite [Dixit 1992] includes both scientific and general-purpose applications intended to be representative of an engineering/scientific workload. The SPEC benchmarks are widely used as indicators of machine performance, but are essentially uniprocessor benchmarks.

As architectures have become more complex, it has become obvious that the benchmarks measure the combined effectiveness of the compiler and the target machine. Thus two compilers that target the same machine can be compared by using the SPEC ratings of the generated code. For parallel machines, the contribution of the compiler is even more important, since the difference between naive and optimized code can be many orders of magnitude. Parallel benchmarks include SPLASH (Stanford Parallel Applications for Shared Memory) [Singh et al.
1992] and the NASA Numerical Aerodynamic Simulation (NAS) benchmarks [Bailey et al. 1991]. The Perfect Club [Berry et al. 1989] is a benchmark suite of computationally intensive programs that is intended to help evaluate serial and parallel machines.

9.2 Code Characteristics

A number of studies have focused on the applications themselves, examining their source code for programmer idioms or profiling the behavior of the compiled executable.

Knuth [1971] carried out an early and influential study of Fortran programs. He studied 440 programs comprising 250,000 lines (punched cards). The most important effect of this study was to dramatize the fact that the majority of the execution time of a program is usually spent in a very small proportion of the code. Other interesting statistics are that 95% of all the do loops incremented their index variable by 1, and 40% of all do loops contained only one statement.

Shen et al.
[1990] examined the subscript expressions that appeared in a set of Fortran mathematical libraries and scientific applications. They applied various dependence tests to these expressions. The results demonstrate how the tests compare to one another, showing that one of the biggest problems was unknown variables. These variables were caused by procedure calls and by the use of an array element as an index value into another array. Coupled subscripts also caused problems for tests that examine a single array dimension at a time (coupled subscripts are discussed in Section 5.4).

A study by Petersen and Padua [1993] evaluates approximate dependence tests against the omega test [Pugh 1992], finding that the latter does not expose substantially more parallelism in the Perfect Club benchmarks.

9.3 Compiler Effectiveness

As we mentioned above, researchers find it difficult to evaluate how well a compiler is doing.
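As background for the dependence tests discussed above, here is a minimal sketch (ours, not from the survey) of one of the simplest approximate tests, the GCD test: for references a[x*i + k1] and a[y*i + k2], the equation x*i1 + k1 = y*i2 + k2 has an integer solution only if gcd(x, y) divides k2 - k1.

```python
# Sketch (not from the survey): the GCD dependence test. It is
# conservative: it ignores loop bounds, so a True answer means only
# that a dependence MAY exist, while False proves independence.

from math import gcd

def gcd_test(x, k1, y, k2):
    """May a[x*i + k1] and a[y*i + k2] refer to the same element?"""
    return (k2 - k1) % gcd(x, y) == 0

# a[2*i] vs. a[2*i + 1]: even vs. odd indices never overlap.
print(gcd_test(2, 0, 2, 1))   # False: provably independent
# a[i] vs. a[i - 1]: a loop-carried dependence may exist.
print(gcd_test(1, 0, 1, -1))  # True: possible dependence
```

Exact tests such as the omega test answer "maybe" far less often, which is precisely what the Petersen and Padua study quantifies on the Perfect Club benchmarks.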