1994. Compiler Transformations for High-Perforamce Computing, страница 15

PDF-файл 1994. Compiler Transformations for High-Perforamce Computing, страница 15 Конструирование компиляторов (53101): Статья - 7 семестр1994. Compiler Transformations for High-Perforamce Computing: Конструирование компиляторов - PDF, страница 15 (53101) - СтудИзба2019-09-182019-09-18zzyxelСтудИзба

1994. Compiler Transformations for High-Perforamce Computing399

Описание файла

PDF-файл из архива "1994. Compiler Transformations for High-Perforamce Computing", который расположен в категории "". Всё это находится в предмете "конструирование компиляторов" из 7 семестр, которые можно найти в файловом архиве МГУ им. Ломоносова. Не смотря на прямую связь этого архива с МГУ им. Ломоносова, его также можно найти и в других разделах. .

Просмотр PDF-файла онлайн

Текст 15 страницы из PDF

The transformationis safe if thereare no loop-carrieddependenceinvolv-ACMComputingSurveys,Vol. 26, No. 4, December1994396David●F. Bacondoi=l,et al.nc = b[i]a[i]endFigure 57.Scalar= a[i]+ cdovaluesuitablefor privatization.ing the scalar; before the scalar is usedin the loop body, its value is always updated withinthe same iteration.Figure57 shows a loop that could benefit from privatization.The scalar value cis simplya scratch value; if it is privatized, the loop is revealedto have inde~endentiterations.If thearravsare.allocatedto separateprocessorsbeforethe loop is executed,no furthercommunicationis necessary.In this very simpleexample the variablecould also have beeneliminatedentirely,but in more complexloops temporaryvariablesare repeatedlyupdatedand referenced.Scalarprivatizationis analogoustoscalarexpansion(see Section6.5.2); inbothcases, spuriousdependenceareeliminatedby replicatingthe variable.The transformationis widelyincorporated into parallelcompilers[Allenet al.1988b; Cytronand Ferrante1987].7.1.4 Array PrivatizationArrayprivatizationis similarto scalarprivatization;each processoris given itsown copy of the entire array.

The effect ofprivatizingan array is to reduce communicationrequirementsandto exposeadditionalparallelismby removingunnecessary dependencecaused by storagereuse.Parallelizationstudiesof real application programshave found that privatization is necessaryfor highperformance[Eigenmannet al.1991;SinghandHennessy1991]. Despiteits importance,verifyingthat privatizationof an array islegalis muchmoredifficultthantheequivalenttest on a scalar value.Traditionally,data-flowanalysis[Muchnickand Jones 1981] treats an entire array as a single value. This lack ofprecisionpreventsmost loop optimiza-ACMComputmgSurveys,Vol.

26, No. 4, December1994tiondiscussedin this survey and led tothe developmentof dependenceanalysis.Feautrier[ 1988;1991]extendsdependence analysisto supportarrayexpansion, essentiallythe same optimizationas privatization.The analysisis expensive because it requiresthe use of parametricintegerprogramming.Otherprivatizationresearchhasextendeddata-flowanalysisto consider the behavior of individualarray elements[Li 1992;Maydanet al. 1993; Tu and Padua 1993].In additionto determiningwhetherprivatizationis legal, the compilermustalso check whetherthe values that areleft in an array are used by subsequentcomputations.If so, afterthe arrayisprivatizedthe compilermust add code topreserve those final values by performinga copy-outoperationfromthe privatecopies to a global array.7.1.5Cache AlignmentOn shared-memoryprocessorslike sMX,when one processor modifiesstorage, thecache line in which it resides is flushedby any other processorsholdinga copy.The flush is necessaryto ensure propersynchronizationwhenmultipleprocessors are updatinga single sharedvariable.

If several .m-ocessors are continually .updatingdifferentvariablesor array elements that reside in the same cache line,however.the constantflushingof cachelines may severely degrade pe~formance.This problemis called fake sharing,Whenfalsesharingoccursbetween~arallellootI iterationsaccessingdiffer;ntcolumn’sof an arrayor ~ifferentstructures,falsesharingcan be addressed by aligningeach of the columnsor structuresto a cache line boundarv.False sharingof scalars can be address~dby insertingpaddingbetweenthemsothat each shared scalar resides in its owncache line (paddingis discussedin Section 6.5. 1). A furtheroptimizationis toplace sharedvariables‘and theircorrespondinglock variablesin the same cacheline, which will cause the lock acquisitionto act as a prefetchof the data [Torrellaset al.

1994],Compiler7.2ExposingCoarse-GrainedParallelismTransferringdata can be an expensiveoperationon a machinewithasynchronouslyexecutingprocessors. One wayto avoid the inefficiencyof frequentcommunicationis to divide the programintosubcomputationsthat will execute for anextendedperiod of time withoutgenerating messagetraffic.The followingoptimizationare designedto identifysuchcomputationsor to helpthe compilercreate them.7.2.1ProcedureCall ParallelizationOne potentialsourceof coarse-grainedparallelismin imperativelanguagesisthe procedurecall.

In order to determinewhethera call can be performedas anindependenttask, the compilermust useinterproceduralanalysisto examineitsbehavior.Because of its importanceto all formsof optimization,interproceduralanalysishas receiveda greatdeal of attentionfrom compilerresearchers.One approachis to propagatedata-flowand dependenceinformationthroughcall sites [Banning1979; Burkeand Cytron1986; Callahanet al.

1986; Myers1981]. Anotheris tosummarizethe array access behaviorof aprocedureand use that summaryto evaluate whethera call to it can be executedin parallel[Balasundaram1990; Callahan and Kennedy1988a;Li and Yew1988; Trioletet al, 1986].7.2.2SplitA more comprehensiveapproachto program decompositionsummarizesthe datausage behaviorof an arbitraryblock ofcode in a symbolic descriptor[Grahametal. 1993].

As in the previoussection, thesummaryidentifiesindependencebetween computations—betweenprocedurecalls or betweendifferentloop nests. Additionally,the descriptoris used in applying the split transformation.Split is unusualin that it is not applied in isolationto a computationlike aloop nest.

Instead,split considersthe relationshipof pairs of computations.Twoloop nests, for example,may have depen-Transformations*397dencebetweenthemthatpreventthecompilerfrom executingboth simultaneously on differentprocessors.However,itis often the case that only some of theiterationsconflict.Splituses memorysummarizationto identifythe iterationsin one loop nest that are independentofthe other one, placingthose iterationsina separateloop. Applyingsplit to an applicationexposesadditionalsourcesofconcurrencyand pipelining.7.2.3GraphPartitioningData-flowlanguages[Ackerman1982;McGraw1985; Nikhil1988] expose parallelismexplicitly.The programis converted into a graph that representsbasiccomputationsas nodes and the movementof dataas arcs betweennodes.Data-flowgraphs can also be used as aninternalrepresentationto expresstheparallelstructureof programswritteninimperativelanguages.Oneof themajordifficultieswithdata-flowlanguagesis that they exposeparallelismat the levelof individualarithmeticoperations.As it is impractical to use softwareto schedulesuch asmall amountof work, early projectsfocused on developingarchitecturesthatembed data-flowexecutionpoliciesintohardware[ArvindandCuller1986;Arvindet al.

1980; Dennis1980]. Suchmachineshave not proven to be successful commercially,so researchersbegan todeveloptechniquesfor compilingdataflow languageson conventionalarchitectures. The most commonapproachis tointerpretthe data-flowgraphdynamically,executinga node representingacomputationwhen all of its operandsareavailable.To reduceschedulingoverhead, the data-flowgraphis generallytransformedby gatheringmanysimplecomputationsinto larger blocks that areexecutedatomically[AndersonandHudak1990; Hudakand Goldberg1985;Mirchandaneyet al.

1988; Sarkar1989].7.3ComputationPartitioningIn additionto decomposingdata as discussed in Section 7.1, parallelcompilersACMComputingSurveys,Vol. 26, No. 4, December1994398David0F. Baconet al.must allocatethe computationsof a program to differentprocessors.Each processor will generallyexecute a restrictedset of iterationsfrom a particularloop.The transformationsin this section seekto enforcecorrectexecutionof the programwithoutintroducingunnecessaryoverhead.7.3.1 GuardIntroductiondoi=l,na[i]= a[i]+ cb[i]= b[i]+ cenddo(a)LBA=(n/Pnum)*PidUBA =(n/Pnurn)*LBB(n/Pnum)=dMXdoesnothaveaglobalname-space, so each processor’sinstanceof anarrayis separate.In Figure58(b), thecompilercould take advantageof the isolationof each processorto remapthearrayelements.Insteadof the originalarray of size n, each processorcould deeleclarea smallerarrayof n / Pnumments and modify the loop to iteratefrom1 to n / Pnum.

Althoughthe remappingwould simplifythe loop bounds,we pre-ACMComputmgSurveysjVol. 26, No. 4, December1994doi=l,(LBAa[i]if<=(LBB+ 1)and.<=i<= UBA)i<= UBB)+ ciand.= b[i]+ cparallelized=withguardsintroduced+ 1(n/Pnum)*(Piddoi=l,+ 1)nif(LBa[i]b[i]endendloop(n/Pnum)*PidUB =<=iand.i= a[i]+ c= b[i]+ c<= US)thenifdo(c)=afterguard(n/Pnum)*PidUB =combination+ 1(n/Pnum)*(Piddoi=LB,+ 1)UBa[i]= a[i]+ cb[i]= b[i]+ cend+ 1do(b)LBia[il=b[i]LB+ 1)*Pidnifendloop+ 1(Pid(n/Pnum)*(pidUBB =Aftera loop is parallelized,not all theprocessorswill computethe same iterations or send the same slices of data. Thecompilercan ensure that the correct computationsare performedby introducingconditionalstatements(knownasguards)into the program[CallahanandKennedy1988b].Figure58(a) shows an exampleof asequentialloop thatwe willtransformfor parallelexecution.Figare 58(b) showsthe parallelversionof the loop.

The codecomputesupperand lower boundsthatthe guardsuse to preventcomputationon arrayelementsthatare not storedlocallywithinthe processor.The boundsdepend on the size of the array, the number of processors,and the Pid of theprocessor.We have made a numberof simplifyingassumptions:the value of n is an evenmultipleof Pnum,and the arraysarealreadydistributedacross the processorswith a block decomposition.We have alsochosen an examplethat does not requireinterprocessorcommunication.To parallelize the loops encounteredin practice,compilersmustintroducecommunication statementsand much more complexguardandinductionvariableexpressions.originaldo(d)Figure58.afterboundsComputationreductionpartitioning.servetheoriginalelementindicestomaintaina closer relationshipbetweenthe sequentialcode and its transformedversion.7.3.2 RedundantGuardEliminationIn the same way that the compilercanuse code hoistingto optimizecomputation, it can reduce the numberof guardsCompilerby hoistingthemto the earliestpointwherethey can be correctlycomputed.Hoistingoftenrevealsthatidenticalguards have been introducedand that allbut one can be removed.In Figure58(b),the two guardexpressionsare identicalbecause the arrays have the same decomposition.When the guards are hoisted tothe beginningof the loop, the compilercan eliminatethe redundantguard.

Поделитесь ссылкой:

Ставлю 10/10
Все нравится, очень удобный сайт, помогает в учебе. Кроме этого, можно заработать самому, выставляя готовые учебные материалы на продажу здесь. Рейтинги и отзывы на преподавателей очень помогают сориентироваться в начале нового семестра. Спасибо за такую функцию. Ставлю максимальную оценку.

Лучшая платформа для успешной сдачи сессии
Познакомился со СтудИзбой благодаря своему другу, очень нравится интерфейс, количество доступных файлов, цена, в общем, все прекрасно. Даже сам продаю какие-то свои работы.

Студизба ван лав ❤
Очень офигенный сайт для студентов. Много полезных учебных материалов. Пользуюсь студизбой с октября 2021 года. Серьёзных нареканий нет. Хотелось бы, что бы ввели подписочную модель и сделали материалы дешевле 300 рублей в рамках подписки бесплатными.

Отличный сайт
Лично меня всё устраивает - и покупка, и продажа; и цены, и возможность предпросмотра куска файла, и обилие бесплатных файлов (в подборках по авторам, читай, ВУЗам и факультетам). Есть определённые баги, но всё решаемо, да и администраторы реагируют в течение суток.

Маленький отзыв о большом помощнике!
Студизба спасает в те моменты, когда сроки горят, а работ накопилось достаточно. Довольно удобный сайт с простой навигацией и огромным количеством материалов.

Студ. Изба как крупнейший сборник работ для студентов
Тут дофига бывает всего полезного. Печально, что бывают предметы по которым даже одного бесплатного решения нет, но это скорее вопрос к студентам. В остальном всё здорово.

Спасательный островок
Если уже не успеваешь разобраться или застрял на каком-то задание поможет тебе быстро и недорого решить твою проблему.

Всё и так отлично
Всё очень удобно. Особенно круто, что есть система бонусов и можно выводить остатки денег. Очень много качественных бесплатных файлов.

Отзыв о системе "Студизба"
Отличная платформа для распространения работ, востребованных студентами. Хорошо налаженная и качественная работа сайта, огромная база заданий и аудитория.

Отличный помощник
Отличный сайт с кучей полезных файлов, позволяющий найти много методичек / учебников / отзывов о вузах и преподователях.

Отлично помогает студентам в любой момент для решения трудных и незамедлительных задач
Хотелось бы больше конкретной информации о преподавателях. А так в принципе хороший сайт, всегда им пользуюсь и ни разу не было желания прекратить. Хороший сайт для помощи студентам, удобный и приятный интерфейс. Из недостатков можно выделить только отсутствия небольшого количества файлов.

Спасибо за шикарный сайт
Великолепный сайт на котором студент за не большие деньги может найти помощь с дз, проектами курсовыми, лабораторными, а также узнать отзывы на преподавателей и бесплатно скачать пособия.