1994. Compiler Transformations for High-Performance Computing, page 10


Thus padding can be inserted between columns of an array (intraarray padding), or between arrays (interarray padding).

A further performance artifact, called cache miss jamming, can occur on machines that allow processing to continue during a cache miss: if cache misses are spread nonuniformly across the loop iterations, the asynchrony of the processor will not be exploited, and performance will be reduced. Jamming typically occurs when several arrays are accessed with the same stride and when all have the same alignment relative to cache line boundaries (that is, the same low address bits).
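As a rough illustration of why interarray padding helps, the following Python sketch counts how often a[i] and b[i] map to the same set of a direct-mapped cache. The cache geometry (64 lines of 32 bytes), the 8-byte element size, and the function names are assumptions chosen for the example, not values taken from the text.

```python
# Assumed direct-mapped cache: 64 lines of 32 bytes each (2 KiB in total).
LINE_BYTES = 32
NUM_SETS = 64
ELEM_BYTES = 8  # assumed element size


def cache_set(byte_address):
    """Return the cache set that a byte address maps to."""
    return (byte_address // LINE_BYTES) % NUM_SETS


def conflicting_iterations(base_a, base_b, n):
    """Count iterations i in which a[i] and b[i] fall into the same set."""
    return sum(
        cache_set(base_a + i * ELEM_BYTES) == cache_set(base_b + i * ELEM_BYTES)
        for i in range(n)
    )


n = 256                                  # 256 * 8 bytes = exactly one cache size
base_a = 0
unpadded_b = base_a + n * ELEM_BYTES     # b placed directly after a
padded_b = unpadded_b + LINE_BYTES       # one cache line of interarray padding

print(conflicting_iterations(base_a, unpadded_b, n))
print(conflicting_iterations(base_a, padded_b, n))
```

With these assumed parameters, the unpadded layout collides on every one of the 256 iterations, while a single line of padding between the arrays removes all of the collisions.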

Bacon et al. [1994] describe cache miss jamming in detail and present a unified framework for inter- and intraarray padding to handle set conflicts and jamming.

The disadvantages of padding are that it increases memory consumption and makes the subscript calculations for operations over the whole array more complex, since the array has “holes.” In particular, padding reduces the benefits of loop collapsing (see Section 6.3.4).

6.5.2 Scalar Expansion

Loops often contain variables that are used as temporaries within the loop body. Such variables will create an antidependence $S_2 \;\bar{\delta}_{(<)}\; S_1$ from one iteration to the next, and will have no other loop-carried dependence. Allocating one temporary for each iteration removes the dependence and makes the loop a candidate for parallelization [Padua et al. 1980; Wolfe 1989b], as shown in Figure 32. If the final value of c is used after the loop, c must be assigned the value of T[n].

    do i = 1, n
      c = b[i]
      a[i] = a[i] + c
    end do

    (a) original loop

    real T[n]
    do all i = 1, n
      T[i] = b[i]
      a[i] = a[i] + T[i]
    end do all

    (b) after scalar expansion

Figure 32. Scalar expansion.

Scalar expansion is a fundamental technique for vectorizing compilers, and was performed by the Burroughs Scientific Processor [Kuck and Stokes 1982] and Cray-1 [Russell 1978] compilers. An alternative for parallel machines is to use private variables, where each processor has its own instance of the variable; these may be introduced by the compiler (see Section 7.1.3) or, if the language supports private variables, by the programmer.

If the compiler vectorizes or parallelizes a loop, scalar expansion must be performed for any compiler-generated temporaries in a loop.
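The effect of scalar expansion can be checked on a small case. The sketch below transcribes the two loops of Figure 32 into Python; the function names and the `order` parameter, which stands in for the arbitrary iteration order of a `do all` loop, are assumptions made for the example. Python lists are 0-indexed, so the text's T[n] corresponds to the last element of `T`.

```python
def original_loop(a, b):
    """Figure 32(a): every iteration reuses the single scalar c."""
    a = list(a)
    c = None
    for i in range(len(a)):
        c = b[i]
        a[i] = a[i] + c
    return a, c


def expanded_loop(a, b, order):
    """Figure 32(b): c is expanded into T, one element per iteration.

    `order` is any permutation of the iteration indices; because each
    iteration touches only its own T[i], the result does not depend on it.
    """
    a = list(a)
    T = [None] * len(a)
    for i in order:
        T[i] = b[i]
        a[i] = a[i] + T[i]
    # If c is live after the loop, it takes the value of the last element of T.
    c = T[-1] if T else None
    return a, c


a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
print(original_loop(a, b))
print(expanded_loop(a, b, order=[3, 1, 0, 2]))
```

Both calls produce the same array and the same final value of c, even though the expanded loop runs its iterations out of order.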

To avoid creating unnecessarily large temporary arrays, a vectorizing compiler can perform scalar expansion after strip mining, expanding the temporary to the size of the vector strip.

Scalar expansion can also increase instruction-level parallelism by removing dependences.

6.5.3 Array Contraction

After transformation of a loop nest, it may be possible to contract scalars or arrays that have previously been expanded. It may also be possible to contract other arrays due to interchange or the use of redundant storage allocation by the programmer [Wolfe 1989b].

If the iteration variable of the pth loop in a loop nest is being used to index the kth dimension of an array x, then dimension k may be removed from x if (1) loop p is not parallel, (2) all distance vectors V involving x have $V_p = 0$, and (3) x is not used subsequently (that is, x is dead after the loop).
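The three conditions can be seen at work in a small Python sketch modelled on the loop nest of Figure 33. The outer loop over i is sequential, every use of T[i][j] happens in the same i iteration that wrote it, and T is not read after the nest, so the i dimension of T can be dropped. The function names and the test data are assumptions made for the example.

```python
def with_expanded_array(a, b):
    """Loop nest using an n-by-n temporary T."""
    n = len(a)
    b = [row[:] for row in b]
    T = [[0.0] * n for _ in range(n)]      # n * n temporaries
    for i in range(n):                      # sequential outer loop
        for j in range(n):                  # inner loop could run as a doall
            T[i][j] = a[i][j] * 3
            b[i][j] = T[i][j] + b[i][j] / T[i][j]
    return b


def with_contracted_array(a, b):
    """Same loop nest after removing the dimension of T indexed by i."""
    n = len(a)
    b = [row[:] for row in b]
    T = [0.0] * n                           # only n temporaries
    for i in range(n):
        for j in range(n):
            T[j] = a[i][j] * 3
            b[i][j] = T[j] + b[i][j] / T[j]
    return b


a = [[1, 2], [3, 4]]
b = [[3, 6], [9, 12]]
print(with_expanded_array(a, b))
print(with_contracted_array(a, b))
```

The two versions compute the same b, but the contracted one allocates n temporaries instead of n * n.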

Conditions (2) and (3) are true for compiler-expanded variables unless the loop structure of the program was changed after expansion. In particular, loop distribution can inhibit array contraction by causing the second condition to be violated.

ACM Computing Surveys, Vol. 26, No. 4, December 1994

    real T[n,n]
    do i = 1, n
      do all j = 1, n
        T[i,j] = a[i,j]*3
        b[i,j] = T[i,j] + b[i,j]/T[i,j]
      end do all
    end do

    (a) original code

    real T[n]
    do i = 1, n
      do all j = 1, n
        T[j] = a[i,j]*3
        b[i,j] = T[j] + b[i,j]/T[j]
      end do all
    end do

    (b) after array contraction

Figure 33. Array contraction.

Contraction reduces the amount of storage consumed by compiler-generated temporaries, as well as reducing the number of cache lines referenced. Other methods for reducing storage consumption by temporaries are strip mining (see Section 6.2.4) and dynamic allocation of temporaries, either from the heap or from a static block of memory reserved for temporaries.

6.5.4 Scalar Replacement

Even when it is not possible to contract an array into a scalar, a similar optimization can be performed when a frequently referenced array element is invariant within the innermost loop or loops. In this case, the array element can be loaded into a scalar (and presumably therefore a register) before the inner loop and, if it is modified, stored after the inner loop [Callahan et al. 1990].

Replacement multiplies Q for the array element by the number of iterations in the inner loop(s). It can also eliminate unnecessary subscript calculations, although that optimization is often done by loop-invariant code motion (see Section 6.1.3).

    do i = 1, n
      do j = 1, n
        total[i] = total[i] + a[i,j]
      end do
    end do

    (a) original loop nest

    do i = 1, n
      T = total[i]
      do j = 1, n
        T = T + a[i,j]
      end do
      total[i] = T
    end do

    (b) after scalar replacement

Figure 34. Scalar replacement.
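The load-before, store-after pattern described above can be sketched in Python as follows. The example only shows the change in access pattern; whether the scalar actually ends up in a register depends on the compiler and target, and the function names are assumptions made for the example.

```python
def row_sums_original(a, total):
    """Inner loop reads and writes total[i] on every iteration."""
    total = list(total)
    for i in range(len(a)):
        for j in range(len(a[i])):
            total[i] = total[i] + a[i][j]
    return total


def row_sums_scalar_replaced(a, total):
    """total[i] is invariant in the inner loop, so it is kept in a scalar."""
    total = list(total)
    for i in range(len(a)):
        t = total[i]                # load once before the inner loop
        for j in range(len(a[i])):
            t = t + a[i][j]         # no array reference to total here
        total[i] = t                # store once after the inner loop
    return total


a = [[1, 2, 3], [4, 5, 6]]
print(row_sums_original(a, [10, 20]))
print(row_sums_scalar_replaced(a, [10, 20]))
```

Both versions return the same totals; the second touches total[i] twice per outer iteration instead of twice per inner iteration.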

Loop interchange can be used to enable or improve scalar replacement; Carr [1993] examines the combination of scalar replacement, interchange, and unroll-and-jam in the context of cache optimization.

An example of scalar replacement is shown in Figure 34; for a discussion of the interaction between replacement and loop interchange, see Section 6.2.1.

6.5.5 Code Collocation

Code collocation improves memory access behavior by placing related code in close proximity. The earliest work rearranged code (often at the granularity of a procedure) to improve paging behavior [Ferrari 1976; Hatfield and Gerald 1971]. More recent strategies focus on improving cache behavior by placing the most frequent successor to a basic block (or the most frequent callee of a procedure) immediately adjacent to it in instruction memory [Hwu and Chang 1989; Pettis and Hansen 1990].

An estimate is made of the frequency with which each arc in the control flow graph will be traversed during program execution (using either profiling information or static estimates). Procedures are grouped together using a greedy algorithm that always takes the pair of procedures (or procedure groups) with the largest number of calls between them. Within a procedure, basic blocks can be grouped in the same way (although the direction of the control flow must be taken into account), or a top-down algorithm can be used that starts from the procedure entry node.
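The greedy grouping of procedures can be sketched as below. This is a deliberately simplified version under stated assumptions: merged groups are simply concatenated, whereas a production algorithm would also choose which ends of the two groups to join; the function name, the `call_counts` format, and the procedure names in the usage example are all hypothetical.

```python
def greedy_procedure_order(call_counts):
    """Return a layout order for procedures.

    call_counts maps (caller, callee) pairs to an estimated number of calls.
    """
    # Start with every procedure in its own group, keyed by a group name.
    groups = {}
    for caller, callee in call_counts:
        groups.setdefault(caller, [caller])
        groups.setdefault(callee, [callee])

    # weight[{g1, g2}] is the total number of calls between two groups.
    weight = {}
    for (caller, callee), count in call_counts.items():
        if caller != callee:
            key = frozenset((caller, callee))
            weight[key] = weight.get(key, 0) + count

    while weight:
        # Take the pair of groups with the most calls between them;
        # ties are broken by name so the result is deterministic.
        best = max(weight, key=lambda k: (weight[k], sorted(k)))
        g1, g2 = sorted(best)
        del weight[best]
        groups[g1] = groups[g1] + groups.pop(g2)

        # Edges that touched g2 now belong to the merged group g1.
        for key in [k for k in weight if g2 in k]:
            (other,) = key - {g2}
            count = weight.pop(key)
            merged = frozenset((g1, other))
            weight[merged] = weight.get(merged, 0) + count

    # Any groups never connected by calls are laid out one after another.
    return [proc for name in sorted(groups) for proc in groups[name]]


calls = {("main", "parse"): 10, ("main", "emit"): 4,
         ("parse", "lex"): 50, ("emit", "log"): 1}
print(greedy_procedure_order(calls))
```

In the usage example the heaviest arc is between `parse` and `lex`, so those two procedures are merged first and end up adjacent in the final order.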

Basic blocks with a frequency estimate of zero can be moved to a separate page to increase locality further. However, accessing that page may require long displacement jumps to be introduced (see the next subsection), creating the potential for performance loss if the basic blocks in question are actually executed.

Procedure inlining (see Section 6.8.5) can also affect code locality, and has been studied both in conjunction with [Hwu and Chang 1989] and independent of [McFarling 1991] code positioning. Inlining often improves performance by reducing overhead and increasing locality, but if a procedure is called more than once in a loop, inlining will often increase the number of cache misses because the procedure body will be loaded more than once.

6.5.6 Displacement Minimization

The target of a branch or a jump is usually specified relative to the current value of the program counter (PC).

The largest offset that can be specified varies among architectures; it can be as few as 4 bits. If control is transferred to a location outside of the range of the offset, a multi-instruction sequence or long-format instruction is required to perform the jump. For instance, the S-DLX instruction BEQZ R4, error is only legal if error is within 2^15 bytes. Otherwise, the instruction must be replaced with:

          BNEZ R4, cont        ; reversed test
          LI   R8, error       ; get low bits
          LUI  R8, error>>16   ; get high bits
          JR   R8              ; jump to target
    cont:

This sequence requires three extra instructions. Given the cost of long displacement jumps, the code should be organized to keep related sections close together in memory, in particular those sections executed most frequently [Szymanski 1978].

Displacement minimization can also be applied to data.
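The range check behind the choice between a short branch and the long sequence can be sketched as follows. The sketch assumes a signed two's-complement offset measured directly from the branch's own address, a 16-bit field (consistent with the 2^15-byte limit above), and three extra instructions for the long form; real instruction sets differ in these details, and the function names are hypothetical.

```python
def fits_in_signed_offset(displacement, bits):
    """True if displacement fits in a two's-complement field of `bits` bits."""
    limit = 1 << (bits - 1)
    return -limit <= displacement < limit


def branch_cost(pc, target, offset_bits=16, long_form_extra=3):
    """Number of instructions needed for a conditional branch from pc to target."""
    if fits_in_signed_offset(target - pc, offset_bits):
        return 1                      # short PC-relative branch suffices
    return 1 + long_form_extra        # reversed test plus long jump sequence


print(branch_cost(pc=0, target=2**15 - 1))
print(branch_cost(pc=0, target=2**15))
print(fits_in_signed_offset(8, bits=4))
```

A layout pass could sum `branch_cost` over all frequently executed branches to compare candidate code orderings.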

As a data example, a base register may be allocated for a Fortran common block or group of blocks:

    common /big/ q, r, x[20000], y, z

If the array x contains word-sized elements, the common block is larger than the amount of memory indexable by the offset field in the load instruction (2^16 bytes on S-DLX). To address y and z, multiple-instruction sequences must be used in a manner analogous to the long jump sequences above. The problem is avoided if the layout of big is:

    common /big/ q, r, y, z, x[20000]

6.6 Partial Evaluation

Partial evaluation refers to the general technique of performing part of a computation at compile time. Most of the classical optimizations based on data-flow analysis are either a form of partial evaluation or of redundancy elimination (described in Section 6.7). Loop-based data-flow optimizations are described in Section 6.1.
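As a minimal sketch of performing part of a computation at compile time, the following Python function folds every subexpression whose operands are already known and leaves the rest as a residual expression. The tuple-based expression format, the `OPS` table, and the function name are assumptions made for the example; it is not a description of any particular compiler.

```python
import operator

# Supported binary operators (an assumed, deliberately small set).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def partially_evaluate(expr, known):
    """Evaluate as much of expr as the compile-time values in `known` allow.

    expr is an int constant, a variable name (str), or a tuple
    (op, left, right). The result is an int if everything was known,
    otherwise a smaller residual expression in the same format.
    """
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return known.get(expr, expr)    # substitute if the value is known
    op, left, right = expr
    left = partially_evaluate(left, known)
    right = partially_evaluate(right, known)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)     # both operands known: fold now
    return (op, left, right)            # otherwise leave for run time


# n*4 + x*(n-1), with n known at compile time and x known only at run time.
expr = ("+", ("*", "n", 4), ("*", "x", ("-", "n", 1)))
print(partially_evaluate(expr, {"n": 8}))
print(partially_evaluate(expr, {"n": 8, "x": 2}))
```

With only n known, the two subexpressions that depend solely on n are folded to constants and the multiplication by x remains; with both values known, the whole expression folds to a single integer.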
