K. Cooper, L. Torczon - Engineering a Compiler (2011 - 2nd edition) (798440), page 101



As we shall see, the tools of optimization also play a large role in the compiler’s back end. For these reasons, it is important to introduce optimization and explore some of the issues that it raises before discussing the techniques used in a compiler’s back end.

8.2.1 Examples

To provide a focus for this discussion, we will begin by examining two examples in depth. The first, a simple two-dimensional array-address calculation, shows the role that knowledge and context play in the kind of code that the compiler can produce. The second, a loop nest from the routine dmxpy in the widely used LINPACK numerical library, provides insight into the transformation process itself and into the challenges that transformed code can present to the compiler.

Improving an Array-Address Calculation

Consider the IR that a compiler’s front end might generate for an array reference, such as m(i,j) in FORTRAN.

Without specific knowledge about m, i, and j, or the surrounding context, the compiler must generate the full expression for addressing a two-dimensional array stored in column-major order. In Chapter 7, we saw the calculation for row-major order; FORTRAN’s column-major order is similar:

$$@m + (j - \mathit{low}_2(m)) \times (\mathit{high}_1(m) - \mathit{low}_1(m) + 1) \times w + (i - \mathit{low}_1(m)) \times w$$

where @m is the runtime address of the first element of m, $\mathit{low}_i(m)$ and $\mathit{high}_i(m)$ are the lower and upper bounds, respectively, of m’s ith dimension, and w is the size of an element of m. The compiler’s ability to reduce the cost of that computation depends directly on its analysis of the code and the surrounding context.

If m is a local array with lower bounds of one in each dimension and known upper bounds, then the compiler can simplify the calculation to

$$@m + (j - 1) \times hw + (i - 1) \times w$$

where hw is $\mathit{high}_1(m) \times w$.
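The two address expressions above can be compared directly. The following is a minimal Python sketch, not compiler output; the function names and the example values (base address 1000, a 10 x 5 array, 8-byte elements) are assumptions chosen for illustration.

```python
def address_general(base, i, j, low1, high1, low2, w):
    """Full column-major address of m(i, j); every bound is a runtime value."""
    column_length = high1 - low1 + 1          # number of elements in one column
    return base + (j - low2) * column_length * w + (i - low1) * w


def address_simplified(base, i, j, hw, w):
    """Address of m(i, j) when both lower bounds are 1 and hw = high1 * w."""
    return base + (j - 1) * hw + (i - 1) * w


# Hypothetical example values: a 10 x 5 array of 8-byte elements at address 1000.
base, high1, high2, w = 1000, 10, 5, 8
hw = high1 * w  # computed once, because high1 is known at compile time

for j in range(1, high2 + 1):
    for i in range(1, high1 + 1):
        full = address_general(base, i, j, 1, high1, 1, w)
        assert full == address_simplified(base, i, j, hw, w)
```

With the bounds known, the simplified form drops two subtractions, one addition, and one multiply per reference, because `hw` is folded into a single constant.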

If the reference occurs inside a loop where j runs from 1 to k, the compiler might use operator strength reduction to replace the term $(j - 1) \times hw$ with a sequence $j'_1, j'_2, j'_3, \ldots, j'_k$, where $j'_1 = (1 - 1) \times hw = 0$ and $j'_i = j'_{i-1} + hw$. If i is also the induction variable of a loop running from 1 to l, then strength reduction can replace $(i - 1) \times w$ with the sequence $i'_1, i'_2, i'_3, \ldots, i'_l$, where $i'_1 = 0$ and $i'_j = i'_{j-1} + w$. After these changes, the address calculation is just

$$@m + j' + i'$$

The j loop must increment $j'$ by hw and the i loop must increment $i'$ by w. If the j loop is the outer loop, then the computation of $@m + j'$ can be moved out of the inner loop. At this point, the address computation in the inner loop contains an add and the increment for $i'$, while the outer loop contains an add and the increment for $j'$.
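The effect of strength reduction on this loop nest can be sketched in Python. This is an illustration under stated assumptions, not the output of any particular compiler: the function names are hypothetical, and the loops only collect the addresses they would reference.

```python
def addresses_direct(base, k, l, hw, w):
    """Address of m(i, j) for each (j, i), using multiplies in the inner loop."""
    out = []
    for j in range(1, k + 1):
        for i in range(1, l + 1):
            out.append(base + (j - 1) * hw + (i - 1) * w)
    return out


def addresses_reduced(base, k, l, hw, w):
    """Same addresses after strength reduction and hoisting of base + j'."""
    out = []
    j_prime = 0                        # j'_1 = (1 - 1) * hw = 0
    for _j in range(1, k + 1):
        column_base = base + j_prime   # invariant in the i loop, so hoisted
        i_prime = 0                    # i'_1 = 0
        for _i in range(1, l + 1):
            out.append(column_base + i_prime)  # one add per reference
            i_prime += w               # next term of the i' sequence
        j_prime += hw                  # next term of the j' sequence
    return out


# Hypothetical sizes: 5 columns of 10 elements, 8 bytes each.
assert addresses_direct(1000, 5, 10, 80, 8) == addresses_reduced(1000, 5, 10, 80, 8)
```

The reduced version performs no multiplies inside either loop; each loop carries one add and one increment, matching the operation counts given in the text.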

Knowing the context around the reference to m(i,j) allows the compiler to significantly reduce the cost of array addressing.

If m is an actual parameter to the procedure, then the compiler may not know these facts at compile time. In fact, the upper and lower bounds for m might change on each call to the procedure. In such cases, the compiler may be unable to simplify the address calculation as shown.

Improving a Loop Nest in LINPACK

As a more dramatic example of context, consider the loop nest shown in Figure 8.1.

It is the central loop nest of the FORTRAN version of the routine dmxpy from the LINPACK numerical library. The code wraps two loops around a single long assignment. The loop nest forms the core of a routine to compute y + x × m, for vectors x and y and matrix m.

Strength reduction: a transformation that rewrites a series of operations, for example $i \cdot c, (i+1) \cdot c, \ldots, (i+k) \cdot c$, with an equivalent series $i'_1, i'_2, \ldots, i'_k$, where $i'_1 = i \cdot c$ and $i'_j = i'_{j-1} + c$. See Section 10.7.2.

      subroutine dmxpy (n1, y, n2, ldm, x, m)
      double precision y(*), x(*), m(ldm,*)
      ...
      jmin = j+16
      do 60 j = jmin, n2, 16
         do 50 i = 1, n1
            y(i) = ((((((((((((((( (y(i))
     $           + x(j-15)*m(i,j-15)) + x(j-14)*m(i,j-14))
     $           + x(j-13)*m(i,j-13)) + x(j-12)*m(i,j-12))
     $           + x(j-11)*m(i,j-11)) + x(j-10)*m(i,j-10))
     $           + x(j- 9)*m(i,j- 9)) + x(j- 8)*m(i,j- 8))
     $           + x(j- 7)*m(i,j- 7)) + x(j- 6)*m(i,j- 6))
     $           + x(j- 5)*m(i,j- 5)) + x(j- 4)*m(i,j- 4))
     $           + x(j- 3)*m(i,j- 3)) + x(j- 2)*m(i,j- 2))
     $           + x(j- 1)*m(i,j- 1)) + x(j)   *m(i,j)
   50    continue
   60 continue
      ...
      end

FIGURE 8.1 Excerpt from dmxpy in LINPACK.

We will consider the code from two different perspectives: first, the transformations that the author hand-applied to improve performance, and second, the challenges that the compiler faces in translating this loop nest to run efficiently on a specific processor.

Before the author hand-transformed the code, the loop nest performed the following simpler version of the same computation:

      do 60 j = 1, n2
         do 50 i = 1, n1
            y(i) = y(i) + x(j) * m(i,j)
   50    continue
   60 continue

Loop unrolling: This replicates the loop body for distinct iterations and adjusts the index calculations to match.

To improve performance, the author unrolled the outer loop, the j loop, 16 times. That rewrite created 16 copies of the assignment statement with distinct values for j, ranging from j through j-15.

It also changed the increment on the outer loop from 1 to 16. Next, the author merged the 16 assignments into a single statement, eliminating 15 occurrences of y(i) = y(i) + ...; that eliminates 15 additions and most of the loads and stores of y(i). Unrolling the loop eliminates some scalar operations. It often improves cache locality, as well.

To handle the cases where the array bounds are not integral multiples of 16, the full procedure has four versions of the loop nest that precede the one shown in Figure 8.1. These “setup loops” process up to 15 columns of m, leaving j set to a value for which n2 - j is an integral multiple of 16. The first loop handles a single column of m, corresponding to an odd n2.

The other three loop nests handle two, four, and eight columns of m. This guarantees that the final loop nest, shown in Figure 8.1, can process the columns 16 at a time.

Ideally, the compiler would automatically transform the original loop nest into this more efficient version, or into whatever form is most appropriate for a given target machine.
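The combination of unrolling, statement merging, and setup loops can be sketched in Python. To keep the example short, this sketch assumes an unroll factor of 4 rather than the 16 used in dmxpy, so it needs only two setup loops (one and two columns) instead of four. The function names are hypothetical, indices are 0-based, and m is a list of rows.

```python
def dmxpy_simple(n1, y, n2, x, m):
    """y(i) = y(i) + x(j) * m(i, j) for every column j, one column at a time."""
    for j in range(n2):
        for i in range(n1):
            y[i] = y[i] + x[j] * m[i][j]


def dmxpy_unrolled4(n1, y, n2, x, m):
    """Same computation with the j loop unrolled by 4 (dmxpy itself uses 16)."""
    j = 0                          # next unprocessed column (0-based)
    if n2 % 2 == 1:                # setup loop: one column, for odd n2
        for i in range(n1):
            y[i] = y[i] + x[j] * m[i][j]
        j += 1
    if (n2 - j) % 4 == 2:          # setup loop: two columns
        for i in range(n1):
            y[i] = (y[i] + x[j] * m[i][j]) + x[j + 1] * m[i][j + 1]
        j += 2
    # n2 - j is now a multiple of 4, so the main loop handles 4 columns per trip.
    for jj in range(j, n2, 4):
        for i in range(n1):
            # One merged statement: y[i] is loaded and stored once, not 4 times.
            y[i] = (((y[i]
                      + x[jj] * m[i][jj]) + x[jj + 1] * m[i][jj + 1])
                    + x[jj + 2] * m[i][jj + 2]) + x[jj + 3] * m[i][jj + 3]
```

Both versions add the products to y[i] in the same left-to-right order, so they produce the same floating-point results; only the number of loop iterations and the number of loads and stores of y[i] differ.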

However, few compilers include all of the optimizations needed to accomplish that goal. In the case of dmxpy, the author performed the optimizations by hand to produce good performance across a wide range of target machines and compilers.

From the compiler’s perspective, mapping the loop nest shown in Figure 8.1 onto the target machine presents some hard challenges.

The loop nest contains 33 distinct array-address expressions, 16 for m, 16 for x, and one for y that it uses twice. Unless the compiler can simplify those address calculations, the loop will be awash in integer arithmetic.

Consider the references to x. They do not change during execution of the inner loop, which varies i. The optimizer can move the address calculations and the loads for x out of the inner loop. If it can keep the x values in registers, it can eliminate a large part of the overhead from the inner loop.
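Moving the loads of x out of the inner loop is an instance of loop-invariant code motion. The Python sketch below illustrates it on a two-column version of the loop body (an assumption made for brevity; dmxpy uses 16 columns). The function names and the explicit load counter are hypothetical, added only to make the saving visible.

```python
def column_pair_naive(n1, y, x, m, j):
    """Inner loop for columns j-1 and j; indexes x on every iteration of i."""
    loads_of_x = 0
    for i in range(n1):
        y[i] = (y[i] + x[j - 1] * m[i][j - 1]) + x[j] * m[i][j]
        loads_of_x += 2            # x[j-1] and x[j] are read again each time
    return loads_of_x


def column_pair_hoisted(n1, y, x, m, j):
    """Same loop after moving the loop-invariant loads of x out of the i loop."""
    x_prev, x_cur = x[j - 1], x[j]  # loaded once; locals stand in for registers
    loads_of_x = 2
    for i in range(n1):
        y[i] = (y[i] + x_prev * m[i][j - 1]) + x_cur * m[i][j]
    return loads_of_x
```

For n1 iterations of the inner loop, the naive form reads x 2 × n1 times, while the hoisted form reads it twice. The same reasoning applies to all 16 references to x in Figure 8.1, provided the target has enough registers to hold them.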

For a reference such as x(j-12), the address calculation is just $@x + (j - 12) \times w$. To further simplify matters, the compiler can refactor all 16 references to x into the form $@x + jw - c_k$, where jw is $j \cdot w$ and $c_k$ is $k \cdot w$ for each $0 \le k \le 15$. In this form, each load uses the same base address, $@x + jw$, with a different constant offset, $c_k$.

Mapping this efficiently onto the target machine requires knowledge of the available addressing modes. If the target has the equivalent of ILOC’s loadAI operation (a register base address plus a small constant offset), then all the accesses to x can be written to use a single induction variable. Its initial value is $@x + \mathit{jmin} \cdot w$. Each iteration of the j loop increments it by $16 \cdot w$, because the loop advances j by 16.

The 16 values of m used in the inner loop change on each iteration. Thus, the inner loop must compute addresses and load 16 elements of m on each iteration. Careful refactoring of the address expressions, combined with strength reduction, can reduce the overhead of accessing m. The value $@m + j \cdot \mathit{high}_1(m) \cdot w$ can be computed in the j loop. (Notice that $\mathit{high}_1(m)$ is the only concrete dimension declared in dmxpy’s header.) The inner loop can produce a base address by adding it to $(i - 1) \cdot w$. Then, the 16 loads can use distinct constants, $c_k \cdot \mathit{high}_1(m)$, where $c_k$ is $k \cdot w$ for each $0 \le k \le 15$.

To achieve this code shape, the compiler must refactor the address expressions, perform strength reduction, recognize loop-invariant calculations and move them out of inner loops, and choose the appropriate addressing mode for the loads.
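The refactoring around a common base address can be checked with a short Python sketch. It is an illustration, not generated code: the function names are hypothetical, `ldm` plays the role of high1(m), and the sketch keeps the lower bound of 1 explicit in every address, which only shifts each base by a constant.

```python
UNROLL = 16  # number of columns referenced per iteration of the j loop


def direct_addresses(x_addr, m_addr, ldm, w, i, j):
    """Addresses of x(j-k) and m(i, j-k), 0 <= k <= 15, computed from scratch."""
    xs = [x_addr + (j - k - 1) * w for k in range(UNROLL)]
    ms = [m_addr + (j - k - 1) * ldm * w + (i - 1) * w for k in range(UNROLL)]
    return xs, ms


def refactored_addresses(x_addr, m_addr, ldm, w, i, j):
    """Same addresses as one base per array plus a per-reference offset."""
    c = [k * w for k in range(UNROLL)]     # c_k = k * w, compile-time constants
    m_off = [ck * ldm for ck in c]         # c_k * high1(m), invariant in both loops
    x_base = x_addr + (j - 1) * w          # one induction variable for x
    m_col = m_addr + (j - 1) * ldm * w     # computed in the j loop
    m_base = m_col + (i - 1) * w           # one add in the i loop
    xs = [x_base - c[k] for k in range(UNROLL)]
    ms = [m_base - m_off[k] for k in range(UNROLL)]
    return xs, ms


# Hypothetical layout: x at 1000, m at 5000, ldm = 10, 8-byte elements.
for j in (16, 32):
    for i in (1, 2, 3):
        assert direct_addresses(1000, 5000, 10, 8, i, j) == \
            refactored_addresses(1000, 5000, 10, 8, i, j)
```

In the refactored form, each of the 32 loads is a base register plus a fixed offset, which is the shape that a base-plus-offset addressing mode such as loadAI can express when the offsets are small enough to encode.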

Even with these improvements, the inner loop must perform 16 loads, 16 floating-point multiplies, and 16 floating-point adds, plus one store. The resulting block will present a challenge to the instruction scheduler.

If the compiler fails in some part of this transformation sequence, the resulting code might be substantially worse than the original. For example, if it cannot refactor the address expressions around a common base address for x and one for m, the code might maintain 33 distinct induction variables, one for each distinct address expression for x, m, and y.

If the resulting demand for registers forces the register allocator to spill, it will insert additional loads and stores into the loop (which is already likely to be memory bound). In cases such as this one, the quality of code produced by the compiler depends on an orchestrated series of transformations that all must work; when one fails to achieve its purpose, the overall sequence may produce lower quality code than the user expects.

8.2.2 Considerations for Optimization

In the previous example, the programmer applied the transformations in the belief that they would make the program run faster. The programmer had to believe that they would preserve the meaning of the program.
