MIT 28 More loop optimization (798433)
Текст из файла
CS 4120 Lecture 28More loop optimizations2 November 2011Lecturer: Andrew Myers1Bounds check removalIn type-safe languages, accesses to array elements generally incur a bounds checks. Before accessing a[i],the language implementation must ensure that i is a legal index. For example, in Java array indices start atzero, so the language must test that 0 ≤ i < a.length.Returning to our example from earlier, after strength reduction we can expect the code to look more likethe following (more likely, the equivalent CFG):s = a + 3*i;while (i < a.length) {j = s;if (i < 0 | i ≥ a.length) goto Lerr ;[j] = [j] + 1;i = i + 2;s = s + 6;}This extra branch inside the loop is likely to add overhead. Furthermore, it prevents the inductionvariable elimination operation just discussed.One simple improvement we can make is to implement the check 0 ≤ i < n in a single test.
Assumingthat n is a signed positive integer, and i is a signed integer, this test can be implemented by doing anunsigned comparison i<n. If i is negative, it will look like a large unsigned integer that will fail the unsignedcomparison, as desired. Processor architectures have a unsigned comparison mode that supports this.For example, the jae instruction (“jump above or equal”) on the Intel architecture implements unsignedcomparison.Even better would be to eliminate the test entirely. The key insight is that the loop guard (in this case,i < a.length) often ensures that the bounds check succeeds. If this can be determined statically, the boundscheck can be removed. If it can be tested dynamically, the loop can be split into two versions, a fast versionthat does not do the bounds check and a slow one that does.The bounds-check elimination works under the following conditions:1.
Induction variable j has a test (j < u) vs. a loop-invariant expression u, where the loop is exited if thetest failed.2. Induction variable k in the same family as j has a test equivalent to k < n vs. a loop-invariant expressionn, where j < u implies k < n, again exiting the loop on failure. The test on k must be dominated by thetest on j, and k and j must go in the same direction (both increase or both decrease).Under these conditions, the bound checks on k is superfluous and can be eliminated.
But when does j < uimply k < n? Suppose that j = hi, a j , b j i and k = hi, ak , bk i. If the j test succeeds, then a j i + b j < u. Without lossof generality, assume a j > 0. Then this implies that i < (u − b j )/a j . Therefore, k = ak i + bk < ak (u − b j )/a j + bk .If we can show statically or dynamically that this right-hand side is less than or equal to n, then we knowk < n. So the goal is to show that ak (u − b j )/a j + bk ≤ n. This can be done either at compile time or by hoistinga test before the loop. In our example, the test for i < a.length is that 1 ∗ (a.length − 0)/1 + 0 ≤ a.length,which can be determined statically.
The compiler does still need to insert a test that i ≥ 0 before the loop:1s = a + 3*i;if (i < 0) goto Lerr ;while (i < a.length) {j = s;[j] = [j] + 1;i = i + 2;s = s + 6;}After linear-function test replacement, this code example will become subject to induction variableelimination.2Loop unrollingLoop guards and induction variable updates add significant overhead to short loops. The cost of loopguards involving induction variables can often be reduced by unrolling loops to form multiple consecutivecopies. Multiple loop guard tests can be then be combined into a single conservative test. If the loop isunrolled to make n copies, and the loop guard has the form i < u where u is a loop-invariant expression andi is an induction variable with stride c that is updated at most once per loop iteration, the test i < c(n − 1)conservatively ensures that all n loop copies can be run without having any guard fail.Since this guard is conservative, there still may be < n iterations left to be performed.
So the unrolledloop has to be followed by another copy of the original loop, the loop epilogue, which performs any remainingiterations one by one. Since loop unrolling therefore results in n + 1 copies of the original loop, it trades offcode size for speed. If the original loop was only going to be executed for a small number of iterations, loopunrolling could hurt performance rather than improve it.Updates to basic linear induction variables inside the unrolled loop can also be combined. If a variablei is updated with the statement i = i + c, then n copies of the update can be replaced with a single updatei = i + nc.
However, any uses of i within the copies of the loop must be changed as well – in the second loopcopy, to i + c, in the third copy (if any), to i + 2c, and so on. These additions can often be folded into existingarithmetic computations.3Redundancy eliminationAn optimization aims at redundancy elimination if it prevents the same computation from being done twice.4Local value numberingLocal value numbering is a redundancy elimination optimization. It is called a “local” optimization becauseit is performed on a single basic block. It can as easily be applied to an extended basic block (EBB), which isa sequence of nodes in which some nodes are allowed to have exit edges, as depicted in Figure 1. Moregenerally, it can be applied to any tree-like subgraph in which all nodes have only one predecessor andthere is a single node that dominates all other nodes.
Each path through such a subgraph forms an EBB.The idea of value numbering is to label each computed expression with a distinct identifier (a “number”),such that recomputations of the same expression are assigned the same number. If an expression is labeledwith the same number as a variable, the variable can be used in place of the expression. Value numberingoverlaps but does not completely subsume constant propagation and common subexpression elimination.For example, every expression and variable in the following code has been given a number, written as asubscript. Expressions with the same number always compute the same value.2Figure 1: Extended basic blocka2j1b2i2ifd3= (i1 + 5)2= i1= 5 + j1= i1 + 5c4 = (i2 + 1)3 goto L1= (i2 + 1)3To do this numbering, we start at the beginning of the EBB and assign numbers to expressions as theyare encountered.
As expressions are assigned numbers, the analysis remembers the mapping betweenthat expression, expressed in terms of the value numbers it uses, and the new value number. If the sameexpression is seen again, the same number is assigned to it.In the example, the variable i is given number 1. Then the sum i+1 is given number 2, and the analysisrecords that the expression (value(1) + 1) is the same as value(2).
The assignment to a means that variablehas value 2 at that program point. Assignment j=i puts value 1 into j, so j is also given number 1 at thatprogram point. The expression 5 + j computes (value(1) + 1), so it is given number 2. This means we canreplace 5+j with a, eliminating a redundant computation.After doing the analysis, we look for value numbers that appear multiple times. An expression that usesa value computed previously is replaced with a variable with the same value number.
If no such variableexists, a new variable is introduced at the first time the value was computed, to be used to replace lateroccurrences of the value.For example, the above code can be optimized as follows:a2j1b2i2t3ifd3= (i1 + 5)2= i1= a2= a2= (a2 + 1)3c4 = t3 goto L1= t3The mapping from previously computed expressions to value numbers can be maintained implicitlyby generating value numbers with a strong hash function. For example, we could generate the actualrepresentation of value 2 by hashing the string “plus(v1, const(1))”, where v1 represents the hash valueassigned to value 1. Note that to make sure that we get the same value number for the computation 5+j,3it will be necessary to order the arguments to commutative operators (e.g., +) in some canonical ordering(e.g., sorting in dictionary order).There are global versions of value numbering that operate on a general CFG.
Характеристики
Тип файла PDF
PDF-формат наиболее широко используется для просмотра любого типа файлов на любом устройстве. В него можно сохранить документ, таблицы, презентацию, текст, чертежи, вычисления, графики и всё остальное, что можно показать на экране любого устройства. Именно его лучше всего использовать для печати.
Например, если Вам нужно распечатать чертёж из автокада, Вы сохраните чертёж на флешку, но будет ли автокад в пункте печати? А если будет, то нужная версия с нужными библиотеками? Именно для этого и нужен формат PDF - в нём точно будет показано верно вне зависимости от того, в какой программе создали PDF-файл и есть ли нужная программа для его просмотра.