MIT 22 Data ow analysis (798428)
Текст из файла
CS 4120 Lecture 22Dataflow analysis17 October 2011Lecturer: Andrew Myers1 Control flow graphs for program analysisWe have seen that we can easily convert the lowered IR representation into a control flow graph (CFG) inwhich each node n has one of the following contents:Node contentsAn assignment x := eA memory store [e1 ] := e2A conditional branch if eA start node START.An exit node EXIT.Lowered IR equivalentMOVE(TEMP(x), e)MOVE(MEM(e1 ), e2 )CJUMP(e, e2 )——Expressions e can be any of the following, except that function calls can appear only at top level:e ::= e1 OP e2| f (e1 , .
. . en )| [e]OP ::= + | − | ∗ | / | mod | lshift | rshift | . . .Other nodes such as LABEL and JUMP are represented by the graph structure. (Assuming that wecan identify all the possible values of e as nodes in the CFG, we can treat a node JUMP(e) where e is acomplex expression as a fancy kind of if node in which there are any number of exit edges.) It is handy fordescribing some analyses to have distinguished START and EXIT nodes, though often these are omittedwhen drawing CFGs.A node has zero or more entry edges and either one or two exit edges; only if has more than one exitedge, as depicted in Figure 1.x=y[x+4]=0x=y+1y = 2*x(if nodesonly)if x<ya basic blockone CFG nodeFigure 1: CFG nodes1We will often find that program CFGs have linear sequences of several nodes where each node exceptthe first has a single entry edge, and each node except that last has a single exit edge.
Such a sequenceof nodes is a basic block. It is possible to do program analysis over a CFG of basic blocks instead of overindividual nodes. This complicates the analysis, but gives the same result. For some analyses, it is possibleto speed up analysis somewhat by performing it at the granularity of basic blocks.2 Live variable analysis: recapIn a dataflow analysis, we associate information with each program point, where program points are pointsin between the execution of nodes: at the beginning and end of each node. We write in(n) to represent theinformation at the program point before the node is executed, and out(n) to represent the information justafter it is executed.For live variable analysis, the information associated with each program points is the set of live variables: a set containing each variable whose value might be used before the value of the variables is changed.2.1Dataflow equationsA variable is live on exit from a node if it is live on entry to any successor node .
And it is live on entryto the node if it is used by the node, or if it is live on exit from the node and this node does not redefinethe value of the variable. These observations are captured by the following dataflow equations. Note that todenote that node n0 is a successor of n, we write either n0 ∈ succ(n), or n0 n.in(n) = use(n) ∪ (out(n) − def (n))[out(n) =in(n0 )n0 nThe functions use(n) and def (n) are defined according to the following table, where use(e) refers to theset of variables used in the expression e:nx=e[e1 ] = e2if eSTARTEXITuse(n)def (n)use(e)xuse(e1 ) ∪ use(e2 )∅use(e)∅∅∅∅ or rv∅3 Worklist algorithmHow do we solve the dataflow equations? A first step is to substitute the definition of out(n) into that ofin(n), eliminating half the equations:[in(n) = use(n) ∪ (in(n0 ) − def (n))n0 nIf we want to use this dataflow equation to define the value of in(n), it is clear that the informationis flowing backward along the arrows in the CFG.
For this reason, we say that live variable analysis is abackward analysis.The usual way to solve dataflow equations is the worklist algorithm, which uses a FIFO queue called theworklist to keep track of nodes whose equations might not be satisfied at any given step. The algorithm isas follows for a backward analysis such as live variable analysis:1. Initialize the worklist to contain all nodes.22. Initialize the value of in(n) to some initial value (for live variable analysis, ∅).3. While the worklist contains some node n:• Remove n from the worklist.• Set the value of in(n) using the dataflow equation. For live variable analysis:[in(n) := use(n) ∪ (in(n0 ) − def (n))n0 n• Push all predecessors of n onto the worklist.Now let’s look at why this algorithm works.
Each node has its own dataflow equation, so for N totalnodes, there are N dataflow equations. The worklist algorithm simply chooses to apply these equationsiteratively in a particular order to update the value of in(n), until convergence. What we haven’t explainedis why this order of application ends in the correct result.3.1MonotonicityThe dataflow equation for a given node n has an interesting monotonicity property: adding more elementsto in(n0 ) for its successor nodes n can only add elements to (or have no effect on) the value of in(n) according to the equation.
Think about an iteration of the algorithm that updates in(n). There was some previousiteration that updated in(n) (which might be the original initialization to ∅). Suppose all changes that haveoccurred to the values of in(n0 ) for successor nodes n0 have been to add elements. In that case this update,if it has any effect, must also add elements.Clearly the very first iteration of the worklist algorithm can only add elements (or have no effect), sincein(n) = ∅. Therefore the second iteration of the algorithm can also only add elements, since all prioriterations have only added elements.
Inductively, we can see that every iteration of the worklist algorithm,when using the live variable analysis dataflow equation, can only add elements.Therefore a given set in(n) only increases in size during the execution of the worklist algorithm.3.2ComplexityThere is a finite number of variables (call it V ), and the set in(n) can only grow, so the maximum numberof times it can change during execution of the algorithm is V . How many times can the main loop of thealgorithm execute? Once for each time a node is pushed onto the worklist. How many times can a node bepushed onto the worklist? Once at the beginning, and once for each change to one of its successor nodes.Since a node has at most two successors, this means at most 2V + 1 pushes. Therefore the running time isO(V N ), or O(N 2 ).
With some reasonable assumptions about the structure of control flow graphs for realprograms, this can be lowered to O(dN ) where d corresponds to the loop nesting depth of the code, and istypically no more than 4 or so.3.3CorrectnessWe’ve just seen that the algorithm must terminate. If it does terminate, we would like to know that all thedataflow equations are satisfied. To see this, notice that the worklist algorithm maintains a loop invariant:every node that is not on the worklist has its equation satisfied.
Clearly this invariant holds at the beginningbecause all nodes are on the worklist. Each time that a node n is removed from the worklist, it is because itsequation is satisfied by updating in(n). And each time this is done, the nodes whose equations might havebecome unsatisfied (the predecessors) are pushed onto the worklist.Therefore, when the worklist is empty, all dataflow equations are satisfied.34 Available copies analysisYou may be suspecting that we can generalize the live variable dataflow analysis into a general frameworkfor dataflow analysis. Before we try to do that, let’s look at another dataflow analysis so we can identifywhat is in common. We will consider an analysis called available copies, which keeps track of variable copies.This is useful for doing copy propagation optimizations.
It is really a special case of a more general analysiscalled available expressions, which we will see later.4.1Copy propagationThe idea of this optimization is that we replace variables with other variables known to contain the sameinformation. This means we aren’t wasting registers on redundant information. For example, in the codebelow, the variables x and y hold the same value after the assignment. Assuming there are no assignmentsto either variable between that point and the assignment to z, we can replace the use of x with y, as shownon the right. If that transformation means the variable x is no longer live, the assignment x=y becomes deadcode and can be removed.x = y...=⇒z = 2*x + 14.2x = y...z = 2*y + 1Dataflow valuesThe information associated with each program point will be a set of equalities known to hold betweendifferent variables.
Характеристики
Тип файла PDF
PDF-формат наиболее широко используется для просмотра любого типа файлов на любом устройстве. В него можно сохранить документ, таблицы, презентацию, текст, чертежи, вычисления, графики и всё остальное, что можно показать на экране любого устройства. Именно его лучше всего использовать для печати.
Например, если Вам нужно распечатать чертёж из автокада, Вы сохраните чертёж на флешку, но будет ли автокад в пункте печати? А если будет, то нужная версия с нужными библиотеками? Именно для этого и нужен формат PDF - в нём точно будет показано верно вне зависимости от того, в какой программе создали PDF-файл и есть ли нужная программа для его просмотра.