Comp 512
Spring 2011
Order from Chaos
— the big picture —
Copyright 2011, Keith D. Cooper & Linda Torczon, all rights reserved.
Students enrolled in Comp 512 at Rice University have explicit permission to
make copies of these materials for their personal use.
Faculty from other educational institutions may use these materials for nonprofit
educational purposes, provided this copyright notice is preserved.
COMP 512, Rice University
Optimization
The subject is confusing
(Cooper, McKinley, & Torczon cite 237 distinct papers in the survey!)
• Whole notion of optimality
• Incredible number of transformations
• Odd, inconsistent terminology
  (value numbering, redundancy elimination, & common subexpressions overlap)
• Many intractable problems
• Many NP-complete problems
• Much overlap between problems and between solutions
Maybe this stuff is inherently hard

If optimization wasn't confusing, why take COMP 512?
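As a concrete taste of the terminology overlap, here is a minimal sketch of local value numbering over a toy three-address IR. The tuple format and names are invented for illustration, not the course's actual representation. The same pass is equally well described as value numbering, redundancy elimination, or common-subexpression elimination, which is exactly the naming problem above.

```python
# Local value numbering (LVN) over a toy (dest, op, left, right) IR.

def lvn(block):
    """Rewrite recomputed expressions in one basic block as copies."""
    value = {}    # variable name -> value number
    expr = {}     # (op, vn, vn) -> (value number, variable holding it)
    counter = 0
    out = []
    for dest, op, left, right in block:
        # Give each unseen operand a fresh value number.
        for operand in (left, right):
            if operand not in value:
                value[operand] = counter
                counter += 1
        key = (op, value[left], value[right])
        if op in ('add', 'mul'):                 # commutative: canonical order
            key = (op,) + tuple(sorted(key[1:]))
        if key in expr:
            # Redundant: reuse the earlier result instead of recomputing.
            vn, holder = expr[key]
            value[dest] = vn
            out.append((dest, 'copy', holder, None))
        else:
            value[dest] = counter
            expr[key] = (counter, dest)
            counter += 1
            out.append((dest, op, left, right))
    return out
```

Note that commutative operands are sorted by value number, so `a + b` and `b + a` hash to the same expression, one small example of the "multiple ways to achieve the same end" problem.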
Optimization
A brief catalog (circa 1982)
And this was before the literature exploded
Optimization
The literature throws fuel on the fire
• Terminology is non-standard & non-intuitive
  (e.g., "revival", "partially-dead code", "forward propagation")
• Explanations are terse and incomplete
• Little comparative data that is believable
• No sense of perspective
• Papers give conflicting advice
Not all those papers can be the best!

An example: Is inline substitution profitable?
• Holler's thesis: it almost always helps
• Hall's thesis: it occasionally helps, but has lots of problems
• MacFarland's thesis: it causes instruction cache misses
Reality lies somewhere in the middle
Waterman showed that program-specific heuristics win
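To make the debate concrete, here is a toy sketch of inline substitution itself. The `inline` helper, the tuple statement format, and the renaming scheme are all invented for illustration: the call disappears and the callee's body is specialized to the call site, but every inlined copy adds code, which is where instruction-cache costs can come from.

```python
# Toy inliner over a hypothetical IR: a caller is a list of statements,
# either ('call', dest, fname, args) or ('op', dest, op, left, right).
import itertools

_fresh = itertools.count()   # suffix generator to keep locals distinct

def inline(caller, callees):
    """Replace each call to a known callee with a renamed copy of its body.

    callees maps fname -> (params, body, ret_var), where body is a list
    of ('op', dest, op, left, right) statements.
    """
    out = []
    for stmt in caller:
        if stmt[0] == 'call' and stmt[2] in callees:
            _, dest, fname, args = stmt
            params, body, ret = callees[fname]
            suffix = f"_inl{next(_fresh)}"
            rename = dict(zip(params, args))     # formals -> actuals

            def r(name):
                if name in rename:
                    return rename[name]
                # Rename callee locals; pass constants/None through.
                return name + suffix if isinstance(name, str) else name

            for _, d, op, left, right in body:
                out.append(('op', r(d), op, r(left), r(right)))
            out.append(('op', dest, 'copy', r(ret), None))
        else:
            out.append(stmt)
    return out
```

Even this toy shows the tradeoff: one call site costs one copy of the body, so inlining a hot callee at many sites multiplies code size.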
Optimization
Improvement should be objective
• Easy to quantify
• Produce concrete improvements
• Taking measurements seems easy
Code either gets better or it gets worse
But, …
• Linear-time heuristics for hard problems
• Unforeseen consequences & poorly understood interactions
• "Obvious wins" have non-obvious downsides
• Multiple ways to achieve the same end
Experimental computer science takes a lot of work
The Role of Comp 512
Bringing order out of chaos
• Provide a framework for thinking about optimization
• Differentiate analysis from transformation†
• Think about how things help, not what they do
Goal: a rational approach to the subject matter
• Objective criteria for evaluating ideas & papers
• Bring high-school-level science back into the game

† The Comp 512 Motto:
  Knowledge alone does not make code run faster.
  You have to change the code to make it run faster.
Classic Taxonomy
Machine-independent transformations
• Applicable across a broad range of machines
• Decrease ratio of overhead to real work
• Reduce running time or space
• Example: dead code elimination
Machine-dependent transformations
• Capitalize on specific machine properties
• Improve the mapping from IR to this machine
• Might use an exotic instruction (shift the reg. window for a loop)
• Example: instruction scheduling
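The machine-independent example, dead code elimination, can be sketched for the simplest case: straight-line code with no side effects. The `(dest, op, args)` IR and the `live_out` interface are assumptions made for this example.

```python
# Dead code elimination on one block of straight-line, side-effect-free
# code, walking backwards from the set of variables live on exit.

def eliminate_dead_code(block, live_out):
    """Keep only instructions whose results are actually used."""
    live = set(live_out)       # variables still needed after the block
    kept = []
    for dest, op, args in reversed(block):
        if dest in live:
            live.discard(dest)               # this def satisfies the use
            # The instruction's variable operands become live.
            live.update(a for a in args if isinstance(a, str))
            kept.append((dest, op, args))
        # else: dest is never read again, so the instruction is dead
    kept.reverse()
    return kept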
Classic Taxonomy
Distinction is not always clear
• Replacing multiply with shifts and adds
• Eliminating a redundant expression
The truth is somewhat muddled
• Machine independent means that we deliberately & knowingly ignore target-specific constraints
• Machine dependent means that we explicitly consider target-specific constraints
Redundancy elimination might fit in either category
  > Versions that consider register pressure
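The multiply-to-shift rewrite above can be sketched as a peephole pass. The tuple IR is again illustrative; whether the rewrite pays off depends on the target's relative costs of shifts and multiplies, which is exactly why its classification is muddled.

```python
# Peephole strength reduction: x * 2**k becomes x << k, over a toy
# (dest, op, left, right) IR.

def reduce_multiplies(block):
    out = []
    for dest, op, left, right in block:
        if (op == 'mul' and isinstance(right, int)
                and right > 0 and right & (right - 1) == 0):
            # right is a power of two: replace the multiply with a shift
            out.append((dest, 'shl', left, right.bit_length() - 1))
        else:
            out.append((dest, op, left, right))
    return out
```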
The Comp 512 Taxonomy
An effects-based classification (for speed)
• Five machine-independent ways to speed up code
  > Eliminate a redundant computation
  > Move code to a place where it executes less often
  > Eliminate dead code
  > Specialize a computation based on context
  > Enable another transformation
• Three machine-dependent ways to speed up the code
  > Manage or hide latency
  > Take advantage of special hardware features
  > Manage finite resources
For scalar optimization, this covers most of them
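The second effect, moving code to where it executes less often, can be sketched for the simplest case: hoisting loop-invariant computations into a preheader. The single-assignment, side-effect-free, straight-line-body assumptions are simplifications for illustration.

```python
# Loop-invariant code motion over a toy loop body of
# (dest, op, left, right) tuples, each dest assigned at most once.

def hoist_invariants(loop_body):
    """Split a loop body into (preheader, remaining body)."""
    defined = {dest for dest, *_ in loop_body}   # names defined in the loop
    invariant = set()
    changed = True
    while changed:                               # iterate to a fixed point
        changed = False
        for dest, op, left, right in loop_body:
            if dest in invariant:
                continue
            # Invariant if every operand is defined outside the loop,
            # or is itself already known to be invariant.
            if all(o not in defined or o in invariant
                   for o in (left, right)):
                invariant.add(dest)
                changed = True
    preheader = [s for s in loop_body if s[0] in invariant]
    body = [s for s in loop_body if s[0] not in invariant]
    return preheader, body
```

The fixed-point loop is what catches chains: once `t1` is known invariant, a later `t2` computed from `t1` becomes invariant too.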
The Comp 512 Taxonomy
Machine Independent

Redundancy:           Redundancy elim., Partial red. elim., Consolidation
Code motion:          Loop-invariant c.m., Consolidation, Global Scheduling [Click], Constant propagation
Dead code:            Dead code elim., Partial d.c.e., Constant propagation
Specialization:       Replication, Strength reduction, Method caching, Algebraic identities, Heap→stack allocation, Tail recursion elimination
Create opportunities: Replication, Reassociation
From §6 of Cooper, McKinley, & Torczon
The Comp 512 Taxonomy
Machine Dependent

Hide latency:     Scheduling, Blocking references, Prefetching, Code layout
Manage resources: Allocate (registers, TLB slots), Schedule, Data packing, Coloring memory locations
Special features: Instruction selection, Peephole optimization, Data packing
From §6 of Cooper, McKinley, & Torczon
What have we seen so far?
• Redundancy elimination
  > LVN, SVN, DVN
  > A category in the taxonomy by itself
• Loop unrolling
  > A form of specialization
• Dead store elimination
  > A form of dead code elimination
• Block placement
  > A form of latency hiding & resource management
• Inline substitution
  > A form of specialization
What about Fortran H?
(Lecture 2)
• Eliminate a redundant computation
  > Commoning
• Move code to a place where it executes less often
  > Backward motion
• Eliminate dead code
  > (must have done it, but don't talk about it)
• Specialize a computation based on context
  > Strength reduction
• Enable another transformation
  > Reassociation
• Manage or hide latency
• Take advantage of special hardware features
  (didn't really talk about these, if they did them)
• Manage finite resources
  > Register allocation
Scope of Optimization (another axis)
Local
• Handles individual basic blocks
  > Maximal-length sequence of straight-line code
  > Each of the bi's is a basic block
• Basic blocks are easy to analyze
• Can prove strongest results
• Code quality suffers at block boundaries
[figure: example CFG with blocks b1 ... b6]
Local methods
• Value numbering, instruction scheduling
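The partition into maximal straight-line sequences can be sketched with the classic leaders algorithm; the instruction encoding here is an assumption for the example.

```python
# Partition a linear instruction list into basic blocks. Instructions
# are ('label', name), ('jump', target), ('branch', target), or
# ('op', ...). A leader is the first instruction, any labeled
# instruction, and any instruction following a jump or branch.

def find_basic_blocks(code):
    leaders = {0}
    for i, inst in enumerate(code):
        if inst[0] == 'label':
            leaders.add(i)                   # branch targets start blocks
        if inst[0] in ('jump', 'branch') and i + 1 < len(code):
            leaders.add(i + 1)               # fall-through starts a block
    starts = sorted(leaders)
    ends = starts[1:] + [len(code)]
    return [code[a:b] for a, b in zip(starts, ends)]
```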
Scope of Optimization
Superlocal
• Handles extended basic blocks
  > Sequence of blocks where each has a unique predecessor
  > Use results for bi to help with bj
• Analysis & transformation over larger region
• Fewer rough edges
• Can make it efficient by reusing results
[figure: example CFG with blocks b1 ... b6]
Superlocal methods
• Value numbering, instruction scheduling
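Grouping blocks into extended basic blocks can be sketched from a successor map; the `succ` encoding (every block appears as a key, graph acyclic within an EBB) is an assumption for the example.

```python
# Find extended basic blocks (EBBs) in a CFG given as succ: block ->
# list of successor blocks. An EBB is a tree rooted at the entry or at
# a join point; every non-root member has exactly one predecessor.

def find_ebbs(entry, succ):
    preds = {b: [] for b in succ}
    for b, successors in succ.items():
        for s in successors:
            preds.setdefault(s, []).append(b)
    # Roots: the entry block plus every block with zero or multiple preds.
    roots = [b for b in succ if b == entry or len(preds.get(b, [])) != 1]
    ebbs = {}
    for root in roots:
        tree, work = [], [root]
        while work:
            b = work.pop()
            tree.append(b)
            for s in succ.get(b, []):
                # Extend the tree through single-predecessor successors.
                if s not in roots and len(preds[s]) == 1:
                    work.append(s)
        ebbs[root] = tree
    return ebbs
```

On the diamond-shaped CFG in the figure, the join point starts its own EBB while the branch paths fold into the entry's tree, which is why superlocal methods can reuse the entry block's results along each path.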
Scope of Optimization
Regional                                        (Remember Fortran H)
• Arbitrary subset of blocks (loop nests, dominator subtrees)
• Use results from one block to improve others
• Limiting scope can increase focus on performance-critical regions
• Can eliminate some global impediments
[figure: example CFG with blocks b1 ... b5]
Regional methods
• Loop xforms (unroll, fuse, interchange, strip mining, blocking), DVNT, OSR, register promotion, prefetch insertion, software pipelining, trace scheduling ...
Scope of Optimization
Whole procedure (global or intraprocedural)
• Handles entire procedure
• Make decisions based on global knowledge & global benefit
• No rough edges inside procedure (tied to compilation unit)
• Classic data-flow analysis
[figure: example CFG with blocks b1 ... b6]
Global methods
• CSE (AVAIL, PRE, LCM), constant propagation, GCRA, dead code elim., hoisting, copy coalescing, ...
Scope of Optimization
Whole program (interprocedural)
• Handles more than one procedure, up to entire program
• Create even larger scopes for optimization
• Limited interactions between procedures
  > Parameters + global variables
• Analysis problems are harder
• Opportunities are different
• Issues for compiler structure
[figure: three example CFGs, one per procedure]
Interprocedural methods
• Inline substitution, procedure cloning, constant propagation, using whole-program analysis to support global xforms
Decision Complexity
Yet another axis on which to categorize optimizations
Constant time:             LVN, SVN, DVNT; Block placement; Tree balancing (ILP)
Low-order polynomial time: Loop unrolling
Hard problems:             Reassociation; Inline substitution; Spill choice in RA; Copy coalescing
Near-term Roadmap
Data-flow analysis
• Deeper treatment of iterative data-flow theory
• Quick look at other solvers
Static single assignment form
• Construction and destruction of SSA
• Example algorithms that use SSA
Populating the Taxonomy
• More transformations, more transformations, more …
• Try to fill in the odd corners of the taxonomy