12. Instruction Scheduling
12.1 Introduction
On many processors, the order in which operations are presented for execution has a significant effect on the length of time it takes to execute a sequence of instructions. Different operations take different lengths of time. On a typical commodity microprocessor, integer addition and subtraction require less time than integer division; similarly, floating-point division takes longer than floating-point addition or subtraction. Multiplication usually falls between the corresponding addition and division operations. The time required to complete a load from memory depends on where in the memory hierarchy the value resides at the time that the load is issued.
The task of ordering the operations in a block or a procedure to make effective use of processor resources is called instruction scheduling. The scheduler takes as input a partially ordered list of operations in the target machine’s assembly language; it produces as output an ordered version of the same list. The scheduler assumes that the code has already been optimized and it does not try to duplicate the optimizer’s work. Instead, it packs operations into the available cycles and functional unit issue slots so that the code will run as quickly as possible.
Conceptual Roadmap
The order in which the processor encounters operations has a direct impact on the speed of execution of compiled code. Thus, most compilers include an instruction scheduler that reorders the final operations to improve performance. The scheduler’s choices are constrained by the flow of data, by the delays associated with individual operations, and by the capabilities of the target processor. The scheduler must account for all these factors if it is to produce a correct and efficient schedule for the compiled code.
The dominant technique for instruction scheduling is a greedy heuristic called list scheduling. List schedulers operate on straight-line code and use a variety of priority ranking schemes to guide their choices. Compiler writers have invented a number of frameworks to schedule over larger regions in the code than basic blocks; these regional and loop schedulers simply create conditions where the compiler can apply list scheduling to a longer sequence of operations.
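To make the idea of list scheduling concrete, the following is a minimal sketch for one basic block on a single-issue machine. It is one possible formulation, not a definitive implementation: the operation names, the dependence table, and the latencies (three cycles for a load or store, two for a multiply, one for an add) are hypothetical values chosen for illustration, and the priority used here (latency-weighted longest path to the end of the block) is only one of the many ranking schemes a list scheduler might use.

```python
from collections import defaultdict

def list_schedule(ops, deps, latency):
    """Greedy list scheduling for one basic block, one issue slot per cycle.

    ops     -- operation names in their original order
    deps    -- dict: op -> list of ops whose results it uses
    latency -- dict: op -> cycles until the op's result is available
    Returns a dict mapping each op to its issue cycle (cycles start at 1).
    """
    succs = defaultdict(list)
    for op in ops:
        for pred in deps.get(op, []):
            succs[pred].append(op)

    # Priority: latency-weighted longest path from the op to the block's end.
    priority = {}
    def rank(op):
        if op not in priority:
            priority[op] = latency[op] + max(
                (rank(s) for s in succs[op]), default=0)
        return priority[op]
    for op in ops:
        rank(op)

    issue = {}
    remaining = set(ops)
    cycle = 1
    while remaining:
        # An op is ready once every operand's defining op has completed.
        ready = [op for op in ops if op in remaining and all(
            p in issue and issue[p] + latency[p] <= cycle
            for p in deps.get(op, []))]
        if ready:
            best = max(ready, key=lambda op: priority[op])
            issue[best] = cycle
            remaining.discard(best)
        cycle += 1  # if nothing was ready, this cycle stays empty
    return issue

# Hypothetical block: store (a + constant) * b back to memory.
ops = ["load_a", "add", "load_b", "mult", "store"]
deps = {"add": ["load_a"], "mult": ["add", "load_b"], "store": ["mult"]}
latency = {"load_a": 3, "load_b": 3, "add": 1, "mult": 2, "store": 3}

schedule = list_schedule(ops, deps, latency)
for op in sorted(schedule, key=schedule.get):
    print(schedule[op], op)
```

Under these assumed latencies the sketch issues both loads first, so the second load's delay overlaps the first; the store issues in cycle 7, whereas the original order would not let it issue until cycle 10.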
Overview
On most modern processors, the order in which instructions appear has an impact on the speed with which the code executes. Processors overlap the execution of operations, issuing successive operations as quickly as possible given the finite (and small) set of functional units. In principle, this strategy makes good use of hardware resources and decreases execution time by overlapping the execution of successive operations. The difficulty arises when an operation issues before its operands are ready.
Processor designs handle this situation in one of two ways. The processor can stall the premature operation until its operands are available. On a machine that stalls premature operations, the scheduler reorders the operations in an attempt to minimize the number of such stalls. Alternatively, the processor can execute the premature operation, albeit with the incorrect operands. This approach relies on the scheduler to maintain enough distance between a value’s definition and its various uses to maintain correctness. If insufficient useful operations are available to cover the delay associated with some operation, the scheduler must insert nops to fill the gap.
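The effect of ordering on a machine without interlocks can be sketched in a few lines. The function below pads a fixed operation order with nops so that no operation issues before its operands are available; on a machine that stalls instead, each nop would correspond to one stall cycle. The operation names, dependences, and latencies are hypothetical values chosen for illustration, and the model assumes one issue slot per cycle.

```python
def insert_nops(order, deps, latency):
    """Pad an ordered list of operations with nops so that every operation
    issues only after its operands are available (one issue per cycle).

    order   -- operation names in issue order
    deps    -- dict: op -> list of ops whose results it uses
    latency -- dict: op -> cycles until the op's result is available
    """
    padded = []
    issue = {}
    for op in order:
        cycle = len(padded) + 1
        # Earliest cycle at which all operands of op have been produced.
        earliest = max(
            (issue[p] + latency[p] for p in deps.get(op, [])), default=1)
        while cycle < earliest:
            padded.append("nop")  # fill the gap the processor cannot cover
            cycle += 1
        issue[op] = cycle
        padded.append(op)
    return padded

# Hypothetical dependences and latencies (assumptions, not measured values).
deps = {"add": ["load_a"], "mult": ["add", "load_b"], "store": ["mult"]}
latency = {"load_a": 3, "load_b": 3, "add": 1, "mult": 2, "store": 3}

original = insert_nops(["load_a", "add", "load_b", "mult", "store"],
                       deps, latency)
reordered = insert_nops(["load_a", "load_b", "add", "mult", "store"],
                        deps, latency)
print(len(original), original.count("nop"))
print(len(reordered), reordered.count("nop"))
```

With these assumed latencies, the original order needs five nops and the reordered sequence needs two, which is the kind of gap a scheduler tries to close by moving independent operations into the delay slots.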
Commodity microprocessors often have operations that have different latencies. Typical values might be one cycle for an integer add or subtract, three
Stall: the delay caused by a hardware interlock that prevents a value from being read until its defining operation completes. An interlock is the mechanism that detects the premature issue and creates the actual delay.
Statically scheduled: A processor that relies on compiler insertion of NOPs for correctness is a statically scheduled processor.
Dynamically scheduled: A processor that provides interlocks to ensure correctness is a dynamically scheduled processor.
Superscalar: A processor that can issue distinct operations to multiple distinct functional units in a single cycle is considered a superscalar processor.
Instruction-level parallelism (ILP): the availability of independent operations that can execute concurrently.