Volume 1 Basic Architecture (794100), page 16


This microarchitecture pipeline is made up of three sections: (1) the front end pipeline, (2) the out-of-order execution core, and (3) the retirement unit.

INTEL® 64 AND IA-32 ARCHITECTURES

[Figure 2-2. The Intel NetBurst Microarchitecture. Blocks shown: System Bus; Bus Unit; 3rd Level Cache (optional); 2nd Level Cache (8-way); 1st Level Cache (4-way); Front End; Fetch/Decode; Trace Cache; Microcode ROM; Out-Of-Order Execution Core; Retirement; Branch History Update; BTBs/Branch Prediction. Legend: frequently used paths, less frequently used paths.]

2.2.2.1 The Front End Pipeline

The front end supplies instructions in program order to the out-of-order execution core. It performs a number of functions:

• Prefetches instructions that are likely to be executed
• Fetches instructions that have not already been prefetched
• Decodes instructions into micro-operations
• Generates microcode for complex instructions and special-purpose code
• Delivers decoded instructions from the execution trace cache
• Predicts branches using a highly advanced algorithm

The pipeline is designed to address common problems in high-speed, pipelined microprocessors.

Two of these problems contribute to major sources of delays:

• time to decode instructions fetched from the target
• wasted decode bandwidth due to branches or branch targets in the middle of cache lines

The operation of the pipeline’s trace cache addresses these issues. Instructions are constantly being fetched and decoded by the translation engine (part of the fetch/decode logic) and built into sequences of µops called traces. At any time, multiple traces (representing prefetched branches) are being stored in the trace cache.

The trace cache is searched for the instruction that follows the active branch. If the instruction also appears as the first instruction in a pre-fetched branch, the fetch and decode of instructions from the memory hierarchy ceases and the prefetched branch becomes the new source of instructions (see Figure 2-2).

The trace cache and the translation engine have cooperating branch prediction hardware. Branch targets are predicted based on their linear addresses using branch target buffers (BTBs) and fetched as soon as possible.

2.2.2.2 Out-Of-Order Execution Core

The out-of-order execution core’s ability to execute instructions out of order is a key factor in enabling parallelism.

This feature enables the processor to reorder instructions so that if one µop is delayed, other µops may proceed around it. The processor employs several buffers to smooth the flow of µops.

The core is designed to facilitate parallel execution. It can dispatch up to six µops per cycle (this exceeds trace cache and retirement µop bandwidth). Most pipelines can start executing a new µop every cycle, so several instructions can be in flight at a time for each pipeline. A number of arithmetic logical unit (ALU) instructions can start at two per cycle; many floating-point instructions can start once every two cycles.

2.2.2.3 Retirement Unit

The retirement unit receives the results of the executed µops from the out-of-order execution core and processes the results so that the architectural state updates according to the original program order.

When a µop completes and writes its result, it is retired.

Up to three µops may be retired per cycle. The Reorder Buffer (ROB) is the unit in the processor which buffers completed µops, updates the architectural state in order, and manages the ordering of exceptions. The retirement section also keeps track of branches and sends updated branch target information to the BTB.
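The in-order retirement behavior described here can be illustrated with a small model. This is only a sketch under stated assumptions, not a description of the actual hardware: the function name `simulate_retirement`, the completion cycles, and the rule that a µop may retire in the same cycle it completes are all hypothetical simplifications. Only the retirement width of three comes from the text.

```python
from collections import deque

RETIRE_WIDTH = 3  # from the text: up to three µops may be retired per cycle


def simulate_retirement(completion_cycles):
    """Return the cycle in which each µop retires.

    completion_cycles[i] is a hypothetical cycle in which µop i (numbered
    in program order) finishes executing. µops may finish out of order,
    but they leave the reorder buffer strictly in program order.
    Simplification: a µop may retire in the cycle it completes.
    """
    rob = deque(range(len(completion_cycles)))  # oldest µop at the left
    retired_at = [None] * len(completion_cycles)
    cycle = 0
    while rob:
        cycle += 1
        retired_this_cycle = 0
        # Retire only from the head: a younger, already completed µop
        # waits behind an older µop that has not completed yet.
        while (rob
               and retired_this_cycle < RETIRE_WIDTH
               and completion_cycles[rob[0]] <= cycle):
            retired_at[rob.popleft()] = cycle
            retired_this_cycle += 1
    return retired_at


# µop 0 is slow (completes in cycle 4); µops 1 to 4 complete earlier.
print(simulate_retirement([4, 1, 1, 2, 2]))  # [4, 4, 4, 5, 5]
```

In this example µops 1 to 4 finish executing before µop 0, yet none of them retires until µop 0 does, and the width limit pushes the last two µops into the following cycle.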

The BTB then purges pre-fetched traces that are no longer needed.

2.2.3 Intel® Core™ Microarchitecture

Intel Core microarchitecture introduces the following features that enable high performance and power-efficient performance for single-threaded as well as multi-threaded workloads:

• Intel® Wide Dynamic Execution enables each processor core to fetch, dispatch, and execute at high bandwidth to support retirement of up to four instructions per cycle.
  — Fourteen-stage efficient pipeline
  — Three arithmetic logical units
  — Four decoders to decode up to five instructions per cycle
  — Macro-fusion and micro-fusion to improve front-end throughput
  — Peak issue rate of dispatching up to six micro-ops per cycle
  — Peak retirement bandwidth of up to 4 micro-ops per cycle
  — Advanced branch prediction
  — Stack pointer tracker to improve efficiency of executing function/procedure entries and exits
• Intel® Advanced Smart Cache delivers higher bandwidth from the second level cache to the core, and optimal performance and flexibility for single-threaded and multi-threaded applications.
  — Large second level cache up to 4 MB and 16-way associativity
  — Optimized for multicore and single-threaded execution environments
  — 256-bit internal data path to improve bandwidth from L2 to first-level data cache
• Intel® Smart Memory Access prefetches data from memory in response to data access patterns and reduces cache-miss exposure of out-of-order execution.
  — Hardware prefetchers to reduce effective latency of second-level cache misses
  — Hardware prefetchers to reduce effective latency of first-level data cache misses
  — Memory disambiguation to improve efficiency of the speculative execution engine
• Intel® Advanced Digital Media Boost improves most 128-bit SIMD instructions with single-cycle throughput and floating-point operations.
  — Single-cycle throughput of most 128-bit SIMD instructions
  — Up to eight floating-point operations per cycle
  — Three issue ports available for dispatching SIMD instructions for execution

Intel Core 2 Extreme, Intel Core 2 Duo processors and Intel Xeon processor 5100 series implement two processor cores based on the Intel Core microarchitecture; the functionality of the subsystems in each core is depicted in Figure 2-3.
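The figure of up to eight floating-point operations per cycle in the feature list can be reproduced with simple lane arithmetic. The sketch below is illustrative only. The helper names are hypothetical, and it assumes that two packed floating-point instructions (for example one add and one multiply) start in the same cycle. That pairing is an assumption of this sketch, chosen to be consistent with the 128-bit width and single-cycle throughput stated above.

```python
def lanes_per_register(register_bits: int, element_bits: int) -> int:
    """Number of data elements that fit in one SIMD register."""
    if register_bits % element_bits != 0:
        raise ValueError("element width must divide the register width")
    return register_bits // element_bits


def peak_fp_ops_per_cycle(register_bits: int, element_bits: int,
                          packed_fp_instructions_per_cycle: int) -> int:
    """Peak floating-point operations per cycle under the stated assumption."""
    lanes = lanes_per_register(register_bits, element_bits)
    return lanes * packed_fp_instructions_per_cycle


# Assumption (not stated in the text): two packed FP instructions per cycle.
single_precision = peak_fp_ops_per_cycle(128, 32, 2)  # 4 lanes * 2 = 8
double_precision = peak_fp_ops_per_cycle(128, 64, 2)  # 2 lanes * 2 = 4
print(single_precision, double_precision)  # 8 4
```

Under these assumptions the peak of eight applies to single-precision data; with 64-bit double-precision elements the same arithmetic gives four.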

[Figure 2-3. The Intel Core Microarchitecture Pipeline Functionality. Blocks shown: Instruction Fetch and PreDecode; Instruction Queue; Microcode ROM; Decode; Shared L2 Cache; FSB (up to 10.7 GB/s); Rename/Alloc; Retirement Unit (Re-Order Buffer); Scheduler; ALU, Branch, MMX/SSE/FP Move; ALU, FAdd, MMX/SSE; ALU, FMul, MMX/SSE; Load; Store; L1D Cache and DTLB.]

2.2.3.1 The Front End

The front end of Intel Core microarchitecture provides several enhancements to feed the Intel Wide Dynamic Execution engine:

• Instruction fetch unit prefetches instructions into an instruction queue to maintain a steady supply of instructions to the decode units.
• Four-wide decode unit can decode 4 instructions per cycle, or 5 instructions per cycle with Macrofusion.
• Macrofusion fuses common sequences of two instructions into one decoded instruction (micro-op) to increase decoding throughput.
• Microfusion fuses common sequences of two micro-ops into one micro-op to improve retirement throughput.
• Instruction queue provides caching of short loops to improve efficiency.
• Stack pointer tracker improves efficiency of executing procedure/function entries and exits.
• Branch prediction unit employs dedicated hardware to handle different types of branches for improved branch prediction.
• Advanced branch prediction algorithm directs instruction fetch unit to fetch instructions likely in the architectural code path for decoding.

2.2.3.2 Execution Core

The execution core of the Intel Core microarchitecture is superscalar and can process instructions out of order to increase the overall rate of instructions executed per cycle (IPC).

The execution core employs the following features to improve execution throughput and efficiency:

• Up to six micro-ops can be dispatched to execute per cycle
• Up to four instructions can be retired per cycle
• Three full arithmetic logical units
• SIMD instructions can be dispatched through three issue ports
• Most SIMD instructions have 1-cycle throughput (including 128-bit SIMD instructions)
• Up to eight floating-point operations per cycle
• Many long-latency computation operations are pipelined in hardware to increase overall throughput
• Reduced exposure to data access delays using Intel Smart Memory Access

2.2.4 SIMD Instructions

Beginning with the Pentium II and Pentium with Intel MMX technology processor families, five extensions have been introduced into the Intel 64 and IA-32 architectures to perform single-instruction multiple-data (SIMD) operations.

These extensions include the MMX technology, SSE extensions, SSE2 extensions, SSE3 extensions, and Supplemental Streaming SIMD Extensions 3. Each of these extensions provides a group of instructions that perform SIMD operations on packed integer and/or packed floating-point data elements.

SIMD integer operations can use the 64-bit MMX or the 128-bit XMM registers. SIMD floating-point operations use 128-bit XMM registers. Figure 2-4 shows a summary of the various SIMD extensions (MMX technology, SSE, SSE2, SSE3, and SSSE3), the data types they operate on, and how the data types are packed into MMX and XMM registers.

The Intel MMX technology was introduced in the Pentium II and Pentium with MMX technology processor families.

MMX instructions perform SIMD operations on packed byte, word, or doubleword integers located in MMX registers. These instructions are useful in applications that operate on integer arrays and streams of integer data that lend themselves to SIMD processing.

SSE extensions were introduced in the Pentium III processor family. SSE instructions operate on packed single-precision floating-point values contained in XMM registers and on packed integers contained in MMX registers.

Several SSE instructions provide state management, cache control, and memory ordering operations. Other SSE instructions are targeted at applications that operate on arrays of single-precision floating-point data elements (3-D geometry, 3-D rendering, and video encoding and decoding applications).

SSE2 extensions were introduced in Pentium 4 and Intel Xeon processors. SSE2 instructions operate on packed double-precision floating-point values contained in XMM registers and on packed integers contained in MMX and XMM registers.
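To make the idea of packed data concrete, the following sketch emulates a lane-by-lane add of packed 16-bit words in software. It is a plain Python model, not processor code: the function name `packed_add_words` and the sample values are hypothetical, and the choice of wraparound (modulo 2**16) arithmetic in each lane is an assumption of this sketch. The same function accepts an 8-byte operand (the size of an MMX register) or a 16-byte operand (the size of an XMM register).

```python
import struct


def packed_add_words(a: bytes, b: bytes) -> bytes:
    """Add two registers lane by lane, treating each as packed 16-bit words.

    Operands must both be 8 bytes (MMX-sized) or 16 bytes (XMM-sized).
    Each lane wraps around independently; no carry crosses a lane boundary.
    """
    if len(a) != len(b) or len(a) not in (8, 16):
        raise ValueError("operands must both be 8 or 16 bytes long")
    fmt = "<%dH" % (len(a) // 2)  # little-endian unsigned 16-bit lanes
    xs = struct.unpack(fmt, a)
    ys = struct.unpack(fmt, b)
    return struct.pack(fmt, *((x + y) & 0xFFFF for x, y in zip(xs, ys)))


# Four word lanes in a 64-bit (MMX-sized) operand.
mmx_a = struct.pack("<4H", 1, 2, 3, 0xFFFF)
mmx_b = struct.pack("<4H", 10, 20, 30, 1)
print(struct.unpack("<4H", packed_add_words(mmx_a, mmx_b)))  # (11, 22, 33, 0)

# Eight word lanes in a 128-bit (XMM-sized) operand.
xmm_a = struct.pack("<8H", *range(8))
xmm_b = struct.pack("<8H", *([100] * 8))
print(struct.unpack("<8H", packed_add_words(xmm_a, xmm_b)))
```

One call processes four or eight elements, which is the sense in which a single instruction operates on multiple data; the last lane of the first example shows the per-lane wraparound.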

