Microprocessors
PIPELINING AND SUPERSCALAR DESIGNS
Pipelining is a technique that allows a processor to start executing a new instruction before completing the current one. Pipelining saves time by ensuring that the microprocessor doesn’t have to wait between instructions; even so, a pipelined processor can still complete at most one instruction per clock cycle. To increase efficiency and thereby save processing time, today’s processors (Compaq/Digital’s Alpha, IBM/Motorola’s PowerPC, Intel’s Pentium line, and Sun’s SPARC) feature superscalar architectures. The main benefit of superscalar technology is that it allows a processor to execute more than one instruction per clock cycle by using multiple pipelines.
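The speedup is easy to see with a back-of-the-envelope calculation. The sketch below is an idealized model (a hypothetical k-stage design with no stalls, not any specific chip): without pipelining, n instructions take n × k cycles; with pipelining, the pipe fills once and then one instruction completes every cycle.

```python
# Idealized sketch (no real CPU is this clean): cycle counts for n
# instructions on a k-stage design, with and without pipelining.

def cycles_nonpipelined(n, stages):
    # each instruction must finish all stages before the next one starts
    return n * stages

def cycles_pipelined(n, stages):
    # fill the pipeline once, then one instruction completes per cycle
    return stages + (n - 1)

print(cycles_nonpipelined(100, 5))  # 500
print(cycles_pipelined(100, 5))     # 104
```

Even though each individual instruction still takes all five stages, overlapping them cuts total time by nearly a factor of the pipeline depth.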
In a superscalar design, the processor looks for instructions that can be handled within the same clock cycle and processes them together. In the Pentium processor, for example, simple instructions such as mov, or, and add can be processed this way, although only under specific circumstances (one instruction cannot require the result of the other). More complex instructions, such as those involving floating-point operations, can’t be handled this way at all.
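The pairing decision can be sketched in a few lines of Python. Everything here is illustrative, not Intel’s actual issue logic: the Instruction tuple, the SIMPLE_OPS set, and the can_pair rules are assumptions capturing only the constraint described above (two simple instructions may share a cycle if the second doesn’t depend on the first).

```python
# Sketch: can two adjacent "simple" instructions issue in one cycle?
# The instruction format and pairing rules are simplified assumptions.

from collections import namedtuple

Instruction = namedtuple("Instruction", ["op", "dest", "srcs"])

SIMPLE_OPS = {"mov", "or", "and", "add"}  # pairable ops in this sketch

def can_pair(first, second):
    """Two instructions may share a cycle only if both are simple and
    the second neither reads nor overwrites the first's result."""
    if first.op not in SIMPLE_OPS or second.op not in SIMPLE_OPS:
        return False
    if first.dest in second.srcs:   # read-after-write dependency
        return False
    if first.dest == second.dest:   # write-after-write conflict
        return False
    return True

a = Instruction("add", "eax", ("eax", "ebx"))
b = Instruction("mov", "ecx", ("edx",))
c = Instruction("or",  "edx", ("eax",))   # needs a's result

print(can_pair(a, b))  # True  - independent, both issue this cycle
print(can_pair(a, c))  # False - c must wait for a's result
```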
Parallel processing offers obvious speed benefits, but superscalar technology has its critics. Some argue that it wastes many opportunities for parallel execution, because combining individual instructions takes too much time and because individual instructions are often delayed while waiting for resources. For example, say instruction A is being executed in one pipeline and instruction B in another. Instruction C waits in the first pipeline for instruction A to finish. When instruction A finishes, the obvious thing to do is replace it with instruction C, the next one in that pipeline. But if instruction C needs the result of instruction B, currently being executed in the other pipeline, it has to sit and wait. This defeats the attempt at parallel execution and ruins any chance of increasing speed. Your expensive new processor is basically twiddling its thumbs.
Even in a well-designed program – one that attempts to make full use of both pipelining and parallel execution – the pipelines can get clogged. To help combat this gridlock, engineers have designed some superscalar processors to perform out-of-order execution. If a free pipeline has nothing to do because instruction C needs the result of instruction B, the processor can look for the first instruction in the program that doesn’t depend on instruction B (instruction H, let’s say). It then works on instruction H and related instructions until instruction B is finished, at which point it goes back to instruction C. Instead of sending the results of the out-of-order instructions into the registers (where the processor directly deals with data), the processor sends them to a buffer for storage, then sorts everything into the proper order before releasing it.
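A toy simulation makes the A/B/C/H scenario concrete. This is a deliberately simplified model (one issue per cycle, invented latencies, no real pipeline stages): instructions issue as soon as their operands are ready, even out of program order, but their results leave the buffer strictly in program order.

```python
# Toy model of out-of-order issue with in-order retirement.
# Instruction names, dependencies, and latencies are invented.

def run_out_of_order(program):
    """program: list of (name, deps, latency) tuples in program order."""
    time = 0
    finish = {}                      # name -> cycle its result is ready
    issue_order, retire_order = [], []
    pending = list(program)
    next_retire = 0                  # index of next instruction to retire
    while pending:
        # issue the FIRST instruction whose operands are ready,
        # skipping over earlier instructions that are stalled
        for i, (name, deps, lat) in enumerate(pending):
            if all(d in finish and finish[d] <= time for d in deps):
                issue_order.append(name)
                finish[name] = time + lat
                del pending[i]
                break
        time += 1
        # results leave the buffer strictly in program order
        while (next_retire < len(program)
               and program[next_retire][0] in finish
               and finish[program[next_retire][0]] <= time):
            retire_order.append(program[next_retire][0])
            next_retire += 1
    # drain: retire anything still buffered, in program order
    while next_retire < len(program):
        retire_order.append(program[next_retire][0])
        next_retire += 1
    return issue_order, retire_order

# A and B are independent; C needs B (which takes 3 cycles); H is free
prog = [("A", [], 1), ("B", [], 3), ("C", ["B"], 1), ("H", [], 1)]
issued, retired = run_out_of_order(prog)
print(issued)   # ['A', 'B', 'H', 'C'] - H jumps ahead of the stalled C
print(retired)  # ['A', 'B', 'C', 'H'] - but results emerge in order
```

The key point the model shows: issue order and retirement order differ, and the buffer is what lets the processor reconcile the two.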
One possible problem with out-of-order execution is that two instructions may need to use the same register. To compensate, today’s processors can change the names of registers on the fly, in a process known as register renaming. Clearly, out-of-order execution requires extremely careful processor design, because programs may fail if instructions are not processed in the proper order.
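Register renaming can be sketched as a simple mapping: each new write to an architectural register is given a fresh physical register, so two instructions that happen to reuse the same architectural name no longer conflict. The instruction format and the physical-register names (p0, p1, …) below are illustrative assumptions.

```python
# Sketch of register renaming: every write gets a fresh physical
# register, eliminating false (name-only) conflicts between writes.

import itertools

def rename(program):
    """program: list of (dest, srcs) using architectural register names.
    Returns the same program rewritten onto physical registers."""
    fresh = (f"p{i}" for i in itertools.count())
    alias = {}                       # architectural -> current physical
    renamed = []
    for dest, srcs in program:
        # read each source through the current mapping
        phys_srcs = tuple(alias.get(s, s) for s in srcs)
        # give the destination a brand-new physical register
        alias[dest] = next(fresh)
        renamed.append((alias[dest], phys_srcs))
    return renamed

prog = [("eax", ("ebx",)),    # eax = ebx
        ("ecx", ("eax",)),    # ecx = eax  (reads the FIRST eax)
        ("eax", ("edx",))]    # eax = edx  (reuses the name eax)
print(rename(prog))
# [('p0', ('ebx',)), ('p1', ('p0',)), ('p2', ('edx',))]
```

After renaming, the second write to eax lands in p2 while the earlier read still sees p0, so the two instructions can safely execute out of order.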
INSTRUCTION SETS: RISC AND CISC
An instruction set is the specific group of instructions that a particular processor can recognize and execute. Over the years, a debate has raged over two processor-design philosophies surrounding the implementation of instruction sets. The first approach, initially known as microcode, is CISC (complex instruction-set computing). On the other end is RISC (reduced instruction-set computing).
The first processors used hardware to execute each instruction. These hard-wired processors were extremely fast, since there was no software for the instructions to work through. As you might imagine, though, this approach caused a major problem: Any change to the hardware required a change to the (software) instructions as well, and vice versa. Simple programs were possible, but adding complexity was nearly impossible. To get around this problem, IBM devised microcode—simple software stored on the chip from which the processor obtained its instructions.
One advantage of CISC's microcode was that the instruction set could be modified much more easily than before, and thus increasingly complex instructions were possible. Also, because each instruction replaced several simple hard-wired instructions, programs could be written with a smaller number of instructions overall. Another advantage was that CISC programs took less memory space, and memory was expensive back in the sixties and seventies.
The Duo version of Intel Core (Yonah) includes two computational cores, providing performance per watt almost as good as that of any previous single-core Intel processor. In battery-operated devices such as notebook computers, this translates to getting as much total work done per battery charge as with older computers, although the same total work may be done faster. When parallel computations and multiprocessing are able to utilize both cores, the Intel Core Duo delivers much higher peak speed than the single-core chips previously available for mobile devices.
The shortcomings of Intel Core (Yonah) are:
- The same or even slightly worse "performance per watt" in single-threaded or non-parallel applications compared to its predecessor.
- 32-bit processes only; 64-bit processes are not supported. (See the Intel Core 2 successor, which is a 64-bit processor.)
- High memory latency due to the lack of an on-die memory controller (further aggravated by the system chipset's use of DDR2 RAM).
- Limited floating-point unit (multiply/divide) throughput for non-parallel computations or single-threaded processes, due to the smaller number of floating-point units in each CPU core compared to some previous designs.
The Yonah platform requires all main-memory transactions to pass through the chipset's northbridge, increasing latency compared to AMD's Turion platform. However, application tests showed that Intel Core's L2-cache system is quite effective at overcoming main-memory latency; despite this limitation, Intel Core (Yonah) sometimes managed to outperform AMD's Turion.
The Sossaman server processor, which is based on Yonah, also lacks Intel 64 support. In the server market this had more severe consequences, since all major server operating systems already supported x86-64, and Microsoft Exchange Server 2007 even requires a 64-bit processor to run.
According to Intel's mobile roadmaps from 2005, the Yonah project originally focused on reducing the power consumption of its P6-derived Pentium M design, aiming for a 50% reduction in Intel Core (Yonah). Intel continued recommending NetBurst-based Pentium processors for high-performance mobile applications (although these were less power-efficient) until the Yonah project succeeded in extracting higher performance from its lower-power design. The Intel Core Duo's two efficient cores on one chip can outperform a NetBurst-based Pentium core while consuming far less power, and Intel no longer recommends its NetBurst-based Pentium processors for mobile devices.
On July 27, 2006, Intel's Core 2 processors were released. By 2Q 2007, Intel expected 90% of its laptop CPU production to have converted to the heavily revised Intel Core 2 processors. The original Intel Core (Yonah) product thus had an unusually short lifespan, serving as a stepping stone to the 64-bit Intel Core 2.
Sources:
- http://www.pcmech.com/show/processors/35/11/
- http://www.intel.com/technology/architecture/index.htm
- http://en.wikipedia.org/wiki/Intel_Core_2
- PC Magazine 9, 1998
- PC Magazine 30, 1998