Volume 1 Application Programming
Such results are clamped (limited) to the maximum or minimum value representable by the destination data type when the true result exceeds that maximum or minimum representable value. Saturation avoids the need for code that tests for potential overflow or underflow. Saturating data is useful for representing physical-world data, such as sound and color. It is used, for example, when combining values for pixel coloring.

AMD64 Technology: 64-Bit Media Programming (24592, Rev. 3.13, July 2007)

5.3.5 Branch Removal

Branching is a time-consuming operation that, unlike most 64-bit media vector operations, does not exhibit parallel behavior (there is only one branch target, not multiple targets, per branch instruction). In many media applications, a branch involves selecting between only a few (often only two) cases. Such branches can be replaced with 64-bit media vector compare and vector logical instructions that simulate predicated execution or conditional moves.

Figure 5-5 shows an example of a non-branching sequence that implements a two-way multiplexer, one that is equivalent to the ternary operator “?:” in C and C++.
The comparable code sequence is explained in “Compare and Write Mask” on page 220.

The sequence in Figure 5-5 begins with a vector compare instruction that compares the elements of two source operands in parallel and produces a mask vector containing elements of all 1s or 0s. This mask vector is ANDed with one source operand and ANDed-Not with the other source operand to isolate the desired elements of both operands.
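
As a rough illustration, the compare, AND, AND-NOT, and closing OR steps of such a sequence can be modeled in portable C. This is a sketch under stated assumptions, not code from the manual: the function names cmpgt_words and select_words are hypothetical, the 64-bit operand is treated as four 16-bit elements, and a signed greater-than compare is assumed as the selection condition.

```c
#include <stdint.h>

/* Hypothetical model of a packed signed-word "greater than" compare:
   each 16-bit element of the result is 0xFFFF where a > b, else 0x0000. */
static uint64_t cmpgt_words(uint64_t a, uint64_t b)
{
    uint64_t mask = 0;
    for (int lane = 0; lane < 4; lane++) {
        /* Flipping the sign bit lets an unsigned compare order the
           elements as two's-complement signed values. */
        unsigned x = (unsigned)((a >> (16 * lane)) & 0xFFFFu) ^ 0x8000u;
        unsigned y = (unsigned)((b >> (16 * lane)) & 0xFFFFu) ^ 0x8000u;
        /* -(x > y) is 0 or -1; truncated to 16 bits it is 0x0000 or 0xFFFF. */
        mask |= (uint64_t)(uint16_t)(-(x > y)) << (16 * lane);
    }
    return mask;
}

/* Per element: result = (a > b) ? a : b, with no branch on the data. */
static uint64_t select_words(uint64_t a, uint64_t b)
{
    uint64_t mask   = cmpgt_words(a, b); /* Compare: all 1s or all 0s   */
    uint64_t from_a = a & mask;          /* And: keep a where mask is 1 */
    uint64_t from_b = b & ~mask;         /* And-Not: keep b elsewhere   */
    return from_a | from_b;              /* Or: merge the two parts     */
}
```

With a = 0x0005000100020009 and b = 0x0003000700040008, the mask is 0xFFFF00000000FFFF and the result is 0x0005000700040009, which is the a3 b2 b1 a0 pattern that Figure 5-5 depicts.
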
These results are then ORed to select the relevant elements from each operand. A similar branch-removal operation can be done using floating-point source operands.

Figure 5-5. Branch-Removal Sequence
  operand 1:  a3    a2    a1    a0
  operand 2:  b3    b2    b1    b0
  Compare:    FFFF  0000  0000  FFFF
  And:        a3    0000  0000  a0
  And-Not:    0000  b2    b1    0000
  Or:         a3    b2    b1    a0

5.3.6 Floating-Point (3DNow!™) Vector Operations

Floating-point vector instructions using the MMX registers were introduced by AMD with the 3DNow! technology.
These instructions take 64-bit vector operands consisting of two 32-bit single-precision floating-point numbers, shown as FP single in Figure 5-6.

Figure 5-6. Floating-Point (3DNow!™ Instruction) Operations (two source operands, each holding one FP single element in bits 63–32 and one in bits 31–0, are combined element by element by "op" into a result with the same layout)

The AMD64 architecture’s 3DNow! floating-point instructions provide a unique advantage over legacy x87 floating-point instructions: They allow integer and floating-point instructions to be intermixed in the same procedure, using only the MMX registers. This avoids the need to switch between integer MMX procedures and x87 floating-point procedures (a switch that may involve time-consuming state saves and restores) while at the same time leaving the 128-bit XMM register resources free for other applications.

The 3DNow! instructions allow applications such as 3D graphics to accelerate front-end geometry, clipping, and lighting calculations.
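
The two-element operand layout described above can be modeled in portable C. The following is only a sketch: pack2, element, and add2 are hypothetical helper names, float is assumed to be a 32-bit IEEE 754 single-precision type, and ordinary C addition stands in for a vector add without reproducing the rounding or exception behavior of any 3DNow! instruction.

```c
#include <stdint.h>
#include <string.h>

/* Build a 64-bit operand: lo occupies bits 31-0, hi occupies bits 63-32. */
static uint64_t pack2(float lo, float hi)
{
    uint32_t l, h;
    memcpy(&l, &lo, sizeof l);   /* copy the raw bit patterns */
    memcpy(&h, &hi, sizeof h);
    return ((uint64_t)h << 32) | l;
}

/* Extract element 0 (bits 31-0) or element 1 (bits 63-32). */
static float element(uint64_t v, int index)
{
    uint32_t bits = (uint32_t)(v >> (32 * index));
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Apply the same operation (here, addition) to both element pairs. */
static uint64_t add2(uint64_t a, uint64_t b)
{
    return pack2(element(a, 0) + element(b, 0),
                 element(a, 1) + element(b, 1));
}
```

Calling add2(pack2(1.5f, 2.0f), pack2(0.25f, -4.0f)) yields an operand whose element 0 is 1.75 and whose element 1 is -2.0: one call produces two results, which is the parallelism the two-element format provides.
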
Picture and pixel data are typically integer data types, although both integer and floating-point instructions are often required to operate completely on the data. For example, software can change the viewing perspective of a 3D scene through transformation matrices by using floating-point instructions in the same procedure that contains integer operations on other aspects of the graphics data.

3DNow! programs typically perform better than x87 floating-point code, because the MMX register file is flat rather than stack-oriented and because 3DNow! instructions can operate on twice as many operands as x87 floating-point instructions. This ability to operate in parallel on twice as many floating-point values in the same register space often makes it possible to remove local temporary variables in 3DNow! code that would otherwise be needed in x87 floating-point code.

5.4 Registers

5.4.1 MMX™ Registers

Eight 64-bit MMX registers, mmx0–mmx7, support the 64-bit media instructions. Figure 5-7 shows these registers. They can hold operands for both vector and scalar operations on integer (MMX) and floating-point (3DNow!) data types.

Figure 5-7. 64-Bit Media Registers (MMX™ registers mmx0 through mmx7, each spanning bits 63 through 0)

The MMX registers are mapped onto the low 64 bits of the 80-bit x87 floating-point physical data registers, FPR0–FPR7, described in “Registers” on page 238.
However, the x87 stack register structure, ST(0)–ST(7), is not used by MMX instructions. The x87 tag bits, top-of-stack pointer (TOP), and high bits of the 80-bit FPR registers are changed when 64-bit media instructions are executed. For details about the x87-related actions performed by hardware during execution of 64-bit media instructions, see “Actions Taken on Executing 64-Bit Media Instructions” on page 232.

5.4.2 Other Registers

Some 64-bit media instructions that perform data transfer, data conversion or data reordering operations (“Data Transfer” on page 209, “Data Conversion” on page 211, and “Data Conversion” on page 224) can access operands in the general-purpose registers (GPRs) or XMM registers. When addressing GPRs or XMM registers in 64-bit mode, the REX instruction prefix can be used to access the extended GPRs or XMM registers, as described in “REX Prefixes” on page 74.
For a description of the GPR registers, see “Registers” on page 23. For a description of the XMM registers, see “XMM Registers” on page 116.

5.5 Operands

Operands for a 64-bit media instruction are either referenced by the instruction's opcode or included as an immediate value in the instruction encoding. Depending on the instruction, referenced operands can be located in registers or memory. The data types of these operands include vector and scalar integer, and vector floating-point.

5.5.1 Data Types

Figure 5-8 on page 202 shows the register images of the 64-bit media data types.
These data types can be interpreted by instruction syntax and/or the software context as one of the following types of values:

• Vector (packed) single-precision (32-bit) floating-point numbers.
• Vector (packed) signed (two's-complement) integers.
• Vector (packed) unsigned integers.
• Scalar signed (two's-complement) integers.
• Scalar unsigned integers.

Hardware does not check or enforce the data types for instructions. Software is responsible for ensuring that each operand for an instruction is of the correct data type. Software can interpret the data types in ways other than those shown in Figure 5-8 on page 202 (such as bit fields or fractional numbers), but the 64-bit media instructions do not directly support such interpretations and software must handle them entirely on its own.

Figure 5-8. 64-Bit Media Data Types
  Vector (Packed) Single-Precision Floating-Point: two elements, each with a sign bit (s), exp, and significand; the sign bits are bits 63 and 31, and the significands occupy bits 54–32 and 22–0.
  Vector (Packed) Signed Integers: two doublewords, four words, or eight bytes, each with a sign bit (s); byte boundaries at bits 63, 55, 47, 39, 31, 23, 15, 7, and 0.
  Vector (Packed) Unsigned Integers: two doublewords, four words, or eight bytes, with the same boundaries.
  Signed Integers: quadword (bits 63–0), doubleword (bits 31–0), word (bits 15–0), or byte (bits 7–0), each with a sign bit (s).
  Unsigned Integers: quadword, doubleword, word, or byte at the same bit positions.

5.5.2 Operand Sizes and Overrides

Operand sizes for 64-bit media instructions are determined by instruction opcodes. Some of these opcodes include an operand-size override prefix, but this prefix acts in a special way to modify the opcode and is considered an integral part of the opcode. The general use of the 66h operand-size override prefix described in “Instruction Prefixes” on page 71 does not apply to 64-bit media instructions.

For details on the use of operand-size override prefixes in 64-bit media instructions, see the opcodes in “64-Bit Media Instruction Reference” in Volume 5.

5.5.3 Operand Addressing

Depending on the 64-bit media instruction, referenced operands may be in registers or memory.

Register Operands.
Most 64-bit media instructions can access source and destination operands located in MMX registers. A few of these instructions access the XMM or GPR registers. When addressing GPR or XMM registers in 64-bit mode, the REX instruction prefix can be used to access the extended GPR or XMM registers, as described in “Instruction Prefixes” on page 228.

The 64-bit media instructions do not access the rFLAGS register, and none of the bits in that register are affected by execution of the 64-bit media instructions.

Memory Operands.
Most 64-bit media instructions can read memory for source operands, and a few of the instructions can write results to memory. “Memory Addressing” on page 14 describes the general methods and conditions for addressing memory operands.

Immediate Operands. Immediate operands are used in certain data-conversion and vector-shift instructions. Such instructions take 8-bit immediates, which provide control for the operation.

I/O Ports. I/O ports in the I/O address space cannot be directly addressed by 64-bit media instructions. Memory-mapped I/O ports can be addressed by such instructions, but doing so may produce unpredictable results, depending on the hardware implementation of the architecture. See the data sheet or software-optimization documentation for particular hardware implementations.

5.5.4 Data Alignment

Those 64-bit media instructions that access a 128-bit operand in memory incur a general-protection exception (#GP) if the operand is not aligned to a 16-byte boundary.
These instructions include:

• CVTPD2PI: Convert Packed Double-Precision Floating-Point to Packed Doubleword Integers.
• CVTTPD2PI: Convert Packed Double-Precision Floating-Point to Packed Doubleword Integers, Truncated.
• FXRSTOR: Restore XMM, MMX, and x87 State.
• FXSAVE: Save XMM, MMX, and x87 State.

For other 64-bit media instructions, the architecture does not impose data-alignment requirements for accessing 64-bit media data in memory. Specifically, operands in physical memory do not need to be stored at addresses that are even multiples of the operand size in bytes. However, the consequence of storing operands at unaligned locations is that accesses to those operands may require more processor and bus cycles than for aligned accesses.
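
Software that prepares memory operands for the instructions listed above can request and verify 16-byte alignment in C. This is a minimal sketch, assuming a C11 compiler. The type operand128 and the function is_aligned16 are hypothetical names, and the pointer-to-integer conversion is assumed to yield the address, as it does on common flat-address implementations.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for a 128-bit memory operand; _Alignas(16)
   asks the compiler to place it on a 16-byte boundary. */
typedef struct {
    _Alignas(16) unsigned char bytes[16];
} operand128;

/* Returns 1 if the address p is a multiple of 16, 0 otherwise. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}
```

An object declared as operand128 passes is_aligned16, whereas the address one byte into it does not. The 64-bit operands of the other instructions need no such treatment for correctness, only for speed.
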