Volume 1: Application Programming
The architecture supports two memory formats for FXSAVE and FXRSTOR: a 512-byte 32-bit legacy format and a 512-byte 64-bit format. The effective operand size of the FXSAVE and FXRSTOR instructions determines which format is used. For details, see “FXSAVE and FXRSTOR Instructions” in Volume 2.

Save and Restore Control and Status
• STMXCSR—Store MXCSR Control/Status Register
• LDMXCSR—Load MXCSR Control/Status Register

The STMXCSR and LDMXCSR instructions save and restore the 32-bit contents of the MXCSR register. For further information, see “MXCSR Register” on page 117.

4.6 Instruction Summary—Floating-Point Instructions

This section summarizes the functions of the floating-point instructions in the 128-bit media instruction subset.
These include floating-point instructions that use an XMM register for source or destination, and data-conversion instructions that convert from floating-point to integer formats. For a summary of the integer instructions in the 128-bit media instruction subset, including data-conversion instructions that convert from integer to floating-point formats, see “Instruction Summary—Integer Instructions” on page 133.

AMD64 Technology, 24592—Rev. 3.13—July 2007 (128-Bit Media and Scientific Programming, page 156)

For a summary of the 64-bit media floating-point instructions, see “Instruction Summary—Floating-Point Instructions” on page 223. For a summary of the x87 floating-point instructions, see “Instruction Summary” on page 262.

The instructions are organized here by functional group, such as data transfer and vector arithmetic. Software running at any privilege level can use any of these instructions if the CPUID instruction reports support for them (see “Feature Detection” on page 176).
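Because software must confirm support through CPUID before using these instructions, a minimal C sketch of that check may help. It assumes a GCC- or Clang-compatible compiler on x86, where the `<cpuid.h>` header and its `__get_cpuid` helper are compiler extensions (not something this manual defines), and it reads the SSE, SSE2, and SSE3 flags from CPUID standard function 1 (EDX bit 25, EDX bit 26, and ECX bit 0). The `media_features` type and `detect_media_features` function are hypothetical names; on other targets the sketch simply reports no support.

```c
#include <stdbool.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>          /* GCC/Clang extension, x86 only */
#define HAVE_X86_CPUID 1
#else
#define HAVE_X86_CPUID 0
#endif

/* Hypothetical summary of the CPUID function 1 feature flags. */
typedef struct {
    bool sse;   /* EDX bit 25 */
    bool sse2;  /* EDX bit 26 */
    bool sse3;  /* ECX bit 0  */
} media_features;

/* Returns all-false flags when CPUID cannot be queried
   (non-x86 target, or a compiler without <cpuid.h>). */
media_features detect_media_features(void) {
    media_features f = {false, false, false};
#if HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid returns 0 if standard function 1 is not supported. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        f.sse  = (edx >> 25) & 1u;
        f.sse2 = (edx >> 26) & 1u;
        f.sse3 = ecx & 1u;
    }
#endif
    return f;
}
```

For example, MOVDDUP, MOVSLDUP, and MOVSHDUP were introduced with SSE3, so code that uses them would test the `sse3` flag before taking that path.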
More detail on individual instructions is given in the alphabetically organized “128-Bit Media Instruction Reference” in Volume 4.

4.6.1 Syntax

The 128-bit media floating-point instructions have the same syntax rules as those for the 128-bit media integer instructions, described in “Syntax” on page 133. For an illustration of typical syntax, see Figure 4-16 on page 134.

4.6.2 Data Transfer

The data-transfer instructions copy operands between 32-bit, 64-bit, or 128-bit memory locations and XMM registers.
The MOV mnemonic, which stands for move, is a misnomer: these instructions perform a copy, not a move. A new copy of the source value is created at the destination address, and the original copy remains unchanged at its source location.

Move
• MOVAPS—Move Aligned Packed Single-Precision Floating-Point
• MOVAPD—Move Aligned Packed Double-Precision Floating-Point
• MOVUPS—Move Unaligned Packed Single-Precision Floating-Point
• MOVUPD—Move Unaligned Packed Double-Precision Floating-Point
• MOVHPS—Move High Packed Single-Precision Floating-Point
• MOVHPD—Move High Packed Double-Precision Floating-Point
• MOVLPS—Move Low Packed Single-Precision Floating-Point
• MOVLPD—Move Low Packed Double-Precision Floating-Point
• MOVHLPS—Move Packed Single-Precision Floating-Point High to Low
• MOVLHPS—Move Packed Single-Precision Floating-Point Low to High
• MOVSS—Move Scalar Single-Precision Floating-Point
• MOVSD—Move Scalar Double-Precision Floating-Point
• MOVDDUP—Move Double-Precision and Duplicate
• MOVSLDUP—Move Single-Precision Low and Duplicate
• MOVSHDUP—Move Single-Precision High and Duplicate

Figure 4-31 on page 159 shows the capabilities of the various floating-point move instructions.

The MOVAPx instructions copy a vector of four single-precision floating-point values (MOVAPS) or a vector of two double-precision floating-point values (MOVAPD) from the second operand to the first operand, that is, from an XMM register or 128-bit memory location to another XMM register, or vice versa. A general-protection exception occurs if a memory operand is not aligned on a 16-byte boundary.

The MOVUPx instructions perform operations analogous to the MOVAPx instructions, except that unaligned memory operands do not cause a general-protection exception.

[Figure 4-31. Floating-Point Move Operations. Panels: XMM register or memory (source) to XMM register (destination), and XMM register (source) to XMM register or memory (destination), each covering MOVAPS, MOVAPD, MOVUPS, MOVUPD, MOVLPS*, MOVLPD*, MOVHPS*, MOVHPD*, MOVSD, and MOVSS; XMM register to XMM register, covering MOVHLPS and MOVLHPS. *These instructions copy data only between memory and a register, not between two registers.]

The MOVHPS and MOVHPD instructions copy a vector of two single-precision floating-point values (MOVHPS) or one double-precision floating-point value (MOVHPD) from a 64-bit memory location to the high-order 64 bits of an XMM register, or from the high-order 64 bits of an XMM register to a 64-bit memory location. In the memory-to-register case, the low-order 64 bits of the destination XMM register are not modified.

The MOVLPS and MOVLPD instructions copy a vector of two single-precision floating-point values (MOVLPS) or one double-precision floating-point value (MOVLPD) from a 64-bit memory location to the low-order 64 bits of an XMM register, or from the low-order 64 bits of an XMM register to a 64-bit memory location. In the memory-to-register case, the high-order 64 bits of the destination XMM register are not modified.

The MOVHLPS instruction copies a vector of two single-precision floating-point values from the high-order 64 bits of an XMM register to the low-order 64 bits of another XMM register.
The high-order 64 bits of the destination XMM register are not modified. The MOVLHPS instruction performs the analogous operation in the opposite direction (low-order to high-order), and the low-order 64 bits of the destination XMM register are not modified.

The MOVSS instruction copies a scalar single-precision floating-point value from the low-order 32 bits of an XMM register or a 32-bit memory location to the low-order 32 bits of another XMM register, or vice versa.
If the source operand is an XMM register, the high-order 96 bits of the destination XMM register are not modified. If the source operand is a 32-bit memory location, the high-order 96 bits of the destination XMM register are cleared to all 0s.

The MOVSD instruction copies a scalar double-precision floating-point value from the low-order 64 bits of an XMM register or a 64-bit memory location to the low-order 64 bits of another XMM register, or vice versa. If the source operand is an XMM register, the high-order 64 bits of the destination XMM register are not modified. If the source operand is a 64-bit memory location, the high-order 64 bits of the destination XMM register are cleared to all 0s.

This MOVSD instruction should not be confused with the MOVSD (move string doubleword) instruction of the general-purpose instruction set, which shares the same mnemonic.
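The merge-versus-clear rules for MOVSS and MOVSD, and the half-register rules for MOVHPS, MOVLPS, MOVHLPS, and MOVLHPS, are easy to mix up, so here is a small portable C model of them. It is a sketch that models register lanes as plain arrays rather than issuing the instructions; the `xmm_ps`/`xmm_pd` types and all function names are hypothetical. Lane 0 stands for the low-order bits of the register.

```c
/* Hypothetical software model of an XMM register.
   Lane 0 = low-order bits, matching the manual's low/high-order wording. */
typedef struct { float f[4]; } xmm_ps;   /* four single-precision lanes */
typedef struct { double d[2]; } xmm_pd;  /* two double-precision lanes */

/* MOVSS xmm1, xmm2: copy lane 0; high-order 96 bits of dst unchanged. */
xmm_ps movss_reg(xmm_ps dst, xmm_ps src) {
    dst.f[0] = src.f[0];
    return dst;
}

/* MOVSS xmm, mem32: load lane 0; high-order 96 bits cleared to 0. */
xmm_ps movss_load(const float *mem) {
    xmm_ps r = {{mem[0], 0.0f, 0.0f, 0.0f}};
    return r;
}

/* MOVSD xmm1, xmm2: copy low quadword; high-order 64 bits unchanged. */
xmm_pd movsd_reg(xmm_pd dst, xmm_pd src) {
    dst.d[0] = src.d[0];
    return dst;
}

/* MOVSD xmm, mem64: load low quadword; high-order 64 bits cleared to 0. */
xmm_pd movsd_load(const double *mem) {
    xmm_pd r = {{mem[0], 0.0}};
    return r;
}

/* MOVHLPS: high 64 bits of src -> low 64 bits of dst; dst high half kept. */
xmm_ps movhlps(xmm_ps dst, xmm_ps src) {
    dst.f[0] = src.f[2];
    dst.f[1] = src.f[3];
    return dst;
}

/* MOVLHPS: low 64 bits of src -> high 64 bits of dst; dst low half kept. */
xmm_ps movlhps(xmm_ps dst, xmm_ps src) {
    dst.f[2] = src.f[0];
    dst.f[3] = src.f[1];
    return dst;
}

/* MOVHPS xmm, mem64: two floats -> high half; low half unchanged. */
xmm_ps movhps_load(xmm_ps dst, const float *mem) {
    dst.f[2] = mem[0];
    dst.f[3] = mem[1];
    return dst;
}

/* MOVLPS xmm, mem64: two floats -> low half; high half unchanged. */
xmm_ps movlps_load(xmm_ps dst, const float *mem) {
    dst.f[0] = mem[0];
    dst.f[1] = mem[1];
    return dst;
}
```

In compiled SSE code, the same register-to-register operations are exposed as the intrinsics `_mm_move_ss(a, b)`, `_mm_movehl_ps(a, b)`, and `_mm_movelh_ps(a, b)`, which take their arguments in the same (destination, source) order as the models above, and the clearing load as `_mm_load_ss(p)`.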
Assemblers distinguish the two MOVSD instructions by their operand data types.

Move with Duplication. These instructions copy selected elements of the source XMM register or memory operand (128 bits for MOVSLDUP and MOVSHDUP, 64 bits for MOVDDUP) into the destination XMM register, writing each selected element twice.

The MOVDDUP instruction copies the low-order quadword of the source operand into each quadword half of the destination operand.

The MOVSLDUP instruction copies the first doubleword of the source operand into the first and second doublewords of the destination operand, and copies the third doubleword of the source operand into the third and fourth doublewords of the destination operand.

The MOVSHDUP instruction copies the second doubleword of the source operand into the first and second doublewords of the destination operand, and copies the fourth doubleword of the source operand into the third and fourth doublewords of the destination operand.

Move Non-Temporal. The move non-temporal instructions are streaming-store instructions. They minimize pollution of the cache.
• MOVNTPD—Move Non-Temporal Packed Double-Precision Floating-Point
• MOVNTPS—Move Non-Temporal Packed Single-Precision Floating-Point
• MOVNTSD—Move Non-Temporal Scalar Double-Precision Floating-Point
• MOVNTSS—Move Non-Temporal Scalar Single-Precision Floating-Point

The MOVNTPx instructions copy four packed single-precision floating-point values (MOVNTPS) or two packed double-precision floating-point values (MOVNTPD) from an XMM register into a 128-bit memory location.

The MOVNTSx instructions store one double-precision floating-point value from the low-order 64 bits of an XMM register into a 64-bit memory location (MOVNTSD), or one single-precision floating-point value from the low-order 32 bits of an XMM register into a 32-bit memory location (MOVNTSS).

These instructions indicate to the processor that their data is non-temporal: the data they reference is expected to be used only once and therefore should not incur cache-related overhead. Temporal data, by contrast, is expected to be accessed again soon and should be cached.
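From C, MOVNTPS is usually reached through the `_mm_stream_ps` intrinsic in `<xmmintrin.h>`. The following is a minimal sketch under stated assumptions: `stream_copy_floats` is a hypothetical helper; the streaming path is compiled only when the compiler targets SSE (`__SSE__` is defined, as GCC and Clang do by default on x86-64) and is taken only when the destination is 16-byte aligned, because MOVNTPS, like MOVAPS, faults on a misaligned memory operand. Everything else falls back to ordinary stores. Since streaming stores are weakly ordered, the sketch issues SFENCE (`_mm_sfence`) after the loop so the streamed data is ordered ahead of later stores.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper: copies n floats from src to dst, using
   non-temporal (streaming) stores for the bulk of the copy when possible. */
void stream_copy_floats(float *dst, const float *src, size_t n) {
    size_t i = 0;
#if defined(__SSE__)
    /* MOVNTPS requires a 16-byte-aligned destination. */
    if (((uintptr_t)dst & 15u) == 0) {
        for (; i + 4 <= n; i += 4) {
            __m128 v = _mm_loadu_ps(src + i); /* MOVUPS: src may be unaligned */
            _mm_stream_ps(dst + i, v);        /* MOVNTPS: non-temporal store */
        }
        /* Order the weakly ordered streaming stores before later stores. */
        _mm_sfence();
    }
#endif
    /* Remaining tail, or the whole copy when streaming is unavailable. */
    for (; i < n; i++) {
        dst[i] = src[i];
    }
}
```

Whether streaming actually helps depends on the hardware and on whether the destination will be read again soon; for temporal data, ordinary stores are usually the better choice.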
The non-temporal instructions use weakly ordered, write-combining buffering of write data, and they minimize cache pollution. The exact method by which cache pollution is minimized depends on the hardware implementation of the instruction. For further information, see “Memory Optimization” on page 92.

Move Mask
• MOVMSKPS—Extract Packed Single-Precision Floating-Point Sign Mask
• MOVMSKPD—Extract Packed Double-Precision Floating-Point Sign Mask

The MOVMSKPS instruction copies the sign bits of four single-precision floating-point values in an XMM register to the four low-order bits of a 32-bit or 64-bit general-purpose register, with zero-extension.
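To make the lane-to-bit mapping of MOVMSKPS concrete, here is a portable C model. `movmskps_model` and `all_lanes_negative` are hypothetical names. The model uses the C99 `signbit` macro, so negative zero and NaNs with the sign bit set count as negative, just as the instruction copies the raw sign bit of each element. In SSE code the instruction itself is available as the `_mm_movemask_ps` intrinsic.

```c
#include <math.h>
#include <stdint.h>

/* Models MOVMSKPS: bit i of the result is the sign bit of lane i
   (lane 0 = low-order 32 bits). All higher result bits are zero,
   mirroring the zero-extension into the general-purpose register. */
uint32_t movmskps_model(const float lanes[4]) {
    uint32_t mask = 0;
    for (int i = 0; i < 4; i++) {
        if (signbit(lanes[i])) {
            mask |= UINT32_C(1) << i;
        }
    }
    return mask;
}

/* Data-dependent test on the mask: nonzero only if all four
   lanes have their sign bit set. */
int all_lanes_negative(const float lanes[4]) {
    return movmskps_model(lanes) == 0xFu;
}
```

A single integer compare on the mask replaces four separate floating-point tests, which is what makes the sign mask convenient for branching.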
The MOVMSKPD instruction copies the sign bits of two double-precision floating-point values in an XMM register to the two low-order bits of a general-purpose register, with zero-extension. The result of either instruction is a sign-bit mask that can be used for data-dependent branching. Figure 4-32 shows the MOVMSKPS operation.

[Figure 4-32. MOVMSKPS Move Mask Operation: the four sign bits of the XMM source register are concatenated into the low-order bits of a GPR.]

4.6.3 Data Conversion

The floating-point data-conversion instructions convert floating-point operands to integer operands. These data-conversion instructions take 128-bit floating-point source operands. For data-conversion instructions that take 128-bit integer source operands, see “Data Conversion” on page 139. For data-conversion instructions that take 64-bit source operands, see “Data Conversion” on page 211 and “Data Conversion” on page 224.

Convert Floating-Point to Floating-Point.