Volume 1 Basic Architecture (794100), page 60
PROGRAMMING WITH INTEL® MMX™ TECHNOLOGY

Table 9-2. MMX Instruction Set Summary (Contd.)

Category          Operation                 Instructions
Shift             Shift Left Logical        PSLLW, PSLLD, PSLLQ
                  Shift Right Logical       PSRLW, PSRLD, PSRLQ
                  Shift Right Arithmetic    PSRAW, PSRAD

Category          Operation                 Doubleword Transfers    Quadword Transfers
Data Transfer     Register to Register      MOVD                    MOVQ
                  Load from Memory          MOVD                    MOVQ
                  Store to Memory           MOVD                    MOVQ

Empty MMX State                             EMMS

9.4.1 Data Transfer Instructions

The MOVD (Move 32 Bits) instruction transfers 32 bits of packed data from memory to an MMX register and vice versa; or from a general-purpose register to an MMX register and vice versa.

The MOVQ (Move 64 Bits) instruction transfers 64 bits of packed data from memory to an MMX register and vice versa; or transfers data between MMX registers.

9.4.2 Arithmetic Instructions

The arithmetic instructions perform addition, subtraction, multiplication, and multiply/add operations on packed data types.

The PADDB/PADDW/PADDD (add packed integers) instructions and the PSUBB/PSUBW/PSUBD (subtract packed integers) instructions add or subtract the corresponding signed or unsigned data elements of the source and destination operands in wraparound mode.
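
Wraparound mode means each element's result is truncated to the element width, with no carry into the neighboring element. The following is a minimal scalar sketch in portable C, not MMX code: it models PADDB on a 64-bit value and, for contrast, the unsigned-saturating PADDUSB covered later in this section. The function names paddb_model and paddusb_model are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Scalar model of PADDB (hypothetical helper name): add eight packed
   bytes with wraparound. Each lane is computed modulo 256, and a carry
   out of one lane never reaches the next lane. */
uint64_t paddb_model(uint64_t dst, uint64_t src)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 8; lane++) {
        unsigned a = (uint8_t)(dst >> (8 * lane));
        unsigned b = (uint8_t)(src >> (8 * lane));
        uint8_t sum = (uint8_t)(a + b);            /* truncate to 8 bits */
        result |= (uint64_t)sum << (8 * lane);
    }
    return result;
}

/* Scalar model of PADDUSB (hypothetical helper name): same lanes, but
   each sum is clamped to the unsigned byte range 0..255. */
uint64_t paddusb_model(uint64_t dst, uint64_t src)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 8; lane++) {
        unsigned a = (uint8_t)(dst >> (8 * lane));
        unsigned b = (uint8_t)(src >> (8 * lane));
        unsigned sum = a + b;
        if (sum > 0xFF)
            sum = 0xFF;                            /* saturate */
        result |= (uint64_t)sum << (8 * lane);
    }
    return result;
}
```

With the low byte holding FFH and the source holding 01H, the wraparound model yields 00H in that lane and leaves the next lane untouched, while the saturating model keeps the lane at FFH.
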
The PADDB/PADDW/PADDD and PSUBB/PSUBW/PSUBD instructions operate on packed byte, word, and doubleword data types.

The PADDSB/PADDSW (add packed signed integers with signed saturation) instructions and the PSUBSB/PSUBSW (subtract packed signed integers with signed saturation) instructions add or subtract the corresponding signed data elements of the source and destination operands and saturate the result to the limits of the signed data-type range. These instructions operate on packed byte and word data types.

The PADDUSB/PADDUSW (add packed unsigned integers with unsigned saturation) instructions and the PSUBUSB/PSUBUSW (subtract packed unsigned integers with unsigned saturation) instructions add or subtract the corresponding unsigned data elements of the source and destination operands and saturate the result to the limits of the unsigned data-type range. These instructions operate on packed byte and word data types.

The PMULHW (multiply packed signed integers and store high result) and PMULLW (multiply packed signed integers and store low result) instructions perform a signed multiply of the corresponding words of the source and destination operands and write the high-order or low-order 16 bits of each of the results, respectively, to the destination operand.

The PMADDWD (multiply and add packed integers) instruction computes the products of the corresponding signed words of the source and destination operands.
The four intermediate 32-bit doubleword products are summed in pairs (high-order pair and low-order pair) to produce two 32-bit doubleword results.

9.4.3 Comparison Instructions

The PCMPEQB/PCMPEQW/PCMPEQD (compare packed data for equal) instructions and the PCMPGTB/PCMPGTW/PCMPGTD (compare packed signed integers for greater than) instructions compare the corresponding signed data elements (bytes, words, or doublewords) in the source and destination operands for equal to or greater than, respectively.

These instructions generate a mask of ones or zeros, which is written to the destination operand. Logical operations can use the mask to select packed elements.
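
The way a comparison mask drives element selection can be sketched in portable C. This is a scalar model under stated assumptions, not MMX code: pcmpgtw_model imitates PCMPGTW on four packed words, and packed_select combines the mask with the AND, AND NOT, and OR operations that PAND, PANDN, and POR provide. All function names here are hypothetical.

```c
#include <stdint.h>

/* Portable sign extension of a 16-bit lane to a signed 32-bit value. */
static int32_t sext16(uint16_t w)
{
    return (int32_t)(w ^ 0x8000u) - 0x8000;
}

/* Scalar model of PCMPGTW (hypothetical helper name): per-lane signed
   compare, dst > src. A true lane becomes FFFFH, a false lane 0000H. */
uint64_t pcmpgtw_model(uint64_t dst, uint64_t src)
{
    uint64_t mask = 0;
    for (int lane = 0; lane < 4; lane++) {
        int32_t a = sext16((uint16_t)(dst >> (16 * lane)));
        int32_t b = sext16((uint16_t)(src >> (16 * lane)));
        if (a > b)
            mask |= (uint64_t)0xFFFF << (16 * lane);
    }
    return mask;
}

/* Branch-free select: lanes of a where the mask is all ones, lanes of b
   where it is all zeros. The three operations correspond to PAND,
   PANDN, and POR. */
uint64_t packed_select(uint64_t mask, uint64_t a, uint64_t b)
{
    return (mask & a) | (~mask & b);
}

/* Packed signed maximum of four words, built from the helpers above. */
uint64_t packed_max_words(uint64_t a, uint64_t b)
{
    return packed_select(pcmpgtw_model(a, b), a, b);
}
```

Because the mask is all ones or all zeros in each lane, the select never mixes bits from both inputs within one element, and no EFLAGS-based branch is involved.
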
This mask-based selection can be used to implement a packed conditional move operation without a branch or a set of branch instructions. No flags in the EFLAGS register are affected.

9.4.4 Conversion Instructions

The PACKSSWB (pack words into bytes with signed saturation) and PACKSSDW (pack doublewords into words with signed saturation) instructions convert signed words into signed bytes and signed doublewords into signed words, respectively, using signed saturation.

PACKUSWB (pack words into bytes with unsigned saturation) converts signed words into unsigned bytes, using unsigned saturation.

9.4.5 Unpack Instructions

The PUNPCKHBW/PUNPCKHWD/PUNPCKHDQ (unpack high-order data elements) instructions and the PUNPCKLBW/PUNPCKLWD/PUNPCKLDQ (unpack low-order data elements) instructions unpack bytes, words, or doublewords from the high- or low-order data elements of the source and destination operands and interleave them in the destination operand.
By placing all 0s in the source operand, these instructions can be used to convert byte integers to word integers, word integers to doubleword integers, or doubleword integers to quadword integers.

9.4.6 Logical Instructions

PAND (bitwise logical AND), PANDN (bitwise logical AND NOT), POR (bitwise logical OR), and PXOR (bitwise logical exclusive OR) perform bitwise logical operations on the quadword source and destination operands.

9.4.7 Shift Instructions

The logical shift left, logical shift right, and arithmetic shift right instructions shift each element by a specified number of bit positions.

The PSLLW/PSLLD/PSLLQ (shift packed data left logical) instructions and the PSRLW/PSRLD/PSRLQ (shift packed data right logical) instructions perform a logical left or right shift of the data elements and fill the empty high or low order bit positions with zeros. These instructions operate on packed words, doublewords, and quadwords.

The PSRAW/PSRAD (shift packed data right arithmetic) instructions perform an arithmetic right shift, copying the sign bit for each data element into empty bit positions on the upper end of each data element.
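
The difference between the logical and arithmetic right shifts can be sketched with a scalar model in portable C. This is an illustration, not MMX code, and the names psrlw_model and psraw_model are hypothetical. The handling of counts above 15 is not described in this section; the sketch follows the instruction reference, where an oversized count clears each word for the logical shift and fills each word with its sign bit for the arithmetic shift.

```c
#include <stdint.h>

/* Scalar model of PSRLW (hypothetical helper name): shift each of the
   four packed words right, filling vacated high bits with zeros. */
uint64_t psrlw_model(uint64_t v, unsigned count)
{
    if (count > 15)
        return 0;                       /* every word shifted out */
    uint64_t result = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint16_t w = (uint16_t)(v >> (16 * lane));
        uint16_t shifted = (uint16_t)(w >> count);
        result |= (uint64_t)shifted << (16 * lane);
    }
    return result;
}

/* Scalar model of PSRAW (hypothetical helper name): shift each word
   right, filling vacated high bits with a copy of the word's sign bit. */
uint64_t psraw_model(uint64_t v, unsigned count)
{
    if (count > 15)
        count = 15;                     /* result is all sign bits */
    uint64_t result = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint16_t w = (uint16_t)(v >> (16 * lane));
        uint16_t shifted = (uint16_t)(w >> count);
        if (w & 0x8000) {
            /* Bits vacated at the top of the word, set to ones. */
            uint16_t fill = (uint16_t)~(0xFFFFu >> count);
            shifted = (uint16_t)(shifted | fill);
        }
        result |= (uint64_t)shifted << (16 * lane);
    }
    return result;
}
```

For a word holding 8000H and a count of 4, the logical model gives 0800H and the arithmetic model gives F800H, which preserves the sign of the original value.
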
The PSRAW and PSRAD instructions operate on packed words and doublewords.

9.4.8 EMMS Instruction

The EMMS instruction empties the MMX state by setting the tags in the x87 FPU tag word to 11B, indicating empty registers. This instruction must be executed at the end of an MMX routine before calling other routines that can execute floating-point instructions. See Section 9.6.3, “Using the EMMS Instruction,” for more information on the use of this instruction.

9.5 COMPATIBILITY WITH X87 FPU ARCHITECTURE

The MMX state is aliased to the x87 FPU state. No new states or modes have been added to IA-32 architecture to support the MMX technology. The same floating-point instructions that save and restore the x87 FPU state also handle the MMX state (for example, during context switching).

MMX technology uses the same interface techniques between the x87 FPU and the operating system (primarily for task switching purposes).
For more details, see Chapter 11, “Intel® MMX™ Technology System Programming,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A.

9.5.1 MMX Instructions and the x87 FPU Tag Word

After each MMX instruction, the entire x87 FPU tag word is set to valid (00B). The EMMS instruction (empty MMX state) sets the entire x87 FPU tag word to empty (11B).

Chapter 11, “Intel® MMX™ Technology System Programming,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, provides additional information about the effects of x87 FPU and MMX instructions on the x87 FPU tag word.
For a description of the tag word, see Section 8.1.7, “x87 FPU Tag Word.”

9.6 WRITING APPLICATIONS WITH MMX CODE

The following sections give guidelines for writing application code that uses MMX technology.

9.6.1 Checking for MMX Technology Support

Before an application attempts to use the MMX technology, it should check that it is present on the processor. Check by following these steps:

1. Check that the processor supports the CPUID instruction by attempting to execute the CPUID instruction. If the processor does not support the CPUID instruction, this will generate an invalid-opcode exception (#UD).
2. Check that the processor supports the MMX technology (if CPUID.01H:EDX.MMX[bit 23] = 1).
3. Check that emulation of the x87 FPU is disabled (if CR0.EM[bit 2] = 0).

If the processor attempts to execute an unsupported MMX instruction or attempts to execute an MMX instruction with CR0.EM[bit 2] set, this generates an invalid-opcode exception (#UD).

Example 9-1 illustrates how to use the CPUID instruction to detect the MMX technology. This example does not represent the entire CPUID sequence, but shows the portion used for detection of MMX technology.

Example 9-1. Partial Routine for Detecting MMX Technology with the CPUID Instruction

    ...                         ; identify existence of CPUID instruction
    ...                         ; identify Intel processor
    mov   EAX, 1                ; request for feature flags
    CPUID                       ; 0FH, 0A2H CPUID instruction
    test  EDX, 00800000H        ; Is IA MMX technology bit (Bit 23 of EDX) set?
    jnz   MMX_Technology_Found  ; branch taken if the bit is set

9.6.2 Transitions Between x87 FPU and MMX Code

Applications can contain both x87 FPU floating-point and MMX instructions.
However, because the MMX registers are aliased to the x87 FPU register stack, care must be taken when making transitions between x87 FPU instructions and MMX instructions to prevent incoherent or unexpected results.

When an MMX instruction (other than the EMMS instruction) is executed, the processor changes the x87 FPU state as follows:

• The TOS (top of stack) value of the x87 FPU status word is set to 0.
• The entire x87 FPU tag word is set to the valid state (00B in all tag fields).
• When an MMX instruction writes to an MMX register, it writes ones (11B) to the exponent part of the corresponding floating-point register (bits 64 through 79).

The net result of these actions is that any x87 FPU state prior to the execution of the MMX instruction is essentially lost.

When an x87 FPU instruction is executed, the processor assumes that the current state of the x87 FPU register stack and control registers is valid and executes the instruction without any preparatory modifications to the x87 FPU state.

If the application contains both x87 FPU floating-point and MMX instructions, the following guidelines are recommended:

• When transitioning between x87 FPU and MMX code, save the state of any x87 FPU data or control registers that need to be preserved for future use.
  The FSAVE and FXSAVE instructions save the entire x87 FPU state.
• When transitioning between MMX and x87 FPU code, do the following:
  - Save any data in the MMX registers that needs to be preserved for future use. FSAVE and FXSAVE also save the state of MMX registers.
  - Execute the EMMS instruction to clear the MMX state from the x87 data and control registers.

The following sections describe the use of the EMMS instruction and give additional guidelines for mixing x87 FPU and MMX code.

9.6.3 Using the EMMS Instruction

As described in Section 9.6.2, “Transitions Between x87 FPU and MMX Code,” when an MMX instruction executes, the x87 FPU tag word is marked valid (00B). In this state, the execution of subsequent x87 FPU instructions may produce unexpected x87 FPU floating-point exceptions and/or incorrect results because the x87 FPU register stack appears to contain valid data.
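
The state changes listed in Section 9.6.2 and the effect of EMMS can be summarized in a small software model. The following portable C sketch is only an illustration of the aliasing rules stated above. The structure layout and the names fpu_state_model, mmx_write_model, and emms_model are hypothetical, and the model ignores everything else an MMX instruction does.

```c
#include <stdint.h>

/* Simplified model of the aliased x87 FPU/MMX state. Field names are
   illustrative and do not mirror any hardware or OS structure. */
typedef struct {
    uint64_t low64[8];     /* bits 0-63 of each register: MMX contents  */
    uint16_t high16[8];    /* bits 64-79 of each register               */
    uint16_t tag_word;     /* two bits per register: 00B valid,
                              11B empty                                  */
    unsigned tos;          /* top-of-stack field of the status word     */
} fpu_state_model;

/* Effect of an MMX instruction (other than EMMS) that writes MMn. */
void mmx_write_model(fpu_state_model *s, unsigned n, uint64_t value)
{
    s->low64[n & 7] = value;
    s->high16[n & 7] = 0xFFFF;   /* bits 64 through 79 set to ones */
    s->tos = 0;                  /* TOS is set to 0 */
    s->tag_word = 0x0000;        /* all eight tags valid (00B) */
}

/* Effect of EMMS: mark all eight registers empty (11B in every tag). */
void emms_model(fpu_state_model *s)
{
    s->tag_word = 0xFFFF;
}
```

In this model, a single call to mmx_write_model overwrites the previous TOS and tag word, which is why any earlier x87 FPU state is effectively lost. Calling emms_model afterward leaves the register contents in place but marks every register empty, so that later x87 FPU code does not see them as valid data.
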