Volume 1 Basic Architecture (794100), page 63
This data type consists of four IEEE 32-bit single-precision floating-point values packed into a double quadword. (See Figure 4-3 for the layout of a single-precision floating-point value; refer to Section 4.2.2, “Floating-Point Data Types,” for a detailed description of the single-precision floating-point format.)

Figure 10-4. 128-Bit Packed Single-Precision Floating-Point Data Type (contains four single-precision floating-point values in bits 127:96, 95:64, 63:32, and 31:0)

PROGRAMMING WITH STREAMING SIMD EXTENSIONS (SSE)

This 128-bit packed single-precision floating-point data type is operated on in the XMM registers or in memory. Conversion instructions are provided to convert two packed single-precision floating-point values into two packed doubleword integers or a scalar single-precision floating-point value into a doubleword integer (see Figure 11-8).

SSE extensions provide conversion instructions between XMM registers and MMX registers, and between XMM registers and general-purpose registers.
See Figure 11-8.

The address of a 128-bit packed memory operand must be aligned on a 16-byte boundary, except in the following cases:

• The MOVUPS instruction supports unaligned accesses.
• Scalar instructions that use a 4-byte memory operand are not subject to alignment requirements.

Figure 4-2 shows the byte order of 128-bit (double quadword) data types in memory.

10.4 SSE INSTRUCTION SET

SSE instructions are divided into four functional groups:

• Packed and scalar single-precision floating-point instructions
• 64-bit SIMD integer instructions
• State management instructions
• Cacheability control, prefetch, and memory ordering instructions

The following sections give an overview of each of the instructions in these groups.

10.4.1 SSE Packed and Scalar Floating-Point Instructions

The packed and scalar single-precision floating-point instructions are divided into the following subgroups:

• Data movement instructions
• Arithmetic instructions
• Logical instructions
• Comparison instructions
• Shuffle instructions
• Conversion instructions

The packed single-precision floating-point instructions perform SIMD operations on packed single-precision floating-point operands (see Figure 10-5).
Each source operand contains four single-precision floating-point values, and the destination operand contains the results of the operation (OP) performed in parallel on the corresponding values (X0 and Y0, X1 and Y1, X2 and Y2, and X3 and Y3) in each operand.

Figure 10-5. Packed Single-Precision Floating-Point Operation (source operands X3..X0 and Y3..Y0; destination X3 OP Y3, X2 OP Y2, X1 OP Y1, X0 OP Y0)

The scalar single-precision floating-point instructions operate on the low (least significant) doublewords of the two source operands (X0 and Y0); see Figure 10-6. The three most significant doublewords (X1, X2, and X3) of the first source operand are passed through to the destination.
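
The difference between the packed and scalar forms described above can be sketched with compiler intrinsics. This is a minimal illustration and not part of the original text. It assumes a C compiler that provides <xmmintrin.h> and an x86 target with SSE enabled; the function names packed_add and scalar_add are hypothetical. Array element 0 corresponds to X0 (the low doubleword).

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ps, _mm_add_ss */

/* Packed add (typically compiled to ADDPS):
 * out[i] = x[i] + y[i] for all four elements. */
void packed_add(const float x[4], const float y[4], float out[4])
{
    __m128 vx = _mm_loadu_ps(x);   /* unaligned 128-bit load */
    __m128 vy = _mm_loadu_ps(y);
    _mm_storeu_ps(out, _mm_add_ps(vx, vy));
}

/* Scalar add (typically compiled to ADDSS):
 * out[0] = x[0] + y[0]; out[1..3] are passed through from x[1..3]. */
void scalar_add(const float x[4], const float y[4], float out[4])
{
    __m128 vx = _mm_loadu_ps(x);
    __m128 vy = _mm_loadu_ps(y);
    _mm_storeu_ps(out, _mm_add_ss(vx, vy));
}
```

With x = {1, 2, 3, 4} and y = {10, 20, 30, 40}, packed_add produces {11, 22, 33, 44}, while scalar_add produces {11, 2, 3, 4}: only the low element is computed, and the upper three come from the first source operand.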
The scalar operations are similar to the floating-point operations performed in the x87 FPU data registers with the precision control field in the x87 FPU control word set for single precision (24-bit significand), except that x87 stack operations use a 15-bit exponent range for the result, while SSE operations use an 8-bit exponent range.

Figure 10-6. Scalar Single-Precision Floating-Point Operation (source operands X3..X0 and Y3..Y0; destination X3, X2, X1, X0 OP Y0)

10.4.1.1 SSE Data Movement Instructions

SSE data movement instructions move single-precision floating-point data between XMM registers and between an XMM register and memory.

The MOVAPS (move aligned packed single-precision floating-point values) instruction transfers a double quadword operand containing four packed single-precision floating-point values from memory to an XMM register and vice versa, or between XMM registers.
The memory address must be aligned to a 16-byte boundary; otherwise, a general-protection exception (#GP) is generated.

The MOVUPS (move unaligned packed single-precision floating-point) instruction performs the same operations as the MOVAPS instruction, except that 16-byte alignment of a memory address is not required.

The MOVSS (move scalar single-precision floating-point) instruction transfers a 32-bit single-precision floating-point operand from memory to the low doubleword of an XMM register and vice versa, or between XMM registers.

The MOVLPS (move low packed single-precision floating-point) instruction moves two packed single-precision floating-point values from memory to the low quadword of an XMM register and vice versa. The high quadword of the register is left unchanged.

The MOVHPS (move high packed single-precision floating-point) instruction moves two packed single-precision floating-point values from memory to the high quadword of an XMM register and vice versa.
The low quadword of the register is left unchanged.

The MOVLHPS (move packed single-precision floating-point low to high) instruction moves two packed single-precision floating-point values from the low quadword of the source XMM register into the high quadword of the destination XMM register. The low quadword of the destination register is left unchanged.

The MOVHLPS (move packed single-precision floating-point high to low) instruction moves two packed single-precision floating-point values from the high quadword of the source XMM register into the low quadword of the destination XMM register. The high quadword of the destination register is left unchanged.

The MOVMSKPS (move packed single-precision floating-point mask) instruction transfers the most significant bit of each of the four packed single-precision floating-point numbers in an XMM register to a general-purpose register.
This 4-bit value can then be used as a condition to perform branching.

10.4.1.2 SSE Arithmetic Instructions

SSE arithmetic instructions perform addition, subtraction, multiplication, division, reciprocal, square root, reciprocal of square root, and maximum/minimum operations on packed and scalar single-precision floating-point values.
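
As an informal illustration of the arithmetic operations this section covers, the following sketch combines several of them through compiler intrinsics. It is not part of the original text. It assumes a C compiler that provides <xmmintrin.h> and an x86 target with SSE enabled; the function names length4, clamp4, and approx_recip4 are hypothetical.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Per-element sqrt(x*x + y*y), using the packed multiply, add,
 * and square-root operations (MULPS, ADDPS, SQRTPS). */
void length4(const float x[4], const float y[4], float out[4])
{
    __m128 vx = _mm_loadu_ps(x);
    __m128 vy = _mm_loadu_ps(y);
    __m128 sum = _mm_add_ps(_mm_mul_ps(vx, vx), _mm_mul_ps(vy, vy));
    _mm_storeu_ps(out, _mm_sqrt_ps(sum));
}

/* Per-element clamp to [lo, hi], using packed maximum and then
 * packed minimum (MAXPS, MINPS). */
void clamp4(const float v[4], float lo, float hi, float out[4])
{
    __m128 vv = _mm_loadu_ps(v);
    __m128 r = _mm_min_ps(_mm_max_ps(vv, _mm_set1_ps(lo)), _mm_set1_ps(hi));
    _mm_storeu_ps(out, r);
}

/* Per-element approximate reciprocal (RCPPS). The result is an
 * approximation, so callers should compare with a tolerance. */
void approx_recip4(const float v[4], float out[4])
{
    _mm_storeu_ps(out, _mm_rcp_ps(_mm_loadu_ps(v)));
}
```

The reciprocal and reciprocal-square-root operations trade accuracy for speed: the Intel instruction reference documents a maximum relative error of 1.5 × 2^-12 for RCPPS, so approx_recip4 applied to 4.0 returns a value close to, but not necessarily equal to, 0.25.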
The ADDPS (add packed single-precision floating-point values) and SUBPS (subtract packed single-precision floating-point values) instructions add and subtract, respectively, two packed single-precision floating-point operands.

The ADDSS (add scalar single-precision floating-point values) and SUBSS (subtract scalar single-precision floating-point values) instructions add and subtract, respectively, the low single-precision floating-point values of two operands and store the result in the low doubleword of the destination operand.

The MULPS (multiply packed single-precision floating-point values) instruction multiplies two packed single-precision floating-point operands.

The MULSS (multiply scalar single-precision floating-point values) instruction multiplies the low single-precision floating-point values of two operands and stores the result in the low doubleword of the destination operand.

The DIVPS (divide packed single-precision floating-point values) instruction divides two packed single-precision floating-point operands.

The DIVSS (divide scalar single-precision floating-point values) instruction divides the low single-precision floating-point values of two operands and stores the result in the low doubleword of the destination operand.

The RCPPS (compute reciprocals of packed single-precision floating-point values) instruction computes the approximate reciprocals of values in a packed single-precision floating-point operand.

The RCPSS (compute reciprocal of scalar single-precision floating-point values) instruction computes the approximate reciprocal of the low single-precision floating-point value in the source operand and stores the result in the low doubleword of the destination operand.

The SQRTPS (compute square roots of packed single-precision floating-point values) instruction computes the square roots of the values in a packed single-precision floating-point operand.

The SQRTSS (compute square root of scalar
single-precision floating-point values) instruction computes the square root of the low single-precision floating-point value in the source operand and stores the result in the low doubleword of the destination operand.

The RSQRTPS (compute reciprocals of square roots of packed single-precision floating-point values) instruction computes the approximate reciprocals of the square roots of the values in a packed single-precision floating-point operand.

The RSQRTSS (reciprocal of square root of scalar single-precision floating-point value) instruction computes the approximate reciprocal of the square root of the low single-precision floating-point value in the source operand and stores the result in the low doubleword of the destination operand.

The MAXPS (return maximum of packed single-precision floating-point values) instruction compares the corresponding values from two packed single-precision floating-point operands and returns the numerically greater value from each comparison to the destination operand.

The MAXSS (return maximum of scalar single-precision floating-point values) instruction compares the low values from two packed single-precision floating-point operands and returns the numerically greater value from the comparison to the low doubleword of the destination operand.

The MINPS (return minimum of packed single-precision floating-point values) instruction compares the corresponding values from two packed single-precision floating-point operands and returns the numerically lesser value from each comparison to the destination operand.

The MINSS (return minimum of scalar single-precision floating-point values) instruction compares the low values from two packed single-precision floating-point operands and returns the numerically lesser value from the comparison to the low doubleword of the destination operand.

10.4.2 SSE Logical Instructions

SSE logical instructions perform AND, AND NOT, OR, and XOR operations on packed single-precision floating-point values.

The ANDPS (bitwise logical AND of packed single-precision floating-point values) instruction returns the logical AND of two packed single-precision floating-point operands.

The ANDNPS (bitwise logical AND NOT of packed single-precision floating-point values) instruction returns the logical AND NOT of two packed single-precision floating-point operands.

The ORPS (bitwise logical OR of packed single-precision floating-point values) instruction returns the logical OR of two packed single-precision floating-point operands.

The XORPS (bitwise logical XOR of packed single-precision floating-point values) instruction returns the logical XOR of two packed single-precision floating-point operands.

10.4.2.1 SSE Comparison Instructions

The compare instructions compare packed and scalar single-precision floating-point values and return the results of the comparison either to the destination operand or to the EFLAGS register.

The CMPPS (compare packed single-precision floating-point values)
instruction compares the corresponding values from two packed single-precision floating-point operands, using an immediate operand as a predicate, and returns a 32-bit mask result of all 1s or all 0s for each comparison to the destination operand.
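
The all-1s/all-0s masks produced by a packed compare are commonly combined with the mask-move and logical instructions described earlier. The following sketch, which is not part of the original text, shows one way to do this with compiler intrinsics. It assumes a C compiler that provides <xmmintrin.h> and an x86 target with SSE enabled; the function names less_than_mask and select_smaller are hypothetical.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Returns a 4-bit value; bit i is set when a[i] < b[i].
 * _mm_cmplt_ps is a packed compare with the less-than predicate (CMPPS);
 * _mm_movemask_ps gathers the most significant bit of each element (MOVMSKPS). */
int less_than_mask(const float a[4], const float b[4])
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    __m128 m = _mm_cmplt_ps(va, vb);   /* each element: all 1s or all 0s */
    return _mm_movemask_ps(m);
}

/* Branch-free per-element select: out[i] = (a[i] < b[i]) ? a[i] : b[i].
 * Uses the compare mask with AND, AND NOT, and OR (ANDPS, ANDNPS, ORPS). */
void select_smaller(const float a[4], const float b[4], float out[4])
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    __m128 m = _mm_cmplt_ps(va, vb);
    __m128 from_a = _mm_and_ps(m, va);      /* keep a[i] where the mask is set */
    __m128 from_b = _mm_andnot_ps(m, vb);   /* keep b[i] where the mask is clear */
    _mm_storeu_ps(out, _mm_or_ps(from_a, from_b));
}
```

For a = {1, 5, 3, 9} and b = {2, 4, 3, 10}, less_than_mask returns 9 (bits 0 and 3 set), which a caller can test to decide whether to branch, and select_smaller produces {1, 4, 3, 9}.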