The number of bytes pushed is determined by the operand-size attribute of the instruction. See the “Operation” subsection of the “PUSH—Push Word, Doubleword or Quadword Onto the Stack” section in Chapter 4 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2B.
• Pop() removes the value from the top of the stack and returns it. The statement EAX ← Pop(); assigns to EAX the 32-bit value from the top of the stack. Pop will return either a word, a doubleword or a quadword depending on the operand-size attribute.
See the “Operation” subsection in the “POP—Pop a Value from the Stack” section of Chapter 4 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2B.
• PopRegisterStack — Marks the FPU ST(0) register as empty and increments the FPU register stack pointer (TOP) by 1.
• Switch-Tasks — Performs a task switch.
• Bit(BitBase, BitOffset) — Returns the value of a bit within a bit string.
The bit string is a sequence of bits in memory or a register. Bits are numbered from low-order to high-order within registers and within memory bytes. If the BitBase is a register, the BitOffset can be in the range 0 to [15, 31, 63] depending on the mode and register size. See Figure 3-1: the function Bit[RAX, 21] is illustrated.

Figure 3-1. Bit Offset for BIT[RAX, 21]

If BitBase is a memory address, the BitOffset has different ranges depending on the operand size (see Table 3-2).

Table 3-2. Range of Bit Positions Specified by Bit Offset Operands

Operand Size    Immediate BitOffset    Register BitOffset
16              0 to 15                −2^15 to 2^15 − 1
32              0 to 31                −2^31 to 2^31 − 1
64              0 to 63                −2^63 to 2^63 − 1

The addressed bit is numbered (Offset MOD 8) within the byte at address (BitBase + (BitOffset DIV 8)), where DIV is signed division with rounding towards negative infinity and MOD returns a positive number (see Figure 3-2 and the short C sketch that follows it).

Figure 3-2. Memory Bit Indexing (for example, BitOffset +13 selects bit 5 of the byte at BitBase + 1, and BitOffset −5 selects bit 3 of the byte at BitBase − 1)
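The floor-division and positive-remainder convention only differs from ordinary C integer arithmetic when BitOffset is negative. The following sketch is not part of the manual; it merely models the address calculation described above, and the helper name bit_addr and the example addresses are illustrative only.

#include <stdint.h>
#include <stdio.h>

/* Model of the Bit(BitBase, BitOffset) address calculation for a
 * memory-based BitBase.  DIV rounds toward negative infinity and MOD
 * returns a non-negative remainder; C's / and % truncate toward zero,
 * so negative offsets need a fix-up step. */
static void bit_addr(intptr_t bit_base, int64_t bit_offset,
                     intptr_t *byte_addr, unsigned *bit_in_byte)
{
    int64_t div = bit_offset / 8;
    int64_t mod = bit_offset % 8;
    if (mod < 0) {              /* adjust truncating division for negative offsets */
        div -= 1;
        mod += 8;
    }
    *byte_addr   = bit_base + (intptr_t)div;
    *bit_in_byte = (unsigned)mod;
}

int main(void)
{
    intptr_t addr;
    unsigned bit;

    bit_addr(0x1000, 13, &addr, &bit);   /* byte 0x1001, bit 5 */
    printf("+13 -> byte %#lx, bit %u\n", (unsigned long)addr, bit);

    bit_addr(0x1000, -5, &addr, &bit);   /* byte 0x0FFF, bit 3 */
    printf(" -5 -> byte %#lx, bit %u\n", (unsigned long)addr, bit);
    return 0;
}

The two calls reproduce the +13 and −5 cases shown in Figure 3-2.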
3.1.1.9 Intel® C/C++ Compiler Intrinsics Equivalents Section

The Intel C/C++ compiler intrinsics equivalents are special C/C++ coding extensions that allow using the syntax of C function calls and C variables instead of hardware registers. Using these intrinsics frees programmers from having to manage registers and assembly programming. Further, the compiler optimizes the instruction scheduling so that executables run faster.

The following sections discuss the intrinsics API and the MMX technology and SIMD floating-point intrinsics. Each intrinsic equivalent is listed with the instruction description. There may be additional intrinsics that do not have an instruction equivalent.
It is strongly recommended that the reader reference the compiler documentation for the complete list of supported intrinsics. See Appendix C, “Intel® C/C++ Compiler Intrinsics and Functional Equivalents,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2B, for more information on using intrinsics.

Intrinsics API

The benefit of coding with MMX technology intrinsics and the SSE/SSE2/SSE3 intrinsics is that you can use the syntax of C function calls and C variables instead of hardware registers. This frees you from managing registers and programming assembly. Further, the compiler optimizes the instruction scheduling so that your executable runs faster.
For each computational and data manipulation instruction in the new instruction set, there is a corresponding C intrinsic that implements it directly. The intrinsics allow you to specify the underlying implementation (instruction selection) of an algorithm yet leave instruction scheduling and register allocation to the compiler.

MMX™ Technology Intrinsics

The MMX technology intrinsics are based on a __m64 data type that represents the specific contents of an MMX technology register. You can specify values in bytes, short integers, 32-bit values, or a 64-bit object. The __m64 data type, however, is not a basic ANSI C data type, and therefore you must observe the following usage restrictions (a brief example follows the list):
• Use __m64 data only on the left-hand side of an assignment, as a return value, or as a parameter.
You cannot use it with other arithmetic expressions (“+”, “>>”, and so on).
• Use __m64 objects in aggregates, such as unions (for example, to access the byte elements) and structures; the address of an __m64 object may be taken.
• Use __m64 data only with the MMX technology intrinsics described in this manual and Intel® C/C++ compiler documentation.
• See:
— http://www.intel.com/support/performancetools/
— Appendix C, “Intel® C/C++ Compiler Intrinsics and Functional Equivalents,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2B, for more information on using intrinsics.
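A minimal sketch, not taken from the manual, of these restrictions in practice. It assumes a compiler and target that still support the MMX intrinsics declared in mmintrin.h (typically 32-bit targets); the union name m64_words is illustrative only.

#include <stdio.h>
#include <mmintrin.h>   /* MMX technology intrinsics, __m64 */

/* Union aggregate used to read the word elements of an __m64 result,
 * following the usage restrictions listed above. */
typedef union {
    __m64 m;
    short w[4];
} m64_words;

int main(void)
{
    __m64 a = _mm_set_pi16(4, 3, 2, 1);     /* packs four 16-bit values */
    __m64 b = _mm_set_pi16(40, 30, 20, 10);

    m64_words r;
    r.m = _mm_add_pi16(a, b);               /* PADDW: packed 16-bit add */

    _mm_empty();                            /* EMMS: clear MMX state before FP code */

    printf("%d %d %d %d\n", r.w[0], r.w[1], r.w[2], r.w[3]);  /* 11 22 33 44 */
    return 0;
}

Note that __m64 values appear only as assignment targets, parameters, return values, and union members, never in ordinary arithmetic expressions.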
SSE/SSE2/SSE3 Intrinsics

SSE/SSE2/SSE3 intrinsics all make use of the XMM registers of the Pentium III, Pentium 4, and Intel Xeon processors. There are three data types supported by these intrinsics: __m128, __m128d, and __m128i.
• The __m128 data type is used to represent the contents of an XMM register used by an SSE intrinsic. This is either four packed single-precision floating-point values or a scalar single-precision floating-point value.
• The __m128d data type holds two packed double-precision floating-point values or a scalar double-precision floating-point value.
• The __m128i data type can hold sixteen byte, eight word, four doubleword, or two quadword integer values.

The compiler aligns __m128, __m128d, and __m128i local and global data to 16-byte boundaries on the stack.
To align integer, float, or double arrays, use the declspec statement as described in the Intel C/C++ compiler documentation. See http://www.intel.com/support/performancetools/.

The __m128, __m128d, and __m128i data types are not basic ANSI C data types, and therefore some restrictions are placed on their usage:
• Use __m128, __m128d, and __m128i only on the left-hand side of an assignment, as a return value, or as a parameter.
Do not use them in other arithmetic expressions such as “+” and “>>”.
• Do not initialize __m128, __m128d, and __m128i with literals; there is no way to express 128-bit constants.
• Use __m128, __m128d, and __m128i objects in aggregates, such as unions (for example, to access the float elements) and structures. The address of these objects may be taken.
• Use __m128, __m128d, and __m128i data only with the intrinsics described in this user’s guide. See Appendix C, “Intel® C/C++ Compiler Intrinsics and Functional Equivalents,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2B, for more information on using intrinsics.

The compiler aligns __m128, __m128d, and __m128i local data to 16-byte boundaries on the stack. Global __m128 data is also aligned on 16-byte boundaries.
(To align float arrays, you can use the alignment declspec described in the following section.) Because the new instruction set treats the SIMD floating-point registers in the same way whether you are using packed or scalar data, there is no __m32 data type to represent scalar data as you might expect. For scalar operations, you should use the __m128 objects and the “scalar” forms of the intrinsics; the compiler and the processor implement these operations with 32-bit memory references.

The suffixes ps and ss are used to denote “packed single” and “scalar single” precision operations. The packed floats are represented in right-to-left order, with the lowest word (right-most) being used for scalar operations: [z, y, x, w]. To explain how memory storage reflects this, consider the following example.

The operation:

float a[4] = { 1.0, 2.0, 3.0, 4.0 };
__m128 t = _mm_load_ps(a);

produces the same result as follows:

__m128 t = _mm_set_ps(4.0, 3.0, 2.0, 1.0);

In other words:

t = [ 4.0, 3.0, 2.0, 1.0 ]

where the “scalar” element is 1.0.
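The following compilable sketch, not taken from the manual, ties the pieces above together: a 16-byte-aligned float array, the packed load, and a “scalar single” (ss) operation that touches only the lowest element. C11 alignas is used here for alignment; with the Intel and Microsoft compilers the __declspec(align(16)) form mentioned above serves the same purpose.

#include <stdio.h>
#include <stdalign.h>
#include <xmmintrin.h>   /* SSE intrinsics, __m128 */

int main(void)
{
    /* 16-byte alignment is required for _mm_load_ps / _mm_store_ps. */
    alignas(16) float a[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    alignas(16) float out[4];

    __m128 t = _mm_load_ps(a);                      /* t = [4.0, 3.0, 2.0, 1.0] */
    __m128 u = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);  /* same contents as t */

    /* Scalar-single form: only the lowest (right-most) element is added;
     * the three upper elements of t pass through unchanged. */
    __m128 r = _mm_add_ss(t, u);                    /* r = [4.0, 3.0, 2.0, 2.0] */

    _mm_store_ps(out, r);
    printf("%.1f %.1f %.1f %.1f\n", out[0], out[1], out[2], out[3]);
    /* prints: 2.0 2.0 3.0 4.0  (out[0] holds the scalar result) */
    return 0;
}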
Some intrinsics are “composites” because they require more than one instruction to implement them. You should be familiar with the hardware features provided by the SSE, SSE2, SSE3, and MMX technology when writing programs with the intrinsics. Keep the following important issues in mind:
• Certain intrinsics, such as _mm_loadr_ps and _mm_cmpgt_ss, are not directly supported by the instruction set. While these intrinsics are convenient programming aids, be mindful of their implementation cost.
• Data loaded or stored as __m128 objects must generally be 16-byte-aligned.
• Some intrinsics require that their argument be immediates, that is, constant integers (literals), due to the nature of the instruction.
• The result of arithmetic operations acting on two NaN (Not a Number) arguments is undefined.
Therefore, floating-point operations using NaN arguments may not match the expected behavior of the corresponding assembly instructions.

For a more detailed description of each intrinsic and additional information related to its usage, refer to Intel C/C++ compiler documentation. See:
— http://www.intel.com/support/performancetools/
— Appendix C, “Intel® C/C++ Compiler Intrinsics and Functional Equivalents,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2B, for more information on using intrinsics.

3.1.1.10 Flags Affected Section

The “Flags Affected” section lists the flags in the EFLAGS register that are affected by the instruction. When a flag is cleared, it is equal to 0; when it is set, it is equal to 1. The arithmetic and logical instructions usually assign values to the status flags in a uniform manner (see Appendix A, “EFLAGS Cross-Reference,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1).
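To make the uniform behavior of the status flags concrete, here is a small C sketch, not taken from the manual, that models how an 8-bit ADD would set CF, ZF, SF, and OF; the helper name add8_flags and the printed layout are illustrative only, and real code would of course read the flags set by the ADD instruction itself.

#include <stdint.h>
#include <stdio.h>

/* Models the main EFLAGS status-flag results of an 8-bit ADD. */
static void add8_flags(uint8_t a, uint8_t b)
{
    unsigned wide = (unsigned)a + (unsigned)b;      /* keep the carry-out */
    uint8_t  r    = (uint8_t)wide;

    int cf = wide > 0xFF;                           /* unsigned carry out of bit 7 */
    int zf = (r == 0);                              /* result is zero */
    int sf = (r & 0x80) != 0;                       /* sign bit of the result */
    int of = ((a ^ r) & (b ^ r) & 0x80) != 0;       /* signed overflow */

    printf("0x%02X + 0x%02X = 0x%02X  CF=%d ZF=%d SF=%d OF=%d\n",
           (unsigned)a, (unsigned)b, (unsigned)r, cf, zf, sf, of);
}

int main(void)
{
    add8_flags(0x7F, 0x01);   /* 127 + 1: signed overflow -> OF=1, SF=1 */
    add8_flags(0xFF, 0x01);   /* 255 + 1: unsigned carry  -> CF=1, ZF=1 */
    return 0;
}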