Volume 1 Basic Architecture (794100), page 77
The shuffle mask is unaffected. If the most significant bit (bit 7) of a shuffle control byte is set, the constant zero is written in the result byte.

12.6.6 Packed Sign

There are six packed-sign instructions (represented by three mnemonics). Three operate on 128-bit operands and three operate on 64-bit operands. The data elements for these instructions are 8-bit, 16-bit, or 32-bit signed integers.

• PSIGNB/W/D negates each signed integer element of the destination operand if the corresponding data element in the source operand is less than zero.

12.6.7 Packed Align Right

There are two packed-align-right instructions (represented by one mnemonic). One operates on 128-bit operands and the other operates on 64-bit operands. These instructions concatenate the destination and source operands into a composite, and extract the result from the composite according to an immediate constant.

• PALIGNR’s source operand is appended after the destination operand, forming an intermediate value of twice the width of an operand. The result is extracted from the intermediate value into the destination operand by selecting the 128-bit or 64-bit value that is right-aligned to the byte offset specified by the immediate value.

12.7 WRITING APPLICATIONS WITH SSSE3 EXTENSIONS

The following sections give guidelines for writing application programs and operating-system code that use SSSE3 instructions.

12.7.1 Guidelines for Using SSSE3 Extensions

The following guidelines describe how to maximize the benefits of using SSSE3 extensions:
• Check that the processor supports SSSE3 extensions.
• Employ the optimization and scheduling techniques described in the Intel® 64 and IA-32 Architectures Optimization Reference Manual (see Section 1.4, “Related Literature”).
• Ensure that your operating system supports SSE/SSE2/SSE3/SSSE3 extensions. (Operating system support for the SSE extensions implies sufficient support for SSE2, SSE3, and SSSE3.)

12.7.2 Checking for SSSE3 Support

Before an application attempts to use the SIMD subset of SSSE3 extensions, the application should follow the steps illustrated in Section 11.6.2, “Checking for SSE/SSE2 Support.” Next, use the additional step provided below:

• Check that the processor supports SSSE3 (if CPUID.01H:ECX.SSSE3[bit 9] = 1).

12.8 SSE3/SSSE3 EXCEPTIONS

SSE3/SSSE3 instructions can generate the same type of memory-access and nonnumeric exceptions as other Intel 64 or IA-32 instructions.
Existing exception handlers generally handle these exceptions without code modification.

FISTTP can generate floating-point exceptions. Some SSE3 instructions can also generate SIMD floating-point exceptions.

SSE3 additions and changes are noted in the following sections. See also: Section 11.5, “SSE, SSE2, and SSE3 Exceptions”.

12.8.1 Device Not Available (DNA) Exceptions

SSE3/SSSE3 instructions cause a DNA exception (#NM) if the processor attempts to execute one while CR0.TS[bit 3] = 1. If CPUID.01H:ECX.SSE3[bit 0] = 0, execution of an SSE3 extension will cause an invalid opcode fault regardless of the state of CR0.TS[bit 3].

12.8.2 Numeric Error Flag and IGNNE#

Most SSE3 instructions ignore CR0.NE[bit 5] (treating it as if it were always set) and the IGNNE# pin. With one exception, all use the vector 19 software exception for error reporting.
The exception is FISTTP; it behaves like other x87-FP instructions.

SSSE3 instructions ignore CR0.NE[bit 5] (treating it as if it were always set) and the IGNNE# pin. SSSE3 instructions do not cause floating-point errors.

12.8.3 Emulation

CR0.EM[bit 2], which is used to emulate x87 floating-point instructions, cannot be used for emulation of SSE3/SSSE3. If an SSE3/SSSE3 instruction executes with CR0.EM[bit 2] set, an invalid opcode exception (INT 6) is generated instead of a device not available exception (INT 7).
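
The fault selection described in Sections 12.8.1 and 12.8.3 can be summarized as a small C model. This is only a sketch under stated assumptions: the names simd_state, simd_outcome, and simd_check are hypothetical, a single cpuid_feature flag stands in for the relevant CPUID.01H:ECX bit, and conditions not discussed in this section are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative names; none of these identifiers come from the manual. */
typedef enum {
    EXEC_OK,   /* none of the modeled checks raises a fault */
    FAULT_UD,  /* invalid opcode exception (INT 6) */
    FAULT_NM   /* device not available exception (INT 7) */
} simd_outcome;

typedef struct {
    bool cpuid_feature; /* CPUID.01H:ECX feature bit for the extension */
    bool cr0_em;        /* CR0.EM[bit 2] */
    bool cr0_ts;        /* CR0.TS[bit 3] */
} simd_state;

/* Models only the three conditions named in Sections 12.8.1 and 12.8.3.
 * The text states the CPUID rule for SSE3; applying the same rule to the
 * SSSE3 feature bit is an assumption of this sketch. */
simd_outcome simd_check(simd_state s)
{
    if (!s.cpuid_feature)
        return FAULT_UD;   /* regardless of the state of CR0.TS */
    if (s.cr0_em)
        return FAULT_UD;   /* INT 6 is generated instead of INT 7 */
    if (s.cr0_ts)
        return FAULT_NM;
    return EXEC_OK;
}

/* Usage example: feature present, EM clear, TS set. */
void simd_check_example(void)
{
    simd_state s = { .cpuid_feature = true, .cr0_em = false, .cr0_ts = true };
    assert(simd_check(s) == FAULT_NM);
}
```

Checking the feature bit first mirrors the statement that a missing feature faults regardless of CR0.TS; checking CR0.EM before CR0.TS mirrors the statement that INT 6 is raised instead of INT 7.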
CHAPTER 13
INPUT/OUTPUT

In addition to transferring data to and from external memory, IA-32 processors can also transfer data to and from input/output ports (I/O ports). I/O ports are created in system hardware by circuitry that decodes the control, data, and address pins on the processor. These I/O ports are then configured to communicate with peripheral devices. An I/O port can be an input port, an output port, or a bidirectional port. Some I/O ports are used for transmitting data, such as to and from the transmit and receive registers, respectively, of a serial interface device. Other I/O ports are used to control peripheral devices, such as the control registers of a disk controller.

This chapter describes the processor’s I/O architecture. The topics discussed include:

• I/O port addressing
• I/O instructions
• I/O protection mechanism

13.1 I/O PORT ADDRESSING

The processor permits applications to access I/O ports in either of two ways:

• Through a separate I/O address space
• Through memory-mapped I/O

Accessing I/O ports through the I/O address space is handled through a set of I/O instructions and a special I/O protection mechanism. Accessing I/O ports through memory-mapped I/O is handled with the processor’s general-purpose move and string instructions, with protection provided through segmentation or paging.
I/O ports can be mapped so that they appear in the I/O address space or the physical-memory address space (memory-mapped I/O) or both.

One benefit of using the I/O address space is that writes to I/O ports are guaranteed to be completed before the next instruction in the instruction stream is executed. Thus, I/O writes to control system hardware cause the hardware to be set to its new state before any other instructions are executed. See Section 13.6, “Ordering I/O,” for more information on serializing of I/O operations.

13.2 I/O PORT HARDWARE

From a hardware point of view, I/O addressing is handled through the processor’s address lines. For the P6 family, Pentium 4, and Intel Xeon processors, the request command lines signal whether the address lines are being driven with a memory address or an I/O address; for Pentium processors and earlier IA-32 processors, the M/IO# pin indicates a memory address (1) or an I/O address (0). When the separate I/O address space is selected, it is the responsibility of the hardware to decode the memory-I/O bus transaction to select I/O ports rather than memory. Data is transmitted between the processor and an I/O device through the data lines.

13.3 I/O ADDRESS SPACE

The processor’s I/O address space is separate and distinct from the physical-memory address space. The I/O address space consists of 2^16 (64K) individually addressable 8-bit I/O ports, numbered 0 through FFFFH.
I/O port addresses 0F8H through 0FFH are reserved. Do not assign I/O ports to these addresses. The result of an attempt to address beyond the I/O address space limit of FFFFH is implementation-specific; see the Developer’s Manuals for specific processors for more details.

Any two consecutive 8-bit ports can be treated as a 16-bit port, and any four consecutive ports can be a 32-bit port. In this manner, the processor can transfer 8, 16, or 32 bits to or from a device in the I/O address space.
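
The addressing rules above (ports 0 through FFFFH, the reserved range 0F8H through 0FFH, and 8-, 16-, or 32-bit ports built from consecutive 8-bit ports) can be expressed as a small validity check. This is a minimal sketch: the function name io_port_range_ok and the macro names are hypothetical, and rejecting accesses that run past FFFFH is a conservative choice made here because the text calls that behavior implementation-specific.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IO_PORT_LAST      0xFFFFu /* highest 8-bit port number */
#define IO_RESERVED_FIRST 0x00F8u /* reserved range, inclusive */
#define IO_RESERVED_LAST  0x00FFu

/* Returns true if an access of `width` bytes (1, 2, or 4) starting at
 * `port` stays inside the I/O address space and avoids the reserved
 * ports. Hypothetical helper, not an API from the manual. */
bool io_port_range_ok(uint32_t port, unsigned width)
{
    if (width != 1 && width != 2 && width != 4)
        return false;
    if (port > IO_PORT_LAST)
        return false;

    uint32_t last = port + width - 1; /* last 8-bit port touched */
    if (last > IO_PORT_LAST)
        return false; /* past FFFFH: implementation-specific, so rejected */
    if (port <= IO_RESERVED_LAST && last >= IO_RESERVED_FIRST)
        return false; /* overlaps 0F8H through 0FFH */
    return true;
}

/* Usage example: a 32-bit port starting at 0F6H would touch 0F8H and 0F9H. */
void io_port_range_example(void)
{
    assert(io_port_range_ok(0x00F4u, 4));
    assert(!io_port_range_ok(0x00F6u, 4));
}
```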
Like words in memory, 16-bit ports should be aligned to even addresses (0, 2, 4, ...) so that all 16 bits can be transferred in a single bus cycle. Likewise, 32-bit ports should be aligned to addresses that are multiples of four (0, 4, 8, ...). The processor supports data transfers to unaligned ports, but there is a performance penalty because one or more extra bus cycles must be used.

The exact order of bus cycles used to access unaligned ports is undefined and is not guaranteed to remain the same in future IA-32 processors. If hardware or software requires that I/O ports be written to in a particular order, that order must be specified explicitly. For example, to load a word-length I/O port at address 2H and then another word port at 4H, two word-length writes must be used, rather than a single doubleword write at 2H.

Note that the processor does not mask parity errors for bus cycles to the I/O address space.
Accessing I/O ports through the I/O address space is thus a possible source of parity errors.

13.3.1 Memory-Mapped I/O

I/O devices that respond like memory components can be accessed through the processor’s physical-memory address space (see Figure 13-1). When using memory-mapped I/O, any of the processor’s instructions that reference memory can be used to access an I/O port located at a physical-memory address. For example, the MOV instruction can transfer data between any register and a memory-mapped I/O port. The AND, OR, and TEST instructions may be used to manipulate bits in the control and status registers of a memory-mapped peripheral device.

When using memory-mapped I/O, caching of the address space mapped for I/O operations must be prevented.
With the Pentium 4, Intel Xeon, and P6 family processors, caching of I/O accesses can be prevented by using memory type range registers (MTRRs) to map the address space used for the memory-mapped I/O as uncacheable (UC). See Chapter 10, “Memory Cache Control,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, for a complete discussion of the MTRRs.

The Pentium and Intel486 processors do not support MTRRs. Instead, they provide the KEN# pin, which when held inactive (high) prevents caching of all addresses sent out on the system bus. To use this pin, external address decoding logic is required to block caching in specific address spaces.

Figure 13-1. Memory-Mapped I/O
[Figure labels: Physical Memory; EPROM; I/O Port (three regions); RAM; address 0]

All the IA-32 processors that have on-chip caches also provide the PCD (page-level cache disable) flag in page table and page directory entries.
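
As a rough illustration of the memory-mapped access described in Section 13.3.1, the sketch below wraps register reads, writes, and bit manipulation in C. All names (mmio_reg32, mmio_read, and so on) are hypothetical. The volatile qualifier only stops the compiler from eliding or merging accesses; it does not make the address range uncacheable, which still has to be arranged by the mechanisms discussed above. The example uses an ordinary variable as a stand-in for a device register so that it can run anywhere.

```c
#include <assert.h>
#include <stdint.h>

/* A 32-bit device register accessed through memory (hypothetical type). */
typedef volatile uint32_t mmio_reg32;

/* Plain load and store, the C counterparts of a MOV from or to the port. */
uint32_t mmio_read(const mmio_reg32 *reg)            { return *reg; }
void     mmio_write(mmio_reg32 *reg, uint32_t value) { *reg = value; }

/* Read-modify-write helpers in the spirit of OR and AND on a control register. */
void mmio_set_bits(mmio_reg32 *reg, uint32_t mask)   { *reg |= mask; }
void mmio_clear_bits(mmio_reg32 *reg, uint32_t mask) { *reg &= ~mask; }

/* Nonzero if any bit in `mask` is set, in the spirit of TEST on a status register. */
int mmio_test_bits(const mmio_reg32 *reg, uint32_t mask)
{
    return (*reg & mask) != 0;
}

/* Usage example with a stand-in variable instead of a real device address. */
void mmio_example(void)
{
    mmio_reg32 fake_control = 0;          /* assumption: simulated register */
    mmio_set_bits(&fake_control, 0x1u);   /* set a hypothetical enable bit */
    assert(mmio_test_bits(&fake_control, 0x1u));
}
```

On real hardware the pointer would refer to the device's mapped address, which this sketch deliberately does not invent.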