Volume 1: Application Programming
The DS segment is the default segment for most memory operands. Many instructions allow this default data segment to be overridden using one of the six segment-override prefixes shown in Table 3-7 on page 72. Data-segment overrides are ignored when accessing data in the following cases:
• When a stack reference is made that pushes data onto or pops data off the stack. In those cases, the SS segment is always used.
• When the destination of a string instruction is memory. The destination is always referenced using the ES segment.

Instruction fetches from the CS segment cannot be overridden. However, the CS segment-override prefix can be used to access instructions as data objects and to access data stored in the code segment.

For further details on these prefixes, see “Segment-Override Prefixes” in Volume 3.

Lock Prefix. The LOCK prefix causes certain read-modify-write instructions that access memory to occur atomically. The mechanism for doing so is implementation-dependent (for example, the mechanism may involve locking of data-cache lines that contain copies of the referenced memory operands, and/or bus signaling or packet-messaging on the bus).
The prefix is intended to give the processor exclusive use of shared memory operands in a multiprocessor system.

The prefix can only be used with forms of the following instructions that write a memory operand: ADC, ADD, AND, BTC, BTR, BTS, CMPXCHG, CMPXCHG8B, DEC, INC, NEG, NOT, OR, SBB, SUB, XADD, XCHG, and XOR. An invalid-opcode exception occurs if LOCK is used with any other instruction.

For further details on this prefix, see “Lock Prefix” in Volume 3.

Repeat Prefixes. There are two repeat-prefix byte codes, F3h and F2h. Byte code F3h is the more general and is usually treated as two distinct instructions by assemblers. Byte code F2h is only used with the CMPSx and SCASx instructions:
• REP (F3h): This more generalized repeat prefix repeats its associated string instruction the number of times specified in the counter register (rCX). Repetition stops when the value in rCX reaches 0. This prefix is used with the INS, LODS, MOVS, OUTS, and STOS instructions.
• REPE or REPZ (F3h): This version of the REP prefix repeats its associated string instruction the number of times specified in the counter register (rCX). Repetition stops when the value in rCX reaches 0 or when the zero flag (ZF) is cleared to 0. The prefix can only be used with the CMPSx and SCASx instructions.
• REPNE or REPNZ (F2h): The REPNE or REPNZ prefix repeats its associated string instruction the number of times specified in the counter register (rCX). Repetition stops when the value in rCX reaches 0 or when the zero flag (ZF) is set to 1. The prefix can only be used with the CMPSx and SCASx instructions.

The size of the rCX counter is determined by the effective address size. For further details about these prefixes, including optimization of their use, see “Repeat Prefixes” in Volume 3.

General-Purpose Programming 73 AMD64 Technology 24592—Rev. 3.13—July 2007

3.5.2 REX Prefixes
REX prefixes are a new group of instruction-prefix bytes that can be used only in 64-bit mode.
They enable the 64-bit register extensions. REX prefixes specify the following features:
• Use of an extended GPR register, shown in Figure 3-3 on page 27.
• Use of an extended XMM register, shown in Figure 4-12 on page 117.
• Use of a 64-bit (quadword) operand size, as described in “Operands” on page 36.
• Use of extended control and debug registers, as described in Volume 2.

REX prefix bytes have a value in the range 40h to 4Fh, depending on the particular combination of register extensions desired. With few exceptions, a REX prefix is required to access a 64-bit GPR or one of the extended GPR or XMM registers. A few instructions (described in “General-Purpose Instructions in 64-Bit Mode” in Volume 3) default to 64-bit operand size and do not need a REX prefix to select that operand size; a REX prefix is still needed to name one of the extended GPRs.

An instruction can have only one REX prefix, and one such prefix is all that is needed to express the full selection of 64-bit-mode register-extension features.
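Because one byte must carry every register-extension choice, it helps to see how the 40h to 4Fh range is built. The bit layout is not stated in this section; the sketch below assumes the 0100WRXB layout documented for REX prefixes in Volume 3 (W selects 64-bit operand size, R extends the ModRM reg field, X extends the SIB index field, B extends the ModRM r/m, SIB base, or opcode register field). The helper names `rex_byte` and `is_rex_byte` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed REX layout (see "REX Prefixes" in Volume 3): 0100 W R X B. */
enum {
    REX_BASE = 0x40, /* fixed high nibble 0100b */
    REX_W = 0x08,    /* 64-bit operand size */
    REX_R = 0x04,    /* extends ModRM.reg */
    REX_X = 0x02,    /* extends SIB.index */
    REX_B = 0x01     /* extends ModRM.r/m, SIB.base, or opcode reg field */
};

/* Build a REX prefix byte from the four extension bits. */
uint8_t rex_byte(bool w, bool r, bool x, bool b) {
    return (uint8_t)(REX_BASE | (w ? REX_W : 0) | (r ? REX_R : 0) |
                     (x ? REX_X : 0) | (b ? REX_B : 0));
}

/* In 64-bit mode, every byte from 40h through 4Fh is a REX prefix. */
bool is_rex_byte(uint8_t byte) {
    return (byte & 0xF0) == REX_BASE;
}
```

For example, `rex_byte(true, false, false, false)` yields 48h, the prefix in one encoding of `mov rax, rbx` (48 89 D8), and `rex_byte(true, false, false, true)` yields 49h, as in `mov r8, rax` (49 89 C0).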
The prefix, if used, must immediately precede the first opcode byte of an instruction. Any other placement of a REX prefix is ignored. The legacy instruction-size limit of 15 bytes still applies to instructions that contain a REX prefix. For further details on the REX prefixes, see “REX Prefixes” in Volume 3.

3.6 Feature Detection
The CPUID instruction provides information about the processor implementation and its capabilities. Software operating at any privilege level can execute the CPUID instruction to collect this information. After the information is collected, software can select procedures that optimize performance for a particular hardware implementation.
For example, application software can determine whether the AMD64 architecture’s long mode is supported by the processor, and it can determine the processor implementation’s performance capabilities.

Support for the CPUID instruction is implementation-dependent, as determined by software’s ability to write the RFLAGS.ID bit. The following code sample shows how to test for the presence of the CPUID instruction.

    pushfd                  ; save EFLAGS
    pop     eax             ; store EFLAGS in EAX
    mov     ebx, eax        ; save in EBX for later testing
    xor     eax, 00200000h  ; toggle bit 21
    push    eax             ; push to stack
    popfd                   ; save changed EAX to EFLAGS
    pushfd                  ; push EFLAGS to TOS
    pop     eax             ; store EFLAGS in EAX
    cmp     eax, ebx        ; see if bit 21 has changed
    jz      NO_CPUID        ; if no change, no CPUID

After software has determined that the processor implementation supports the CPUID instruction, software can test for support of specific features by loading a function code (value) into the EAX register and executing the CPUID instruction. Processor feature information is returned in the EAX, EBX, ECX, and EDX registers, as described fully in “CPUID” in Volume 3.

The architecture supports CPUID information about standard functions and extended functions. In general, standard functions include the earliest features offered in the x86 architecture.
Extended functions include newer features of the x86 and AMD64 architectures, such as the 3DNow! instructions and long mode. Support for SSE, SSE2, and SSE3, by contrast, is reported by standard function 1.

Standard functions are accessed by loading EAX with the value 0 (standard function 0) or 1 (standard function 1) and executing the CPUID instruction. All software using the CPUID instruction must execute standard function 0, which identifies the processor vendor and the largest standard-function input value supported by the processor implementation.
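Standard function 0 identifies the vendor as a 12-character ASCII string spread across three registers. This section does not describe that encoding; the sketch below assumes the layout given in the CPUID specification: characters in EBX, then EDX, then ECX, four per register, with the first character in the lowest-order byte. `cpuid_vendor_string` is a hypothetical helper that takes the raw register values rather than executing CPUID itself.

```c
#include <stdint.h>

/* Copy the four characters held in one register, lowest-order byte first. */
static void put_reg_chars(char *dst, uint32_t reg) {
    for (int i = 0; i < 4; i++) {
        dst[i] = (char)((reg >> (8 * i)) & 0xFF);
    }
}

/* Decode the vendor string returned by CPUID standard function 0.
 * Assumed register order: EBX, EDX, ECX. `out` must hold 13 bytes. */
void cpuid_vendor_string(uint32_t ebx, uint32_t edx, uint32_t ecx, char out[13]) {
    put_reg_chars(out, ebx);
    put_reg_chars(out + 4, edx);
    put_reg_chars(out + 8, ecx);
    out[12] = '\0';
}
```

On an AMD processor the three registers hold 68747541h, 69746E65h, and 444D4163h, which this helper decodes to "AuthenticAMD".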
The CPUID standard function 1 returns the processor version and standard-feature bits.

Software can test for support of extended functions by first executing the CPUID instruction with the value 8000_0000h in EAX. The processor returns, in EAX, the largest extended-function input value defined for the CPUID instruction on the processor implementation. If the value in EAX is greater than 8000_0000h, extended functions are supported, although specific extended functions must be tested individually.

The following code sample shows how to test for support of any extended functions:

    mov     eax, 80000000h  ; query for extended functions
    CPUID                   ; get extended function limit
    cmp     eax, 80000000h  ; is EAX greater than 80000000?
    jbe     NO_EXTENDEDMSR  ; no extended-feature support

If extended functions are supported, software can test for support of specific extended features. For example, software can determine whether the processor implementation supports long mode by executing the CPUID instruction with 8000_0001h in the EAX register, then testing to see if bit 29 in the EDX register is set to 1. The following code sample shows how to test for long-mode support.

    mov     eax, 80000001h  ; query for function 8000_0001h
    CPUID                   ; get feature bits in EDX
    test    edx, 20000000h  ; test bit 29 in EDX
    jnz     YES_Long_Mode   ; long mode is supported

With a few exceptions, general-purpose instructions are supported in all hardware implementations of the AMD64 architecture. The exceptions are implemented only if their associated CPUID function bit is set. The implementation of certain media instructions (such as FXSAVE and FXRSTOR) and system instructions (such as RDMSR and WRMSR) is also indicated by CPUID function bits.
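The two assembly tests (extended-function limit, then EDX bit 29 of function 8000_0001h) can be combined in C. This is a sketch assuming a GCC- or Clang-compatible compiler, whose `<cpuid.h>` header provides `__get_cpuid_max` and `__get_cpuid`; the names `long_mode_from_regs` and `detect_long_mode` are hypothetical. The bit test lives in a separate function over raw register values so it can be checked without executing CPUID, and on non-x86 targets the detection routine simply reports no support.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#endif

#define LM_BIT (UINT32_C(1) << 29) /* 20000000h: long mode, EDX bit 29 */

/* Same tests as the assembly samples: the extended-function limit must
 * exceed 8000_0000h, and EDX bit 29 of function 8000_0001h must be set. */
bool long_mode_from_regs(uint32_t max_extended, uint32_t edx_80000001) {
    if (max_extended <= 0x80000000u) {
        return false; /* no extended-function support (the jbe case) */
    }
    return (edx_80000001 & LM_BIT) != 0;
}

/* Execute CPUID where the compiler supports it; report false elsewhere. */
bool detect_long_mode(void) {
#ifdef HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;
    unsigned int max_ext = __get_cpuid_max(0x80000000u, 0);
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx)) {
        return false; /* function 8000_0001h is not available */
    }
    return long_mode_from_regs(max_ext, edx);
#else
    return false;
#endif
}
```

Keeping the decision in `long_mode_from_regs` also makes it easy to cache the result once at startup rather than re-executing CPUID at each use.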
See “CPUID” in the AMD64 Architecture Programmer’s Manual Volume 3: General-Purpose and System Instructions, order# 24594, and the AMD CPUID Specification, order# 25481, for a full description of the CPUID instruction, all CPUID standard and extended functions, and the proper interpretation of returned values.

3.6.1 Feature Detection in a Virtualized Environment
Software writers must assume that their software may be executed as a guest in a virtualized environment. A virtualized guest may be migrated between processors of differing capabilities, so the CPUID indication of a feature's presence must be respected. Operating systems, user programs, and libraries must all ensure that the CPUID instruction indicates a feature is present before using that feature.
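A minimal sketch of honoring CPUID in this way: read the feature indication once during initialization, cache it, and choose code paths from the cached value instead of probing the instruction itself. Every name here (`cpu_features`, `cpu_features_init`, `active_path`, the two path functions) is hypothetical, and the example feature is 3DNow!, which the CPUID specification reports in EDX bit 31 of function 8000_0001h. Register values are passed in so the logic can be exercised without executing CPUID.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature indications cached once per program or library initialization. */
struct cpu_features {
    bool initialized;
    bool has_3dnow; /* assumed: CPUID 8000_0001h, EDX bit 31 */
};

static struct cpu_features g_features;

/* Hypothetical code paths; a real library would put its feature-specific
 * and fallback routines here. */
static const char *path_3dnow(void) { return "3dnow"; }
static const char *path_generic(void) { return "generic"; }

static const char *(*g_path)(void) = path_generic;

/* Record what CPUID reported and configure code paths from it. Calls after
 * the first are ignored, so the choice stays fixed for this initialization. */
void cpu_features_init(uint32_t max_extended, uint32_t edx_80000001) {
    if (g_features.initialized) {
        return;
    }
    g_features.has_3dnow = max_extended > 0x80000000u &&
                           (edx_80000001 & (UINT32_C(1) << 31)) != 0;
    g_path = g_features.has_3dnow ? path_3dnow : path_generic;
    g_features.initialized = true;
}

/* Query the cached indication; never probes the instruction itself. */
bool cpu_has_3dnow(void) {
    return g_features.initialized && g_features.has_3dnow;
}

/* Name of the code path selected at initialization. */
const char *active_path(void) { return g_path(); }
```

In a real program the two register values would come from executing CPUID with EAX set to 8000_0000h and 8000_0001h.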
The hypervisor is responsible for ensuring consistent CPUID values across the system.

For example, an OS, program, or library typically detects a feature during initialization and then configures code paths or internal copies of feature indications based on the detection of that feature, with the feature detection occurring once per initialization. In this case, the feature must be detected by use of the CPUID instruction rather than by ignoring CPUID and testing for the presence of that feature directly.

To ensure that guests can migrate between processors across multiple generations, while allowing for features to be deprecated in future generations of processors, it is imperative that software check the CPUID bit once per program or library initialization before using instructions that are indicated by a CPUID bit; otherwise, inconsistent behavior may result.

3.7 Control Transfers
3.7.1 Overview
From the application-program’s viewpoint, program-control flow is sequential (that is, instructions are addressed and executed sequentially) except when a branch instruction (a call, return, jump, interrupt, or return from interrupt) is encountered, in which case program flow changes to the branch instruction’s target address.