Volume 1 Basic Architecture (794100), page 93
#P - Inexact Result (Precision)

Instruction:
  ADDPS, ADDPD, ADDSUBPS, ADDSUBPD, HADDPS, HADDPD, SUBPS, SUBPD, HSUBPS, HSUBPD,
  MULPS, MULPD, DIVPS, DIVPD, SQRTPS, SQRTPD, CVTDQ2PS, CVTPI2PS, CVTPS2PI,
  CVTPS2DQ, CVTPD2PI, CVTPD2DQ, CVTPD2PS, CVTTPS2PI, CVTTPD2PI, CVTTPD2DQ,
  CVTTPS2DQ, ADDSS, ADDSD, SUBSS, SUBSD, MULSS, MULSD, DIVSS, DIVSD, SQRTSS,
  SQRTSD, CVTSI2SS, CVTSS2SI, CVTSD2SI, CVTSD2SS, CVTTSS2SI, CVTTSD2SI

Condition:
  The result is not exactly representable in the destination format.

Masked Response:
  res = Result rounded to the destination precision and using the bounded exponent, but only if no unmasked underflow or overflow conditions occur (this exception can occur in the presence of a masked underflow or overflow); #PE = 1.

Unmasked Response and Exception Code:
  Only if no underflow/overflow condition occurred, or if the corresponding exceptions are masked:
  • Set #OE if masked overflow and set result as described above for masked overflow.
  • Set #UE if masked underflow and set result as described above for masked underflow.
  If neither underflow nor overflow, res equals the result rounded to the destination precision and using the bounded exponent; set #PE = 1.
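As a rough illustration of the condition in the table above, the following sketch tests whether a single-precision product would be inexact. It relies on the fact that the product of two single-precision values needs at most 48 significand bits and therefore fits exactly in a double, so comparing the double product with its value after rounding to float shows whether rounding changed it. The helper name mul_is_inexact is hypothetical and not part of the manual. The sketch assumes finite, non-NaN operands and double arithmetic evaluated in IEEE double precision (as on x86-64 with SSE2). It does not read or set the #PE flag in MXCSR.

```c
/* Hypothetical helper (not from the manual): returns 1 if a * b is not
   exactly representable in single precision, and 0 otherwise.
   Assumes finite, non-NaN inputs and IEEE double evaluation. */
int mul_is_inexact(float a, float b)
{
    /* 24-bit by 24-bit significands need at most 48 bits,
       so the double product is exact. */
    double exact = (double)a * (double)b;

    /* Round to the destination format (current rounding mode).
       volatile forces the value to be stored as a real float. */
    volatile float rounded = (float)exact;

    /* Any difference means the destination could not hold the result. */
    return (double)rounded != exact;
}
```

For example, mul_is_inexact(1.5f, 1.5f) yields 0 because 2.25 is representable in single precision, while squaring 16777215.0f (2^24 - 1) yields 1 because the 48-bit product does not fit in a 24-bit significand.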
GUIDELINES FOR WRITING SIMD FLOATING-POINT EXCEPTION HANDLERS

E.4.3 Example SIMD Floating-Point Emulation Implementation

The sample code listed below may be considered part of a user-level floating-point exception filter for the SSE/SSE2/SSE3 numeric instructions. It is assumed that the filter function is invoked by a low-level exception handler (reached via interrupt vector 19 when an unmasked floating-point exception occurs), and that it operates as explained in Section E.4.1, “Floating-Point Emulation.” The sample code does the emulation only for the SSE instructions for addition, subtraction, multiplication, and division. For this, it uses C code and x87 FPU operations.
Operations corresponding to other SSE/SSE2/SSE3 numeric instructions can be emulated similarly.

The example assumes that the emulation function receives a pointer to a data structure specifying a number of input parameters: the operation that caused the exception, a set of sub-operands (unpacked, of type float), the rounding mode (the precision is always single), exception masks (having the same relative bit positions as in the MXCSR but starting from bit 0 in an unsigned integer), and flush-to-zero and denormals-are-zeros indicators.

The output parameters are a floating-point result (of type float), the cause of the exception (identified by constants not explicitly defined below), and the exception status flags.
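A minimal sketch of how a filter might derive these inputs from a saved MXCSR image follows. The function names are hypothetical and do not appear in the manual. The bit positions are those of the MXCSR register: status flags in bits 0-5, DAZ in bit 6, exception masks in bits 7-12, rounding control in bits 13-14, and FTZ in bit 15. Whether the emulation code expects the raw two-bit rounding-control encoding in its rounding-mode parameter is an assumption here, since the constants it uses are not defined in the text.

```c
/* Hypothetical helpers (not from the manual) that extract the filter
   inputs from a saved 32-bit MXCSR image. */

/* Exception masks shifted down to bits 0-5:
   I = 0x01, D = 0x02, Z = 0x04, O = 0x08, U = 0x10, P = 0x20. */
unsigned int mxcsr_exc_masks(unsigned int mxcsr)
{
    return (mxcsr >> 7) & 0x3Fu;
}

/* Rounding control field (assumed raw encoding):
   0 = nearest, 1 = down, 2 = up, 3 = toward zero. */
unsigned int mxcsr_rounding_mode(unsigned int mxcsr)
{
    return (mxcsr >> 13) & 0x3u;
}

/* Flush-to-zero indicator (MXCSR bit 15). */
unsigned int mxcsr_ftz(unsigned int mxcsr)
{
    return (mxcsr >> 15) & 0x1u;
}

/* Denormals-are-zeros indicator (MXCSR bit 6). */
unsigned int mxcsr_daz(unsigned int mxcsr)
{
    return (mxcsr >> 6) & 0x1u;
}
```

With the power-on default MXCSR value 0x1F80, all six masks are set (0x3F), the rounding mode is round-to-nearest (0), and both FTZ and DAZ are 0.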
The corresponding C definition is:

typedef struct {
    unsigned int operation;                     // SSE or SSE2 operation: ADDPS, ADDSS, ...
    unsigned int operand1_uint32;               // first operand value
    unsigned int operand2_uint32;               // second operand value (if any)
    float result_fval;                          // result value (if any)
    unsigned int rounding_mode;                 // rounding mode
    unsigned int exc_masks;                     // exception masks, in the order P,U,O,Z,D,I
    unsigned int exception_cause;               // exception cause
    unsigned int status_flag_inexact;           // inexact status flag
    unsigned int status_flag_underflow;         // underflow status flag
    unsigned int status_flag_overflow;          // overflow status flag
    unsigned int status_flag_divide_by_zero;    // divide by zero status flag
    unsigned int status_flag_denormal_operand;  // denormal operand status flag
    unsigned int status_flag_invalid_operation; // invalid operation status flag
    unsigned int ftz;                           // flush-to-zero flag
    unsigned int daz;                           // denormals-are-zeros flag
} EXC_ENV;

The arithmetic operations exemplified are emulated as follows:
1. If the denormals-are-zeros mode is enabled (the DAZ bit in MXCSR is set to 1), replace all the denormal inputs with zeros of the same sign (the denormal flag is not affected by this change).
2. Perform the operation using x87 FPU instructions, with exceptions disabled, the original user rounding mode, and single precision. This reveals invalid, denormal, or divide-by-zero exceptions (if there are any) and stores the result in memory as a double precision value (whose exponent range is large enough to look like “unbounded” to the result of the single precision computation).
3. If no unmasked exceptions were detected, determine if the result is less than the smallest normal number (tiny) that can be represented in single precision format, or greater than the largest normal number that can be represented in single precision format (huge). If an unmasked overflow or underflow occurs, calculate the scaled result that will be handed to the user exception handler, as specified by IEEE Standard 754.
4. If no exception was raised, calculate the result with a “bounded” exponent. If the result is tiny, it requires denormalization (shifting the significand right while incrementing the exponent to bring it into the admissible range of [-126, +127] for single precision floating-point numbers).
   The result obtained in step 2 cannot be used because it might incur a double rounding error (it was rounded to 24 bits in step 2, and might have to be rounded again in the denormalization process). To overcome this, calculate the result as a double precision value, and store it to memory in single precision format. Rounding first to 53 bits in the significand, and then to 24, never causes a double rounding error (exact properties exist that state when double-rounding error occurs, but for the elementary arithmetic operations, the rule of thumb is that if an infinitely precise result is rounded to 2p+1 bits and then again to p bits, the result is the same as when rounding directly to p bits, which means that no double-rounding error occurs).
5. If the result is inexact and the inexact exceptions are unmasked, the calculated result will be delivered to the user floating-point exception handler.
6. The flush-to-zero case is dealt with if the result is tiny.
7. The emulation function returns RAISE_EXCEPTION to the filter function if an exception has to be raised (the exception_cause field indicates the cause). Otherwise, the emulation function returns DO_NOT_RAISE_EXCEPTION. In the first case, the result is provided by the user exception handler called by the filter function.
   In the second case, it is provided by the emulation function. The filter function has to collect all the partial results and assemble the scalar or packed result that is used if execution is to continue.

Example E-2. SIMD Floating-Point Emulation

// masks for individual status word bits
#define PRECISION_MASK  0x20
#define UNDERFLOW_MASK  0x10
#define OVERFLOW_MASK   0x08
#define ZERODIVIDE_MASK 0x04
#define DENORMAL_MASK   0x02
#define INVALID_MASK    0x01

// 32-bit constants
static unsigned ZEROF_ARRAY[] = {0x00000000};
#define ZEROF *(float *) ZEROF_ARRAY
// +0.0
static unsigned NZEROF_ARRAY[] = {0x80000000};
#define NZEROF *(float *) NZEROF_ARRAY
// -0.0
static unsigned POSINFF_ARRAY[] = {0x7f800000};
#define POSINFF *(float *)POSINFF_ARRAY
// +Inf
static unsigned NEGINFF_ARRAY[] = {0xff800000};
#define NEGINFF *(float *)NEGINFF_ARRAY
// -Inf

// 64-bit constants (low 32 bits first, then high 32 bits)
static unsigned MIN_SINGLE_NORMAL_ARRAY [] = {0x00000000, 0x38100000};
#define MIN_SINGLE_NORMAL *(double *)MIN_SINGLE_NORMAL_ARRAY
// +1.0 * 2^-126
static unsigned MAX_SINGLE_NORMAL_ARRAY [] = {0xe0000000, 0x47efffff};
#define MAX_SINGLE_NORMAL *(double *)MAX_SINGLE_NORMAL_ARRAY
// +1.1...1*2^127 (23 fraction bits set; the low word holds the last three)
static unsigned TWO_TO_192_ARRAY[] = {0x00000000, 0x4bf00000};
#define TWO_TO_192 *(double *)TWO_TO_192_ARRAY
// +1.0 * 2^192
static unsigned TWO_TO_M192_ARRAY[] = {0x00000000, 0x33f00000};
#define TWO_TO_M192 *(double *)TWO_TO_M192_ARRAY
// +1.0 * 2^-192

// auxiliary functions
static int isnanf (unsigned int );   // returns 1 if f is a NaN, and 0 otherwise
static float quietf (unsigned int ); // converts a signaling NaN to a quiet
                                     // NaN, and leaves a quiet NaN unchanged
static unsigned int check_for_daz (unsigned int ); // converts denormals
                                     // to zeros of the same sign;
                                     // does not affect any status flags

// emulation of SSE and SSE2 instructions using
// C code and x87 FPU instructions
unsigned int
simd_fp_emulate (EXC_ENV *exc_env)
{
  int uiopd1; // first operand of the add, subtract, multiply, or divide
  int uiopd2; // second operand of the add, subtract, multiply, or divide
  float res;  // result of the add, subtract, multiply, or divide
  double dbl_res24; // result with 24-bit significand, but "unbounded" exponent
                    // (needed to check tininess, to provide a scaled result to
                    // an underflow/overflow trap handler, and in flush-to-zero mode)
  double dbl_res;   // result in double precision format (needed to avoid a
                    // double rounding error when denormalizing)
  unsigned int result_tiny;
  unsigned int result_huge;
  unsigned short int sw; // 16 bits
  unsigned short int cw; // 16 bits

  // have to check first for faults (V, D, Z), and then for traps (O, U, I)

  // initialize x87 FPU (floating-point exceptions are masked)
  _asm {
    fninit;
  }

  result_tiny = 0;
  result_huge = 0;

  switch (exc_env->operation) {

    case ADDPS:
    case ADDSS:
    case SUBPS:
    case SUBSS:
    case MULPS:
    case MULSS:
    case DIVPS:
    case DIVSS:

      uiopd1 = exc_env->operand1_uint32; // copy as unsigned int
          // do not copy as float to avoid conversion
          // of SNaN to QNaN by compiled code
      uiopd2 = exc_env->operand2_uint32;
          // do not copy as float to avoid conversion of SNaN
          // to QNaN by compiled code
      uiopd1 = check_for_daz (uiopd1); // operand1 = +0.0 * operand1 if it is
          // denormal and DAZ=1
      uiopd2 = check_for_daz (uiopd2); // operand2 = +0.0 * operand2 if it is
          // denormal and DAZ=1

      // execute the operation and check whether the invalid, denormal, or
      // divide by zero flags are set and the respective exceptions enabled

      // set control word with rounding mode set to exc_env->rounding_mode,
      // single precision, and all exceptions disabled