Morgan - Numerical Methods (523161), страница 12
Текст из файла (страница 12)
Dependingon the size of the operands, it’s almost always faster than any of the preceedingtechniques and can be extended to handle operands of virtually any size. There areexceptions, however; for example, the MUL instruction on the 8086 was terriblyslow, making it a draw in certain situations.
The 80286 was faster in both cycle timeand clock speed, and the 80386 was even faster; nevertheless, many examples showthat multiplication using the shift and add technique is highly competitive. This isalmost never true of multiprecision multiplication, although the double precisionshift available on the 80386 and up may be an exception.In earlier examples involving multiplication, we saw numbers represented asbinary polynomials in which each position contained either a zero or a one times base2 taken to a certain power. To perform that multiplication, we multiplied each bit ofthe muliplicand by each bit of the multiplier, and summed the subproducts accordingto their power to form the product (see the section Binary Multiplication).
Workingwith larger numbers is much the same except that the polynomials generally showthe operands broken into bytes or words. For example, suppose we needed tomultiply two 24-bit quantities, such as 123456H and 654321H. We would want torestate these numbers in terms of a new base, that of our hardware multiply. In thiscase, we’re using an 8086 with a 16-bit multiply, so our base is 216 (10000H).
First,123456H becomes three single-byte quantities:12x10000H1 + 3456x10000H0and 654321H becomes:65x10000H1 + 4321x10000H0To better understand this process, let’s relabel each byte. The quantity 123456Hcan be seen as the sum of 120000H + 3456H, which becomes a + b. The quantity61NUMERICAL METHODS654321H becomes 650000H + 4321H, which then becomes d + e. Now, multiply:d+ea+bbebdaeadWith the original numbers, that calculation is:650000H + 43218*120000H + 345680db94116H14a5ee0000H4b8520000H71a00000000H7336bf94116HThe direction the multiply takes is not significant; that is, the most significantwords could have been multiplied first because the final additions align the results.This technique can be extended as far as needed to produce a result.
It’s also fast,requiring only a few multiplies and divides.In mul32, we multiply two doubleword numbers and arrive at a quadword result.mul32: Algorithm1.Use DI to hold the address of the result, a quadword.2.Move the most significant word of the multiplicand into AX and multiplyby the most significant word of the multiplier. The product of thismultiplication is written to result.3.The most significant word of the multiplicand is returned to AX andmultiplied by the least significant word of the multiplier. The leastsignificant word of this product is MOVed to the second word of result,the most significant word of the product is ADDed to the third word ofresult, and any carry is propagated to the most significant word byadding-with-carry a zero.4.The least significant word of the multiplicand is moved to AX andmultiplied by the most significant word of the multiplier.
The product62INTEGERSis added to the second word of result and added-with-carry to the thirdword of result, with any carry propagated into the most significant word.5. Finally, the least significant word of the multiplicand is moved intoAX and multiplied by the least significant word of the multiplier. Theleast significant word of this product is moved to the least significantword of result, the most significant word of the product is added to thesecond word of result, and any carry is propagated into the third andthen the most significant word of result.mu132: Listing;*****;mu132 - Multiplies two unsigned fixed-point values.
The;arguments and a pointer to the result are passed on the stack.mu132 proc uses dx di,smultiplicand:dword, smultiplier:dword, result:wordmovdi, word ptr result;small model pointer is nearax, word ptr smultiplicand[2];multiply multiplicand highmovmulword ptr smultiplier[2];word by multiplier high wordmovword ptr [di][4], axword ptr [di] [6], dxmovmovax, word ptr smultiplicand[2];multiply multiplicand highword ptr smultiplier[0]mul;word by multiplier low wordmovword ptr [di][2], axaddword ptr [di][4], dxadc;add any remnant carryword ptr [di][6], 0movax, word ptr smultiplicand[0];multiply multiplicand lowmulword ptr smultiplier[2];word by multiplier high wordaddword ptr [di][2], axword ptr [di][4], dxadcadcword ptr [di][6], 0;add any remnant carrymovax, word ptr smultiplicand[0];multiply multiplicand lowmulword ptr smultiplier[0];word by multiplier low wordmovword ptr [di] [0], axaddword ptr [di][2], dxadc;add any remnant carryword ptr [di][4], 0adcword ptr [di][6], 0retmu132 endpFor additional examples of this technique, see the FXMATH.ASM module.63NUMERICAL METHODSBinary DivisionError CheckingDivision requires more error checking than any of the other basic arithmeticoperations.
Depending on whether you’re using the hardware division instructionsor a brew of your own, you’ll need to know if a mistake has been made. The primarydifference between using a hardware instruction and using your own solution is thatan error made during the execution of a hardware instruction can blow up a programquite unaesthetically by invoking an exception or trap.Three basic errors can occur during division: overflow, division of zero, and anattempt to divide by zero.You can avoid overflow by checking the dividend and divisor to see whethertheir quotient will fit in the space provided, or by always breaking the dividend intocoefficients of the same size data type as the dividend.
An overflow (or underflow)can happen quite easily when the dividend is very large and the divisor is small. Ifyou’re using a software algorithm to perform the divide, you may find that you losepart of your data. If you’re using a hardware instruction, a hardware exception willbe invoked. On the 8086, the largest dividend allowed is 32-bits, the largest divisoris 16 with a 16-bit quotient. In this case, dividing 12345678H by 01DEH results ina quotient of 9bfe9H and a hardware exception (the result too large for the 16 bits the8086 allows).If you think such an overflow could occur in your code, it might be wise toinclude a test before the divide to ascertain how much storage the quotient willrequire and, therefore, which form of the divide to use.
The largest dividend a divisorcan divide and store is equal to the size of the data type multiplied by the divisor. Bycomparing the number obtained from such a multiplication with an arbitrarydividend, you can determine whether the result of that operation will fit in the datatype specified.With binary numbers, this is easy. The largest quotient an 8086 can producewithout overflow is 16 bits, which amounts to a left shift of the divisor of 16 bits ora multiplication of 10000H. If the value obtained is greater than or equal to thedividend, the result of the division will fit; if not, it won’t. In other words, if you’redividing a 32-bit quantity by a 16-bit quantity, simply comparing the divisor with the64INTEGERSupper word of the dividend (dividend/l0000H) will tell you whether the quotient willfit in 16 bits or not.
If the upper 16 bits of the dividend are greater than your divisor,the operation will overflow. This test can be extended to 16-bit dividends and eightbit divisors.Suppose we wish to divide 12345678H by 1deH. Since this divisor is larger thanone byte, we must use 16-bit division. The 1deH need not be multiplied by 10000Hor shifted; we only need to compare the upper word of the dividend and the divisorto see which is greater.movmovmovcmpjadx, dvdnd[2]ax, dvdnd[0]cx, dvsrdx, cxnot_big_enuf;1234H;5678H;1DEH;compare;the quotient won't fitdiv32:Depending on the circumstances, the best method may be to begin anymultiprecision divide by clearing DX and loading AX with the most significantword.
An overflow is impossible with this technique as long as you have a divisor,since 1H multiplied by 10000H is greater than any one-word dividend.The other two errors, division of zero and an attempt to divide by zero, are easilydetected in the beginning of the routine. If either condition is true, the program canbranch to a predetermined error routine and return.Finally, two conditions are worth checking if your arithmetic gets very big:Are the divisor and dividend equal?Is the divisor greater than the dividend?If the two are equal, return a one; if the divisor is greater, return a zero with thedividend in the remainder.Examples of this kind of checking can be found in the FXMATH.ASM moduleand later in this chapter in the section Hardware Division.llSoftware DivisionThe classic multiplication algorithm is based on the idea of multiplication asiterative addition, so you shouldn’t be surprised to learn that the method for division65NUMERICAL METHODSFigure 2-1.
Division using shift and subtract.66INTEGERSis based on shift and subtract. This procedure isn’t fast, but it’s friendly.The procedure involves shifting the dividend left into a variable, the remainder,and comparing this remainder with the divisor. If the remainder is equal to or largerthan the divisor, the divisor is subtracted from the remainder and a one is left-shiftedinto a variable, called the quotient. This continues until the requisite number of bitshave been shifted.
No early out is available here; the number of shifts necessarydepends on the size of the operands.The following variables will be used for the division algorithms: dvsr, dvdnd,qtnt, cntr, and rmndr. Note that during execution of the algorithm the quotient,dividend, and remainder share memory locations (Figure 2- 1). Shifting the dividendinto the remainder leaves the lower bits free to become the quotient. At the end of theroutine the dividend is gone, leaving only the quotient and the remainder. For theprogrammer, this means fewer shifts, some increase in speed, and a slightly smallerroutine.