Decimal Arithmetic Specification, version 1.08
Copyright (c) IBM Corporation, 2003. All rights reserved.
8 Jan 2003

Arithmetic operations

This section describes the arithmetic operations on numbers, including subnormal numbers, negative zeros, and special values (see also IEEE 854 §6).

Arithmetic operation notation

In this section, a simplified notation is used to illustrate arithmetic operations: a number is shown as the string that would result from using the to-scientific-string operation. Single quotes are used to indicate that a number converted from an abstract representation is implied.

Also, operations are indicated as functions (taking either one or two operands), and the sequence ==> means ‘results in’. Hence:

  add('12', '7.00') ==> '19.00'
means that the result of the add operation with the operands [0,12,0] and [0,700,-2] is [0,1900,-2].

Finally, in this example and in the examples below, the context is assumed to have precision set to 9, rounding set to round-half-up, and all trap-enablers set to 0.
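For readers who want to experiment, the notation can be approximated with Python's decimal module, which implements a later revision of this arithmetic. This is only a sketch and not part of the specification; the helper name triple is hypothetical, and str() is assumed to stand in for the to-scientific-string operation.

```python
from decimal import Context, Decimal, ROUND_HALF_UP

# Context assumed by the examples: precision 9, round-half-up, no traps enabled.
ctx = Context(prec=9, rounding=ROUND_HALF_UP, traps=[])

def triple(d):
    """Return a finite Decimal as [sign, coefficient, exponent] (hypothetical helper)."""
    sign, digits, exponent = d.as_tuple()
    return [sign, int("".join(map(str, digits))), exponent]

result = ctx.add(Decimal("12"), Decimal("7.00"))
print(str(result), triple(result))
```

Here triple(Decimal("12")) and triple(Decimal("7.00")) give the operand forms [0,12,0] and [0,700,-2] used in the text.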

Arithmetic operation rules

The following general rules apply to all arithmetic operations.

Examples involving special values:

  add('Infinity', '1')        ==>  'Infinity'
  add('NaN', '1')             ==>  'NaN'
  subtract('1', 'Infinity')   ==>  '-Infinity'
  multiply('-1', 'Infinity')  ==>  '-Infinity'
  subtract('-0', '0')         ==>  '-0'
  multiply('-1', '0')         ==>  '-0'
  divide('-1', 'Infinity')    ==>  '-0'
  divide('1', '0')            ==>  'Infinity'
  divide('1', '-0')           ==>  '-Infinity'
  divide('-1', '0')           ==>  '-Infinity'
Notes:
  1. Operands may have more than precision digits and are not rounded before use.
  2. Quiet NaNs are permitted to propagate diagnostic information pertaining to the origin of the NaN (see IEEE 854 §6.2). Any such diagnostic information, and the means by which it is propagated, is outside the scope of this specification.
  3. The rules above imply that the compare operation can return a quiet NaN as a result, which indicates an ‘unordered’ comparison (see IEEE 854 §5.7).
  4. An implementation may use the compare operation ‘under the covers’ to implement a closed set of comparison operations (greater than, equal, etc.) if desired. In this case, the additional constraints detailed in IEEE 854 §5.7 will apply; that is, a comparison (such as ‘greater than’) which does not explicitly allow for an ‘unordered’ result yet would require an unordered result will give rise to an Invalid operation condition.
  5. If a result is rounded, remains finite, and is not subnormal, its coefficient will have exactly precision digits (except after the rescale, round-to-integer, or square-root operations, as described below). That is, only unrounded or subnormal coefficients can have fewer than precision digits.
  6. Trailing zeros are not removed after operations. That is, results are unnormalized.
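Several of the special-value examples above can be reproduced with Python's decimal module. This is a hedged sketch: that module follows a later revision of the arithmetic, so only the cases checked here are claimed to agree. With all traps disabled, a condition such as Division by zero sets a flag and the operation still returns a result.

```python
from decimal import Context, Decimal, DivisionByZero, ROUND_HALF_UP

ctx = Context(prec=9, rounding=ROUND_HALF_UP, traps=[])
D = Decimal

print(ctx.add(D("Infinity"), D("1")))        # Infinity
print(ctx.add(D("NaN"), D("1")))             # NaN
print(ctx.subtract(D("1"), D("Infinity")))   # -Infinity
print(ctx.subtract(D("-0"), D("0")))         # -0
print(ctx.multiply(D("-1"), D("0")))         # -0

ctx.clear_flags()
quotient = ctx.divide(D("1"), D("-0"))       # -Infinity, flag set, no exception
print(quotient, bool(ctx.flags[DivisionByZero]))
```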


abs

abs takes one operand. If the operand is negative, the result is the same as using the minus operation on the operand. Otherwise, the result is the same as using the plus operation on the operand.

Examples:

  abs('2.1')    ==>  '2.1'
  abs('-100')   ==>  '100'
  abs('101.5')  ==>  '101.5'
  abs('-101.5') ==>  '101.5'


add and subtract

add and subtract both take two operands. If either operand is a special value then the general rules apply.

Otherwise, the operands are added (after inverting the sign used for the second operand if the operation is a subtraction), as follows:

The result is then rounded to precision digits if necessary, counting from the most significant digit of the result.

Examples:

  add('12', '7.00')        ==>  '19.00'
  add('1E+2', '1E+4')      ==>  '1.01E+4'
  subtract('1.3', '1.07')  ==>  '0.23'
  subtract('1.3', '1.30')  ==>  '0.00'
  subtract('1.3', '2.07')  ==>  '-0.77'
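The examples above are consistent with aligning both operands to the smaller of the two exponents before adding the coefficients. A minimal sketch on [sign, coefficient, exponent] triples, under that assumption, is shown here. The function names are hypothetical, and rounding to precision and the sign rules for zero results are deliberately omitted.

```python
def add_exact(a, b):
    """Exactly add two (sign, coefficient, exponent) triples; no rounding."""
    (sa, ca, ea), (sb, cb, eb) = a, b
    exp = min(ea, eb)                       # align to the smaller exponent
    va = (-ca if sa else ca) * 10 ** (ea - exp)
    vb = (-cb if sb else cb) * 10 ** (eb - exp)
    total = va + vb
    # Simplification: a zero total is always given sign 0 here.
    return (1 if total < 0 else 0, abs(total), exp)

def subtract_exact(a, b):
    """Subtract by inverting the sign of the second operand, then adding."""
    sb, cb, eb = b
    return add_exact(a, (1 - sb, cb, eb))

print(add_exact((0, 12, 0), (0, 700, -2)))        # add('12', '7.00')
print(subtract_exact((0, 13, -1), (0, 207, -2)))  # subtract('1.3', '2.07')
```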


compare

compare takes two operands and compares their values numerically. If either operand is a special value then the general rules apply.

Otherwise, the operands are compared as follows.

If the signs of the operands differ, a value representing each operand ('-1' if the operand is less than zero, '0' if the operand is zero or negative zero, or '1' if the operand is greater than zero) is used in place of that operand for the comparison.[2] 

The comparison is then effected by subtracting the second operand from the first and then returning a value according to the result of the subtraction: '-1' if the result is less than zero, '0' if the result is zero or negative zero, or '1' if the result is greater than zero.

An implementation may use this operation ‘under the covers’ to implement a closed set of comparison operations (greater than, equal, etc.) if desired. It need not, in this case, expose the compare operation itself.

Examples:

  compare('2.1', '3')     ==>  '-1'
  compare('2.1', '2.1')   ==>  '0'
  compare('2.1', '2.10')  ==>  '0'
  compare('3', '2.1')     ==>  '1'
  compare('2.1', '-3')    ==>  '1'
  compare('-3', '2.1')    ==>  '-1'
Note that the result of a compare is always exact and unrounded.
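As a sketch of how a closed comparison might be layered on compare, the following uses Python's decimal module (which implements a later revision of this arithmetic). The function greater_than is hypothetical, and raising ValueError is only a stand-in for the Invalid operation condition that an unordered result would give rise to.

```python
from decimal import Context, Decimal, ROUND_HALF_UP

ctx = Context(prec=9, rounding=ROUND_HALF_UP, traps=[])

def greater_than(a, b):
    """Hypothetical closed comparison built on the compare operation."""
    c = ctx.compare(a, b)
    if c.is_nan():
        # Stand-in for the Invalid operation condition on an unordered result.
        raise ValueError("unordered comparison")
    return c == 1

print(ctx.compare(Decimal("2.1"), Decimal("2.10")))   # numerically equal
print(greater_than(Decimal("3"), Decimal("2.1")))
```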


divide

divide takes two operands. If either operand is a special value then the general rules apply.

Otherwise, if the divisor is zero then either the Division undefined condition is raised (if the dividend is zero) and the result is NaN, or the Division by zero condition is raised and the result is an Infinity with a sign which is the exclusive or of the signs of the operands.

Otherwise, a ‘long division’ is effected, with the division being complete when either precision digits have been accumulated or the remainder from a subtraction in the division is zero, as follows:

The result is then rounded to precision digits, if necessary, according to the rounding algorithm and taking into account the remainder from the division.

Examples:

  divide('1', '3'  )      ==>  '0.333333333'
  divide('2', '3'  )      ==>  '0.666666667'
  divide('5', '2'  )      ==>  '2.5'
  divide('1', '10' )      ==>  '0.1'
  divide('12', '12')      ==>  '1'
  divide('8.00', '2')     ==>  '4.00'
  divide('2.400', '2.0')  ==>  '1.20'
  divide('1000', '100')   ==>  '10'
  divide('1000', '1')     ==>  '1000'
  divide('2.40E+6', '2')  ==>  '1.20E+6'
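The listed results can be checked against Python's decimal module, again with the caveat that it implements a later revision of this arithmetic, so agreement is claimed only for the cases shown. Python reports the Division undefined condition through its InvalidOperation flag.

```python
from decimal import (Context, Decimal, DivisionByZero, InvalidOperation,
                     ROUND_HALF_UP)

ctx = Context(prec=9, rounding=ROUND_HALF_UP, traps=[])
D = Decimal

for a, b in [("1", "3"), ("2", "3"), ("8.00", "2"),
             ("2.400", "2.0"), ("2.40E+6", "2")]:
    print(f"divide('{a}', '{b}') ==> '{ctx.divide(D(a), D(b))}'")

# Zero divisors: with traps disabled a result is returned and a flag is set.
inf = ctx.divide(D("-1"), D("0"))   # Division by zero: signed Infinity
nan = ctx.divide(D("0"), D("0"))    # Division undefined: NaN
print(inf, nan)
```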


divide-integer

divide-integer takes two operands; it divides two numbers and returns the integer part of the result. If either operand is a special value then the general rules apply.

Otherwise, the result returned is defined to be that which would result from repeatedly subtracting the divisor from the dividend while the dividend is larger than the divisor. During this subtraction, the absolute values of both the dividend and the divisor are used: the sign of the final result is the same as that which would result if normal division were used.

In other words, if the operands x and y were given to the divide-integer and remainder operations, resulting in i and r respectively, then the identity

  x = i×y + r
holds.

The exponent of the result must be 0. Hence, if the result cannot be expressed exactly within precision digits, the operation is in error and will fail – that is, the result cannot have more digits than the value of precision in effect for the operation, and will not be rounded. For example, divide-integer('10000000000', '3') requires ten digits to express the result exactly ('3333333333') and would therefore fail if precision were in the range 1 through 9.

Notes:

  1. The divide-integer operation may not give the same result as truncating normal division (which could be affected by rounding).
  2. The divide-integer and remainder operations are defined so that they may be calculated as a by-product of the standard division operation (described above). The division process is ended as soon as the integer result is available; the residue of the dividend is the remainder.
  3. The divide and divide-integer operations on the same operands give results of the same numerical value if no error occurs and there is no residue from the divide-integer operation.

Examples:

  divide-integer('2', '3')    ==>  '0'
  divide-integer('10', '3')   ==>  '3'
  divide-integer('1', '0.3')  ==>  '3'
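The identity x = i×y + r and the failure case described above can be illustrated with Python's decimal module (a later revision of this arithmetic, so this is a sketch rather than a reference). With traps disabled, Python returns NaN and sets its InvalidOperation flag when the integer result would need more than precision digits.

```python
from decimal import Context, Decimal, InvalidOperation, ROUND_HALF_UP

ctx = Context(prec=9, rounding=ROUND_HALF_UP, traps=[])
D = Decimal

x, y = D("1"), D("0.3")
i = ctx.divide_int(x, y)      # integer part of the quotient
r = ctx.remainder(x, y)       # residue of the dividend
print(i, r, ctx.add(ctx.multiply(i, y), r) == x)

# Ten digits are needed for the exact result, so precision 9 is not enough.
failed = ctx.divide_int(D("10000000000"), D("3"))
wider = Context(prec=10, rounding=ROUND_HALF_UP, traps=[])
print(failed, wider.divide_int(D("10000000000"), D("3")))
```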


max

max takes two operands, compares their values numerically, and returns the maximum. If either operand is a NaN then the general rules apply.

Otherwise, the operands are compared as though by the compare operation. If they are numerically equal then the left-hand operand is chosen as the result. Otherwise the maximum (closer to positive infinity) of the two operands is chosen as the result. In either case, the result is the same as using the plus operation on the chosen operand.

Examples:

  max('3', '2')    ==>  '3'
  max('-10', '3')  ==>  '3'
  max('1.0', '1')  ==>  '1.0'


min

min takes two operands, compares their values numerically, and returns the minimum. If either operand is a NaN then the general rules apply.

Otherwise, the operands are compared as though by the compare operation. If they are numerically equal then the left-hand operand is chosen as the result. Otherwise the minimum (closer to negative infinity) of the two operands is chosen as the result. In either case, the result is the same as using the plus operation on the chosen operand.

Examples:

  min('3', '2')    ==>  '2'
  min('-10', '3')  ==>  '-10'
  min('1.0', '1')  ==>  '1.0'
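The max and min rules (compare, prefer the left-hand operand on equality, then apply plus) can be sketched as follows. The names spec_max and spec_min are hypothetical, and the operands are assumed not to be NaNs. Python's own Context.max and Context.min follow a later revision of the arithmetic and may break ties between numerically equal operands differently, which is why they are not used directly here.

```python
from decimal import Context, Decimal, ROUND_HALF_UP

ctx = Context(prec=9, rounding=ROUND_HALF_UP, traps=[])

def spec_max(a, b):
    """max as described above; the left-hand operand wins when equal."""
    chosen = b if ctx.compare(a, b) == -1 else a
    return ctx.plus(chosen)

def spec_min(a, b):
    """min as described above; the left-hand operand wins when equal."""
    chosen = b if ctx.compare(a, b) == 1 else a
    return ctx.plus(chosen)

print(spec_max(Decimal("1.0"), Decimal("1")), spec_min(Decimal("-10"), Decimal("3")))
```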


minus and plus

minus and plus both take one operand, and correspond to the prefix minus and plus operators in programming languages.

The operations are evaluated using the same rules as add and subtract; the operations plus(a) and minus(b) (where a and b refer to any numbers) are calculated as the operations add('0', a) and subtract('0', b) respectively, where the '0' has the same exponent as the operand.

Examples:

  plus('1.3')    ==>  '1.3'
  plus('-1.3')   ==>  '-1.3'
  minus('1.3')   ==>  '-1.3'
  minus('-1.3')  ==>  '1.3'


multiply

multiply takes two operands. If either operand is a special value then the general rules apply.

Otherwise, the operands are multiplied together (‘long multiplication’), resulting in a number which may be as long as the sum of the lengths of the two operands, as follows:

The result is then rounded to precision digits if necessary, counting from the most significant digit of the result.

Examples:

  multiply('1.20', '3')         ==>  '3.60'
  multiply('7', '3')            ==>  '21'
  multiply('0.9', '0.8')        ==>  '0.72'
  multiply('0.9', '-0')         ==>  '-0.0'
  multiply('654321', '654321')  ==>  '4.28135971E+11'
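A minimal sketch of multiplication on [sign, coefficient, exponent] triples follows, assuming that the signs are combined by exclusive or, the coefficients are multiplied, and the exponents are added (which is what the examples show). The rounding helper follows the truncate-and-inspect procedure for round-half-up described in the footnotes. Both function names are hypothetical, and special values and exponent limits are ignored.

```python
def round_half_up(sign, coeff, exp, precision=9):
    """Round a triple to at most `precision` digits using round-half-up."""
    digits = str(coeff)
    excess = len(digits) - precision
    if excess <= 0:
        return (sign, coeff, exp)           # already short enough
    kept = int(digits[:precision])          # truncate to precision digits
    exp += excess                           # count of truncated digits
    if int(digits[precision]) >= 5:         # first truncated digit is 5..9
        kept += 1
        if len(str(kept)) > precision:      # carry produced an extra digit
            kept //= 10
            exp += 1
    return (sign, kept, exp)

def multiply(a, b, precision=9):
    """Multiply two triples, then round the exact product."""
    (sa, ca, ea), (sb, cb, eb) = a, b
    return round_half_up(sa ^ sb, ca * cb, ea + eb, precision)

print(multiply((0, 654321, 0), (0, 654321, 0)))   # [0,428135971,3]
```

The triple [0,428135971,3] is the number shown as '4.28135971E+11' in the examples.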


normalize

normalize takes one operand. It has the same semantics as the plus operation, except that the final result is reduced to its simplest form, with all trailing zeros removed.

That is, while the coefficient is non-zero and a multiple of ten the coefficient is divided by ten and the exponent is incremented by 1. Alternatively, if the coefficient is zero the exponent is set to 0. In all cases the sign is unchanged.

Examples:

  normalize('2.1')    ==>  '2.1'
  normalize('-2.0')   ==>  '-2'
  normalize('1.200')  ==>  '1.2'
  normalize('-120')   ==>  '-1.2E+2'
  normalize('120.00') ==>  '1.2E+2'
  normalize('0.00')   ==>  '0'
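The trailing-zero removal step can be sketched directly on a [sign, coefficient, exponent] triple. The function name is hypothetical, and the initial plus operation (including any rounding it performs) is omitted.

```python
def strip_trailing_zeros(sign, coeff, exp):
    """Reduce a finite triple to its simplest form; the sign is unchanged."""
    if coeff == 0:
        return (sign, 0, 0)          # a zero coefficient gets exponent 0
    while coeff % 10 == 0:           # non-zero and a multiple of ten
        coeff //= 10
        exp += 1
    return (sign, coeff, exp)

print(strip_trailing_zeros(0, 12000, -2))   # '120.00' becomes [0,12,1]
```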


remainder

remainder takes two operands; it returns the remainder from integer division. If either operand is a special value then the general rules apply.

Otherwise, the result is the residue of the dividend after the operation of calculating integer division as described for divide-integer, rounded to precision digits if necessary. The sign of the result, if non-zero, is the same as that of the original dividend.

This operation will fail under the same conditions as integer division (that is, if integer division on the same two operands would fail, the remainder cannot be calculated).

Examples:

  remainder('2.1', '3')    ==>  '2.1'
  remainder('10', '3')     ==>  '1'
  remainder('-10', '3')    ==>  '-1'
  remainder('10.2', '1')   ==>  '0.2'
  remainder('10', '0.3')   ==>  '0.1'
  remainder('3.6', '1.3')  ==>  '1.0'
Notes:
  1. The divide-integer and remainder operations are defined so that they may be calculated as a by-product of the standard division operation (described above). The division process is ended as soon as the integer result is available; the residue of the dividend is the remainder.
  2. The remainder operation differs from the remainder operation defined in IEEE 854 (the remainder-near operator), in that it gives the same results for numbers whose values are equal to integers as would the usual remainder operator on integers.
    For example, the result of the operation remainder('10', '6') as defined here is '4', and remainder('10.0', '6') would give '4.0' (as would remainder('10', '6.0') or remainder('10.0', '6.0')). The IEEE 854 remainder operation would, however, give the result '-2' because its integer division step chooses the closest integer, not the one nearer zero.


remainder-near

remainder-near takes two operands. If either operand is a special value then the general rules apply.

Otherwise, if the operands are given by x and y, then the result is defined to be x − y × n, where n is the integer nearest the exact value of x ÷ y (if two integers are equally near then the even one is chosen). If the result is equal to 0 then its sign will be the sign of x. (See IEEE 854 §5.1.)

This operation will fail under the same conditions as integer division (that is, if integer division on the same two operands would fail, the remainder cannot be calculated).[4] 

Examples:

  remainder-near('2.1', '3')    ==>  '-0.9'
  remainder-near('10', '6')     ==>  '-2'
  remainder-near('10', '3')     ==>  '1'
  remainder-near('-10', '3')    ==>  '-1'
  remainder-near('10.2', '1')   ==>  '0.2'
  remainder-near('10', '0.3')   ==>  '0.1'
  remainder-near('3.6', '1.3')  ==>  '-0.3'
Notes:
  1. The remainder-near operation differs from the remainder operation in that it does not give the same results for numbers whose values are equal to integers as would the usual remainder operator on integers. For example, the operation remainder('10', '6') gives the result '4', and remainder('10.0', '6') gives '4.0' (as would the operations remainder('10', '6.0') or remainder('10.0', '6.0')). However, remainder-near('10', '6') gives the result '-2' because its integer division step chooses the closest integer, not the one nearer zero.
  2. The result of this operation is always exact.
  3. This operation is sometimes known as ‘IEEE remainder’.
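The definition x − y × n can be sketched with exact rational arithmetic to pick n. This is an illustration only: the function name is hypothetical, the sign rule for a zero result and the failure conditions are not handled, and Python's Context.remainder_near (from a later revision of this arithmetic) is shown alongside for comparison.

```python
from decimal import Context, Decimal, ROUND_HALF_UP
from fractions import Fraction

ctx = Context(prec=9, rounding=ROUND_HALF_UP, traps=[])

def remainder_near_sketch(x, y):
    """x - y*n, where n is the integer nearest x/y (ties go to even)."""
    n = round(Fraction(x) / Fraction(y))   # exact quotient, round-half-even
    return ctx.subtract(x, ctx.multiply(y, Decimal(n)))

for a, b in [("2.1", "3"), ("10", "6"), ("3.6", "1.3")]:
    x, y = Decimal(a), Decimal(b)
    print(a, b, remainder_near_sketch(x, y), ctx.remainder_near(x, y))
```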


rescale

rescale takes two operands. If either operand is a special value then the general rules apply (and infinities are unchanged), except that if the right-hand operand is infinite, an Invalid operation condition is raised, and the result is [0,qNaN].

Otherwise, it returns the number which is equal in value (except for any rounding) and sign to the first (left-hand) operand and which has an exponent set to the value of the second (right-hand) operand.

The right-hand operand must be a whole number whose integer part (after any exponent has been applied) is no more than Emax and no less than Etiny, and whose fractional part (if any) is all zeros.

The coefficient of the result is derived from that of the left-hand operand. It may be rounded using the current rounding setting (if the exponent is being increased), multiplied by a positive power of ten (if the exponent is being decreased), or is unchanged (if the exponent is already equal to the right-hand operand).

Unlike other operations, if the length of the coefficient after the rescaling would be greater than precision then an Overflow condition results. This guarantees that, unless there is an error condition, the exponent of the result of a rescale is always the value specified by the right-hand operand.

Examples:

  rescale('2.17', '-3')         ==>  '2.170'
  rescale('2.17', '-2')         ==>  '2.17'
  rescale('2.17', '-1')         ==>  '2.2'
  rescale('2.17', '0')          ==>  '2'
  rescale('2.17', '1')          ==>  '0E+1'
  rescale('2', 'Infinity')      ==>  'NaN'
  rescale('-0.1', '0')          ==>  '-0'
  rescale('-0', '5')            ==>  '-0E+5'
  rescale('+35236450.6', '-2')  ==>  'Infinity'
  rescale('-35236450.6', '-2')  ==>  '-Infinity'
  rescale('217',  '-1')         ==>  '217.0'
  rescale('217',  '0')          ==>  '217'
  rescale('217',  '1')          ==>  '2.2E+2'
  rescale('217',  '2')          ==>  '2E+2'
Note that in the penultimate example the number is [0,22,1], leading to the string in scientific notation as shown.
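Python's decimal module has no rescale operation, but its quantize operation (from a later revision of this arithmetic) sets the exponent in a similar way. The wrapper below is a hypothetical sketch that takes the target exponent as a Python integer. It differs from the description above in at least one respect: when the coefficient would need more than precision digits, Python signals Invalid operation and returns NaN rather than giving the Overflow results shown in the examples.

```python
from decimal import Context, Decimal, ROUND_HALF_UP

ctx = Context(prec=9, rounding=ROUND_HALF_UP, traps=[])

def rescale(x, exponent):
    """Hypothetical rescale: set the exponent of x to the integer `exponent`."""
    target = Decimal((0, (1,), exponent))   # only the exponent of this matters
    return ctx.quantize(x, target)

for e in (-3, -2, -1, 0, 1):
    print(f"rescale('2.17', '{e}') ==> '{rescale(Decimal('2.17'), e)}'")
```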


round-to-integer

round-to-integer takes one operand. Its result is the same as using the rescale operation using the given operand as the left-hand-operand and 0 as the right-hand-operand.[5] 

Examples:

  round-to-integer('2.1')    ==>  '2'
  round-to-integer('100')    ==>  '100'
  round-to-integer('100.0')  ==>  '100'
  round-to-integer('101.5')  ==>  '102'
  round-to-integer('-101.5') ==>  '-102'
  round-to-integer('10E+5')  ==>  '1000000'

Note: IEEE 854 refers to §4 for this operation, but then implies that round-half-even rounding should always be used (whereas §4 specifically allows directed rounding). It is assumed that it was not intended to exclude directed rounding.


square-root

square-root takes one operand. If the operand is a special value then the general rules apply.

Otherwise, the operand must be greater than or equal to 0. If the value of the operand is –0 then the result is [1,0,0].

Otherwise, the result is the exact square root of the operand, rounded according to the setting of precision using the round-half-even algorithm, and then normalized (as though by the normalize operation).

Examples:

  square-root('0')     ==> '0'
  square-root('-0')    ==> '-0'
  square-root('0.39')  ==> '0.6244998'
  square-root('1.00')  ==> '1'
  square-root('7')     ==> '2.64575131'
  square-root('10')    ==> '3.16227766'
Notes:
  1. The rounding setting in the context is not used; this means that the algorithm described in Properly Rounded Variable Precision Square Root by T. E. Hull and A. Abrham (ACM Transactions on Mathematical Software, Vol 11 #3, pp229-237, ACM, September 1985) may be used for this operation.
  2. A subnormal result is only possible if the working precision is greater than Emax+1.
  3. The result of this operation is normalized because an unnormalized result with an integer coefficient cannot always be defined (e.g., square-root('4.0')); this normalization may cause the Rounded condition.
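The examples can be reproduced with Python's decimal module by applying normalize to the result of its sqrt operation. This is a sketch: the module follows a later revision of this arithmetic in which the square root is not normalized, so the explicit normalize step is an assumption made here to match the definition above, and the function name is hypothetical.

```python
from decimal import Context, Decimal, ROUND_HALF_UP

ctx = Context(prec=9, rounding=ROUND_HALF_UP, traps=[])

def square_root(x):
    """Square root followed by removal of trailing zeros (hypothetical)."""
    return ctx.normalize(ctx.sqrt(x))

for s in ("0", "-0", "0.39", "1.00", "7", "10"):
    print(f"square-root('{s}') ==> '{square_root(Decimal(s))}'")
```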


power

The following operation is under review. It will probably either be removed or be changed to simply state that the result must be within one ulp. The definition in this section is included as it defines the results for the power testcase group and for the reference implementation.

power takes two operands, and raises a number (the left-hand operand) to a whole number power (the right-hand operand). If either operand is a special value then the general rules apply, except as stated below.

Otherwise, the right-hand operand must be a whole number whose integer part (after any exponent has been applied) has no more than 9 digits and whose fractional part (if any) is all zeros before any rounding. The operand may be positive, negative, or zero; if negative, the absolute value of the power is used, and the left-hand operand is inverted (divided into 1) before use.

For calculating the power, the number (left-hand operand) is in theory multiplied by itself for the number of times expressed by the power.

In practice (see the note below for the reasons), the power is calculated by the process of left-to-right binary reduction. For power(x, n): ‘n’ is converted to binary, and a temporary accumulator is set to 1. If ‘n’ has the value 0 then the initial calculation is complete. Otherwise each bit (starting at the first non-zero bit) is inspected from left to right. If the current bit is 1 then the accumulator is multiplied by ‘x’. If all bits have now been inspected then the initial calculation is complete, otherwise the accumulator is squared by multiplication and the next bit is inspected.

The multiplications and initial division are done under the normal arithmetic operation and rounding rules, using the context supplied for the operation, except that the multiplications (and the division, if needed) are carried out using an increased precision of precision+elength+1 digits. Here, elength is the length in decimal digits of the integer part (coefficient) of the whole number ‘n’ (i.e., excluding any sign, decimal part, decimal point, or insignificant leading zeros).[6] 

If the increased precision needed for the intermediate calculations exceeds the capabilities of the implementation then an Invalid operation condition is raised.

If, when raising to a negative power, an underflow occurs during the division into 1, the operation is not halted at that point but continues.[7] 

In addition:

Examples:

  power('2', '3')           ==>  '8'
  power('2', '-3')          ==>  '0.125'
  power('1.7', '8')         ==>  '69.7575744'
  power('Infinity', '-2')   ==>  '0'
  power('Infinity', '-1')   ==>  '0'
  power('Infinity', '0')    ==>  '1'
  power('Infinity', '1')    ==>  'Infinity'
  power('Infinity', '2')    ==>  'Infinity'
  power('-Infinity', '-2')  ==>  '0'
  power('-Infinity', '-1')  ==>  '-0'
  power('-Infinity', '0')   ==>  '1'
  power('-Infinity', '1')   ==>  '-Infinity'
  power('-Infinity', '2')   ==>  'Infinity'
  power('0', '0')           ==>  'NaN'
Notes:
  1. The result of the power operator is negative if (and only if) the left-hand operand is negative and the right-hand operand is odd.
  2. A particular algorithm for calculating powers is described, since it is efficient (though not optimal) and considerably reduces the number of actual multiplications performed. It therefore gives better performance than the simpler definition of repeated multiplication. Since results can occasionally differ from those of repeated multiplication, the algorithm must be defined here so that different implementations will give identical results for the same operation on the same values. Other algorithms for this (and other) operations may always be used, so long as they give identical results to those described here.
  3. Mathematical and transcendental functions are outside the scope of this specification. However, implementations are encouraged to provide a power operator which will accept a non-integral right-hand operand when the left-hand operand is non-negative. In this case it is not required that the more general function return identical results to the operation described above.
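The left-to-right binary reduction described above can be sketched with Python's decimal module supplying the multiplications and the initial division. This is a hedged illustration, not the reference implementation: the function name is hypothetical, the power is passed as a Python integer, special values and the 0 to the power 0 case are not handled, and rounding the accumulator to precision at the end with plus is an assumption based on the footnote that mentions a final rounding.

```python
from decimal import Context, Decimal, ROUND_HALF_UP

def power_sketch(x, n, precision=9):
    """Raise Decimal x to the integer power n by binary reduction."""
    elength = len(str(abs(n)))                      # digits in the power
    work = Context(prec=precision + elength + 1,    # increased precision
                   rounding=ROUND_HALF_UP, traps=[])
    if n < 0:
        x = work.divide(Decimal(1), x)              # invert, then use |n|
        n = -n
    acc = Decimal(1)
    if n != 0:
        bits = bin(n)[2:]                           # starts at first 1 bit
        for position, bit in enumerate(bits):
            if bit == "1":
                acc = work.multiply(acc, x)
            if position < len(bits) - 1:            # more bits to inspect
                acc = work.multiply(acc, acc)       # square the accumulator
    final = Context(prec=precision, rounding=ROUND_HALF_UP, traps=[])
    return final.plus(acc)                          # assumed final rounding

print(power_sketch(Decimal("1.7"), 8))
```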

Footnotes:
[1] In practice, it is only necessary to work with intermediate results of up to twice the current precision. Some rounding settings may require some inspection of possible remainders or additional digits (for example, to determine whether a result is exactly 0.5 in the next position), though their actual values would not be required.
For round-half-up, rounding can be effected by truncating the result to precision (and adding the count of truncated digits to the exponent). The first truncated digit is then inspected, and if it has the value 5 through 9 the result is incremented by 1. This could cause the result to again exceed precision digits, in which case it is divided by 10 and the exponent is incremented by 1.
[2] This rule removes the possibility of an arithmetic overflow during a numeric comparison.
[3] In practice, only two bits need to be noted, indicating whether the remainder was 0, or was exactly half of the final coefficient of the divisor, or was in one of the two ranges above or below the half-way point.
[4] This is a deviation from IEEE 854, necessary to assure realistic execution times when the operands have a wide range of exponents.
[5] This operation is defined in order to provide the Round Floating-Point Number to Integral Value operator described in IEEE 854 §5.5.
[6] The precision specified for the intermediate calculations ensures that the final result will differ by at most 1, in the least significant position, from the ‘true’ result (given that the operands are expressed precisely under the current setting of digits). Half of this maximum error comes from the intermediate calculation, and half from the final rounding.
[7] It can only be halted early if the result becomes zero.
