Incorrect optimization in 1963

Floating-point users today are accustomed (or resigned, sometimes) to compilers that make invalid optimizations by assuming all arithmetic is mathematically correct instead of rounding. The situation used to be worse. A 1963 IBM Fortran II manual warns that it did this for integers too:

FORTRAN assumes that mathematically equivalent expressions are computationally equivalent. Hence, a sequence of consecutive multiplications, consecutive divisions, consecutive additions, or consecutive subtractions, not grouped by parentheses will be reordered, if necessary, to minimize the number of storage accesses in the object program.

Although the assumption concerning mathematical and computational equivalence is virtually true for floating point expressions, special care must be taken to indicate the order of fixed point multiplication and division, since fixed point arithmetic in FORTRAN is “greatest integer” arithmetic (i.e., truncated or remainderless). Thus, the expression


which by convention is taken to mean [(5 × 4)/2], is computed in a FORTRAN object program as


i.e., it is computed from left to right after permutation of the operands to minimize storage accesses.

The result of a FORTRAN computation in this case would be 8. On the other hand, the result of the expression (5 × 4)/2 is 10. Therefore, to insure accuracy of fixed point multiplication and division, it is suggested that parentheses be inserted into the expression involved.

(Reordering “to minimize the number of storage accesses” is pointless in a constant expression, but apparently the optimizer did it anyway.)

If this reordering can be prevented by redundant parentheses, then parentheses don't only affect parsing; they change semantics by introducing a barrier against algebraic transformations!

Giving parentheses this additional meaning has an unfortunate effect: other optimizations can no longer ignore them. The manual continues by describing one such problem:

One important type of optimization, involving common subexpressions, takes place only if the expression is suitably written. For example, the arithmetic statement

Y = A*B*C + SINF (A*B)

will cause the object program to compute the product A*B twice. An efficient object program would compute the product A*B only once. The statement is correctly written

Y = (A*B) * C + SINF (A*B)

By parenthesizing the common subexpression, A*B will be computed only once in the object program.

In general, when common subexpressions occur within a expression, they should be parenthesized.

There is one case in which it is not necessary to write the parentheses, because FORTRAN will assume them to be present. These are the type discussed in “Hierarchy of operations,” and need not be given. Thus

Y = A*B+C+SINF (A*B)

is, for optimization purposes, as suitable as

Y = (A*B)+C+SINF (A*B)

I'm not sure whether the problem is simply that A*B*C does not contain the subexpression A*B, or that the CSE lifter sees it but can't merge it with (A*B) because they're not equivalent in all contexts.

Optimizers today still have limitations, and still make invalid transformations, but they've become much more subtle!

No comments:

Post a Comment

It's OK to comment on old posts.