ISO/IEC JTC1/SC22/WG5/N1248

A DISCUSSION OF OPERATORS AND OPERATOR PRECEDENCE
=================================================

( submitted by Wolfgang Walter, summarizing a discussion at the last
  DIN Fortran meeting held January 16/17, 1997 )

As currently defined in Fortran 90/95, the feature "defined (extension)
operator" is incomplete because the standard allows the user to choose
a new name for an operator, but prohibits the specification of its
priority (precedence, binding strength) in expressions. The current
standard imposes highest priority for unary and lowest priority for
binary defined operators without regard to the intended semantics.

The current situation is unsatisfactory because the priority imposed by
the standard is rarely the one the application calls for and thus the
mechanism by which new operators are defined is clearly incomplete.
This forces the user to employ extra parentheses (which are mostly
superfluous in the user's intuitive hand-notation) to control
precedence in all but the most trivial expressions involving defined
operators --- or to abandon the use of defined operators altogether.
With the current restriction, this feature is seriously crippled, so
the latter actually appears to be the safest choice in some cases in
order to avoid misunderstandings and improper use. For example, there
is no way to make comparison operators intuitively safe if they are not
at the same level of precedence as the intrinsic ones --- the user is
bound to make a false assumption at some point and forget the
parentheses.

Here are some facts about operators which might not be generally known:

1. Quite apart from any programming languages, operators are
   commonplace in virtually all areas of engineering, mathematics,
   physics, and many other sciences. They are designed and used for a
   multitude of specific tasks in these areas and provide a convenient
   and intuitive notation that is far superior to function calls only.
   EVERY operator, regardless of whether it is intrinsic, overloaded,
   or user-defined, has a more or less inherent priority that is
   dictated by convention in the specific area of application where
   this operator is defined and used. These conventions are by no
   means arbitrary, but have developed over time in order to make the
   notation more effective and concise and to facilitate communication
   among specialists and users in the field (there are loads of
   examples in algebra, calculus, logic, physics, engineering,
   geometry, graphics, and many other fields). Furthermore, these
   conventions usually coexist and interact in a closely knit
   framework and have passed the test of time.

2. Most users have a good intuitive idea of what the precedence among
   the special operators in their field of application is and will
   have no difficulties in learning the precise priority levels
   defined for these operators, assuming the implementors of the basic
   modules for this field respect the conventions.

   Furthermore, there are some more global conventions, e.g. for
   comparison operators. Most people expect comparison operators to be
   at the same priority level, and there is a wealth of comparison
   operators out there. The supposedly "standard" six, ==, /=, <, >,
   <=, >=, are by no means special or in any way unique. Set theory
   has others, such as "subset", "superset", "element of", and
   "disjoint" (which also occur in interval arithmetic), graphics may
   have "is hidden by", "is clipped by", "overlaps", etc., and I don't
   have to strain my imagination to come up with more.

3. In whatever field they may appear, operators typically have the
   same minimal set of inherent properties: they have a clearly
   defined syntax (typically a symbol or name), a clearly defined set
   of operands (fixed number of operands and their "types"), they are
   used with a clear notational convention (order and "geometric"
   placement of operands with respect to the operator's position), and
   their interaction with other operators within the same expression
   is clearly defined (precedence rules for the whole set of relevant
   operators).

4. An expressional notation involving mostly operators instead of
   function references results in a drastic reduction of the number of
   parentheses and commas necessary to write the expression, making
   the expression much more readable. However, the number of
   parentheses will be greatly reduced ONLY IF the operators have an
   associated natural precedence and IF there is a sufficient number
   of precedence levels.

5. Because Niklaus Wirth's motto is "Small is beautiful" and because
   he wanted to be able to write a fast recursive descent parser,
   Pascal has only three binary and one unary operator precedence
   level. Thus generations of programmers have been trained to believe
   that parentheses are always required in expressions such as

      ( x < y ) and ( y < z )

   Fortunately, Fortran has not made such a simplistic assumption for
   its intrinsic operators, but rather chosen to adopt the
   well-established conventions of mathematical/logical notation.
   Curiously, however, Fortran 90/95 fail to use the same logic for
   other operators which happen to be user-defined and not predefined.
   Since the choice of what is intrinsically defined and what is not
   is clearly somewhat arbitrary and undergoes evolutionary change
   (see problems with interval operators), this restriction is
   unacceptable.

6. Experience shows that one loses track of parenthesis levels very
   quickly, usually as soon as the nesting depth reaches 3 or 4.
   So if there is an easy way to avoid parentheses, one should take
   it. For example, LISP programs involve loads of parenthesized lists
   which are often nested 5 to 10 levels deep. The only way these
   programs become halfway readable is through the liberal use of
   indentation and newlines, usually by vertical alignment of
   corresponding pairs of parentheses. However, one does not have that
   liberty in typical Fortran expressions, e.g. because of the
   statement length limit, because such a layout does not mimic the
   typical application-specific notation, and because it is highly
   unconventional in algorithmic programming languages.

7. From the language design point of view, burdening every "user" of a
   feature with superfluous work (i.e., requiring extra syntax in
   every expression that uses that feature/operator) instead of
   solving the problem at its root (i.e., at the point of definition
   of that operator) violates the principle of locality and makes the
   usage of defined operators very error-prone and counterintuitive.


Some Common Fallacies About Operators:
=====================================

1. We have to worry about the same operator name having different
   --------------------------------------------------------------
   priorities (associated with individual overloads):
   -------------------------------------------------

   YES and NO.

   YES in the sense that something has to be done, namely to prohibit
   such incompatible overloads. All overloads visible within the same
   scoping unit must have the same (defined or default) priority. The
   compiler can and must check this (it is a static property) and
   produce an error if this (very unlikely) situation occurs.

   NO in the sense that with this rule, there is nothing else to worry
   about. Priority is a syntactic, not a semantic property of an
   operator, i.e. the expression tree (DAG) must be uniquely
   determined solely by the SYNTAX of the expression.
   Thus it is mandatory that a priority be associated with an operator
   NAME (or SYMBOL) and not with a particular operator overload. This
   rule is necessary to avoid the situation where an expression will
   allow only contradictory interpretations (see example at end).

2. Operators are just syntactic sugar.
   ----------------------------------

   For decades, math-oriented people have employed precisely that "bit
   of extra syntax" to great advantage, whether it be on paper, in a
   programming language, or in a symbolic/algebra system.

   The main goal of all data abstraction and object orientation
   efforts is to maintain simplicity of syntax and expression while
   building more and more complex structures with more and more
   complex semantics. Operators are an essential part of data
   abstraction.

   An intuitive and efficient notation is very important for coding
   comprehensible, maintainable, reliable, and verifiable software.
   Long-term experience has shown that such a notational convenience
   can make all the difference in the world, to the point where,
   without it, programming not only becomes less intuitive, more
   cumbersome and more error-prone, but may make debugging so
   frustratingly slow that it may ultimately lead to serious delays in
   a project.

3. With everybody defining their own operators, we will have everybody
   -------------------------------------------------------------------
   building their own programming environment/world/language.
   ---------------------------------------------------------

   (For those who have thoughts along these lines, the implication
   here seems to be that that is bad or dangerous or hinders
   communication and understanding among programmers.)

   a) In Fortran 77, all variation at the lexical level came from an
      extremely limited set of token classes: identifiers (names),
      literal constants, and labels. In Fortran 90/95, that set is
      extended by one extra token class: defined (extension)
      operators.
   b) Lexically, operator names are quite similar to other names
      (except for the enclosing points). I fail to see how their use
      could be more dangerous than the use of ordinary identifiers.

   c) This fear seems to allude to the enhanced capabilities of a
      language featuring user-defined operators which allow the user
      to imitate his/her habitual notation (e.g. on paper), and to the
      possibility that that notation might not be immediately
      understandable to the casual reader.

   d) I feel compelled to ask: Isn't that what a programming language
      with good abstraction facilities is all about? The other
      (non-operator) names the user dreams up are part of his/her
      programming "style/environment" as well, and nobody considers
      that to be harmful.


A rough draft for the proposed feature follows:
==============================================

Title: Priority specification for defined operators

Submitted By: DIN

Basic Functionality: Allow the specification of a priority for defined
operators that is different from the predefined priority. All overloads
of a defined operator must have the same priority, that is, the
priority is associated with the operator name (see note below).

The priority levels for unary operators, 1 (top), 4 (unary + and -),
and 8 (.NOT.), should be reserved for unary operators, the other
priority levels for binary operators.

Although the set of priority levels (currently 12) could in principle
be augmented, this would make the feature more complicated both for the
programmer and for the implementor. Since this seems to be a
requirement in only a few applications, it is not currently proposed.

If the user does not specify the priority of an operator, the default
rule as defined by the current standard applies, which imposes highest
priority for unary and lowest priority for binary defined operators.
Associativity rules are not changed, i.e. a sequence of binary
operators with the same priority is interpreted from left to right
(with the exception of sequences of ** operators).
Thus existing programs are not affected.

There are several ways to enforce the requirement that an operator name
must have only one priority associated with it. The simplest rule to
understand and enforce is probably that within a given context (scoping
unit), all visible overloads of an operator name must have the same
priority.

The following example shows why an operator name must have only one
priority associated with it:

   INTERFACE OPERATOR ( .BINOP. PRIORITY + )
      MODULE PROCEDURE ADDFUN
   END INTERFACE

   INTERFACE OPERATOR ( .BINOP. PRIORITY * )
      MODULE PROCEDURE MULFUN1, MULFUN2
   END INTERFACE

   CONTAINS

   REAL FUNCTION ADDFUN (x, y)
      INTEGER, INTENT(IN) :: x, y
      . . .
   END FUNCTION ADDFUN

   REAL FUNCTION MULFUN1 (x, y)
      INTEGER, INTENT(IN) :: x ; REAL, INTENT(IN) :: y
      . . .
   END FUNCTION MULFUN1

   REAL FUNCTION MULFUN2 (x, y)
      REAL, INTENT(IN) :: x ; INTEGER, INTENT(IN) :: y
      . . .
   END FUNCTION MULFUN2

When these operators are used in an expression such as

   INTEGER :: A, B, C
   . . .
   . . .  A .BINOP. B .BINOP. C

in the absence of other overloads for the operator name .BINOP., both
interpretations violate the specified priorities:

   . . .  MULFUN1( A, ADDFUN(B, C) )
   . . .  MULFUN2( ADDFUN(A, B), C )

This problem is not restricted to multiple occurrences of the same
operator name in an expression.

Estimated Impact: Medium impact on compilers (may require
generalization of symbol/operator table), no impact on efficiency, huge
impact on user-friendliness of defined operators. Does not affect
existing programs.
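
Illustration of the default rule: The default priorities that this
proposal leaves in place (highest for unary, lowest for binary defined
operators) can be observed with standard Fortran 90/95 as it stands.
The following is only a minimal sketch: the module name, the procedure
names, and the operator names .TIMES. and .NEG. are hypothetical,
chosen for illustration, and are not part of the proposal.

```fortran
MODULE default_priority_demo
   IMPLICIT NONE

   ! Hypothetical operator names, used only for this illustration.
   INTERFACE OPERATOR ( .TIMES. )
      MODULE PROCEDURE times_int
   END INTERFACE

   INTERFACE OPERATOR ( .NEG. )
      MODULE PROCEDURE neg_int
   END INTERFACE

CONTAINS

   ! Binary defined operator: plain integer multiplication.
   INTEGER FUNCTION times_int (x, y)
      INTEGER, INTENT(IN) :: x, y
      times_int = x * y
   END FUNCTION times_int

   ! Unary defined operator: plain integer negation.
   INTEGER FUNCTION neg_int (x)
      INTEGER, INTENT(IN) :: x
      neg_int = -x
   END FUNCTION neg_int

END MODULE default_priority_demo

PROGRAM show_default_priority
   USE default_priority_demo
   IMPLICIT NONE

   ! A binary defined operator binds more weakly than every intrinsic
   ! operator, so this is interpreted as 2 .TIMES. (3 + 4).
   PRINT *, 2 .TIMES. 3 + 4

   ! Parentheses are required to obtain the conventional (2 * 3) + 4.
   PRINT *, (2 .TIMES. 3) + 4

   ! A unary defined operator binds more tightly than **, so this is
   ! interpreted as (.NEG. 2) ** 2.
   PRINT *, .NEG. 2 ** 2

   ! The intrinsic unary minus binds more weakly than **: -(2 ** 2).
   PRINT *, - 2 ** 2
END PROGRAM show_default_priority
```

With a conforming compiler the four values printed should be 14, 10, 4,
and -4. The first and third differ from what the conventional reading of
multiplication and negation would give, which is the kind of mismatch a
priority specification at the point of definition is meant to remove.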