ISO/IEC JTC1/SC22/WG5 N1248


           A DISCUSSION OF OPERATORS AND OPERATOR PRECEDENCE 
           =================================================
                   ( submitted by Wolfgang Walter, 
                      summarizing a discussion at 
                     the last DIN Fortran meeting 
                      held January 16/17, 1997 )


As currently defined in Fortran 90/95, the feature "defined (extension) 
operator" is incomplete because the standard allows the user 
to choose a new name for an operator, but prohibits the specification 
of its priority (precedence, binding strength) in expressions.  The 
current standard imposes highest priority for unary and lowest priority 
for binary defined operators without regard to the intended semantics.  

The current situation is unsatisfactory because the priority imposed by 
the standard is rarely the one the application calls for and thus the 
mechanism by which new operators are defined is clearly incomplete.  
This forces the user to employ extra parentheses (which are mostly 
superfluous in the user's intuitive hand-notation) to control precedence 
in all but the most trivial expressions involving defined operators --- 
or to abandon the use of defined operators altogether.  With the current 
restriction, this feature is seriously crippled, so the latter 
actually appears to be the safest choice in some cases in order to 
avoid misunderstandings and improper use.  For example, there is no way 
to make comparison operators intuitively safe if they are not at the 
same level of precedence as the intrinsic ones --- the user is bound to 
make a false assumption at some point and forget the parentheses.  
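
The default rule can be seen in a small sketch written in standard 
Fortran 90/95.  The operator names .times. and .neg. and all procedure 
names are hypothetical, chosen only for illustration; both operators 
simply wrap the intrinsic integer operations so that the grouping is 
the only thing that differs.  

```fortran
! Sketch only: .times., .neg. and all procedure names are hypothetical.
module default_priority_demo
  implicit none

  interface operator (.times.)
    module procedure int_times
  end interface

  interface operator (.neg.)
    module procedure int_neg
  end interface

contains

  ! Binary defined operator: plain integer multiplication.
  integer function int_times(a, b)
    integer, intent(in) :: a, b
    int_times = a * b
  end function int_times

  ! Unary defined operator: plain integer negation.
  integer function int_neg(a)
    integer, intent(in) :: a
    int_neg = -a
  end function int_neg

  subroutine show_default_priority()
    ! Intrinsic * binds tighter than +:  2 + (3 * 4)
    print *, 2 + 3 * 4          ! 14
    ! Defined binary operator binds weakest:  (2 + 3) .times. 4
    print *, 2 + 3 .times. 4    ! 20
    ! Intrinsic unary minus binds weaker than **:  -(2 ** 2)
    print *, -2 ** 2            ! -4
    ! Defined unary operator binds strongest:  (.neg. 2) ** 2
    print *, .neg. 2 ** 2       ! 4
  end subroutine show_default_priority

end module default_priority_demo
```

Calling show_default_priority from any main program prints the four 
values noted in the comments.  To obtain the conventional grouping, 
the user has to write 2 + (3 .times. 4) and .neg. (2 ** 2) explicitly.  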


Here are some facts about operators which might not be generally known: 

1. Quite apart from any programming languages, operators are commonplace 
   in virtually all areas of engineering, mathematics, physics, and many 
   other sciences.  They are designed and used for a multitude of 
   specific tasks in these areas and provide a convenient and intuitive 
   notation that is far superior to function calls alone.  

   EVERY operator, regardless of whether it is intrinsic, overloaded, or 
   user-defined, has a more or less inherent priority that is dictated by 
   convention in the specific area of application where this operator is 
   defined and used.  These conventions are by no means arbitrary, but 
   have developed over time in order to make the notation more effective 
   and concise and to facilitate communication among specialists and users 
   in the field (there are loads of examples in algebra, calculus, logic, 
   physics, engineering, geometry, graphics, and many other fields).  
   Furthermore, these conventions usually coexist and interact in a 
   closely knit framework and have passed the test of time.  

2. Most users have a good intuitive idea of what the precedence among the 
   special operators in their field of application is and will have no 
   difficulties in learning the precise priority levels defined for these 
   operators - assuming the implementors of the basic modules for this 
   field respect the conventions.  Furthermore, there are some more global 
   conventions, e.g. for comparison operators.  

   Most people expect comparison operators to be at the same priority 
   level, and there is a wealth of comparison operators out there.  
   The supposedly "standard" six, ==, /=, <, >, <=, >=, are by no means 
   special or in any way unique.  Set theory has others, such as "subset", 
   "superset", "element of", and "disjoint" (which also occur in interval 
   arithmetic), graphics may have "is hidden by", "is clipped by", 
   "overlaps", etc., and I don't have to strain my imagination to come up 
   with more.  

3. In whatever field they may appear, operators typically have the same 
   minimal set of inherent properties: they have a clearly defined 
   syntax (typically a symbol or name), a clearly defined set of 
   operands (fixed number of operands and their "types"), they are used 
   with a clear notational convention (order and "geometric" placement 
   of operands with respect to the operator's position), and their 
   interaction with other operators within the same expression is 
   clearly defined (precedence rules for the whole set of relevant 
   operators).     

4. An expressional notation involving mostly operators instead of 
   function references results in a drastic reduction of the number of 
   parentheses and commas necessary to write the expression, making 
   the expression much more readable.  However, the number of parentheses 
   will be greatly reduced ONLY IF the operators have an associated 
   natural precedence and IF there is a sufficient number of precedence 
   levels.  

5. Because Niklaus Wirth's motto is "Small is beautiful" and because 
   he wanted to be able to write a fast recursive descent parser, Pascal 
   has only three binary operator precedence levels and one unary level.  
   Thus generations of programmers have been trained to believe that 
   parentheses are always required in expressions such as 

      ( x < y ) and ( y < z ) 

   Fortunately, Fortran has not made such a simplistic assumption for 
   its intrinsic operators, but rather chosen to adopt the 
   well-established conventions of mathematical/logical notation.  
   Curiously, however, Fortran 90/95 fails to apply the same logic to 
   other operators that happen to be user-defined rather than predefined.  
   Since the choice of what is intrinsically defined and what is not 
   is clearly somewhat arbitrary and undergoes evolutionary change (see 
   problems with interval operators), this restriction is unacceptable.  

6. Experience shows that one loses track of parenthesis levels very 
   quickly, usually as soon as the nesting depth reaches 3 or 4.  
   So if there is an easy way to avoid parentheses, one should try to 
   avoid them.  
    
   For example, LISP programs involve loads of parenthesized lists 
   which are often nested 5 to 10 levels deep.  The only way these 
   programs become half-way readable is through the liberal use of 
   indentation and newlines, usually by vertical alignment of 
   corresponding pairs of parentheses.  However, one does not have 
   that liberty in typical Fortran expressions, e.g. because of the 
   statement length limit, because it does not mimic the typical 
   application-specific notation, and because it is highly 
   unconventional in algorithmic programming languages.  
   
7. From the language design point of view, burdening every "user" of 
   a feature with superfluous work (i.e., requiring extra syntax in 
   every expression that uses that feature/operator) instead of solving 
   the problem at its root (i.e., at the point of definition of that 
   operator) violates the principle of locality and makes the usage of 
   defined operators very error-prone and counterintuitive.  
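
Facts 2, 4, and 5 above can be illustrated with a sketch in standard 
Fortran 90/95.  The type interval, the operator .subset., and the 
functions nested and ordered are hypothetical names introduced only 
for this illustration.  The intrinsic comparison needs no parentheses, 
whereas the user-defined comparison does.  

```fortran
! Sketch only: interval, .subset., nested and ordered are hypothetical.
module interval_subset_demo
  implicit none

  type interval
    real :: lo, hi
  end type interval

  interface operator (.subset.)
    module procedure interval_subset
  end interface

contains

  ! True if interval a lies entirely within interval b.
  logical function interval_subset(a, b)
    type(interval), intent(in) :: a, b
    interval_subset = (b%lo <= a%lo) .and. (a%hi <= b%hi)
  end function interval_subset

  ! Intrinsic comparisons bind tighter than .and., so no
  ! parentheses are needed here.
  logical function ordered(x, y, z)
    real, intent(in) :: x, y, z
    ordered = x <= y .and. y <= z
  end function ordered

  ! The defined comparison binds weaker than .and., so the
  ! parentheses are mandatory.  Without them the expression would
  ! group as (a .subset. (b .and. b)) .subset. c, which is rejected
  ! because no .and. is defined for type interval.
  logical function nested(a, b, c)
    type(interval), intent(in) :: a, b, c
    nested = (a .subset. b) .and. (b .subset. c)
  end function nested

end module interval_subset_demo
```

If .subset. could be given the priority of the intrinsic relational 
operators, the body of nested could be written exactly like the body 
of ordered.  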


Some Common Fallacies About Operators: 
=====================================

1. We have to worry about the same operator name having different 
   --------------------------------------------------------------
   priorities (associated with individual overloads): 
   -------------------------------------------------

   YES and NO.  

   YES in the sense that something has to be done, namely to prohibit 
   such incompatible overloads.  All overloads visible within the same 
   scoping unit must have the same (defined or default) priority.  
   The compiler can and must check this (it is a static property) and 
   report an error if such a (very unlikely) conflict occurs.  

   NO in the sense that with this rule, there is nothing else to worry 
   about.  

   Priority is a syntactic, not a semantic property of an operator, 
   i.e. the expression tree (DAG) must be uniquely determined solely 
   by the SYNTAX of the expression.  Thus it is mandatory that a 
   priority be associated with an operator NAME (or SYMBOL) and not 
   with a particular operator overload.  

   This rule is necessary to avoid the situation where an expression 
   will allow only contradictory interpretations (see example at end).  


2. Operators are just syntactic sugar.  
   ----------------------------------

   For decades, math-oriented people have employed precisely that "bit 
   of extra syntax" to great advantage, whether it be on paper, in a 
   programming language, or in a symbolic/algebra system.  

   The main goal of all data abstraction and object orientation efforts 
   is to maintain simplicity of syntax and expression while building 
   more and more complex structures with more and more complex semantics.  
   Operators are an essential part of data abstraction.  An intuitive and 
   efficient notation is very important for coding comprehensible, 
   maintainable, reliable, and verifiable software.  

   Long-term experience has shown that such a notational convenience can 
   make all the difference in the world, to the point where, without it, 
   programming not only becomes less intuitive, more cumbersome and more 
   error-prone, but debugging may become so frustratingly slow that it 
   ultimately leads to serious delays in a project.  


3. With everybody defining their own operators, we will have everybody 
   -------------------------------------------------------------------
   building their own programming environment/world/language.  
   ---------------------------------------------------------

   (For those who have thoughts along these lines, the implication here 
   seems to be that that is bad or dangerous or hinders communication 
   and understanding among programmers.)  

   a) In Fortran 77, all variation at the lexical level came from 
      an extremely limited set of token classes: identifiers (names), 
      literal constants, and labels.  
      In Fortran 90/95, that set is extended by one extra token class: 
      defined (extension) operators.  
   b) Lexically, operator names are quite similar to other names (except 
      for the enclosing points).  I fail to see how their use could be 
      more dangerous than the use of ordinary identifiers.  
   c) This fear seems to allude to the enhanced capabilities of a 
      language featuring user-defined operators, which allow the user 
      to imitate his/her habitual notation (e.g. on paper), and to the 
      concern that this notation might not be immediately understandable 
      to the casual reader.  
   d) I feel compelled to ask: Isn't that what a programming language 
      with good abstraction facilities is all about?  
      The other (non-operator) names the user dreams up are part of 
      his/her programming "style/environment" as well, and nobody 
      considers that to be harmful.  


A rough draft for the proposed feature follows: 
==============================================

Title:  Priority specification for defined operators 

Submitted By:  DIN 

Basic Functionality:  Allow the specification of a priority for defined 
operators that is different from the predefined priority.  All overloads 
of a defined operator must have the same priority, that is, the priority 
is associated with the operator name (see note below).  The priority 
levels for unary operators, 1 (top), 4 (unary + and -), and 8 (.NOT.), 
should be reserved for unary operators, the other priority levels for 
binary operators.  Although the set of priority levels (currently 12) 
could in principle be augmented, this would make the feature more 
complicated both for the programmer and for the implementor.  Since 
this seems to be a requirement in only a few applications, it is not 
currently proposed.  

If the user does not specify the priority of an operator, the default 
rule of the current standard applies, which imposes highest priority 
for unary and lowest priority for binary defined operators.  
Associativity rules are not changed, i.e. a sequence of binary operators 
with the same priority is interpreted from left to right (with the 
exception of sequences of ** operators).  Thus existing programs are 
not affected.  
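
The unchanged associativity rules can be checked with a short sketch 
in standard Fortran 90/95.  The operator name .minus. and the 
procedure names are hypothetical and serve only as an illustration; 
subtraction is used because the result depends on the grouping.  

```fortran
! Sketch only: .minus. and the procedure names are hypothetical.
module associativity_demo
  implicit none

  interface operator (.minus.)
    module procedure int_minus
  end interface

contains

  ! Binary defined operator: plain integer subtraction.
  integer function int_minus(a, b)
    integer, intent(in) :: a, b
    int_minus = a - b
  end function int_minus

  subroutine show_associativity()
    ! Same-priority binary operators group from left to right:
    ! (10 .minus. 3) .minus. 2
    print *, 10 .minus. 3 .minus. 2   ! 5
    ! The exception: ** groups from right to left, 2 ** (3 ** 2)
    print *, 2 ** 3 ** 2              ! 512
  end subroutine show_associativity

end module associativity_demo
```

A right-to-left grouping of the first expression would instead give 
10 - (3 - 2) = 9.  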

There are several ways to enforce the requirement that an operator 
name must have only one priority associated with it.  
The simplest rule to understand and enforce is probably that within a 
given context (scoping unit), all visible overloads of an operator name 
must have the same priority.  

The following example shows why an operator name must have only one 
priority associated with it: 

  INTERFACE OPERATOR ( .BINOP. PRIORITY + )
    MODULE PROCEDURE  ADDFUN 
  END INTERFACE 

  INTERFACE OPERATOR ( .BINOP. PRIORITY * )
    MODULE PROCEDURE  MULFUN1, MULFUN2 
  END INTERFACE 

CONTAINS 

  REAL FUNCTION ADDFUN (x, y)
    INTEGER, INTENT(IN) :: x, y
     . . . 
  END FUNCTION ADDFUN

  REAL FUNCTION MULFUN1 (x, y)
    INTEGER, INTENT(IN) :: x ;   REAL, INTENT(IN) :: y
     . . . 
  END FUNCTION MULFUN1

  REAL FUNCTION MULFUN2 (x, y)
    REAL, INTENT(IN) :: x ;   INTEGER, INTENT(IN) :: y
     . . . 
  END FUNCTION MULFUN2

When these operators are used in an expression such as 

  INTEGER :: A, B, C 
   . . . 
   . . .   A .BINOP. B .BINOP. C 

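in the absence of other overloads for the operator name .BINOP., 
the only two type-correct interpretations both violate the specified 
priorities, because each applies the lower-priority ADDFUN before the 
higher-priority MULFUN overload: 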

   . . .   MULFUN1( A, ADDFUN(B, C) ) 
   . . .   MULFUN2( ADDFUN(A, B), C )

This problem is not restricted to multiple occurrences of the same 
operator name in an expression.  
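
For comparison, the following sketch shows how the same three 
overloads behave under the current Fortran 90/95 rules, where no 
priority can be specified.  The function bodies (elided in the 
example above) are invented purely for illustration, as are the 
module name binop_demo and the function chain.  With a single default 
priority and left-to-right grouping, the expression is unambiguous 
and resolves to MULFUN2( ADDFUN(A, B), C ).  

```fortran
! Sketch only: the function bodies, binop_demo and chain are assumptions.
module binop_demo
  implicit none

  ! One operator name, three overloads, one (default) priority.
  interface operator (.binop.)
    module procedure addfun, mulfun1, mulfun2
  end interface

contains

  real function addfun(x, y)
    integer, intent(in) :: x, y
    addfun = real(x + y)          ! illustrative body
  end function addfun

  real function mulfun1(x, y)
    integer, intent(in) :: x
    real, intent(in) :: y
    mulfun1 = real(x) * y         ! illustrative body
  end function mulfun1

  real function mulfun2(x, y)
    real, intent(in) :: x
    integer, intent(in) :: y
    mulfun2 = x * real(y)         ! illustrative body
  end function mulfun2

  real function chain(a, b, c)
    integer, intent(in) :: a, b, c
    ! Groups as (a .binop. b) .binop. c: the first .binop. sees two
    ! integers (addfun), the second sees real and integer (mulfun2).
    chain = a .binop. b .binop. c
  end function chain

end module binop_demo
```

With these illustrative bodies, chain(2, 3, 4) evaluates to 20.0, 
whereas the other grouping, MULFUN1( A, ADDFUN(B, C) ), would give 
14.0.  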

Estimated Impact:  Medium impact on compilers (may require 
generalization of symbol/operator table), no impact on efficiency, 
huge impact on user-friendliness of defined operators.  
Does not affect existing programs.