ISO/IEC JTC1/SC22/WG5/N1239

1996-01-06

From: R. Baker Kearfott
To: Miles Ellis, WG5
Subject: Report, HPC electronic subgroup
References: WG5/N1189 (SD5), X3J3/95-004r1, X3J3/96-103r2, WG5/N1215,
            X3J3/96-158r2, X3J3/96-164

PARTS OF THIS REPORT (in order of occurrence):

   Introduction
   Area of Responsibility
   Summary of Action on Repository Items
   Additional Actions and Requirements
   Recommendations to the February, 1997 WG5/X3J3 subgroup

INTRODUCTION
============

The HPC electronic subgroup, formed as one of three such groups after
the July, 1996 Dresden WG5 meeting, was charged with identifying
additional requirements for Fortran 2000 concerning numerical aspects
of the language and high-performance computing.

AREA OF RESPONSIBILITY
==== == ==============

Christian Weber assigned each item in the present WG5 Requirements
Repository (SD5) to one of the three subgroups. The requirements
falling under "HPC" are:

   19   Standardization of performance directives
   19a  Directives
   23   Multi-threaded execution facilities
   52   Asynchronous I/O (proposed HPF work)
   53   PRIVATE and SHARED variables for TASK parallelism
   60   Pointer Association Classes
   78   Qualifiers and Attributes for High Performance Computing
        with Fortran 90
   83   Performance Problem: Allocation and Deallocation

SUMMARY OF ACTION ON THE ABOVE REPOSITORY ITEMS
======= == ====== == === ===== ========== =====

X3J3 and WG5 have previously taken some actions on the Requirements
Repository items. In addition, I conducted an electronic vote, with
comments. (I did not vote on these specific items, but I include my
opinion in the final recommendations.) Summaries follow here.

19 STANDARDIZATION OF PERFORMANCE DIRECTIVES:

This is similar to the X3J3 JOR item 26 (WG5 item 19a), although
item 19 lists specific directives that would aid high-performance
computing. X3J3 approved X3J3 JOR item 26 as medium priority. WG5
item 19 can be reprioritized in February.
Summary of vote
------- -- ----
   Yes: 5   No: 2   Maybe: 1

Both "no" votes commented that there is not sufficient consensus
about specifics for standardization. One of the "yes" votes contained
the comment that the programmer should be able to specify, in a
portable way, probable sizes of loop bounds and the absence of data
dependencies and I/O within loops and functions. Other "yes" votes
indicated that the item merely standardizes existing practice, but
that the directives to be standardized should be identified; these
votes also gave opinions about how this should be done. The "maybe"
vote indicated that computer architecture is in a state of flux, and
existing practice is architecture-specific.

23 MULTI-THREADED EXECUTION FACILITIES:

Action has not been taken on this item. The actual proposal in the
requirements repository is not specific, and more work, mainly study
of existing practice, would need to be done.

Summary of vote
------- -- ----
   Yes: 4   No: 3   Maybe: 1

One of the "no" votes indicated that POSIX threads should nonetheless
be included, while another "no" vote indicated that there is no
industry agreement about the form of parallelism. The third "no" vote
felt the requirement is too far-reaching to be completed in time. One
of the "yes" votes commented that Van Snyder's suggestion "In addition
to making I/O asynchronous, why not just make anything asynchronous?"
(cf. http://gyre.jpl.nasa.gov/~vsnyder/fortran/threads.html) should be
considered. Another "yes" vote gave further justifying arguments. The
"maybe" vote raised questions about specifics and about possible
performance problems.

52 ASYNCHRONOUS I/O:

This has already been identified as a Fortran 2000 requirement in
WG5/N1215. Richard Bleikamp and X3J3 have already made substantial
progress on this topic. (See, for example, X3J3/96-158r2,
X3J3/96-164.) Since this item is already a firm requirement, further
action is not appropriate.

53 PRIVATE AND SHARED VARIABLES FOR TASK PARALLELISM:

This is X3J3 JOR item 81.
In May, 1996, X3J3 changed the status of this item (as a national
recommendation) from "medium priority" to "not a priority". The item
is loosely related to item 23.

Summary of vote
------- -- ----
   Yes: 3   No: 4   Maybe: 1

One of the "no" votes pointed out that PRIVATE is more expensive than
SHARED on some machines and less expensive on others; thus, user
definition of these could REDUCE performance. A second "no" vote
indicated that this feature is not needed for threads, and that its
other uses are not well defined. The third "no" vote indicated that
parallelism in general should not be standardized since there is no
industry agreement. One of the "yes" votes indicated that, because
common blocks are deprecated and global data within modules has the
same problems as common blocks, another mechanism should be provided
to declare variables common to different tasks. The "maybe" vote had
reservations similar to those for item 23.

60 POINTER ASSOCIATION CLASSES:

This is X3J3 JOR item 60. X3J3 has internally given this a low
priority.

Summary of vote
------- -- ----
   Yes: 3   No: 3   Maybe: 2

The "maybe" votes needed more time to review the issue. One of the
"no" votes stated that the utility is not great, current restrictions
on pointers are sufficient, and similar attempts in C were not widely
used. The "yes" votes indicated that vectorization and parallelization
would be promoted by telling the compiler that two pointers cannot
point to the same object.

78 QUALIFIERS AND ATTRIBUTES FOR HIGH PERFORMANCE COMPUTING WITH
   FORTRAN 90:

Summary of vote
------- -- ----
   Yes: 5   No: 2   Maybe: 1

One of the "no" votes stated that N1186 is too vague and does not
standardize existing practice. The "maybe" vote wanted to review the
issue. One of the "yes" votes stated that this item should be limited
to in-lining of procedures, while another stated that in-lining is an
absolute must.
One of the other "yes" votes was accompanied by a comment that
vectorization and parallelization should be promoted with user
descriptions of data dependencies (or lack thereof) through pointers,
etc. Another "yes" vote indicated that this requirement would
standardize existing practice.

83 PERFORMANCE PROBLEM: ALLOCATION AND DEALLOCATION:

This is an unusual item, since it appears to be a requirement for an
implementation constraint to improve performance, rather than for
syntax for user control of performance. In fact, strictly speaking,
the description in the repository appears to be mainly a statement of
fact of use to designers of optimizing compilers.

Summary of vote
------- -- ----
   Yes: 2   No: 4   Maybe: 2

The "maybe" votes wanted to review the issue or to study it more. One
of the "no" votes did not see what else was needed beyond what is in
F95, while another felt that the statement of the requirement itself
is empty. One of the "yes" votes suggested an array syntax to
efficiently allocate a large number of pointers.

ADDITIONAL ACTION AND REQUIREMENTS
========== ====== === ============

In addition to the above items, the following activities took place:

* I solicited suggestions for additional possible requirements,
  without response.

* I suggested the following additional possible requirement. The
  responses I received (two comments) were favorable:

  Title: Support for IEEE I/O

  Basic Functionality: Allow the user to query the processor
  concerning support of the IEEE 754 specification for base
  conversion in I/O and in constants.

  Rationale: The intrinsic modules IEEE_ARITHMETIC and IEEE_SUBSET as
  described in John Reid's Technical Report for Floating Point
  Exception Handling provide good support for the binary, runtime
  aspects of the IEEE 754 binary floating point standard.
  When IEEE arithmetic is available and the query function
  IEEE_SUPPORT_ROUNDING in IEEE_ARITHMETIC indicates that IEEE 754
  arithmetic is supported for a particular data type, the user knows
  the accuracy of binary arithmetic with that data type. However,
  IEEE 754 also specifies the accuracy of conversion of decimal
  strings to binary representations in I/O and in constants (see
  ANSI/IEEE Std 754-1985, Section 5.6). There should be language
  support for such conversions for the same reasons as support for
  the IEEE rounding modes. In particular, rigorous error analysis of
  a computation is not possible without language support for the
  conversion aspects of IEEE arithmetic.

  Suggested implementation: The syntax can be consistent with (and,
  indeed, can be bound to) the modules IEEE_ARITHMETIC and
  IEEE_SUBSET. That is, the syntax can be of the form
  IEEE_SUPPORT_CONVERSION, similar to IEEE_SUPPORT_ROUNDING. However,
  decimal to binary conversion of constants can be done at compile
  time, when rounding modes are not known. An ambiguity in IEEE 754
  precludes such compile-time conversion, since 754 says "Conversions
  shall be correctly rounded as specified in Section 4 ..." The
  meaning of IEEE_SUPPORT_ROUNDING should be such that the rounding
  mode (e.g. "round-to-nearest") is specified for conversions that
  are done at compile time. (Should only "round-to-nearest" be
  supported for conversion, or should a compiler directive be able to
  specify which mode is supported? For identical behavior on
  compilers that convert at runtime and at compile time, it seems
  that only one mode can be supported per compilation unit.)

* I put forward a proposal for mixed mode arithmetic with the
  interval data type. This elicited no response from the HPC
  electronic subgroup. However, there was subsequently lively
  discussion on the X3J3 email list.

RECOMMENDATIONS TO THE FEBRUARY, 1997 WG5/X3J3 SUBGROUP
=============== == === ========= ==== ======== ========

I.
THE REPOSITORY ITEMS:

There was no consensus, so items 19, 23, 53, 60, 78, and 83 should be
discussed. Subgroup members should review the items in N1189, as well
as the above comment summaries. (I can supply the entire text of the
comments.) The committee should be guided by the following summary of
votes:

   19 -- Strong pass
   23 -- Weak pass
   53 -- Weak fail
   60 -- Undecided
   78 -- Strong pass
   83 -- Weak fail

My own feelings:

   53 --- The arguments "for" this appear to be weaker than the
          arguments against it.

   78 --- The requirement needs to be stated more clearly before WG5
          can decide. I favor limiting it to one or more well-defined
          items. I strongly favor syntax to specify in-lining of
          procedures.

   83 --- I would like to see more specifics concerning how
          additional language syntax could make
          allocation/deallocation more efficient, and why the same
          effect cannot be obtained within the present language.

II. ADDITIONAL REQUIREMENT:

Although all responses to "Support for IEEE I/O" (see above) were
favorable, the subgroup should examine this new item carefully.

III. Options for how interval arithmetic can be included in F2000
should be reviewed, for presentation to the full committee for a
decision.

IV. Time permitting, technical work on the interval arithmetic
proposal should continue. This includes:

   A. Finalizing the status of mixed-mode arithmetic
   B. Discussing the proposal for interval I/O
   C. Studying the individual intrinsic functions for interval data
      types
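
ILLUSTRATION (conversion of decimal constants):

The conversion issue behind "Support for IEEE I/O" (recommendation II)
can be sketched as follows. This is a minimal illustration written in
Python rather than Fortran, and it is not part of any proposal. It
assumes Python 3.9 or later (for math.nextafter) and IEEE 754 double
precision with correctly rounded, round-to-nearest string conversion.
The helper name enclose_decimal is hypothetical. The point is that
round-to-nearest can land on either side of the decimal value, so a
rigorous computation (for example, one using interval arithmetic) must
know how the constant was rounded before it can bound the error.

```python
import math
from fractions import Fraction


def enclose_decimal(text):
    """Return floats (lo, hi) with lo <= exact decimal value <= hi.

    Assumes float(text) is correctly rounded to nearest and finite.
    """
    exact = Fraction(text)        # exact rational value of the decimal string
    nearest = float(text)         # binary value after round-to-nearest
    binary = Fraction(nearest)    # exact rational value of that binary number

    if binary == exact:
        # The decimal string is exactly representable in binary.
        return nearest, nearest
    if binary < exact:
        # Conversion rounded down: widen upward by one unit in the last place.
        return nearest, math.nextafter(nearest, math.inf)
    # Conversion rounded up: widen downward by one unit in the last place.
    return math.nextafter(nearest, -math.inf), nearest


if __name__ == "__main__":
    for constant in ("0.1", "0.3", "0.5"):
        lo, hi = enclose_decimal(constant)
        print(constant, "lies in", (lo.hex(), hi.hex()))
```

If the conversion itself could be requested with a known directed
rounding mode, the two bounds could be obtained directly and the
widening step above would presumably be unnecessary; that is the kind
of guarantee a query such as IEEE_SUPPORT_CONVERSION would let a
program rely on.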