ISO/IEC JTC1/SC22/WG5 N2045

Result of the WG5 straw ballot on N2040

John Reid

N2044 asked this question:

Please answer the following question "Is N2040 ready for forwarding to SC22 as the DTS?" in one of these ways.

1) Yes.
2) Yes, but I recommend the following changes.
3) No, for the following reasons.
4) Abstain.

The numbers of answers in each category were:

3 for 1) Yes (Long, Reid, Whitlock)
2 for 2) Yes, but I recommend the following changes (Bader, Nagle)
5 for 3) No, for the following reasons (Cohen, Corbett, Maclaren, Muxworthy, Snyder)
0 for 4) Abstain

The ballot has failed. I request J3 to consider all the comments and prepare a revised version for final approval by WG5 and submission to SC22 for PDTS ballot.

Here are the comments and reasons. I have included an edited version of the comment of Anton Shterenlikht that appeared in comp-f90 because he has noticed errors in A.3.1.

Reinhold Bader

2) Yes, but I recommend the following changes.

[1:7] After "examples" add " that illustrate the semantics described"
Reason: Many of these examples are in the Annex.

[14:30] Replace "constuct" by "construct".

[14:30+] Add the following text:
"Deallocation of coarrays is delayed until the statement that performs the deallocation on all active images of the current team has synchronized these images."
Reason: Avoid a race condition for definitions/references to such coarrays on the stalled image (cf. [31:31-34], [36:11-15]). It may be appropriate to also add a note that such a statement must be a DEALLOCATE or an invocation of MOVE_ALLOC, either of which must have STAT= specified.

[17:7] Delete superfluous space after "ISO_FORTRAN_ENV".

[33:16+] Add missing bullet
"* extensions of image selector syntax and semantics provide the capability to access coarray data across team boundaries;"

[33:19-20] Replace "provide low-level primitives ...
computation;" by " provide the ability to perform non-trivial operations across image boundaries on scalars of some intrinsic types in unordered segments;"
Reason: The text should describe what the atomics do, beyond the already existing ones.

[35:24-25] Replace by
"{In 4.5.6.2 The finalization process, replace the text of NOTE 4.48}
An implementation might need to ensure that when more than one coarray must be deallocated by execution of a single statement, they are deallocated in the same order on all images in the current team."
Reason: The term "event" now has a defined meaning that has nothing to do with the NOTE's scenario.

[36:35-36] Consider the statement SYNC MEMORY executed by all active images of the current team, one image of which has failed. According to the semantics defined here and in [38:25-26] error termination must be initiated on each executing image of the current team; in particular this involves cross-image activity that was not required by Fortran 2008. Was this intended? If not, is it sufficient to make the following edit to [36:35]:
Replace "If" by "Except in a SYNC MEMORY statement, if"?

[37:14] Before "FORM TEAM", insert "\uwave{EVENT POST, EVENT WAIT,}".
Reason: Similar to locks, events only impose one-way segment ordering, and this ordering is already defined in [18:21-24], so a SYNC MEMORY appears unnecessary. See 09-193r2 for the reasoning for LOCK/UNLOCK.

[37:18+] Add the following text
"{In 8.5.2 Segments, edit the first sentence of NOTE 8.34 as follows}
The model upon which the interpretation of a program is based is that there is a permanent memory location for each coarray and that all images \uwave{on which it is established} can access it."

[38:13] Delete "on all images"
Reason: For each statement it is clear on which images it is executed; this may be a subset of all images.

[42:22] Replace "in the current team when the coarray was established" by "in the most remotely removed current or ancestor team in which the coarray is established."
Reason: The problem with the present wording is that the set of images on which a coarray is established may change throughout execution time (and also across images). To avoid ambiguity, I suggest looking at the establishment at the point (and the image) where the intrinsic is executed. This also seems appropriate for assuring composability of the coarray team concept - a huge UCOBOUND that cannot be addressed by any means in the local context would not seem to make sense.

[44:15], [44:17] Replace "subcauses" by "subclauses", twice.

_____________________________________________________________________

Malcolm Cohen

3) No, for the following reasons.

(a) I agree with Robert Corbett's vote. My recommendation is that the transfer of control to the END TEAM statement should be available only for access to failed image data from within the CHANGE TEAM construct itself.

(a2) 5.9 states
"Otherwise, the executing image resumes execution at the END TEAM statement of the construct"
"the construct" lacks definition. There can be many CHANGE TEAM constructs, and more than one of them can be active. Presumably what is meant is either
(i) the innermost such construct or
(ii) the innermost such construct whose END TEAM statement has a STAT= specifier.
This needs to be explicitly stated. I note that in the case of executing code outside (but called from) a CHANGE TEAM construct, "innermost" has no meaning.

(b) The TS has merely scratched the surface of the semantics that are being specified for stalled image handling; much more work needs to be done to clarify what is supposed to happen (e.g. which variables become undefined, etc.). Even for failed images some additional work appears to be needed...

(c) I do not agree with the syntax for specifying a team variable in an image-selector, as we use double colons following type-specs and other related attributes, which this is certainly not. A single colon would be acceptable.

(d) FAIL IMAGE is insufficiently specified.
- The syntax is "FAIL IMAGE ". I see no purpose in using the BNF rule here.
- "Execution of a FAIL IMAGE statement causes the executing image to behave as if it has failed." I think that should be "...become a failed image."
- " No further statements are executed by that image." I think it would be clearer to state explicitly that image termination is not initiated by this statement, e.g. " Neither normal nor error termination is initiated, but no further statements are executed by that image."
- "When an image executes a FAIL IMAGE statement, its stop code, if any, is made available in a processor-dependent manner." This is not only completely useless, but also missing any useful recommendation; e.g. for STOP and ERROR STOP we recommend "formatted output to [ERROR_UNIT]".

(e) Clause 1 states
"This Technical Specification does not specify formal data consistency or progress models. Some level of asynchronous progress is required to ensure that the examples in clauses 6 and 7 are conforming."
- point 1: there are no examples in clause 6;
- point 2: I found no useful examples in clause 7, by which I mean any that are bigger than 1 statement and that make any use of data consistency or asynchronous progress;
- point 3: were there useful examples the question would not be whether they were conforming, but whether they WORKED on any conforming implementation of the TS.

(f) Clause 1 continues
"Developing the formal data consistency and progress models is left until the integration of these facilities into ISO/IEC 1539-1."
We need to get started on this straightaway, not leave it to the last minute.

_____________________________________________________________________

Robert Corbett

My vote is 3) No, for the following reasons.

I am still concerned about the features described in Clause 5.9.

I understand that allowing stalled images to resume execution is a desired feature. I am not convinced that the feature as described in the DTS can be implemented without imposing a severe performance penalty. I understand that the ability to resume stalled images is an optional feature. I think that even an optional feature should be required to be implementable.

I would change my vote if a description of how the feature could be implemented is provided, assuming that the proposed implementation is reasonable. (Implementation via an interpreter, for example, would not satisfy me.) I would like the proposed implementation to be based on hardware and systems software that is commonly available. A proposal for an implementation for x86/x64 Linux would be fine. The description need not be part of the TS. A separate paper, not subject to approval, would suffice.

One implementation proposal I shall not accept is that the implementation should be the same as whatever the GCC implementation of C++ does for exception handling. I spoke with a member of Oracle's C++ team, and he said that Oracle's implementation of C++ exception handling could not do everything I told him the DTS requires.

The DTS imposes some implicit requirements on processors. For example, some Fortran features require an implementation to perform synchronization. An implementation of a CRITICAL construct, a SYNC ALL statement, a parallel reduction, or input/output is likely to involve synchronization. If an image stalls on a data reference during the execution of a CRITICAL construct within the scope of execution of a CHANGE TEAM construct, I assume that the DTS assumes that a lock held by the image as part of the synchronization done for the CRITICAL construct must be released before execution of the stalled image resumes.

The DTS does not appear to impose a requirement that storage allocated during execution of a stalled image be released before execution of the stalled image resumes. Is the possible memory leak permitted?

Fortran processors often acquire system resources during execution. For example, some operating systems allow a process to use at most a fixed number of locks and events. To avoid running out of the system resources, the process must release resources it acquired when it no longer needs them. Is it intended that the DTS require that a process release such resources as are no longer needed when an associated stalled image resumes execution, or is it a quality of implementation issue?

____________________________________________________________________

Nick Maclaren

3) No, for the reasons given in N2038, N2013 and other votes. I need to reiterate that neither response in N2039 even addresses my comments.

I believe that incorporating the TS into the main standard will cause serious harm to Fortran, because the (semantic) difficulties cannot be resolved (let alone specified unambiguously) in the time available. Indeed, it is not clear even that they ARE soluble, because this TS is specifying a feature that is beyond the state of the art, and has been for half a century.

I would be prepared to change my vote to abstain if the decision to incorporate it were reversed.

___________________________________________________________________

David Muxworthy

3) No, for the following reasons.

The timescales specified in N2024 do not allow adequate time for the designs in N2040 to be implemented and proved to be robust and portable before being standardized.

I would change my vote if "the next revision" on page iv were to be changed to "a future revision".

___________________________________________________________________

Dan Nagle

Y2) Yes with comment.

Comments in n2040,
change "functions" to "subroutines" at [31:3]
change "function" to "subroutine" at [31:17]

__________________________________________________

Anton Shterenlikht

A.3.1

In the first example, why are x and y defined as coarray variables? This fact seems to be completely unused. Also, is it not possible for image P to read x_dot_y (line 8) from image Q, before this variable has been defined on image Q in line 7? Is this what Note 7.4 is saying?

In the second example, line 17, j_max is undefined. I think what was meant is:

16 integer :: j_max, j_max_location
   j_max = j
17 call co_max(j_max)

__________________________________________________

Van Snyder

My vote is, reluctantly, 3) No, for the following reasons.

I am concerned by Robert Corbett's comments. I don't quite know what to do because I'm not expert in the area of synchronization mechanisms.

One concern that especially troubles me is Robert's observation that there is no specification that locks held by a stalled image be unlocked when the image resumes execution, and critical sections in which stalled images are executing when they stalled be considered to be completed. I don't know what other "gotchas" lurk in similar areas.

My contacts on the Ada committee assure me that exception handling can be done with very low overhead, but I have not asked them about interactions between exception handling, locks, critical sections, and synchronization.

I'm tempted to abstain, but the lack of description of the interaction of resuming a stalled image with synchronization, locks, and critical sections leads me to vote no.