ISO/IEC JTC1/SC22/WG5 N1999

Result of the WG5 letter ballot on N1996

John Reid

N1997 asked this question:

Please answer the following question "Is N1996 ready for forwarding to SC22 as the DTS?" in one of these ways.

1) Yes.
2) Yes, but I recommend the following changes.
3) No, for the following reasons.
4) Abstain.

The numbers of answers in each category were:

 0 for 1) Yes.
 0 for 2) Yes, but I recommend the following changes.
10 for 3) No, for the following reasons (Bader, Chen, Cohen, Corbett, Long, Maclaren, Muxworthy, Reid, Snyder, Whitlock)
 0 for 4) Abstain.

The ballot has failed. J3 is requested to prepare a revised version that takes the comments into account.

Here are the responses in detail.

Reinhold Bader

3) No, for the following reasons:

* The resilience feature has not yet received sufficient attention,
* There still exist some problems with ancestor team coindexing,
* Some clarifying words about the event model as well as atomics may still be needed,
* 13-359 points out a number of outstanding issues that need resolution.

Details on specific issues are given in the following. Unless explicitly indicated otherwise, all [page:line] markers are references to N1996.

Section 5:
~~~~~~~~~~

(5A) Ancestor coindexing in CHANGE TEAM construct:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Section 5.4 specifies how data transfer between teams can be arranged inside a CHANGE TEAM construct. However, for an object that is addressed via A[outer :: i, j], it is still not fully clear how the corank and cobounds of A, which are needed to establish the coindex-to-image-index mapping, are defined in case the referenced team is a descendant of the team in which the coarray was established. Based on e-mail discussion on the Coarray-TS list, it seems that the best solution may be to allow (oblige?) the programmer to specify this via a RECODIMENSION statement inside the CHANGE TEAM block in the above-mentioned case.
An example that indicates how this may work is:

    REAL, ALLOCATABLE :: a[:,:,:], b[:]
    TYPE(team_type) :: outer, inner

    ALLOCATE(a[nx, ny, *], b[*])
    FORM TEAM (..., outer)
    CHANGE TEAM (outer)
      :                          ! (X)
      :                          ! initialize a and b using "outer"-local coindexing
      FORM TEAM (..., inner)
      CHANGE TEAM (inner)
        RECODIMENSION :: a[outer :: p:*], b[outer :: *]
        :
        a[outer :: i] = ...      ! a has new corank and cobounds
        b[outer :: j] = ...      ! b retains original corank and cobounds
      END TEAM
    END TEAM

It would be useful to also permit a RECODIMENSION statement for local coindexing. In the above example, statement (X) could then read

    RECODIMENSION :: a[ny,p:*]   ! same as a[outer :: ny,p:*]

which would make it easier to achieve consistent coindexing between the team-local context in "outer" and the ancestor-team context in "inner".

(5B) Comments on normative text in 5.1:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[9:4-5] mentions "image indices" in a context that actually refers to coindexing.

[9:14] appears to partially duplicate text for semantics specified in [9:36]; it may also cause confusion because the words "The current team is ..." also appear as a definition in [9:3-4].

(5C) Image failure:
~~~~~~~~~~~~~~~~~~~

(1) It is desirable that the programmer can determine whether or not the implementation supports continuing in the face of image failure. How about requiring STAT_FAILED_IMAGE to be a negative value if this support is not available at all?

(2) Imagine that a coarray code runs with four images on two compute nodes, with two images per node. If the interconnect between the two nodes fails, an implementation supporting resilience may well continue executing all four images, but each image pair will have a different assessment of which images have failed. In my view, the draft TS lacks a specification that provides a minimal amount of consistency.
For example, one could specify that failure j requires the implementation to decompose the initial set of images into subsets Aj(1), ..., Aj(Nj), Bj with the following properties:

(a) for each k in 1, ..., Nj, the images in Aj(k) continue execution, and consider all images outside Aj(k) failed;

(b) for a failure i that occurs after failure j, each Ai(k) (k = 1, ..., Ni) must be a subset of some Aj(kk), and Bi must be a superset of Bj.

(I know that saying "after" is fraught with peril, but determining the temporal order in which errors are detected could surely be left to the implementation.)

(3) As example (A.2.1) shows, use of the resilience feature does not depend on the use of teams. I suggest moving the description of the feature to a section of its own.

(4) I'm wondering whether some additional words should be put into normative text about the definition status of variables whose values depend on references to coarrays on failed images. Presently there is only NOTE 5.6, but for an implementation that can actually continue execution in the face of a coindexed reference to a failed image this may be insufficient.

(5) I suspect there are some problems with [30:30-41]. The first is that SYNC TEAM does not show up in [30:30-31], presumably because error detection is restricted to the current team; however, this leaves the situation for SYNC TEAM undefined. The second is that [30:39-41] appears largely to remove the synchronization properties of image control statements for the non-failed images, even in case the error condition is STAT_FAILED_IMAGE. If this is intended, the resilience aspect of example (A.2.1) will not work.
However, I think even example (A.1.2) cannot, in general, work properly in this case, for at least two reasons:

* re-entering a team execution context via CHANGE TEAM may not have appropriate synchronization properties (unless perhaps the specification refers to the *new* team when saying "current team", but this is not fully clear);
* deallocation of coarrays inside the team execution context should surely follow the same rule as SYNC ALL, leading to a race condition on the non-failed images.

So the following questions arise:

* Was there a particular reason to remove the synchronization properties of image control statements in case STAT_FAILED_IMAGE occurs? If so, why was this not applied to (DE)ALLOCATE?
* Would it not be more appropriate to describe the effect of STAT_FAILED_IMAGE for each image control statement individually, retaining as much synchronization as possible? Otherwise I fear that invalid data will flood the program while it is attempting to recover.

(6) With the most recent F08 interps, a call to the MOVE_ALLOC intrinsic may now be an image control statement. Therefore it will be necessary to add a STAT argument to this subroutine, because otherwise code using it cannot be made resilient. Also, a MOVE_ALLOC that uses coarray actual arguments cannot be PURE (but fixing that may be outside the scope of the TS).

(7) Note 5.7 points out that image 1 plays a special role because standard input is preconnected to that image; however, in the context of fail-safe execution this may not be very relevant, since the recommended practice is to specify input files via command line arguments anyway. It may also be worth pointing out that standard error and standard output will probably get lost if image 1 is among the failed images; again, this does not necessarily adversely affect fail-safe execution if the program's I/O is appropriately set up.
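The consistency requirement proposed in item (2) can be stated as a checkable property of successive failure outcomes. The Python sketch below is only a model, not proposed TS text: it assumes that the outcome of a failure is represented as the list of surviving groups Aj(1), ..., Aj(Nj) together with the failed set Bj, and the function names are illustrative. It checks that the subsets decompose the initial set of images and that a later outcome satisfies property (b) with respect to an earlier one, using the two-node scenario from item (2).

```python
from typing import List, Set, Tuple

# Assumed model: the outcome of one failure is (groups, failed), where
# groups holds the sets Aj(1), ..., Aj(Nj) and failed is the set Bj.
Outcome = Tuple[List[Set[int]], Set[int]]


def is_decomposition(outcome: Outcome, images: Set[int]) -> bool:
    """The groups and the failed set must be disjoint and cover all images."""
    groups, failed = outcome
    parts = groups + [failed]
    covered = set().union(*parts)
    return covered == images and sum(len(p) for p in parts) == len(images)


def is_consistent_refinement(earlier: Outcome, later: Outcome) -> bool:
    """Property (b): each later group lies inside some earlier group, and
    the later failed set contains the earlier failed set."""
    earlier_groups, earlier_failed = earlier
    later_groups, later_failed = later
    nested = all(any(g <= old for old in earlier_groups) for g in later_groups)
    return nested and later_failed >= earlier_failed


images = {1, 2, 3, 4}
# Two images per node; the interconnect between the nodes fails.
link_failure: Outcome = ([{1, 2}, {3, 4}], set())
# Afterwards, image 4 fails outright.
then_image_4: Outcome = ([{1, 2}, {3}], {4})
# An assessment that regroups images across the broken link.
regrouped: Outcome = ([{1, 3}], {2, 4})

print(is_decomposition(link_failure, images))                # True
print(is_consistent_refinement(link_failure, then_image_4))  # True
print(is_consistent_refinement(link_failure, regrouped))     # False
```

Property (a), i.e. what each group considers failed, is a statement about each image's local view and is not modelled here.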
(5D) Note 5.2
~~~~~~~~~~~~~

The first line of that NOTE has the text "array A(0,N+1)", which presumably should read "array A(0:N+1)". Furthermore, I assume that array elements A(1) and A(N) are updated by the iteration procedure, and therefore a second "SYNC TEAM (INITIAL)" statement needs to be inserted just prior to END DO.

(5E) Team identity
~~~~~~~~~~~~~~~~~~

Given the quite complicated constraints that prevent team variables from appearing in a variable definition context, and also the addition of GET_TEAM, it might be worth considering a definition of team that is more decoupled from a concrete instance stored in a particular team variable. A team might be characterized by

* the subset of images in the initial team,
* the value of its ID,
* the mapping of local image indices to initial-team image indices

(based on this, a comparison operator might even be provided). This would also allow prefabrication of teams via FORM TEAM statements executed in the initial team that can subsequently be used in nested CHANGE TEAM blocks, subject to inclusivity rules, i.e. requirements ensuring that the team executing CHANGE TEAM is always a superset of the images of the referenced subteam. It is unclear to me whether such prefabrication is permitted under the present draft's provisions; if so, there appears to be a lack of consistency in any case.

Section 6:
~~~~~~~~~~

Given the discussion on events on the coarray-ts mailing list, as well as the Editor's comments on clause 6 (13-359 I-6[a-c]), I think proper usage of events involves the following requirements:

(6A) On the programmer: the number of posts to an event in otherwise unordered segments must always be guaranteed to be matched by the same number of waits. A situation that illustrates this is

    Image 1        Image 2        Image 3
                   A2             A3
    A1             Post ev[1]     Post ev[1]
    Wait(1) ev
    B1
    Wait(2) ev
    C1

where the question is: how is B1 ordered against A2 and against A3?
Given the present wording in N1996 [16:8-13] and NOTE 6.2, the following interleavings of atomic event updates might occur:

    case 1                       case 2
    Post ev[1] on Image 2        Post ev[1] on Image 3
    Post ev[1] on Image 3        Post ev[1] on Image 2
    Wait(1) ev on Image 1        Wait(1) ev on Image 1

    case 3                       case 4
    Post ev[1] on Image 2        Post ev[1] on Image 3
    Wait(1) ev on Image 1        Wait(1) ev on Image 1
    Post ev[1] on Image 3        Post ev[1] on Image 2

The first two imply that B1 is ordered against both A2 and A3, but the program has no information about which of the four actually happened in any given run! So the answer is indeed: B1 may be ordered against A2, against A3, or against both. It follows that a program that wants to ensure segment ordering against both posting images (i.e., for C1) needs to perform both waits.

(6B) On the implementation: it must be guaranteed that the event counts seen by EVENT WAIT and EVENT_QUERY on the image on which the event is located eventually reflect the updates resulting from EVENT POST statements executed on any image, irrespective of segment ordering. See also the comments on the atomic examples below. It may be appropriate to disallow querying of remote events if a stronger requirement is not considered desirable.

Because of (6A) and (6B), I tend to agree with the Editor's items I-6a and I-6b, but believe that I-6c is not needed, except perhaps for diagnostic purposes. In particular, example (A.2.2) starts out violating (6A) and tries to get around this by using MAX_COUNT; programs with multiple producers should use a different method (e.g. teams, with one task per team acting as a producer) in order to improve scalability. John Reid has suggested a much better example (a tree structure from a multifrontal solver) that should replace (A.2.2).

Section 7:
~~~~~~~~~~

(7A) FAILED_IMAGES:
~~~~~~~~~~~~~~~~~~~

It may not be desirable to have to exit the team execution context to obtain information about which images in the initial team have failed.
Therefore I suggest adding an optional DISTANCE argument to the FAILED_IMAGES intrinsic.

(7B) GET_TEAM:
~~~~~~~~~~~~~~

Because of [11:4-5], the second example's statement

    A [PARENT_TEAM :: 1] = 4.2

in [25:1] is non-conforming (the text [11:4-5] was introduced to address my comment N1989/(A.2.3)). My conclusion is that, at the very least, the DISTANCE argument should be removed from this function. The second example might read

    SUBROUTINE TT (A)
      USE, INTRINSIC :: ISO_FORTRAN_ENV
      REAL :: A[*]
      TYPE(TEAM_TYPE) :: INVOKING_TEAM, NEW_TEAM
      INTEGER :: I, ID

      CALL GET_TEAM(INVOKING_TEAM)
      ID = ...                        ! calculate team membership
      FORM TEAM(ID, NEW_TEAM)
      CHANGE TEAM(NEW_TEAM)
        ...                           ! process A on each team and define I
        SYNC TEAM (INVOKING_TEAM)
        ... = A[INVOKING_TEAM :: I]
        ...                           ! further processing not involving A
      END TEAM
    END SUBROUTINE

In the above situation I'd consider it advisable to have a separate dummy argument of TYPE(TEAM_TYPE) anyway, so the usefulness of GET_TEAM reduces to producing the value of the initial team.

(7C) Intrinsics in section 7.5:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In [26:11] and [26:21], brackets "()" should probably be added after the intrinsic names.

Annex A:
~~~~~~~~

Example (A.1.2):
~~~~~~~~~~~~~~~~

Replace [36:15] by

    IF (this_image() <= images_used) THEN
      read_checkpoint = .FALSE.
    ELSE
      read_checkpoint = .TRUE.
    END IF

Reason: images outside the working set will always need to read a checkpoint once activated.

In [36:36], replace "SUBTEAM" by "TEAM" (renamed construct).

Example (A.2.1):
~~~~~~~~~~~~~~~~

In [37:36], add ")" after "num_images()".

In [37:37], add ")" at the end of the statement.

Delete [37:48] and [38:1]. Image failure here may or may not imply that the corresponding work item is lost. In any case, the program as written cannot re-send it. An appropriate comment could replace the two deleted lines.

Example (A.2.2):
~~~~~~~~~~~~~~~~

See the discussion near the end of the comments on Section 6 above.
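The case analysis in (6A) can be reproduced by brute-force enumeration. The Python sketch below is a toy model rather than the TS semantics: it assumes, following the reading of [16:8-13] above, that a wait is ordered against exactly those posts whose atomic updates precede it, and that a wait can only complete while the count is positive. The event names are made up for the illustration.

```python
from itertools import permutations
from typing import List, Set, Tuple

Order = Tuple[str, ...]


def feasible(order: Order) -> bool:
    """Waits run in program order on image 1 and need a positive count."""
    waits = [e for e in order if e.startswith("wait")]
    if waits != sorted(waits):
        return False
    count = 0
    for event in order:
        if event.startswith("post"):
            count += 1
        elif count == 0:
            return False  # this wait could not have completed yet
        else:
            count -= 1
    return True


def interleavings(events: Order) -> List[Order]:
    """All orders of the atomic updates that the model allows."""
    return [order for order in permutations(events) if feasible(order)]


def posts_before(order: Order, wait: str) -> Set[str]:
    """Posts whose updates precede the given wait in this interleaving."""
    return {e for e in order[: order.index(wait)] if e.startswith("post")}


one_wait = interleavings(("post_from_2", "post_from_3", "wait_1"))
two_waits = interleavings(("post_from_2", "post_from_3", "wait_1", "wait_2"))

for order in one_wait:  # the four cases listed above, in some order
    print(order, sorted(posts_before(order, "wait_1")))

both = {"post_from_2", "post_from_3"}
print(all(posts_before(o, "wait_2") == both for o in two_waits))  # True
```

In this model, B1 is ordered against both posts only in the interleavings where both updates precede the first wait, whereas the segment after the second wait (C1) is ordered against both in every feasible interleaving.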
Examples for atomic usage:
~~~~~~~~~~~~~~~~~~~~~~~~~~

Section A.3.2.2 gives a number of examples that produce possibly surprising results. I'd also like to see some that illustrate useful expected behaviour. For example,

    INTEGER(atomic_int_kind) :: x[*] = 0, z = 0

    CALL ATOMIC_ADD(x[1], 1)                 ! (A)
    IF (THIS_IMAGE() == 2) THEN
      wait : DO
        CALL ATOMIC_REF(z, x[1])             ! (B)
        IF (z == NUM_IMAGES()) THEN
          EXIT wait
        END IF
      END DO wait                            ! (C)
    END IF

NOTE 13.1 in the Fortran 2008 standard says that such use is processor dependent. However, I'd like to know the answers to the following questions for the above example:

(1) Is the "wait" loop guaranteed to complete? If this is not the case, I think some words should be added to the normative text defining the semantics of atomics, along the lines of "The effect of the complete sequence of executed atomic updates shall eventually become visible to all images even if no segment ordering occurs." Performance variations we must live with under the regime of QOI, but the purpose the atomics were designed for should be fulfilled by every implementation. It seems to me that this may also be the root cause of Nick's worries about circularity in the event model; the same principle should therefore also apply to the count stored inside a TYPE(event_type) variable (maybe only locally?); this is required for guaranteed progress in example (A.2.1), [37:46] and [38:7].

(2) Assuming that SYNC MEMORY statements are added to the above immediately before (A) and immediately after (C), is it guaranteed that the segments preceding the first SYNC MEMORY on all images are ordered against the segment following the second SYNC MEMORY on image 2?

_______________________________________________________________________

Daniel Chen

3) No, for the following reasons:

1. From Van's and Reinhold's comments, I think there is indeed an issue with corank, cobounds and coindex mapping inside a CHANGE TEAM construct.
I think the RECODIMENSION proposal as well as the coassociation proposal should be studied further and considered.

The following are some minor comments on N1996.

2. [16:] 6.4: Should there be a constraint, the same as C604, for the event variable in an EVENT WAIT statement?

3. [20-22] 7.4.7: The SOURCE argument for all the collectives should be a coarray. N1996 only states this explicitly for CO_BROADCAST.

4. [22:] 7.4.9: It states that SOURCE shall not be polymorphic. The same wording should be added for the RESULT argument.

_______________________________________________________________________

Malcolm Cohen

3) No, for the following reasons.

a. The design is still under active technical development; in particular,
   - the team design has not reached consensus, with additional features and changes being requested,
   - the team design needs (at a minimum) much more explanation,
   - the event design has been recently changed, and it is far from clear that the new version is correctly described and sufficient for purpose.
b. Many technical and editorial problems and ambiguities, as reported by others.

I continue to be of the opinion that an explicit formal memory model for atomics would be a very good idea, but would not vote No purely on that alone.

_______________________________________________________________________

Robert Corbett

3) No, for the following reasons.

I am persuaded by the comments of those who voted earlier that the draft TS is not ready to be forwarded to SC22. In addition to those comments, I have an editorial comment, which by itself would not have caused me to vote no.

The second sentence of Clause 3 seems to be out of place. The sentence is obviously true, so obviously true that I wondered why it was worth stating. I was told that it was needed to make it clear that the name ISO_FORTRAN_ENV in Clauses 3.3 and 3.4 referred to ISO_FORTRAN_ENV as extended by the TS. Again, I thought that to be obvious, but if it is worth stating, it should be stated explicitly.
I suggest deleting the second sentence of Clause 3 and replacing Clauses 3.3 and 3.4 with

    3.3
    event variable
    scalar variable of the type EVENT_TYPE (6.2) from the intrinsic module ISO_FORTRAN_ENV as extended by this Technical Specification.

    3.4
    team variable
    scalar variable of the type TEAM_TYPE (5.2) from the intrinsic module ISO_FORTRAN_ENV as extended by this Technical Specification.

_______________________________________________________________________

Bill Long

No, for the following reasons.

I. Minor editorial fixes.
-------------------------

1) In 5.3, [10:17] delete "scalar". {The rule R504 for a team variable already says "scalar", so it is redundant here.}

2) In 5.3 the paragraph at [10:22-23] effectively prohibits deallocation of a team variable for an active team construct. This seems to make [9:34] redundant. Propose to delete [9:34].

3) In 5.4 Note 5.2 line 1, "A(0,N+1)" -> "A(0:N+1)".

4) In 5.5 [11:15] delete "It is an image control statement." and insert "The FORM TEAM statement is an image control statement." at the beginning of [12:1]. Then merge the paragraphs [12:1-2] and [12:3-6]. {Move image control statement bit to para where we discuss the meaning. Parallel to other subclauses describing statements that are image control statements.}

5) 5.5 Note 5.4 line 1, replace "coarrays regarded" with "corresponding coarrays on each image representing parts of a larger array". {Avoid potential confusion about coarrays being global objects.}

6) 5.7 Note 5.7 line 2, delete "on modern hardware". {The word "modern" becomes dated, inconsistent with the nature of a standard.}

7) In 6.3 [15:29] replace "event variable's count" with "count of the event variable". {Parallel wording to EVENT WAIT.}

8) In 7.1 [17:8] replace "intrinsics" with "intrinsic procedures".
{Subroutines and functions are pure, not 'intrinsics'.}

9) In 7.4, for the OLD arguments at [18:19], [19:6], [19:37], [20:10], replace "shall be a scalar of type integer with the same kind as ATOM" with "shall be a scalar and of the same type and kind as ATOM". {Wording more like ATOMIC_CAS, and allows for future possibility that additional types are allowed for ATOM.}

10) In 7.4.3 [19:27], replace "prior to the comparison" with "used for performing the comparison operation". {Clearer and more like similar wording in other examples.}

11) In 7.4.9 [22:33] replace "continues until" with "terminates when". {Possibly clearer; current text is not specific about what more might happen.}

12) In 7.4.13 [24:19], replace "The corresponding actual argument" with "It". {The argument descriptions for intrinsic procedures are for the actual arguments. See f2008 [325:5-6].}

13) In 7.5.2 [26:27] after "image index" insert "of the invoking image". {Clarification}

14) In 8.11 [33:27-28] replace "function" with "subroutine" twice. {From Dan's email.}

15) Noted misc fixes in Reinhold's ballot at [36:15], [36:36], [37:36] and [37:37], all of which appear valid.

II. More significant fixes/questions.
-------------------------------------

1) In Clause 2, Normative references, do we need to include references to the Corrigenda? If we do, how does this affect the Edits clause?

2) In 5.2, [9:34] could be clarified to begin "The team variable specified in the CHANGE TEAM statement of the current change team construct...shall not be deallocated." {It is possible for there to be multiple team variables with the same value. Ones not appearing in an active CHANGE TEAM statement should be OK to deallocate.}

Note: See I-2 above. If that is accepted, this edit is moot.
3) In 5.4 [11:1-4] replace the first sentence of the para with "If appears in an image selector its value shall be the same as the team variable specified in the CHANGE TEAM statement of a currently executing change team construct or the initial team. The image index computed using the specified cosubscripts is interpreted as an image index in the team specified by ." {The wording about FORM TEAM and GET_TEAM is duplicated in [10:19-21]. Furthermore, the original text did not make clear that the value relative to the team is the image index.}

4) In 5.7 [12:24], is the term "collective activity" well defined?

5) In 5.7, after Note 5.7, should we include a note saying that continued execution can depend on the nature of the program/algorithm?

6) Subclause 6.5 might not be needed at all, depending on the outcome of the discussion on MAX_COUNT.

7) In 7.3 [17:28-29], this para is overkill. It is allowed, for example, that the VALUE argument be a coarray, and there is no such requirement in that case. You could also have a coarray STAT argument. The requirement needs to be restricted to the argument on which the collective operation takes place.

8) In 7.4, in the descriptions of the ATOMIC_* subroutines, we use the "becomes defined with" terminology frequently. In other parts of the document we have moved to "is assigned". Do we want these changed as well?

9) In 7.4.9 CO_REDUCE [22:16], the statement "and the function shall be executed by all images of the current team" is not true. It is allowed, for example, for just one of the images to do the whole computation. We intend that, for any image that does execute the function, it is the same function.

10) In 7.4.9 [22:17], is it allowed for the RESULT argument to be polymorphic? This seems not to be symmetric with SOURCE.

11) In 7.4.11 [23:24] EVENT_QUERY, there should be an ERRMSG argument as well. Compare with the GET_xxx intrinsics.

12) In 7.4.13 [24:19-20], is the sentence "The corresponding ... ancestors." needed?
The sentence is poorly worded, and the redefinition it is actually intended to prevent is already prohibited elsewhere. Propose to delete rather than repair. If that is accepted, I-12 above is moot.

III. Issues not yet resolved.
-----------------------------

1) The MAX_COUNT feature in EVENT POST has problems (expanded from 13-359).

The intention is that operations on the count variable of an event be atomic. That is easy for a plain EVENT POST (atomic add 1) and EVENT WAIT (atomic add -1). This is also the case for an EVENT POST with a COUNT= specifier (atomic fetch-and-add 1), which would provide potentially useful information to the executing image. Similarly, an EVENT CLEAR statement could be implemented as an atomic AND with 0. Alternatively, an EVENT CLEAR could be implemented as an EVENT WAIT with a CLEAR='yes' qualifier, for example. Or, with richer semantics, as an EVENT WAIT (UNTIL_COUNT = ) form that would wait until the count got to the indicated level, then subtract the UNTIL_COUNT value from the event variable count and complete. These alternatives need consideration, as they provide useful functionality and can still be implemented atomically.

However, including a MAX_COUNT specifier in an EVENT POST statement can lead to a race condition. Such a post is fundamentally two operations: a fetch of the current value, followed by a decision on whether to increment. It is possible to get around this with repeated retries using a compare-and-swap operation, but the implementation will be significantly slower and could potentially deadlock. Therefore, I think the current MAX_COUNT= specifier is problematic and needs repair or removal.

Note that there is a special case that would work: a binary-only version that only sets the count to 1 if it is currently 0 (atomic compare-and-swap). As long as the user never executes a 'non-binary' event post on that event variable, this could be usable.
That involves either restricting MAX_COUNT to be 1 if it is specified, or changing the spelling to something like BINARY='yes', with the default 'no'.

2) EVENT_QUERY loose ends (from 13-359).

In 7.4.11 EVENT_QUERY, the COUNT argument is assigned the value 0 if an error occurs. That is not very informative. Perhaps COUNT = -1 would be more useful in the error case.

In 7.4.11 EVENT_QUERY, if the STATUS argument is not present and an error condition occurs, does the program terminate? It appears not. That is the same as for GET_COMMAND and friends with a STATUS argument. But the opportunities for failure here are greater (the image on which the event variable is located has failed, for example). Should STAT_FAILED_IMAGE be a valid value?

3) Deallocation of a saved coarray at the end of a CHANGE TEAM construct (from 13-359).

Note 5.1 explains that an implementation is responsible for deallocating coarrays at the end of a CHANGE TEAM construct. This is not trivial, since a coarray with the SAVE attribute that is allocated in a called subprogram will need to be tracked by the runtime in case the subprogram is called inside a CHANGE TEAM construct. No suggestion for a change; just a heads-up to implementors.

4) Do we want a cobounds remapping facility? This would be a new feature. Background and discussion follow.

In N1996, the TS 18508 draft from J3 meeting 202, the facility provided by the modified image selectors allows references to coarrays on images that are not part of the current team. This is enabled by syntax that specifies a different team that is in effect for that reference. The team has to be an ancestor of the current team, and include the image specified.

The cobounds for a coarray can only be specified in a declaration or ALLOCATE statement. Changing to a different team does not alter the cobounds or corank of an existing coarray. A coarray has only one set of cobounds at a given time, and only allocatable coarrays can change their cobounds during program execution.
Identifying the correct physical PE containing the coarray referenced with the new syntax involves two steps. First, an image index is computed using the specified cosubscripts and the current cobounds of the coarray. Second, the image index is converted to a physical PE by a team-specific mapping.

Suppose a coarray

    REAL :: A(:)[N1,N2,*]

exists (either static, or allocatable and allocated with those cobounds) on each image on entry to a CHANGE TEAM construct. For statements executed during execution of the CHANGE TEAM construct:

Case 1: No team is specified in the reference:

    X(:) = A(:)[i,j,k]

This reference is relative to the current team. The cobounds used to compute the image index are the ones that existed when the CHANGE TEAM construct began. If the computed image index is outside the range 1..num_images() for the current team, the reference is in error. If the image index is in the valid range, the mapping between image indices and physical PEs for the current team is used to identify the physical PE containing the referenced coarray. The computation of the correct coarray location is unambiguous in this case, though the selection of the values [i,j,k] might not be intuitive.

Case 2: An ancestor team, pteam, is specified in the reference:

    X(:) = A(:)[pteam :: i,j,k]

This reference is relative to the team specified by the value of the team variable pteam. The computation of the image index is exactly the same as in Case 1, with the current cobounds of A(:) used. The value of num_images() used in the range check for a valid image index is the number of images in team pteam. The image identified has to be an image that is part of team pteam; otherwise an error occurs. The mapping between the computed image index and a physical PE location for A is the one for team pteam.
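The two steps and the two cases above can be modelled in a few lines of Python. This is a sketch under stated assumptions: the image index is computed in column-major order, as for the Fortran 2008 IMAGE_INDEX intrinsic, and a team is represented (hypothetically) by the list of physical PEs of its members, ordered by image index within that team. The same cobounds are used in both cases; only the range check and the mapping change with the team.

```python
from typing import Sequence


def image_index(cosubscripts: Sequence[int], lcobounds: Sequence[int],
                ucobounds: Sequence[int]) -> int:
    """Column-major image index; ucobounds omits the final '*' codimension."""
    index, stride = 1, 1
    for k, (sub, low) in enumerate(zip(cosubscripts, lcobounds)):
        index += (sub - low) * stride
        if k < len(ucobounds):
            stride *= ucobounds[k] - low + 1
    return index


def locate(cosubscripts: Sequence[int], lcobounds: Sequence[int],
           ucobounds: Sequence[int], team: Sequence[int]) -> int:
    """Step 1: image index from the coarray's cobounds.
    Step 2: team-specific mapping from image index to physical PE."""
    index = image_index(cosubscripts, lcobounds, ucobounds)
    if not 1 <= index <= len(team):
        raise IndexError(f"image index {index} is outside 1..{len(team)}")
    return team[index - 1]


# A(:)[N1,N2,*] with N1 = 2 and N2 = 3, established on 12 images.
low, up = (1, 1, 1), (2, 3)
pteam = list(range(1, 13))        # hypothetical: PE number = initial image index
current = [7, 8, 9, 10, 11, 12]   # hypothetical current team of six images

print(locate((1, 2, 1), low, up, current))  # Case 1: image index 3 -> PE 9
print(locate((1, 2, 1), low, up, pteam))    # Case 2: image index 3 -> PE 3
print(locate((1, 1, 2), low, up, pteam))    # Case 2: image index 7 -> PE 7
try:
    locate((1, 1, 2), low, up, current)     # Case 1: image index 7 > 6
except IndexError as err:
    print(err)
```

The last reference shows why the cosubscripts in Case 1 are hard to choose when the corank exceeds one: cobounds fixed for twelve images now have to be used against a team of six.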
For the team-modified image selector syntax to work, the implementation would need to keep track of the mapping and num_images() information for all ancestors of the current team, and associate that with the team variable. This is probably the case anyway. It is not necessary to keep track of cobound information separately for each team; that information is tied to the coarray, not the team.

As noted in N1983, in the comments on the previous TS draft from Van Snyder, the correct values for the cosubscripts in Case 1 are not intuitive unless the corank is one. The existing team-modified syntax in Case 2 does not address that problem.

A facility enabled by a RECODIMENSION statement has been discussed on coarray-ts to address this problem. A RECODIMENSION statement defines the current cobounds for a coarray that exists during execution of the construct and is associated with the previously existing coarray of the same name. The cobounds and corank of the construct coarray may be different from those of the existing coarray. The association is similar to argument association, but is superior to it in that no procedure call is involved. How use of this feature would affect the ability to access the corresponding coarray on an image outside the current team (using a team-modified image selector) is less clear.

Alternatively, a syntax similar to the ASSOCIATE construct, as suggested by Van, could be employed. That has the advantage of using a different name for the construct entity, which would permit use of the original name for accesses outside the current team.

_______________________________________________________________________

Nick Maclaren

3) No, for the following reasons.

I regard comments A, B, H, I, J and K as the most serious, as they either cannot be fixed by additional functionality or wording changes, or are certain to cause massive problems.

Many of the comments in N1989 have not been addressed.
These include (with some slight modifications):

Teams
-----

Comment A
---------

5.2 and 5.3, p9:16-*, p10:*-35. It is still not clear whether TEAM_TYPE objects have value or association semantics. C502 and C503 are not enough, because of the implicit copying implied by passing assumed-shape arrays to explicit-shape or assumed-size ones, and the wording (e.g. R502) says 'variable'. This is linked to the next point, but is not the same.

However, I forgot the VALUE attribute and vector subscripts. Fortran 2008 12.5.2.3p4 and 16.6.1.6p4 make it very clear that a VALUE dummy argument, and dummies corresponding to vector-subscripted arrays, are NOT the same variable as the actual argument. While it could be said that these are variable definition contexts, they are NOT in the list in 16.6.5. Is it permitted to have VALUE dummies, or vector subscript actual arguments, and where is that stated in normative text?

Either the above loopholes must be closed, or TEAM_TYPE variables must be stated to have value semantics (in which case forbidding assignment is not needed). I cannot propose edits, as I have never discovered what mental model other people are using.

Comment B
---------

5.3, p10:28-35. Executing a common CHANGE TEAM statement the same number of times is not enough, because the variable could be a dummy argument associated with a different team on different images. There needs to be an explicit restriction (probably in lines 14-16) that all variables must have been created by the same execution of the same FORM TEAM statement with the same team-id.

    TYPE(TEAM_TYPE) :: a(NUM_IMAGES())
    DO i = 1, NUM_IMAGES()
      FORM TEAM (i, a(i))
    END DO
    CALL Fred(a(THIS_IMAGE()))

    SUBROUTINE Fred (x)
      TYPE(TEAM_TYPE) :: x
      CHANGE TEAM (x)
      ...

I cannot find anywhere in the text where that is forbidden, but it clearly makes no sense. In particular, 5.3 p10:28-35 becomes nonsense if it is allowed.

This is simple to fix. 5.3 p10:19-21.
After "intrinsic subroutine GET_TEAM (7.4.13).", add:

"All members of the team specified by team-variable shall execute the CHANGE TEAM statement, and team-variable shall specify the same team on all images."

Comment C
---------

5.6 p11:17+. Nothing is said about when resources may be released, and there is no mechanism for the user to free them. This is not reasonable: there needs to be some defined way for a programmer to avoid memory leaks when using FORM SUBTEAM heavily. Note that allowing deallocation is NOT enough, as cleaning up teams needs synchronisation, just as creating them does.

Comment D
---------

7.4.15 p26:5-7. I can find no guarantee that the subteam id is assigned in a defined order, and hope that is not the case. The example comments should say "Code for half of the images in the current team" and "Code for the other half of the images in the current team".

Events
------

Comment E
---------

6.3 p15:34. This still makes no sense, as an image control statement cannot occur within a segment! It should say something like "How sequences of posts that are not ordered by other segment ordering rules interleave with each other is processor dependent."

Comment F
---------

7.4.11 p23:25-36. It needs to say that EVENT_QUERY may be used in segments that are unordered with respect to EVENT POST on the same variable.

Collectives
-----------

Comment G
---------

7.4.9 p22:16. This makes no sense and does not address the comment in N1989 anyway. A reduction over N images needs only N-1 pairwise operations. It would be far better to leave it completely open and change:

", and the function shall be executed by all the images of the current team."

to:

". It is unspecified on which images it will be called, how many times and on which arguments."
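The N-1 count in Comment G can be checked with a minimal sketch, assuming the per-image values are held in one ordinary array and combined by a binomial tree. The names tree_reduction, tree_reduce and binary_op are hypothetical, and the tree is only one possible call pattern; the comment argues precisely that the DTS should not prescribe one.

```fortran
module tree_reduction
  implicit none
  abstract interface
    ! Interface of the user-supplied reduction operation
    pure integer function binary_op(a, b)
      integer, intent(in) :: a, b
    end function binary_op
  end interface
contains
  ! Combine vals(1:n), one value per image (n >= 1), with a binomial tree.
  ! ASSUMPTION: this schedule is illustrative only, not required by any text.
  ! At each level, position i absorbs the partial result held at i+step.
  ! Each call of OP merges two partial results into one, so reducing
  ! n values always takes exactly n-1 calls.
  subroutine tree_reduce(vals, op, total, ncalls)
    integer, intent(in) :: vals(:)
    procedure(binary_op) :: op
    integer, intent(out) :: total, ncalls
    integer :: work(size(vals)), n, step, i

    work = vals
    n = size(vals)
    ncalls = 0
    step = 1
    do while (step < n)
      do i = 1, n - step, 2 * step
        work(i) = op(work(i), work(i + step))
        ncalls = ncalls + 1
      end do
      step = 2 * step
    end do
    total = work(1)
  end subroutine tree_reduce
end module tree_reduction
```

For N = 5, for example, the tree makes 2, then 1, then 1 calls: four in all. Any schedule that merges two partial results per call needs the same number, which is why requiring every image to execute the function makes no sense.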
New Substantive Points
----------------------

EVENT POST MAX_COUNT
--------------------

Comment H
---------

Upon thinking about how to implement these facilities, I realise that the availability of MAX_COUNT causes a serious performance loss. Without it, EVENT POST can be implemented by a simple fence and a message sent to the event owner. With it, the posting image needs to wait for a response from the event owner, which will often cause the posting image to block until the owning image reaches an active coarray statement. This is also noted in 13-359.

Requiring a maximum count of 1 would have the same loss in performance, but it would also reduce the model to one whose semantics are understood. As I have said before, I would regard that as a price worth paying. In short, I think that MAX_COUNT is a very bad idea, as it combines the disadvantages of both general and binary semaphores.

EVENT_QUERY
-----------

Comment I
---------

There have been multiple inconclusive email debates on exactly what is specified, with no consensus on what should be said in normative text. They have convinced me that it is not possible to produce a consistent specification for EVENT_QUERY without introducing a synchronisation model by the back door. This is particularly serious because of the EVENT_QUERY example in A.2.1 pp37-8.

One of the issues raised was whether programs like the following are conforming:

Example event_1:

    INTEGER :: x[*]
    TYPE(EVENT_TYPE) :: q[*]

    On image 1                 On image 2
    EVENT POST (q[2])          CALL EVENT_QUERY (q, n)
    x[3] = 123                 IF (n >= 2) THEN
    EVENT POST (q[2])             EVENT WAIT (q)
                                  x[3] = 456
                               END IF

Similarly, it is unclear whether the following program is required to complete:

Example event_2:

    On image 1                 On image 2
    EVENT POST (q[2])          DO
                                  CALL EVENT_QUERY (q, n)
                                  IF (n > 0) EXIT
                               END DO

Also, does the same answer hold if image 1 is the event owner? These are NOT minor points, because they have a major impact on how EVENT_QUERY can be implemented.
In particular, if those examples are to work, EVENT_QUERY has to be implemented using mechanisms very similar to those of EVENT POST with MAX_COUNT=0. Even if we restricted EVENT_QUERY to local operation, it would have to probe for incoming posts for example event_2 to work. Example event_1 would not add any extra inefficiency, but would complicate the logic for synchronisation on many systems.

At this late stage, I think the only feasible solution is to omit EVENT_QUERY entirely, pending a memory model.

Atomic Subroutines
------------------

Comment J
---------

A.3.2 is very welcome, and clarifies the current atomic subroutines considerably. Unfortunately, it is not enough to avoid problematic issues with the new atomic subroutines. The underlying problem is that the word 'atomic' has many possible meanings, has drifted over time, and not all of these make sense with the new atomic subroutines. There is an official ISO dictionary, but I have not been able to access a copy. Either Fortran needs to refer to some reasonably authoritative and explicit definition or it needs to define what it means.

In particular, it has more-or-less the following meaning: an operation completes in its entirety or makes no change to system state, without any other agent being able to see an intermediate condition, but without ANY implication of data consistency. This is one of the meanings used by Intel (see the Intel 64 and IA-32 Architectures Software Developer's Manual, volume 3, 8.1, Locked Atomic Operations and 8.1.1 Guaranteed Atomic Operations).

http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html

However, there is a problem here, which is whether the term 'atomic' implies 'coherence', which essentially means that updates cannot simply get lost even if they occur in parallel.
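What a lost update looks like can be modelled with a minimal sketch that treats an update as a separate load and store, whose steps from two images may interleave. It models only the observable outcome, not any particular hardware or the atomic subroutines themselves; the module and function names are hypothetical.

```fortran
module lost_update
  implicit none
contains
  ! MODEL ONLY: replay a schedule for two images that each add inc(img)
  ! to a shared counter (initially zero) by a separate load and store.
  ! sched(k) is the image that takes the k-th step; an image's first step
  ! loads the counter into its private register, and its second step
  ! stores register + inc(img).
  pure integer function replay(sched, inc) result(x)
    integer, intent(in) :: sched(:), inc(2)
    integer :: reg(2), phase(2), k, img

    x = 0
    reg = 0
    phase = 0
    do k = 1, size(sched)
      img = sched(k)
      if (phase(img) == 0) then
        reg(img) = x                ! load
      else
        x = reg(img) + inc(img)     ! store: may overwrite the other image's update
      end if
      phase(img) = phase(img) + 1
    end do
  end function replay
end module lost_update
```

With increments of 1 and 2, the schedule [1,1,2,2] (each update completes before the other starts) leaves 3, whereas [1,2,1,2] leaves 2 and [1,2,2,1] leaves 1: one of the two additions is lost. Coherence, in the sense used here, is the guarantee that only outcomes of the first kind occur.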
Coherence was not needed when atomic operations were used for interrupt handlers, but was rapidly discovered to be critical for parallelism when multi-CPU computers started to be used. The experts I have contacted have told me that modern computer science convention is to assume it, but that it is not implicit and any rigorous specification should state it explicitly. The C++ standard does just that (see 1.10 Multi-threaded executions and data races, paragraph 6). Note that Intel atomic accesses are coherent by default, but incoherent atomic accesses are possible if the above rules are followed but the MTRR of the memory is set to WB (see 8.2.5 Strengthening or Weakening the Memory-ordering Model).

With the existing atomic subroutines, the lack of coherence is not observable, provided that it does not cause two simultaneous atomic definitions to fail. However, without coherence, even simple use of (say) ATOMIC_ADD to accumulate totals is likely to give the wrong answer. See the next comment for a proposed solution.

Comment K
---------

Unfortunately, the above problem is made worse by the new atomic subroutines, which are functionally equivalent to OpenMP's 'capture' atomics. I enquired of the same experts, and the reason that most papers do not describe composite operations like update and capture is that doing so is much harder than for simple loading and storing; in particular, C++ does not have any such concepts in its memory model.

Take the following program:

    INTEGER(ATOMIC_INT_KIND) :: x[*] = 0
    INTEGER :: n = 0

    On image 1                       On image 2
    CALL ATOMIC_OR(x[3],z'1',n)      CALL ATOMIC_OR(x[3],z'1',n)
    PRINT *, n                       PRINT *, n

Printing 0 and 0 could very reasonably be said to be a valid optimisation, not least because the assignment to the OLD value is not part of the atomic operation. Indeed, the same could be said even if image 2 ORed z'2' into the value rather than z'1'.
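Under a serial-order reading of atomic execution, the possible OLD values for a program like the one above can be enumerated. This minimal sketch assumes that OLD is assigned as part of each atomic operation, which is exactly the point described as unclear; the module and function names are hypothetical.

```fortran
module serial_fetch_or
  implicit none
contains
  ! OLD values obtained when the fetch-and-OR operations vals(1:n), all on
  ! one atomic variable that starts at zero, execute one at a time in the
  ! serial order given by ORDER (a permutation of 1..n).
  ! ASSUMPTION: OLD is assigned inside each atomic operation.
  pure function olds_in_order(vals, order) result(old)
    integer, intent(in) :: vals(:), order(:)
    integer :: old(size(vals))
    integer :: x, k, op

    x = 0
    old = 0
    do k = 1, size(order)
      op = order(k)
      old(op) = x               ! value seen by this operation
      x = ior(x, vals(op))      ! then the variable is updated
    end do
  end function olds_in_order
end module serial_fetch_or
```

Both orders of the two z'1' operations give one 0 and one 1, and with z'2' on image 2 the pairs are (0, 1) and (2, 0). In no serial order do both images print 0, so under this reading that outcome would be excluded.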
Consider the following program: INTEGER(ATOMIC_INT_KIND) :: x[*] = 0 INTEGER :: n = 0 On image 1 On image 2 CALL ATOMIC_OR(x[3],z'1',n) CALL ATOMIC_OR(x[3],z'2',n) PRINT *, n CALL ATOMIC_OR(x[3],z'4',n) CALL ATOMIC_OR(x[3],z'8',n) PRINT *, n

This can obviously print 8 and 9, but is it allowed to print 0 and 3? And, worse, is it allowed to print 4 and 12? Fortran must do the same as C++ and say what it means, even if it does not specify a memory model; allowing such lunacies as the above (and they ARE plausible optimisations, even the second) is a recipe for massive user confusion.

A possible solution would be to modify C++'s rule, and change sentence 2 of Fortran 2008 13.1 paragraph 3 from:

"The effect of executing an atomic subroutine is as if the subroutine were executed instantaneously, thus not overlapping other atomic actions that might occur asynchronously."

to:

"The effect of executing atomic subroutines on a single atomic object is as if the subroutines were executed in some unspecified serial order, with none of the accesses to that object in any one subroutine execution interleaving with those in any other."

I believe that is the bare minimum necessary for sanity.

_______________________________________________________________________

David Muxworthy

3) No, for the following reasons.

Clearly, consensus on the design has not yet been achieved. Whether the eventual design can be implemented satisfactorily on multiple platforms is still to be proved. The statement about inclusion in the next revision of ISO/IEC 1539-1 (Introduction paragraph 5) should refer instead to “a future revision”.

______________________________________________________________________

John Reid

3) No, for the following reasons.

1. Rather hurriedly in Delft, we added the option of a team variable appearing in an image selector, e.g., a[parent::i,j].
The intention was to allow the cosubscripts of a coarray declared in an ancestor to be interpreted in exactly the same way in a change team construct as in the ancestor, for example, when performing halo exchanges. This does not work well if the coarray is a dummy argument, because the names of the team variables for the current team and its ancestors are then unknown. J3 therefore added the intrinsic GET_TEAM to place a copy of the value of the team variable of the ancestor at a level DISTANCE in a local team variable. It seems to me that a much better solution would be to specify the distance directly in the image selector, e.g., a[distance::i,j]. It is much simpler and there is far less scope for inconsistent setting of team variables. I would like GET_TEAM to be removed entirely.

2. We have added an optional DISTANCE argument to NUM_IMAGES. We need to do this also for LCOBOUND and UCOBOUND so that the coshape of an ancestor can be determined.

3. Add a new subroutine FAIL_IMAGE() whose effect is to cause the executing image to behave as failed. This is needed for testing a program that is intended to continue execution in the presence of failed images. An optional argument IMAGE might be added to give the effect of communication between the executing image and image IMAGE having failed - each would continue executing but see the other as failed.

4. I support the concept of RECODIMENSION that Reinhold Bader suggests in his ballot. I also support the view that Malcolm Cohen expressed in an email:

"I would prefer different syntax to be used when one intends to re-codimension an array, perhaps

    RECODIMENSION :: ...whatever

and this would not be a general specification statement, but part of the CHANGE TEAM syntax which would then be [ ]... And rather than "modifying the attribute of an existing object" (horrible), it would be declaring a "construct entity that is associated with the local entity of the same name".
We do already have construct entities that are associated with outer-scoped objects (via ASSOCIATE and SELECT TYPE), so this is not a new concept. In any case, one must write quite a bit of new text to specify how this is going to work, but making it a construct entity is probably easier than making the CHANGE TEAM construct into a scoping unit, thus wheeling out host association (already complicated) and then adding more complication to it.

> I suppose a logical question would be whether this should also be
> allowed in a BLOCK construct. Perhaps that should be left as an
> integration issue.

No, it cannot be left as an integration issue. It should be either part of the CHANGE TEAM syntax (and described as a construct entity), or a normal specification-stmt, in which case the CHANGE TEAM construct ought to be a scoping unit with a specification-part. Or some slight tweak of those major options."

5. (An edit) [11:15-16] Consider the sentence "The value of team-id specifies the team to which the executing image belongs." This is nonsense: the current team is the team to which the executing image belongs. Replace it by "The value of team-id specifies the new team to which the executing image will belong."

_______________________________________________________________________

Van Snyder

3) No, for the following reasons.

I have been on holiday in Asia for most of November and have not had the time to study the entire DTS in detail. Therefore, I comment here on only one aspect of the DTS.

I remain unconvinced that teams have been correctly designed, but even if they have been, they are not sufficiently well described. The phrase "image indices are relative to the current team" in Subclause 5.1 does not adequately explain what parts of a coarray are accessible in a subteam. The mapping from coarray coelements accessible in the parent team, and their cosubscripts, to coarray coelements accessible in the subteam, and their cosubscripts, needs to be more explicitly explained.
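The cosubscript-to-image-index mapping at issue is the column-major rule that Fortran 2008 applies in the IMAGE_INDEX intrinsic. A minimal sketch of that rule follows; the module and function names are hypothetical, and the re-codimensioned case in the note below only illustrates what a RECODIMENSION statement, as proposed in these ballots, might presumably do.

```fortran
module cosubscript_map
  implicit none
contains
  ! Image index selected by cosubscripts SUB of a coarray with lower
  ! cobounds LCO and upper cobounds UCO for the leading size(sub)-1
  ! codimensions, in a team of NIMAGES images. Like IMAGE_INDEX, the
  ! result is zero if the cosubscripts do not select a valid image.
  pure function image_index_of(sub, lco, uco, nimages) result(idx)
    integer, intent(in) :: sub(:), lco(:), uco(:), nimages
    integer :: idx, k, i, acc, stride

    idx = 0
    k = size(sub)
    acc = 1
    stride = 1
    do i = 1, k
      if (sub(i) < lco(i)) return
      if (i < k) then
        if (sub(i) > uco(i)) return
      end if
      acc = acc + (sub(i) - lco(i)) * stride
      ! Column-major: each later codimension strides over the earlier ones
      if (i < k) stride = stride * (uco(i) - lco(i) + 1)
    end do
    if (acc <= nimages) idx = acc
  end function image_index_of
end module cosubscript_map
```

For a coarray declared a[4,*] on 12 images, a[3,3] is image 11. In a 6-image subteam the leading coextent is still 4, so a[1,2] is image 5 and a[3,2] (which would be image 7) gives 0; with hypothetical re-codimensioned cobounds [2,*], a[2,3] would be image 6.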
More importantly, it is not possible to change the coextents of leading codimensions of coarrays of corank greater than one when a subteam commences execution. This means that teams are not very useful if one has coarrays of corank greater than one.

Finally, it is not possible for a subteam to access more coelements than the number of images in the subteam. This makes it difficult to handle cross-boundary effects in, say, an elliptic PDE problem, without using cosubscripts relative to a specific ancestor team. This is a fundamentally bad idea, antithetical to one of the reasons advanced for providing teams: software reuse.

Reinhold proposed a RECODIMENSION statement. This would presumably have an effect on leading codimensions analogous to the effect of a DIMENSION statement on leading dimensions of a non-coarray dummy argument of a procedure.

The attached text file provides more detail concerning what I believe to be the problems, and proposes a scheme of coarray coassociation similar to the association established by the ASSOCIATE construct.

.......................................................................

1. Problem description
----------------------

Within a subteam created by a CHANGE TEAM construct, it is desired to access a portion of a coarray belonging to the parent team, using cosubscripts such that the range of accessible coelements, taken in coarray coelement order, depends upon the subteam. It is undesirable to require the subteam to be aware of the mapping from coelements of the parent coarray to coelements germane to the subteam.

The phrase "image indices are relative to the current team" in Subclause 5.1 does not adequately explain what parts of a coarray are accessible in a subteam. The mapping from coarray coelements accessible in the parent team, and their cosubscripts, to coarray coelements accessible in the subteam, and their cosubscripts, needs to be more explicitly explained.
If this is described elsewhere, it needs to be in Subclause 5.1. For example, if one forms a subteam using 1+mod(this_image(),2) for the team-id, it is not obvious that the coelements of coarrays accessible in each subteam are the odd-numbered and even-numbered coelements of coarrays in the parent team, taken in the parent team's coarray coelement order (a concept we have not defined).

More importantly, it is impossible to change the coextents of the leading codimensions of coarrays of corank greater than one when a subteam commences execution. Suppose one has 100 images and a coarray with coextent [1:10,1:10]. Suppose one wishes to divide the current team into four subteams of 25 members each, each accessing a quadrant of the coarray with coextent [1:5,1:5]. All subteams would access the coarray with the leading coextent declared in the parent team, in this case [1:10]. We have no concept of copointers, coassociation, copointer corank remapping, cosections, or coassociation during procedure reference or execution of an ASSOCIATE or SELECT TYPE statement, analogous to pointer association, argument association, or construct association for arrays. This means either that teams are not very useful if one has coarrays of corank greater than one, or that a subteam must be made aware of at least some properties of the mapping from the parent team to the subteam, analogously to the way that subprograms having explicit-shape dummy arguments need to be told what parts of leading dimensions to use.

A third problem is that it is not possible to access coelements outside the mapping for the current team, or for parts of a coarray to be accessible in more than one subteam using subteam-relative cosubscripts. For example, in the above problem, one might wish to divide the [1:10,1:10] coarray into pieces with coextents [1:6,1:6], with the first team having coelements [1:6,1:6], the second having [4:10,1:6], the third having [1:6,4:10], and the last having [4:10,4:10].
This makes it difficult to handle cross-boundary effects between regions of, say, an elliptic PDE problem. The present DTS requires the use of cosubscripts whose values apply to a specific ancestor team. This is a fundamentally bad idea, antithetical to one of the reasons advanced for providing teams: software reuse.

3. Proposal
-----------

An addition to the syntax of the CHANGE TEAM statement, analogous to the ASSOCIATE statement, could specify coassociation. For example, assuming s = 1+mod(this_image(),2), one might use the following to associate A1 with the odd (even) coelements of A:

    change team ( t(s), a1 => a[s:*:2] )
      ! Herein, A1 is a coarray that is coassociated with the
      ! odd-numbered coelements of A in subteam 1 and the even-numbered
      ! ones in subteam 2, of which there are n/2, and therefore
      ! cosubscripts of A1 in the range 1:n/2 access the expected
      ! coelements of A.
      ...
    end team

The mapping from A to A1 is not necessarily the same as the mapping established by the FORM TEAM statement. If it is necessary for the mappings to correspond, that should be explicitly required. A STAT= specifier value could indicate a mismatch.

In the case of a coarray of corank greater than one, one might compute cobounds that depend upon the subteam id, and do something like

    change team ( t2(s1,s2), c1 => c[i1:i2,j1:*] )

One could calculate the cosubscripts to handle the problem of a cosection belonging to more than one subteam. For example, one subteam might have i1:i2 == 1:6 while another has i1:i2 == 4:10. This would be inconsistent with the proposition that the mapping shall correspond to the one implied by the FORM SUBTEAM statements that created the team variable.

Vector-subscripted cosections would not be an insurmountable problem here (until A1 or C1 is an actual argument corresponding to a coarray dummy argument, which should perhaps be prohibited), because the processor could clearly see the vector.
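The quadrant example in the problem description can be made concrete with a minimal sketch. It assumes that image t of a subteam is the t-th image of its quadrant, taken in the parent team's coelement (column-major) order, which the DTS does not state, and that a coassociation c1 => c[i1:i2,j1:*] would renumber the cobounds of C1 from 1. Both assumptions, and all names, are hypothetical.

```fortran
module quadrant_map
  implicit none
  integer, parameter :: n_lead = 10   ! leading coextent declared in the parent team
  integer, parameter :: n_quad = 5    ! each 25-image subteam covers a 5 x 5 quadrant
contains
  ! Parent cosubscripts reached when the subteam whose quadrant starts at
  ! parent coelement [i1,j1] writes [i,j] under the current rule, where
  ! the leading coextent stays at n_lead.
  ! ASSUMPTION: subteam image t is the t-th quadrant image in column-major order.
  pure function parent_cosubs_current_rule(i, j, i1, j1) result(pc)
    integer, intent(in) :: i, j, i1, j1
    integer :: pc(2), t

    t = i + n_lead * (j - 1)              ! image index in the subteam (1..25)
    pc(1) = i1 + mod(t - 1, n_quad)       ! row within the quadrant
    pc(2) = j1 + (t - 1) / n_quad         ! column within the quadrant
  end function parent_cosubs_current_rule

  ! Parent cosubscripts reached through a hypothetical coassociation
  ! c1 => c[i1:i1+4, j1:*].
  ! ASSUMPTION: the cobounds of C1 are renumbered from 1.
  pure function parent_cosubs_coassociated(i, j, i1, j1) result(pc)
    integer, intent(in) :: i, j, i1, j1
    integer :: pc(2)

    pc = [i1 + i - 1, j1 + j - 1]
  end function parent_cosubs_coassociated
end module quadrant_map
```

For the quadrant [6:10,1:5], the subteam must write [6,1] to reach parent coelement [6,2], while [1,2] reaches [6,3]. Under the coassociation, c1[1,2] reaches [6,2], which is the reading a subteam written for a 5 x 5 region would expect.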
_______________________________________________________________________

Stan Whitlock

3) No, for the following reasons.

From several different comments and discussions, I think there are issues with corank, cobounds, and coindex mapping inside a change team construct. The RECODIMENSION and coassociation proposals also appear to need further work.