Additional comments to accompany the BSI vote on SC22 N4487
1. Error Termination (Subclause 2.3.5)
The current wording overspecifies error termination in paragraph 4 of 2.3.5 (Execution sequence). Specifically, requiring one image to be able to force others into synchronisation without them executing any special statement is a heavy burden on implementors, and may not always be feasible.
The current wording specifies that, if image 1 executes an ALL STOP statement while image N is not responding, image 1 should NOT proceed to termination and close all of its output files, but should simply hang.
Also, the current specification makes it impossible for a Fortran compiler to implement coarrays on top of MPI. The first paragraph of the specification of MPI_Abort is:
This routine makes a "best attempt" to abort all tasks in the group of comm. This function does not require that the invoking environment take any action with the error code. However, a Unix or POSIX environment should handle this as a return errorcode from the main program.
And the first paragraph of the specification of MPI_Finalize is:
This routine cleans up all MPI state. Each process must call MPI_FINALIZE before it exits. Unless there has been a call to MPI_ABORT, each process must ensure that all pending non-blocking communications are (locally) complete before calling MPI_FINALIZE.
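There is no requirement for MPI_Abort to attempt to synchronise with the other processes, and many implementations do not do so; they merely request the operating system or job scheduler to kill the MPI job. A coarray implementation that uses MPI_Abort for error termination therefore cannot honour the synchronization step that the current wording requires.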
Proposal:
This reorganises the wording now in paragraphs 4 and 7 of 2.3.5 so as to specify separately the processes for normal and for error termination. The new paragraph 4 makes it clear that the intent of ALL STOP is to terminate the program immediately, but leaves largely unspecified how this is done.
Replace paragraph 4 of subclause 2.3.5 (page 33 lines 24-29):
Termination of execution of an image occurs in three steps: initiation, synchronization, and completion. All images synchronize execution at the second step so that no image starts the completion step until all images have finished the initiation step. Termination of execution of an image is either normal termination or error termination. An image that initiates normal termination also completes normal termination. An image that initiates error termination also completes error termination. The synchronization step is executed by all images. Termination of execution of the program occurs when all images have terminated execution.
by:
Termination of execution of a program is either normal termination or error termination. Normal termination occurs only if all images initiate normal termination and occurs in three steps: initiation, synchronization, and completion. In this case, all images synchronize execution at the second step so that no image starts the completion step until all images have finished the initiation step. Error termination occurs if any image initiates error termination. Once error termination has been initiated on an image, error termination is initiated on all images that have not already initiated error termination. Termination of execution of the program occurs when all images have terminated execution.
Delete paragraph 7 of subclause 2.3.5 (page 34 lines 2-3):
If an image initiates error termination, all other images that have not already initiated termination initiate error termination.
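To illustrate the proposed rule (not any implementation of it), the hypothetical module below models each image's termination state as an integer. The names termination_sketch, propagate and normal_completion are assumptions made for this sketch only. It shows the two properties of the new paragraph 4: error termination on one image forces error termination on every image that has not already initiated it, including an image that had initiated normal termination, and the synchronization step of normal termination is reached only when every image has initiated normal termination.

```fortran
module termination_sketch
  ! Hypothetical model of the proposed termination rule; not part of
  ! any processor or of the standard.
  implicit none
  integer, parameter :: running = 0, normal_init = 1, error_init = 2
contains
  ! Once any image has initiated error termination, every image that has
  ! not already done so (even one already in normal termination) does so.
  pure function propagate(state) result(next)
    integer, intent(in) :: state(:)
    integer :: next(size(state))
    next = state
    if (any(state == error_init)) next = error_init
  end function propagate

  ! Normal termination (with its synchronization step) can complete only
  ! if every image has initiated normal termination.
  pure logical function normal_completion(state)
    integer, intent(in) :: state(:)
    normal_completion = all(state == normal_init)
  end function normal_completion
end module termination_sketch
```

A program would USE termination_sketch and pass an array holding one state per image; nothing here requires the images to synchronize once error termination has been initiated, which is the freedom the proposal gives implementors.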
2. Clarification of rectangular pattern (Subclause 2.4.7)
The following constructions are not clearly permitted by the text of 5.3.6 (CODIMENSION attribute), though they are permitted by 13.7.172 (UCOBOUND), and there is a clarifying NOTE in 6.6 (Image selectors). Neither is an obvious place to look for the precise specification of an explicit-coshape-spec.
! Program running on 17 images
INTEGER, SAVE :: coarray_1(10)[5,*]   ! 17 is not a multiple of 5
INTEGER, SAVE :: coarray_2(10)[34,*]  ! first coextent exceeds the number of images
! and similarly for ALLOCATE
The root of the problem, however, lies elsewhere. The initial description of coarrays in 2.4.7 says, in paragraph 3:
"The set of corresponding coarrays on all images is arranged in a rectangular pattern. The dimensions of this pattern are the codimensions; the number of codimensions is the corank. The bounds for each codimension are the cobounds."
In conventional mathematics, the number of cells in a rectangular pattern is an exact multiple of the length of each side, but the number of images need not be. That is confusing. The same is true of assumed-size arrays, but the potential for confusion there is much less.
Proposal:
After paragraph 3 of subclause 2.4.7 (page 37 line 11+) add the new NOTE:
NOTE 2.12a
If the total number of images is not a multiple of the product of the extents of the codimensions other than the last, the rectangular pattern will be incomplete.
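The arithmetic behind the incomplete pattern can be sketched for the corank-2 examples above, assuming lower cobounds of 1 and codimensions [n1,*]. The module name coshape_sketch and its functions are hypothetical. The final upper cobound is taken as the smallest value that covers every image, 1 + (NUM_IMAGES()-1)/n1 with integer division, and image indices follow array element order. With 17 images, [5,*] gives a final cobound of 4 and 20 positions, of which [3,4], [4,4] and [5,4] correspond to no image; [34,*] gives a final cobound of 1, and [18,1] to [34,1] correspond to no image.

```fortran
module coshape_sketch
  ! Hypothetical helpers modelling a corank-2 coarray declared with
  ! codimensions [n1,*] and lower cobounds of 1 (e.g. coarray_1 above).
  implicit none
contains
  ! Upper cobound of the final codimension: the smallest value covering
  ! every image, i.e. 1 + (nimages-1)/n1 using integer division.
  pure integer function final_ucobound(n1, nimages)
    integer, intent(in) :: n1, nimages
    final_ucobound = 1 + (nimages - 1) / n1
  end function final_ucobound

  ! Image index for cosubscripts [c1,c2] in array element order.
  ! Returns 0, as IMAGE_INDEX does for invalid cosubscripts, when the
  ! position lies outside the cobounds or in the unfilled part.
  pure integer function sketch_image_index(c1, c2, n1, nimages)
    integer, intent(in) :: c1, c2, n1, nimages
    integer :: k
    sketch_image_index = 0
    if (c1 < 1 .or. c1 > n1) return
    if (c2 < 1 .or. c2 > final_ucobound(n1, nimages)) return
    k = c1 + (c2 - 1) * n1
    if (k <= nimages) sketch_image_index = k
  end function sketch_image_index
end module coshape_sketch
```

For example, sketch_image_index(2, 4, 5, 17) is 17, the last image, while sketch_image_index(3, 4, 5, 17) is 0 because that position of the rectangle is empty.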
3. Correct Note on SYNC IMAGES (Subclause 8.5.4)
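The first paragraph of NOTE 8.37 in subclause 8.5.4 is incorrect. SYNC IMAGES(*) does not have the same effect as SYNC ALL in the presence of some SYNC IMAGES(int-expr) statements: an image executing SYNC IMAGES(*) can be matched by each of the other images executing a SYNC IMAGES statement that names only that image, and those other images are then not synchronized with one another. Replace the first sentence of NOTE 8.37 by the paragraph: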
"In a program that uses SYNC ALL as its only synchronization mechanism, every SYNC ALL statement could be replaced by a SYNC IMAGES (*) statement, but SYNC ALL might give better performance."
and move the second sentence to the start of the second paragraph.
4. Clarify difference between SYNC IMAGES and SYNC ALL (Subclause 8.5.4)
At the end of Note 8.37 in subclause 8.5.4, in order to clarify the difference between SYNC IMAGES and SYNC ALL, add
"In the following example, each image synchronizes with both of its neighbors, in a circular fashion.
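INTEGER :: up, down
! Neighbors in a ring; assumes NUM_IMAGES() > 2, since an image set
! given as an array shall not contain duplicate values
up = THIS_IMAGE()+1; if (up>NUM_IMAGES()) up=1
down = THIS_IMAGE()-1; if (down==0) down=NUM_IMAGES()
SYNC IMAGES ( (/ up, down /) )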
This might appear to have the same effect as SYNC ALL, but there is no ordering between the preceding and succeeding segments on non-adjacent images. For example, the preceding segment on image 3 will be ordered before the succeeding ones on images 2 and 4, but not before those on images 1 and 5."
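The ordering claimed in the proposed NOTE can be checked with a small model. The hypothetical module below (ring_sync_sketch and its functions are illustrative names) assumes that every image executes the circular SYNC IMAGES exactly once and that NUM_IMAGES() > 2. It reports whether the segment preceding that statement on image i is ordered before the segment following it on image j: only the image itself and its two neighbors qualify, whereas after SYNC ALL every pair would.

```fortran
module ring_sync_sketch
  ! Hypothetical model of the segment ordering after every image executes
  ! SYNC IMAGES ( (/ up, down /) ) once, with n = NUM_IMAGES() > 2.
  implicit none
contains
  ! Distance between images i and j around a ring of n images.
  pure integer function ring_distance(i, j, n)
    integer, intent(in) :: i, j, n
    ring_distance = min(abs(i - j), n - abs(i - j))
  end function ring_distance

  ! True if the segment before the SYNC IMAGES on image i is ordered
  ! before the segment after it on image j: only direct synchronization
  ! partners (and the image itself) are ordered.
  pure logical function ordered_after_ring_sync(i, j, n)
    integer, intent(in) :: i, j, n
    ordered_after_ring_sync = ring_distance(i, j, n) <= 1
  end function ordered_after_ring_sync
end module ring_sync_sketch
```

With six images, ordered_after_ring_sync(3, 2, 6) and ordered_after_ring_sync(3, 4, 6) are true, ordered_after_ring_sync(3, 1, 6) and ordered_after_ring_sync(3, 5, 6) are false, and the wrap-around pair (1, 6) is ordered.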
5. Clarify execution of certain intrinsic procedures (Subclause 13.5)
The document does not specify how certain intrinsic procedures, principally those in class S, may be used in unordered segments. The proposal is already implied for those of classes E, ES and PS by other wording, and those of class A are already covered.
For implementations where images are separate processes run under a batch scheduler, it is common for the command and its environment to be different for image 1 and for the other images; a similar restriction is already in the standard for "READ (*,...)". It would also be reasonable to implement secondary images by calls to 'network CPU servers', where they might not be associated with a command at all.
Proposal:
After Table 13.1 in subclause 13.5, add the new paragraphs:
The effects of calling COMMAND_ARGUMENT_COUNT, EXECUTE_COMMAND_LINE, GET_COMMAND, GET_COMMAND_ARGUMENT or GET_ENVIRONMENT_VARIABLE on any image other than image 1 are processor dependent.
If RANDOM_SEED is called in a segment A and either RANDOM_SEED or RANDOM_NUMBER is called in a segment B, then segments A and B shall be ordered. It is processor dependent whether all images use a common generator or whether each image uses a separate generator. If a processor uses a common generator, then the interleaving of the calls to RANDOM_NUMBER in unordered segments is unspecified.
It is processor dependent whether the results returned from CPU_TIME, DATE_AND_TIME and SYSTEM_CLOCK are dependent on which image calls them.
NOTE 13.x
This means that it is processor dependent whether images use synchronized clocks, whether CPU_TIME returns a per-image or per-program value, whether all images run in the same time zone, and whether the count rate and maximum in SYSTEM_CLOCK are the same for all images.
The use of all other standard intrinsic procedures in unordered segments is subject only to the rules in 8.5.2 governing the use of their arguments.