ISO/IEC JTC1/SC22/WG5 N1712

            Reducing the extent of the co-array feature

                       Jim Xia and John Reid

1. Introduction

There was a significant minority view at the London meeting that the 
extent of the co-array feature should be reduced, but few of the 
suggestions made then were accepted by WG5. This paper explores the
following new suggestions from Jim:

1. Limit the co-rank to one and the lower co-bound to one.

2. Remove team i/o for direct-access files.

3. Remove team i/o for sequential files, perhaps with output_unit and 
   error_unit connected on all images. 

The intent is to simplify the language without sacrificing either 
functionality or performance. Any or all of these features could 
be reinstated in a future revision.


2. Rationale from Jim

2.1  Rationale for further reduction on co-arrays

2.1.1  High cost of co-array implementation

There has been a view shared among some vendors that the current co-array
feature is far too big to implement.  Besides waiting for demand from the
user community to increase, most vendors are deterred by the high cost of
implementing this feature.  There are several contributing factors: 

1. The concept of explicitly expressing parallelism using images is new
   to the language, and most vendors do not have the necessary
   infrastructure in place to support it.  Designing a feasible
   infrastructure that allows for future expansion (for example,
   optimizations that take advantage of Co-Array Fortran (CAF), or the
   co-existence of CAF and Unified Parallel C (UPC) in one application)
   is a daunting task.  

2. The sheer size of the feature makes it hard to implement.  

3. The interactions between co-arrays and other language features may not be
   well understood yet.  Co-arrays are built on top of Fortran 2003, so
   most features new in Fortran 2003, such as polymorphism and
   parameterized derived types (PDTs), must be supported for co-arrays.
   Implementing polymorphic or PDT co-arrays may well be challenging.
   Since no vendor has any experience in this area, and there seems to be
   no concrete demand for it from the user community, a further reduction
   here could be proposed if it can be shown to be of very little use in
   real applications.

A recent estimate within IBM puts the cost of implementing co-arrays at
roughly half that of implementing Fortran 2003.  Given that not a single
vendor yet claims full conformance to Fortran 2003, three years after the
standard was published, it is easy to see why a feature of this size is a
huge burden on nearly all vendors.  A further reduction of co-arrays is
therefore necessary to lower the cost and to improve the chance that
implementers adopt the feature.

2.1.2  Non-essential sub-features of the co-arrays

From a user's perspective, some sub-features of co-arrays are not essential.
In particular, there is no need to require team i/o.  Further, allowing a
co-rank higher than one appears to be a flaw in the language.

For these reasons, I propose a further reduction of co-arrays.  As stated
in the introduction, the goal is to remove the nonessential parts of the
language without affecting either functionality or performance, while at
the same time lowering the cost of implementing the language.


2.2 Limit the co-rank to one and the lower co-bound to one

The intent of the co-dimension is to represent the entire list of 
images available to the program after it starts.  This is a completely 
different concept from a traditional Fortran array.  In this respect, a 
co-rank of more than one is unnatural: it is hard for a user to 
understand what the different co-dimensions really represent.

A related issue with multiple co-dimensions is that users may view 
the co-dimensions as a physical topology; this is what really leads me 
to regard them as a flaw in the current design.  Teams can already be 
used for grouping images.  If there is a real need for a construct that 
represents a virtual communication topology of the images, then we 
should look for a better method.

A third problem with a high co-rank is that it leads to brittle software.  
Because the number of images is not known when the code is written, the 
programmer has no idea whether the "array of images" can be "mapped" 
onto a complete n-dimensional space.  A co-array of high co-rank is very 
likely to be ragged at its edge, and robust programming is nearly 
impossible under this condition.

Reducing the co-rank to one not only removes an ambiguous concept from
the language but also simplifies the implementation of co-arrays.
In addition, optimizing co-array applications becomes much easier
because there is no need to keep track of all the co-bound values.

2.3  Remove team i/o for direct-access files

Parallel direct-access i/o is intended to allow programmers to read
data from, or write data to, a single file from different images without 
synchronization.  The mechanism assumes that different images will 
confine themselves to different records, which places a burden on 
programmers.  

2.4 Remove team i/o for sequential files

In general, parallel i/o in sequential mode has to be guarded with 
synchronization, which often leads to performance penalties 
(in some cases, excessive synchronization can cause a significant 
slowdown).  In addition, file operations such as BACKSPACE, ENDFILE 
and REWIND are not supported on a unit opened for parallel i/o.  
These restrictions give the sub-feature very limited value for 
applications.

2.5 Further comments on parallel i/o

In a typical co-array application, i/o is carried out by a single master
image. The master can gather all the data needed before writing it to a
file, or it can read the data first and then broadcast it to the
appropriate images. In this way, there is no restriction on the access
mode, and programmers are free to choose the access mode, format, and so
on for their i/o operations. Just as importantly, the programmer controls,
and can minimize, the synchronization.
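The master-image pattern can be modelled in a few lines. The sketch below is Python, not co-array Fortran, and every name in it is an illustrative assumption: threads stand in for images, a `threading.Barrier` stands in for SYNC ALL, a shared list with one slot per image stands in for a co-array, and an in-memory list stands in for the file. It shows that only two synchronizations are needed, both placed by the programmer.

```python
import threading

NUM_IMAGES = 4
sync_all = threading.Barrier(NUM_IMAGES)  # models SYNC ALL
local = [None] * NUM_IMAGES               # one slot per image, like a co-array
output_file = []                          # models the file the master writes


def image(me):
    """Body run by image `me` (1-based, like THIS_IMAGE())."""
    if me == 1:
        # Master "reads" the whole input once, then broadcasts each image's share.
        data = list(range(1, 2 * NUM_IMAGES + 1))
        chunk = len(data) // NUM_IMAGES
        for p in range(NUM_IMAGES):
            local[p] = data[p * chunk:(p + 1) * chunk]
    sync_all.wait()                       # every image now holds its input
    local[me - 1] = [x * x for x in local[me - 1]]  # purely local work
    sync_all.wait()                       # every image has finished its work
    if me == 1:
        # Master gathers the results and "writes" them in one pass.
        for p in range(NUM_IMAGES):
            output_file.extend(local[p])


threads = [threading.Thread(target=image, args=(me,))
           for me in range(1, NUM_IMAGES + 1)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(output_file)  # [1, 4, 9, 16, 25, 36, 49, 64]
```

Because only image 1 touches the "file", the choice of access mode and format is entirely the master's, which is the flexibility argued for above.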

Removing parallel i/o not only simplifies the language, making it much
easier to implement, but also encourages users to follow good practice in
performing i/o.  
 

3. Rationale from John

For some time, I have felt that the co-array feature has become too large.
Its size was brought home to me while writing my unofficial summaries, the
latest of which is N1697. Writing that summary and revising it to track
changes to the draft standard involved a significant amount of work. The
size makes it harder for users to understand all the features, and there is
a danger that it will delay implementations. 

There are good reasons for including each of the features that are now 
there, and this applies to the features discussed here. I cannot say that I
like losing any of them, but their loss will not prevent me from writing
significant co-array programs. 

3.1 Limit the co-rank to one and the lower co-bound to one

Requiring the co-rank to be 1 and the lower co-bound to be 1 would 
significantly reduce the complexity since there would be no distinction
between a co-subscript value and an image index. 

3.2  Remove team i/o for direct-access files

There is little, if any, experience of parallel i/o with direct-access files,
and my feeling is that, once the language contains sequence i/o, it is this
that should be extended. It offers far more flexibility than is available
with the rigid fixed-length records of direct-access i/o. 

3.3 Remove team i/o for sequential files

I defer to Jim here, but I wish to keep output_unit and error_unit 
connected on all images for debugging purposes. Without this, 
writing simple diagnostic messages is very tedious.