ISO/IEC JTC1/SC22/WG5 N1712

Reducing the extent of the co-array feature

Jim Xia and John Reid

1. Introduction

There was a significant minority view at the London meeting that the extent of the co-array feature should be reduced, but few of the suggestions made then were accepted by WG5. This paper explores these new suggestions from Jim:

1. Limit the co-rank to one and the lower co-bound to one.
2. Remove team i/o for direct-access files.
3. Remove team i/o for sequential files, perhaps with output_unit and error_unit connected on all images.

The intent is to simplify the language without sacrificing either functionality or performance. Any or all of these features could be reinstated in a future revision.

2. Rationale from Jim

2.1 Rationale for further reduction on co-arrays

2.1.1 High cost of co-array implementation

There is a view shared among some vendors that the current co-array feature is far too large to implement. Besides the wait for demand from the user community to grow, the high cost of implementing this feature is a deterrent for most vendors. There are a few contributing factors:

1. The concept of explicitly expressing parallelism using images is new to the language, and most vendors do not have the necessary infrastructure in place to support it. Designing a feasible infrastructure that allows future expansion, e.g. for code optimization that can take advantage of CAF and for possible co-existence of CAF and UPC in an application, is a daunting task.
2. The sheer volume of the feature makes it hard to implement.
3. The interactions between co-arrays and other language features may not be well understood yet. Co-arrays are built on top of Fortran 2003, and thus most features newly introduced in Fortran 2003, such as polymorphism and parameterized derived types (PDT), are required to be supported by co-arrays. Implementing polymorphic or PDT co-arrays may well be challenging.
Considering that no vendor has any experience in this area, and that there seems to be no concrete demand for it from the user community, further reduction in this area could be suggested if one can be convinced that it is of very little use in real applications.

Based on a recent estimate within IBM, the cost of implementing co-arrays is roughly half of that for Fortran 2003. Given that not a single vendor has yet claimed full conformance to Fortran 2003, three years after the publication of the standard, one can easily understand that a feature of this size becomes a huge burden on nearly all vendors. Therefore further reduction in co-arrays is necessary in order to lower the cost and to improve the chance that implementers adopt the feature.

2.1.2 Non-essential sub-features of the co-arrays

From a user perspective, there are sub-features not essential to co-arrays. In particular, team i/o need not be required. Further, allowing the co-rank to be higher than one seems to be a flaw in the language.

Driven by these motivations, I am proposing further reduction of co-arrays. As laid out before, the goal is to remove the non-essential part of the language without impacting either functionality or performance, and at the same time to lower the cost of implementing the language.

2.2 Limit the co-rank to one and the lower co-bound to one

The intent of the co-dimension is to represent the entire list of images available to the program after it starts. This is a completely different concept from the traditional Fortran array. A co-rank of more than one is unnatural in this respect: it is hard to understand, from a user's point of view, what the different co-dimensions really represent.

A related issue with multiple co-dimensions is that users may view the co-dimensions as a physical topology, which really leads to my view that this is a flaw in the current design. The concept of team can be used for the purpose of grouping.
If there is a real need to introduce a construct to represent the virtual communication topology of the images, then we need to look for a better method.

A third problem with high co-rank is that it leads to brittle software. At the coding stage, the programmer does not know the actual number of images that will be available, and so has no idea whether or not the "array of images" can be "mapped" into a closed n-dimensional space. It is very likely that a high-co-rank array will be ragged at its edge. Robust programming is nearly impossible under this condition.

Reducing the co-rank to one not only removes an ambiguous concept from the language, but also simplifies the implementation of co-arrays. In addition, optimization of co-array applications can be made much easier because there is no need to keep track of all the co-bound values.

2.3 Remove team i/o for direct-access files

Parallel direct-access i/o is intended to allow programmers to read or write data to a single file from different images without synchronization. The mechanism assumes that different images will limit themselves to different records, which is a burden on programmers.

2.4 Remove team i/o for sequential files

In general, parallel i/o in sequential mode has to be guarded with synchronization, which often leads to performance penalties (in some cases, significant slowdown can result from excessive synchronization). In addition, file operations such as BACKSPACE, ENDFILE and REWIND are not supported on a unit opened for parallel i/o. These restrictions make the sub-feature of very limited value to applications.

2.5 Further comments on parallel i/o

A typical i/o operation in a co-array application is carried out by a single master image. In such a case, the master can gather all the data needed before writing it out to a file, or it can read in the data first before broadcasting it to the appropriate images. In this way, there is no restriction on the access mode.
The programmers also have flexibility in choosing the access mode, format, etc. in their i/o operations. Equally important, the programmer has control over minimizing synchronization.

Removal of parallel i/o not only simplifies the language, making it much easier to implement, but also encourages users to follow good practice in performing i/o.

3. Rationale from John

For some time, I have felt that the co-array feature has become too large. Its size has been drawn to my attention while writing my unofficial summaries, the latest of which is N1697. I found that writing this and revising it with changes to the draft standard involved a significant amount of work. The size makes it harder for users to understand all the features and there is a danger that it will delay implementations.

There are good reasons for including each of the features that are now there and this applies to the features discussed here. I cannot say that I like losing any of them, but their loss will not prevent my writing significant co-array programs.

3.1 Limit the co-rank to one and the lower co-bound to one

Requiring the co-rank to be 1 and the lower co-bound to be 1 would significantly reduce the complexity, since there would be no distinction between a co-subscript value and an image index.

3.2 Remove team i/o for direct-access files

There is little, if any, experience of parallel i/o with direct-access files and my feeling is that once the language contains sequence i/o it is this that should be extended. It offers far more flexibility than is available with the rigid fixed-length records of direct-access i/o.

3.3 Remove team i/o for sequential files

I defer to Jim here, but I wish to keep output_unit and error_unit connected on all images for debugging purposes. Without this, writing simple diagnostic messages is very tedious.
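
To make the points in 2.2 and 3.1 concrete, the following is a minimal sketch, written in Python rather than Fortran, of the arithmetic that relates a list of co-subscripts to an image index. It assumes the same column-major ordering that Fortran uses for ordinary array elements, with the extent of the final co-dimension implied by the number of images. The function names image_index and ragged_cells, the argument layout, and the convention of returning 0 when no image corresponds are illustrative choices for this sketch only.

```python
from itertools import product
from math import prod


def image_index(co_subs, lower, extents, num_images):
    """Map co-subscripts to a 1-based image index (0 if no such image).

    lower   : lower co-bound of every co-dimension
    extents : sizes of all co-dimensions except the last, whose size is
              implied by num_images (an assumption of this sketch)
    """
    index = 1
    stride = 1
    for k, (sub, lo) in enumerate(zip(co_subs, lower)):
        offset = sub - lo
        if offset < 0:
            return 0
        if k < len(extents):
            if offset >= extents[k]:
                return 0
            index += offset * stride
            stride *= extents[k]
        else:
            index += offset * stride
    return index if index <= num_images else 0


def ragged_cells(lower, extents, num_images):
    """List the co-subscript combinations that correspond to no image."""
    last_extent = -(-num_images // prod(extents))  # ceiling division
    sizes = list(extents) + [last_extent]
    ranges = [range(lo, lo + n) for lo, n in zip(lower, sizes)]
    return [subs for subs in product(*ranges)
            if image_index(subs, lower, extents, num_images) == 0]


# Co-rank two, co-shape [4,*], lower co-bounds 1, run on 10 images.
print(image_index((2, 3), (1, 1), (4,), 10))   # 10
print(ragged_cells((1, 1), (4,), 10))          # [(3, 3), (4, 3)]

# Co-rank one, lower co-bound 1: the co-subscript is the image index.
print(image_index((7,), (1,), (), 10))         # 7
```

With 10 images and a first co-dimension of extent 4, the last "column" of the image grid is only half populated, which is the ragged edge described in 2.2. With co-rank one and lower co-bound one, the mapping reduces to the identity, which is the simplification noted in 3.1.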