Many parallel scientific applications spend a significant amount of time reading and writing data files. Collective I/O operations make it possible to optimize the file access of a process group by redistributing data across processes to match the data layout on the file system. In most parallel I/O libraries, collective I/O operations are implemented using the two-phase I/O algorithm, which consists of a communication phase and a file access phase. This paper evaluates various design options for overlapping two internal cycles of the two-phase I/O algorithm and explores different data transfer primitives for the shuffle phase, including non-blocking two-sided communication and multiple versions of one-sided communication. The results indicate that overlap algorithms incorporating asynchronous I/O outperform overlap approaches that rely only on non-blocking communication. However, in the vast majority of the test cases, one-sided communication did not improve performance over two-sided communication.
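To make the structure of the algorithm concrete, the following is a minimal single-process sketch of a two-phase collective write, assuming one aggregator that owns the whole file range and a file modeled as an in-memory `bytearray`. The file range is processed in cycles of `cycle_size` bytes: the shuffle phase gathers every process's pieces that fall in the current cycle into one contiguous buffer, and the file access phase writes that buffer in a single contiguous operation. A one-worker `ThreadPoolExecutor` stands in for asynchronous I/O, so the write of cycle *i* can proceed while the shuffle for cycle *i+1* runs. The names `shuffle`, `write_cycle`, `two_phase_write`, and `cycle_size` are hypothetical; the sketch illustrates only the overlap pattern, not MPI data transfer or the specific designs evaluated in the paper.

```python
from concurrent.futures import ThreadPoolExecutor


def shuffle(requests, start, end):
    """Shuffle (communication) phase for one cycle: copy every piece that
    overlaps the file range [start, end) into one contiguous buffer.
    `requests` holds one list of (file_offset, bytes) pieces per process."""
    buf = bytearray(end - start)
    for pieces in requests:
        for off, data in pieces:
            lo = max(off, start)
            hi = min(off + len(data), end)
            if lo < hi:  # this piece intersects the current cycle
                buf[lo - start:hi - start] = data[lo - off:hi - off]
    return buf


def write_cycle(file, start, buf):
    """File access phase for one cycle: a single contiguous write."""
    file[start:start + len(buf)] = buf


def two_phase_write(requests, file_size, cycle_size, overlap=True):
    """Hypothetical two-phase write; with overlap=True the write of one cycle
    runs in a background worker while the next cycle is being shuffled."""
    file = bytearray(file_size)
    cycles = [(s, min(s + cycle_size, file_size))
              for s in range(0, file_size, cycle_size)]
    if not overlap:
        for s, e in cycles:
            write_cycle(file, s, shuffle(requests, s, e))
        return file
    # One worker thread plays the role of an asynchronous I/O request.
    with ThreadPoolExecutor(max_workers=1) as io:
        pending = None
        for s, e in cycles:
            buf = shuffle(requests, s, e)  # shuffle cycle i+1 ...
            if pending is not None:
                pending.result()           # ... while cycle i is written
            pending = io.submit(write_cycle, file, s, buf)
        if pending is not None:
            pending.result()               # drain the last outstanding write
    return file
```

For example, two processes holding interleaved 2-byte blocks (a typical strided pattern) produce one contiguous file, with or without overlap:

```python
p0 = [(0, b"AA"), (4, b"CC")]
p1 = [(2, b"BB"), (6, b"DD")]
print(bytes(two_phase_write([p0, p1], file_size=8, cycle_size=3)))  # b'AABBCCDD'
```

A separate buffer is allocated per cycle, which mirrors the double buffering an overlapped implementation needs: the shuffle for the next cycle must not overwrite data that is still being written.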
Original language: English (US)
Title of host publication: 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
Number of pages: 8
State: Published - Jul 28 2020