Transformation of sequential Fortran programs for their parallelization on hybrid clusters in SAPFOR


  • Alexander S. Kolganov
  • Grigory D. Gusev


  • SAPFOR (System FOR Automated Parallelization)
  • parallelization automation for clusters
  • transformation automation
  • parallel computing
  • DVM (Distributed Virtual Memory)
  • GPU clusters


Parallelizing a program can be difficult when the program has been optimized for sequential execution: the resulting parallel version may be inefficient, and in some cases parallelization is not possible at all. Source code transformations help to solve these problems. This article discusses the implementation of transformations of sequential Fortran programs in the SAPFOR (System FOR Automated Parallelization) system, which simplify the user's work in the system and significantly reduce the effort required to parallelize a program. The application of the implemented transformations in SAPFOR is demonstrated on a program that solves a system of nonlinear partial differential equations. The performance of the resulting parallel version is also compared with versions parallelized manually using the DVM and MPI technologies.





Parallel software tools and technologies



  1. OpenMP Specification. Cited October 12, 2022.
  2. MPI Documents. Cited October 12, 2022.
  3. CUDA Toolkit Documentation. Cited October 12, 2022.
  4. Khronos OpenCL Registry. Cited October 12, 2022.
  5. M. Wolfe, High Performance Compilers for Parallel Computing (Addison-Wesley, New York, 1995).
  6. U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan, “A Practical Automatic Polyhedral Parallelizer and Locality Optimizer,” SIGPLAN Not. 43 (6), 101-113 (2008).
    doi 10.1145/1379022.1375595.
  7. S. Verdoolaege, J. C. Juega, A. Cohen, et al., “Polyhedral Parallel Code Generation for CUDA,” ACM Trans. Archit. Code Optim. 9 (4), 1-23 (2013).
    doi 10.1145/2400682.2400713.
  8. T. Grosser, A. Groesslinger, and C. Lengauer, “Polly -- Performing Polyhedral Optimizations on a Low-Level Intermediate Representation,” Parallel Process. Lett. 22 (2012).
    doi 10.1142/S0129626412500107.
  9. T. Grosser and T. Hoefler, “Polly-ACC: Transparent Compilation to Heterogeneous Hardware,” in Proc. Int. Conf. on Supercomputing, Istanbul, Turkey, June 1-3, 2016 (ACM Press, New York, 2016),
    doi 10.1145/2925426.2926286.
  10. J. M. M. Caamano, A. Sukumaran-Rajam, A. Baloian, et al., “APOLLO: Automatic Speculative POLyhedral Loop Optimizer,” in Proc. 7th Int. Workshop on Polyhedral Compilation Techniques (IMPACT 2017), Stockholm, Sweden, January 23, 2017. Cited October 12, 2022.
  11. C. Lattner and V. Adve, “LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation,” in Proc. Int. Symp. on Code Generation and Optimization, Palo Alto, USA, March 20-24, 2004,
    doi 10.1109/CGO.2004.1281665.
  12. J. Doerfert, K. Streit, S. Hack, and Z. Benaissa, “Polly’s Polyhedral Scheduling in the Presence of Reductions,” in Proc. 5th Int. Workshop on Polyhedral Compilation Techniques (IMPACT 2015), Amsterdam, The Netherlands, January 19, 2015. Cited October 12, 2022.
  13. Description of DVM-system. Cited October 12, 2022.
  14. Documentation for C-DVM and Fortran-DVMH Languages. Cited October 12, 2022.
  15. Description of SAPFOR System. Cited October 12, 2022.
  16. V. A. Bakhtin, O. F. Zhukova, N. A. Kataev, et al., “Automation of Parallelization of Software Complexes,” in Proc. Conf. “Scientific Service on the Internet”, Novorossiysk, Russia, September 19-24, 2016 (Keldysh Institute of Applied Mathematics, Moscow, 2016), pp. 76-85.
  17. A. S. Kolganov, Automation of Parallelization of Fortran Programs for Heterogeneous Clusters, Candidate’s Dissertation in Physics and Mathematics (Keldysh Institute of Applied Mathematics, Moscow, 2020).
  18. O. V. Bartenev, Modern Fortran (DIALOG-MEPHI, Moscow, 2000) [in Russian].
  19. A. V. Aho, M. S. Lam, R. Sethi, and J. D. Ullman, Compilers: Principles, Techniques, and Tools (Addison-Wesley, Boston, 2006; Williams, Moscow, 2008).
  20. NAS Parallel Benchmarks. Cited October 14, 2022.