DOI: https://doi.org/10.26089/NumMet.v23r424

Automated parallelization of programs for heterogeneous clusters using the SAPFOR system

Authors

  • Nikita A. Kataev
  • Alexander S. Kolganov

Keywords

SAPFOR
DVMH
parallelization automation
data distribution
distribution of computations
heterogeneous clusters

Abstract

This paper proposes an approach to the automated parallelization of programs for heterogeneous computational clusters. The approach is implemented in SAPFOR (System FOR Automated Parallelization), a software development suite that aims to produce a parallel version of a sequential program in a semi-automatic way. SAPFOR uses the DVMH directive-based programming model to expose parallelism in the code. It also implements various source-to-source transformations and gives the user the opportunity to control the parallelization process through a graphical user interface. Fully automatic parallelization is also possible if the program is well formed and satisfies certain requirements. The paper describes an approach that allows SAPFOR to automate the selection of data and computation distributions. We use the NAS Parallel Benchmarks to evaluate the performance of the generated programs.
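For illustration, the fragment below sketches the kind of DVMH annotations SAPFOR is expected to insert once data and computation distributions have been selected. It is a minimal Fortran-DVMH sketch built around a hypothetical Jacobi-style kernel: the array names, bounds, and the kernel itself are illustrative assumptions and are not taken from the paper, while the DISTRIBUTE, ALIGN, PARALLEL, and REGION directives follow Fortran-DVMH syntax.

    ! Minimal Fortran-DVMH sketch (hypothetical Jacobi-style kernel).
    ! DISTRIBUTE and ALIGN define the data distribution across cluster
    ! nodes; PARALLEL ... ON maps loop iterations onto that distribution;
    ! REGION marks code the DVMH runtime may offload to accelerators.
          REAL A(1000, 1000), B(1000, 1000)
    !DVM$ DISTRIBUTE A(BLOCK, BLOCK)
    !DVM$ ALIGN B(I, J) WITH A(I, J)
    ! ... initialization of B ...
    !DVM$ REGION
    !DVM$ PARALLEL (J, I) ON A(I, J), SHADOW_RENEW (B)
          DO J = 2, 999
             DO I = 2, 999
                A(I, J) = 0.25 * (B(I-1, J) + B(I+1, J) + B(I, J-1) + B(I, J+1))
             END DO
          END DO
    !DVM$ END REGION

Selecting mutually consistent DISTRIBUTE and ALIGN clauses for all arrays, and ON clauses for all loop nests, is precisely the data and computation distribution problem that the described approach automates.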


Published

2022-12-15

Section

Parallel software tools and technologies


References

  1. B. Ya. Steinberg and O. B. Steinberg, “Program Transformations as the Base for Optimizing Parallelizing Compilers,” Program Systems: Theory and Applications 12, Issue 1, 21-113 (2021).
    doi 10.25209/2079-3316-2021-12-1-21-113.
  2. P. Czarnul, J. Proficz, and K. Drypczewski, “Survey of Methodologies, Approaches, and Challenges in Parallel Programming Using High-Performance Computing Systems,” Sci. Program. 2020, Article ID 4176794 (2020).
    doi 10.1155/2020/4176794.
  3. SYCL Academy.
    https://sycl.tech . Cited December 8, 2022.
  4. Celerity. High-level C++ for Accelerator Clusters.
    https://celerity.github.io . Cited December 8, 2022.
  5. H. Murai, M. Nakao, T. Shimosaka, et al., “XcalableACC -- a Directive-Based Language Extension for Accelerated Parallel Computing,” in Proc. Int. Conf. for High Performance Computing, Networking, Storage and Analysis, New Orleans, USA, November 16-21, 2014.
    https://pro-env.riken.jp/data/2014/post266s2-file3.pdf . Cited December 8, 2022.
  6. N. A. Konovalov, V. A. Krukov, S. N. Mikhajlov, and A. A. Pogrebtsov, “Fortran DVM: a Language for Portable Parallel Program Development,” Program. Comput. Softw. 21 (1), 35-38 (1995).
  7. V. A. Bakhtin, M. S. Klinov, V. A. Krukov, et al., “Extension of the DVM-Model of Parallel Programming for Clusters with Heterogeneous Nodes,” Vestn. Yuzhn. Ural. Gos. Univ. Ser. Mat. Model. Program. No. 12, 82-92 (2012).
  8. V. A. Bakhtin, A. S. Kolganov, V. A. Krukov, et al., “Dynamic Tuning Methods of DVMH-Programs for Clusters with Accelerators,” in Proc. Int. Conf. on Russian Supercomputing Days, Moscow, Russia, September 28-29, 2015 (Mosk. Gos. Univ., Moscow, 2015), pp. 257-268.
  9. W.-M. Hwu, S. Ryoo, S.-Z. Ueng, et al., “Implicitly Parallel Programming Models for Thousand-Core Microprocessors,” in Proc. 44th Annual Design Automation Conference, San Diego, USA, June 4-8, 2007 (ACM Press, New York, 2007), pp. 754-759.
    doi 10.1145/1278480.1278669.
  10. R. Baghdadi, U. Beaugnon, A. Cohen, et al., “PENCIL: A Platform-Neutral Compute Intermediate Language for Accelerator Programming,” in Proc. Int. Conf. on Parallel Architecture and Compilation Techniques, San Francisco, USA, October 18-21, 2015 (IEEE Press, Washington, DC, 2015), pp. 138-149.
    doi 10.1109/PACT.2015.17.
  11. M. Kruse, Introducing Molly: Distributed Memory Parallelization with LLVM, arXiv preprint: 1409.2088v1 [cs.PL] (Cornell Univ. Library, Ithaca, 2014),
    https://doi.org/10.48550/arXiv.1409.2088 . Cited December 8, 2022.
  12. H. Vandierendonck, S. Rul, and K. De Bosschere, “The Paralax Infrastructure: Automatic Parallelization with a Helping Hand,” in Proc. Int. Conf. on Parallel Architectures and Compilation Techniques, Vienna, Austria, September 11-15, 2010 (ACM Press, New York, 2010), pp. 389-400.
    doi 10.1145/1854273.1854322.
  13. N. A. Kataev and A. S. Kolganov, “Additional Parallelization of Existing MPI Programs Using SAPFOR,” Numer. Methods Program. 22 (4), 239-251 (2021).
    doi 10.26089/NumMet.v22r415.
  14. M. S. Klinov and V. A. Krukov, “Automatic Parallelization of Fortran Programs. Mapping to Cluster,” Vestn. Lobachevskii Univ. Nizhni Novgorod, No. 2, 128-134 (2009).
  15. V. A. Bakhtin, I. G. Borodich, N. A. Kataev, et al., “Dialogue with a Programmer in the Automatic Parallelization Environment SAPFOR,” Vestn. Lobachevskii Univ. Nizhni Novgorod, No. 5(2), 242-245 (2012).
  16. N. Kataev, “Application of the LLVM Compiler Infrastructure to the Program Analysis in SAPFOR,” in Communications in Computer and Information Science (Springer, Cham, 2018), Vol. 965, pp. 487-499.
    doi 10.1007/978-3-030-05807-4_41.
  17. N. Kataev, A. Smirnov, and A. Zhukov, “Dynamic Data-Dependence Analysis in SAPFOR,” CEUR Workshop Proc. Vol. 2543 (2020), pp. 199-208.
    doi 10.20948/abrau-2019-62.
  18. N. Kataev, “Interactive Parallelization of C Programs in SAPFOR,” CEUR Workshop Proc. Vol. 2784 (2020), pp. 139-148.
  19. N. Kataev, “LLVM Based Parallelization of C Programs for GPU,” in Communications in Computer and Information Science (Springer, Cham, 2020), Vol. 1331, pp. 436-448.
    doi 10.1007/978-3-030-64616-5_38.
  20. NAS Parallel Benchmarks.
    https://www.nas.nasa.gov/publications/npb.html . Cited December 8, 2022.
  21. S. P. Amarasinghe and M. S. Lam, “Communication Optimization and Code Generation for Distributed Memory Machines,” ACM SIGPLAN Not. 28 (6), 126-138 (1993).
    doi 10.1145/155090.155102.
  22. H. P. Zima, H.-J. Bast, and M. Gerndt, “SUPERB: A Tool for Semi-Automatic MIMD/SIMD Parallelization,” Parallel Comput. 6 (1), 1-18 (1988).
    doi 10.1016/0167-8191(88)90002-6.
  23. T. Grosser, A. Groesslinger, and C. Lengauer, “Polly -- Performing Polyhedral Optimizations on a Low-Level Intermediate Representation,” Parallel Process. Lett. 22 (2012).
    doi 10.1142/S0129626412500107.
  24. C. Lattner and V. Adve, “LLVM: A Compilation Framework for Lifelong Program Analysis and Transformation,” in Proc. Int. Symp. on Code Generation and Optimization (CGO’04), Palo Alto, USA, March 20-24, 2004.
    doi 10.1109/CGO.2004.1281665.
  25. L. R. Gervich, E. N. Kravchenko, B. Ya. Shteinberg, and M. V. Yurushkin, “Automatic Program Parallelization with a Block Data Distribution,” Numer. Analysis Appl. 8 (1), 35-45 (2015).
    doi 10.1134/S1995423915010048.
  26. U. Bondhugula, “Compiling Affine Loop Nests for Distributed-Memory Parallel Architectures,” in Proc. Int. Conf. on High Performance Computing, Networking, Storage and Analysis, Denver, USA, November 17-22, 2013.
    doi 10.1145/2503210.2503289.
  27. U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan, “A Practical Automatic Polyhedral Parallelizer and Locality Optimizer,” SIGPLAN Not. 43 (6), 101-113 (2008).
    doi 10.1145/1375581.1375595.
  28. P. Banerjee, J. A. Chandy, M. Gupta, et al., “The Paradigm Compiler for Distributed-Memory Multicomputers,” Computer 28 (10), 37-47 (1995).
    doi 10.1109/2.467577.
  29. L. B. Sokolinsky, “BSF: A Parallel Computation Model for Scalability Estimation of Iterative Numerical Algorithms on Cluster Computing Systems,” J. Parall. Distrib. Comput. 149, 193-206 (2021).
    doi 10.1016/j.jpdc.2020.12.009.
  30. Heterogeneous cluster K10.
    https://www.kiam.ru/MVS/resourses/k10.html . Cited December 8, 2022.