DOI: https://doi.org/10.26089/NumMet.v25r322

On accelerating transformations of programs for solving the generalized Dirichlet problem

Authors

  • Elena A. Metelitsa
  • Boris Ya. Steinberg

Keywords

tiling
wavefront
parallelization
innermost loop
high performance computing
generalized Dirichlet problem

Abstract

This paper considers a chain of transformations applied to a program implementing the Gauss–Seidel algorithm for solving the generalized two-dimensional Dirichlet problem for the Poisson equation. It extends the previously published chain of accelerating (in particular, parallelizing) transformations of this program, which consisted of loop skewing, tiling, the hyperplane (wavefront) method, and parallelization. Here that chain is supplemented with common subexpression elimination, loop-invariant code motion, loop-header optimization, and optimization of array pointer computations. A series of numerical experiments with the resulting chain of transformations was carried out on a computer with an eight-core processor, for various tile sizes. The greatest speedup obtained is 24%. An analysis of the obtained speedups is given.


Published

2024-09-04

Issue

Section

Parallel software tools and technologies

Author Biographies

Elena A. Metelitsa

Southern Federal University,
Vorovich Institute for Mathematics, Mechanics and Computer Science,
• Junior Researcher

Boris Ya. Steinberg

Southern Federal University,
Vorovich Institute for Mathematics, Mechanics and Computer Science,
• Senior Researcher, Head of the Department of Algebra and Discrete Mathematics

