DOI: https://doi.org/10.26089/NumMet.v26r210

An asynchronous sparse Cholesky solver for workstations with NUMA architecture

Authors

  • Andrei S. Maslov
  • Michail M. Makarov
  • Nikolay N. Potravkin
  • Svyatoslav O. Proskurnia

Keywords:

Cholesky factorization
NUMA architecture
asynchronous task paradigm
directed acyclic graph
hwloc library

Abstract

A parallel Cholesky factorization algorithm for sparse matrices has been implemented, based on the asynchronous task paradigm and accounting for the specifics of NUMA architecture. The execution of the numerical factorization stage and forward/backward substitution is represented as a directed acyclic graph (DAG), which removes synchronization barriers and enhances data access locality to improve the utilization efficiency of the computational device’s memory hierarchy. Performance evaluation demonstrates good scalability compared to the highly optimized commercial package Intel MKL PARDISO, confirming the effectiveness of the proposed approach.


Published

2025-04-11

Issue

Section

Methods and algorithms of computational mathematics and their applications

Author Biographies

Andrei S. Maslov

LLC “TS INTEGRATION”
• Developer

Michail M. Makarov

LLC “TS INTEGRATION”
• Head of Department

Nikolay N. Potravkin

LLC “TS INTEGRATION”
• Lead Developer

Svyatoslav O. Proskurnia

LLC “TS INTEGRATION”
• Lead Developer


References

  1. O. Schenk, K. G854rtner, W. Fichtner, and A. Stricker, “PARDISO: A High-Performance Serial and Parallel Sparse Linear Solver in Semiconductor Device Simulation,” Future Gener. Comput. Syst. 18 (1), 69-78 (2001).
    doi 10.1016/S0167-739X(00)00076-5
  2. U. Trottenberg, C. W. Oosterlee, and A. Schüller, Multigrid (Academic Press, London, 2001).
  3. P. R. Amestoy, I. S. Duff, J.-Y. L’Excellent, and J. Koster, “A Fully Asynchronous Multifrontal Solver Using Distributed Dynamic Scheduling,” SIAM J. Matrix Anal. Appl. 23 (1), 15-41 (2001).
    doi 10.1137/S0895479899358194
  4. P. Hénon, P. Ramet, and J. Roman, “PaStiX: A High-Performance Parallel Direct Solver for Sparse Symmetric Positive Definite Systems,” Parallel Comput. 28 (2), 301-321 (2002).
    doi 10.1016/S0167-8191(01)00141-7
  5. M. Jacquelin, Y. Zheng, E. Ng, and K. Yelick, “An Asynchronous Task-based Fan-Both Sparse Cholesky Solver,” arXiv: 1608.00044v2 (2016).
    doi 10.48550/arXiv.1608.00044
  6. G. Ballard, E. Carson, J. Demmel, et al., “Communication Lower Bounds and Optimal Algorithms for Numerical Linear Algebra,” Acta Numer. 23, 1-155 (2014).
    doi 10.1017/S0962492914000038
  7. M. Bollhöfer, O. Schenk, R. Janalik, et al., “State-of-the-Art Sparse Direct Solvers,” in Parallel Algorithms in Computational Science and Engineering (Birkh854user, Cham, 2020), pp. 3-33.
    doi 10.1007/978-3-030-43736-7_1
  8. oneAPI Threading Building Blocks (oneTBB).
    https://www.threadingbuildingblocks.org/.
  9. O. Schenk and K. G854rtner, “Two-Level Dynamic Scheduling in PARDISO: Improved Scalability on Shared Memory Multiprocessing Systems,” Parallel Comput. 28 (2), 187-197 (2002).
    doi 10.1016/S0167-8191(01)00135-1
  10. G. Karypis and V. Kumar, “A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs,” SIAM J. Sci. Comput. 20 (1), 359-392 (1998).
    doi 10.1137/S1064827595287997
  11. C. Chevalier and F. Pellegrini, “PT-Scotch: A Tool for Efficient Parallel Graph Ordering,” Parallel Comput. 34 (6-8), 318-331 (2008).
    doi 10.1016/j.parco.2007.12.001
  12. J. W. H. Liu, “The Role of Elimination Trees in Sparse Factorization,” SIAM J. Matrix Anal. Appl. 11 (1), 134-172 (1990).
    doi 10.1137/0611010
  13. C. C. Ashcraft, A Taxonomy of Column-Based Cholesky Factorizations , PhD Thesis in Mathematics (Yale University, New Haven, US, 1996).
    https://dl.acm.org/doi/book/10.5555/241365.
  14. F. Broquedis, J. Clet-Ortega, S. Moreaud, et al., “hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications,” 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing (2010).
    doi 10.1109/PDP.2010.67
  15. T. A. Davis and Y. Hu, “The University of Florida Sparse Matrix Collection,” ACM Trans. Math. Softw. 38 (1), Article No 1, 1-25 (2011).
    doi 10.1145/2049662.2049663