Асинхронная реализация метода Холецкого для разреженных матриц на компьютерах с NUMA-архитектурой

Andrei S. Maslov; Michail M. Makarov; Nikolay N. Potravkin; Svyatoslav O. Proskurnia

doi:10.26089/NumMet.v26r210

https://doi.org/10.26089/NumMet.v26r210

An asynchronous sparse Cholesky solver for workstations with NUMA architecture

Authors

Andrei S. Maslov
Michail M. Makarov
Nikolay N. Potravkin
Svyatoslav O. Proskurnia

Keywords:

Cholesky factorization

NUMA architecture

asynchronous task paradigm

directed acyclic graph

hwloc library

Abstract

A parallel Cholesky factorization algorithm for sparse matrices has been implemented, based on the asynchronous task paradigm and accounting for the specifics of NUMA architecture. The execution of the numerical factorization stage and forward/backward substitution is represented as a directed acyclic graph (DAG), which removes synchronization barriers and enhances data access locality to improve the utilization efficiency of the computational device’s memory hierarchy. Performance evaluation demonstrates good scalability compared to the highly optimized commercial package Intel MKL PARDISO, confirming the effectiveness of the proposed approach.

Downloads

Published

2025-04-11

Issue

Vol. 26 (2025): Issue 2.

Section

Methods and algorithms of computational mathematics and their applications

Authors

Andrei S. Maslov

LLC “TS INTEGRATION”
• Developer

Michail M. Makarov

LLC “TS INTEGRATION”
• Head of Department

Nikolay N. Potravkin

LLC “TS INTEGRATION”
• Lead Developer

Svyatoslav O. Proskurnia

LLC “TS INTEGRATION”
• Lead Developer

References

O. Schenk, K. G854rtner, W. Fichtner, and A. Stricker, “PARDISO: A High-Performance Serial and Parallel Sparse Linear Solver in Semiconductor Device Simulation,” Future Gener. Comput. Syst. 18 (1), 69-78 (2001).
doi 10.1016/S0167-739X(00)00076-5
U. Trottenberg, C. W. Oosterlee, and A. Schüller, Multigrid (Academic Press, London, 2001).
P. R. Amestoy, I. S. Duff, J.-Y. L’Excellent, and J. Koster, “A Fully Asynchronous Multifrontal Solver Using Distributed Dynamic Scheduling,” SIAM J. Matrix Anal. Appl. 23 (1), 15-41 (2001).
doi 10.1137/S0895479899358194
P. Hénon, P. Ramet, and J. Roman, “PaStiX: A High-Performance Parallel Direct Solver for Sparse Symmetric Positive Definite Systems,” Parallel Comput. 28 (2), 301-321 (2002).
doi 10.1016/S0167-8191(01)00141-7
M. Jacquelin, Y. Zheng, E. Ng, and K. Yelick, “An Asynchronous Task-based Fan-Both Sparse Cholesky Solver,” arXiv: 1608.00044v2 (2016).
doi 10.48550/arXiv.1608.00044
G. Ballard, E. Carson, J. Demmel, et al., “Communication Lower Bounds and Optimal Algorithms for Numerical Linear Algebra,” Acta Numer. 23, 1-155 (2014).
doi 10.1017/S0962492914000038
M. Bollhöfer, O. Schenk, R. Janalik, et al., “State-of-the-Art Sparse Direct Solvers,” in Parallel Algorithms in Computational Science and Engineering (Birkh854user, Cham, 2020), pp. 3-33.
doi 10.1007/978-3-030-43736-7_1
oneAPI Threading Building Blocks (oneTBB).
https://www.threadingbuildingblocks.org/.
O. Schenk and K. G854rtner, “Two-Level Dynamic Scheduling in PARDISO: Improved Scalability on Shared Memory Multiprocessing Systems,” Parallel Comput. 28 (2), 187-197 (2002).
doi 10.1016/S0167-8191(01)00135-1
G. Karypis and V. Kumar, “A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs,” SIAM J. Sci. Comput. 20 (1), 359-392 (1998).
doi 10.1137/S1064827595287997
C. Chevalier and F. Pellegrini, “PT-Scotch: A Tool for Efficient Parallel Graph Ordering,” Parallel Comput. 34 (6-8), 318-331 (2008).
doi 10.1016/j.parco.2007.12.001
J. W. H. Liu, “The Role of Elimination Trees in Sparse Factorization,” SIAM J. Matrix Anal. Appl. 11 (1), 134-172 (1990).
doi 10.1137/0611010
C. C. Ashcraft, A Taxonomy of Column-Based Cholesky Factorizations , PhD Thesis in Mathematics (Yale University, New Haven, US, 1996).
https://dl.acm.org/doi/book/10.5555/241365.
F. Broquedis, J. Clet-Ortega, S. Moreaud, et al., “hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications,” 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing (2010).
doi 10.1109/PDP.2010.67
T. A. Davis and Y. Hu, “The University of Florida Sparse Matrix Collection,” ACM Trans. Math. Softw. 38 (1), Article No 1, 1-25 (2011).
doi 10.1145/2049662.2049663

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

https://doi.org/10.26089/NumMet.v26r210

An asynchronous sparse Cholesky solver for workstations with NUMA architecture

Authors

Keywords:

Abstract

Downloads

Published

Issue

Section

Authors

Andrei S. Maslov

Michail M. Makarov

Nikolay N. Potravkin

Svyatoslav O. Proskurnia

References

License

Language

Information

Make a Submission