A parallel multilevel nested dissection algorithm for shared-memory computing systems
Keywords:multilevel nested dissection, sparse matrix ordering, Cholesky factorization, parallel algorithms, shared-memory computing systems, high-performance computing
This paper deals with the NP-complete problem of finding a symmetric positive definite sparse matrix ordering that minimizes the Cholesky factor fill-in. For this purpose, heuristic approaches based on graph algorithms are applied. A new parallel ordering algorithm for shared-memory computing systems is proposed. The modified multilevel nested dissection algorithm from the recently presented MORSy library is used as a basis for ordering. The parallel processing is done in a task-based fashion. It uses the OpenMP 3.0 task parallelism relying on the dynamic load balancing implemented during the OpenMP runtime. The numerical experiments were performed using a number of symmetric positive definite matrices from the University of Florida Sparse Matrix Collection. The experimental results show the competitiveness of the proposed implementation on shared memory systems compared to the widely used ParMETIS library. In our experiments, the parallel MORSy version provides a better ordering than ParMETIS on all but one matrix in terms of the Cholesky factor fill-in and outperforms ParMETIS in most cases. The parallel MORSy version is publicly available from the Supercomputing Center of Lobachevsky State University of Nizhni Novgorod.
- M. Yannakakis, “Computing the Minimum Fill-In is NP-Complete,” SIAM J. Alg. Disc. Meth. 2 (1), 77-79 (1981).
- W. F. Tinney and J. W. Walker, “Direct Solutions of Sparse Network Equations by Optimally Ordered Triangular Factorization,” Proc. IEEE 55 (11), 1801-1809 (1967).
- A. George, “Nested Dissection of a Regular Finite Element Mesh,” SIAM J. Numer. Anal. 10 (2), 345-363 (1973).
- A. Pirova and I. Meyerov, “MORSy - A New Tool For Sparse Matrix Reordering,” in Proc. Int. Conf. on Engineering and Applied Sciences Optimization, Kos Island, Greece, June 4-6, 2014 (National Tech. Univ. of Athens, Athens, 2014), pp. 1952-1964.
- J. W. H. Liu, “Modification of the Minimum-Degree Algorithm by Multiple Elimination,” ACM Trans. Math. Softw. 11 (2), 141-153 (1985).
- P. R. Amestoy, T. A. Davis, and I. S. Duff, “An Approximate Minimum Degree Ordering Algorithm,” SIAM. J. Matrix Anal. Appl. 17 (4), 886-905 (1996).
- T. A. Davis, J. R. Gilbert, S. I. Larimore, and E. G. Ng, “A Column Approximate Minimum Degree Ordering Algorithm,” ACM Trans. Math. Soft. 30 (3), 353-376 (2004).
- R. J. Lipton, D. J. Rose, and R. E. Tarjan, “Generalized Nested Dissection,” SIAM J. Numer. Anal. 16 (2), 346-358 (1979).
- A. George and J. W. H. Liu, “An Automatic Nested Dissection Algorithm for Irregular Finite Element Problems,” SIAM J. Numer. Anal. 15 (5), 1053-1069 (1978).
- A. Pothen, H. D. Simon, and L. Wang, Spectral Nested Dissection , Technical Report CS-92-01 (Pennsylvania State Univ., State College, 1992).
- M. T. Heath and P. Raghavan, “A Cartesian Parallel Nested Dissection Algorithm,” SIAM J. Matrix Anal. Appl. 16 (1), 235-253 (1995).
- G. L. Miller, S.-H. Teng, and S. A. Vavasis, “A Unified Geometric Approach to Graph Separators,” in Proc. 32nd Annual Symp. on Foundations of Computer Science, San Juan, USA, October 1-4, 1991 (IEEE Press, New York, 1991), pp. 538-547.
- T. N. Bui and C. Jones, “A Heuristic for Reducing Fill-In in Sparse Matrix Factorization,” in Proc. 6th SIAM Conf. on Parallel Processing for Scientific Computing, Norfolk, USA, March 22-24, 1993 (SIAM Press, Philadelphia, 1993), pp. 445-452.
- B. Hendrickson and R. Leland, A Multilevel Algorithm for Partitioning Graphs , Tech. Rep. SAND93-1301 (Sandia National Labs., Albuquerque, 1993).
- G. Karypis and V. Kumar, “A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs,” SIAM J. Sci. Comput. 20 (1), 359-392 (1998).
- B. Hendrickson and E. Rothberg, “Improving the Runtime and Quality of Nested Dissection Ordering,” SIAM J. Sci. Comput. 20 (2), 468-489 (1999).
- A. George, M. T. Heath, J. Liu, and E. Ng, “Sparse Cholesky Factorization on a Local-Memory Multiprocessor ,” SIAM J. Sci. Stat. Comput. 9 (2), 327-340 (1988).
- G. Karipis, METIS: A Software Package for Partitioning Unstructured Graphs, Partitioning Meshes, and Computing Fill-Reducing Orderings of Sparse Matrices , Version 5.0 (University of Minnesota, Minneapolis, 2011).
- Intel Math Kernel Library Reference Manual.
http://software.intel.com/sites/default/files/managed/9d/c8/mklman.pdf . Cited August 11, 2015.
- G. Karypis and V. Kumar, ParMetis: Parallel Graph Partitioning and Sparse Matrix Ordering Library , Technical Report 97-060 (University of Minnesota, Minneapolis, 1997).
- G. Karypis and V. Kumar, “A Parallel Algorithm for Multilevel Graph Partitioning and Sparse Matrix Ordering,” J. Parallel Distrib. Comput. 48 (1), 71-95 (1998).
- K. Schloegel, G. Karypis, and V. Kumar, “Parallel Multilevel Algorithms for Multi-Constraint Graph Partitioning,” in Lecture Notes in Computer Science (Springer, London, 2000), Vol. 1900, pp. 296-310.
- F. Pellegrini, Scotch and LibScotch 6.0 User’s Guide (Université Bordeaux, Bordeaux, 2012).
- F. Pellegrini, J. Roman, and P. Amestoy, “Hybridizing Nested Dissection and Halo Approximate Minimum Degree for Efficient Sparse Matrix Ordering,” Concurrency: Pract. Exper. 12 (2-3), 68-84 (2000).
- C. Chevalier and F. Pellegrini, “PT-Scotch: A Tool for Efficient Parallel Graph Ordering,” Parallel Comput. 34 (6-8), 318-331 (2008).
- D. Lasalle and G. Karypis, “Multi-Threaded Graph Partitioning,” in Proc. 27th Int. Symp. on Parallel and Distributed Processing, Boston, USA, May 20-24, 2013 (IEEE Press, New York, 2013), pp. 225-236.
- F. Pellegrini, “Shared Memory Parallel Algorithms in Scotch 6,”
http://graal.ens-lyon.fr/MUMPS/doc/ud_2013/Pellegrini.pdf . Cited August 11, 2015.
http://hpc-education.unn.ru/research/overview/sparse-algebra/morsy . Cited August 11, 2015.
- B. W. Kernighan and S. Lin, “An Efficient Heuristic Procedure for Partitioning Graphs,” The Bell Syst. Tech. J. 49 (2), 291-307 (1970).
- C. M. Fiduccia and R. M. Mattheyses, “A Linear-Time Heuristic for Improving Network Partitions,” in Proc. 19th IEEE Design Automation Conf., Las Vegas, USA, June 14-16, 1982 (IEEE Press, Piscataway, 1982), pp. 175-181.
- C. Aschcraft and J. W. H. Liu, A Partition Improvement Algorithm for Generalized Nested Dissection , Technical Report BCSTECH-94-020 (Boeing Computer Services, Seattle, 1994).
- C. Aschcraft and J. W. H. Liu, “Using Domain Decomposition to Find Graph Bisectors,” BIT Numer. Math. 37 (3), 506-534 (1997).
- T. A. Davis and Y. Hu, “The University of Florida Sparse Matrix Collection,” ACM Trans. Math. Softw. 38 (2011).
- G. Karipis and K. Schloegel, ParMETIS Manual. Parallel Graph Partitioning and Sparse Matrix Ordering Library , Version 4.0.
http://glaros.dtc.umn.edu/gkhome/fetch/sw/parmetis/manual.pdf . Cited August 11, 2015.
- S. Lebedev, D. Akhmedzhanov, E. Kozinov, et al., “Dynamic Parallelization Strategies for Multifrontal Sparse Cholesky Factorization,” Lecture Notes in Computer Science (in press).
- Е. A. Kozinov, I. G. Lebedev, S. A. Lebedev, et al., “Novel Linear Equations Systems Solver for Sparse Symmetric Positive-Definite Matrix,” Vestn. Univ. Nizhni Novgorod 5 (2), 376-384 (2012).