A method of two-level parallelization of the Thomas algorithm for solving tridiagonal linear systems on hybrid computers with multicore coprocessors
Keywords:
systems of linear algebraic equations
tridiagonal matrices
Thomas algorithm
parallelization of Thomas algorithm
parallel-pipeline method
Yanenko’s method
parallel computers
Intel Xeon Phi
Abstract
A method of two-level parallelization of the Thomas algorithm for solving tridiagonal linear systems (the thread-level parallelism using OpenMP and the process-level parallelism using MPI) arising when modeling two-dimensional and three-dimensional physical processes is described. The features of its implementation for parallel multiprocessor systems and for hybrid multiprocessor systems with multicore coprocessors Intel Xeon Phi are analyzed. The arithmetic complexity of this method is estimated. Some numerical results obtained when studying its scalability are discussed.
Section
Section 1. Numerical methods and applications
References
- A. N. Bykov, V. A. Veselov, B. L. Voronin, and A. M. Erofeev, “RAMZES-KP Technique for Computation of Multicomponent Heat-Conducting Space Motions in Euler-Lagrange Coordinates,” in Proc. Russian Federal Nuclear Center (Russian Federal Nuclear Center, Sarov, 2008), Issue 13, pp. 50-57.
- J. Jeffers and J. Reinders, Intel Xeon Phi Coprocessor High-Performance Programming (Morgan Kaufmann, San Francisco, 2013).
- NVIDIA. Tesla GPU Accelerators for Servers.
http://www.nvidia.ru/object/tesla-server-gpus-ru.html . Cited June 17, 2016.
- A. Davidson, Y. Zhang, and J. D. Owens, “An Auto-Tuned Method for Solving Large Tridiagonal Systems on the GPU,” in IPDPS’11 Proc. 2011 IEEE Int. Parallel & Distributed Processing Symp., Anchorage, USA, May 16-20, 2011 (IEEE Press, Washington, DC, 2011), pp. 956-965.
http://idav.ucdavis.edu/func/return_pdf?pub_id=1052 . Cited June 17, 2016.
- H.-S. Kim, S. Wu, L.-W. Chang, and W.-M. Hwu, “A Scalable Tridiagonal Solver for GPUs,” in ICPP ’11 Proc. 2011 Int. Conf. on Parallel Processing, Taipei, Taiwan, September 13-16, 2011 (IEEE Press, Washington, DC, 2011), pp. 444-453.
- R. W. Hockney and C. R. Jesshope, Parallel Computers: Architecture, Programming, and Algorithm (Hilger, Bristol, 1988).
- A. A. Samarskii, Introduction to Numerical Methods (Nauka, Moscow, 1982) [in Russian].
- NVIDIA Corporation. NVIDIA CUDA Sparse Matrix Library (cuSPARSE).
https://developer.nvidia.com/cusparse . Cited June 17, 2016.
- S. Sengupta, M. Harris, Y. Zhang, and J. D. Owens, “Scan Primitives for GPU Computing,”
http://www.idav.ucdavis.edu/func/return_pdf?pub_id=915 . Cited June 17, 2016.
- D. Goddeke and R. Strzodka, “Cyclic Reduction Tridiagonal Solvers on GPUs Applied to Mixed-Precision Multigrid,” IEEE Trans. Parallel Distrib. Syst. 22 (1), 22-32 (2011).
- A. Davidson and J. D. Owens, “Register Packing for Cyclic Reduction: A Case Study,”
http://idav.ucdavis.edu/func/return_pdf?pub_id=1053 . Cited June 17, 2016.
- D. Egloff, “High Performance Finite Difference PDE Solvers on GPUs,”
http://www.gpucomputing.net/
sites/default/files/papers/1380/fdm_gpu.pdf . Cited June 17, 2016.
- N. Sakharnykh, “Efficient Tridiagonal Solvers for ADI Methods and Fluid Simulation,”
http://on-demand.gputechconf.com/gtc/2010/presentations/
S12015-Tridiagonal-Solvers-ADI-Methods-Fluid-Simulation.pdf . Cited June 17, 2016.
- Y. Zhang, J. Cohen, and J. D. Owens, “Fast Tridiagonal Solvers on the GPU,” ACM SIGPLAN Notices 45 (5), 127-136 (2010).
- H. S. Stone, “An Efficient Parallel Algorithm for the Solution of a Tridiagonal Linear System of Equations,” J. ACM 20 (1), 27-38 (1973).
- L.-W. Chang, J. A. Stratton, H.-S. Kim, and W.-M.W. Hwu, “A Scalable, Numerically Stable, High-Performance Tridiagonal Solver Using GPUs,” in SC’12 Proc. Int. Conf. on High Performance Computing, Networking, Storage and Analysis, Salt Lake City, USA, November 11-15, 2012 (IEEE Press, Los Alamitos, 2012), Article No. 27.
- E. Polizzi and A. H. Sameh, “A Parallel Hybrid Banded System Solver: The SPIKE Algorithm,” Parallel Comput. 32 (2), 177-194 (2006).
- E. Polizzi and A. Sameh, “SPIKE: A Parallel Environment for Solving Banded Linear Systems,” Comput. Fluids 36 (1), 113-120 (2007).
- I. E. Venetis, A. Sobczyk, A. Kouris, et al., “A General Tridiagonal Solver for Coprocessors: Adapting g-SPIKE for the Intel Xeon Phi,”
http://www.researchgate.net/publication/282286515 . Cited June 17, 2016.
- N. N. Yanenko, A. N. Konovalov, A. N. Bugrov, and G. V. Shustov, “Organization of Parallel Computing and the Thomas Algorithm Parallelization,” in Numerical Methods in Continuum Mechanics (Comput. Center Sib. Branch of USSR Acad. Sci., Novosibirsk, 1978), Vol. 9, Issue 7, pp. 139-146.
- A. Povitsky, Parallelization of the Pipelined Thomas Algorithm , ICASE Report No. 98-48 (NASA Langley Research Center, Hampton, 1998).
- I. S. Sapronov and A. N. Bykov, “A Parallel-Pipelined Algorithm,” Atom, No. 4, 24-25 (2009).
- A. N. Bykov, A. M. Erofeev, E. A. Sizov, and A. A. Fedorov, “A parallel Sweep Method for Hybrid Supercomputers,” Vychisl. Metody Programm. 14, 43-47 (2013).
- S. A. Il’in and A. V. Starchenko, “Parallelization of a Componentwise Splitting for the Numerical Solution of the Heat Conduction Equation,” in Proc. Int. Conf. on Parallel Computational Technologies, Ekaterinburg, Russia, March 30-April 3, 2015 (Inst. of Mathematics and Mechanics, Ural Branch of the Russian Academy of Sciences, Ekaterinburg, 2015), pp. 399-402.
- V. M. Verzhbitskii, Numerical Linear Algebra (Direkt-Media, Moscow, 2013) [in Russian].
- Education Center.
http://compcenter.org/predostavlenie-vychislitelnyh-resur . Cited June 17, 2016.
- Top-50.
http://top50.supercomputers.ru/?page=archive&rating=23 . Cited June 17, 2016.
- G. I. Voronov, V. D. Trushchin, V. V. Shumilin, and D. V. Yezhov, “S-MPI Software System for the Development, Optimization and Execution of Parallel Applications on Supercomputer Clusters,” Voprosy Atomnoi Nauki i Tekhniki, Ser.: Mat. Mod. Fiz. Proc., No. 3, 55-60 (2013).