DOI: https://doi.org/10.26089/NumMet.v20r439

A parallel preconditioner based on the approximation of an inverse matrix by power series for solving sparse linear systems on graphics processors

Authors

  • A.V. Yuldashev
  • N.V. Repin
  • V.V. Spele

Keywords:

CUDA
graphics processors
iterative methods
parallel computing
preconditioners
sparse matrices
tridiagonal systems

Abstract

The applicability of the AIPS method approximating an inverse matrix using Neumann series is considered in the framework of the CPR two stage preconditioner. A parallel CUDA-oriented algorithm is proposed for solving linear systems with tridiagonal matrices consisting of independent blocks of different sizes. It is shown that the implementation of the proposed algorithm can be more than twice the speed of the similar functions from the cuSPARSE library. Experimental evaluation of the BiCGStab method with the CPR-AIPS preconditioner on modern GPUs, including a hybrid computing system with 4 GPU NVIDIA Tesla V100, is performed. Numerical experiments show an adequate scalability of this preconditioner as well as the possibility (compared to the CPR-AMG) to accelerate the solution of linear systems being typical for the reservoir modeling problems.


Published

2020-01-11

Issue

Section

Section 1. Numerical methods and applications

Author Biographies

A.V. Yuldashev

N.V. Repin

V.V. Spele


References

  1. Топ50.
    http://top50.supercomputers.ru . Cited October 25, 2019.
  2. K. Aziz and A. Settari, Petroleum Reservoir Simulation (Appl. Sci. Publ., London, 1979; Comput. Sci. Inst., Moscow, 2004).
  3. I. I. Gazizov and A. V. Yuldashev, “Development of Parallel Linear Solver for Hydrodynamic Modeling of Oil and Gas Fields,” Evristich. Algortmy i Raspredel. Vychisl. 1 (1), 88-96 (2014).
  4. AlgoWiki: Open Encyclopedia of Parallel Algorithmic Features. Biconjugate gradient stabilized method (BiCGStab).
    |
    http://algowiki-project.org/en/Biconjugate_gradient_stabilized_method_(BiCGStab)|. Cited October 15, 2019.
  5. Y. Saad, Iterative Methods for Sparse Linear Systems (SIAM, Philadelphia, 2003; Mosk. Gos. Univ., Moscow, 2013).
  6. R. R. Gubaidullin, N. V. Repin, R. F. Sayfutdinov, and A. V. Yuldashev, “Specialized Solver of Sparse Linear Systems of Algebraic Equations on GPU Clusters,” in Proc. Int. Conf. on Russian Supercomputing Days, Moscow, Russia, September 26-27, 2016 (Mosk. Gos. Univ., Moscow, 2016), pp. 673-682.
  7. J. R. Wallis, R. P. Kendall, and T. E. Little, “Constrained Residual Acceleration of Conjugate Residual Methods,” in Proc. SPE Reservoir Simulation Symposium, Dallas, USA, February 10-13, 1985 ,
    doi 10.2118/13536-MS
  8. J. W. Ruge and K. Stüben, “Algebraic Multigrid (AMG),” in Multigrid Methods (SIAM Press, Philadelphia, 1987), Vol. 3, pp. 73-130.
  9. N. S. Nedozhogin, S. P. Kopysov, and A. K. Novikov., “Parallel Formation of a Preconditioner Based on the Sherman-Morrison Inversion,” in Proc. Int. Conf. on Parallel Computing Technologies, Ekaterinburg, Russia, March 31-April 2, 2015 (South Ural State Univ., Chelyabinsk, 2015), pp. 436-441.
  10. K. Chen, Matrix Preconditioning Techniques and Applications (Cambridge Univ. Press, Cambridge, 2005).
  11. H. Prabhu, O. Edfors, J. Rodrigues, et al., “Hardware Efficient Approximative Matrix Inversion for Linear Pre-coding in Massive MIMO,” in IEEE Int. Symp. on Circuits and Systems (ISCAS), Melbourne, Australia, June 1-5, 2014
    |
    https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6865481|
  12. L. S. K. Fung and A. H. Dogru, “Parallel Unstructured Solver Methods for Complex Giant Reservoir Simulation,” in Proc. SPE Reservoir Simulation Symposium, Houston USA, February 26-28, 2007 ,
    doi 10.2118/106237-MS
  13. A. V. Boreskov, A. A. Charlamov, N. D. Markovskii, et al., Parallel Computing on GPU. Architecture and CUDA Programming Model (Mosk. Gos. Univ., Moscow, 2012) [in Russian].
  14. AlgoWiki: Open Encyclopedia of Parallel Algorithmic Features. Thomas Algorithm, Pointwise Version.
    |
    http://algowiki-project.org/en/Thomas_algorithm, _pointwise_version.| Cited October 15, 2019.
  15. A. V. Frolov, “Yet Another Tridiagonal Matrix Algorithm Parallelizing Method,” in Proc. Int. Conf. on Russian Supercomputing Days, Moscow, Russia, September 28-29, 2015 (Mosk. Gos. Univ., Moscow, 2015), pp. 151-162.
  16. cuSPARSE. The API Reference Guide for cuSPARSE, the CUDA Sparse Matrix Library.
    https://docs.nvidia.com/cuda/cusparse/. Cited October 25, 2019.
  17. L.-W. Chang, J. A. Stratton, H.-S. Kim, and W.-M. W. Hwu, “A Scalable, Numerically Stable, High-Performance Tridiagonal Solver Using GPUs,” in Proc. Int. Conf. on High Performance Computing, Networking, Storage and Analysis, Salt Lake City, November 10-16, 2012 (IEEE Press, Los Alamitos, 2012),
    doi 10.1109/SC.2012.12
  18. R. W. Hockney and C. R. Jesshope, Parallel Computers: Architecture, Programming and Algorithms (Adam Hilger, Bristol 1981; Radio i Svyaz’, Moscow, 1986).
  19. E. László, M. Giles, and J. Appleyard, “Manycore Algorithms for Batch Scalar and Block Tridiagonal Solvers,” ACM Trans. Math. Softw. 42 (2016).
    doi 10.1145/2830568
  20. AlgoWiki: Open Encyclopedia of Parallel Algorithmic Features. Repeated Thomas Algorithm, Pointwise Version.
    |
    http://algowiki-project.org/en/Repeated_Thomas_algorithm, _pointwise_version|. Cited October 15, 2019.
  21. USATU’s Supercomputer.
    https://www.ugatu.su/en/research/supercomputer/. Cited October 15, 2019.
  22. V. M. Verzhbitskii, Numerical Linear Algebra (Direkt-Media, Moscow, 2013) [in Russian].