DOI: https://doi.org/10.26089/NumMet.v24r433

Performance study of the architecture-independent VGL framework for efficient implementation of graph algorithms

Authors

  • Dmitry I. Lichmanov
  • Ilya V. Afanasyev
  • Vadim V. Voevodin

Keywords

graph framework
graph algorithms
high-performance computing
performance analysis
vector processing
VGL

Abstract

Graph algorithms are widely used in modeling tasks, since many real-world objects are naturally represented as graphs (for example, road networks or social connections). At the same time, implementing such algorithms efficiently is often difficult, due in particular to irregular memory access patterns when processing graphs and to the very large size of input graphs. Graph frameworks, that is, software environments for implementing graph algorithms, can help address this problem. Previously, the architecture-independent VGL (Vector Graph Library) framework was developed, which allows graph algorithms to be implemented efficiently on various hardware platforms (multi-core processors with vector extensions, graphics accelerators, and NEC vector processors). In this work, the performance of VGL was studied on different platforms and compared with existing analogues, and an approach to automatic selection of the input graph storage format based on machine learning methods was proposed and evaluated.
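
To make the format-selection idea more concrete, the following is a minimal, hypothetical sketch of how a machine-learning-based format chooser could work: a classifier is trained on cheap structural features of a graph (vertex and edge counts, degree statistics) to predict the storage format that will run fastest. The format names, the feature set, the use of scikit-learn's GradientBoostingClassifier, and the synthetic training data are all illustrative assumptions rather than the method actually evaluated in the paper; real training labels would come from benchmarking each supported format on real graphs.

# Illustrative sketch (not VGL's actual code): train a classifier that picks
# a graph storage format from simple structural features of the input graph.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Hypothetical candidate storage formats a framework could choose between.
FORMATS = ["CSR", "segmented_CSR", "edge_list"]

def graph_features(num_vertices, num_edges, degrees):
    """Cheap structural features of a graph (assumed feature set)."""
    degrees = np.asarray(degrees, dtype=np.float64)
    return [
        num_vertices,
        num_edges,
        num_edges / max(num_vertices, 1),  # average degree
        degrees.max(),                     # maximum degree
        degrees.std(),                     # degree dispersion
    ]

# Synthetic training set: each sample is (features, index of fastest format).
# In a real system the labels would come from benchmarking runs on real graphs;
# here they are random placeholders, so the printed accuracy is not meaningful.
rng = np.random.default_rng(0)
X, y = [], []
for _ in range(200):
    n = int(rng.integers(1_000, 1_000_000))
    m = int(n * rng.uniform(2, 64))
    degs = rng.pareto(2.0, size=1_000) * (m / n)  # sampled degree profile
    X.append(graph_features(n, m, degs))
    y.append(int(rng.integers(len(FORMATS))))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier().fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))

# At run time, the same features are computed for a new input graph and the
# graph is converted to the predicted format before the algorithm executes.
new_graph = graph_features(500_000, 8_000_000, rng.pareto(2.0, size=1_000) * 16)
print("suggested format:", FORMATS[int(model.predict([new_graph])[0])])

In such a scheme the classifier is trained offline and queried once per input graph, so its overhead is negligible compared to the graph processing itself.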


Published

2023-12-18

Section

Parallel software tools and technologies
