Bottlenecks in organizing the workflows of large HPC centers

Authors

Dmitry A. Nikitenko

DOI:

https://doi.org/10.26089/NumMet.v24r101

Keywords:

supercomputing, provision of computing resources, use of computing resources, workflows at supercomputer center, shared research facilities, provision of computing services

Abstract

The effective output of data centers is determined by many complementary factors. Often, attention is paid to only a few of them, those that seem the most significant at first glance, such as the efficiency of the scheduler or the efficiency of resource utilization by user tasks. At the same time, a more general view of the problem is often missed: the level at which the interconnection of work processes in the HPC center is determined and effective work as a whole is organized. Omissions at this stage can negate any subtle low-level optimizations. This paper provides a scheme for describing workflows in a supercomputer center and analyzes the experience of large HPC facilities in identifying the bottlenecks in this chain. A software implementation that makes it possible to optimize the organization of work at all stages is also proposed, in the form of a support system for the functioning of the HPC site.

Author Biography

Dmitry A. Nikitenko

References

  1. Vad. V. Voevodin, D. I. Shaikhislamov, and D. A. Nikitenko, “How to Assess the Quality of Supercomputer Resource Usage,” Supercomput. Front. Innov. 9 (3), 4-18 (2022).
    doi 10.14529/jsfi220301.
  2. R. McLay, K. W. Schulz, W. L. Barth, and T. Minyard, “Best Practices for the Deployment and Management of Production HPC Clusters,” in Proc. Int. Conf. for High Performance Computing, Networking, Storage and Analysis, Seattle, USA, November 12-18, 2011 (ACM Press, New York, 2011).
    doi 10.1145/2063348.2063360.
  3. S. Varrette, P. Bouvry, H. Cartiaux, and F. Georgatos, “Management of an Academic HPC Cluster: The UL Experience,” in Proc. Int. Conf. on High Performance Computing & Simulation, Bologna, Italy, July 21-25, 2014 (IEEE Press, New York, 2014), pp. 959-967.
    doi 10.1109/HPCSim.2014.6903792.
  4. S. Varrette, E. Kieffer, and F. Pinel, “Optimizing the Resource and Job Management System of an Academic HPC & Research Computing Facility,” in Proc. 21st Int. Symposium on Parallel and Distributed Computing, Basel, Switzerland, July 11-13, 2022 (IEEE Press, New York, 2022), pp. 129-137.
    doi 10.1109/ISPDC55340.2022.00027.
  5. Vad. V. Voevodin, R. A. Chulkevich, P. S. Kostenetskiy, et al., “Administration, Monitoring and Analysis of Supercomputers in Russia: a Survey of 10 HPC Centers,” Supercomput. Front. Innov. 8 (3), 82-103 (2021).
    doi 10.14529/jsfi210305.
  6. A. V. Paokin and D. A. Nikitenko, “Unified Approach for Provision of Supercomputer Center Resources,” Vestn. Yuzhn. Ural. Gos. Univ. Ser. Vychisl. Mat. Inf. 11 (1), 5-14 (2022).
    doi 10.14529/cmse220101.
  7. V. Voevodin, A. Antonov, D. Nikitenko, et al., “Lomonosov-2: Petascale Supercomputing at Lomonosov Moscow State University,” in Contemporary High Performance Computing (CRC Press, Boca Raton, 2019), Vol. 3, pp. 305-330.
    doi 10.1201/9781351036863-12.
  8. Vl. V. Voevodin, A. S. Antonov, D. A. Nikitenko, et al., “Supercomputer Lomonosov-2: Large Scale, Deep Monitoring and Fine Analytics for the User Community,” Supercomput. Front. Innov. 6 (2), 4-11 (2019).
    doi 10.14529/jsfi190201.
  9. G. I. Savin, B. M. Shabanov, P. N. Telegin, and A. V. Baranov, “Joint Supercomputer Center of the Russian Academy of Sciences: Present and Future,” Lobachevskii J. Math. 40 (11), 1853-1862 (2019).
    doi 10.1134/S1995080219110271.
  10. P. S. Kostenetskiy, R. A. Chulkevich, and V. I. Kozyrev, “HPC Resources of the Higher School of Economics,” J. Phys.: Conf. Ser. 1740 (1), Article 012050 (2021).
    doi 10.1088/1742-6596/1740/1/012050.
  11. D. A. Nikitenko, Vad. V. Voevodin, and S. A. Zhumatiy, “Driving a Petascale HPC Center with Octoshell Management System,” Lobachevskii J. Math. 40 (11), 1817-1830 (2019).
    doi 10.1134/S1995080219110192.
  12. P. Shvets, Vad. Voevodin, and D. Nikitenko, “Approach to Workload Analysis of Large HPC Centers,” in Communications in Computer and Information Science (Springer, Cham, 2020), Vol. 1263, pp. 16-30.
    doi 10.1007/978-3-030-55326-5_2.
  13. D. A. Nikitenko, P. A. Shvets, and V. V. Voevodin, “Why do Users Need to Take Care of Their HPC Applications Efficiency?,” Lobachevskii J. Math. 41 (8), 1521-1532 (2020).
    doi 10.1134/s1995080220080132.

Published

18-01-2023

How to Cite

Nikitenko D.A. Bottlenecks in Organizing the Workflows of Large HPC Centers // Numerical Methods and Programming (Vychislitel’nye Metody i Programmirovanie). 2023. 24. 1-9. doi 10.26089/NumMet.v24r101

Section

Parallel software tools and technologies