Bottlenecks in organizing the workflows of large HPC centers

Dmitry A. Nikitenko

doi:10.26089/NumMet.v24r101

https://doi.org/10.26089/NumMet.v24r101

Keywords:

supercomputing

provision of computing resources

use of computing resources

workflows at supercomputer center

shared research facilities

provision of computing services

Abstract

The effective output of data centers is determined by many complementary factors. Attention is often paid only to the few that seem most significant at first glance, such as the efficiency of the scheduler or the efficiency with which user tasks utilize resources. A more general view of the problem is often missed: the level at which the interconnection of work processes in the HPC center is determined and effective work is organized as a whole. Omissions at this stage can negate any subtle low-level optimizations. This paper presents a scheme for describing workflows in a supercomputer center and analyzes the experience of large HPC facilities in identifying the bottlenecks in this chain. It also proposes a software implementation, in the form of a support system for the functioning of an HPC site, that makes it possible to optimize the organization of work at all stages.

Published

2023-01-18

Issue

Vol. 24 (2023): Issue 1.

Section

Parallel software tools and technologies

Author

Dmitry A. Nikitenko

Lomonosov Moscow State University,
Research Computing Center
• Leading Researcher

License

This work is licensed under a Creative Commons Attribution 4.0 International License.