New approaches for automatic analysis of HPC application performance using the TASC software suite

Vladimir A. Matveev; Alexander V. Setyaev; Vadim V. Voevodin

doi:10.26089/NumMet.v26r433

Keywords:

supercomputer; monitoring; performance analysis; HPC applications; supercomputer usage quality; application class; efficiency assessment

Abstract

This paper presents new approaches to automatic analysis that identify performance issues and useful properties of jobs running on a supercomputer. Methods are proposed for detecting problematic application classes, such as “hung” programs and jobs with underutilized nodes. New assessments for automatic preliminary evaluation of GPU processor and memory usage efficiency are also developed and tested. These approaches extend the functionality of the existing TASC software suite, which is designed for comprehensive analysis of the usage quality of modern supercomputers.

Published

2025-11-24

Issue

Vol. 26 (2025): Issue 4.

Section

Parallel software tools and technologies

Authors

Vladimir A. Matveev

Lomonosov Moscow State University, Research Computing Center

• Technician

Alexander V. Setyaev

Lomonosov Moscow State University, Research Computing Center

• Programmer

Vadim V. Voevodin

Lomonosov Moscow State University, Research Computing Center

• Head of Laboratory

License

This work is licensed under a Creative Commons Attribution 4.0 International License.
