Preprocessing of system monitoring data for workload analysis of HPC systems
Keywords: supercomputing, supercomputers, system monitoring data analysis, system monitoring data cleaning, system monitoring data reduction
HPC systems have complex architectures and contain millions of components. To ensure reliable operation and efficient output, the functioning of most subsystems should be supervised. This supervision relies on data collected from various logging and monitoring systems. Because the data come from different sources, analysis can run into multiple issues when processing them.
Some data subsets can be incorrect because of malfunctioning sensors, aggregation errors in the monitoring system, and similar causes. It is therefore crucial to preprocess monitoring data before analyzing it, taking the analysis goals into consideration. Based on monitoring data from the MSU HPC Center, this paper proposes an approach to preprocessing data from HPC monitoring systems, gives real-life examples of issues that may be encountered, and offers recommendations for further analysis of similar datasets.
Copyright (c) 2021 M.I. Martyshov, D.A. Nikitenko
This work is licensed under a Creative Commons Attribution 4.0 International License.