DOI: https://doi.org/10.26089/NumMet.v22r314

Preprocessing of system monitoring data for workload analysis of HPC systems

Authors

  • M.I. Martyshov
  • D.A. Nikitenko

Keywords:

supercomputing
supercomputers
system monitoring data analysis
system monitoring data cleaning
system monitoring data reduction

Abstract

HPC systems have complex architectures and contain millions of components. To ensure reliable operation and efficient output, the functioning of most subsystems must be supervised. Supervision relies on data collected from various logging and monitoring systems. Because these data come from different sources, their analysis can run into multiple processing issues.

Some data subsets can be incorrect because of malfunctioning sensors, data aggregation errors in the monitoring system, and similar faults. It is therefore crucial to preprocess monitoring data before analyzing it, taking the analysis goals into consideration. Based on monitoring data from the MSU HPC Center, this paper proposes an approach to preprocessing HPC monitoring data, gives real-life examples of issues that may be encountered, and offers recommendations for further analysis of similar datasets.


Published

2021-09-23

Section

Parallel software tools and technologies
