Job Digest: an approach to dynamic analysis of job characteristics on supercomputers
Keywords:
supercomputer, performance, efficiency study, monitoring, parallel computing, dynamic job characteristics, high performance computing
Abstract
As supercomputing systems and applications grow in scale, developing performance-efficient applications is becoming rapidly more difficult. This is due to the large number of factors that can influence application performance: the hardware and software specifics of the supercomputer, the peculiarities of the application, and interference from jobs running simultaneously all need to be taken into account when trying to achieve high performance. As supercomputers constantly evolve, these factors become ever more complicated. This creates demand for a tool that shows where, and more importantly why, performance losses occur. In this paper we give an overview of the toolkit we have developed and discuss in detail one of its approaches, aimed at studying application behavior during a job run. This approach analyzes the dynamic characteristics of jobs collected by monitoring tools. Its aim is to provide system administrators and users with summary characteristics of every job, supporting both an overall and a detailed analysis of each individual job run. The approach and the detailed report it generates are called «Job Digest».
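To make the idea of reducing monitoring data to per-job characteristics more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes each monitoring sample is a (node, timestamp, metric, value) record; the `Sample` layout, the `cpu_user` metric name and the functions `job_digest` and `time_profile` are all hypothetical. The sketch reduces the raw samples to job-wide statistics per metric and to a coarse time profile that keeps some of the job's dynamics.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    """One monitoring reading for a job (hypothetical record layout)."""
    node: str
    timestamp: float  # seconds since job start
    metric: str       # e.g. "cpu_user" (hypothetical metric name)
    value: float


def job_digest(samples):
    """Summarize raw samples into per-metric job characteristics.

    For every metric: job-wide min, mean and max over all samples, plus the
    spread between the highest and lowest per-node means, a simple
    indicator of load imbalance across the job's nodes.
    """
    by_metric = defaultdict(list)
    by_metric_node = defaultdict(lambda: defaultdict(list))
    for s in samples:
        by_metric[s.metric].append(s.value)
        by_metric_node[s.metric][s.node].append(s.value)

    digest = {}
    for metric, values in by_metric.items():
        node_means = [mean(v) for v in by_metric_node[metric].values()]
        digest[metric] = {
            "min": min(values),
            "mean": mean(values),
            "max": max(values),
            "node_spread": max(node_means) - min(node_means),
        }
    return digest


def time_profile(samples, metric, bin_seconds=60.0):
    """Average one metric over all nodes within fixed-width time bins."""
    bins = defaultdict(list)
    for s in samples:
        if s.metric == metric:
            bins[int(s.timestamp // bin_seconds)].append(s.value)
    return [(b * bin_seconds, mean(v)) for b, v in sorted(bins.items())]


# Usage with made-up readings from a two-node job.
samples = [
    Sample("n1", 0, "cpu_user", 90.0),
    Sample("n1", 60, "cpu_user", 80.0),
    Sample("n2", 0, "cpu_user", 20.0),
    Sample("n2", 60, "cpu_user", 10.0),
]
print(job_digest(samples))
print(time_profile(samples, "cpu_user"))
```

Reporting a per-node spread and a time profile is a choice made here for illustration; the actual set of characteristics in a Job Digest report is defined by the toolkit described in the paper.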
License
Copyright (c) 2012 Вычислительные методы и программирование (Numerical Methods and Programming)

This work is licensed under a Creative Commons Attribution 4.0 International License.