Job Digest: an approach to dynamic analysis of job characteristics on supercomputers
Authors
- A.V. Adinets
- P.A. Bryzgalov
- Vad.V. Voevodin
- S.A. Zhumatii
- D.A. Nikitenko
- K.S. Stefanov
Keywords:
supercomputer
performance
efficiency study
monitoring
parallel computing
dynamic job characteristics
high performance computing
Abstract
As supercomputing systems and applications grow in scale, developing applications that perform efficiently becomes rapidly more difficult. The reason is the large number of factors that can influence application performance. Hardware and software specifics of the supercomputer, peculiarities of the application, and interference between jobs running simultaneously must all be taken into account when trying to achieve high performance. As supercomputers constantly evolve, these specifics become more and more complicated. This creates demand for a dedicated tool that shows where and, more importantly, why performance is lost. In this paper we give an overview of the developed toolkit and discuss in detail one of its approaches, aimed at studying application behavior during a job run. This approach studies the dynamic characteristics of jobs, which are gathered by monitoring tools. Its aim is to provide system administrators and users with both a summary and a detailed analysis of every individual job run. This approach and the detailed report it generates have been named "Job Digest".
Section
Section 2. Programming