Job Digest: an approach to dynamic analysis of job characteristics on supercomputers

Authors

  • A.V. Adinets
  • P.A. Bryzgalov
  • Vad.V. Voevodin
  • S.A. Zhumatii
  • D.A. Nikitenko
  • K.S. Stefanov

Keywords:

supercomputer
performance
efficiency study
monitoring
parallel computing
dynamic job characteristics
high performance computing

Abstract

As supercomputing systems and applications scale up rapidly, developing performance-efficient applications becomes increasingly difficult. The reason is the large number of factors that can influence application performance. Hardware and software specifics of the supercomputer, peculiarities of the application, and interference from jobs running simultaneously must all be taken into account when trying to achieve high performance. As supercomputers constantly evolve, these specifics become more and more complicated. This creates demand for a dedicated tool that shows where and, more importantly, why performance loss occurs. In this paper we give an overview of the developed toolkit and discuss in detail one of the approaches aimed at studying application behavior during a job run. This approach studies the dynamic characteristics of jobs gathered by monitoring tools. Its aim is to provide system administrators and users with both an overall and a detailed analysis of every individual job run. This approach and the detailed report it generates are named "Job Digest".


Published

2012-12-11

Issue

Section

Section 2. Programming

Author Biographies

A.V. Adinets

Lomonosov Moscow State University,
Research Computing Center,
Leninskie Gory, 119991, Moscow
• Researcher

P.A. Bryzgalov

Lomonosov Moscow State University,
Research Computing Center,
Leninskie Gory, 119991, Moscow
• Senior Researcher

Vad.V. Voevodin

Lomonosov Moscow State University,
Research Computing Center,
Leninskie Gory, 119991, Moscow
• Researcher

S.A. Zhumatii

Lomonosov Moscow State University,
Research Computing Center,
Leninskie Gory, 119991, Moscow
• Leading Researcher

D.A. Nikitenko

Lomonosov Moscow State University,
Research Computing Center,
Leninskie Gory, 119991, Moscow
• Researcher

K.S. Stefanov

Lomonosov Moscow State University,
Research Computing Center,
Leninskie Gory, 119991, Moscow
• Senior Researcher

