Examination of supercomputer system jobs flow dynamic characteristics

Authors

  • A.S. Antonov
  • S.A. Zhumatii
  • D.A. Nikitenko
  • K.S. Stefanov
  • A.M. Teplov
  • P.A. Shvets

Keywords:

supercomputer
performance
efficiency
dynamic characteristics
supercomputer system load
monitoring
resource manager

Abstract

The article presents a system for monitoring the dynamic characteristics of the supercomputer job flow, currently deployed on the «SKIF MSU Chebyshev» supercomputer. The proposed analysis approach makes it possible to obtain an adequate estimate of the actual properties of the job flow efficiently and in a technologically simple way. It gives a better understanding of how supercomputer resources are utilized, helps to identify shortcomings in the supercomputer architecture, and points to possible directions for its optimization.
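The abstract does not spell out the data model of the monitoring system. As a rough illustration only, the following Python sketch assumes that monitoring yields, for each job, its node count and a series of average CPU-load samples taken at a fixed interval. The names `JobRecord` and `job_flow_summary`, the node-hour weighting, and the 0.5 low-load threshold are hypothetical choices, not the authors' method. The sketch shows one way such per-job data could be aggregated into job-flow-level estimates of resource utilization.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class JobRecord:
    """Hypothetical per-job summary built from monitoring samples."""
    job_id: str
    nodes: int                 # number of nodes allocated to the job
    sample_interval_s: float   # seconds between monitoring samples (assumed fixed)
    cpu_load: list[float]      # average CPU utilization per sample, 0.0..1.0

    def node_hours(self) -> float:
        # Allocated node-hours, approximated as nodes * samples * interval.
        return self.nodes * len(self.cpu_load) * self.sample_interval_s / 3600.0

    def mean_load(self) -> float:
        # Mean CPU utilization over the job's lifetime (0.0 if no samples).
        return mean(self.cpu_load) if self.cpu_load else 0.0


def job_flow_summary(jobs: list[JobRecord], low_load: float = 0.5) -> dict[str, float]:
    """Aggregate per-job records into job-flow characteristics.

    Each job is weighted by the node-hours it occupied, so long, wide jobs
    dominate the estimate just as they dominate machine resources.
    The low_load threshold is an arbitrary illustrative value.
    """
    total = sum(j.node_hours() for j in jobs)
    if total == 0:
        return {"node_hours": 0.0, "weighted_load": 0.0, "low_load_share": 0.0}
    # Node-hour-weighted mean CPU utilization across the whole job flow.
    weighted = sum(j.mean_load() * j.node_hours() for j in jobs) / total
    # Share of node-hours consumed by jobs whose mean load is below the threshold.
    low = sum(j.node_hours() for j in jobs if j.mean_load() < low_load) / total
    return {"node_hours": total, "weighted_load": weighted, "low_load_share": low}
```

For example, a 4-node job running for one hour at 90% load together with a 1-node job running for one hour at 25% load gives 5 node-hours in total, a weighted load of 0.77, and a low-load share of 0.2, showing that a fifth of the consumed resources went to poorly loaded jobs.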


Published

2013-11-11


Section

Section 2. Programming
