Target optimization of a supercomputer task flow

Authors

  • S.N. Leonenkov, Lomonosov Moscow State University

DOI:

https://doi.org/10.26089/NumMet.v20r319

Keywords:

supercomputer, scheduling efficiency, scheduling algorithms, SLURM

Abstract

This paper presents the results of a study of the task flows observed on the Lomonosov and Lomonosov-2 supercomputers. A new approach to evaluating the performance of a supercomputer system from its basic performance characteristics is proposed. A scheduling efficiency function is introduced for Lomonosov, Lomonosov-2, and other systems. The approach allows system administrators to compare supercomputer systems according to the purposes for which they are used. The paper also describes the experience at Moscow State University of applying the proposed approach to optimize the scheduling of resources on Lomonosov and Lomonosov-2.

Published

10-06-2019

How to Cite

Leonenkov S. Target Optimization of a Supercomputer Task Flow // Numerical Methods and Programming (Vychislitel’nye Metody i Programmirovanie). 2019. 20. 199-210. doi 10.26089/NumMet.v20r319

Section

Section 1. Numerical methods and applications