DOI: https://doi.org/10.26089/NumMet.v20r319

Target optimization of a supercomputer task flow

Authors

  • S.N. Leonenkov

Keywords

supercomputer
scheduling efficiency
scheduling algorithms
SLURM

Abstract

This paper presents the results of studying the task flows observed on the Lomonosov and Lomonosov-2 supercomputers. A new approach to evaluating the performance of a supercomputer system based on its basic performance characteristics is proposed. A scheduling efficiency function is introduced for Lomonosov, Lomonosov-2, and other systems. The approach allows system administrators to compare various supercomputer systems with respect to their usage aims. The paper also describes the experience of Moscow State University in applying the proposed approach to optimizing resource scheduling on Lomonosov and Lomonosov-2.
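The scheduling efficiency function itself is defined in the full text of the paper. Purely as a rough illustration of the general idea, the sketch below (in Python; the metric names, weights, and functional form are assumptions made here for illustration, not the paper's definitions) shows how basic performance characteristics could be combined into a single target value whose weights reflect a system's usage aims.

    # Illustrative sketch only: the actual scheduling efficiency function is
    # defined in the paper; the metric names and weights here are hypothetical.

    def scheduling_efficiency(metrics, weights):
        """Weighted combination of normalized basic performance characteristics.

        `metrics` holds values already normalized to [0, 1], where larger is
        better (e.g., node utilization, or an inverted mean job wait time).
        `weights` encodes the usage aims of a particular system and sums to 1.
        """
        assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
        return sum(weights[name] * metrics[name] for name in weights)


    # Example: the same measured characteristics scored under two weight
    # profiles that express different usage aims.
    metrics = {"utilization": 0.91, "wait_time_score": 0.74, "fairness": 0.80}

    throughput_oriented = {"utilization": 0.6, "wait_time_score": 0.2, "fairness": 0.2}
    interactive_oriented = {"utilization": 0.2, "wait_time_score": 0.6, "fairness": 0.2}

    print(scheduling_efficiency(metrics, throughput_oriented))
    print(scheduling_efficiency(metrics, interactive_oriented))

Under such a formulation, different weight profiles would let administrators compare systems, or scheduling configurations of a single system, against the aims they care about.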


Published

2019-06-10

Section

Section 1. Numerical methods and applications
