Target optimization of a supercomputer task flow

Authors

  • S.N. Leonenkov, Lomonosov Moscow State University

DOI:

https://doi.org/10.26089/NumMet.v20r319

Keywords:

supercomputer, scheduling efficiency, scheduling algorithms, SLURM

Abstract

This paper presents a study of the task flows observed on the Lomonosov and Lomonosov-2 supercomputers. It proposes a new approach to evaluating the performance of a supercomputer system based on its basic performance characteristics, and introduces a scheduling efficiency function for Lomonosov, Lomonosov-2, and other systems. The approach lets system administrators compare supercomputer systems according to their intended usage goals. The paper also describes Moscow State University's experience of applying the approach to optimize resource scheduling on Lomonosov and Lomonosov-2.

Published

2019-06-10

How to Cite

Leonenkov S.N. Target optimization of a supercomputer task flow // Numerical Methods and Programming. 2019. 20. 199-210. doi 10.26089/NumMet.v20r319

Section

Section 1. Numerical methods and applications