Applying Composable Parallelism in Programs Using Parallel
Authors
- Vladimir A. Bakhtin
- Nikita A. Kataev
- Alexander S. Kolganov
- Dmitriy A. Zakharov
- Alexander A. Smirnov
- Anton A. Malakhov
Keywords:
Multiprocessor
multilevel parallelism
user threads
OpenMP
TBB
OpenBLAS
HPL
Abstract
As the number of computational cores and of the threads that use them grows, the overhead of thread creation, scheduling, and destruction increases, while the amount of computation performed by each individual thread decreases. Multilevel parallelism is one way to address this issue. One potential source of such parallelism is the optimization of functions within libraries called from parallel programs. However, this approach may require additional support at the level of parallel programming models. In this study, we propose an alternative implementation of the OpenMP runtime library and evaluate its efficiency using the OpenBLAS library and the High Performance Linpack (HPL) benchmark on Arm-based systems.
Section
Parallel software tools and technologies