https://doi.org/10.26089/NumMet.v27r321

Selection of the most efficient sorting algorithm for small-sized data structures by an integer field

Authors

  • Andrey A. Galyuzov

Keywords:

parallel computations
radix sort
counting sort
OpenMP
CUDA
CUDA Thrust
CUDA CUB
CPU
GPU

Abstract

The problem of repeatedly sorting an array of movable particles by an integer field that specifies the type of physical interaction is considered. The study compares 16 sorting implementations for an array of 2·107 structures on CPUs and a GPU. The optimal algorithm is defined as the one with the lowest average execution time on fixed input data and a chosen computing platform; when average execution times are close, a smaller amount of auxiliary memory is considered an additional advantage. All algorithms were tested on the same input array, where the integer sorting key took random values from the set {−1, 0, 1, 2, 3}. For the GPU, only on-device sorting time measured using CUDA events was compared; memory allocation, data transfers between host and device, warm-up, and other runtime overheads were not included in the measurement interval. It is shown that for the considered problem, the comparison-based sorting algorithms and integer-key sorting methods are fundamentally different: due to the small number of possible sorting key values, specialized methods such as counting sort and radix sort are significantly more efficient than general-purpose comparison algorithms. On CPUs, the best results were achieved by the custom specialized algorithm TPT3 and schemes with parallel sorting of subarrays, while on an NVIDIA Tesla V100 GPU, the leader was DeviceRadixSort from the CUDA CUB library. The conclusions drawn apply to the specific movable particle structure and key range, the on-device execution scenario on a GPU and shared-memory computing systems.


Published

2026-07-01

Issue

Section

Methods and algorithms of computational mathematics and their applications

Author

Andrey A. Galyuzov


References

  1. R. M. Bergmann and J. L. Vuji’c, “Algorithmic Choices in WARP – A Framework for Continuous Energy Monte Carlo Neutron Transport in General 3D Geometries on GPUs,” Annals of Nuclear Energy 77, 176–193 (2015).
    doi 10.1016/j.anucene.2014.10.039
  2. C. Liu, “Study on the Particle Sorting Performance for Reactor Monte Carlo Neutron Transport on Apple Unified Memory GPUs,” EPJ Web of Conferences 302, Article Number 04001 (2024).
    doi 10.1051/epjconf/202430204001
  3. J. Tramm, P. Romano, P. Shriwise, et al., “Performance Portable Monte Carlo Particle Transport on Intel, NVIDIA, and AMD GPUs,” EPJ Web of Conferences 302, Article Number 04010 (2024).
    doi 10.1051/epjconf/202430204010
  4. M. S. Egorova, S. A. Dyachkov, A. N. Parshikov, and V. V. Zhakhovsky, “Parallel SPH modeling using dynamic domain decomposition and load balancing displacement of Voronoi subdomains,” Comp. Phys. Comm. 234, 112–125 (2019).
    doi 10.1016/j.cpc.2018.07.019
  5. V. A. Vshivkov, T. V. Markelova, and V. I. Shelehov, “On Sorting Algorithms in the Particle-in-Cell Method,” Nauch. Vestnik NGTU. 33 (4), 79–92 (2008).
    https://cyberleninka.ru/article/n/ob-algoritmah-sortirovki-v-metode-chastits-v-yacheykah Cited June 7, 2026.
  6. A. A. Romanenko and A. V. Snytnikov, “Optimization of Model Reordering Optimization for GPU Implementation of Particle-in-Cell Method,” Vestnik NGU. Ser.: Informat. Technol. 17 (1), 82–89 (2019).
    doi 10.25205/1818-7900-2019-17-1-82-89
  7. N. V. Snytkov, O. P. Stoyanovskaya, and T. A. Glushko, “Parallel Algorithm for Supercomputing Simulation of Dust-Gaseous Gravitating Systems Using Particle-in-Cell and SPH Methods,” Journal of Physics: Conference Series 1336 (1), Article Number 012021 (2019).
    doi 10.1088/1742-6596/1336/1/012021
  8. S. Agostinelli, J. Allison, K. Amako, et al., “Geant4 – A simulation toolkit,” Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 506 (3), 250–303 (2003).
    doi 10.1016/S0168-9002(03)01368-8
  9. J. Allison, K. Amako, J. Apostolakis, et al., “Geant4 Developments and Applications,” IEEE Transactions on Nuclear Science 53 (1), 270–278 (2006).
    doi 10.1109/TNS.2006.869826
  10. A. A. Galyuzov and M. V. Kosov, “Increasing Performance of the Radiation Transport Simulation: TPT3 Parallel Program,” Software & Systems 38 (2), 269–279 (2025).
    doi 10.15827/0236-235X.150.269-279
  11. A. A. Galyuzov and M. V. Kosov, The TPT3 program for high-performance simulation of radiation transfer, Certificate of RF Registration of Computer Program №. 2024614877. Date of Registration: 29.02.2024.
  12. OpenMP Preprocessor Directive Standard.
    https://www.openmp.org.
  13. NVIDIA CUDA Toolkit.
    https://developer.nvidia.com/cuda/toolkit Cited June 7, 2026.
  14. OpenACC Preprocessor Directive Standard.
    https://www.openacc.org Cited June 7, 2026.
  15. D. E. Knuth, The Art of Computer Programming, Vol. 3: Sorting and Searching. 2nd ed.(Addison-Wesley, Boston, 1998).
  16. T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms. 3rd ed.(MIT Press, Cambridge, 2009).
  17. S. J. Ross, “The Spreadsort High-Performance General-Case Sorting Algorithm,” Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications 3, 1100–1106 (2002).
  18. N. Satish, M. Harris, and M. Garland, “Designing efficient sorting algorithms for manycore GPUs,” in 2009 IEEE International Symposium on Parallel & Distributed Processing, Italy, Rome, May 23–29, 2009(IEEE Press, 2009), pp. 1–10.
    doi 10.1109/IPDPS.2009.5161005
  19. D. R. Musser, “Introspective Sorting and Selection Algorithms,” Software: Practice and Experience 27 (8), 983–993 (1997).
  20. A. A. Galyuzov and M. V. Kosov, “Calculation of the Gamma Quanta Yield for the ^(235)U Fission in the Uranium Cube by the TPT3 Program,” Technol. EMC. 89 (2), 26–32 (2024).
  21. The repository with the source code for the examples from the article.
    https://github.com/Agar92/benchmarksortingalgorithms.git Cited June 7, 2026.
  22. Boost.Sort Documentation.
    https://www.boost.org/libs/sort/ Cited June 7, 2026.
  23. The repository with the source code for the CUDA CUB library.
    https://github.com/NVIDIA/cub.git Cited June 7, 2026.