Workload balancing in GPU implementation of breadth-first search

Authors

  • M.A. Chernoskutov
  • D.G. Ermakov

Keywords:

breadth-first search
parallel algorithm
graphics processing units PDF (in Russian) (315KB) PDF. zip (in Russian) (275KB)

Abstract

Parallel processing of unstructured data пшмут in a graph-like form can be a severe computational challenge because of significant overheads caused by the irregular nature of graph algorithms and the hardware latency of intensive data access. The GPU implementation of the load balancing method that allows one to dramatically improve the parallel breadth-first search algorithm compared to its sequential analog on CPU is considered. This work was partially supported by the Russian Foundation for Basic Research (project 14–07–00435) and by UB RAS (projects 12–P–1–1029 and RCP–13–P18). Numerical experiments were performed using the «Uran» supercomputer installed at IMM UB RAS. This paper was recommended for publication by the Program Committee of the International Scientific Conference «Scientific Services and Internet: all bounds of parallelism» ((http://agora.guru.ru/abrau2013)).


Published

2013-09-17

Issue

Section

Section 2. Programming

Author Biographies

M.A. Chernoskutov

D.G. Ermakov


References

  1. Cormen T.H., Leiserson C.E., Rivest R.L., Stein C. Introduction to algorithms. Cambridge: MIT Press, 2001.
  2. Merril D., Garland M., Grimshaw A. High performance and scalable GPU graph traversal. Technical Report CS-2011-05. Charlottesville: University of Virginia, 2011.
  3. Luo L., Wong M., Hwu W. An effective GPU implementation of breadth-first search // Proc. of the 47th Design Automation Conference. New York: ACM Press, 2010. 52-55.
  4. Harish P., Narayanan P.J. Accelerating large graph algorithms on the GPU using CUDA // Proc. of the 14th International Conference on High Performance Computing. Berlin: Springer, 2007. 197-208.
  5. Hong S., Kim S.K., Oguntebi T., Olukotun K. Accelerating CUDA graph algorithms at maximum warp // Proc. of the 16th ACM Symposium on Principles and Practice of Parallel Programming. New York: ACM Press, 2011. 267-276.
  6. 10th DIMACS Implementation Challenge (http://www.cc.gatech.edu/dimacs10/index.shtml).
  7. Leskovec J., Chakrabarti D., Kleinberg J.M., Faloutsos C. Realistic, mathematically tractable graph generation and evolution using Kronecker multiplication // Proc. of the Conference on Principles and Practice of Knowledge Discovery in Databases. Porto, 2005. 133-145.
  8. Murphy R., Wheeler K., Barrett B., Ang J. Introducing the Graph 500. Cray User’s Group (CUG). Albuquerque: Sandia National Laboratory, 2010.