Implementation of an associative-computing model on GPU: a basic procedure library of the STAR language
Keywords:
vertical data processing
model of associative parallel processor
GPU
high-performance computing
Abstract
The associative (content addressable) parallel processors of the SIMD type with vertical data processing are oriented on solving problems of non-numeric data processing. The simulation of such systems is described using an abstract SIMD-type model of a STAR machine. On the basis of this model, a number of efficient algorithms are developed to solve many graph problems. Since the associative architectures are not widely available, however, these algorithms cannot be used in practice. With advances in the production of GPU, the possibilities to implement the associative parallel models without significant loss of efficiency are increased. As the first stage in the implementation of the STAR-machine on GPU in the form of a CUDA library, specific data types and simple operations of the STAR language were developed. In this paper, we consider an efficient GPU implementation of the standard associative procedure library. The runtime of this implementation is compared with the runtime of similar procedures in the standard libraries (STL on CPU and CUDA thrust on GPU). We plan to use our library implementation to solve graph problems.
Section
Section 1. Numerical methods and applications
References
- J. A. Rudolph, “A Production Implementation of an Associative Array Processor: STARAN,” in Proc. Fall Joint Computer Conf., Part I, Anaheim, USA, December 5-7, 1972 (ACM Press, New York, 1972), pp. 229-241.
- C. C. Foster, Content Addressable Parallel Processors (Reinhold, New York, 1976).
- K. E. Batcher, “Bit-Serial Parallel Processing Systems,” IEEE Trans. Comput. 31 (5), 377-384 (1982).
- L. M. Uhr, Algorithm-Structured Computer Arrays and Networks: Architectures and Processes for Images, Precepts, Models, Information (Academic, Orlando, 1984).
- T. Higuchi, H. Kitano, T. Furuya, et al., “IXM2: A Parallel Associative Processor for Knowledge Processing,” in Proc. 9th Nat. Conf. on Artificial Intelligence, Anaheim, USA, July 14-19, 1991 (AAAI Press, Palo Alto, 1991), Vol. 1, pp. 296-303.
- D. E. Smith, J. S. Hall, and K. Miyake, Rutger’s CAM2000 Chip Architecture , Technical Report LCSR-TR-196 (Rutgers University, New Brunswick, 1993).
- C.-H. Hsu, D. Smith, and S. Levy, Linear-C: A Data-Parallel Extension to C , Technical Report LCSR-TR-273 (Rutgers University, New Brunswick, 1996).
- G. Aad, E. Abat, J. Abdallah, et al., “The ATLAS Experiment at the CERN Large Hadron Collider,” J. Instr. 3 (2008).
doi 10.1088/1748-0221/3/08/S08003
- A. Annovi, M. Beretta, E. Bossini, et al., “Associative Memory Design for the FastTrack Processor (FTK) at ATLAS,” in Proc. 17th IEEE-NPSS Real Time Conference, Lisbon, Portugal, May 24-28, 2010 , IEEE Press,
doi 10.1109/RTC.2010.5750451
- H. Kitano, “An Implementation on the IXM2 Associative Memory,” in Speech-to-Speech Translation: A Massively Parallel Memory-Based Approach (Springer, Boston, 1994), pp. 135-155.
- H. Kitano, T. Higuchi, and M. Tomita, “Massively Parallel Spoken Language Processing uusing a Parallel Associative Processor IXM2,” in Proc. First Int. Conf. on Spoken Language Processing, Kobe, Japan, November 18-22, 1990 ,
http://www.isca-speech.org/archive/icslp_1990 . Cited February 20, 2018.
- K. Oi, E. Sumita, O. Furuse, et al., “Real-Time Spoken Language Translation Using Associative Processors,” in Proc. Fourth Conference on Applied Natural Language Processing, Stuttgart, Germany, October 13-15, 1994 (ACL Press, Stroudsburg, 1994), pp. 101-106.
- A. Sh. Nepomniaschaya and M. A. Vladyko, “A Comparison of Associative Computation Models,” Programmirovanie, No. 6, 41-50 (1997) [Program. Comput. Software 23 (6), 319-324 (1997)].
- A. Sh. Nepomniaschaya, “Language STAR for Associative and Parallel Computation with Vertical Data Processing,” in Parallel Computing Technologies (World Scientific, Singapore, 1991), pp. 258-265.
- J. L. Potter, Associative Computing: A Programming Paradigm for Massively Parallel Computers (Perseus Publ., Basel, 1991).
- J. Potter, J. Baker, S. Scott, et al., “ASC: An Associative-Computing Paradigm,” Computer 27 (11), 19-25 (1994).
- A. S. Nepomniaschaya, “Solution of Path Problems Using Associative Parallel Processors,” in Proc. Int. Conf. on Parallel and Distributed Systems, Seoul, South Korea, December 10-13, 1997 (IEEE Press, New York, 1997), pp. 610-617.
- A. S. Nepomniaschaya and M. A. Dvoskina, “A Simple Implementation of Dijkstra’s Shortest Path Algorithm on Associative Parallel Processors,” Fundam. Inform. 43 (1-4), 227-243 (2000).
- A. S. Nepomniaschaya, “An Associative Version of the Bellman-Ford Algorithm for Finding the Shortest Paths in Directed Graphs,” in Lecture Notes in Computer Science (Springer, Berlin, 2001), Vol. 2127, pp. 285-292.
- A. S. Nepomniaschaya, “Efficient Implementation of Edmonds’ Algorithm for Finding Optimum Branchings on Associative Parallel Processors,” in Proc. 8th Int. Conf. on Parallel and Distributed Systems, Kyongju City, South Korea, June 29-29, 2001 (IEEE Press, New York, 2001),
doi 10.1109/ICPADS.2001.934794
- A. S. Nepomniaschaya, “Comparison of the Prim-Dijkstra and Kraskal Algorithms on an Associative Parallel Processor,” Kibernet. Sist. Anal., No. 2, 19-27 (2000) [Cybernet. Systems Anal. 36 (2), 162-169 (2000)].
- A. S. Nepomniaschaya, “Representation of the Gabow Algorithm for Finding Smallest Spanning Trees with a Degree Constraint on Associative Parallel Processors,” in Lecture Notes in Computer Science (Springer, Berlin, 2001), Vol. 1123, pp. 813-817.
- A. S. Nepomniaschaya, “Associative Parallel Algorithms for Dynamic Edge Update of Minimum Spanning Trees,” in Lecture Notes in Computer Science (Springer, Berlin, 2003), Vol. 2763, pp. 141-150.
- A. Nepomniaschaya, “Associative Parallel Algorithm for Dynamic Reconstruction of a Minimum Spanning Tree After Deletion of a Vertex,” in Lecture Notes in Computer Science (Springer, Berlin, 2005), Vol. 3606, pp. 159-173.
- A. Sh. Nepomniaschaya, “Associative Parallel Algorithm for Dynamic Update of a Minimum Spanning Tree after Addition of a New Node to a Graph,” Kibernet. Sist. Anal., No. 2, 19-31 (2006) [Cybernet. Systems Anal. 36 (2), 162-169 (2006)].
- A. Nepomniaschaya, “Efficient Implementation of the Italiano Algorithms for Updating the Transitive Closure on Associative Parallel Processors,” Fundam. Inform. 89 (2-3), 313-329 (2008).
- A. Nepomniaschaya, “Associative Version of the Ramalingam Decremental Algorithm for Dynamic Updating the Single-Sink Shortest-Paths Subgraph,” in Lecture Notes in Computer Science (Springer, Berlin, 2009), Vol. 5698, pp. 257-268.
- A. S. Nepomniaschaya, “Associative Version of the Ramalingam Algorithm for Dynamically Updating the Shortest-Path Subgraph after Inserting a New Edge into a Graph,” Kibernet. Sist. Anal., No. 3, 45-57 (2012) [Cybernet. Systems Anal. 48 (3), 358-368 (2012)].
- R. A. Walker, J. L. Potter, Y. Wang, and M. Wu, “Implementing Associative Processing: Rethinking Earlier Architectural Decisions,” in Proc. 15th Int. Parallel and Distributed Processing Symposium, San Francisco, USA, April 23-27, 2000 (IEEE Press, New York, 2001), pp. 2092-2100.
- J. L. Trahan, M. Jin, W. Chantamas, and J. W. Baker, “Relating the Power of the Multiple Associative Computing (MASC) Model to That of Reconfigurable Bus-Based Models,” J. Parallel Distrib. Comput. 70 (5), 458-466 (2010).
- M. Jin, “Associative Operations from MASC to GPU,” in Proc. 21st Int. Conf. on Parallel and Distributed Processing Techniques and Applications, Las Vegas, USA, July 27-30, 2015 (CSREA Press, Red Hook, 2015), pp. 388-393.
- T. V. Snytnikova and A. Sh. Nepomniaschaya, “Solution of Graph Problems by Means of the STAR-Machine Being Implemented on GPUs,” Prikl. Diskr. Mat. 3 (33), 98-115 (2016).
- A. S. Nepomniaschaya, “Basic Associative Parallel Algorithms for Vertical Processing Systems,” Bull. Novosibirsk Comput. Center, No. 9, 63-77 (2009).
- A. S. Nepomniaschaya and M. A. Dvoskina, “A Simple Implementation of Dijkstra’s Shortest Path Algorithm on Associative Parallel Processors,” Fundam. Inform. 43 (1-4), 227-243 (2000).
- B. Stroustrup, The C++ Programming Language, Special Edition (Addison-Wesley, Reading, 2000; Binom, Moscow, 2012).
- Realeases-thrust/thrust-GitHub.
https://github.com/thrust/thrust/releases . Cited February 20, 2018.
- S. Masi, Periodic Report Summary 1 - FTK (Fast Tracker for Hadron Collider Experiments) , Technical Report 324318.
http://cordis.europa.eu/result/rcn/167746_en.html.
- A. Sh. Nepomniaschaya and T. V. Snytnikova, “Associative Parallel Algorithm for Dynamic Reconstruction of the Shortest Paths Tree after Insertion of a Vertex,” Bull. Novosibirsk Comput. Center, No. 24, 89-103 (2006).