A parallel discord discovery algorithm for time series on many-core accelerators
Keywords:time series, discord discovery, parallel algorithm, vectorization, OpenMP, OpenAcc, Intel Xeon Phi, NVIDIA GPU
Discord is a refinement of the concept of anomalous subsequence of a time series. The discord discovery problem frequently occurs in a wide range of application areas related to time series: medicine, economics, climate modeling, etc. In this paper we propose a new parallel discord discovery algorithm for many-core systems in the case when the input data fit in the main memory. The algorithm exploits the ability to independently calculate the Euclidean distances between the subsequences of the time series. Computations are paralleled using OpenMP and OpenAcc for the Intel MIC (Many Integrated Core) and NVIDIA GPU platforms, respectively. The algorithm consists of two stages, namely precomputations and discovery. At the precomputation stage, we construct the auxiliary matrix data structures to ensure the efficient vectorization of computations on an accelerator. At the discovery stage, the algorithm searches for a discord based on the constructed structures. A number of numerical experiments confirm a high scalability of the proposed algorithm.
- V. V. Voevodin and Vl. V. Voevodin, The Parallel Computing (BHV-Petersburg, St. Petersburg, 2002) [in Russian].
- Hardware Specifications of the Siberian Supercomputing Center.
http://www.sscc.icmmg.nsc.ru/hardware.html . Cited June 5, 2019.
- J. Ameen and R. Basha, “Mining Time Series for Identifying Unusual Sub-sequences with Applications,” in Proc. 1st Int. Conf. on Innovative Computing, Information and Control (ICICIC 2006), Beijing, China, August 30-September 1, 2006 (IEEE Press, New York, 2006). 574-577.
- D. F. Bacon, S. L. Graham, and O. J. Sharp, “Compiler Transformations for High-Performance Computing,” ACM Comput. Surv. 26 (4), 345-420 (1994).
- M. C. Chuah and F. Fu, “ECG Anomaly Detection via Time Series Analysis,” in Lecture Notes in Computer Science (Springer, Berlin, 2007), Vol. 4743, pp. 123-135.
- J. Dean and S. Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters,” Commun. ACM 51 (1), 107-113 (2008).
- A. Duran and M. Klemm, “The Intel Many Integrated Core Architecture,” in Proc. 2012 Int. Conf. on High Performance Computing and Simulation, Madrid, Spain, July 2-6, 2012 (IEEE Press, New York, 2012), pp. 365-366.
- E. Fredkin, “Trie Memory,” Commun. ACM 3 (9), 490-499 (1960).
- A.W.-C. Fu, O.T.-W. Leung, E. Keogh, and J. Lin, “Finding Time Series Discords Based on Haar Transform,” in Lecture Notes in Computer Science (Springer, Berlin, 2006), Vol. 4093, pp. 31-41.
- O. Goldreich, Computational Complexity: A Conceptual Perspective (Cambridge Univ. Press, New York, 2008).
- T. Huang, Y. Zhu, Y. Mao, et al., “Parallel Discord Discovery,” in Lecture Notes in Computer Science (Springer, Cham, 2016), Vol. 9652, pp. 233-244.
- E. Keogh, J. Lin, and A. Fu, “HOT SAX: Efficiently Finding the Most Unusual Time Series Subsequence,” in Proc. 5th IEEE Int. Conf. on Data Mining, Houston, USA, November 27-30, 2005 (IEEE Press, New York, 2005), pp. 226-233.
- E. Keogh, S. Lonardi, and C. A. Ratanamahatana, “Towards Parameter-Free Data Mining,” in Proc. 10th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Seattle, USA, August 22-25, 2004 (ACM Press, New York, 2004), pp. 206-215.
- D. E. Knuth, The Art of Computer Programming , Vol. 4: Fascicle 3: Generating All Combinations and Partitions (Addison-Wesley, Reading, 2005).
- G. Li, O. Br854ysy, L. Jiang, et al., “Finding Time Series Discord Based on Bit Representation Clustering,” Knowl. Based Syst. 54, 243-254 (2013).
- J. Lin, E. Keogh, S. Lonardi, and B. Chiu, “A Symbolic Representation of Time Series, with Implications for Streaming Algorithms,” in Proc. 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, San Diego, USA, June 13, 2003 (ACM Press, New York, 2004), pp. 2-11.
- T. Mattson, “Introduction to OpenMP,” in Proc. 2006 ACM/IEEE Conf. on Supercomputing, Tampa, USA, November 11-17, 2006 (ACM Press, New York, 2006),
- J. Owens, “GPU Architecture Overview,” in Proc. Int. Conf. on Computer Graphics and Interactive Techniques, San Diego, USA, August 5-9, 2007 (ACM Press, New York, 2007),
- R. Reyes, I. López, J. J. Fumero, and F. de Sande, “A Preliminary Evaluation of OpenACC Implementations,” J. Supercomput. 65 (3), 1063-1075 (2013).
- H. T. T. Thuy, D. T. Anh, and V. T. N. Chau, “An Effective and Efficient Hash-Based Algorithm for Time Series Discord Discovery,” in Proc. 3rd National Foundation for Science and Technology Development Conf. on Information and Computer Science, Danang, Vietnam, September 14-16, 2016 (IEEE Press, New York, 2016), pp. 85-90.
- Y. Wu, Y. Zhu, T. Huang, et al., “Distributed Discord Discovery: Spark Based Anomaly Detection in Time Series,” in Proc. 17th IEEE Int. Conf. on High Performance Computing and Communications, New York, USA, August 24-26, 2015 (IEEE Press, New York, 2015), pp. 154-159.
- D. Yankov, E. Keogh, and U. Rebbapragada, “Disk Aware Discord Discovery: Finding Unusual Time Series in Terabyte Sized Datasets,” in Proc. 7th IEEE Int. Conf. on Data Mining, Omaha, October 28-31, 2007 (IEEE Press, New York, 2007), pp. 381-390.
- D. Yankov, E. Keogh, and U. Rebbapragada, “Disk Aware Discord Discovery: Finding Unusual Time Series in Terabyte Sized Datasets,” Knowl. Inf. Syst. 17 (2), 241-262 (2008).
- M. Zaharia, M. Chowdhury, M. J. Franklin, et al., “Spark: Cluster Computing with Working sets,” in Proc. 2nd USENIX Workshop on Hot Topics in Cloud Computing, Boston, USA, June 22-25, 2010 (USENIX Association, Berkeley, 2010),
https://www.usenix.org/legacy/event/hotcloud10/tech/full_papers/Zaharia.pdf|. Cited June 5, 2019.