A parallel discord discovery algorithm for time series on many-core accelerators

Authors

  • M.L. Zymbler South Ural State University

DOI:

https://doi.org/10.26089/NumMet.v20r320

Keywords:

time series, discord discovery, parallel algorithm, vectorization, OpenMP, OpenAcc, Intel Xeon Phi, NVIDIA GPU

Abstract

Discord is a refinement of the concept of anomalous subsequence of a time series. The discord discovery problem frequently occurs in a wide range of application areas related to time series: medicine, economics, climate modeling, etc. In this paper we propose a new parallel discord discovery algorithm for many-core systems in the case when the input data fit in the main memory. The algorithm exploits the ability to independently calculate the Euclidean distances between the subsequences of the time series. Computations are paralleled using OpenMP and OpenAcc for the Intel MIC (Many Integrated Core) and NVIDIA GPU platforms, respectively. The algorithm consists of two stages, namely precomputations and discovery. At the precomputation stage, we construct the auxiliary matrix data structures to ensure the efficient vectorization of computations on an accelerator. At the discovery stage, the algorithm searches for a discord based on the constructed structures. A number of numerical experiments confirm a high scalability of the proposed algorithm.

Author Biography

M.L. Zymbler

References

  1. Воеводин В.В., Воеводин Вл.В. Параллельные вычисления. СПб.: БХВ-Петербург, 2002.
  2. Перечень оборудования Центра коллективного пользования Сибирского суперкомпьютерного центра ИВМиМГ СО РАН. http://www.sscc.icmmg.nsc.ru/hardware.html.
  3. Ameen J., Basha R. Mining time series for identifying unusual sub-sequences with applications // Proc. of the 1st Int. Conf. on Innovative Computing, Information and Control (ICICIC 2006). New York: IEEE Press, 2006. 574-577.
  4. Bacon D.F., Graham S.L., Sharp O.J. Compiler transformations for high-performance computing // ACM Comput. Surv. 1994. Vol. 26, N 4. 345-420.
  5. Chuah M.C., Fu F. ECG anomaly detection via time series analysis // Lecture Notes in Computer Science. Vol. 4743. Berlin: Springer, 2007. 123-135.
  6. Dean J., Ghemawat S. MapReduce: simplified data processing on large clusters // Commun. ACM. 2008. Vol. 51, N 1. 107-113.
  7. Duran A., Klemm M. The Intel Many Integrated Core architecture // Proc. of the 2012 Int. Conf. on High Performance Computing and Simulation. New York: IEEE Press, 2012. 365-366.
  8. Fredkin E. Trie memory // Commun. of the ACM. 1960. Vol. 3, N 9. 490-499.
  9. Fu A.W.-C., Leung O.T.-W., Keogh E., Lin J. Finding time series discords based on Haar transform // Lecture Notes in Computer Science. Vol. 4093. Berlin: Springer, 2006. 31-41.
  10. Goldreich O. Computational complexity: a conceptual perspective. New York: Cambridge University Press, 2008.
  11. Huang T., Zhu Y., Mao Y., et al. Parallel discord discovery // Lecture Notes in Computer Science. Vol. 9652. Cham: Springer, 2016. 233-244.
  12. Keogh E., Lin J., Fu A. HOT SAX: efficiently finding the most unusual time series subsequence // Proc. of the 5th IEEE Int. Conf. on Data Mining. New York: IEEE Press, 2005. 226-233.
  13. Keogh E., Lonardi S., Ratanamahatana C.A. Towards parameter-free data mining // Proc. of the 10th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining. New York: ACM Press, 2004. 206-215.
  14. Knuth D.E. The art of computer programming. Vol. 4. Fascicle 3: generating all combinations and partitions. Reading: Addison-Wesley, 2005.
  15. Li G., Braysy O., Jiang L., et al. Finding time series discord based on bit representation clustering // Knowl.-Based Syst. 2013. Vol. 54. 243-254.
  16. Lin J., Keogh E., Lonardi S., Chiu B. A symbolic representation of time series, with implications for streaming algorithms // Proc. of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery. New York: ACM Press, 2004. 2-11.
  17. Mattson T. Introduction to OpenMP // Proc. of the ACM/IEEE SC2006 Conf. on Supercomputing. New York: ACM Press, 2006. doi 10.1145/1188455.1188673.
  18. Owens J. GPU architecture overview // Proc. of the Int. Conf. on Computer Graphics and Interactive Techniques. New York: ACM Press, 2007. doi 10.1145/1281500.1281643.
  19. Reyes R., Lopez I., Fumero J.J., de Sande F. A preliminary evaluation of OpenACC implementations // The Journal of Supercomputing. 2013. Vol. 65, N 3. 1063-1075.
  20. Thuy H.T. T., Anh D.T., Chau V.T. N. An effective and efficient hash-based algorithm for time series discord discovery // Proc. of the 2016 3rd National Foundation for Science and Technology Development Conference on Information and Computer Science. New York: IEEE Press, 2016. 85-90.
  21. Wu Y., Zhu Y., Huang T., et al. Distributed discord discovery: Spark based anomaly detection in time series // Proc. of the 17th IEEE Int. Conf. on High Performance Computing and Communications. New York: IEEE Press, 2015. 154-159.
  22. Yankov D., Keogh E.J., Rebbapragada U. Disk aware discord discovery: finding unusual time series in terabyte sized datasets // Proc. of the 7th IEEE Int. Conf. on Data Mining. New York: IEEE Press, 2007. 381-390.
  23. Yankov D., Keogh E.J., Rebbapragada U. Disk aware discord discovery: finding unusual time series in terabyte sized datasets // Knowl. Inf. Syst. 2008. Vol. 17, N 2. 241-262.
  24. Zaharia M., Chowdhury M., Franklin M.J., et al. Spark: Cluster computing with working sets // Proc. of the 2nd USENIX Workshop on Hot Topics in Cloud Computing. Berkeley: USENIX Association, 2010.

Published

2019-06-25

How to Cite

Цымблер М.Л. A Parallel Discord Discovery Algorithm for Time Series on Many-Core Accelerators // Numerical methods and programming. 2019. 20. 211-223. doi 10.26089/NumMet.v20r320

Issue

Section

Section 1. Numerical methods and applications