Anomaly detection in long time series on high-performance cluster with GPUs
Keywords:time series, anomaly detection, discord, parallel algorithm, computer cluster, GPU, CUDA, DRAG, MERLIN, PD3, PALMAD
Currently, the discovery of anomalies in long time series occurs in a wide range of subject areas: digital industry, healthcare, climate modeling, financial analytics, etc. Discord formalizes the anomaly concept being defined as a time series subsequence that has a distance of at least r to its non-overlapping nearest neighbor, where r is a prespecified threshold. This article presents a new algorithm for discord discovery on a high-performance computing cluster, where each cluster node is equipped with a GPU. The algorithm employs the data parallelism concept: the time series is divided into disjoint fragments that are processed separetely by GPUs of the cluster nodes. Using a parallel algorithm previously developed by the authors, local candidates for discords are selected at each node. Further, through the data exchanges, a set of global candidates is formed at each node as a union of all local candidates. Then each node performs a global refinement, removing false-positive discords from the global candi date set. Global refinement is parallelized based on block multiplication of the candidate matrix and the subsequence matrix of the fragment. The resulting set of discords is formed as the intersection of the sets obtained by the nodes as a result of global refinement. Computational experiments with synthetic and real time series, carried out on the Lomonosov-2 and Lobachevsky supercomputers equipped with 48–64 GPUs, show the high scalability of the developed algorithm.
- A. Blázquez-Garcıa, A. Conde, U. Mori, and J. A. Lozano, “A Review on Outlier/Anomaly Detection in Time Series Data,” ACM Comput. Surv. 54 (3), 56: 1-56: 33 (2021).
- V. Chandola, A. Banerjee, and V. Kumar, “Anomaly Detection: A Survey,” ACM Comput. Surv. 41 (3), 15: 1-15: 58 (2009).
- V. Chandola, D. Cheboli, and V. Kumar, Detecting Anomalies in a Time Series Database , Technical Report TR 09-004 (University of Minnesota, Minneapolis, 2009).
https://hdl.handle.net/11299/215791 . Cited August 23, 2023.
- J. Lin, E. Keogh, A. Fu, and H. V. Herle, “Approximations to Magic: Finding Unusual Medical Time Series,” in Proc. 18th IEEE Symposium on Computer-Based Medical Systems (CBMS 2005), Dublin, Ireland, June 23-24, 2005 (IEEE Press, Washington D.C., 2005), pp. 329-334.
- A. L. Goldberger, L. A. N. Amaral, L. Glass, et al., “PhysioBank, PhysioToolkit, and PhysioNet Components of a New Research Resource for Complex Physiologic Signals,” Circulation 101 (23), e215-e220 (2000).
- T. Nakamura, M. Imamura, R. Mercer, and E. Keogh, “MERLIN: Parameter-Free Discovery of Arbitrary Length Anomalies in Massive Time Series Archives,” in Proc. 20th IEEE Int. Conf. on Data Mining (ICDM 2020), Sorrento, Italy, November 17-20, 2020 (IEEE Press, New York, 2020), pp. 1190-1195.
- M. Zymbler and Y. Kraeva, “High-Performance Time Series Anomaly Discovery on Graphics Processors,” Mathematics 11 (14), Article Number 3193 (2023).
- J. Lin, E. Keogh, S. Lonardi, and B. Chiu, “A Symbolic Representation of Time Series, with Implications for Streaming Algorithms,” in Proc. of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD 2003), San Diego, California, USA, June 13, 2003 (ACM Press, New York, 2003), pp. 2-11.
- M. L. Zymbler, “A Parallel Discord Discovery Algorithm for Time Series on Many-Core Accelerators,” Numerical Methods and Programming (Vychislitel’nye Metody i Programmirovanie) 20 (3), 211-223 (2019).
- D. Yankov, E. Keogh, and U. Rebbapragada, “Disk Aware Discord Discovery: Finding Unusual Time Series in Terabyte Sized Datasets,” in Proc. of the 7th IEEE Int. Conf. on Data Mining (ICDM 2007), Omaha, Nebraska, USA, October 28-31, 2007 (IEEE Press, New York, 2007), pp. 381-390.
- D. Yankov, E. Keogh, and U. Rebbapragada, “Disk Aware Discord Discovery: Finding Unusual Time Series in Terabyte Sized Datasets,” Knowl. Inf. Syst. 17 (2), 241-262 (2008).
- Ya. A. Kraeva and M. L. Zymbler, “A Parallel Discord Discovery Algorithm for a Graphics Processor,” Pattern Recognit. Image Anal. 33 (2), 101-112 (2023).
- M. Zymbler, A. Grents, Y. Kraeva, and S. Kumar, “A Parallel Approach to Discords Discovery in Massive Time Series Data,” Comput. Mater. Contin. 66 (2), 1867-1878, 2021.
- Y. Wu, Y. Zhu, T. Huang, et al., “Distributed Discord Discovery: Spark Based Anomaly Detection in Time Series,” in 17th IEEE Int. Conf. on High Performance Computing and Communications (HPCC 2015), 7th IEEE Int. Symposium on Cyberspace Safety and Security (CSS 2015), and 12th IEEE Int. Conf. on Embedded Software and Systems (ICESS 2015), New York, USA, August 24-26, 2015 (IEEE Press, New York, 2015), pp. 154-159.
- T. Huang, Y. Zhu, Y. Mao, et al., “Parallel Discord Discovery,” in Lecture Notes in Computer Science (Springer, Cham, 2016), Vol. 9652, pp. 233-244.
- A. Mueen, S. Nath, and J. Liu, “Fast Approximate Correlation for Massive Time-Series Data,” in Proc. of the ACM SIGMOD Int. Conf. on Management of Data (SIGMOD 2010), Indianapolis, Indiana, USA, June 6-10, 2010 (ACM Press, New York, 2010), pp. 171-182.
- Individual Household Electric Power Consumption.
https://archive.ics.uci.edu/ml/datasets/individual+household+electric+power+consumption/. Cited August 24, 2023.
- M. Thill, W. Konen, and T. Bäck, “MarkusThill/MGAB: The Mackey-Glass Anomaly Benchmark. Version v1.0.1,” Zenodo, 2020.
- M. C. Mackey and L. Glass, “Oscillation and Chaos in Physiological Control Systems,” Science 197 (4300), 287-289 (1977).
- Vl. V. Voevodin, A. S. Antonov, D. A. Nikitenko, et al., “Supercomputer Lomonosov-2: Large Scale, Deep Monitoring and Fine Analytics for the User Community,” Supercomput. Front. Innov. 6 (2), 4-11 (2019).
- Z. Zimmerman, K. Kamgar, N. S. Senobari, et al., “Matrix Profile XIV: Scaling Time Series Motif Discovery with GPUs to Break a Quintillion Pairwise Comparisons a Day and Beyond,” in Proc. of the ACM Symposium on Cloud Computing (SoCC 2019), Santa Cruz, CA, USA, November 20-23, 2019. (ACM Press, New York, 2019), pp. 74-86.
- C.-C. M. Yeh, Y. Zhu, L. Ulanova, et al., “Time Series Joins, Motifs, Discords and Shapelets: A Unifying View that Exploits the Matrix Profile,” Data Min. Knowl. Discov. 32 (1), 83-123 (2018).
How to Cite
Copyright (c) 2023 Я. А. Краева, М. Л. Цымблер
This work is licensed under a Creative Commons Attribution 4.0 International License.