Anomaly detection in long time series on high-performance cluster with GPUs
Authors
-
Yana A. Kraeva
-
Mikhail L. Zymbler
Keywords:
time series
anomaly detection
discord
parallel algorithm
computer cluster
GPU
CUDA
DRAG
MERLIN
PD3
PALMAD
Abstract
Currently, the discovery of anomalies in long time series occurs in a wide range of subject areas: digital industry, healthcare, climate modeling, financial analytics, etc. Discord formalizes the anomaly concept being defined as a time series subsequence that has a distance of at least r to its non-overlapping nearest neighbor, where r is a prespecified threshold. This article presents a new algorithm for discord discovery on a high-performance computing cluster, where each cluster node is equipped with a GPU. The algorithm employs the data parallelism concept: the time series is divided into disjoint fragments that are processed separetely by GPUs of the cluster nodes. Using a parallel algorithm previously developed by the authors, local candidates for discords are selected at each node. Further, through the data exchanges, a set of global candidates is formed at each node as a union of all local candidates. Then each node performs a global refinement, removing false-positive discords from the global candi date set. Global refinement is parallelized based on block multiplication of the candidate matrix and the subsequence matrix of the fragment. The resulting set of discords is formed as the intersection of the sets obtained by the nodes as a result of global refinement. Computational experiments with synthetic and real time series, carried out on the Lomonosov-2 and Lobachevsky supercomputers equipped with 48–64 GPUs, show the high scalability of the developed algorithm.
Section
Parallel software tools and technologies
References
- A. Blázquez-Garcıa, A. Conde, U. Mori, and J. A. Lozano, “A Review on Outlier/Anomaly Detection in Time Series Data,” ACM Comput. Surv. 54 (3), 56: 1-56: 33 (2021).
doi 10.1145/3444690
- V. Chandola, A. Banerjee, and V. Kumar, “Anomaly Detection: A Survey,” ACM Comput. Surv. 41 (3), 15: 1-15: 58 (2009).
doi 10.1145/1541880.1541882
- V. Chandola, D. Cheboli, and V. Kumar, Detecting Anomalies in a Time Series Database , Technical Report TR 09-004 (University of Minnesota, Minneapolis, 2009).
https://hdl.handle.net/11299/215791 . Cited August 23, 2023.
- J. Lin, E. Keogh, A. Fu, and H. V. Herle, “Approximations to Magic: Finding Unusual Medical Time Series,” in Proc. 18th IEEE Symposium on Computer-Based Medical Systems (CBMS 2005), Dublin, Ireland, June 23-24, 2005 (IEEE Press, Washington D.C., 2005), pp. 329-334.
doi 10.1109/CBMS.2005.34
- A. L. Goldberger, L. A. N. Amaral, L. Glass, et al., “PhysioBank, PhysioToolkit, and PhysioNet Components of a New Research Resource for Complex Physiologic Signals,” Circulation 101 (23), e215-e220 (2000).
doi 10.1161/01.CIR.101.23.e215.
- T. Nakamura, M. Imamura, R. Mercer, and E. Keogh, “MERLIN: Parameter-Free Discovery of Arbitrary Length Anomalies in Massive Time Series Archives,” in Proc. 20th IEEE Int. Conf. on Data Mining (ICDM 2020), Sorrento, Italy, November 17-20, 2020 (IEEE Press, New York, 2020), pp. 1190-1195.
doi 10.1109/ICDM50108.2020.00147
- M. Zymbler and Y. Kraeva, “High-Performance Time Series Anomaly Discovery on Graphics Processors,” Mathematics 11 (14), Article Number 3193 (2023).
doi 10.3390/math11143193
- J. Lin, E. Keogh, S. Lonardi, and B. Chiu, “A Symbolic Representation of Time Series, with Implications for Streaming Algorithms,” in Proc. of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD 2003), San Diego, California, USA, June 13, 2003 (ACM Press, New York, 2003), pp. 2-11.
doi 10.1145/882082.882086
- M. L. Zymbler, “A Parallel Discord Discovery Algorithm for Time Series on Many-Core Accelerators,” Numerical Methods and Programming (Vychislitel’nye Metody i Programmirovanie) 20 (3), 211-223 (2019).
doi 10.26089/NumMet.v20r320
- D. Yankov, E. Keogh, and U. Rebbapragada, “Disk Aware Discord Discovery: Finding Unusual Time Series in Terabyte Sized Datasets,” in Proc. of the 7th IEEE Int. Conf. on Data Mining (ICDM 2007), Omaha, Nebraska, USA, October 28-31, 2007 (IEEE Press, New York, 2007), pp. 381-390.
doi 10.1109/ICDM.2007.61
- D. Yankov, E. Keogh, and U. Rebbapragada, “Disk Aware Discord Discovery: Finding Unusual Time Series in Terabyte Sized Datasets,” Knowl. Inf. Syst. 17 (2), 241-262 (2008).
doi 10.1007/s10115-008-0131-9
- Ya. A. Kraeva and M. L. Zymbler, “A Parallel Discord Discovery Algorithm for a Graphics Processor,” Pattern Recognit. Image Anal. 33 (2), 101-112 (2023).
doi 10.1134/S1054661823020062
- M. Zymbler, A. Grents, Y. Kraeva, and S. Kumar, “A Parallel Approach to Discords Discovery in Massive Time Series Data,” Comput. Mater. Contin. 66 (2), 1867-1878, 2021.
doi 10.32604/cmc.2020.014232
- Y. Wu, Y. Zhu, T. Huang, et al., “Distributed Discord Discovery: Spark Based Anomaly Detection in Time Series,” in 17th IEEE Int. Conf. on High Performance Computing and Communications (HPCC 2015), 7th IEEE Int. Symposium on Cyberspace Safety and Security (CSS 2015), and 12th IEEE Int. Conf. on Embedded Software and Systems (ICESS 2015), New York, USA, August 24-26, 2015 (IEEE Press, New York, 2015), pp. 154-159.
doi 10.1109/HPCC-CSS-ICESS.2015.228
- T. Huang, Y. Zhu, Y. Mao, et al., “Parallel Discord Discovery,” in Lecture Notes in Computer Science (Springer, Cham, 2016), Vol. 9652, pp. 233-244.
doi 10.1007/978-3-319-31750-2_19
- A. Mueen, S. Nath, and J. Liu, “Fast Approximate Correlation for Massive Time-Series Data,” in Proc. of the ACM SIGMOD Int. Conf. on Management of Data (SIGMOD 2010), Indianapolis, Indiana, USA, June 6-10, 2010 (ACM Press, New York, 2010), pp. 171-182.
doi 10.1145/1807167.1807188
- Individual Household Electric Power Consumption.
https://archive.ics.uci.edu/ml/datasets/individual+household+electric+power+consumption/. Cited August 24, 2023.
- M. Thill, W. Konen, and T. Bäck, “MarkusThill/MGAB: The Mackey-Glass Anomaly Benchmark. Version v1.0.1,” Zenodo, 2020.
doi 10.5281/ZENODO.3762385
- M. C. Mackey and L. Glass, “Oscillation and Chaos in Physiological Control Systems,” Science 197 (4300), 287-289 (1977).
doi 10.1126/science.267326
- Vl. V. Voevodin, A. S. Antonov, D. A. Nikitenko, et al., “Supercomputer Lomonosov-2: Large Scale, Deep Monitoring and Fine Analytics for the User Community,” Supercomput. Front. Innov. 6 (2), 4-11 (2019).
doi 10.14529/jsfi190201
- Z. Zimmerman, K. Kamgar, N. S. Senobari, et al., “Matrix Profile XIV: Scaling Time Series Motif Discovery with GPUs to Break a Quintillion Pairwise Comparisons a Day and Beyond,” in Proc. of the ACM Symposium on Cloud Computing (SoCC 2019), Santa Cruz, CA, USA, November 20-23, 2019. (ACM Press, New York, 2019), pp. 74-86.
doi 10.1145/3357223.3362721
- C.-C. M. Yeh, Y. Zhu, L. Ulanova, et al., “Time Series Joins, Motifs, Discords and Shapelets: A Unifying View that Exploits the Matrix Profile,” Data Min. Knowl. Discov. 32 (1), 83-123 (2018).
doi 10.1007/s10618-017-0519-9