Imputation of missing values of a time series based on joint application of analytical algorithms and neural networks
Authors
- Mikhail L. Zymbler
- Alexey A. Yurtin
Keywords:
time series
imputation of missing values
time series snippets
MPdist measure
recurrent neural network
Abstract
Time series data are now processed in a wide range of scientific and practical applications, where points or blocks are often missing because of hardware or software failures or human error and must be imputed. In this article, we present SANNI (Snippet and Artificial Neural Network-based Imputation), a method for recovering missing values of a time series processed offline. SANNI includes two neural network models, the Recognizer and the Reconstructor. The Recognizer determines the snippet (typical subsequence) of the time series to which a given subsequence with a missing point is most similar. It consists of three groups of layers: convolutional, recurrent, and fully connected. The Reconstructor takes the Recognizer’s output and the subsequence with the missing point and restores that point. It likewise consists of three groups of layers: convolutional, recurrent, and fully connected. The topology of the Recognizer and Reconstructor layers is parameterized by the snippet length. We also present a way to prepare training sets for both models. In our computational experiments, SANNI ranked among the top three of the state-of-the-art analytical and neural network imputation methods compared.
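The two-stage data flow described in the abstract (first pick the most similar snippet, then restore the missing point with its help) can be illustrated with a minimal sketch. This is not the SANNI implementation: both neural networks are replaced here by simple analytical stand-ins, the Euclidean distance is used in place of the MPdist measure named in the keywords, and the function names `recognize` and `reconstruct`, the toy snippets, and the `None` gap marker are all hypothetical.

```python
import math

def recognize(subseq, snippets):
    """Stand-in for the Recognizer (assumption: nearest snippet, not a network).

    Returns the index of the snippet closest to subseq, using the Euclidean
    distance over observed positions only (None marks the missing point).
    """
    def dist(snippet):
        return math.sqrt(sum((x - s) ** 2
                             for x, s in zip(subseq, snippet)
                             if x is not None))
    return min(range(len(snippets)), key=lambda i: dist(snippets[i]))

def reconstruct(subseq, snippet):
    """Stand-in for the Reconstructor (assumption: offset-corrected copy).

    The missing point is taken from the chosen snippet and shifted by the
    mean offset between the observed points and the snippet.
    """
    observed = [(x, s) for x, s in zip(subseq, snippet) if x is not None]
    offset = sum(x - s for x, s in observed) / len(observed)
    return [s + offset if x is None else x for x, s in zip(subseq, snippet)]

# Hypothetical snippets of length 4 (in SANNI they are mined from the series).
snippets = [
    [0.0, 1.0, 2.0, 3.0],  # rising pattern
    [3.0, 2.0, 1.0, 0.0],  # falling pattern
]
subseq = [0.5, 1.5, None, 3.5]  # rising shape with one missing point

idx = recognize(subseq, snippets)
restored = reconstruct(subseq, snippets[idx])
print(idx, restored)
```

In this toy case the rising snippet is selected and the gap is filled with its value shifted to the level of the observed points. In the method itself, both steps are learned by networks whose layer topology depends on the snippet length.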
Section
Methods and algorithms of computational mathematics and their applications