DOI: https://doi.org/10.26089/NumMet.v24r318

Imputation of missing values of a time series based on joint application of analytical algorithms and neural networks

Authors

  • Mikhail L. Zymbler
  • Alexey A. Yurtin

Keywords:

time series
imputation of missing values
time series snippets
MPdist measure
recurrent neural network

Abstract

Currently, time series data are processed in a wide range of scientific and practical applications, where the imputation of points or blocks missing due to hardware/software failures or human error is a topical problem. In this article, we present SANNI (Snippet and Artificial Neural Network-based Imputation), a method for recovering missing values of time series processed offline. SANNI includes two neural network models, namely the Recognizer and the Reconstructor. The Recognizer determines the snippet (typical subsequence) of the time series to which a given subsequence with a missing point is most similar. The Reconstructor, using the Recognizer’s output and the subsequence with the missing point, restores that point. Both models consist of three groups of layers: convolutional, recurrent, and fully connected; the topology of the Recognizer and Reconstructor layers is parameterized with respect to the snippet length. We also present a way to prepare training sets for the Recognizer and the Reconstructor. Our computational experiments showed that SANNI ranks among the top three of the state-of-the-art analytical and neural network imputation methods.
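For illustration only, the sketch below shows one way the two models described in the abstract could be assembled in PyTorch: each combines convolutional, recurrent, and fully connected layer groups, the Recognizer outputs a snippet class, and the Reconstructor is conditioned on that class to predict the missing value. The layer widths, kernel sizes, the use of a GRU, one-hot conditioning, and zero-filling of the gap are assumptions of this example, not the exact topology from the article, which parameterizes the layers with respect to the snippet length.

```python
# Hypothetical sketch of the two SANNI models described in the abstract.
# Layer sizes and design details are illustrative assumptions, not the authors' topology.
import torch
import torch.nn as nn


class Recognizer(nn.Module):
    """Classifies which snippet a subsequence with a missing point resembles most."""
    def __init__(self, m: int, num_snippets: int, hidden: int = 64):
        super().__init__()
        self.conv = nn.Sequential(                     # convolutional group
            nn.Conv1d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.gru = nn.GRU(32, hidden, batch_first=True)  # recurrent group
        self.fc = nn.Linear(hidden, num_snippets)        # fully connected group

    def forward(self, x):                    # x: (batch, 1, m) subsequence
        h = self.conv(x).transpose(1, 2)     # (batch, m, 32)
        _, h_n = self.gru(h)                 # h_n: (1, batch, hidden)
        return self.fc(h_n.squeeze(0))       # snippet-class logits


class Reconstructor(nn.Module):
    """Restores the missing point from the subsequence and the recognized snippet."""
    def __init__(self, m: int, num_snippets: int, hidden: int = 64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1 + num_snippets, 32, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.gru = nn.GRU(32, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, 1)       # predicted value of the missing point

    def forward(self, x, snippet_logits):
        # Turn the Recognizer's logits into a one-hot snippet label and
        # broadcast it along the time axis as extra input channels.
        one_hot = torch.zeros_like(snippet_logits).scatter_(
            1, snippet_logits.argmax(dim=1, keepdim=True), 1.0)
        cond = one_hot.unsqueeze(-1).expand(-1, -1, x.size(-1))
        h = self.conv(torch.cat([x, cond], dim=1)).transpose(1, 2)
        _, h_n = self.gru(h)
        return self.fc(h_n.squeeze(0))


if __name__ == "__main__":
    m, num_snippets = 100, 5                 # assumed subsequence length and snippet count
    rec, recon = Recognizer(m, num_snippets), Reconstructor(m, num_snippets)
    x = torch.randn(8, 1, m)                 # batch of subsequences, gap pre-filled (e.g., zeros)
    value = recon(x, rec(x))                 # (8, 1) predicted missing values
```

In such a pipeline the Recognizer would first be applied to the subsequence containing the gap, and its output would condition the Reconstructor, with the two networks trained on the sets prepared as outlined in the abstract.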


Published

2023-06-28

Issue

Section

Methods and algorithms of computational mathematics and their applications

Author Biographies

Mikhail L. Zymbler

South Ural State University (National Research University),
Scientific and Educational Center “Artificial Intelligence and Quantum Technologies”,
• Associate Professor, Deputy Director of the Center

Alexey A. Yurtin

South Ural State University (National Research University),
Big Data and Machine Learning Laboratory,
• Programmer


References

  1. S. A. Ivanov, K. Yu. Nikolskaya, G. I. Radchenko, et al., “Digital Twin of a City: Concept Overview,” Vestn. Yuzhn. Ural. Gos. Univ. Ser. Vychisl. Mat. Inf. 9 (4), 5-23 (2020).
    doi 10.14529/cmse200401
  2. M. L. Zymbler, Ya. A. Kraeva, E. A. Latypova, et al., “Cleaning Sensor Data in Intelligent Heating Control System,” Vestn. Yuzhn. Ural. Gos. Univ. Ser. Vychisl. Mat. Inf. 10 (3), 16-36 (2021).
    doi 10.14529/cmse210302
  3. V. V. Epishev, A. P. Isaev, R. M. Miniakhmetov, et al., “Physiological Data Mining System for Elite Sports,” Vestn. Yuzhn. Ural. Gos. Univ. Ser. Vychisl. Mat. Inf. 2 (1), 44-54 (2013).
    doi 10.14529/cmse130105
  4. S. M. Abdullaev, O. Yu. Lenskaia, A. O. Gayazova, et al., “Short-Range Forecasting Algorithms Using Radar Data: Translation Estimate and Life-Cycle Composite Display,” Vestn. Yuzhn. Ural. Gos. Univ. Ser. Vychisl. Mat. Inf. 3 (1), 17-32 (2014).
    doi 10.14529/cmse140102
  5. M. M. Dyshaev and I. M. Sokolinskaya, “Representation of Trading Signals Based on Kaufman Adaptive Moving Average as a System of Linear Inequalities,” Vestn. Yuzhn. Ural. Gos. Univ. Ser. Vychisl. Mat. Inf. 2 (4), 103-108 (2013).
    doi 10.14529/cmse130408
  6. M. Khayati, A. Lerner, Z. Tymchenko, and P. Cudré-Mauroux, “Mind the Gap: An Experimental Evaluation of Imputation of Missing Values Techniques in Time Series,” Proc. VLDB Endow. 13 (5), 768-782 (2020).
    doi 10.14778/3377369.3377383
  7. F. A. Adnan, K. R. Jamaludin, W. Z. A. W. Muhamad, and S. Miskon, “A Review of the Current Publication Trends on Missing Data Imputation over Three Decades: Direction and Future Research,” Neural Comput. Appl. 34 (21), 18325-18340 (2022).
    doi 10.1007/s00521-022-07702-7
  8. M. L. Zymbler, V. A. Polonsky, and A. A. Yurtin, “On One Method of Imputation Missing Values of a Streaming Time Series in Real Time,” Vestn. Yuzhn. Ural. Gos. Univ. Ser. Vychisl. Mat. Inf. 10 (4), 5-25 (2021).
    doi 10.14529/cmse210401
  9. S. Imani, F. Madrid, W. Ding, et al., “Matrix Profile XIII: Time Series Snippets: A New Primitive for Time Series Data Mining,” in Proc. 9th IEEE Int. Conf. on Big Knowledge (ICBK), Singapore, November 17-18, 2018 (IEEE Press, New York, 2018), pp. 382-389.
    doi 10.1109/ICBK.2018.00058
  10. W. Cao, D. Wang, J. Li, et al., “BRITS: Bidirectional Recurrent Imputation for Time Series,” in Proc. 32nd Conf. on Neural Inf. Proc. Systems (NeurIPS 2018), Montréal, Canada, December 3-8, 2018.
    https://proceedings.neurips.cc/paper_files/paper/2018/file/734e6bfcd358e25ac1db0a4241b95651-Paper.pdf . Cited June 18, 2023.
  11. J. Yoon, W. R. Zame, and M. van der Schaar, “Estimating Missing Data in Temporal Data Streams Using Multi-Directional Recurrent Neural Networks,” IEEE Trans. Biomed. Eng. 66 (5), 1477-1490 (2019).
    doi 10.1109/TBME.2018.2874712
  12. Y. Liu, R. Yu, S. Zheng, et al., “NAOMI: Non-Autoregressive Multiresolution Sequence Imputation,” in Proc. 33rd Int. Conf. on Neural Inf. Proc. Systems (NeurIPS 2019), Vancouver, Canada, December 8-14, 2019.
    https://proceedings.neurips.cc/paper_files/paper/2019/file/50c1f44e426560f3f2cdcb3e19e39903-Paper.pdf . Cited June 18, 2023.
  13. W. Du, D. Côté, and Y. Liu, “SAITS: Self-Attention-Based Imputation for Time Series,” Expert Syst. Appl. 219, Article Number 119619 (2023).
    doi 10.1016/j.eswa.2023.119619
  14. J. Yoon, J. Jordon, and M. van der Schaar, GAIN: Missing Data Imputation Using Generative Adversarial Nets, arXiv preprint: 1806.02920v1 [cs.LG] (Cornell Univ. Library, Ithaca, 2018).
    https://arxiv.org/abs/1806.02920 . Cited June 18, 2023.
  15. Y. Luo, X. Cai, Y. Zhang, et al., “Multivariate Time Series Imputation with Generative Adversarial Networks,” in Proc. 32nd Conf. on Neural Inf. Proc. Systems (NeurIPS 2018), Montréal, Canada, December 3-8, 2018.
    https://proceedings.neurips.cc/paper/2018/file/96b9bff013acedfb1d140579e2fbeb63-Paper.pdf . Cited June 18, 2023.
  16. Z. Guo, Y. Wan, and H. Ye, “A Data Imputation Method for Multivariate Time Series Based on Generative Adversarial Network,” Neurocomputing 360, 185-197 (2019).
    doi 10.1016/j.neucom.2019.06.007
  17. Y. Luo, Y. Zhang, X. Cai, and X. Yuan, “E²GAN: End-to-End Generative Adversarial Network for Multivariate Time Series Imputation,” in Proc. 28th Int. Joint Conf. on Artificial Intelligence, Macao, China, August 10-16, 2019 (AAAI Press, Washington, DC, 2019), pp. 3094-3100.
    doi 10.24963/ijcai.2019/429
  18. S. Gharghabi, S. Imani, A. Bagnall, et al., “An Ultra-Fast Time Series Distance Measure to Allow Data Mining in More Complex Real-World Deployments,” Data Min. Knowl. Disc. 34, 1104-1135 (2020).
    doi 10.1007/s10618-020-00695-8
  19. C.-C. M. Yeh, Y. Zhu, L. Ulanova, et al., “Matrix Profile I: All Pairs Similarity Joins for Time Series: A Unifying View that Includes Motifs, Discords and Shapelets,” in Proc. IEEE 16th Int. Conf. on Data Mining (ICDM), Barcelona, Spain, December 12-15, 2016 (IEEE Press, New York, 2017), pp. 1317-1322.
    doi 10.1109/ICDM.2016.0179
  20. M. L. Zymbler and A. I. Goglachev, “Discovery of Typical Subsequences of Time Series on Graphical Processor,” Numerical Methods and Programming (Vychislitel’nye Metody i Programmirovanie) 22 (4), 344-359 (2021).
    doi 10.26089/NumMet.v22r423
  21. J. Sola and J. Sevilla, “Importance of Input Data Normalization for the Application of Neural Networks to Complex Industrial Problems,” IEEE Trans. Nucl. Sci. 44 (3), 1464-1468 (1997).
    doi 10.1109/23.589532
  22. L. Huang, Normalization Techniques in Deep Learning (Springer, Cham, 2022).
    doi 10.1007/978-3-031-14595-7
  23. E. M. Reingold, J. Nievergelt, and N. Deo, Combinatorial Algorithms: Theory and Practice (Prentice-Hall, Englewood Cliffs, 1977; Mir, Moscow, 1980).
  24. S. Hochreiter, “The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions,” Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 6 (2), 107-116 (1998).
    doi 10.1142/S0218488598000094
  25. L. Lu, Y. Shin, Y. Su, and G. E. Karniadakis, “Dying ReLU and Initialization: Theory and Numerical Examples,” Commun. Comput. Phys. 28, 1671-1706 (2020).
    doi 10.4208/cicp.OA-2020-0165
  26. R. V. Bilenko, N. Yu. Dolganina, E. V. Ivanova, and A. I. Rekachinsky, “High-Performance Computing Resources of South Ural State University,” Vestn. Yuzhn. Ural. Gos. Univ. Ser. Vychisl. Mat. Inf. 11 (1), 15-30 (2022).
    doi 10.14529/cmse220102
  27. I. Laña, I. Olabarrieta, M. Vélez, and J. Del Ser, “On the Imputation of Missing Data for Road Traffic Forecasting: New Insights and Novel Techniques,” Transp. Res. Part C Emerg. Technol. 90, 18-33 (2018).
    doi 10.1016/j.trc.2018.02.021
  28. A. Reiss and D. Stricker, “Introducing a New Benchmarked Dataset for Activity Monitoring,” in Proc. 16th Int. Symposium on Wearable Computers, Newcastle, United Kingdom, June 18-22, 2012 (IEEE Press, New York, 2012), pp. 108-109.
    doi 10.1109/ISWC.2012.13
  29. L. Biewald, “Experiment Tracking with Weights and Biases,” Software available from wandb.com:
    https://docs.wandb.ai/ . Cited June 15, 2023.
  30. X. Shu, F. Porikli, and N. Ahuja, “Robust Orthonormal Subspace Learning: Efficient Recovery of Corrupted Low-Rank Matrices,” in 2014 IEEE Conf. on Computer Vision and Pattern Recognition, Columbus, USA, June 23-28, 2014 (IEEE Press, New York, 2014), pp. 3874-3881.
    doi 10.1109/CVPR.2014.495
  31. L. Li, J. McCann, N. S. Pollard, and C. Faloutsos, “DynaMMo: Mining and Summarization of Coevolving Sequences with Missing Values,” in Proc. 15th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Paris, France, June 28-July 1, 2009 (ACM Press, New York, 2009), pp. 507-516.
    doi 10.1145/1557019.1557078
  32. M. Khayati, P. Cudré-Mauroux, and M. H. Böhlen, “Scalable Recovery of Missing Blocks in Time Series with High and Low Cross-Correlations,” Knowl. Inf. Syst. 62 (6), 2257-2280 (2020).
    doi 10.1007/s10115-019-01421-7
  33. D. Zhang and L. Balzano, “Global Convergence of a Grassmannian Gradient Descent Algorithm for Subspace Estimation,” in Proc. 19th Int. Conf. on Artificial Intelligence and Statistics, Cadiz, Spain, May 9-11, 2016. Volume 51, 1460-1468 (2016).
    http://proceedings.mlr.press/v51/zhang16b.pdf . Cited June 15, 2023.
  34. R. Mazumder, T. Hastie, and R. Tibshirani, “Spectral Regularization Algorithms for Learning Large Incomplete Matrices,” J. Mach. Learn. Res. 11, Article Number 80, 2287-2322 (2010).
    https://www.jmlr.org/papers/volume11/mazumder10a/mazumder10a.pdf . Cited June 15, 2023.
  35. O. Troyanskaya, M. Cantor, G. Sherlock, et al., “Missing Value Estimation Methods for DNA Microarrays,” Bioinformatics 17 (6), 520-525 (2001).
    doi 10.1093/bioinformatics/17.6.520
  36. J. Mei, Y. de Castro, Y. Goude, and G. Hébrail, “Nonnegative Matrix Factorization for Time Series Recovery from a Few Temporal Aggregates,” in Proc. 34th Int. Conf. on Machine Learning, Sydney, Australia, August 6-11, 2017. Volume 70, 2382-2390 (2017).
    https://dl.acm.org/doi/10.5555/3305890.3305927 . Cited June 15, 2023.
  37. H.-F. Yu, N. Rao, and I. S. Dhillon, “Temporal Regularized Matrix Factorization for High-Dimensional Time Series Prediction,” in Proc. Annual Conf. on Neural Information Processing Systems, Barcelona, Spain, December 5-10, 2016.
    https://dl.acm.org/doi/abs/10.5555/3157096.3157191 . Cited June 15, 2023.
  38. B. D. Minor, J. R. Doppa, and D. J. Cook, “Learning Activity Predictors from Sensor Data: Algorithms, Evaluation, and Applications,” IEEE Trans. Knowl. Data Eng. 29 (12), 2744-2757 (2017).
    doi 10.1109/TKDE.2017.2750669