DOI: https://doi.org/10.26089/NumMet.v25r328

Applying regularization to calculate split criterion for survival models

Authors

  • Iulii A. Vasilev
  • Mikhail I. Petrovskiy
  • Igor V. Mashechkin

Keywords

survival analysis
informative censoring
splitting criteria
regularization

Abstract

Survival analysis methods describe and predict the time until an event occurs. These models account for censoring, in which the true time of the event is unknown because the observation was withdrawn from the study. Statistical methods assume that censoring is uninformative, that is, that the reason for withdrawing an observation is unrelated to the event under study. This paper investigates how informative censoring affects the performance of statistical methods. In particular, the log-rank criterion, which is used to compare hazard functions, has low sensitivity for small samples or multimodal event time distributions. To overcome these shortcomings, we propose a method for computing regularized criteria that use a priori information about the distribution of events over time and evaluate the differences between risk functions at all time points. The regularization method was integrated into the survival tree method and improved prediction quality on four medical datasets. The proposed method also outperformed the existing statistical methods and the existing survival tree implementation on all datasets.


Published

2024-09-19

Issue

Section

Methods and algorithms of computational mathematics and their applications

Author Biographies

Iulii A. Vasilev

Lomonosov Moscow State University,
Faculty of Computational Mathematics and Cybernetics,
Department of Intelligent Information Technologies
• First Category Mathematician

Mikhail I. Petrovskiy

Lomonosov Moscow State University,
Faculty of Computational Mathematics and Cybernetics,
Department of Intelligent Information Technologies
• Associate Professor

Igor V. Mashechkin

Lomonosov Moscow State University,
Faculty of Computational Mathematics and Cybernetics,
Department of Intelligent Information Technologies
• Professor

