INTEGRATING CO-CLUSTERING AND INTERPRETABLE MACHINE LEARNING FOR THE PREDICTION OF INTRAVENOUS IMMUNOGLOBULIN RESISTANCE IN KAWASAKI DISEASE

Abstract

Identifying patients who are resistant to intravenous immunoglobulin is essential for the prompt and optimal treatment of Kawasaki disease, which calls for effective risk assessment tools. Data-driven approaches have the potential to identify high-risk individuals by capturing the complex patterns in real-world data. To enable clinically applicable prediction of intravenous immunoglobulin resistance while addressing both the incompleteness of clinical data and the lack of interpretability of machine learning models, a multistage method is developed that integrates missing-data pattern mining with intelligible models. First, co-clustering is adopted to characterize block-wise missing-data patterns by simultaneously grouping clinical features and patients, which enables (a) group-based feature selection and missing data imputation and (b) patient-subgroup-specific predictive models that account for data availability. Second, feature selection is performed using the group Lasso to uncover group-specific risk factors. Third, the Explainable Boosting Machine, an interpretable learning method based on generalized additive models, is applied to build a predictive model for each patient subgroup. Experiments on real-world electronic health records show that the proposed framework outperforms a set of benchmark methods in predictive modeling. This project highlights the integration of co-clustering and supervised learning methods for mining incomplete clinical data, and promotes data-driven approaches for investigating predictors and developing effective algorithms for decision making in healthcare.
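To make the first stage concrete, the following is a minimal sketch of how block-wise missing-data patterns can define patient subgroups and the features available to each. It is not the authors' implementation: it groups patients and features by exactly identical missingness patterns, a deliberately simplified stand-in for a real co-clustering algorithm, which would also tolerate noisy, only approximately block-wise patterns. The feature names, the toy records, and all function names are hypothetical. The later stages (group Lasso and the Explainable Boosting Machine) are not shown.

```python
from collections import defaultdict

# Hypothetical toy records: None marks a missing value. Feature names are
# illustrative only and are not taken from the study's dataset.
FEATURES = ["age", "crp", "albumin", "echo_z", "ferritin"]
RECORDS = [
    [2.0, 80.0, 3.1, None, None],
    [4.0, 95.0, 2.9, None, None],
    [1.5, 60.0, None, 2.4, 310.0],
    [3.0, 72.0, None, 2.1, 280.0],
    [5.0, 55.0, 3.4, None, None],
]


def missing_mask(records):
    """Return a 0/1 matrix where 1 means the value is missing."""
    return [[1 if v is None else 0 for v in row] for row in records]


def group_by_pattern(vectors):
    """Group indices whose binary vectors are identical."""
    groups = defaultdict(list)
    for idx, vec in enumerate(vectors):
        groups[tuple(vec)].append(idx)
    return list(groups.values())


def cocluster_exact(mask):
    """Group rows (patients) and columns (features) by identical patterns.

    Simplified stand-in (assumption): exact matching only, so it handles
    noise-free block structure but not approximate blocks.
    """
    row_groups = group_by_pattern(mask)
    col_groups = group_by_pattern(list(zip(*mask)))
    return row_groups, col_groups


def available_features(mask, row_group, feature_names):
    """Return the features observed for every patient in the subgroup."""
    return [
        name
        for j, name in enumerate(feature_names)
        if all(mask[i][j] == 0 for i in row_group)
    ]


mask = missing_mask(RECORDS)
patient_groups, feature_groups = cocluster_exact(mask)
for group in patient_groups:
    print(group, available_features(mask, group, FEATURES))
```

In this toy example each patient subgroup ends up with its own set of fully observed features, which is the property that would let a separate predictive model be fitted per subgroup using only the data that subgroup actually has.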
