Machine Learning for In-hospital Mortality Prediction in Critically Ill Patients With Acute Heart Failure: A Retrospective Analysis Based on the MIMIC-IV Database

Acute heart failure (AHF) remains a critical global health issue because of its high mortality and hospitalization rates, especially among patients admitted to intensive care units (ICUs). Researchers from several hospitals in China applied machine learning (ML) methods to improve the prediction of in-hospital mortality in AHF patients. Their work, published in the Journal of Cardiothoracic and Vascular Anesthesia, reports that the XGBoost algorithm outperformed conventional clinical scoring systems.

Study Design and Objectives

The retrospective study utilized the Medical Information Mart for Intensive Care IV (MIMIC-IV) database, encompassing ICU admissions at Beth Israel Deaconess Medical Center from 2008 to 2019. A total of 5,114 adult AHF patients were included, with in-hospital mortality observed in 12.5% of cases. Patients were split into a training set (70%) and a validation set (30%).
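The 70/30 split can be illustrated with a short sketch. The summary does not say whether the split was stratified by outcome, so the stratification here is an assumption, and the labels are synthetic (1,000 patients with a 12.5% death rate), not study data. The function name `stratified_split` is hypothetical.

```python
import random

def stratified_split(labels, train_pct=70, seed=0):
    """Split row indices into train/validation sets, preserving the outcome rate."""
    rng = random.Random(seed)
    train, valid = [], []
    for outcome in sorted(set(labels)):
        # Shuffle the rows of each outcome class separately.
        idx = [i for i, y in enumerate(labels) if y == outcome]
        rng.shuffle(idx)
        cut = len(idx) * train_pct // 100  # integer arithmetic avoids float rounding
        train.extend(idx[:cut])
        valid.extend(idx[cut:])
    return sorted(train), sorted(valid)

# Synthetic labels (assumption, not study data): 1 = died in hospital, 0 = survived.
labels = [1] * 125 + [0] * 875
train_idx, valid_idx = stratified_split(labels)
train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
valid_rate = sum(labels[i] for i in valid_idx) / len(valid_idx)
print(len(train_idx), len(valid_idx), round(train_rate, 3), round(valid_rate, 3))
```

Splitting within each outcome class keeps the mortality rate nearly identical in both sets, which matters when the event is relatively rare.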

The main objective was to develop predictive models using multiple machine learning algorithms and identify the one with the highest performance. The study also compared the predictive accuracy of ML models against traditional ICU severity scores such as SOFA, LODS, and SAPS II.

Methodology and Data Processing

Data extraction covered the first 24 hours after ICU admission and yielded 143 clinical and laboratory variables. Variables with more than 30% missing data were excluded; missing values in the remaining variables were filled in using multiple imputation in R. After this step, 54 features were retained for analysis.
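The missing-data rule can be sketched as follows. This covers only the 30% exclusion filter; the multiple imputation step, which the authors ran in R, is not reproduced. The column names and values are invented for illustration, and `drop_sparse_columns` is a hypothetical helper.

```python
def missing_fraction(values):
    """Share of entries recorded as None."""
    return sum(v is None for v in values) / len(values)

def drop_sparse_columns(columns, max_missing=0.30):
    """Keep only columns whose missing share does not exceed max_missing."""
    return {name: vals for name, vals in columns.items()
            if missing_fraction(vals) <= max_missing}

# Toy table (invented values): 10 patients, three first-day variables.
columns = {
    "age": [70, 65, 80, 59, 77, 81, 68, 90, 72, 66],
    # 3 of 10 missing: exactly 30%, so the column is kept.
    "bun_min": [22, None, 35, 18, None, 40, 27, None, 31, 25],
    # 4 of 10 missing: above 30%, so the column is dropped.
    "lactate_max": [None, 2.1, None, 1.4, None, 3.0, None, 1.8, 2.2, 1.1],
}
kept = drop_sparse_columns(columns)
print(sorted(kept))
```

A column sitting exactly at the threshold is retained here, because the stated rule excludes only variables with more than 30% missing data.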

To reduce overfitting and select the most relevant predictors, Least Absolute Shrinkage and Selection Operator (LASSO) regression was used. This technique identified 18 predictive features, including age, minimum white blood cell count (WBC_min), anion gap, blood urea nitrogen (BUN), international normalized ratio (INR), urine output, vital signs, and the use of vasopressors such as norepinephrine.
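For readers unfamiliar with LASSO, the general form of the objective is given here. It assumes the penalty was applied to a logistic model for the binary outcome, which this summary does not state explicitly. The symbols are introduced for illustration: y_i is the in-hospital death indicator, x_i the vector of p candidate features for patient i, and λ the penalty strength.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}
  \left\{ -\frac{1}{n}\sum_{i=1}^{n}
      \Big[\, y_i\,\eta_i - \log\!\big(1 + e^{\eta_i}\big) \Big]
    + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right\},
\qquad \eta_i = \beta_0 + x_i^{\top}\beta
```

The absolute-value penalty shrinks some coefficients exactly to zero as λ grows, which is why LASSO doubles as a feature selector: under this reading, the 18 retained predictors are the ones whose coefficients stay nonzero at the chosen λ.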

Model Development and Performance Evaluation

Five widely used ML algorithms were tested: Logistic Regression (LR), K-nearest neighbor (KNN), Decision Tree (DT), Random Forest (RF), and eXtreme Gradient Boosting (XGBoost). Their performance was assessed using metrics such as accuracy, sensitivity, specificity, F1-score, and the area under the receiver operating characteristic curve (AUC).
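The listed metrics can all be derived from predicted probabilities and true outcomes. The sketch below uses only the standard library and an invented eight-patient example. The 0.5 classification threshold is an assumption, and the AUC is computed as the fraction of (death, survivor) pairs in which the death received the higher score, with ties counted as half.

```python
def evaluate(y_true, y_score, threshold=0.5):
    """Return accuracy, sensitivity, specificity, F1, and AUC for binary outcomes."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # AUC via pairwise comparison of positive and negative scores.
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)

    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "auc": wins / (len(pos) * len(neg)),
    }

# Invented example: 3 deaths (1) and 5 survivors (0) with model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_score = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
metrics = evaluate(y_true, y_score)
print(metrics)
```

Unlike the other four metrics, the AUC does not depend on the chosen threshold, which is one reason it is the headline figure when comparing models against severity scores.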

Among these, the XGBoost model performed best, with an AUC of 0.82, and was chosen as the final predictive model. It surpassed the traditional ICU scoring systems, which had AUCs of 0.70 (SOFA), 0.75 (SAPS II), and 0.78 (LODS). Decision curve analysis also showed a higher clinical net benefit for XGBoost.
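Decision curve analysis compares models by net benefit across threshold probabilities. The sketch below uses the usual definition, net benefit = TP/n - (FP/n) × pt/(1 - pt), where pt is the risk threshold at which a clinician would act. The summary does not give the authors' exact procedure, so this is an illustration on invented data rather than a reproduction of their analysis.

```python
def net_benefit(y_true, y_score, pt):
    """Net benefit of acting on patients whose predicted risk is at least pt."""
    n = len(y_true)
    tp = sum(1 for t, s in zip(y_true, y_score) if t == 1 and s >= pt)
    fp = sum(1 for t, s in zip(y_true, y_score) if t == 0 and s >= pt)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * pt / (1 - pt)

def net_benefit_treat_all(y_true, pt):
    """Reference strategy: treat every patient as high risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * pt / (1 - pt)

# Invented example: 3 deaths (1) and 5 survivors (0) with model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_score = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]

for pt in (0.2, 0.5):
    model_nb = net_benefit(y_true, y_score, pt)
    all_nb = net_benefit_treat_all(y_true, pt)
    print(pt, round(model_nb, 5), round(all_nb, 5))
```

A model is clinically useful at a given threshold when its net benefit exceeds both the treat-all reference and zero, the net benefit of treating no one.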

Key Predictors of Mortality

The top ten predictors identified by the XGBoost model were:

  1. Norepinephrine use
  2. Urine output
  3. Age
  4. White blood cell count (WBC_min)
  5. Oxygen saturation (SpO2_min)
  6. Respiratory rate (RR_mean)
  7. Blood urea nitrogen (BUN_min)
  8. Systolic blood pressure (SBP_mean)
  9. Partial thromboplastin time (PTT_min)
  10. Heart rate (HR_mean)

These indicators reflect a mix of systemic inflammation, organ perfusion, and hemodynamic instability, all of which are known to influence AHF outcomes.

Discussion and Clinical Implications

This study reinforces the utility of machine learning in high-stakes clinical decision-making. XGBoost, with its capability to handle nonlinear relationships and missing data, provides a robust alternative to traditional statistical models. It is particularly well-suited for ICU environments where rapid and precise risk stratification is essential.

Moreover, the findings emphasize the prognostic importance of age, inflammation markers, and vasopressor usage. For instance, norepinephrine emerged as the top predictor, likely reflecting severe cardiovascular compromise in patients needing pharmacologic support.

Limitations and Future Directions

Despite its strengths, the study acknowledges several limitations:

  • The retrospective nature introduces potential selection bias.
  • The single-center MIMIC-IV database limits external generalizability.
  • Multiple imputation may not fully reflect true values.
  • As with many ML models, interpretability remains limited (“black box”).

Future studies should validate the model across diverse populations and clinical settings, and work on enhancing model explainability for real-world implementation.

Conclusion

In conclusion, this study indicates that machine learning, and the XGBoost algorithm in particular, can improve in-hospital mortality prediction in ICU patients with AHF beyond conventional severity scores (AUC 0.82 versus 0.70 to 0.78). Such a model could help clinicians intervene earlier and allocate resources more effectively, with the aim of reducing mortality in a highly vulnerable population.
