Matteo Bonvini, Daniele Ramazzotti, Robert Stretch, Leo Celi, and Aaron Kaufman
Abstract: Lack of access to pre-admission data (including laboratory values and vital signs) makes it difficult to estimate illness severity in newly admitted ICU patients. Baseline serum creatinine is an important example: the value is commonly unknown, yet it informs many decisions made by intensivists. This study evaluated methods of imputing baseline creatinine values using demographic variables and laboratory data on hospital and ICU admission. Data on patients admitted to ICUs at Beth Israel Deaconess Medical Center (BIDMC) from 2002 to 2012 were extracted from MIMIC-III. Baseline creatinine values (obtained in the outpatient setting 2 to 365 days prior to admission) were available for patients receiving care at BIDMC clinics. Three methods of imputing missing baseline creatinine values were evaluated: predictive mean matching (PMM), classification and regression trees (CART), and Bayesian normal linear models (NLM). Inputs were age, gender, race, Elixhauser comorbidities, OASIS, Angus criteria, and creatinine on hospital and ICU admission. Patterns of missingness for baseline creatinine values were simulated as missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). Imputation methods were evaluated using two metrics: standardized root mean squared error (RMSE) and detection of increases in serum creatinine of 0.3 mg/dL from baseline. Among 37,292 cases, 7,085 (19%) had known baseline creatinine values. PMM, CART, and NLM performed similarly in terms of RMSE when data were MCAR (0.22 for all) or MAR (0.13, 0.10, and 0.13 respectively), whereas NLM was superior with MNAR data (0.40, 0.40, and 0.31 respectively). The distribution of covariates differed between cases with and without known baseline values, so the MAR assumption was deemed most appropriate. Under the MAR assumption, NLM exhibited the best sensitivity and specificity in detecting increases of 0.3 mg/dL from baseline creatinine (sensitivity 0.72, specificity 0.79).
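The evaluation described above can be illustrated with a minimal Python sketch. It is not the authors' code, and several details are assumptions made only for illustration: "standardized" RMSE is taken to mean RMSE divided by the standard deviation of the true values, an increase is flagged when admission creatinine exceeds baseline by at least 0.3 mg/dL, and the MAR and MNAR mechanisms use a simple logistic rule. The function names (`missing_mask`, `standardized_rmse`, `sensitivity_specificity`) are hypothetical.

```python
import math
import random


def missing_mask(baseline, observed, mechanism, rng, rate=0.5):
    """Return a list that is True where the baseline value should be hidden.

    MCAR: every case is hidden with the same probability `rate`.
    MAR:  the probability rises with an observed covariate
          (for example, creatinine on admission).
    MNAR: the probability rises with the baseline value itself.
    The logistic rule used for MAR and MNAR is an illustrative assumption.
    """
    if mechanism == "MCAR":
        return [rng.random() < rate for _ in baseline]
    if mechanism == "MAR":
        driver = observed
    elif mechanism == "MNAR":
        driver = baseline
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    mean = sum(driver) / len(driver)
    return [rng.random() < 1.0 / (1.0 + math.exp(-(x - mean))) for x in driver]


def standardized_rmse(true_vals, imputed_vals):
    """RMSE divided by the SD of the true values (assumed definition)."""
    n = len(true_vals)
    mse = sum((t - p) ** 2 for t, p in zip(true_vals, imputed_vals)) / n
    mean_t = sum(true_vals) / n
    sd = math.sqrt(sum((t - mean_t) ** 2 for t in true_vals) / n)
    return math.sqrt(mse) / sd


def sensitivity_specificity(true_base, imputed_base, admission, threshold=0.3):
    """Compare increases flagged with imputed baselines against true baselines."""
    tp = fn = tn = fp = 0
    for tb, ib, adm in zip(true_base, imputed_base, admission):
        actual = (adm - tb) >= threshold      # increase relative to the true baseline
        predicted = (adm - ib) >= threshold   # increase relative to the imputed baseline
        if actual and predicted:
            tp += 1
        elif actual:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic values in mg/dL, invented for this example only.
    true_base = [0.8, 1.0, 1.1, 2.0, 0.9, 1.4]
    admission = [0.9, 1.5, 1.2, 2.1, 1.6, 1.5]
    imputed = [0.8, 1.1, 1.0, 1.6, 1.0, 1.4]
    print(missing_mask(true_base, admission, "MAR", rng))
    print(standardized_rmse(true_base, imputed))
    print(sensitivity_specificity(true_base, imputed, admission))
```

In a full evaluation, the mask would be applied to cases with known baseline values, each imputation method would fill in the hidden values, and both metrics would then be computed on the hidden cases only.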