Journal of Advances in Radiology and Medical Imaging
ISSN: 2456-5504
Comparative Assessment of Chest X-ray Interpretations by AI Model and Radiologist Vs Pulmonologist in Predicting the Clinical Status of Covid-19 Pneumonia Patients
Copyright: © 2023 Usama Albastaki. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Background: Pneumonia is the most prevalent acute respiratory disorder, and chest radiography is the gold standard for confirming the clinical condition and monitoring progress. The use of AI in the diagnostic workflow has proved useful. In this study, quantitative assessment by an AI device is compared with qualitative assessment by radiologists.
Methods: This retrospective cohort study involved 100 patients and 535 chest radiographs of admitted COVID-19 pneumonia cases. Patients were selected if they had at least one follow-up chest X-ray. Levels of agreement between the radiologists and the software were defined, and patients were classified into three categories: deteriorating, stable, and improving. The minor criteria included respiratory rate ≥ 30 breaths/min, PaO2/FiO2 ratio ≤ 250, platelet count < 100,000/μL, hypothermia (core temperature < 36°C), and hypotension; the major criteria included septic shock, respiratory failure, and infection. Other possible co-morbid or demographic factors, such as age, gender, body mass index, cigarette smoking status, and history of diabetes, chronic obstructive pulmonary disease (COPD), or cerebrovascular disease, were also analyzed to further subdivide the groups and study their impact on the patients.
Results: Among the patients in this study, 51% had diabetes, 43% had a history of cardiovascular disease, and 44.21% were obese; 17 patients had other respiratory complications at baseline. No obvious trend was noted across subgroups, and software-radiologist agreement was higher, similar to the analysis by visit. Analysis of factors related to death showed only age to be associated with mortality: patients who died were significantly older (59.25 vs. 48.26 years; p = 0.001).
Conclusion: Higher agreement was found between the radiologist and the software, and this agreement was consistent across the subgroups. Only patient age appeared to be associated with high mortality risk. Obesity and diabetes are reported as independent predictors of mortality, with susceptibility to acute respiratory failure and pulmonary embolism.
Keywords: Chest X-Ray; AI; Pneumonia
Pneumonia is the most prevalent acute respiratory tract disorder; it often emerges as mild congestion and, in several cases, leads to irreversible lung damage with significantly high mortality and morbidity rates [1, 2]. The clinical manifestation of pneumonia is similar to other inflammatory infections such as the common cold or flu, and it is usually diagnosed at a much later stage, leading to poor treatment effectiveness and patient outcomes [3, 4]. If not diagnosed in time, persistent infection of the lungs can also increase fluid accumulation in the alveolar air cavities and block the distal bronchial airways, expediting dyspnoea and progressive respiratory failure [5]. Therefore, for early diagnosis and disease management, the chest radiograph is often considered the gold standard to confirm the clinical condition and progress in patients with suspected respiratory disorders. Diagnostic precision can be improved by combining the non-specific radiological findings of opacities with the clinical signs of pulmonary infiltration. Further differentiating the bacterial and viral etiology of the disease can prevent unnecessary exposure to antibiotics in suspected patients with viral pneumonia [6, 7]. However, in low-resource settings, the rising population and the scarcity of qualified radiologists can lead to heavy radiographic workloads, delayed interpretation, and discrepancies related to subjective radiographic conclusions [8].
To tackle these shortcomings and improve the diagnostic accuracy of pneumonia, the adoption of Artificial Intelligence (AI)-based deep learning analysis and its integration into the diagnostic workflow have proven helpful. AI implementation can standardize the reporting workflow, minimize diagnostic errors, and prioritize critically ill patients in emergency settings [9]. In addition, deep learning-based pneumonia classification has shown an accuracy of 98.7% along with high sensitivity (0.99) and specificity (0.98) [10]. The presence of pulmonary opacities, airspace consolidation, white spots, nodules, lung lesions, cavitation, etc. in chest X-rays can also quantitatively indicate pneumonia [11]. In this retrospective cohort study, the accuracy of the AI-based qXR v2.1 algorithm (capable of detecting cardiomegaly, pleural effusion, hilar enlargement, cavities, nodules, calcification, etc. [12]) in monitoring disease progress was determined. Quantitative localization of pulmonary opacities by the AI device was compared with the qualitative interpretations of radiologists. The study also assessed whether the interpretations of the AI and the radiologist agreed with the patient's clinical status.
This retrospective, longitudinal, observational study used a convenience sample of 535 chest X-rays from 100 patients. Case selection included only COVID-19 pneumonia patients with at least one follow-up chest X-ray or serial chest X-rays, together with a complete medical record documenting the patient's progress over time. The chest X-rays and medical records were obtained from the hospital's Picture Archiving and Communication System (PACS) and the electronic health records or case sheets, respectively, between 30 March 2020 and 16 June 2020. The clinical and radiographic assessments were performed by pulmonologists and radiologists, respectively, each with several years of experience. The qXR v2.1 software, which holds CE class II approval, was employed for AI-based radiographic interpretation. The qXR v2.1 software accepts and processes de-identified input data in the DICOM format (.dcm) satisfying its compatibility criteria.
Images were received from the local PACS on a standalone onsite system where the AI was integrated. Each CXR was processed by the AI, and a secondary capture output was displayed alongside the original CXR demonstrating the findings detected by the AI.
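The pre-processing steps are not detailed beyond the requirement that qXR receive de-identified DICOM input. The snippet below is a minimal sketch, assuming a Python environment with pydicom, of how identifying tags might be blanked from a CXR before it is passed to such a system; the tag list and file names are illustrative and do not represent the vendor's actual pipeline.

```python
# Minimal de-identification sketch (illustrative only; not the qXR vendor pipeline).
# Requires pydicom (pip install pydicom).
import pydicom

# Illustrative subset of identifying attributes to blank before AI processing.
IDENTIFYING_TAGS = [
    "PatientName",
    "PatientID",
    "PatientBirthDate",
    "PatientAddress",
    "ReferringPhysicianName",
    "InstitutionName",
]

def deidentify(in_path: str, out_path: str) -> None:
    """Blank identifying attributes of a chest X-ray DICOM and save a copy."""
    ds = pydicom.dcmread(in_path)
    for keyword in IDENTIFYING_TAGS:
        if hasattr(ds, keyword):
            setattr(ds, keyword, "")   # keep the element, blank its value
    ds.remove_private_tags()           # drop vendor-specific private tags
    ds.save_as(out_path)

if __name__ == "__main__":
    deidentify("cxr_original.dcm", "cxr_deidentified.dcm")  # hypothetical file names
```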
The images were conveniently acquired based on the inclusion and exclusion criteria below; CXRs of COVID-19 pneumonia patients were selected as part of this study.
The inclusion criteria used in the study are based on the following specifications.
Exclusion Criteria
CXRs with significant artefacts and poor quality were excluded from this study.
The level of agreement of the radiologists and qXR with the pulmonologist in predicting the clinical progress of patients was assessed in this study. Of note, the clinical data of the patients and the qXR interpretations were not provided to the radiologists at the time of assessment, to avoid incorporation bias. To check for positive and negative agreement, a set of minor and major validation criteria was defined, and the patients were classified into three categories: deteriorating, stable, and improving. The minor criteria included respiratory rate ≥ 30 breaths/min, PaO2/FiO2 ratio ≤ 250, platelet count < 100,000/μL, hypothermia (core temperature < 36°C), and hypotension requiring aggressive fluid resuscitation; the major criteria included septic shock with the need for vasopressors, respiratory failure requiring mechanical ventilation, and infection that is not chemotherapy-induced.
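As a rough illustration of how these criteria translate into a status call, the sketch below encodes the deteriorating/stable assignment from counts of major and minor criteria; the thresholds (≥ 1 major or ≥ 3 minor criteria for deterioration, no criteria for stability) follow the wording above, while the "improving" category, which depends on serial clinical assessment, is left to clinician judgement and not encoded. This is a simplification for clarity, not the study's validated decision rule.

```python
# Illustrative status call from the severity criteria described above; a
# simplification for clarity, not the study's validated decision rule.
from dataclasses import dataclass

@dataclass
class SeverityCriteria:
    major_count: int  # e.g. septic shock needing vasopressors, ventilated respiratory failure
    minor_count: int  # e.g. RR >= 30/min, PaO2/FiO2 <= 250, platelets < 100,000/uL, ...

def clinical_status(c: SeverityCriteria) -> str:
    if c.major_count >= 1 or c.minor_count >= 3:
        return "deteriorating"
    if c.major_count == 0 and c.minor_count == 0:
        return "stable"
    return "clinician judgement required"  # 1-2 minor criteria: not specified in the text

print(clinical_status(SeverityCriteria(major_count=0, minor_count=3)))  # deteriorating
```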
Other possible co-morbid or demographic factors, such as age, gender, body mass index, cigarette smoking status, and history of diabetes, chronic obstructive pulmonary disease (COPD), or cerebrovascular disease, were also analyzed to further subdivide the groups and study their impact on the clinical progress of the patients.
Descriptive statistics (number, percentage, mean, and standard deviation) were used to summarize the patient population: demographics, co-morbid conditions, and clinical course. Overall agreements between modalities are reported, and 95% confidence intervals were constructed using the modified Wilson score method. When exploring whether agreement was better in certain subsets of patients, and the types of disagreement, the dependent variables (qXR, clinical, and radiologist calls) were pooled across all visits.
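As a sketch of the interval construction, the function below computes a standard Wilson score 95% confidence interval for an observed agreement proportion; the exact "modified" variant used in the study is not specified, so this is an approximation of the reported method rather than a reproduction of it.

```python
# Wilson score 95% CI for an agreement proportion (standard form; the study's
# "modified" Wilson method may differ in detail).
from math import sqrt

def wilson_ci(agreements: int, total: int, z: float = 1.96) -> tuple[float, float]:
    p = agreements / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = (z / denom) * sqrt(p * (1 - p) / total + z**2 / (4 * total**2))
    return centre - half_width, centre + half_width

# Example: assuming ~227 agreeing reads out of 490, this approximately reproduces
# the qXR-clinical agreement for males in Table 2 (46.32%, CI 41.95-50.75).
low, high = wilson_ci(227, 490)
print(f"{227 / 490:.2%} ({low:.2%} - {high:.2%})")
```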
The study protocol was reviewed and approved by the Institutional Scientific Research Ethics Committee (DSREC-09/2021_28).
One hundred patients with COVID-19 pneumonia were included in the study. The mean age of the patients was 49.58 (11.84) years and the majority (91%) were male. About half of the patients had a significant co-morbidity (51% had diabetes and 43% had a history of cardiovascular disease). Of the 95 patient records with BMI available, 42 (44.21%) were obese. Seventeen patients had other respiratory complications at baseline.
Patients with at least one follow-up chest X-ray or serial chest X-rays, together with their complete medical record, were conveniently chosen in this study so that the radiological signs could be correlated with the patient's progress over time. This study also confirms the capability of the software analysis system to predict disease status through multiple follow-ups and by evaluating the clinical status of admitted patients.
The mean duration of follow-up after the first chest X-ray was 35.35 (29.25) days. All patients had at least one follow-up CXR, with a minimum follow-up duration of 1 day and a maximum of 159 days; the median and interquartile range (IQR) were 28 days and 24 days, respectively. Twenty-eight patients required intubation and 12 patients died. The numbers of patients with two, three, four, five, six, and seven follow-up CXRs were 97, 96, 82, 66, 55, and 41, respectively. The overall agreement of the radiologist and qXR with the pulmonologist on the patients' clinical status, and the corresponding 95% confidence intervals across visits, are shown in Figure 1.
As shown in Figure 1, in all but one visit the qXR-radiologist agreement was higher than the agreement of either observer with the clinical assessment. Although this difference was not significant, it was expected, and the lack of significance is probably due to the small number of observations at each visit.
Table 2 shows the agreement between the three observers (pulmonologist, radiologist, and qXR) in different subsets of patients. No obvious trend was noted, and qXR-radiologist agreement was higher, similar to the analysis by visit. Of the 286 incorrect qXR calls relative to the clinical assessment, 121 (42.3%) were stable-improving misclassifications; this proportion was slightly lower for the radiologist, at 104 of 331 incorrect calls (31.4%). Analysis of factors related to death showed only age to be associated with mortality: patients who died were significantly older (59.25 vs. 48.26 years; p = 0.001).
The authors discussed the level of agreement between the AI model (qXR), the radiologist, and the pulmonologist in predicting the clinical status of the patients. As the radiologists did not have access to the clinical symptoms of the patients, correlation with the clinical data was not possible.
Hence, using X-rays alone to evaluate clinical status in patients with pneumonia is difficult, as the correlation between radiographic resolution and symptom resolution is not linear.
The qXR software did show precise and consistent agreement with the radiologists, especially in patients with a deteriorating clinical status. In addition to identifying the size and number of opacities and consolidations, other interstitial X-ray markings related to pleural effusion or pneumothorax can also indicate clinical deterioration and further improve the diagnostic accuracy of respiratory syndromes.
The clinical data of the patients were not given to the radiologists during assessment, and this may be the reason for the higher agreement between qXR and the radiologists. Observers using only radiographic information (qXR and the radiologists) had low to moderate agreement with the observer using both radiographic and clinical data, whereas higher agreement was noticed between qXR and the radiologists. In prior studies using qXR [13], the reported accuracy and agreement between the algorithm and a single radiologist were much higher than what was observed in this study. Keeping the radiologist as the reference standard, the overall sensitivity of qXR in detecting pulmonary abnormality was about 0.879 (95% confidence interval: 0.867-0.889). The obtained agreement was also consistent across subgroups such as age and gender.
Among the various demographic factors considered in the study (BMI, diabetes mellitus, cardiovascular disease, obesity, etc.), only patient age appeared to be associated with a high risk of mortality. The low number of events and the small sample size may be the reason such associations were not observed. However, obesity and diabetes appear to be independent predictors of mortality, with worsening clinical outcomes and susceptibility to acute respiratory failure and pulmonary embolism, especially in COVID-19 pneumonia patients [14, 15].
The disagreements of qXR and the radiologist with the pulmonologist can be explained by the following reasons.
A few disagreements, such as misclassifying stable as improving or vice versa, do not lead to any patient risk or complications, so although incorrect, they may be tolerated. If these stable-improving misclassifications (42.3% for qXR and 31.4% for the radiologist) were counted as correct calls, the overall agreement of qXR and the radiologist with the pulmonologist would increase substantially. However, misinterpreting a stable patient as deteriorating, or vice versa, is an unlikely but unacceptable situation; similarly, misreading or ruling out extreme deterioration by issuing an "improving" interpretation is also highly disagreeable.
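To make this reweighting concrete, the sketch below recomputes agreement while tolerating stable-improving swaps; the label sequences are hypothetical and only illustrate the calculation, not the study data.

```python
# Recompute agreement when stable <-> improving swaps are counted as acceptable.
# The example labels are hypothetical and do not reproduce the study data.
ACCEPTABLE_SWAPS = {frozenset({"stable", "improving"})}

def agreement(reference: list[str], reader: list[str], tolerate_swaps: bool = False) -> float:
    correct = 0
    for ref, call in zip(reference, reader):
        if ref == call or (tolerate_swaps and frozenset({ref, call}) in ACCEPTABLE_SWAPS):
            correct += 1
    return correct / len(reference)

pulmonologist = ["deteriorating", "stable", "improving", "stable", "improving"]
qxr           = ["deteriorating", "improving", "stable", "stable", "deteriorating"]

print(agreement(pulmonologist, qxr))                       # strict agreement: 0.4
print(agreement(pulmonologist, qxr, tolerate_swaps=True))  # swaps tolerated: 0.8
```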
The qXR software showed precise and consistent agreement with the radiologists, especially in patients with a deteriorating clinical status. This study also confirmed the capability of the software analysis system to predict disease status through multiple follow-ups and by evaluating the clinical status of admitted patients. In addition to identifying the size and number of opacities and consolidations, other interstitial X-ray markings related to pleural effusion or pneumothorax can also indicate clinical deterioration and further improve the diagnostic accuracy of respiratory syndromes.
We would like to thank Dr. Marwan Abdelrahim Abdelghafar Zidan for his support in providing the statistical framework for this project, and to acknowledge Mr. Tariq Fazal Urahman and Mr. Manoj Madhavan Nair for their technical support.
Figure 1: Overall agreement across visits
Table 1: Criteria used by each observer to classify patient status

| Investigation by | Deteriorating | Stable | Improving |
| --- | --- | --- | --- |
| Pulmonologist (clinical aspect) | One major or ≥ 3 minor criteria used for validation; presence of 1 major criterion or > 3 minor criteria indicates patient deterioration | One major or ≥ 3 minor criteria used for validation; absence of any major or minor criteria indicates patient stability | One major or ≥ 3 minor criteria used for validation |
| Radiologist | Presence of radiological signs of pneumonia (opacity, consolidation, white spots) indicates patient deterioration and disease progress; clinical status is also considered | Absence of radiological signs of pneumonia (opacity, consolidation, white spots) indicates patient stability | Disappearing radiological signs of pneumonia (opacity, consolidation, white spots) indicate patient improvement |
| qXR v2.1 | Opacity and consolidation score: +10% indicates patient deterioration | Opacity and consolidation score: within ±10% indicates patient stability | Opacity and consolidation score: -10% indicates patient improvement |
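The qXR row of Table 1 amounts to a simple threshold rule on the change in the opacity/consolidation score between serial radiographs. The sketch below encodes that rule, assuming the score is expressed as a percentage and that the relevant change is the difference from the previous examination; the vendor's exact definition is not stated in the text.

```python
# Threshold rule for the qXR status call, as summarized in Table 1. Assumes the
# change is measured against the previous chest X-ray; the vendor's exact
# definition of the opacity/consolidation score may differ.
def qxr_status(previous_score: float, current_score: float) -> str:
    change = current_score - previous_score
    if change > 10:
        return "deteriorating"  # score rose by more than 10 percentage points
    if change < -10:
        return "improving"      # score fell by more than 10 percentage points
    return "stable"             # change within +/- 10 percentage points

print(qxr_status(previous_score=35.0, current_score=50.0))  # deteriorating
```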
Table 2: Overall agreement (%, with 95% confidence interval) between observer pairs across patient subgroups

| Characteristics | qXR-clinical | Radiologist-clinical | qXR-radiologist |
| --- | --- | --- | --- |
| Age |  |  |  |
| < 50 years (n = 311) | 47.91 (42.41 - 53.45) | 47.91 (42.41 - 53.45) | 56.91 (51.36 - 62.30) |
| > 50 years (n = 224) | 44.64 (38.28 - 51.19) | 48.21 (41.75 - 54.73) | 60.71 (54.18 - 66.88) |
| Sex |  |  |  |
| Male (n = 490) | 46.32 (41.95 - 50.75) | 48.77 (44.38 - 53.19) | 58.57 (54.15 - 62.84) |
| Female (n = 45) | 48.89 (34.95 - 62.99) | 40.00 (27.02 - 54.54) | 57.78 (43.30 - 71.03) |
| CVD |  |  |  |
| No (n = 292) | 48.29 (42.61 - 54.00) | 48.63 (42.95 - 54.34) | 54.45 (48.71 - 60.06) |
| Yes (n = 243) | 44.44 (38.33 - 50.73) | 47.32 (41.13 - 53.60) | 63.37 (57.15 - 69.18) |
| DM |  |  |  |
| No (n = 256) | 44.14 (38.18 - 50.26) | 49.21 (43.15 - 55.30) | 57.03 (50.90 - 62.94) |
| Yes (n = 279) | 48.74 (42.93 - 54.59) | 46.95 (41.17 - 52.81) | 59.86 (54.00 - 65.43) |
| Obese |  |  |  |
| No (n = 331) | 45.01 (39.74 - 50.40) | 51.05 (45.69 - 56.40) | 61.32 (55.98 - 66.41) |
| Yes (n = 187) | 48.13 (41.08 - 55.25) | 41.71 (34.88 - 48.87) | 51.34 (44.21 - 58.40) |
| Other respiratory disease |  |  |  |
| No (n = 452) | 46.42 (41.71 - 51.20) | 49.28 (44.53 - 54.05) | 59.04 (54.28 - 63.64) |
| Yes (n = 83) | 49.10 (39.93 - 58.30) | 43.63 (34.73 - 52.96) | 56.36 (47.03 - 65.26) |
| Intubation |  |  |  |
| No (n = 350) | 46.57 (41.41 - 51.80) | 48.28 (43.09 - 53.51) | 59.42 (54.20 - 64.44) |
| Yes (n = 185) | 46.49 (39.44 - 53.67) | 47.56 (40.49 - 54.74) | 56.75 (49.55 - 63.68) |
| Death |  |  |  |
| No (n = 452) | 46.90 (42.34 - 51.51) | 47.56 (43.00 - 52.17) | 58.40 (53.81 - 62.86) |
| Yes (n = 83) | 44.58 (34.36 - 55.27) | 50.60 (40.06 - 61.09) | 59.03 (48.28 - 68.98) |