Abstract
Objective To assess the variation in bleeding risk estimates and risk stratification among Web and mobile applications for patients with atrial fibrillation.
Design Cross-sectional study.
Setting Simulated patient population.
Participants Hypothetical patient cohorts that encompassed all possible binary risk factor combinations for each clinical prediction model.
Interventions Twenty-five bleeding risk calculators (18 Web and 7 mobile apps), each of which used 1 of 4 clinical prediction models to predict an individual’s 12-month bleed risk: ATRIA (Anticoagulation and Risk Factors in Atrial Fibrillation), HAS-BLED (hypertension [systolic blood pressure >160 mm Hg], abnormal renal or liver function, stroke [caused by bleeding], bleeding, labile international normalized ratio, elderly [age >65 years], drugs [acetylsalicylic acid or nonsteroidal anti-inflammatory drugs] or alcohol [≥8 drinks per week]), HEMORR2HAGES (hepatic or renal disease, ethanol abuse, malignancy, older [age >75 years], reduced platelet count or function, rebleeding risk [history of past bleeding], hypertension [uncontrolled], anemia, genetic factors, excessive fall risk, and stroke), and mOBRI (modified Outpatient Bleeding Risk Index).
Main outcome measures Four simulated cohorts were constructed. The coefficient of variation, relative difference (RD), and 95% CI for annual bleeding risk estimates were calculated for all hypothetical patient cohorts. Additionally, pairwise agreement between calculators across low- (<10%), moderate- (10% to 20%), and high-risk (>20%) categories of patients was determined.
Results The risk estimates the calculators generated were imprecise, with coefficients of variation ranging from 14% for HEMORR2HAGES to 64% for mOBRI. Wide variation was observed in annual risk estimates for calculators using the mOBRI (maximum RD=4.3) and HAS-BLED (maximum RD=3.1) models. The 95% CI of mean annual bleeding risk varied among models; 1 calculator using the HAS-BLED model had a 95% CI of mean annual risk estimates of 5.4% to 6.2%, while another HAS-BLED calculator reported a 95% CI of 17.7% to 18.5%. Concordance for risk category stratification among calculators was high for those based on mOBRI and ATRIA (κ=1 for both). Poor agreement was observed in 1 calculator using HEMORR2HAGES (κ=0.54) and another using HAS-BLED (κ range=-0.11 to 0.35).
Conclusion Inconsistencies and a lack of precision were observed in annual risk estimates and risk stratification produced by Web and mobile bleeding risk calculators for patients with atrial fibrillation. Clinicians should refer to annual bleeding risks observed in major randomized controlled trials to inform risk estimates communicated to patients.
Close to one-third of Canadian adults (32%) have one or more applications downloaded on their mobile devices that they use to monitor their health, according to a 2017 survey.1 The widespread use of Web and mobile health care applications, along with the greater integration of artificial intelligence and machine learning in software with a medical purpose, prompted Health Canada and the US Food and Drug Administration (FDA) to develop guidance documents outlining the regulation of software as a medical device.2–4 Health Canada and the FDA recognize that software, including Web and mobile health applications, should be regulated given the potential risk of harm if clinicians and patients use such software for diagnosing and treating a disease or for driving and informing therapeutic decisions. One frequently used type of Web and mobile health care application is the risk stratification or risk assessment calculator, which provides users (clinicians or patients) with estimates of their future risk of a health care event, such as a major cardiovascular event, a fragility fracture, thrombosis, or bleeding.5–8 These risk calculators are based on mathematical models known as clinical prediction models, and they are intended to guide or inform therapeutic decision making.
Clinical prediction models that estimate the risk of major bleeding in patients with atrial fibrillation (AF) have been integrated into Web and mobile health applications and are easily accessible to anyone with an Internet connection. These risk prediction models perform poorly outside of their derivation cohorts; however, ease of use has contributed to their ongoing implementation in clinical practice.9,10
Clinical practice guidelines vary in their endorsement of bleeding risk scores. The Canadian Cardiovascular Society’s 2018 guideline update for managing AF recommends using the HAS-BLED (hypertension [systolic blood pressure >160 mm Hg], abnormal renal or liver function, stroke [caused by bleeding], bleeding, labile international normalized ratio, elderly [age >65 years], drugs [acetylsalicylic acid or nonsteroidal anti-inflammatory drugs] or alcohol [≥8 drinks per week]) model to assess bleeding risk when considering warfarin initiation for patients with AF.11 However, the European Society of Cardiology’s 2016 AF guideline discusses the use of HAS-BLED, ORBIT (Outcomes Registry for Better Informed Treatment of Atrial Fibrillation), or ABC (age, biomarkers, clinical history) bleeding risk scores to identify patients with modifiable risk factors, as opposed to withholding the initiation of anticoagulant treatment.12
Manually calculating bleeding risk scores for individual patients can be time consuming and error prone, so there is a practical argument to be made for the utility of these bleeding risk calculators for practitioners in busy clinical settings. Furthermore, clinicians may use these tools to support shared decision making. For example, Kunneman et al have published a protocol for a randomized controlled trial proposing to use the HAS-BLED model to predict bleeding risk in patients with AF; their scores would then be used as part of a conversation tool to promote anticoagulation therapy adherence.13 Given the increasing prevalence and popularity of Web and mobile health applications with embedded clinical prediction models, it is important to assess performance characteristics such as agreement among risk estimates produced by software applications. Therefore, we sought to assess the variation in annual bleeding risk estimates among currently available Web and mobile bleeding risk calculators used for patients with AF.
METHODS
We identified clinical prediction models for bleeding risk in patients with AF based on models reported in clinical practice guidelines,14,15 review articles,8,9 and author knowledge. An initial PubMed search was conducted to identify systematic reviews discussing bleeding risk prediction models used for patients with AF using the following search strategy: (“hemorrhage”[MeSH terms] or “hemorrhage”[all fields] or “bleeding”[all fields]) and (“risk”[MeSH terms] or “risk” [all fields]) and (“patients”[MeSH terms] or “patients”[all fields]) and (“anticoagulants”[pharmacological action] or “anticoagulants”[MeSH terms] or “anticoagulants”[all fields] and review[ptyp]). The American Heart Association and European Society of Cardiology AF guidelines were also searched for additional bleeding risk prediction model recommendations.12,15 We then identified Web and mobile bleeding risk calculators that were based on the clinical prediction models. Web calculators were identified via Google searches, while the App Store and Google Play store were searched for mobile calculators. For inclusion there had to be at least 2 unique calculators accessible for each clinical prediction model, as we were interested in comparing consistency among risk calculators using the same clinical prediction algorithm.
We generated multiple hypothetical patient cohorts that encompassed all possible binary risk factor combinations for each clinical prediction model (Table 1). Each model calculated annual bleeding risk estimates that were dependent on the presence or absence of risk factors specified for that model (ie, 0 or 1). To produce all possible combinations of n binary risk factors for a given model, we coded a combination-without-repetition program in Python 2.7 with n defined as the number of risk factors in the model (details are available from the corresponding author upon request). Risk scores were then calculated for all simulated patients by inputting generated risk factor data into each Web or mobile calculator between September 14 and 17, 2018. A random 10% sample of the calculations was verified independently (κ=1; κ scores test interrater reliability and range from -1 to 1, with 1 indicating perfect agreement).
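The authors' Python 2.7 program is not reproduced in the article, so the following is only a minimal sketch of how such a cohort could be enumerated. It is written in Python 3 and uses `itertools.product`, which is an assumption and not necessarily the authors' approach. The function name `simulate_cohort` and the factor names are hypothetical placeholders.

```python
from itertools import product


def simulate_cohort(risk_factors):
    """Return one dict per hypothetical patient, covering every 0/1
    combination of the given binary risk factors (2**n patients)."""
    return [
        dict(zip(risk_factors, combo))
        for combo in product((0, 1), repeat=len(risk_factors))
    ]


# Placeholder factor names for illustration only; they are not the
# risk factors of any specific model in the study.
factors = ["factor_a", "factor_b", "factor_c"]
cohort = simulate_cohort(factors)

print(len(cohort))  # 8 patients for 3 binary factors
print(cohort[0])    # {'factor_a': 0, 'factor_b': 0, 'factor_c': 0}
```

With n binary risk factors this yields 2^n simulated patients, so cohort size grows quickly as the number of factors in a model increases.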
Table 1. Binary bleeding risk factors and simulated cohort sizes used for each included risk prediction model
We extracted the annual bleeding risk estimate and category (low [<10%], moderate [10% to 20%], or high risk [>20%]) for each patient and calculator. Risk categories for each model were defined as follows:
mOBRI (modified Outpatient Bleeding Risk Index; 4-point scale) low risk=0 points, moderate risk=1 to 2 points, high risk=3 to 4 points;
HEMORR2HAGES (hepatic or renal disease, ethanol abuse, malignancy, older [age >75 years], reduced platelet count or function, rebleeding risk [history of past bleeding], hypertension [uncontrolled], anemia, genetic factors, excessive fall risk, and stroke; 12-point scale) low risk=0 to 1 point, moderate risk=2 to 6 points, high risk=7 or more points;
HAS-BLED (9-point scale) low risk=0 to 1 point, moderate risk=2 points, high risk=3 or more points; and
ATRIA (Anticoagulation and Risk Factors in Atrial Fibrillation; 10-point scale) low risk=0 to 3 points, moderate risk=4 points, high risk=more than 4 points.
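The score-to-category rules listed above can be expressed as a small lookup, sketched below. The names `CUTOFFS` and `risk_category` are hypothetical, the cut-offs are transcribed from the list, and an ATRIA score of exactly 4 is treated as moderate risk. This is an illustration and not the code of any calculator examined in the study.

```python
# Upper bounds (inclusive) of the low- and moderate-risk bands per model,
# transcribed from the category definitions in the text.
CUTOFFS = {
    "mOBRI": (0, 2),         # low 0; moderate 1-2; high 3-4
    "HEMORR2HAGES": (1, 6),  # low 0-1; moderate 2-6; high 7 or more
    "HAS-BLED": (1, 2),      # low 0-1; moderate 2; high 3 or more
    "ATRIA": (3, 4),         # moderate = 4 (assumed); high = more than 4
}


def risk_category(model, score):
    """Map an integer risk score to 'low', 'moderate', or 'high'."""
    if score < 0:
        raise ValueError("score must be non-negative")
    low_max, moderate_max = CUTOFFS[model]
    if score <= low_max:
        return "low"
    if score <= moderate_max:
        return "moderate"
    return "high"


print(risk_category("HAS-BLED", 2))  # moderate
print(risk_category("ATRIA", 5))     # high
```

Keeping the thresholds in one table makes boundary cases (for example, a HAS-BLED score of exactly 3) easy to check against the published definitions.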
For calculators reporting annual bleeding risk, we calculated the coefficient of variation (CV) of annual risk estimates, the 95% CI of mean annual risk estimates, and the relative difference (RD) between the highest and lowest percentage risk estimates among all patients and within risk categories.
The CV of annual bleeding risk estimates was calculated by dividing the standard deviation by the mean bleeding risk estimates for each of the included risk calculators and is presented as a percentage. The CV provides a metric of the spread of the data for each calculator. Calculators with a larger CV will have more variation in their risk estimates despite being based on a standard hypothetical patient population within the same clinical prediction models. We calculated RD by dividing the mean annual bleeding risk estimate for each calculator (eg, calculator A, B, C) in a given clinical prediction model (eg, HAS-BLED) by the lowest mean annual bleeding risk estimate among all calculators using that respective model. For instance, if calculator A for a clinical prediction model reported an annual bleeding risk estimate of 10%, while the lowest annual bleeding risk estimate for all calculators using the same clinical prediction model was 5%, the RD between these calculators would be 2. Pairwise agreement between calculators across low-, moderate-, and high-risk categories was calculated. Analysis was conducted using SAS, version 9.4.
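As a worked illustration of these two definitions, the sketch below computes the CV and RD in Python (the study itself used SAS, version 9.4). The function names are hypothetical, the input numbers are made up for illustration, and the use of the sample standard deviation is an assumption because the article does not state which form was used.

```python
from statistics import mean, stdev


def coefficient_of_variation(estimates):
    """CV as a percentage: standard deviation divided by the mean.
    Assumes the sample standard deviation (n - 1 denominator)."""
    return stdev(estimates) / mean(estimates) * 100


def relative_differences(mean_by_calculator):
    """RD for each calculator: its mean annual risk estimate divided by
    the lowest mean estimate among calculators using the same model."""
    lowest = min(mean_by_calculator.values())
    return {name: m / lowest for name, m in mean_by_calculator.items()}


# Illustrative values only; these are not data from the study.
cv = coefficient_of_variation([2.0, 4.0, 6.0])
rd = relative_differences({"A": 10.0, "B": 5.0})

print(round(cv, 1))  # 50.0
print(rd["A"])       # 2.0, matching the 10% versus 5% example in the text
```

By construction, the calculator with the lowest mean estimate for a model has an RD of 1, and every other calculator's RD shows how many times higher its mean estimate is.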
RESULTS
We identified 4 clinical prediction rules (mOBRI, HAS-BLED, HEMORR2HAGES, and ATRIA) that estimated bleeding risk in patients with AF and 25 calculators based on such rules (Table 2). Although other clinical prediction rules for bleeding were identified, there were either fewer than 2 calculators (eg, ORBIT, ABC) that used them or they were designed for patients without AF (eg, QBleed, CRUSADE [Can Rapid Risk Stratification of Unstable Angina Patients Suppress Adverse Outcomes with Early Implementation of the ACC/AHA Guidelines], IMPROVE [International Medical Prevention Registry on Venous Thromboembolism]). The most commonly used clinical prediction model in the Web and mobile applications assessed was HAS-BLED (n=8), followed by mOBRI (n=7), HEMORR2HAGES (n=5), and ATRIA (n=5).
Table 2. Characteristics of included bleeding risk calculators
The spread or variation of annual risk estimates for bleeding was highest for mOBRI (CV=64%), followed by ATRIA (CV=59%), HAS-BLED (CV=50%), and HEMORR2HAGES (CV=14%). Calculators based on HEMORR2HAGES and ATRIA had consistent estimates of annual bleeding risk (Table 3). The 95% CIs of mean annual bleeding risk varied among risk prediction models (Table 3). We observed wide variation in annual risk estimates for calculators using mOBRI (maximum RD=4.3) and HAS-BLED (maximum RD=3.1) models. Concordance for risk category stratification among calculators was high for those based on mOBRI and ATRIA (κ=1 for both). Poor agreement was observed in 1 calculator using HEMORR2HAGES (κ=0.54) and another using HAS-BLED (κ range=-0.11 to 0.35).
Table 3. Annual bleeding risk estimates and relative differences in annual bleeding risk of included bleeding risk calculators
DISCUSSION
We observed inconsistencies and a lack of precision in annual risk estimates and risk stratification reported by Web and mobile bleeding risk calculators for patients with AF. For instance, risk estimates generated by mOBRI calculators varied by as much as 4 times the lowest value for the same risk score. Additionally, inconsistencies in risk categorization were observed in the HEMORR2HAGES and HAS-BLED calculators. While calculators based on the same prediction rules generated identical scores, the translation from scores to risk estimates was problematic for some calculators. Three of the calculators based on mOBRI appeared to be based on a different validation cohort such that the risk estimates most likely grossly overestimate high risk.
Among the available clinical prediction models used to assess bleeding risk in patients with AF, the HAS-BLED model has been mentioned by both Canadian Cardiovascular Society and American Heart Association–American College of Cardiology guidelines as the bleeding risk assessment tool of choice in a shared decision-making approach with patients.11,15 A systematic review by Borre et al found that the HAS-BLED model has demonstrated evidence of statistically significantly higher predictive ability for bleeding risk in patients with AF versus other models (ie, ATRIA, HEMORR2HAGES); however, this evidence is limited at best.16 Only 2 of the 38 included studies reported a significantly higher predictive ability for HAS-BLED versus other models, while all remaining studies showed no statistically significant differences in bleeding risk prediction.
These observed inconsistencies in annual bleeding risk estimates are concerning. The decision to prescribe an anticoagulant for AF is based on weighing the benefits of stroke risk reduction against major bleeding risk increase. If a Web or mobile risk calculator provides an inaccurate bleeding risk estimate, clinicians and patients may make harmful decisions. The FDA outlines a clinical evaluation process that should be followed throughout the life cycle of a mobile health application. An assessment of application performance via analytical and clinical validation (ie, sensitivity, specificity, and odds ratio calculations) is recommended, along with risk management and maintenance after the application has been released.4 Our findings highlight the importance of the clinical evaluation process, including verifying and validating the performance of Web and mobile software applications.
Several factors may explain the inconsistencies and lack of precision in estimates produced by the bleeding risk calculators investigated in this study. Certain calculators produced different risk estimates from those noted in the original publications for their respective models. It appears the website or mobile app for these calculators had cited alternative validation studies as data sources for the calculators’ risk estimate outputs. Also, inaccurate translation from risk scores to risk estimates occurred because the website coding algorithms for some calculators were incorrect. This resulted in some patients being placed in the wrong risk categories for their calculated scores. This suggests a common programming error or misunderstanding of data definitions by the calculators’ designers. This observation underscores the importance of rigorous programming when building such tools that could affect therapeutic decision making.
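The faulty calculator code is not published in the article, so the sketch below is purely hypothetical. It shows how a single boundary slip in the score-to-category translation could reduce pairwise agreement, measured here with Cohen's κ computed by hand. The function names, the nature of the slip, and the six example scores are all assumptions, and the resulting κ is not a study result.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of category labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance, from each rater's marginal frequencies.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        return 1.0  # convention: both sequences use one identical label
    return (observed - expected) / (1 - expected)


def has_bled_category(score):
    """Categories as defined in the Methods: 0-1 low, 2 moderate, 3+ high."""
    if score <= 1:
        return "low"
    return "moderate" if score == 2 else "high"


def miscoded_category(score):
    """Hypothetical slip: a score of 3 is wrongly labelled moderate."""
    if score <= 1:
        return "low"
    return "moderate" if score <= 3 else "high"


scores = [0, 1, 2, 3, 4, 5]  # illustrative scores only
correct = [has_bled_category(s) for s in scores]
miscoded = [miscoded_category(s) for s in scores]

kappa = cohens_kappa(correct, miscoded)
print(round(kappa, 2))  # 0.75
```

In this toy example only one of six patients is misclassified, yet κ falls from 1 to about 0.75, which illustrates why boundary conditions deserve explicit tests when such tools are built.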
Limitations
Our study has limitations. Data obtained from online risk calculators have the potential to change over time, as a Web or mobile application’s coding algorithms may be updated. Furthermore, artificially generated patients may not be representative of the real-world distribution of risk factors for bleeding. However, all possible combinations of risk factors for each clinical prediction model were included to represent all possible risk scores and diversity in bleeding risk potentially observed in clinical settings. Bleeding risk is highly dynamic and changes over time, so this can be difficult to replicate in hypothetical patient populations.17 Since no real patient data were used in this study, it can be unreliable to compare our calculated bleeding risk estimates with those reported for real patients in the randomized controlled trials for our included clinical prediction models. Furthermore, we used the risk score calculating algorithm provided in the randomized controlled trials (ie, calculator P for each model in Table 3) to calculate bleeding risk in our hypothetical cohort for comparison with bleeding risk estimates produced by Web and mobile calculators.
Conclusion
Considering the popularity and convenience of mobile health applications, these results have important clinical implications for drug therapy decisions. The decision to initiate or defer anticoagulation may change depending on the bleeding risk calculator used because of variation in risk categorization and bleeding risk estimation. Considering the poor predictive accuracy of bleeding risk scores and the inconsistency and imprecision in risk estimation of various Web and mobile risk score calculators, clinicians should instead refer to the annual bleeding risks observed in major randomized controlled trials to inform bleeding risk estimates for individual patients.
Notes
Editor’s key points
▸ Risk prediction models perform poorly outside of their derivation cohorts, but their ease of use has contributed to widespread adoption in clinical practice and inclusion in some guidelines.
▸ Annual bleeding risk estimates produced by Web and mobile bleeding risk calculators varied by 3- to 4-fold among popular risk prediction models for patients with atrial fibrillation. Concordance for risk category stratification among calculators varied from perfect agreement to low agreement among Web and mobile calculators based on the same clinical prediction models.
▸ This study found some risk calculator designers made programming errors and incorrectly interpreted data while building their tools. Clinicians should instead refer to bleeding risk estimates documented in major randomized controlled trials to counsel patients about their individual risks.
Footnotes
Contributors
All authors contributed to the concept and design of the study; data gathering, analysis, and interpretation; and preparing the manuscript for submission.
Competing interests
None declared
This article has been peer reviewed.
Copyright © 2022 the College of Family Physicians of Canada