Abstract
Objective To critically review and evaluate the psychometric properties and practical considerations of administering generic and diabetes-specific quality-of-life (QoL) tools in the clinical environment and provide recommendations.
Data sources and tool selection A MEDLINE search was carried out from January 1950 to August 2015 using the MeSH terms diabetes, quality of life, and questionnaires. Four generic and 4 diabetes-specific tools were selected based on the frequency of their use and the existence of published evidence of strong psychometric properties in patients with diabetes (either type 1 or 2). The generic tools included the Short Form-36 (SF-36), Short Form-12 (SF-12), Sickness Impact Profile, and EuroQol EQ-5D instruments. Diabetes-specific tools included the Audit of Diabetes-Dependent Quality of Life, Diabetes Quality of Life, Appraisal of Diabetes Scale (ADS), and Diabetes Health Profile instruments.
Synthesis The SF-36 is one of the most widely used general health measures in QoL research and it has proven reliability and validity. However, the SF-12 is a better option for a family practice owing to its shorter length. The SF-12 has been shown to be closely correlated with the SF-36. Of the diabetes-specific measures, the ADS is known to be valid, short, and relatively straightforward in terms of scoring, thereby increasing its usefulness in routine clinical practice. The Audit of Diabetes-Dependent Quality of Life and Diabetes Quality of Life tools have been widely tested and have generally been found to be more valid and reliable than the ADS, but specific issues with feasibility make them unappealing for the clinical setting. The rationale was to find the most rigorously tested instrument within the scientific literature in terms of validity, reliability, and responsiveness. However, this was not done, as judging the quality of a measure is not simply a matter of determining its psychometric properties but rather requires qualitative judgment about the entirety of the evidence.
Conclusion Finding ideal tools and procedures for routine data collection in the clinic setting requires organization and groundwork that will eventually assist both clinicians and researchers by providing reliable information on QoL for patients with diabetes. Further research is necessary to assess the validity and responsiveness of these tools specifically relating to evaluation of QoL for those with diabetes.
Countless tools have been used to assess quality of life (QoL) in patients with diabetes. However, which instruments are the most valid and feasible for evaluating patient outcomes has not been determined. Assessing the value of such tools can help improve the interpretation of results and allow comparisons across studies. When selecting an instrument, we must remember that conclusions about its use apply only to the study population in which its psychometric properties have been tested. Conclusions drawn in a different study population, without properly retesting those properties, are strictly conjecture. Diabetes is a devastating condition that negatively affects a patient’s QoL and results in long-term problems like cardiovascular disease, renal disease, retinopathy, stroke, and ulcers.1 While there has been an increase in the use of outcome measures to evaluate QoL,2,3 there is no consensus regarding the most appropriate tools to use. It is important to identify such tools within the setting of daily clinical practice.
The tools that have been previously used in studies assessing diabetes and QoL vary in terms of validity, reliability, responsiveness, and feasibility. It would be useful to standardize the reporting process in order to allow clinicians to make informed treatment decisions. We will compare the psychometric and practical properties of 4 commonly used generic and 4 diabetes-specific instruments. The purpose of this article is to critically review the psychometric and practical properties of these tools to identify the most appropriate choices and provide recommendations for implementation in a clinical setting. The findings of this review will provide answers that could be used in both patient care and research settings.
DATA SOURCES
A MEDLINE search was carried out from January 1950 to August 2015 using MeSH terms diabetes, quality of life, and questionnaires. Four generic and 4 diabetes-specific tools were selected based on frequent usage and published evidence of strong psychometric properties in patients with diabetes (either type 1 or 2). The generic tools included the Short Form-36 (SF-36), Short Form-12 (SF-12), Sickness Impact Profile (SIP), and EuroQol EQ-5D instruments. Diabetes-specific tools included the Audit of Diabetes-Dependent Quality of Life (ADDQoL), Diabetes Quality of Life (DQoL), Appraisal of Diabetes Scale (ADS), and Diabetes Health Profile (DHP-1) instruments.
Assessment
Instrument suitability is determined by psychometric properties (reliability, validity, and responsiveness) and practical properties (feasibility). Reliability refers to the ability of an instrument to yield consistent and reproducible results. Test-retest analyses evaluate the stability of an instrument when it is repeatedly administered to a patient or group of patients over a period of time without any real change (intraclass correlation coefficient > 0.70). Internal consistency is the extent to which the items comprising a scale measure the same construct, and it is assessed by Cronbach α and item-total correlations. Cronbach α scores greater than 0.70 and item-total correlations greater than 0.20 are generally considered acceptable for a tool. Validity refers to whether an instrument truly measures what it aims to measure. Criterion validity refers to the correlation of a measure with a criterion standard. Construct validity is evidence that the scale is correlated with other measures of a similar construct in the hypothesized direction. Content and construct validity are most relevant when evaluating patient self-report instruments. Responsiveness refers to the ability of an instrument to detect change when change occurs. Floor and ceiling effects describe the ability of an instrument to measure accurately across the full spectrum of a construct; they are generally considered acceptable when fewer than 15% of respondents obtain the lowest or highest possible summary score. Practical properties (feasibility) include the time to complete the instrument, the burden on the patient, the acceptability of the questions, the financial resources needed to implement the tool in practice, personnel training, scoring, data analysis, and clinical relevance.
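The internal consistency and floor and ceiling criteria described above can be illustrated with a short calculation. The following is a minimal sketch in Python, not a procedure taken from any of the reviewed instruments: the function names and the sample responses are hypothetical, the item-total correlation is computed in its "corrected" form (each item against the sum of the remaining items), and the formulas assume complete data with some variation in every item.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    items = list(zip(*rows))                 # one tuple of scores per item
    totals = [sum(r) for r in rows]          # summary score per respondent
    sum_item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def item_total_correlations(rows):
    """Corrected item-total correlation: each item vs the sum of the other items."""
    items = list(zip(*rows))
    totals = [sum(r) for r in rows]
    result = []
    for col in items:
        rest = [t - c for t, c in zip(totals, col)]
        result.append(pearson(col, rest))
    return result


def floor_ceiling(totals, lowest, highest):
    """Proportion of respondents at the lowest and highest possible score."""
    n = len(totals)
    at_floor = sum(t == lowest for t in totals) / n
    at_ceiling = sum(t == highest for t in totals) / n
    return at_floor, at_ceiling


# Hypothetical responses: 5 respondents, 3 items scored 1 to 5.
sample = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [1, 2, 1], [3, 3, 3]]
alpha = cronbach_alpha(sample)
print("alpha acceptable (> 0.70):", alpha > 0.70)
print("items above 0.20:", [r > 0.20 for r in item_total_correlations(sample)])
floor, ceiling = floor_ceiling([sum(r) for r in sample], lowest=3, highest=15)
print("floor/ceiling acceptable (< 15%):", floor < 0.15 and ceiling < 0.15)
```

The thresholds in the final lines are the ones quoted in the text; a real evaluation would also need a sample far larger than the 5 illustrative respondents used here.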
SYNTHESIS
The characteristics of the 4 generic and 4 diabetes-specific QoL tools are shown in Table 1, and a summary of the assessment of their properties appears in Table 2. Measures of general health status are designed to assess a range of outcomes but are less sensitive to change in individuals with a specific disease. The SF-36 and SF-12 are 2 validated generic QoL instruments that assess a range of general health status measures.4–7 The SF-36 has 36 items that assess health across 8 domains. There are categorical responses, weighted scoring algorithm transformations (rated on a scale from 0 to 100, with 100 denoting the best health), and physical and mental component summary scores (PCS and MCS) that require scoring software. The most substantial evidence exists for the SF-36 to capture the broader aspects of health for people with diabetes, including internal consistency, content and construct validity, and responsiveness.8–11 No evidence has been reported for reproducibility. The SF-36 has several issues with its feasibility in daily practice, including subject burden and time to completion for the elderly population, extra staff training for its implementation and use, purchase of computer software for scoring, and a lack of components that assess outcomes in patients with diabetes specifically. The SF-12 was constructed as a shorter, validated version of the SF-36 that could be applied in a clinical setting.6,7 The SF-12 uses the same domains as the SF-36 and has similar PCS and MCS scores generated using normative-based scores, with higher scores indicating better health. Direct comparisons of both the PCS and the MCS between the SF-36 and SF-12 have indicated very good correlation and agreement.6,7 Some evidence has demonstrated construct and content validity among patients with diabetes,12,13 but none has been reported for reliability or responsiveness. 
The SF-12 is more attractive than the SF-36 for use in busy family practice clinics because of its reduced number of questions and completion time.
The SIP was developed by Bergner et al14 to evaluate self-assessed health-related behaviour. The SIP has 136 items across 12 domains. Higher scores represent increased impairment (0 is better health and 100 is worse health), and 2 summary scores can be calculated for physical function and psychosocial function. There is some evidence indicating that the SIP is valid in patients with diabetes, but further study is warranted.15,16 The SIP is not feasible for the clinical setting because of its length, subject burden, and time to completion.
The EQ-5D was developed in 1990 by a multidisciplinary European team to measure outcomes related to a specific health condition or treatment.17 The first part consists of 5 dimensions measuring mobility, self-care, usual activities, pain or discomfort, and anxiety or depression. The second part has a 20-cm visual analogue scale with end points labeled “best imaginable health state” and “worst imaginable health state,” anchored at 100 and 0, respectively. The EQ-5D has good evidence of content and construct validity18,19 and a moderate level of responsiveness in patients with diabetes.20 Yet, some ceiling effects have been noted with the use of this tool.20 Of the generic tools reviewed, the EQ-5D has the shortest completion time and the lowest burden on patients and staff.
Diabetes-specific instruments are designed to be more sensitive to changes within this patient group compared with generic tools. The ADDQoL questionnaire is a condition-specific outcome measure suitable for patients with either type 1 or 2 diabetes. It consists of 18 items.21 Each item is scored on a 7-point scale from −3 (much better) to +3 (very much worse). The score for each item is multiplied by that item's importance rating, giving a final weighted score ranging from −9 to +9. The average time taken by patients to complete the questionnaire is less than 10 minutes. Good internal consistency (Cronbach α of 0.92),12 content and construct validity,12 and responsiveness22 have been demonstrated with the ADDQoL.
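The weighting step described for the ADDQoL can be sketched as follows. This is an illustrative Python example only, not the developers' scoring algorithm: the function name is hypothetical, the importance ratings are assumed to run from 0 to 3 (which is what the stated −9 to +9 range implies for items scored −3 to +3), and averaging the weighted items into one score is also an assumption. The users' manual should be consulted for the actual scoring rules.

```python
def addqol_weighted_score(responses):
    """Average of impact x importance over the answered items.

    responses: list of (impact, importance) pairs.
    impact is assumed to lie in -3..+3 and importance in 0..3,
    so every weighted item, and their average, lies in -9..+9.
    """
    if not responses:
        raise ValueError("at least one answered item is required")
    weighted = []
    for impact, importance in responses:
        if not -3 <= impact <= 3:
            raise ValueError(f"impact out of range: {impact}")
        if not 0 <= importance <= 3:
            raise ValueError(f"importance out of range: {importance}")
        weighted.append(impact * importance)
    return sum(weighted) / len(weighted)


# Hypothetical patient with three answered items.
print(addqol_weighted_score([(-3, 3), (-1, 2), (0, 1)]))
```

Multiplying before averaging means that an item the patient rates as unimportant (importance 0) contributes nothing to the final score, whatever its impact rating.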
The DQoL measure consists of 46 items (forming 4 domains) ranked on a 5-point Likert scale.23 Individual domain and DQoL total scores (average of 4 domains) range from 0 (lowest possible QoL) to 100 (highest possible QoL). Evidence of reliability (Cronbach α of 0.47 to 0.87)24 and validity has been reported.24,25 Limited evidence has been published about its responsiveness.26–28 Feasibility in terms of length and respondent burden can be issues in the outpatient setting, as the DQoL on its own takes up to 10 minutes to complete, and that time doubles if it is used alongside a generic instrument like the SF-36.
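The 0 to 100 domain and total scores described for the DQoL can be illustrated with a simple linear rescaling. The sketch below is written in Python under stated assumptions and is not the published scoring procedure: items are assumed to be coded 1 to 5 with 5 as the best response, each domain score is the rescaled mean of its items, and the domain labels and function names are hypothetical placeholders.

```python
from statistics import mean


def to_0_100(raw_mean, low=1, high=5):
    """Linearly rescale a mean item score from the low..high range to 0..100."""
    return (raw_mean - low) / (high - low) * 100


def dqol_scores(domains):
    """Return (domain scores, total score) for a dict of domain -> item responses.

    Assumes every item is coded 1..5 with higher values meaning better QoL.
    The total is the unweighted average of the domain scores.
    """
    domain_scores = {name: to_0_100(mean(items)) for name, items in domains.items()}
    total = mean(list(domain_scores.values()))
    return domain_scores, total


# Hypothetical responses for 4 placeholder domains.
example = {
    "domain_a": [5, 5],
    "domain_b": [1, 1],
    "domain_c": [3, 3],
    "domain_d": [2, 4],
}
scores, total = dqol_scores(example)
print(scores, total)
```

Because the total is an average of domain scores rather than of all 46 items, a short domain carries the same weight as a long one under this scheme.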
The ADS is a standardized diabetes-specific tool developed by Carey and colleagues in 1991 to evaluate a person’s thoughts about coping with diabetes.29 It consists of 7 items that use a 5-point adjectival scale, and scores are calculated by summing the items, with 7 representing the least effect of diabetes and 35 the greatest effect of diabetes. Sufficient reliability (Cronbach α of 0.73 and item-total correlations in the range of 0.28 to 0.59)29 and validity30–32 have been demonstrated. The ADS can be completed in less than 5 minutes, which makes it a strong candidate for use in the outpatient setting.
The DHP-1 was created in 1996 to assess the psychosocial aspects of having diabetes.33 The DHP-1 encompasses 32 items in 3 domains (ie, psychological distress, barriers to activity, and disinhibited eating) and uses a 4-point adjectival scale. Items are summed and transformed into a score ranging from 0 (no dysfunction) to 100. Cronbach α has been assessed in 2 groups for each of the 3 domains (0.85 and 0.86 for psychological distress, 0.82 and 0.85 for barriers to activity, and 0.77 and 0.80 for disinhibited eating),33 and the tool has been shown to have good convergent and discriminant validity34 and responsiveness within the domains of psychological distress and barriers to activity.35 Issues have been reported with the questions, as they might be considered out of date and more useful to measure distress. Some people might consider it lengthy to complete if used with a generic measure like the SF-36.
DISCUSSION
An array of tools, some with unknown psychometric properties, have been used in assessing diabetes-related QoL.4–7,14,17,21,23,29,33 The range of these instruments and the lack of high-quality evidence showing strong psychometric properties confound the generalization of QoL trials. It would clearly be useful to identify appropriate choices that could be standardized and used in research trials.
Despite the lack of validity studies, various authors have reviewed both generic and diabetes-specific tools used in diabetes-related QoL trials.36,37 The SF-36 is one of the most widely used general health measures in QoL research, and it has proven reliability and validity. It is widely available and has been validated in many languages, which would support multinational clinical collaboration. Furthermore, age-matched and sex-matched population normative data are available. Although the SF-36 should be the tool of choice on psychometric grounds, it is relatively long, and we believe the SF-12 is a better option for use in family practice. The SF-12 has been shown to be closely correlated with the SF-36,6,7 and it is short enough for easy completion. There have been issues with the reliability of the SF-12 in smaller sample sizes, but this could be offset by using it together with a diabetes-specific tool. The EQ-5D is another generic tool that is shorter than the SF-12, but ceiling effects have been noted among patients with diabetes20 and it might be considered too general in content. For practical reasons, we see no advantage to using the SIP in any QoL studies among patients with diabetes, as it is fairly lengthy and causes respondent burden.
Of the diabetes-specific measures, the ADS is known to be valid, short, and relatively straightforward in terms of scoring, thereby increasing its usefulness in routine clinical practice. Two diabetes-specific measures (ADDQoL and DQoL) have been widely tested and generally found to be more valid and reliable than the ADS, but specific issues with feasibility make them unappealing for use in the clinical setting. The questions in the ADDQoL are fairly complex and lengthy. Compared with the ADDQoL, the DQoL has some extra questions that were deemed to be more acceptable to patients, but it is still fairly lengthy, and the complexity of certain opening questions could potentially affect choices on the remainder of the items. These 2 measures are frequently cited comparator measures in other reviews. The other tool (DHP-1) had feasibility issues and thereby limited usefulness in clinical practice. Our rationale was to find the most rigorously tested instrument within the scientific literature in terms of validity, reliability, and responsiveness. However, we chose not to do this, as judging the quality of a measure is not simply a matter of determining its psychometric properties but requires qualitative judgment about the entirety of the evidence. Given the complexity of many of the studies, it is unlikely that physicians will use the research findings in an informed way, especially in a fast-paced clinical setting. However, new scales are being developed and further evidence will become available.
Generic and diabetes-specific instruments measure different domains. A generic tool is necessary to evaluate overall health and comorbidities. Additionally, generic tools like the SF-36 and SF-12 have age- and sex-matched data for comparison. Nevertheless, general health status tools are not designed to be sensitive to changes in health for patients with diabetes; a diabetes-specific tool is required to differentiate among patients in the study population when various treatments are being examined.
Based on our review of the literature, we recommend the SF-12 and ADS for evaluation of diabetes-related QoL. Used together, the 2 instruments provide a complete assessment and are feasible for use in a busy family practice clinic. In implementing the SF-12 and ADS for research in the clinical setting, the first step is to obtain permission from the developers to use their instruments (through direct contact with the authors) and obtain the users’ manuals. The investigator needs to be familiar with the psychometric properties, scoring, and guidelines for administration. Next, practical issues must be considered for using the tool in a specific practice. These include some of the questions that were used in our review: the cost of implementation, the method of administration (eg, by patient or staff, by computer or manually), extra staff required for administering the instrument, and the relative sample size of the study population. Ultimately, the limiting factor in any study is the cost. A proper flowchart can address the methodology and sequential steps for any specific problem. The final part is data analysis for trends and statistical significance. This can be a challenge in a private family practice, especially without research funding. One solution is to include a biostatistician in the research team as a co-author.
Limitations
All measurements in the study were planned and selected to protect the integrity of the study results, but there were potential limitations inherent in the design. The study employed a qualitative design rather than a systematic review, owing partly to the complexity of the task and partly to pragmatism (lack of time). We have made several observations regarding methodologic issues: the patient population is often poorly described in terms of comorbidities; response and nonresponse rates are not always clearly reported; many studies had small patient numbers and had to be excluded; we sometimes questioned whether the correct questionnaire was used for a particular study; and various instruments had weak evidence for psychometric properties, so further research needs to be directed in this area.
Conclusion
Finding ideal tools and procedures for routine data collection in the clinic requires organization and groundwork that will eventually assist both clinicians and researchers by providing reliable information on diabetes-related QoL. Assimilation of QoL outcome assessment into routine care provides the best clinical practice guidance. This article provides recommendations on using the SF-12 and ADS for assessing QoL based on a critical review of the literature. The SF-36 and SF-12 are the only tools that require scoring software among all the reviewed scales. Further research is necessary to assess the validity and responsiveness of these tools specifically relating to evaluation of QoL in patients with diabetes.
Notes
EDITOR’S KEY POINTS
Diabetes can be a devastating condition that negatively affects a patient’s quality of life (QoL) and results in long-term problems like cardiovascular disease, renal disease, retinopathy, stroke, and ulcers. While there has been an increase in the use of outcome measures to evaluate QoL in patients with diabetes, there is no consensus regarding the most appropriate tools to use.
The purpose of this article was to critically review the psychometric and practical properties of commonly used generic and diabetes-specific QoL instruments.
The strongest evidence exists for the Short Form-36, but the authors recommend using the Short Form-12 (SF-12). The SF-12 has been shown to have very good correlation and agreement with the Short Form-36, but its shorter length makes it more practical in the busy clinical setting, although it does require scoring software. Concerns exist about its reliability in smaller sample sizes, so the authors recommend using the Appraisal of Diabetes Scale in combination with the SF-12.
POINTS DE REPÈRE DU RÉDACTEUR (Editor's key points)
Diabetes is a disease that can have devastating effects on quality of life (QoL) and that, in the long term, leads to health problems such as cardiovascular disease, renal disease, retinopathy, stroke, and skin ulcers. Although methods for evaluating the QoL of people with diabetes are increasingly used, there is not yet a consensus on the most appropriate tools to use.
The purpose of this article was to critically review the practical and psychometric properties of generic and diabetes-specific QoL measurement tools.
The most convincing evidence exists for the Short Form-36 (SF-36), but the authors recommend using the Short Form-12 (SF-12). Good correlation and agreement between the SF-12 and the SF-36 have been demonstrated; because it is shorter, the SF-12 is more practical in a very busy clinic, although it requires software to calculate the score. As there are doubts about its reliability in small groups, the authors recommend using the SF-12 in combination with the Appraisal of Diabetes Scale (ADS).
Footnotes
This article has been peer reviewed.
Contributors
Both authors contributed to the concept and design of the study; data gathering, analysis, and interpretation; and preparing the manuscript for submission.
Competing interests
None declared
Copyright © the College of Family Physicians of Canada