Case description
Ms Glow asks you why the radiology practice where she had her mammogram last year recommends annual screening, while the provincial recommendations are to screen every 2 years. She asks which is correct.
The audit program in which you participate measures the proportion of age-eligible patients who have had mammograms within 2 years and shows that your rate, along with the rates of a high proportion of comparable colleagues, has only just reached 50%. Many of your patients were tested a few months “late,” so were regarded as not screened. You wonder whether these are reasonable expectations. After all, the Canadian Task Force on Preventive Health Care (CTFPHC) recommended every 2 to 3 years.1
When someone has a normal screening test result, the question arises whether to repeat the test, and if so, how often.
A few screening tests are performed once only (eg, ultrasound for abdominal aortic aneurysm2), but we repeat most tests at varying intervals. The default frequency is based on the earth’s orbit around the sun. This annual interval is easy for us and our patients to remember, but has only a distant relationship with most biological processes. We must do better, as repeating tests at too-short intervals provides minimal benefit, while possibly causing harms from the screening test itself, follow-up management, false-positive results, or overdiagnosis.3 On the other hand, repeating tests at too-long intervals can miss new disease for which the course could be modified for the better by earlier detection.
Our decisions must find the path between Scylla and Charybdis. This idiom from Greek mythology inspired the main thesis of this article: family physicians must attempt “to choose the lesser of 2 evils.” Some colleagues focus on the sad end stages of disease and tend to urge more frequent screening, even when the evidence does not support this. In primary care, on the other hand, we also see the harms caused by too much screening: unnecessary investigations, overdiagnosis, and unnecessary treatments cause great anxiety and real injury to patients. We must understand and balance the probabilities, severity of harm, and emotional effects of these alternative outcomes.
How should we choose intervals?
Sometimes we have direct empirical evidence for which intervals to choose, whether from trials or from other forms of analysis. More often, guideline groups must make choices based on the characteristics of the tests and the tests’ interaction with the natural history of the disease.4 Family physicians must understand the implications of these choices and that no solution will be perfect. We must ensure that decisions on screening intervals are based on patient benefit, not on physician preferences, convenience, or income.
Characteristics of screening tests
Screening tests are imperfect; each result is subject to biological or analytical variation,5 which should be distinguished from real change in pathology. A change larger than the sum of these variations might represent real progression in pathology; if it does not, we are likely chasing a shadow. Thus, understanding measurement precision is crucial. For example, 3 years ago a patient had a hemoglobin A1c level of 6.2%, and now has a level of 6.5% (ie, above the threshold of 6.4% for classification as having diabetes). This relative change of about 5% is within the measurement error range of 6% to 10%5; therefore, this patient cannot yet be labeled as having advancing disease. Other problems arise from poor clinical technique. Measuring blood pressure without waiting after the patient has hurried into the office, or using a cuff that is too small, produces misleading measurements.
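The comparison in the hemoglobin A1c example can be made explicit with a little arithmetic. This sketch is illustrative only: it assumes the 6% to 10% figure describes error relative to the measured value and compares the observed relative change with it directly, which is simpler than formal reference-change-value calculations. The function names are hypothetical.

```python
def relative_change_pct(previous: float, current: float) -> float:
    """Percentage change between two results, relative to the earlier one."""
    return (current - previous) / previous * 100.0


def exceeds_measurement_error(previous: float, current: float,
                              error_pct: float) -> bool:
    """True only if the change is larger than the stated measurement error."""
    return abs(relative_change_pct(previous, current)) > error_pct


# Hemoglobin A1c example from the text: 6.2% three years ago, 6.5% now.
change = relative_change_pct(6.2, 6.5)
print(f"Relative change: {change:.1f}%")  # prints "Relative change: 4.8%"

# Compare against both ends of the quoted 6% to 10% error range.
for error_pct in (6.0, 10.0):
    print(error_pct, exceeds_measurement_error(6.2, 6.5, error_pct))
```

Both comparisons return False, so under these assumptions the rise from 6.2% to 6.5% cannot be distinguished from measurement variation.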
Understanding test variation applies not only to screening, but also to repeating tests for monitoring disease, although this article will not discuss that issue. Decisions about follow-up evaluation of patients with disease are different from the best interval for screening in those presumed still healthy.
Characteristics of disease and implications for screening intervals
Figure 1 shows a range of possible progressions of cancers that can be intercepted by screening tests performed at regular intervals.6 Type A cancers develop so slowly that they might never grow beyond the microscopic level in the person’s lifetime. Type B cancers grow slowly; they will never cause symptoms and might even regress. When screening finds these cancers, the patients are overdiagnosed and treated, even though treatment is unnecessary. Type C cancers can also be detected by screening and, for some, treatment makes a difference in outcome. Type D cancers grow very rapidly and are difficult to detect by screening. They most often present clinically, sometimes already as disseminated disease, in the interval before the next screen.
Figure 1. Screening detection capability based on tumour biology and growth rates: Growth rates of 4 tumours are displayed from the time the first tumour cells appear while the tumour is not yet detectable (microscopic); when it can be detected as localized (confined to the organ) and most likely to be curable; regional (after the tumour spreads beyond the organ) where it might not be curable; and to the point when metastases and death occur. Tumour A remains undetectable and without morbidity during the patient’s lifetime. Tumour B grows until it becomes detectable but never causes symptoms or leads to death. Tumours A and B represent low-risk indolent tumours or IDLEs (indolent lesions of epithelial origin). Tumour C is destined to become metastatic and fatal but can be detected while still curable. Tumour D is destined to become metastatic but grows so quickly that by the time it can be detected, it might no longer be curable. Among these 4 tumours, only the patient with tumour C benefits from screening.
The effectiveness of cancer screening depends on the distribution of cancers across these groups and the susceptibility of each of these types of cancer to treatment. Some cancers, such as most testicular cancers, grow very rapidly (type D) but are very responsive to treatment. On the other hand, breast cancers that present clinically in the intervals between screening mammograms (type D) are less responsive.7 Thus, more frequent screening might find more of them, but make little difference in outcome. While doctors who see patients with such cancers are dismayed and want to screen more frequently, disease biology will preclude these efforts from having any effect. Thyroid cancers offer a different example. Most papillary and follicular thyroid cancers respond well to treatment, so they do not need to be detected very early. On the other hand, medullary and anaplastic thyroid cancers respond poorly to treatment regardless of the stage at which they are found. Thus, screening is very unlikely to improve thyroid cancer outcomes.
The same principles apply to noncancer diseases found by screening, such as diabetes and hypertension. As most diseases change in incidence with age, it is often appropriate to change intervals accordingly. To help understand this, Figure 2 shows a (hypothetical) condition for which screening is appropriate. The incidence is low in early- and middle-adult life but increases steadily after 50 years of age. (While most such diseases will cause some deaths, for the purpose of this model we will ignore them.) When prevalence rises to a level where screening is more beneficial than harmful, screening can be conducted (in this example, at 60 years of age). If the sensitivity of the test is 0.75, then the test will find most of the cases and reduce the number of undetected cases in the screened population by 75%. The population prevalence in the remainder will drop. Subsequently, new cases will develop at the original incidence rate, so the prevalence of undetected disease (blue line) rises parallel to the original curve and returns to the threshold level where benefits outweigh harms by about 67 years of age. If a second screen is done, a similar change will occur, with prevalence rising to the threshold at around 73 years of age. After that screen, the prevalence will again drop, then rise again more rapidly; therefore, a further and final screen might be warranted at around 77 years of age. In this purely mathematical model, because the incidence increases with age, the interval decreases between each subsequent screen.
Figure 2. Effect of 2 screening regimens on disease prevalence: Prevalence of a disease increases with age (green line). Prevalence is reduced by a screening test with sensitivity of 0.75 at age 60, when benefit is considered optimal compared with harms caused. Thereafter, new incidence of disease in the remaining population occurs at the original expected rate. The model shows the effects of repeat screening tests when prevalence rises to the same level (blue line) or every 2 y from age 60 to age 70 (red line). Note that in prevalence-based screening, intervals might vary. In regular frequent screening, subsequent screens have less benefit, although during this period the prevalence of undetected disease is lower; 6 screens have fewer long-term effects compared with 3 screens at longer intervals.
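The prevalence-threshold logic behind Figure 2 can be sketched as a small simulation. Every number here is hypothetical and chosen only for illustration: incidence rises linearly from age 30, the threshold is set so the first screen falls at about age 60, and the test sensitivity is 0.75 as in the text. The function names are invented for this sketch, which does not reproduce the figure’s exact curve.

```python
def hypothetical_incidence(age: float) -> float:
    """Made-up annual incidence: zero before age 30, then rising linearly."""
    return 1e-4 * max(0.0, age - 30.0)


def prevalence_triggered_screens(threshold: float, sensitivity: float,
                                 start_age: float = 30.0, stop_age: float = 85.0,
                                 dt: float = 0.01) -> list[float]:
    """Ages at which undetected prevalence reaches `threshold`.

    Each screen removes `sensitivity` of the undetected cases; afterwards new
    cases keep accruing at the original incidence (deaths are ignored, as in
    the model described in the text).
    """
    undetected = 0.0
    screens = []
    steps = round((stop_age - start_age) / dt)
    for step in range(steps):
        age = start_age + step * dt
        undetected += hypothetical_incidence(age) * dt  # new cases this step
        if undetected >= threshold:
            screens.append(round(age + dt, 1))          # screen at end of step
            undetected *= 1.0 - sensitivity             # detected cases removed
    return screens


# Threshold chosen so that the first screen falls at about age 60.
ages = prevalence_triggered_screens(threshold=0.045, sensitivity=0.75)
intervals = [later - earlier for earlier, later in zip(ages, ages[1:])]
print(ages)
print([round(i, 1) for i in intervals])
```

Because incidence keeps rising with age, each refill of undetected cases takes less time, so the printed intervals shrink from one screen to the next, which is the pattern the model in the text describes.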
For most diseases, screening programs produce false-positive results, overdiagnosis, and labeling, creating unnecessary investigation and anxiety. The lower zigzag (red) line shows the effect of biennial screening between 60 and 70 years of age. Six screens in that age range require more effort, yield few extra cases, and also would produce more false-positive results. Thus, while frequent repeat testing will further reduce untreated disease, it will increase the proportion of people who experience harms.
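A similar sketch can tally what the extra screens buy. It reuses the same made-up incidence curve and compares biennial screening from 60 to 70 years of age with a hypothetical 5-yearly schedule over the same decade. The specificity of 0.95 is an assumption for illustration; false positives are counted per screen among people without undetected disease, and, for simplicity, people already diagnosed stay in that denominator. All names are hypothetical.

```python
def hypothetical_incidence(age: float) -> float:
    """Same made-up incidence curve: zero before age 30, then rising linearly."""
    return 1e-4 * max(0.0, age - 30.0)


def simulate_regimen(screen_ages, sensitivity=0.75, specificity=0.95,
                     start_age=30.0, end_age=70.0, dt=0.01):
    """Expected per-person detections and false positives for fixed screening ages."""
    undetected = 0.0
    detected = 0.0
    false_positives = 0.0
    pending = sorted(screen_ages)
    steps = round((end_age - start_age) / dt)
    for step in range(steps):
        undetected += hypothetical_incidence(start_age + step * dt) * dt
        age_now = start_age + (step + 1) * dt
        # Run every screen that is due by the end of this time step.
        while pending and age_now >= pending[0] - 1e-9:
            pending.pop(0)
            # People without undetected disease can still test falsely positive.
            false_positives += (1.0 - specificity) * (1.0 - undetected)
            found = sensitivity * undetected
            detected += found
            undetected -= found
    return {"screens": len(screen_ages), "detected": detected,
            "false_positives": false_positives}


biennial = simulate_regimen([60, 62, 64, 66, 68, 70])
five_yearly = simulate_regimen([60, 65, 70])
for name, r in (("biennial", biennial), ("5-yearly", five_yearly)):
    print(f"{name}: {r['screens']} screens, "
          f"{1000 * r['detected']:.1f} cases and "
          f"{1000 * r['false_positives']:.1f} false positives per 1000 people")
```

With these invented inputs, the 3 extra biennial screens add only a few detected cases per 1000 people while roughly doubling the expected number of false-positive results.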
As illustrated in Figure 2, testing at longer intervals will find most treatable cases, with less harm. In general, screening tests should only be repeated at long enough time intervals that the benefits from finding true positive results outweigh the harms caused by overdiagnosis and overinvestigation of people with false-positive results (including those identified as such solely because of random variation in measurement).
After a positive screening result, patients are usually removed from the screening population (eg, after a diagnosis of cancer or diabetes) and thereafter follow a program of surveillance (tertiary prevention). This can be a heavy burden that is justified for those whose outcomes we can improve, but not for overdiagnosed cases.
Consequently, screening will help a few patients; the proportion will depend on the distribution of rates of disease development and their relative susceptibility to treatments. The harms of treatment usually increase with age, while the benefits of treatment might be less or take longer to accrue. Based on each specific condition, it might make sense to vary the screening intervals or plan relatively uniform intervals. Depending on whether disease progression and the benefit of treatment are greater or less at older ages, it might be appropriate to screen more often at older ages or to stop screening.
To justify recommending annual or even biennial intervals requires either a poor-quality screening test that reduces prevalence only by a small fraction equivalent to the incidence during that interval, or a very rapidly developing disease with such high incidence and curability that frequent screening is reasonable. Sometimes it is argued that we can compensate for a poor-quality test with more frequent screening, but it is better to improve the quality of the tests than to use poor tests. Decisions about screening must be based not just on detecting disease, but on the proportion of people with disease who benefit from intervention after informed consent.
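Under a simplifying assumption that incidence is roughly constant over one interval, this argument reduces to one line of arithmetic: a screen done when undetected prevalence reaches a threshold T removes a fraction s (the sensitivity) of it, and an annual incidence I refills that gap in about s·T/I years. Annual screening is therefore only justified when s·T/I is about 1 year, which requires a small s (a poor test) or a large I (rapidly arising disease). The threshold and incidence values below are hypothetical.

```python
def interval_to_threshold(sensitivity: float, threshold: float,
                          annual_incidence: float) -> float:
    """Years for undetected prevalence to climb back to `threshold` after a screen.

    Assumes screening happens exactly at the threshold, removes
    sensitivity x threshold of undetected cases, and that incidence is
    constant over the interval.
    """
    return sensitivity * threshold / annual_incidence


# Hypothetical inputs: threshold prevalence 4.5%, incidence 0.3% per year.
good_test = interval_to_threshold(0.75, 0.045, 0.003)
poor_test = interval_to_threshold(0.10, 0.045, 0.003)
high_incidence = interval_to_threshold(0.75, 0.045, 0.03)
print(good_test, poor_test, high_incidence)
```

A good test with moderate incidence gives an interval of about 11 years; only the poor test or the 10-fold higher incidence brings it down to about 1 year.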
Empirical evidence
Our search for studies that provide evidence for appropriate intervals for adult screening tests encountered challenges owing to the lack of trials on this question. Most of the literature comprises studies designed to determine whether screening of previously unscreened populations should be performed at all. Because trials are costly and require long follow-up before results can be measured, most screening trials performed 1 test or a few repeat tests. Few trials are designed to measure the relative value of different screening intervals; therefore, decisions must usually be made on less direct evidence.
Table 1 summarizes our findings for conditions for which screening adults is worthwhile.8-22 We focused on outcomes that mattered to patients, rather than intermediate outcomes such as laboratory measurements or risk estimates, as using intermediate outcomes entails making assumptions about the benefit of changing the disease trajectory.
Evidence for intervals of repeat screening
Randomized trials of screening decisions can provide direct evidence, although this evidence might be confusing to interpret. For example, there are now several published trials of low-dose computed tomography in heavy smokers to detect early lung cancer.15 The National Lung Screening Trial in the United States used 3 annual screens,16 while the Dutch-Belgian NELSON trial used an initial screen, a second screen a year later, a third 2 years later, and then a screen after another 2.5 years.17 Since these trials were published, different expert groups have recommended different intervals. For example, the CTFPHC recommended annual screening among current or previous heavy smokers aged 55 to 74 years for 3 years,23 while the US Preventive Services Task Force extended this annual screening from ages 55 to 80 years.24 The European expert group suggested that after an initial screen and another screen 1 year later, if there are no nodules, repeating the annual screen twice would provide little additional information.15
Papanicolaou test screening for cervical cancer was initiated without a formal randomized controlled trial, but a case-control study by the International Agency for Research on Cancer combined data from several countries.9 This study showed that effectiveness in reducing squamous cervical cancer incidence dropped 2.7% if the screening interval increased from once a year to every 3 years, but 9.9% if the interval was increased to 5 years (Table 2).9 Consequently, most countries chose 3-year intervals; however, Canada, the United States, Germany, and Australia at the time chose annual intervals for various emotional rather than scientific reasons.25 A comparison of European countries shows little difference in mortality outcomes between countries with screening intervals of 2 to 5 years.26 When HPV (human papillomavirus) testing is instituted, 5-year intervals will be enough to intercept nearly all serious disease10,27 without causing excess harm through false-positive results and subsequent interventions, such as colposcopy and excisional biopsy.
Table 2. Effectiveness of cervical cancer screening in women aged 35 to 64 y by interval since last screening
Mammography trials were not designed to determine the importance of different intervals, but a CTFPHC literature review found that outcomes appear similar for women screened at intervals of up to 33 months and at shorter intervals of 12 or 24 months.11 Consequently, the CTFPHC recommended an interval of 2 to 3 years.1 The US Preventive Services Task Force uses the same evidence to recommend 2-year intervals and specifically notes that there is no evidence of extra benefit from annual screening, but there are increased harms. Radiologists often recommend annual intervals, particularly for women 40 to 49 years of age, but do not provide a cogent evidence analysis for this approach.28 Few evidence-based medicine groups recommend screening for that age group.
Colorectal cancer screening recommendations initially gave annual intervals (or the option of 1 to 2 years), as the first reported trial (which was from Minnesota) found benefit only with annual testing in the first few years of follow-up. However, in the long-term follow-up of the Minnesota trial, no statistically significant difference was reported between annual testing and biennial testing.29 This result was extrapolated to FIT (fecal immunochemical testing), as FIT is more sensitive and more reproducible than fecal occult blood testing. Repeating screening annually also increases the risk of false-positive results. Once a high-quality colonoscopy has been performed as a follow-up to the primary test or for some other reason, if no polyps are found, the chance of interval cancers is low for more than 10 years13 and some evidence suggests it could be even longer (more than 17 years).14 Thus, after normal colonoscopy findings, FIT should not be restarted for at least 10 years.
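The cumulative effect of repeated tests on false positives can be estimated with a standard probability calculation. This sketch assumes a per-round specificity of 0.95, chosen purely for illustration rather than taken from any FIT study, and assumes results are independent across rounds; the function name is hypothetical.

```python
def prob_any_false_positive(specificity: float, n_tests: int) -> float:
    """Chance that a person without cancer gets at least 1 false-positive result.

    Assumes results are independent between rounds, which overstates the
    risk somewhat if the same people tend to test positive repeatedly.
    """
    return 1.0 - specificity ** n_tests


SPECIFICITY = 0.95  # hypothetical per-round specificity, for illustration only
annual = prob_any_false_positive(SPECIFICITY, 10)   # 10 tests over 10 years
biennial = prob_any_false_positive(SPECIFICITY, 5)  # 5 tests over 10 years
print(f"annual: {annual:.1%}, biennial: {biennial:.1%}")
# prints "annual: 40.1%, biennial: 22.6%"
```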
Cardiovascular and metabolic disease. Cardiovascular risk factors are best considered together, as they all lead to the same major outcomes of stroke, myocardial infarction, heart failure, and renal disease. Few trials in people not taking medication have examined how often risk factors should be reassessed. Most studies measure level of risk rather than outcomes that are important to patients.
For blood pressure, a systematic review30 demonstrated that the apparent incidence is greatly affected by the quality of measurement and whether repeat measurements were taken to make the diagnosis. Patients with initial normal blood pressure had a 2% to 9% chance of having high blood pressure after 5 years, whereas those with high-normal blood pressure (130 to 139 mm Hg systolic and 85 to 89 mm Hg diastolic) had a 28% chance of passing the diagnostic threshold of 140/90 mm Hg after 2 years. Overall, the incidence of hypertension in a population after various screening intervals was related to original blood pressure, age, obesity, and African American race.30
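Because the review’s figures use different time horizons (5 years for normal and 2 years for high-normal blood pressure), converting each to an approximate annual risk makes them easier to compare. This sketch assumes a constant risk each year, which is a simplification; the function name is hypothetical.

```python
def annualized_risk(cumulative_risk: float, years: float) -> float:
    """Convert a cumulative risk over `years` to an equivalent constant annual risk."""
    return 1.0 - (1.0 - cumulative_risk) ** (1.0 / years)


# Figures from the review cited above.
normal_low = annualized_risk(0.02, 5)   # 2% over 5 years
normal_high = annualized_risk(0.09, 5)  # 9% over 5 years
high_normal = annualized_risk(0.28, 2)  # 28% over 2 years
print(f"normal BP: {normal_low:.1%} to {normal_high:.1%} per year")
print(f"high-normal BP: {high_normal:.1%} per year")
# prints "normal BP: 0.4% to 1.9% per year"
# prints "high-normal BP: 15.1% per year"
```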
For lipid testing, rescreening within 3 years is likely to be confounded by test measurement variation.31 Throughout middle age, lipid levels tend to creep up, likely because of changes in diet, activity, and weight. After 65 years of age, it might no longer be worth measuring levels, as cholesterol levels seldom change much after that age32 and lipid treatment has not been shown to be helpful for primary prevention in the elderly.33
An analysis of total cardiovascular risk in the Whitehall cohort study of British public servants (mean age 50 years)18 used as its outcome a 10-year risk of cardiovascular events (myocardial infarction, death from coronary artery disease, fatal or nonfatal stroke) greater than 7.5%, as estimated by the ASCVD (atherosclerotic cardiovascular disease) calculator.34 For those at “low risk” (< 2.5%), 10-year progression to intermediate risk (2.5% to < 5.0%) was 2%. By contrast, those with higher risk had a progressively higher probability of progression to higher risk and serious outcomes, so shorter screening intervals were appropriate for them. The authors therefore recommended that screening intervals be chosen based on the previous risk category (Table 1).8-22
Diabetes. The CTFPHC19 recommends screening for diabetes every 3 to 5 years using the FINDRISC (Finnish Diabetes Risk Score) calculator. Hemoglobin A1c level screening is only recommended annually for those at very high risk. The CTFPHC assesses the evidence as low quality; therefore, the recommendation is conditional (weak). Diabetes Canada recommends screening every 3 years for people older than 40 years of age, and more frequently for those at very high risk, as determined by their CANRISK (Canadian Diabetes Risk Assessment Questionnaire) score; this is a grade D (consensus) recommendation.35
Osteoporotic fractures. Prevention of osteoporotic fractures requires assessment of fracture risk, and bone mineral density is only 1 risk factor.36 Regrettably, the best predictors of important osteoporotic fractures are increasing age, falls, and previous fragility fractures. The variance of density measurement is high compared with its slow change over time. For those with normal bone density, the chance of progression to fracture is very low. Consequently, for women aged 65 with normal or mildly low hip T-scores (> -1.5), there is no need to measure again for 15 years. For those at moderate risk (T-scores -1.5 to -1.99), remeasuring after about 5 years, and for those with lower density (T-scores -2 to -2.49), remeasuring after about 1 year, will identify the point at which 10% or more have advanced to osteoporosis, a level where treatment benefits might outweigh harms.20
A recent publication from the Women’s Health Initiative Study cohort shows that information gained from a second test 3 years after the first does not add value to predictions made using the first result.21 The authors argue that 1 bone density test at around 70 years of age is sufficient for screening to prevent fragility fracture.
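The T-score bands above can be written as a simple lookup. This is an illustrative sketch of the intervals quoted in the text (for women around 65 years of age at baseline), not a clinical decision tool; the function name and the handling of band edges are assumptions.

```python
from typing import Optional


def bmd_rescreen_interval(t_score: float) -> Optional[int]:
    """Approximate years until retesting, following the bands quoted in the text.

    Returns None when the T-score is already -2.5 or lower (the osteoporosis
    range), which is the level the rescreening intervals are meant to catch.
    """
    if t_score > -1.5:
        return 15  # normal or mildly low density
    if t_score > -2.0:
        return 5   # T-score -1.5 to -1.99
    if t_score > -2.5:
        return 1   # T-score -2 to -2.49
    return None


for t in (-0.8, -1.7, -2.2, -2.8):
    print(t, bmd_rescreen_interval(t))
```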
Audit
Audit programs must allow for reasonable flexibility in screening intervals, even for preventive maneuvers for which intervals are short. Little harm is likely to result from moderate delay. Consequently, audit programs that measure adherence to recommendations should allow substantial lag time before classifying either patients or physicians as having a delayed screen.37 Focusing on making participating patients even more tightly adherent misses the important group—those who are not being screened at all. Even so, those patients might have their own valid reasons for not participating. In the case of conditional recommendations, audits of practices should also measure whether physicians have followed a process of shared decision making with patients. Audits should measure the proportions of patients who are screened excessively early, who are screened “on time,” who are screened at extended intervals, and who have had a conversation with the physician and decided to forgo testing. Such an approach would reflect the complex reality of practice.
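The audit categories proposed above can be sketched as a classification function. The window of 24 to 36 months mirrors the CTFPHC’s 2- to 3-year mammography interval, but the 6-month grace period, the function name, and the input format (months from the previous screen to the next one, or to the audit date if there has been no repeat screen) are hypothetical choices for illustration.

```python
from collections import Counter
from typing import Optional


def classify_screen(interval_months: Optional[float],
                    declined_after_discussion: bool = False,
                    interval_min: float = 24.0, interval_max: float = 36.0,
                    grace_months: float = 6.0) -> str:
    """Place one patient in an audit category (hypothetical scheme)."""
    if declined_after_discussion:
        return "declined after discussion"
    if interval_months is None:
        return "never screened"
    if interval_months < interval_min - grace_months:
        return "early"
    if interval_months <= interval_max + grace_months:
        return "on time"
    return "extended"


# A small invented panel: (interval in months, declined after discussion?)
patients = [(12, False), (26, False), (30, False), (40, False),
            (55, False), (None, False), (None, True)]
counts = Counter(classify_screen(m, d) for m, d in patients)
for category, n in counts.items():
    print(f"{category}: {n / len(patients):.0%}")
```

With a grace period, the patients in the case who were tested a few months “late” would be counted as on time rather than as unscreened, and those who declined after a discussion are reported separately instead of as failures.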
Case resolution
Ms Glow is right to ask about the discrepancy in mammography recommendations. You explain that the best evidence suggests that less-frequent screening produces similar gains, but with less risk of false positives and harm.
Your “low” screening rate on the audit might be inappropriate, but you cannot tell, as the measurement did not assess women who chose not to screen or women whose interval was up to 3 years, as deemed acceptable by the CTFPHC. You decide to be more cautious about participating in audit programs that provide simplistic answers.
Conclusion
The risk of new disease detected by a screening program increases slowly with time since the previous test. Given the state of knowledge about intervals, we do not provide firm recommendations for many of the topics we list.38 For clarity, recommendations for screening intervals made by guideline organizations usually provide a single recommended interval, but physicians and patients need not worry if the time interval is somewhat longer.
Many patients and physicians have become accustomed to regular retesting for many potential health problems. Annual intervals are often excessive and potentially harmful. Ideally, for each screening activity physicians should involve patients in discussions about appropriate intervals for their individual risk levels and tolerance. When there is an evidence-based range of intervals (eg, 5 to 10 years), we should avoid focusing on the shortest end of the range.
Guideline development groups should provide better guidance regarding screening intervals, not just whether and how to screen or when to start and stop. Our computerized records need to help us to use variable intervals and to adjust them according to patient preferences and choices or previous results.
Acknowledgment
We thank Samiha Tarek Ah Mohsen for assistance with the figures.
Notes
Key points
▸ Screening for disease generally implies testing a population at average risk and identifying and further investigating those at high risk, thus leaving a population at lower risk. In this remaining population, risk rises over time. Repeat screening should occur when the probability of benefit from further screening is greater than the probability of harm.
▸ Evidence on how screening intervals affect outcomes is limited and generally focused on intermediate outcomes, such as biochemical or radiologic measurements, not outcomes important to patients, such as death or disability.
▸ When choosing the right intervals for repeating screening tests, one should aim to obtain the best value and produce the least harm for patients, but there is limited evidence comparing intervals and limited guidance on how to make that judgment.
▸ Guideline writing groups should search for such evidence to make better recommendations and should identify the uncertainty when good evidence is unavailable.
▸ Intervals for most screening activities can be longer than many current recommendations, but might differ as risk increases, usually with age, certain health habits, or family history.
▸ Practice audits must reflect the complexity of practice and use measures that assess errors of commission (overscreening; ie, screening too often, which is the most common error) and errors of omission (underscreening; ie, screening at longer intervals or not screening at all).
Footnotes
Competing interests
All authors have completed the International Committee of Medical Journal Editors’ Unified Competing Interest form (available on request from the corresponding author). Dr Singh reports grants from Merck Canada, personal fees from Pendopharm, and personal fees from Ferring Canada, outside the submitted work. The other authors declare that they have no competing interests.
This article is eligible for Mainpro+ certified Self-Learning credits. To earn credits, go to www.cfp.ca and click on the Mainpro+ link.
The French translation of this article is available at www.cfp.ca in the table of contents of the February 2021 issue, at page e48.
Copyright © the College of Family Physicians of Canada