
The inter-rater reliability and internal consistency of a clinical evaluation exercise

Original Article
Journal of General Internal Medicine

Abstract

Objective: To assess the internal consistency and inter-rater reliability of a clinical evaluation exercise (CEX) format designed to be easy to use yet sufficiently detailed to achieve uniform recording of the observed examination.

Design: A comparison of 128 CEXs conducted for 32 internal medicine interns by full-time faculty. This paper reports alpha coefficients as measures of internal consistency and several measures of inter-rater reliability.

Setting: A university internal medicine program. Observations were conducted at the end of the internship year.

Participants: Thirty-two interns participated; the observers were 12 full-time faculty in the department of medicine. The entire intern group was chosen to optimize the spectrum of abilities represented. Patients were recruited by the chief resident from the inpatient medical service based on their ability and willingness to participate.

Intervention: Each intern was observed twice, with two examiners present during each CEX. The examiners received standardized preparation and used a format developed over five years of previous pilot studies.

Measurements and main results: The format showed excellent internal consistency; alpha coefficients ranged from 0.79 to 0.99. However, multiple methods of determining inter-rater reliability yielded consistently low results: intraclass correlations ranged from 0.23 to 0.50, and generalizability coefficients from a low of 0.00 for the overall rating of the CEX to a high of 0.61 for the physical examination section. Transforming scores to eliminate rater effects and dichotomizing results into pass/fail did not improve reliability.
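For readers less familiar with the statistics reported above, the following is a minimal illustrative sketch in Python with numpy (not the authors' code) of how Cronbach's alpha and a two-way intraclass correlation, ICC(2,1) in the Shrout and Fleiss taxonomy, are computed from a subjects-by-raters score matrix. The rating data shown are hypothetical.

```python
import numpy as np

def cronbach_alpha(X):
    """Cronbach's alpha for an (n subjects x k items) score matrix."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1)        # variance of each item/rater column
    total_var = X.sum(axis=1).var(ddof=1)    # variance of subjects' total scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def icc2_1(X):
    """ICC(2,1): two-way random effects, absolute agreement, single rater,
    for an (n subjects x k raters) score matrix."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    grand = X.mean()
    row_means = X.mean(axis=1)               # per-subject means
    col_means = X.mean(axis=0)               # per-rater means
    # Two-way ANOVA mean squares
    msr = k * ((row_means - grand) ** 2).sum() / (n - 1)   # subjects
    msc = n * ((col_means - grand) ** 2).sum() / (k - 1)   # raters
    sse = ((X - row_means[:, None] - col_means[None, :] + grand) ** 2).sum()
    mse = sse / ((n - 1) * (k - 1))                        # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical scores: 4 interns, each rated by 2 examiners
ratings = np.array([[7, 8],
                    [5, 6],
                    [9, 9],
                    [4, 5]])
print(round(cronbach_alpha(ratings), 3))
print(round(icc2_1(ratings), 3))
```

Note that alpha treats the rater columns as parallel items and ignores systematic rater leniency, whereas ICC(2,1) penalizes it; this is why, as in the study, a format can show high internal consistency while single-observation inter-rater reliability remains low.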

Conclusions: Although the CEX is a valuable didactic tool, its psychometric properties preclude reliable assessment of clinical skills as a one-time observation.




Additional information

Received from the Division of General Internal Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania.

Supported by a grant from the American Board of Internal Medicine.


Cite this article

Kroboth, F.J., Hanusa, B.H., Parker, S. et al. The inter-rater reliability and internal consistency of a clinical evaluation exercise. J Gen Intern Med 7, 174–179 (1992). https://doi.org/10.1007/BF02598008
