
The inter-rater reliability and internal consistency of a clinical evaluation exercise

Original Article
Journal of General Internal Medicine

Abstract

Objective: To assess the internal consistency and inter-rater reliability of a clinical evaluation exercise (CEX) format designed to be easy to use yet sufficiently detailed to achieve uniform recording of the observed examination.

Design: A comparison of 128 CEXs conducted for 32 internal medicine interns by full-time faculty. This paper reports alpha coefficients as measures of internal consistency and several measures of inter-rater reliability.

Setting: A university internal medicine program. Observations were conducted at the end of the internship year.

Participants: Thirty-two interns participated; the observers were 12 full-time faculty in the department of medicine. The entire intern group was chosen to optimize the spectrum of abilities represented. Patients were recruited by the chief resident from the inpatient medical service based on their ability and willingness to participate.

Intervention: Each intern was observed twice, with two examiners present during each CEX. The examiners received standardized preparation and used a format developed over five years of previous pilot studies.

Measurements and main results: The format showed excellent internal consistency; alpha coefficients ranged from 0.79 to 0.99. However, multiple methods of determining inter-rater reliability yielded consistently low results: intraclass correlations ranged from 0.23 to 0.50, and generalizability coefficients from a low of 0.00 for the overall rating of the CEX to a high of 0.61 for the physical examination section. Transforming scores to eliminate rater effects and dichotomizing results into pass/fail did not improve reliability.
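For readers less familiar with the statistics reported above, the following is a minimal illustrative sketch in Python with numpy (not the authors' code) of how Cronbach's alpha and a two-way intraclass correlation, ICC(2,1) in the Shrout and Fleiss taxonomy, are computed from a subjects-by-raters score matrix. The rating data shown are hypothetical.

```python
import numpy as np

def cronbach_alpha(X):
    """Cronbach's alpha for an (n subjects x k items) score matrix."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1)        # variance of each item/rater column
    total_var = X.sum(axis=1).var(ddof=1)    # variance of subjects' total scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def icc2_1(X):
    """ICC(2,1): two-way random effects, absolute agreement, single rater,
    for an (n subjects x k raters) score matrix."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    grand = X.mean()
    row_means = X.mean(axis=1)               # per-subject means
    col_means = X.mean(axis=0)               # per-rater means
    # Two-way ANOVA mean squares
    msr = k * ((row_means - grand) ** 2).sum() / (n - 1)   # subjects
    msc = n * ((col_means - grand) ** 2).sum() / (k - 1)   # raters
    sse = ((X - row_means[:, None] - col_means[None, :] + grand) ** 2).sum()
    mse = sse / ((n - 1) * (k - 1))                        # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical scores: 4 interns, each rated by 2 examiners
ratings = np.array([[7, 8],
                    [5, 6],
                    [9, 9],
                    [4, 5]])
print(round(cronbach_alpha(ratings), 3))
print(round(icc2_1(ratings), 3))
```

Note that alpha treats the rater columns as parallel items and ignores systematic rater leniency, whereas ICC(2,1) penalizes it; this is why, as in the study, a format can show high internal consistency while single-observation inter-rater reliability remains low.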

Conclusions: Although the CEX is a valuable didactic tool, its psychometric properties preclude reliable assessment of clinical skills as a one-time observation.




Additional information

Received from the Division of General Internal Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania.

Supported by a grant from the American Board of Internal Medicine.


Cite this article

Kroboth, F.J., Hanusa, B.H., Parker, S. et al. The inter-rater reliability and internal consistency of a clinical evaluation exercise. J Gen Intern Med 7, 174–179 (1992). https://doi.org/10.1007/BF02598008
