Abstract
Objective To evaluate whether a new examination process for international medical graduates (IMGs) can reliably assign candidates to 1 of 4 competency levels, and to determine whether a global rating scale can accurately stratify examinees into 4 levels of learners: clerks, first-year residents, second-year residents, or practice ready.
Design Validation study evaluating a 12-station objective structured clinical examination.
Setting Ontario.
Participants A total of 846 IMGs, and an additional 63 randomly selected volunteers from 2 groups: third-year clinical clerks (n = 42) and first-year family medicine residents (n = 21).
Main outcome measures The accuracy of the stratification of the examinees into learner levels, the impact of the patient-encounter ratings and postencounter oral questions, and between-group differences in total score.
Results Reliability of the patient-encounter scores, postencounter oral question scores, and total scores was 0.93, 0.88, and 0.76, respectively. Third-year clerks scored the lowest, followed by the IMGs; first-year residents scored highest for all 3 scores. Analysis of variance demonstrated significant between-group differences for all 3 scores (P < .05). Postencounter oral question scores differentiated among all 3 groups.
Conclusion Clinical examination scores were capable of differentiating among the 3 groups. As a group, the IMGs seemed to be less competent than the first-year family medicine residents and more competent than the third-year clerks. The scores generated by the postencounter oral questions were the most effective in differentiating between the 2 training levels and among the 3 groups of test takers.
The Canadian Task Force on Licensure of International Medical Graduates (IMGs) was established in 2002 with a mandate to aid in integrating qualified IMGs into the Canadian work force.1 The task force involved key stakeholders from the medical community—including medical students and residents; federal, provincial, and territorial government representatives; Human Resources and Skills Development Canada; Citizenship and Immigration Canada; and IMGs—in the development of the recommendations. International medical graduates were defined as individuals holding medical degrees from schools not accredited by the Committee on Accreditation of Canadian Medical Schools or the Liaison Committee on Medical Education. A number of challenges were identified for IMGs entering the work force, one of which was that they were often unable to demonstrate their skills owing to work force policies, limited access to assessment or training, and lack of support for understanding the licensure requirements.1
In September 2003 the task force presented 6 recommendations to the federal Advisory Committee on Health Delivery and Human Resources. One recommendation was to increase the capacity to assess and prepare IMGs for licensure. A report from the task force on IMG assessment and licensure identified characteristics of medical licensure policies and practices that were effective, fair, and cost-effective.1 The following characteristics were included:
- having the capacity to reliably differentiate between competent and incompetent;
- being based upon a reasonable standard that is applied evenly to all;
- having sufficient capacity to evaluate all who wish to be assessed; and
- being efficient and cost-effective, and decreasing time delays or unjustified costs so that these are not barriers.
Selection of IMGs into residency or practice has been a matter of interest for some time. While estimates vary, IMGs comprise up to one-third of the Canadian2–5 and American6,7 physician work force. Before moving to Canada, most IMGs had been in independent practice: 42.8% had practised for 1 to 5 years and 45.6% for 6 to 20 years.4
Testing of IMGs’ clinical skills began in the United States in July 1998, in response to concerns that IMGs might be lacking basic clinical skills.3,8
As regulation is provincially mandated in Canada, each province has its own process for assessing and integrating IMGs. British Columbia9 and Alberta10 assess IMGs for entry into residency programs, while Manitoba,11 Nova Scotia,12 and Quebec13 assess IMGs for entry to practice (Table 1). Ontario has a long history of programs designed to assess, educate, and integrate IMGs into medical practice. These include the Ontario Pre-internship Program (PIP) (1986–1999), the Ontario International Medical Graduate Program (1999–2003), the Assessment Program for International Medical Graduates (2003–2004), and International Medical Graduates Ontario (IMGO) (2003–2007).
Starting in 1986, PIP introduced a rigorous screening process that included both a written examination and an objective structured clinical examination (OSCE). This program provided 24 positions at the clerkship level to IMGs. Candidates then applied for residency positions in the same way as Canadian medical graduates (CMGs). This was followed by the Ontario International Medical Graduate Program, which continued the PIP processes while increasing the number of clerkship spots from 24 to 50.
The Assessment Program for International Medical Graduates was the beginning of practice-ready assessment (PRA) positions. This evolved with IMGO into assessments for entry at the second-year-resident level. The Clinical Examination 1 (CE1) was developed by IMGO and launched in 2003. The validation study reported here occurred under IMGO in 2005. During IMGO, the CE1 was administered at multiple sites across Ontario.
The IMGO was an organization formed under the leadership of several partners: the Council of Ontario Faculties of Medicine, the College of Physicians and Surgeons of Ontario, and the Ontario Ministry of Health and Long-Term Care. The IMGO was a provincial program developed in consultation with the Royal College of Physicians and Surgeons of Canada, the College of Family Physicians of Canada, and the Medical Council of Canada (MCC). The IMGO provided access to professional practice in Ontario for IMGs who met the Ontario regulatory requirements.
The IMGO screening process was straightforward. Candidates who met basic eligibility requirements were invited to participate in a comprehensive written examination and a comprehensive OSCE. Basic eligibility requirements included the following: being a graduate of a medical school that was, at the time of the applicant’s graduation, listed in the World Directory of Medical Schools published by the World Health Organization or the Foundation for Advancement of International Medical Education and Research; submission of transcripts and certificates; language proficiency in English or French; successful completion of the MCC Evaluating Examination (MCCEE) and the MCC Qualifying Examination 1 (MCCQE1); submission of a curriculum vitae; and payment of registration fees. Based on an evaluation of the candidates’ academic and professional histories and the results of these assessments (specifically the results of the written examination and the OSCE), candidates were classified into 1 of 4 competency levels: ready for entry to training at the clerkship level (pre-residency); ready for first-year residency; practice ready (ready for practice pending a 6-month in-practice assessment); or not competent or unacceptable to the IMGO program.
The OSCE provided important input to the IMGO selection process. There is an extensive body of literature attesting to the validity of OSCE-type examinations14–16 used at different levels of clinical training in medicine17 and in the assessment of IMGs.18 The use of different ratings in OSCE-type examinations has also been described.19–22 The unique feature of the IMGO comprehensive CE1 was the requirement that its results assign candidates to 4 different levels of competence (unsatisfactory, ready for clerkship, ready for residency, and practice ready). The focus of the examination was on the CanMEDS medical expert and communicator roles. Candidates who were more competent would be assessed as ready for first-year residency or as practice ready; those who were less competent would be classified as unsatisfactory or as being at the clerkship level.
The next iteration of IMG assessments was the launch of the Centre for the Evaluation of Health Professionals Educated Abroad (CEHPEA) in 2007, which continues to the present day. The CEHPEA has a dedicated examination facility for administering the examinations, and the CE1 has been offered at this single site with up to 8 administrations in an application cycle. The CEHPEA administered the CE1 through 2010 and continued with advanced-level assessments for second-year resident or PRA positions in up to 11 specialties. Most IMG candidates taking the CE1 aspired to enter first-year family medicine residency programs. Following completion of the CE1, candidates would apply through the Canadian Residency Matching Service for residency positions. While the process is the same as that for CMGs, IMG candidates are not competing against CMGs for positions but are applying for specific IMG positions funded by the Ministry of Health. By 2010 there were 75 family medicine positions allocated for IMGs.
Objective and hypotheses
One source of evidence of the validity of the results generated by this examination would be the demonstration that Canadian trainees are appropriately differentiated and classified by the examination. The study reported here examined the ability of the CE1 results to validly differentiate between 2 levels of competence—clerkship and first-year residency—using a common rating form. To this end, samples of third-year clerks and first-year family medicine residents from the University of Toronto in Ontario were included in the October 2005 test administration, and their results were compared with the results of the IMG candidates.
There were 3 hypotheses for this study:
- that the residents would demonstrate significantly higher results (at least 0.5 SD) than the third-year clerks, with only modest overlap between the 2 score distributions;
- that in the results of those components of the examination that related specifically to resident-level skills and knowledge, the differences between the results of the 2 groups would be considerable (ie, at least 0.5 SD); and
- that a global rating form would function effectively to discriminate among different levels of performance.
Our use of a 0.5-SD threshold seemed reasonable given the distributions involved; a difference larger than this would be “clinically” significant (ie, would have clinical meaning).
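Expressed as a worked formula (the study states only the 0.5-SD criterion, so the choice of a pooled SD as the scaling unit below is an assumption for illustration), the first hypothesis corresponds to a standardized mean difference of at least one-half:

\[
d = \frac{\bar{x}_{\text{residents}} - \bar{x}_{\text{clerks}}}{s_{\text{pooled}}} \geq 0.5,
\qquad
s_{\text{pooled}} = \sqrt{\frac{(n_{r}-1)s_{r}^{2} + (n_{c}-1)s_{c}^{2}}{n_{r}+n_{c}-2}}
\]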
METHODS
Participants
The study participants were randomly selected volunteers from 2 groups of students at the University of Toronto. Group 1 consisted of third-year clinical clerks and group 2 consisted of first-year family medicine residents. The recruitment notice specified that only CMGs were eligible to volunteer.
Students at these levels were selected for the study because the primary entry point for IMGs was first-year residency; hence, the intent of the CE1 was to differentiate those who were ready to enter residency (first-year residents) from those who were not (clerks who had not yet finished medical school).
The examination was administered on October 1, 2005, to 846 IMG candidates in 5 Ontario cities simultaneously: Ottawa, Kingston, Toronto, Hamilton, and London. The largest administration site was Toronto.
Fifty third-year medical students and 50 first-year family medicine residents from the 2005 to 2006 cohorts at the University of Toronto were selected randomly and invited to participate in the study. Sixty-three students accepted the invitation and participated in the clinical examination: 42 were third-year clinical clerks and 21 were first-year family medicine residents. Students were paid an honorarium for their participation. The study participants, all from the University of Toronto, attended the examination at the Toronto site. Neither the physician examiners nor the IMG candidates knew which examinees were IMGs, clerks, or residents.
The third-year clinical clerks were students at the beginning of their formal hospital-based clinical clerkship. Their performance in the CE1 would define the minimum level of acceptable performance for the IMGO candidates.
Ethics approval for this project was received from the University of Toronto Ethics Review Office. Approval for this project was also provided by the appropriate program directors and associate deans of medical education. Participants were informed that individual results would not be disclosed to the Toronto Program Director or Associate Dean of Postgraduate Medical Education. Participants were assigned once-only identification numbers and these numbers were attached to examination materials and results. Participants received confidential individual performance reports.
Overview of the examination design
General features
- Examination evaluates primary care medical knowledge, is “comprehensive,” and is generalist in nature (ie, not a specialist examination)
- Examination evaluates postgraduate and practitioner medical knowledge (ie, beyond the level of a medical clerk)
Clinical domains evaluated
- Medicine: 2 stations
- Pediatrics: 2 stations
- Obstetrics and gynecology: 2 stations
- Surgery: 2 stations
- Psychiatry: 2 stations
- Prevention: 1 station
- Other: 1 station
Competencies evaluated
- Communication: All cases
- History: 10 cases
- Physical: 4 cases
- Investigation: 9 cases
- Judgment and decision making: All cases
- Management: 9 cases
- Health promotion: 5 cases
Examination
The IMGO CE1 was a 12-station OSCE with standardized patients portraying the cases and experienced physician educators rating the performance of the candidates. These examiners were all faculty at 1 of the 5 Ontario medical schools. An overview of the examination design is found in Box 1.
Each station lasted 10 minutes, divided into 8 minutes for the encounter with a standardized patient and 2 minutes for postencounter oral questions posed by the physician examiner. As a rule, most of the postencounter questions addressed resident- rather than clerkship-level competencies (eg, investigations and patient management).
The objectivity of the clinical examination was achieved by using standardized guidelines for the administration of the examination, training of physician examiners and standardized patients, and common rating forms rather than station-specific checklists. Each site received all documents from a central location. This included specific instructions and material for orientation of candidates and examiners and all administrative materials (eg, candidate labels and booklets, station material, test sheets, and training materials for the standardized patients).
All examiners at the study site (Toronto) were faculty in the University of Toronto Faculty of Medicine. Orientation for examiners was conducted the morning of the examination and included a discussion of the type of ratings being used. Examiners rated the candidates’ performances in up to 11 domains and also provided an overall rating (Table 2). Possible ratings included 4 different levels of competence (unsatisfactory, clerkship, first-year resident or higher, and practice ready). The form also included the opportunity to indicate if the candidate demonstrated any unprofessional behaviour and to identify specific weaknesses.
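For illustration only, the common rating form can be represented as a simple data structure. The full set of 11 domains appears in Table 2 (not reproduced here), so the domain names below are a subset drawn from Box 1, and the mapping of the 4 competence levels onto the 0-to-3 score range described under scoring is an assumption of this sketch, not a documented feature of the operational form.

```python
# Illustrative sketch of the common rating form; domain names are a subset
# from Box 1 and the numeric mapping of the 4 levels to 0-3 is an assumption.
from dataclasses import dataclass, field
from typing import Dict, Optional

# Ordinal competence levels; 0-3 matches the score range described in Methods.
LEVELS = {
    "unsatisfactory": 0,
    "clerkship": 1,
    "first-year resident or higher": 2,
    "practice ready": 3,
}

@dataclass
class StationRating:
    """One examiner's ratings for one candidate at one station."""
    domain_ratings: Dict[str, int] = field(default_factory=dict)  # domain -> 0-3
    overall_rating: Optional[int] = None                          # 0-3
    unprofessional_behaviour: bool = False
    noted_weaknesses: str = ""

    def station_score(self) -> float:
        """Mean of the domain ratings (one plausible aggregation,
        not necessarily the one used operationally)."""
        values = list(self.domain_ratings.values())
        return sum(values) / len(values) if values else 0.0

# Example: a candidate rated mostly at the first-year-resident level
example = StationRating(
    domain_ratings={
        "communication": LEVELS["first-year resident or higher"],
        "history": LEVELS["first-year resident or higher"],
        "investigation": LEVELS["clerkship"],
        "management": LEVELS["first-year resident or higher"],
    },
    overall_rating=LEVELS["first-year resident or higher"],
)
print(example.station_score())  # 1.75
```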
The orientation included a discussion of the expected performance for each level of competence. Examiner notes were provided for each station, which identified the specific expectations for performance (eg, history questions that should be asked, physical maneuvers that should be conducted, or specific management or counseling that was appropriate for that station).
Mean patient-encounter, postencounter oral question, and total scores were calculated for each station and for the total test. In each case, scores could range from a minimum of 0 to a maximum of 3.
Between-group (clerks, first-year residents, and IMGs) differences in scores (total, patient-encounter, and postencounter oral questions) were investigated using ANOVA and Scheffé post hoc comparisons (α = .05).
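A minimal sketch of this analysis, assuming a simple grouped data layout, is shown below. The statistical software actually used is not reported, the score distributions are simulated placeholders with arbitrary parameters, and the Scheffé comparison is implemented directly from its textbook definition rather than taken from the study.

```python
# Minimal sketch: one-way ANOVA with Scheffe post hoc comparisons on the
# 0-to-3 total scores. Group labels and simulated data are placeholders.
import numpy as np
from scipy import stats


def scheffe_pairwise(groups, alpha=0.05):
    """Scheffe post hoc comparisons following a one-way ANOVA.

    groups: dict mapping group label -> 1-D array of scores.
    Returns (label_i, label_j, mean difference, significant) per pair.
    """
    labels = list(groups)
    data = [np.asarray(groups[g], dtype=float) for g in labels]
    k = len(data)
    n_total = sum(len(d) for d in data)

    # Within-group mean square (MSE) from the one-way ANOVA decomposition
    ss_within = sum(((d - d.mean()) ** 2).sum() for d in data)
    df_within = n_total - k
    mse = ss_within / df_within

    # Scheffe critical value: (k - 1) * F(1 - alpha; k - 1, N - k)
    crit = (k - 1) * stats.f.ppf(1 - alpha, k - 1, df_within)

    results = []
    for i in range(k):
        for j in range(i + 1, k):
            diff = data[i].mean() - data[j].mean()
            f_obs = diff ** 2 / (mse * (1 / len(data[i]) + 1 / len(data[j])))
            results.append((labels[i], labels[j], round(diff, 3), f_obs > crit))
    return results


# Simulated total scores, sized like the study groups (arbitrary parameters)
rng = np.random.default_rng(0)
groups = {
    "clerks": rng.normal(1.6, 0.30, 42).clip(0, 3),
    "IMGs": rng.normal(1.8, 0.40, 846).clip(0, 3),
    "residents": rng.normal(2.1, 0.25, 21).clip(0, 3),
}

f_stat, p_value = stats.f_oneway(*groups.values())  # omnibus one-way ANOVA
print(f"F = {f_stat:.2f}, P = {p_value:.4f}")
print(scheffe_pairwise(groups))
```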
RESULTS
A total of 909 candidates took the CE1, more than 90% of whom were IMGs (63 CMGs, 846 IMGs) (Table 3). Reliability of the total, patient-encounter, and postencounter oral question scores was 0.76, 0.93, and 0.88, respectively.
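The article does not name the reliability coefficient; assuming an internal-consistency index such as Cronbach α computed across the 12 station scores, a minimal sketch of the calculation is as follows (the candidate-by-station matrix here is simulated for illustration).

```python
# Sketch of an internal-consistency (Cronbach alpha) estimate across the
# 12 stations; the coefficient actually used is not specified, so this is
# an assumption for illustration only.
import numpy as np

def cronbach_alpha(station_scores: np.ndarray) -> float:
    """station_scores: candidates x stations matrix of scores (0-3)."""
    station_scores = np.asarray(station_scores, dtype=float)
    k = station_scores.shape[1]
    item_var_sum = station_scores.var(axis=0, ddof=1).sum()  # per-station variances
    total_var = station_scores.sum(axis=1).var(ddof=1)       # variance of total scores
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Hypothetical example: 909 candidates, 12 stations, correlated via "ability"
rng = np.random.default_rng(1)
ability = rng.normal(1.8, 0.4, size=(909, 1))   # simulated candidate ability
noise = rng.normal(0, 0.3, size=(909, 12))      # station-level noise
demo = np.clip(ability + noise, 0, 3)
print(round(cronbach_alpha(demo), 2))
```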
The total and component scores (ie, 8-minute patient encounter and 2-minute postencounter oral questions) and SDs are presented in Table 4. The third-year clerks scored the lowest, followed by the IMGs; the first-year residents had the highest scores for both component and total scores. Standard deviations were highest for the IMGs and lowest for the first-year residents.
The ANOVA demonstrated significant between-group differences for the total score and both component scores (P < .05) (Table 5).
Scheffé post hoc comparisons for each score indicated that the total and patient-encounter results did not differentiate between the clerks and the IMG candidates, while the postencounter question results differentiated among all 3 groups (Table 6).
DISCUSSION
There was interest in the ability of the global rating scale to accurately and reliably stratify the examinees into the benchmarked levels of postgraduate learners (ie, clerks, first-year residents, second-year residents, or practice ready). Additionally, there was interest in the relative contribution of the 2 parts of the assessment (ie, the observed and rated 8-minute patient encounter and the 2-minute standardized oral questions posed by examiners specifically exploring clinical decision making and intervention planning) in the accurate stratification of examinees into residency learner levels. The accurate assignment of candidates is what makes appropriate placement into residency programs possible.
Previous work has demonstrated that IMGs were deficient in clinical skills,8 and that they pass certification examinations at a lower rate than CMGs,23–25 so a clearer estimation of the level of the candidate’s ability, as compared with the Canadian levels of training, was desirable.
The notion of stratification into multiple levels was novel and had not been used in high-stakes examinations before; generally, checklists were used26 and there was a single bar of competent or “good enough” (ie, pass or fail). Checklists might miss important aspects of clinical care, which cannot be described in discrete checklist items,27 while global ratings have been shown to provide reliable evaluation of candidates.20,21
Additionally, checklists do not necessarily coincide with evidence-based items identified in the literature,28 and some candidates might complete many checklist items while still missing key tasks,29 decreasing the validity of checklists for assessment of clinical skills. In particular, global ratings have been shown to be more appropriate for experienced clinicians than for students or residents,30 because experienced clinicians might not require completion of all the detailed tasks of the checklist to arrive at the correct diagnosis and treatment plan. This point is important given that most of the IMGs taking the CE1 have had many years of practice in their home countries. This validation study demonstrated that it is possible to classify candidates into multiple levels of competence using a common examination instrument.
There is a considerable benefit to this type of rating form, in that the developmental effort put into creating and approving checklists is substantially decreased. This makes it easier to develop stations and increase the size of item banks. A common rating form also makes it possible to use the same stations with different levels of learners. Learners at each successive level demonstrate increasing competence by including more of the key components in their patient interactions and in their responses to the examiner questions.
To use this rating scale, the examiners must have a strong mental model of the performance expectations for each of the 4 rating levels. For this reason, it is necessary for the examiners to be faculty members who are involved in teaching the learners at the various levels of education. The standard being set in the examination is thus equivalent to the standard set throughout the medical education process, again increasing the validity of the examination results.
For high-stakes examinations, a reliability of 0.8 or higher is desirable. The reliabilities of the 2 component scores (ie, patient encounter and postencounter oral questions) were well above this level, and the total score reliability was just below it. Both the assessment of skills with the patient and the assessment of knowledge related to the patient (via the examiner questions) were therefore measured with high reliability. While these component scores are not used in any decision making, their high reliability lends confidence to the validity and interpretation of the OSCE results.
The results of the ANOVA and Scheffé analyses suggest that the clinical examination scores were capable of differentiating among third-year clerks, first-year family medicine residents, and the IMG candidates. Before this examination, candidates were assessed as passing or failing at a single level. What is new here is that candidates were classified across multiple levels, and that the decisions made based on these classifications were appropriate ones. In addition, the results suggest that the IMGs, as a group, are less competent than first-year Canadian family medicine residents and more competent than the clerks. These differences legitimize the continued use of the evaluation and reinforce that the postencounter probe is more discriminating, particularly at higher levels of competence, as per our original hypothesis. No information is currently available to indicate how large a difference must be, clinically or educationally, to confirm that the groups are different.
The added value of the postencounter questions in differentiating among the appropriate rating levels is important. These questions add to the validity of the stations and the examination through exploring knowledge and clinical problem solving that cannot be assessed during the patient interaction. In fact, the results demonstrate that the scores generated by the postencounter oral questions were the most effective in differentiating between the 2 training levels and among the 3 groups of test takers. Candidates at the higher levels are able to provide a more appropriate list of differential diagnoses and are more likely to identify the correct diagnosis. These candidates also provided more focused and appropriate investigations and management plans. This ability to synthesize the information obtained from the standardized patient encounter and develop an appropriate plan is observed by the examiner. As the postencounter questions inform the examiners’ decisions and are effective in differentiating among candidates, they also act to improve the reliability of the examination by assisting the examiner in selecting the more appropriate rating.
As the CMGs were trained at a single university with a consistent and standard training program, the variation among these candidates would be expected to be low. Clerks are still early in their training and there is a broader range to their abilities at this point, resulting in a larger SD than that of the first-year residents. Residents were approximately 4 months into their residency when they took the CE1. There is less variability within this group than among the clerks, resulting in a lower SD. On the other hand, the IMGs have been trained in programs around the world with different curricula, different methods of teaching, different methods of assessment, and different criteria for evaluation. The IMGs are a very heterogeneous group, resulting in larger SDs.
The CE1 used a standardized global rating form for all stations. One benefit of this approach is that it simplifies item development. The standard domains are used for all stations, and item writers develop guidelines regarding expectations for that station. This eliminates the need to create a detailed checklist and the ensuing discussions regarding weighting of specific items. Checklists evaluate a limited range of competencies and tend to benefit candidates who are thorough, rather than those who approach patients from a broader perspective.21,27 Checklists might be appropriate for lower-level students, where the focus is on learning the details of procedures; however, for more advanced practitioners global ratings allow for an improved assessment of current clinical skills.
Limitations
As the study participants were all from one university and were assessed at that same site, it is possible that some examiners could have recognized the third-year students or first-year residents. This could have biased the examiners and affected the scoring for those individuals. Given the number of examiners (n = 126) and the number of students, the likelihood of this happening was low, and therefore the likelihood of it having an effect on the overall results was also very low.
The examination had high stakes for the IMGs, because doing well on the examination was important to their application for a first-year residency position. The study participants were volunteers and were paid for their participation. They likely would not have put the same effort into preparing for the examination as the IMGs did, which could have lowered the scores of the clerks and first-year residents relative to what they would have achieved with a stronger interest in performing as well as possible. Had they performed to their full potential, the difference between the first-year residents and the IMGs would have been accentuated and the difference between the clerks and the IMGs would have been decreased.
Physician examiners were not told which candidates were IMGs, clerks, or residents. Despite this, older candidates and those for whom English was clearly a second (or third) language could have been identified by examiners as IMGs.
Finally, while the third-year clerks and first-year residents who participated were randomly selected from their cohorts and invited to participate, it is possible that those who volunteered were not representative of their cohorts. Some might have been drawn to the study because they perceived an educational benefit from participating (eg, gaining experience with an OSCE or receiving feedback on their performance) and others might have volunteered simply to earn money. Weaker candidates might not have volunteered. These factors could have affected the score differences between the groups. No analysis was conducted to compare these volunteers with the rest of their classes.
Conclusion
These results provide evidence of the validity of the clinical examination and the rating scale used to determine whether the competence of the IMG candidates was at the clerkship or first-year-resident level. They also suggest that incorporating postencounter questions that address clinical problem solving related to the patient just seen provides important information about the candidate’s competence. The study strongly supports the use of a common rating scale for assessing across multiple levels of competence.
Owing to the large numbers and the logistical challenges of conducting a large-scale examination on a single day, plans are under way for the examination to be administered multiple times a year. Future work will investigate the equivalence of the examinations administered over a period of months. Additionally, analysis of various factors that could be predictors of scores on the examination (such as age, location of undergraduate degree, or years since graduation) could provide insight into the assessment of future IMGs.
Notes
EDITOR’S KEY POINTS
- In 2003, the Canadian Task Force on Licensure of International Medical Graduates (IMGs) recommended that provinces increase the number of IMGs being assessed and that the assessment process be able to reliably differentiate among various levels of competence. This study sought to evaluate a new examination process for IMGs.
- The results of this study provide evidence of the validity of the clinical examination and the rating scale used to determine whether the level of competence of the IMG candidates was at the clerkship or first-year-resident level. The study suggests that incorporating postencounter questions that address clinical problem solving related to the patient just seen provides important information related to the candidate’s competence. The study also strongly supports the use of a common rating scale to assess multiple levels of competence.
Footnotes
- This article has been peer reviewed.
- Contributors: All authors contributed to concept and design of the study; data gathering, analysis, and interpretation; and preparing the manuscript for submission.
- Competing interests: None declared
- Copyright © the College of Family Physicians of Canada