Medical students are often told that third-year clerkships are their place to shine because they will be evaluated as future physicians. Unfortunately, the evaluation systems themselves frequently undermine that goal. During an internal medicine rotation, I was told that my performance was "excellent" and merited top marks, but that awarding the highest grade would be inappropriate because it was my first rotation. In another clerkship, evaluations seemed driven less by clinical performance and more by residents' subjective impressions. These experiences are not unusual; rather, they reflect a broader structural problem: tiered grading systems in clinical clerkships are subjective, inconsistent, and increasingly misaligned with their intended purpose.
Clinical clerkships are typically graded using ordinal categories of honors, high pass, and pass, intended to differentiate student performance. In theory, these distinctions signal meaningful differences in clinical ability. In practice, they often represent what might be termed "false precision." The distinction between honors and high pass implies a level of measurement that brief, subjective, and context-dependent clinical evaluations cannot reliably support. A student's final grade may depend on limited interactions, variable expectations, and the idiosyncrasies of individual evaluators. Across institutions and specialties, the proportion of students awarded top grades varies widely, suggesting that grading reflects local culture as much as individual merit.
These limitations are compounded by the inherent variability of clinical environments. Clerkships differ substantially in structure, expectations, and teaching quality. Some rotations emphasize mentorship and feedback, while others are service-oriented. Rotation lengths are often short, allowing a single evaluator to disproportionately influence outcomes. Students are frequently assessed against peers who have already completed multiple clerkships, introducing what might be considered a "first rotation penalty" that should be acknowledged or adjusted for. Under these conditions, grades risk capturing context and timing rather than competence.
Tiered grading may also incentivize forms of performance that are loosely correlated with clinical ability. Students on the wards quickly learn to align with team expectations, prioritizing confidence and fluency, traits that are rewarded in brief evaluations. While these interpersonal skills are important, their prominence in grading may encourage impression management over intellectual honesty, curiosity, or growth. The increasing availability of artificial intelligence tools capable of generating polished clinical explanations further complicates assessment. Fluency of presentation may be mistaken for depth of understanding, making it more difficult for evaluators to distinguish genuine reasoning from well-rehearsed or externally supported performance. In this context, grades risk reflecting how a student performs rather than competence itself.
The inconsistency of the evaluation process
Clerkship grading is further limited by the inconsistency of the evaluation process itself. Assessments often depend on busy residents and faculty completing forms, frequently retrospectively and with variable engagement. Some evaluators provide detailed feedback while others submit minimal or delayed assessments, leaving final grades to be constructed from incomplete or uneven data. As a result, outcomes may hinge less on sustained performance and more on which evaluators happened to complete assessments and how well they recall a student’s contributions. In addition, reliance on residents for evaluation introduces interpersonal subjectivity; student performance may be judged by whether a resident “likes” them rather than clinical competence.
The structure of clerkship evaluation can also shape student behavior in unintended ways. When grades depend heavily on team impressions, students may feel pressure to maximize visibility rather than learning by taking on excessive patient loads, forgoing breaks, and prioritizing immediate clinical tasks over study or reflection. Such efforts, while demonstrating commitment, do not reliably translate into stronger evaluations and may come at the expense of preparation for standardized assessments such as shelf exams. Students may emerge overworked, yet appear average or underprepared on paper, reinforcing the sense that clerkship grading does not capture meaningful competence.
The impact on residency selection
The transition of Step 1 to pass/fail scoring has intensified reliance on clerkship grades as a primary differentiator in residency selection. In the absence of consistent metrics other than clerkship grades and Step 2, programs may rely on institutional prestige and mentorship networks, factors that can be unevenly distributed. Defenders of tiered grading argue that granular distinctions are necessary to compare applicants. Yet this argument rests on the assumption that current distinctions are both valid and meaningful. If grades are heavily influenced by evaluator subjectivity, local grading cultures, and structural variability, then the appearance of differentiation may be misleading. A system that labels one student honors and another high pass suggests a level of precision that the underlying assessments cannot justify.
A shift to pass/fail grading for core clinical clerkships offers a more transparent and equitable alternative if it is paired with more meaningful forms of assessment. Rather than attempting to rank students using unreliable categories, medical education should prioritize rich, longitudinal, competency-based evaluation. Structured narrative evaluations should replace generic comments. Evaluators could be guided to address the domains of clinical reasoning, communication, teamwork, professionalism, and adaptability to improve comparability across settings. Workplace-based assessments with direct observation, such as observed clinical encounters, case presentations, and procedural skills, should play a larger role. These assessments offer more direct evidence of performance than retrospective impressions.
Overall, students’ documented clinical encounters, reflective writing, feedback trends, and evidence of improvement over time should be noted to emphasize growth and consistency over isolated moments of performance. Greater emphasis on sub-internships, letters of recommendation, and performance on standardized examinations such as Step 2 can offer additional perspectives on readiness for residency. While none of these measures are perfect, together they provide a more comprehensive and contextually grounded assessment than tiered clerkship grades alone.
Re-examining tiered clerkship grading
The goal of clinical education is not to identify a narrow group of "honors" students, but to ensure that all graduates meet a standard of safe, effective, and compassionate care while recognizing meaningful distinctions through richer forms of evaluation. In an era of ongoing transformation in medical education, the persistence of tiered clerkship grading warrants critical re-examination. A pass/fail system supported by structured, longitudinal assessment would better align evaluation with the realities of clinical training. Some medical schools (UCSF, UCLA, and the University of Minnesota) have already implemented pass/fail grading for core clerkships, demonstrating that this approach is feasible. It is possible to maintain rigor and competency standards while reducing subjectivity, stress, and inequity.
Anika Pruthi is a medical student.