24-hour urine collection flaws expose clinical bias

There are few rituals in medicine as aspirational as the 24-hour urine collection. On paper we see an elegant examination: a direct measurement of renal function, grounded in physiology and liberated from the assumptions that plague our estimating equations. In practice, however, what we see is a plastic jug and a set of instructions that compete poorly with daily life. What we are left with is a quiet understanding, shared by clinicians and patients alike, that perfection is unlikely.

When the results return we perform our doctorly duty and mentally process the creatinine value. If it is “about right,” we proceed. If it is too high, we conclude the urine was overcollected. If it is too low, we conclude it was undercollected. In other words, we do not ask what the urine creatinine is. We ask whether it conforms to what we already believed it should be.

That belief is not arbitrary. Before the patient ever receives the collection jug, we have already internally estimated their creatinine production based on age, sex, ideal body weight, and a brief but surprisingly confident visual assessment of muscle mass. This estimate is so ingrained that when the measured value falls outside the expected range, the measurement is what we discard. Not the expectation.

The fallacy of precision in urine collection

At this point, it is worth pausing to ask what work the test is actually doing if the results are so readily dismissed. The patient has carried out a cumbersome task that involves the often absurd request to refrigerate an ever-increasing amount of urine between the fresh produce and the two percent milk. The task often interrupts sleep, work, and normal routines while the ordering clinician sleeps soundly. We have generated a number with the appearance of precision. And then, using our prior assumptions as the reference standard, we decide whether the number is believable. When it is not, we attribute the discrepancy not to biology but to human behavior, which we anticipated all along.

The fallacy of circularity is difficult to ignore here. The problem becomes more apparent when the test is compressed. Shortened urine collections are sometimes used in the hope of improving compliance. But shortening the collection period does not eliminate variability; it magnifies it. Small deviations matter more. Timing errors loom larger. The resulting value retains its numerical authority while becoming even less interpretable.
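
The arithmetic behind that magnification is simple enough to sketch. The snippet below is an illustration, not a validated model: it assumes creatinine is excreted at a constant rate, and the function name and the 30-minute error are hypothetical choices made for the example. Under that assumption, a collection that actually runs longer or shorter than the nominal period biases the measured excretion by the ratio of the timing error to the nominal duration.

```python
def relative_timing_error(nominal_hours: float, error_minutes: float) -> float:
    """Fractional bias in measured excretion caused by a timing error.

    Assumption (not from the essay): excretion is constant over time, so
    collecting for `error_minutes` more or less than the nominal period
    shifts the result by error / nominal duration.
    """
    if nominal_hours <= 0:
        raise ValueError("nominal_hours must be positive")
    return error_minutes / (nominal_hours * 60.0)


# The same hypothetical 30-minute slip applied to shorter and shorter collections.
for hours in (24, 12, 4):
    bias = relative_timing_error(hours, 30)
    print(f"{hours:>2}-hour collection: {bias:.1%} bias from a 30-minute timing error")
```

The same half-hour slip that shifts a 24-hour result by about 2 percent shifts a 4-hour result by 12.5 percent, which is the sense in which timing errors loom larger.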

The illusion of steady-state renal function

And then comes the matter of serum creatinine stability. Creatinine clearance calculations implicitly assume a steady state. When the serum creatinine is changing, the denominator of the equation is moving beneath our feet. We know this. We teach this. Yet the calculation is often performed anyway, and the results are discussed in a manner that suggests firmer ground than actually exists.
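
For readers who want the equation in front of them, the standard measured creatinine clearance can be written as follows. The symbols are labels chosen here for illustration, and the numbers in the worked line are hypothetical, picked only to make the arithmetic easy to follow.

```latex
\[
\mathrm{CrCl}\ (\text{mL/min}) \;=\; \frac{U_{\mathrm{Cr}} \times V}{S_{\mathrm{Cr}} \times T}
\]
% U_Cr : urine creatinine concentration (mg/dL)
% V    : total urine volume collected (mL)
% S_Cr : serum creatinine (mg/dL), a single value standing in for the whole period
% T    : collection time in minutes (1440 for a 24-hour collection)

% Hypothetical worked example:
\[
\frac{100\ \text{mg/dL} \times 1440\ \text{mL}}{1.0\ \text{mg/dL} \times 1440\ \text{min}} = 100\ \text{mL/min}
\qquad\text{versus}\qquad
\frac{100 \times 1440}{2.0 \times 1440} = 50\ \text{mL/min}
\]
```

A single serum value occupies the denominator for the entire collection period. If that value drifts from 1.0 to 2.0 mg/dL while the jug is filling, the same urine yields anything from 100 to 50 mL/min depending on when the blood was drawn.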

At some point, the question shifts from whether the test is flawed, as most tests tend to be, to whether we are being honest about what we are using it for. If we already know the range of creatinine excretion we consider plausible, and if values outside that range are reflexively attributed to collection error, then the test is not discovering new information. It is confirming our expectations, occasionally reassuring us but rarely changing management in any meaningful way.

Measuring behavior instead of biology

This is not unique to nephrology, nor even to laboratory medicine. The 24-hour urine collection is simply an unusually transparent example of a broader phenomenon in clinical practice and medical education where we build elaborate measurement systems, then quietly revert to pretest judgment when the measurements make us uncomfortable.

We say we are measuring renal function, but truthfully we are measuring compliance, understanding, attention, and circumstance while pretending we can disentangle them cleanly. When we cannot, we default to what we already believed.

The test persists not because it is always useful, but because it looks serious. It signals diligence. It reassures us that we have done something thorough, even when the additional information gained is often marginal.

A lesson in medical objectivity and expectations

None of this is an argument for abandoning the 24-hour urine collection entirely. There are situations where it adds value, particularly when the assumptions of estimating equations are clearly violated, or when we are measuring something else entirely but need the urine creatinine to validate that measurement. In a kidney stone workup, for example, we must first have faith in the collection itself before we can believe the oxalate excretion. Without that faith, our house crumbles on a shaky foundation. But those situations are narrower than our habits suggest, and they require an honesty about uncertainty that the test itself does not supply.

Perhaps the most instructive lesson of the 24-hour urine collection is not about renal physiology at all. It is about how willingly we confuse a seemingly hard numerical data point with epistemic progress. We ask patients to collect every drop, knowing they will not. We calculate clearance, knowing the assumptions are strained. We inspect the result, knowing exactly which values we are prepared to believe. And then we call the process objective.

The problem is not that humans fail to behave like ideal study subjects. The problem is that we design systems that depend on them doing so and then act surprised when they do not. In medicine, we are rightly skeptical of patient recall. We should be at least as skeptical of our own measurements when they ask humans to behave like equations.

Ali Kashkouli is a nephrologist.