Dr. Marco V. Benavides Sánchez.
Medmultilingua.com
In medicine, getting it wrong simply isn’t an option. A misdiagnosis can alter the entire trajectory of someone’s life. That’s why, when we discuss artificial intelligence (AI) in healthcare, it’s not enough for a system to be merely “accurate” in its predictions: it must also recognise when it might be mistaken. This capability is known as uncertainty quantification, and it’s emerging as one of the fundamental requirements for truly reliable medical AI.
Consider an experienced clinician. When faced with a challenging case, they don’t simply issue a diagnosis: they also communicate their level of certainty. They might say “I’m fairly confident this is influenza” or “we need further tests because the presentation isn’t entirely clear-cut”. This professional honesty is what safeguards the patient. Now, imagine if an algorithm could do the same: not just classify, but also flag when its answer is questionable.
A recent study published in Artificial Intelligence in Medicine tackles precisely this challenge in a particularly sensitive area: the analysis of electroencephalograms (EEG) to detect cognitive decline. The researchers developed AI systems capable of distinguishing between three cognitive states: normal cognitive function, mild cognitive impairment, and dementia. However, what proved novel wasn’t merely the diagnostic accuracy, but the systems’ ability to express when they weren’t certain.
The Problem of the Unexpected
An EEG is a test that records the brain’s electrical activity. It’s sensitive, varies considerably between individuals, and can be affected by numerous factors: from the patient’s age to ambient noise in the examination room. This presents an enormous challenge for any algorithm, because real-world data rarely match perfectly with those used during the model’s training phase.
This phenomenon is called dataset shift: when conditions change and the system encounters situations it didn’t see during its learning phase. What does the AI do then? Does it continue blindly trusting its answers? That’s precisely the danger.
Models That Doubt (and That’s Beneficial)
The researchers compared different strategies for enabling algorithms to express their level of confidence, testing everything from individual models to more sophisticated techniques such as Monte Carlo dropout and deep ensembles. Deep ensembles train several models independently and combine their predictions, rather like several experts offering their opinions on the same case; Monte Carlo dropout achieves a similar effect with a single model, by keeping parts of the network randomly switched off at prediction time and averaging many slightly different answers.
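For readers who would like to see the idea in code, here is a minimal sketch of an ensemble with uncertainty. It is not the authors’ actual pipeline: a handful of small scikit-learn classifiers trained on synthetic data stand in for the deep EEG models, and their averaged probabilities yield both a prediction and a measure of doubt (predictive entropy). Every dataset, model size and number in it is an illustrative assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Toy stand-in for EEG-derived features with three classes
# (normal / mild impairment / dementia) -- purely synthetic data.
X, y = make_classification(n_samples=1200, n_features=40, n_informative=12,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                     random_state=0)

# A deep ensemble in miniature: several models trained independently
# (different random seeds), like several experts seeing the same case.
ensemble = [
    MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=seed)
    .fit(X_train, y_train)
    for seed in range(5)
]

def ensemble_predict(models, X):
    """Average the members' probabilities and compute predictive entropy."""
    probs = np.mean([m.predict_proba(X) for m in models], axis=0)
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)  # doubt per case
    return probs.argmax(axis=1), entropy

pred, doubt = ensemble_predict(ensemble, X_test)
print("accuracy:", (pred == y_test).mean())
print("mean uncertainty (entropy):", doubt.mean())
```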
The systems underwent rigorous testing: data similar to the training set, completely different external databases, and simulations of degraded signals with noise, interference, or progressive alterations. The objective was to determine whether the model truly detected when it was “treading unfamiliar ground”.
The results proved highly revealing. The ensembles not only achieved better diagnostic performance, but also demonstrated calibrated uncertainty: when data quality diminished or signals became atypical, the system increased its level of doubt. In other words, its confidence aligned with reality.
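The degraded-signal experiment can be mimicked in the same toy setting: add progressively stronger noise to the test inputs, a crude stand-in for dataset shift, and watch how the ensemble’s average doubt responds. Again, this is a hedged illustration on synthetic data, not the study’s protocol, and all noise levels are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Same synthetic three-class "EEG" features as in the sketch above.
X, y = make_classification(n_samples=1200, n_features=40, n_informative=12,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, _ = train_test_split(X, y, test_size=0.25,
                                               random_state=0)

ensemble = [
    MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=seed)
    .fit(X_train, y_train)
    for seed in range(5)
]

def mean_entropy(models, X):
    """Average doubt (predictive entropy) of the ensemble over a dataset."""
    probs = np.mean([m.predict_proba(X) for m in models], axis=0)
    return float(np.mean(-np.sum(probs * np.log(probs + 1e-12), axis=1)))

rng = np.random.default_rng(0)
# Simulated dataset shift: progressively noisier versions of the test signals.
for sigma in [0.0, 0.5, 1.0, 2.0, 4.0]:
    shifted = X_test + rng.normal(0.0, sigma, size=X_test.shape)
    print(f"noise sigma={sigma:>4}: mean entropy = {mean_entropy(ensemble, shifted):.3f}")
```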
Why This Changes Everything
In clinical practice, a system that recognises its own uncertainty can trigger safety mechanisms: request review by a specialist, recommend complementary studies, or simply avoid automatic decisions in ambiguous cases. It’s not about replacing the doctor, but creating a transparent collaboration where technology indicates when its judgement is sound and when it requires human support.
This philosophy marks a paradigm shift. For years, medical AI has been evaluated primarily by its accuracy: how many diagnoses it gets right out of every hundred. However, in a real hospital environment—dynamic and full of variability—what matters isn’t simply that the system responds, but that it responds with statistical honesty.
Towards Safer Digital Medicine
The lesson is clear: the best AI isn’t the one that always has an answer, but the one that knows when to stay silent and seek assistance. In a field where human life hangs in the balance, informed doubt isn’t weakness, but prudence. And this prudence, translated into algorithms capable of expressing uncertainty, could be the key to artificial intelligence progressing from a promising tool to a genuinely trustworthy ally in healthcare.
Because ultimately, in medicine as in life, acknowledging what we don’t know is the first step towards making better decisions.
Reference
Tveter, M., Tveitstøl, T., Hatlestad-Hall, C., Hammer, H. L., & Hebold Haraldsen, I. R. J. (2026). Uncertainty in deep learning for EEG under dataset shifts. Artificial Intelligence in Medicine, 103374. https://doi.org/10.1016/j.artmed.2026.103374
#ArtificialIntelligence #AIinMedicine #MedicalAI #DigitalHealth #Medmultilingua

