Generative AI model detects blood cell abnormalities

The shape and structure of blood cells provide vital indicators for the diagnosis and management of blood diseases and disorders. Recognizing subtle differences in the appearance of cells under a microscope, however, requires the skills of experts with years of training, motivating researchers to investigate whether artificial intelligence (AI) could help automate this onerous task. A UK-led research team has now developed a generative AI-based model, known as CytoDiffusion, that characterizes blood cell morphology with greater accuracy and reliability than human experts.
Conventional discriminative machine learning models can match human performance at classifying cells in blood samples into predefined classes. But discriminative models, which learn to recognize cell images based on expert labels, struggle with never-before-seen cell types and with images acquired using different microscopes and staining techniques.
To address these shortfalls, the team – headed up at the University of Cambridge, University College London and Queen Mary University of London – created CytoDiffusion around a diffusion-based generative AI classifier. Rather than just learning to separate cell categories, CytoDiffusion models the full range of blood cell morphologies to provide accurate classification with robust anomaly detection.
“Our approach is motivated by the desire to achieve a model with superhuman fidelity, flexibility and metacognitive awareness that can capture the distribution of all possible morphological appearances,” the researchers write.
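To make the idea of classifying with a generative model concrete, here is a minimal sketch of one common way to turn a diffusion model into a classifier: add noise to the image, ask a class-conditional denoiser to estimate that noise under each candidate class, and pick the class with the lowest estimation error. This is an illustration only and may differ from what CytoDiffusion does in detail. The function name, the three-value "images" and the prototype-based denoiser are all hypothetical stand-ins for a trained neural network.

```python
import math
import random


def classify_by_denoising_error(x0, prototypes, n_trials=50, seed=0):
    """Pick the class whose (toy) conditional denoiser best explains the added noise.

    x0         -- the clean input, a list of floats (hypothetical stand-in for an image)
    prototypes -- dict mapping class label -> typical clean example for that class
    """
    rng = random.Random(seed)
    totals = {label: 0.0 for label in prototypes}
    for _ in range(n_trials):
        alpha = rng.uniform(0.1, 0.9)  # fraction of signal kept at this noise level
        eps = [rng.gauss(0.0, 1.0) for _ in x0]  # the noise actually added
        x_t = [math.sqrt(alpha) * x + math.sqrt(1 - alpha) * e
               for x, e in zip(x0, eps)]
        for label, mu in prototypes.items():
            # ASSUMPTION: toy denoiser that estimates the noise as if the clean
            # image were exactly the class prototype. A real system would use
            # a trained class-conditional network here.
            eps_hat = [(xt - math.sqrt(alpha) * m) / math.sqrt(1 - alpha)
                       for xt, m in zip(x_t, mu)]
            totals[label] += sum((e - eh) ** 2 for e, eh in zip(eps, eps_hat))
    best = min(totals, key=totals.get)
    return best, totals


# Hypothetical toy data: two classes described by three features each.
prototypes = {"class_a": [0.0, 0.0, 0.0], "class_b": [1.0, 1.0, 1.0]}
label, totals = classify_by_denoising_error([0.9, 1.1, 0.8], prototypes)
print(label)
```

With this toy denoiser the error reduces to a scaled distance to each prototype, so the example behaves like a nearest-prototype classifier. A trained diffusion model replaces the prototype with a learned distribution over appearances for each class.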
Authenticity and accuracy
For AI-based analysis to be adopted in the clinic, it’s essential that users trust a model’s learned representations. To assess whether CytoDiffusion could effectively capture the distribution of blood cell images, the team used it to generate synthetic blood cell images. Analysis by experienced haematologists revealed that these synthetic images were near-indistinguishable from genuine images, showing that CytoDiffusion learns the true morphological distribution of blood cells rather than relying on artefactual shortcuts.
The researchers used multiple datasets to develop and evaluate their diffusion classifier, including CytoData, a custom dataset containing more than half a million anonymized cell images from almost 3000 blood smear slides. In standard classification tasks across these datasets, CytoDiffusion achieved state-of-the-art performance, matching or exceeding the capabilities of traditional discriminative models.
Effective diagnosis from blood smear samples also requires the ability to detect rare or previously unseen cell types. The researchers evaluated CytoDiffusion’s ability to detect blast cells (immature blood cells) in the test datasets. Blast cells are associated with blood malignancies such as leukaemia, and high detection sensitivity is essential to minimize false negatives.
In one dataset, CytoDiffusion detected blast cells with sensitivity and specificity of 0.905 and 0.962, respectively. In contrast, a discriminative model exhibited a poor sensitivity of 0.281. In datasets with erythroblasts as the abnormal cells, CytoDiffusion again outperformed the discriminative model, demonstrating that it can detect abnormal cell types not present in its training data, with the high sensitivity required for clinical applications.
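For readers less familiar with these metrics, sensitivity is the fraction of truly abnormal cells that the model flags, and specificity is the fraction of normal cells that it correctly leaves unflagged. The short sketch below computes both from binary labels. The labels and predictions are made-up toy values, not data from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = abnormal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # abnormal, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # abnormal, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal, not flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal, wrongly flagged
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy example: four abnormal cells and six normal cells.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

A sensitivity of 0.281, as reported for the discriminative model, would mean that roughly seven in ten abnormal cells go undetected, which is why this figure matters so much for clinical use.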
Robust model
It’s important that a classification model is robust to different imaging conditions and can function with sparse training data, as commonly found in clinical applications. When trained and tested on diverse image datasets (different hospitals, microscopes and staining procedures), CytoDiffusion achieved state-of-the-art accuracy in all cases. Likewise, after training on limited subsets of 10, 20 and 50 images per class, CytoDiffusion consistently outperformed discriminative models, particularly in the most data-scarce conditions.
Another essential feature of clinical classification tasks, whether performed by a human or an algorithm, is knowing the uncertainty in the final decision. The researchers developed a framework for evaluating uncertainty and showed that CytoDiffusion produced superior uncertainty estimates to human experts. With uncertainty quantified, cases with high certainty could be processed automatically, with uncertain cases flagged for human review.
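A workflow of this kind can be sketched as a simple confidence threshold: predictions above the threshold are accepted automatically, and the rest are queued for a human reviewer. The function name, the threshold of 0.95 and the example records below are illustrative assumptions, not values from the study.

```python
def triage(predictions, threshold=0.95):
    """Split predictions into auto-accepted and flagged-for-review lists.

    predictions -- iterable of (cell_id, label, confidence) tuples,
                   with confidence between 0 and 1
    threshold   -- ASSUMED cut-off; in practice it would be chosen from
                   validation data and clinical requirements
    """
    auto_accepted, needs_review = [], []
    for cell_id, label, confidence in predictions:
        if confidence >= threshold:
            auto_accepted.append((cell_id, label))
        else:
            needs_review.append((cell_id, label))
    return auto_accepted, needs_review


# Hypothetical model outputs for three cells.
preds = [
    ("cell-001", "neutrophil", 0.99),
    ("cell-002", "lymphocyte", 0.62),
    ("cell-003", "monocyte", 0.97),
]
auto_accepted, needs_review = triage(preds)
print("auto:", auto_accepted)
print("review:", needs_review)
```

Such a scheme is only as good as the confidence values fed into it, which is why well-calibrated uncertainty estimates are a prerequisite.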
“When we tested its accuracy, the system was slightly better than humans,” says first author Simon Deltadahl from the University of Cambridge in a press statement. “But where it really stood out was in knowing when it was uncertain. Our model would never say it was certain and then be wrong, but that is something that humans sometimes do.”
Finally, the team demonstrated CytoDiffusion’s ability to create heat maps highlighting regions that would need to change for an image to be reclassified. This feature provides insight into the model’s decision-making process and shows that it understands subtle differences between similar cell types. Such transparency is essential for clinical deployment of AI, making models more trustworthy as practitioners can verify that classifications are based on legitimate morphological features.
“The true value of healthcare AI lies not in approximating human expertise at lower cost, but in enabling greater diagnostic, prognostic and prescriptive power than either experts or simple statistical models can achieve,” adds co-senior author Parashkev Nachev from University College London.
CytoDiffusion is described in Nature Machine Intelligence.
The post Generative AI model detects blood cell abnormalities appeared first on Physics World.