August 16, 2024
Using Backbone Foundation Model for Evaluating Fairness in Chest Radiography Without Demographic Data
Dilermando Queiroz, André Anjos, Lilian Berton
💡 Question: Is it possible to evaluate bias in AI models without using sensitive attributes like gender, age, or ethnicity?
 
🌍 Technology, particularly in healthcare, is advancing at a rapid pace. One significant breakthrough is the use of Artificial Intelligence (AI) to help doctors diagnose diseases from medical images, such as chest X-rays. However, there's a big challenge: ensuring that these AI tools work fairly for everyone, regardless of gender, age, or ethnicity.
🔍 Our research focuses on a crucial issue: many medical image databases don't include personal details like gender or age, yet those details are what allow us to check whether an AI tool is fair. Without them, it's hard to tell if the AI makes mistakes more often for certain groups of people, which could lead to unequal healthcare outcomes.
🛠️ To address this, we explored a new way to create groups that could represent these missing personal details using what’s called a “Foundation Model.” Think of this model as a smart engine that has learned to recognize patterns in chest X-rays. By analyzing these patterns, we could form groups that might represent different genders or age ranges, even if we don’t know the exact details about the patients.
📊 We tested our method on two large chest X-ray databases. The exciting part? For gender, our method reduced the performance difference between male and female patients by about 4-6%, making the model fairer. It struggled more with age, however, especially for the oldest and youngest patients.
✨ These findings are a step forward in making sure AI tools in healthcare treat everyone fairly, even when personal details aren’t available. While there’s still work to do, particularly for age, this research opens new doors to more equitable healthcare for all. 🌈🏥

Framework

[Figure: overview of the proposed framework]
To understand and improve the fairness of AI systems in medical diagnostics, we developed a unique approach that doesn't rely on direct demographic information like age or gender. Here's how we did it:
  1. Datasets: We started by gathering two large sets of chest X-ray images from different sources. These datasets provide the raw material the AI model analyzes. One set was similar to what the AI had seen during training (in-distribution), while the other was quite different (out-of-distribution), letting us test how well the method holds up across scenarios.
  2. Foundation Model: Next, we used a Foundation Model, an advanced AI system trained on an enormous amount of data. It was designed to recognize patterns in images without needing detailed labels such as a person's age or gender. We used it to extract "embeddings": numerical representations of the images that capture their essential features.
  3. t-SNE: To make sense of these embeddings, we applied t-SNE (t-distributed Stochastic Neighbor Embedding), which reduces the high-dimensional embeddings to a two-dimensional space. Imagine flattening a detailed 3D object into a 2D drawing that still preserves the important details. This visualization lets us see how images group together based on their similarities.
  4. Clustering: On the 2D map produced by t-SNE, we applied a clustering algorithm that groups images by how similar they are to each other. Each cluster is a collection of images sharing common features, and these clusters can indirectly reflect demographic groups like age or gender, even though we never used that information directly (a code sketch of steps 3 and 4 follows this list).
  5. Forming Groups: Finally, these clusters were used as proxy groups for demographic characteristics such as gender and age. They allowed us to evaluate how fairly the AI model treats different populations: if the model performed better for one group than another, we could take steps to correct the imbalance, supporting more equitable healthcare outcomes (a second sketch below shows how such a gap could be measured).
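To make the pipeline concrete, here is a minimal sketch of steps 3 and 4, assuming the foundation-model embeddings from step 2 have already been extracted and saved as a NumPy array. The file names, the choice of KMeans, and the cluster count are illustrative assumptions, not the exact settings from the paper:

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans

# Hypothetical file of foundation-model embeddings, shape (n_images, embedding_dim).
embeddings = np.load("cxr_embeddings.npy")

# Step 3: reduce the high-dimensional embeddings to a 2D map with t-SNE.
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
embeddings_2d = tsne.fit_transform(embeddings)

# Step 4: cluster the 2D points; each cluster serves as a proxy demographic group.
# k=2 mirrors a binary attribute such as patient sex; age bands would need a larger k.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
proxy_groups = kmeans.fit_predict(embeddings_2d)

np.save("proxy_groups.npy", proxy_groups)
```

Clustering the t-SNE map rather than the raw embeddings keeps the groups aligned with the visual structure of the projection, which is what makes them interpretable as candidate demographic groups.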
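And here is one way the proxy groups from step 5 could be used to audit a classifier. The per-group AUROC gap shown here is just one possible disparity measure, and the label and prediction files are hypothetical placeholders; the paper's exact evaluation may differ:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.load("labels.npy")        # hypothetical ground-truth labels (0/1) for one finding
y_score = np.load("predictions.npy")  # hypothetical model scores in [0, 1]
proxy_groups = np.load("proxy_groups.npy")

# Compute the model's AUROC separately within each proxy group.
aucs = {}
for g in np.unique(proxy_groups):
    mask = proxy_groups == g
    aucs[g] = roc_auc_score(y_true[mask], y_score[mask])
    print(f"Proxy group {g}: AUROC = {aucs[g]:.3f} (n = {mask.sum()})")

# The gap between the best- and worst-served groups is the fairness signal:
# a large gap suggests the model performs unevenly across the populations
# that the clusters stand in for.
gap = max(aucs.values()) - min(aucs.values())
print(f"AUROC gap across proxy groups: {gap:.3f}")
```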

Explore the Code

Interested in the technical details? Our code is available on GitHub, where you can explore it, replicate our experiments, or contribute to the research. 🧑‍💻 Making Medical Diagnostics Fairer 👩‍💻
 

Marrakech

This work was presented at the Fairness of AI in Medical Imaging (FAIMI) workshop during MICCAI 2024 in Marrakech.
 
[Photos from the presentation at the FAIMI workshop, MICCAI 2024, Marrakech]