Pathology AI

From BloomWiki
Revision as of 14:36, 23 April 2026 by Wordpad (talk | contribs) (BloomWiki: Pathology Ai)

How to read this page: This article maps the topic from beginner to expert across six levels: Remembering, Understanding, Applying, Analyzing, Evaluating, and Creating. Scan the headings to see the full scope, then read from wherever your knowledge starts to feel uncertain. Learn more about how BloomWiki works.

Computational pathology applies deep learning to the analysis of digitized tissue slides — whole-slide images (WSI) captured by digital pathology scanners. Pathology is the gold standard for cancer diagnosis, but it is labor-intensive, subjective, and facing a global workforce shortage. AI can analyze WSIs to classify cancer grade, predict molecular biomarkers, identify cell types, and predict patient survival — performing in seconds tasks that would take a pathologist hours. With FDA-cleared AI tools entering clinical pathology laboratories, the field is transitioning from research to real-world impact.

Remembering

  • Whole Slide Image (WSI) — A digitized pathology slide; gigapixel images (~100,000 × 100,000 pixels) scanned at 20–40× magnification.
  • H&E staining — Hematoxylin and Eosin; the standard pathology stain coloring nuclei blue and cytoplasm pink.
  • IHC (Immunohistochemistry) — Staining technique detecting specific proteins; used for biomarker testing (HER2, PD-L1, ER/PR).
  • Tumor grading — Assessing tumor aggressiveness from histological features; e.g., Gleason score (prostate), Bloom-Richardson (breast).
  • Multiple Instance Learning (MIL) — A weakly-supervised framework handling gigapixel WSI by treating each slide as a bag of smaller patches.
  • Patch-based classification — Dividing WSI into tiles (e.g., 256×256 pixels) and classifying each; used for training with slide-level labels.
  • CLAM (Clustering-constrained Attention Multiple Instance Learning) — A widely used MIL framework for WSI classification.
  • Attention mechanism (pathology) — Identifies which patches are most diagnostically relevant within a slide.
  • PathAI — A commercial computational pathology company with FDA-cleared tools; founded by Andrew Beck.
  • Paige — First FDA-authorized AI for prostate cancer pathology; detects cancer in prostate biopsies.
  • Foundation models (pathology) — CONCH, UNI, Phikon — vision transformers pre-trained on millions of pathology images; strong feature extractors.
  • Pan-cancer classification — Predicting tumor type directly from histology across multiple cancer types.
  • Biomarker prediction from morphology — Predicting molecular alterations (MSI, BRCA mutation, TMB) from H&E histology without molecular testing.
  • Cell segmentation (pathology) — Detecting and classifying individual cells (tumor, immune, stromal) within tissue; HoverNet, StarDist, CellViT.

Understanding

Pathology AI faces a unique challenge: slides are gigapixel-scale images far too large for direct processing by neural networks (a 40× WSI can be 100,000 × 100,000 pixels = 10 billion pixels). Two dominant strategies address this:

Patch-based approaches: Extract thousands of smaller patches (256×256 or 512×512 pixels) from each slide. Train a CNN or ViT on each patch individually. Aggregate patch-level predictions to a slide-level diagnosis. This works but requires patch-level annotations, which are expensive and often unavailable.
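The tiling step can be sketched with a simple loop. This is a minimal illustration that assumes the slide region is already loaded as a NumPy array; real pipelines read patches lazily from the WSI file (e.g. via OpenSlide) rather than loading gigapixels into memory.

```python
import numpy as np

def tile_region(region: np.ndarray, patch_size: int = 256):
    """Split an H x W x 3 image region into non-overlapping square patches.

    Edge remainders that do not fill a full patch are dropped, as is
    common when tiling whole-slide images.
    """
    h, w = region.shape[:2]
    patches, coords = [], []
    for y in range(0, h - patch_size + 1, patch_size):
        for x in range(0, w - patch_size + 1, patch_size):
            patches.append(region[y:y + patch_size, x:x + patch_size])
            coords.append((x, y))  # top-left pixel position of each patch
    return np.stack(patches), coords

# A 1024 x 768 region yields a 4 x 3 grid of 256-pixel patches
region = np.zeros((768, 1024, 3), dtype=np.uint8)
patches, coords = tile_region(region)
print(patches.shape)  # (12, 256, 256, 3)
```

Keeping the `(x, y)` coordinates alongside the patches is what later allows predictions or attention scores to be mapped back onto the slide.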

Multiple Instance Learning (MIL): The dominant approach for slide-level labels. Each slide is a "bag" of patches. The bag label (e.g., cancer present) is known, but which patches contain cancer is unknown. MIL aggregates patch features using attention or pooling to produce a slide-level prediction. CLAM's attention mechanism additionally identifies which patches are driving the prediction — providing weak localization.
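The classic MIL assumption — a bag is positive if any one of its instances is positive — corresponds to max-pooling over per-patch scores. A minimal sketch of this baseline, using random tensors in place of real patch embeddings:

```python
import torch
import torch.nn as nn

class MaxPoolMIL(nn.Module):
    """Simplest MIL baseline: score each patch, take the max over the bag."""
    def __init__(self, feature_dim=1024):
        super().__init__()
        self.patch_scorer = nn.Linear(feature_dim, 1)

    def forward(self, h):              # h: (N, feature_dim) patch embeddings
        scores = self.patch_scorer(h)  # (N, 1) per-patch logits
        # Bag prediction = most suspicious patch; its index gives weak localization
        slide_logit, top_patch = scores.max(dim=0)
        return slide_logit, top_patch

model = MaxPoolMIL()
h = torch.randn(500, 1024)  # 500 patches from one slide
logit, idx = model(h)       # slide-level logit + index of the driving patch
```

Attention pooling, as in CLAM, replaces this hard max with a learned weighted average over all patches, which trains more stably and yields a full set of interpretable patch weights rather than a single index.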

Pathology foundation models: Pre-trained on millions of pathology patches using self-supervised learning (DINO, MAE, DINOv2), models like UNI, CONCH, and Prov-GigaPath learn rich histological feature representations. These serve as feature extractors for downstream tasks with minimal labeled data — a major advance for data-scarce pathology problems.
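The "feature extractor" role amounts to running a frozen encoder over every patch in gradient-free batches. A sketch of that loop, using a small untrained stand-in network where, in practice, a pathology foundation model such as UNI would be loaded via timm and frozen:

```python
import torch
import torch.nn as nn

# Stand-in encoder for illustration only; a real pipeline would load
# a pretrained pathology model (e.g. UNI via timm) in its place.
encoder = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=7, stride=4), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1024),
)
encoder.eval()  # frozen: inference mode, no dropout/batch-norm updates

@torch.no_grad()
def extract_features(patches: torch.Tensor, batch_size: int = 64) -> torch.Tensor:
    """Embed (N, 3, 256, 256) patch tensors into (N, 1024) features."""
    feats = [encoder(patches[i:i + batch_size])
             for i in range(0, len(patches), batch_size)]
    return torch.cat(feats)

patches = torch.rand(130, 3, 256, 256)  # all patches from one slide
features = extract_features(patches)    # (130, 1024), ready for a MIL head
```

Because the encoder is frozen, features for an entire archive can be computed once and cached to disk, after which MIL training touches only the small aggregation head.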

Biomarker prediction from morphology: Neural networks trained on paired (WSI, molecular test result) data can predict molecular biomarkers from histology alone. TCGA-trained models predict microsatellite instability (MSI), BRAF mutation, HER2 amplification, and survival from H&E slides without any molecular testing. These predictions are not yet clinical-grade but suggest deep morphological correlates of molecular biology.

FDA-cleared pathology AI: Paige Prostate is the first FDA-authorized AI for prostate cancer detection. PathAI and other companies have cleared tools for various cancer types. Regulatory scrutiny is high: prospective clinical validation, algorithmic bias testing, and reader studies are required.

Applying

WSI classification with CLAM (MIL): <syntaxhighlight lang="python">
import torch
import torch.nn as nn
import torch.nn.functional as F

class Attn_Net_Gated(nn.Module):
    """Gated attention network for MIL aggregation."""
    def __init__(self, L=1024, D=256, dropout=0.25):
        super().__init__()
        self.attention_a = nn.Sequential(nn.Linear(L, D), nn.Tanh(), nn.Dropout(dropout))
        self.attention_b = nn.Sequential(nn.Linear(L, D), nn.Sigmoid(), nn.Dropout(dropout))
        self.attention_c = nn.Linear(D, 1)

    def forward(self, x):
        a = self.attention_a(x)
        b = self.attention_b(x)
        A = self.attention_c(a * b)  # Gated attention scores
        return A, x  # (N, 1), (N, L)

class CLAM_SB(nn.Module):
    """CLAM single-branch for binary WSI classification."""
    def __init__(self, feature_dim=1024, n_classes=2, dropout=0.25):
        super().__init__()
        self.attention_net = Attn_Net_Gated(L=feature_dim, D=256, dropout=dropout)
        self.classifiers = nn.Linear(feature_dim, n_classes)
        self.instance_classifier = nn.Linear(feature_dim, 2)  # For instance-level clustering

    def forward(self, h):
        # h: (N, feature_dim) — patch embeddings from pre-trained feature extractor
        A, h = self.attention_net(h)
        A = F.softmax(A, dim=0).transpose(0, 1)  # Softmax over patches: (1, N)
        M = torch.mm(A, h)            # Weighted aggregation: (1, feature_dim)
        logits = self.classifiers(M)  # Slide-level prediction
        Y_hat = torch.argmax(logits, dim=1)
        Y_prob = F.softmax(logits, dim=1)
        return logits, Y_prob, Y_hat, A  # A contains attention scores for visualization

# Feature extraction pipeline:
#   1. Segment tissue from background (Otsu thresholding)
#   2. Extract non-overlapping 256×256 patches at 20× magnification
#   3. Extract features with a pathology foundation model (UNI, CONCH, ResNet50-ImageNet)
#   4. Feed patch features to CLAM for the WSI-level prediction

# Using UNI (ViT pre-trained on ~100K pathology slides):
# import timm
# uni = timm.create_model("hf_hub:MahmoodLab/uni", pretrained=True)
</syntaxhighlight>

Computational pathology tools:

  • WSI viewing → QuPath (open-source), Aperio ImageScope, SlideViewer
  • MIL frameworks → CLAM (GitHub), TransMIL, DTFD-MIL
  • Foundation models → UNI, CONCH (Mahmood Lab), Prov-GigaPath (Microsoft/Providence)
  • Cell segmentation → HoverNet, StarDist, CellViT, CellPose
  • Commercial AI → Paige, PathAI, Aiforia, Ibex Medical Analytics

Analyzing

Pathology AI clinical applications:

Application | AI performance | Clinical status
Prostate cancer detection | AUC 0.97 (Paige) | FDA authorized
Breast cancer mitosis counting | Expert-level | CE marked (several)
Colorectal cancer grading | High | Research → clinical
MSI prediction from H&E | AUC ~0.85 | Research
Cell type quantification | High (specialized tools) | Used in trials
Survival prediction | C-index 0.65–0.75 | Research
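The C-index quoted for survival prediction measures the fraction of comparable patient pairs whose predicted risks are correctly ordered. A minimal implementation for uncensored data (real evaluations must also handle censoring, as libraries such as lifelines do):

```python
def concordance_index(times, risks):
    """C-index for uncensored survival data.

    A pair is concordant when the patient who died earlier was assigned
    the higher predicted risk. Ties in predicted risk count as 0.5.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(i + 1, n):
            if times[i] == times[j]:
                continue  # tied survival times are not comparable
            comparable += 1
            # Identify which of the pair had the shorter survival
            short, long = (i, j) if times[i] < times[j] else (j, i)
            if risks[short] > risks[long]:
                concordant += 1.0
            elif risks[short] == risks[long]:
                concordant += 0.5
    return concordant / comparable

# Perfectly anti-correlated risk and survival gives a C-index of 1.0;
# a random risk score hovers near 0.5.
print(concordance_index([2, 5, 9], [0.9, 0.5, 0.1]))  # 1.0
```

Against this scale, the 0.65–0.75 reported for histology-based survival models is meaningfully better than chance (0.5) but well short of perfect ranking.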

Failure modes:

  • Scanner variability: staining protocols and scanner calibration differ across labs, and models overfit to the characteristics of the scanners they were trained on. Stain normalization helps but can itself introduce artifacts.
  • Tumor heterogeneity: biopsies sample only a portion of the tumor, so the AI inherits this sampling bias.
  • Interobserver variability: ground-truth labels from pathologists carry significant disagreement rates.
  • Whole-slide processing bottleneck: gigapixel images require substantial compute infrastructure.
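The stain normalization mentioned above is often approximated by matching per-channel color statistics to a reference slide — the idea behind Reinhard normalization, which does this in LAB color space. A simplified sketch operating directly in RGB:

```python
import numpy as np

def match_stats(image: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Shift and scale each channel of `image` to the reference's mean/std.

    Simplified Reinhard-style normalization done in RGB for illustration;
    the original method converts to LAB space first, and stain-matrix
    approaches (Macenko, Vahadane) are more robust for H&E.
    """
    img = image.astype(np.float64)
    ref = reference.astype(np.float64)
    for c in range(3):
        mu_i, sd_i = img[..., c].mean(), img[..., c].std() + 1e-8
        mu_r, sd_r = ref[..., c].mean(), ref[..., c].std()
        # Standardize the source channel, then rescale to reference stats
        img[..., c] = (img[..., c] - mu_i) / sd_i * sd_r + mu_r
    return np.clip(img, 0, 255).astype(np.uint8)
```

The failure mode noted above applies here too: aggressive statistical matching can wash out genuine staining differences or create unnatural colors, which is why normalization choices are usually validated visually by pathologists.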

Evaluating

Pathology AI evaluation:

  1. AUC per clinical task: detection, grading, biomarker prediction.
  2. Concordance with molecular tests: for biomarker prediction models, compare to IHC/sequencing gold standards.
  3. Reader study: pathologists with and without AI assistance; measure diagnostic accuracy, time, confidence.
  4. Multi-site validation: test on slides from different labs, scanners, preparation protocols.
  5. Attention visualization: inspect which tissue regions drive predictions — should match known diagnostic criteria.
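Attention visualization (step 5) amounts to scattering the MIL model's per-patch attention scores back onto the slide's patch grid. A minimal sketch, assuming patches were extracted on a regular 256-pixel grid with known top-left coordinates:

```python
import numpy as np

def attention_heatmap(coords, attn, slide_shape, patch_size=256):
    """Scatter per-patch attention scores into a low-resolution heatmap.

    coords:      list of (x, y) top-left pixel positions of each patch
    attn:        length-N attention scores from the MIL model (e.g. CLAM's A)
    slide_shape: (height, width) of the slide region in pixels
    """
    h = slide_shape[0] // patch_size
    w = slide_shape[1] // patch_size
    heatmap = np.zeros((h, w))
    for (x, y), a in zip(coords, attn):
        heatmap[y // patch_size, x // patch_size] = a
    # Normalize to [0, 1] so the map can be overlaid on a WSI thumbnail
    rng = heatmap.max() - heatmap.min()
    return (heatmap - heatmap.min()) / rng if rng > 0 else heatmap
```

Upsampling this grid and alpha-blending it over the slide thumbnail produces the heatmaps pathologists inspect when checking that high-attention regions match known diagnostic criteria.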

Creating

Building a pathology AI pipeline:

  1. Data: collect WSIs with slide-level labels (diagnosis, grade, biomarker status) from pathology archive.
  2. Preprocessing: tissue segmentation, patch extraction at 256×256 / 20×, feature extraction with UNI or CONCH.
  3. MIL training: CLAM with 5-fold cross-validation; attention-based pooling.
  4. Interpretability: generate attention heatmaps overlaid on WSI; pathologist verification.
  5. Bias audit: evaluate performance across patient demographics.
  6. Clinical validation: prospective reader study at target institution.
  7. Regulatory: work with regulatory consultant on FDA 510(k) or De Novo pathway.