Medical Segmentation
[[Category:Artificial Intelligence]]
[[Category:Medical Imaging]]
[[Category:Segmentation]]
Revision as of 14:36, 23 April 2026
How to read this page: This article maps the topic from beginner to expert across six levels: Remembering, Understanding, Applying, Analyzing, Evaluating, and Creating. Scan the headings to see the full scope, then read from wherever your knowledge starts to feel uncertain.
Medical image segmentation is the task of delineating anatomical structures, pathological regions, or objects of interest in medical images — identifying exactly which pixels belong to a tumor, organ, lesion, or cell. Unlike classification (which assigns one label to an entire image) or detection (which localizes objects with bounding boxes), segmentation produces pixel-wise (2D) or voxel-wise (3D) masks with exact boundaries. It is a foundational task enabling quantitative radiology, radiotherapy planning, surgical navigation, and computational pathology.
Remembering
- Segmentation mask — A pixel-wise (or voxel-wise in 3D) map labeling each image element with its class.
- Semantic segmentation — Labeling every pixel with a class (e.g., liver, tumor, background); no instance distinction.
- Instance segmentation — Distinguishing individual instances of the same class (e.g., each separate cell nucleus).
- Panoptic segmentation — Combines semantic and instance segmentation; labels all pixels with class and instance ID.
- U-Net — An encoder-decoder architecture with skip connections; the dominant framework for medical image segmentation.
- Skip connections — Direct connections from encoder to decoder that preserve high-resolution spatial features.
- V-Net — 3D extension of U-Net for volumetric medical image segmentation.
- nnU-Net — A self-configuring U-Net framework that automatically adapts to any medical imaging dataset; widely used baseline.
- Intersection over Union (IoU) — A standard overlap metric: area of overlap / area of union between predicted and ground-truth masks.
- Dice coefficient — 2 × |A ∩ B| / (|A| + |B|); equivalent to F1 score for segmentation; used in Dice loss.
- Dice loss — 1 − Dice coefficient, computed on soft (probabilistic) predictions; directly optimizes the Dice metric and typically outperforms plain cross-entropy under class imbalance.
- CT (Computed Tomography) — 3D medical imaging using X-rays; voxel-based volumetric data.
- MRI (Magnetic Resonance Imaging) — 3D imaging using magnetic fields; soft tissue contrast superior to CT.
- Histopathology — Microscopic study of tissue; whole-slide images (WSI) can be gigapixel-scale.
- SAM (Segment Anything Model) — Meta's foundation model for promptable segmentation; adapted to medical imaging (MedSAM, SAM-Med).
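The two overlap metrics above are closely related (Dice = 2·IoU / (1 + IoU)). A minimal NumPy sketch, illustrative rather than taken from any particular library:

```python
import numpy as np

def dice(pred, gt):
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|) for binary masks."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

def iou(pred, gt):
    """Intersection over Union: |A ∩ B| / |A ∪ B| for binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union

# Two 16-pixel squares offset by one pixel: overlap 9, union 23
pred = np.zeros((8, 8), dtype=bool); pred[2:6, 2:6] = True
gt   = np.zeros((8, 8), dtype=bool); gt[3:7, 3:7] = True
print(dice(pred, gt))  # 2*9/32 = 0.5625
print(iou(pred, gt))   # 9/23 ≈ 0.391
```

Note that Dice is always at least as large as IoU for the same masks, which is one reason reported Dice scores look more flattering than IoU on the same predictions.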
Understanding
Medical image segmentation is uniquely challenging:
- 3D data: CT and MRI scans are 3D volumes (e.g., 512×512×400 voxels), requiring 3D models or slice-by-slice processing.
- Rare structures: organs and lesions occupy small fractions of the image volume, causing extreme class imbalance.
- Annotator variability: expert physicians disagree on exact boundaries; ground truth itself is uncertain.
- Domain shift: models trained on one hospital's scanner fail on another's due to acquisition differences.
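The class-imbalance point can be made concrete: a small lesion occupies a vanishing fraction of a volume's voxels, which is why per-voxel cross-entropy can be dominated by background and Dice loss is popular. A toy NumPy illustration (the volume size and lesion radius are made up for the example):

```python
import numpy as np

# Hypothetical 128x128x128 volume containing a spherical "lesion" of radius 4 voxels
shape = (128, 128, 128)
zz, yy, xx = np.ogrid[:shape[0], :shape[1], :shape[2]]
lesion = ((zz - 64) ** 2 + (yy - 64) ** 2 + (xx - 64) ** 2) <= 4 ** 2

frac = lesion.sum() / lesion.size
print(f"foreground voxels: {lesion.sum()} of {lesion.size} ({frac:.5%})")
# A constant all-background prediction is >99.9% voxel-accurate here, yet has Dice 0.
```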
U-Net: the standard framework: The U-Net (Ronneberger et al., 2015) revolutionized medical segmentation. Its encoder-decoder structure with skip connections was designed specifically for small datasets — typical in medical AI. The encoder extracts features at multiple scales; the decoder progressively upsamples to full resolution; skip connections inject high-resolution encoder features into the decoder to preserve spatial detail. Despite its age, U-Net variants still dominate medical segmentation benchmarks.
nnU-Net (no-new-U-Net): A self-configuring framework that automatically determines preprocessing, architecture, training, and postprocessing for any new medical dataset. It achieved state-of-the-art on 23 of 23 medical segmentation tasks in a comprehensive benchmark, often outperforming task-specific models. nnU-Net is now the de facto starting point for new medical segmentation problems.
Medical SAM: Meta's Segment Anything Model provides interactive, prompt-based segmentation. MedSAM fine-tunes SAM on 1.5M medical image-mask pairs, enabling zero-shot and prompted segmentation of medical structures. SAM-Med2D and SAM-Med3D extend this to 3D volumetric medical images.
Universal medical segmentation: Models like TotalSegmentator (trained to segment 117 anatomical structures in CT) and Segment Anything in Medical Images (SAMM) aim for broad, generalizable segmentation without task-specific fine-tuning — a major step toward clinical utility.
Applying
Medical image segmentation with nnU-Net: <syntaxhighlight lang="python">
# nnU-Net: self-configuring framework for medical segmentation
# pip install nnunetv2

# Step 1: Prepare the dataset in nnU-Net format.
# The dataset must be organized as:
#   nnUNet_raw/Dataset001_Liver/
#     imagesTr/      -- training images (NIfTI format: .nii.gz)
#     labelsTr/      -- training segmentation masks
#     imagesTs/      -- test images
#     dataset.json   -- metadata file
import json

dataset_info = {
    "name": "LiverTumor",
    "description": "Liver and tumor segmentation from CT scans",
    "reference": "Medical Segmentation Decathlon",
    "licence": "CC-BY-SA 4.0",
    "channel_names": {"0": "CT"},
    "labels": {"background": 0, "liver": 1, "tumor": 2},
    "numTraining": 131,
    "file_ending": ".nii.gz",
}

# Step 2: Plan and preprocess (shell):
#   nnUNetv2_plan_and_preprocess -d 001 --verify_dataset_integrity

# Step 3: Train; nnU-Net auto-selects the configuration
# (2D, 3D full res, 3D low res, or cascade). Fold 0 of 5-fold CV:
#   nnUNetv2_train 001 3d_fullres 0 --npz

# Step 4: Predict on new data:
#   nnUNetv2_predict -i /path/to/imagesTs -o /path/to/output \
#       -d 001 -c 3d_fullres --save_probabilities

# Custom PyTorch 3D U-Net for teaching purposes
import torch
import torch.nn as nn

class DoubleConv(nn.Module):
    """Two 3x3x3 convolutions, each followed by BatchNorm and ReLU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm3d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv3d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm3d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.conv(x)

class UNet3D(nn.Module):
    def __init__(self, in_ch=1, out_ch=3, features=(32, 64, 128, 256)):
        super().__init__()
        # Encoder: DoubleConv blocks with max-pool downsampling between them
        self.encoders = nn.ModuleList(
            [DoubleConv(in_ch if i == 0 else features[i - 1], features[i])
             for i in range(len(features))]
        )
        self.pool = nn.MaxPool3d(2)
        # Decoder: transposed convs upsample; DoubleConv fuses the concatenated skip
        self.decoders = nn.ModuleList(
            [nn.ConvTranspose3d(features[i], features[i - 1], 2, stride=2)
             for i in range(len(features) - 1, 0, -1)]
        )
        self.dec_convs = nn.ModuleList(
            [DoubleConv(features[i], features[i - 1])
             for i in range(len(features) - 1, 0, -1)]
        )
        self.head = nn.Conv3d(features[0], out_ch, 1)

    def forward(self, x):
        skips = []
        for enc in self.encoders[:-1]:
            x = enc(x)
            skips.append(x)
            x = self.pool(x)
        x = self.encoders[-1](x)  # bottleneck
        for up, conv, skip in zip(self.decoders, self.dec_convs, reversed(skips)):
            x = up(x)
            x = torch.cat([x, skip], dim=1)  # skip connection
            x = conv(x)
        return self.head(x)
</syntaxhighlight>
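Full 3D volumes rarely fit in GPU memory, so inference is usually run on overlapping patches whose predictions are stitched back together (nnU-Net does this internally, with Gaussian rather than uniform weighting of overlaps). A simplified NumPy sketch of patch-wise prediction with uniform averaging; the `model` here is any callable on a patch, not a real network:

```python
import numpy as np

def sliding_window_predict(volume, model, patch=64, stride=32):
    """Run `model` on overlapping cubic patches; average where patches overlap."""
    D, H, W = volume.shape
    out = np.zeros(volume.shape, dtype=float)
    count = np.zeros(volume.shape, dtype=float)
    for z in range(0, max(D - patch, 0) + 1, stride):
        for y in range(0, max(H - patch, 0) + 1, stride):
            for x in range(0, max(W - patch, 0) + 1, stride):
                p = volume[z:z + patch, y:y + patch, x:x + patch]
                out[z:z + patch, y:y + patch, x:x + patch] += model(p)
                count[z:z + patch, y:y + patch, x:x + patch] += 1
    return out / np.maximum(count, 1)

# Toy "model": a per-voxel intensity threshold standing in for a network
vol = np.random.default_rng(0).random((96, 96, 96))
prob = sliding_window_predict(vol, lambda p: (p > 0.5).astype(float))
```

With patch 64 and stride 32 on a 96-voxel axis, patches start at 0 and 32 and cover the axis exactly; real pipelines also pad volumes whose sides are not covered by the stride grid.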
Medical segmentation tools:
- Self-configuring → nnU-Net v2 (start here for any new task)
- Interactive/prompted → MedSAM, SAM-Med2D, SAM-Med3D
- Universal anatomy → TotalSegmentator (117 CT structures), MONAI Label
- Pathology (WSI) → CLAM, HoverNet (nucleus segmentation), CONCH
- Research framework → MONAI (Medical Open Network for AI) — PyTorch-based
Analyzing
| Task | Best Method | Dice Score | Clinical Threshold |
|---|---|---|---|
| Liver (CT) | nnU-Net 3D | 97% | >95% |
| Cardiac (MRI) | nnU-Net 3D | 92% | >90% |
| Brain tumor (MRI) | nnU-Net + ensemble | 88% (whole tumor) | >85% |
| Lung lesion (CT) | nnU-Net cascade | 73% | >70% (task-dependent) |
| Cell nuclei (histo) | HoverNet | 82% (instance) | Task-dependent |
Failure modes: Domain shift between training and test scanners causes catastrophic performance drops (Dice drop of 20-30%). Annotator disagreement — models trained on one annotator's style fail with another's labels. Rare finding segmentation — lesions with <10 training examples are unreliable. Out-of-distribution pathology — novel disease variants not in training data.
Evaluating
Medical segmentation evaluation:
- Dice coefficient: primary metric; report per-structure for multi-class tasks.
- Hausdorff Distance 95th percentile (HD95): measures boundary accuracy; complements Dice for clinical relevance.
- Volume error: absolute and relative volume difference; clinically important for radiotherapy.
- Prospective clinical validation: test in the actual clinical workflow with prospective cases.
- Inter-observer variability: compare model performance to human-human disagreement — model should not exceed human disagreement.
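HD95 and volume error can be sketched with brute-force NumPy on small masks. Production pipelines use distance transforms over surface voxels; this all-pairs version over foreground points is for illustration only:

```python
import numpy as np

def hd95(mask_a, mask_b, spacing=1.0):
    """95th-percentile symmetric distance between two small binary masks."""
    pts_a = np.argwhere(mask_a) * spacing
    pts_b = np.argwhere(mask_b) * spacing
    d = np.linalg.norm(pts_a[:, None, :] - pts_b[None, :, :], axis=-1)
    d_ab = d.min(axis=1)  # each point of A to its nearest point of B
    d_ba = d.min(axis=0)  # and vice versa
    return np.percentile(np.concatenate([d_ab, d_ba]), 95)

def volume_error(pred, gt, voxel_vol=1.0):
    """Absolute and relative volume difference."""
    vp, vg = pred.sum() * voxel_vol, gt.sum() * voxel_vol
    return abs(vp - vg), abs(vp - vg) / vg

a = np.zeros((16, 16), bool); a[4:8, 4:8] = True
b = np.zeros((16, 16), bool); b[5:9, 5:9] = True  # same size, shifted by (1, 1)
print(hd95(a, b))          # between 1.0 and sqrt(2) for this 1-voxel diagonal shift
print(volume_error(a, b))  # (0.0, 0.0): identical volumes, different locations
```

The last line is the reason volume error alone is insufficient: a prediction can have exactly the right volume in exactly the wrong place, which Dice and HD95 will catch.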
Creating
Deploying medical segmentation AI:
- Start with nnU-Net — it auto-configures and routinely beats custom models.
- Data: minimum 30–50 annotated cases; more for rare structures/pathology.
- Multi-site validation: test on data from different hospitals/scanners than training.
- Clinical integration: DICOM RT-STRUCT output for radiotherapy; FHIR integration for EHR.
- QA workflow: every AI segmentation reviewed and approved by radiologist before clinical use.
- Regulatory: FDA 510(k) or CE Mark required for clinical deployment in US/EU; document training data, performance, and bias analysis.
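The multi-site validation step above can be automated as a gate in a validation script. A hedged sketch: the site names, per-case scores, and the 0.85 threshold are invented for illustration, and a real gate would be set per task with clinicians:

```python
# Hypothetical per-case Dice scores grouped by acquisition site.
dice_scores = {
    "site_A_train_domain": [0.96, 0.95, 0.97, 0.94],
    "site_B_external":     [0.91, 0.88, 0.90, 0.89],
    "site_C_external":     [0.72, 0.70, 0.75, 0.68],  # possible domain shift
}

CLINICAL_THRESHOLD = 0.85  # made-up gate for the example

def site_report(scores, threshold):
    """Flag sites whose mean Dice falls below the clinical gate."""
    report = {}
    for site, vals in scores.items():
        mean = sum(vals) / len(vals)
        report[site] = (round(mean, 3), mean >= threshold)
    return report

for site, (mean, ok) in site_report(dice_scores, CLINICAL_THRESHOLD).items():
    print(f"{site}: mean Dice {mean} -> "
          f"{'PASS' if ok else 'FAIL: investigate domain shift'}")
```

A per-site breakdown like this is what catches the 20-30% external-site Dice drops described under failure modes; a pooled average across sites would hide them.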