Medical Segmentation

From BloomWiki
<div style="background-color: #4B0082; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
{{BloomIntro}}
Medical image segmentation is the task of delineating anatomical structures, pathological regions, or objects of interest in medical images — identifying exactly which pixels belong to a tumor, organ, lesion, or cell. Unlike classification (which assigns one label to an entire image) or detection (which localizes objects with bounding boxes), segmentation produces pixel-wise (2D) or voxel-wise (3D) masks with exact boundaries. It is a foundational task enabling quantitative radiology, radiotherapy planning, surgical navigation, and computational pathology.
</div>


__TOC__

<div style="background-color: #000080; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Remembering</span> ==
* '''Segmentation mask''' — A pixel-wise (or voxel-wise in 3D) map labeling each image element with its class.
* '''Semantic segmentation''' — Labeling every pixel with a class (e.g., liver, tumor, background); no instance distinction.
* '''Instance segmentation''' — Distinguishing individual instances of the same class (e.g., each separate cell nucleus).
* '''Panoptic segmentation''' — Combines semantic and instance segmentation; labels all pixels with a class and an instance ID.
* '''U-Net''' — An encoder-decoder architecture with skip connections; the dominant framework for medical image segmentation.
* '''Skip connections''' — Direct connections from encoder to decoder that preserve high-resolution spatial features.
* '''V-Net''' — 3D extension of U-Net for volumetric medical image segmentation.
* '''nnU-Net''' — A self-configuring U-Net framework that automatically adapts to any medical imaging dataset; widely used baseline.
* '''Intersection over Union (IoU)''' — Standard segmentation metric: area of overlap / area of union between predicted and ground-truth masks.
* '''Dice coefficient''' — 2 × |A ∩ B| / (|A| + |B|); equivalent to the F1 score for segmentation; used in Dice loss.
* '''Dice loss''' — 1 − Dice coefficient; directly optimizes the Dice metric; better suited than cross-entropy for imbalanced segmentation.
* '''CT (Computed Tomography)''' — 3D medical imaging using X-rays; voxel-based volumetric data.
* '''MRI (Magnetic Resonance Imaging)''' — 3D imaging using magnetic fields; soft-tissue contrast superior to CT.
* '''Histopathology''' — Microscopic study of tissue; whole-slide images (WSI) can be gigapixel-scale.
* '''SAM (Segment Anything Model)''' — Meta's foundation model for promptable segmentation; adapted to medical imaging (MedSAM, SAM-Med).
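The Dice and IoU definitions above are easy to verify on toy masks. A minimal NumPy sketch (function names are illustrative, not from any library):

```python
import numpy as np

def iou(pred, gt):
    """Intersection over Union between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    return (pred & gt).sum() / (pred | gt).sum()

def dice(pred, gt):
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|); the F1 score over pixels."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    return 2 * (pred & gt).sum() / (pred.sum() + gt.sum())

# Two overlapping masks: prediction and ground truth each cover 6 pixels, overlap is 4
gt = np.zeros((4, 4), dtype=np.uint8)
gt[1:3, 0:3] = 1          # 6 pixels
pred = np.zeros((4, 4), dtype=np.uint8)
pred[1:3, 1:4] = 1        # 6 pixels, 4 of them overlapping gt

print(dice(pred, gt))     # 2*4 / (6+6) ≈ 0.667
print(iou(pred, gt))      # 4 / 8 = 0.5
```

Note that Dice is always at least as large as IoU for the same pair of masks, which is one reason segmentation papers must state which metric they report.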
</div>


<div style="background-color: #006400; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Understanding</span> ==
Medical image segmentation is uniquely challenging:
# '''3D data''': CT and MRI scans are 3D volumes (e.g., 512×512×400 voxels), requiring 3D models or slice-by-slice processing.
# '''Rare structures''': organs and lesions occupy small fractions of the image volume, causing extreme class imbalance.
# '''Annotator variability''': expert physicians disagree on exact boundaries; ground truth itself is uncertain.
# '''Domain shift''': models trained on one hospital's scanner fail on another's due to acquisition differences.
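The class-imbalance challenge above is why Dice loss is usually preferred over plain cross-entropy: when the foreground occupies a tiny fraction of the volume, Dice scores overlap with the rare class rather than per-voxel counts. A minimal NumPy sketch of a soft Dice loss (illustrative, not a library API):

```python
import numpy as np

def soft_dice_loss(prob, gt, eps=1e-6):
    """1 - soft Dice between predicted foreground probabilities and a binary mask."""
    inter = (prob * gt).sum()
    return 1 - (2 * inter + eps) / (prob.sum() + gt.sum() + eps)

# Toy volume with extreme class imbalance: 64 foreground voxels out of 262,144
gt = np.zeros((64, 64, 64))
gt[30:34, 30:34, 30:34] = 1

# Predicting "all background" is 99.98% pixel-accurate, yet clinically useless:
# accuracy barely notices, while the Dice loss sits near its maximum.
all_bg = np.zeros_like(gt)
print(soft_dice_loss(all_bg, gt))   # ≈ 1.0

# A confident prediction on the lesion itself scores well despite its tiny size
good = np.where(gt == 1, 0.9, 0.0)
print(soft_dice_loss(good, gt))     # ≈ 0.05
```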


'''U-Net: the standard framework''': The U-Net (Ronneberger et al., 2015) revolutionized medical segmentation. Its encoder-decoder structure with skip connections was designed specifically for small datasets — typical in medical AI. The encoder extracts features at multiple scales; the decoder progressively upsamples to full resolution; skip connections inject high-resolution encoder features into the decoder to preserve spatial detail. Despite its age, U-Net variants still dominate medical segmentation benchmarks.

'''nnU-Net (no-new-U-Net)''': A self-configuring framework that automatically determines preprocessing, architecture, training, and postprocessing for any new medical dataset. It achieved state-of-the-art on 23 of 23 medical segmentation tasks in a comprehensive benchmark, often outperforming task-specific models. nnU-Net is now the de facto starting point for new medical segmentation problems.

'''Medical SAM''': Meta's Segment Anything Model provides interactive, prompt-based segmentation. MedSAM fine-tunes SAM on 1.5M medical image-mask pairs, enabling zero-shot and prompted segmentation of medical structures. SAM-Med2D and SAM-Med3D extend this to 3D volumetric medical images.

'''Universal medical segmentation''': Models like TotalSegmentator (trained to segment 117 anatomical structures in CT) and Segment Anything in Medical Images (SAMM) aim for broad, generalizable segmentation without task-specific fine-tuning — a major step toward clinical utility.
</div>


<div style="background-color: #8B0000; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Applying</span> ==
'''Medical image segmentation with nnU-Net:'''
<syntaxhighlight lang="python">
# nnU-Net: self-configuring framework for medical segmentation
# pip install nnunetv2

# Step 1: Prepare dataset in nnU-Net format.
# The dataset must be organized as:
#   nnUNet_raw/Dataset001_Liver/
#     imagesTr/     -- training images (NIfTI format: .nii.gz)
#     labelsTr/     -- training segmentation masks
#     imagesTs/     -- test images
#     dataset.json  -- metadata file
import json

dataset_info = {
    "name": "LiverTumor",
    "description": "Liver and tumor segmentation from CT scans",
    "reference": "Medical Segmentation Decathlon",
    "licence": "CC-BY-SA 4.0",
    "channel_names": {"0": "CT"},
    "labels": {"background": 0, "liver": 1, "tumor": 2},
    "numTraining": 131,
    "file_ending": ".nii.gz"
}

# Step 2: Plan and preprocess
#   nnUNetv2_plan_and_preprocess -d 001 --verify_dataset_integrity

# Step 3: Train (nnU-Net auto-selects architecture: 2D, 3D full res, 3D low res, cascade)
#   nnUNetv2_train 001 3d_fullres 0 --npz    # fold 0 of 5-fold CV

# Step 4: Predict on new data
#   nnUNetv2_predict -i /path/to/imagesTs -o /path/to/output \
#       -d 001 -c 3d_fullres --save_probabilities

# Custom PyTorch U-Net for teaching purposes
import torch
import torch.nn as nn

class DoubleConv(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, 3, padding=1), nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True),
            nn.Conv3d(out_ch, out_ch, 3, padding=1), nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True)
        )

    def forward(self, x):
        return self.conv(x)

class UNet3D(nn.Module):
    def __init__(self, in_ch=1, out_ch=3, features=[32, 64, 128, 256]):
        super().__init__()
        self.encoders = nn.ModuleList(
            [DoubleConv(in_ch if i == 0 else features[i - 1], features[i]) for i in range(len(features))]
        )
        self.pool = nn.MaxPool3d(2)
        self.decoders = nn.ModuleList(
            [nn.ConvTranspose3d(features[i], features[i - 1], 2, stride=2) for i in range(len(features) - 1, 0, -1)]
        )
        self.dec_convs = nn.ModuleList(
            [DoubleConv(features[i], features[i - 1]) for i in range(len(features) - 1, 0, -1)]
        )
        self.head = nn.Conv3d(features[0], out_ch, 1)

    def forward(self, x):
        skips = []
        for enc in self.encoders[:-1]:
            x = enc(x)
            skips.append(x)
            x = self.pool(x)
        x = self.encoders[-1](x)
        for up, conv, skip in zip(self.decoders, self.dec_convs, reversed(skips)):
            x = up(x)
            x = torch.cat([x, skip], dim=1)
            x = conv(x)
        return self.head(x)
</syntaxhighlight>

'''Medical segmentation tools'''
: '''Self-configuring''' → nnU-Net v2 (start here for any new task)
: '''Interactive/prompted''' → MedSAM, SAM-Med2D, SAM-Med3D
: '''Universal anatomy''' → TotalSegmentator (117 CT structures), MONAI Label
: '''Pathology (WSI)''' → CLAM, HoverNet (nucleus segmentation), CONCH
: '''Research framework''' → MONAI (Medical Open Network for AI) — PyTorch-based
</div>


<div style="background-color: #8B4500; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Analyzing</span> ==
{| class="wikitable"
|+ Medical Segmentation Performance Comparison
! Task !! Best Method !! Dice Score !! Clinical Threshold
|-
| Liver (CT) || nnU-Net 3D || 97% || >95%
|-
| Cardiac (MRI) || nnU-Net 3D || 92% || >90%
|-
| Brain tumor (MRI) || nnU-Net + ensemble || 88% (whole tumor) || >85%
|-
| Lung lesion (CT) || nnU-Net cascade || 73% || >70% (task-dependent)
|-
| Cell nuclei (histo) || HoverNet || 82% (instance) || Task-dependent
|}


'''Failure modes''': Domain shift between training and test scanners causes catastrophic performance drops (Dice drops of 20–30 points). Annotator disagreement — models trained on one annotator's labeling style fail on another's labels. Rare findings — lesions with fewer than 10 training examples are segmented unreliably. Out-of-distribution pathology — novel disease variants absent from the training data.
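The domain-shift failure mode can be illustrated with a toy threshold "model": a segmenter calibrated on one scanner's intensity distribution degrades sharply when test intensities are offset. This is a hypothetical simulation with synthetic data, not real scanner behavior:

```python
import numpy as np

rng = np.random.default_rng(0)

def dice(pred, gt):
    pred, gt = pred.astype(bool), gt.astype(bool)
    return 2 * (pred & gt).sum() / (pred.sum() + gt.sum())

# Ground-truth lesion mask in a 2D "slice"
gt = np.zeros((128, 128), dtype=np.uint8)
gt[40:70, 50:90] = 1

# "Scanner A": background ~ N(100, 10), lesion ~ N(200, 10)
img_a = rng.normal(100, 10, gt.shape) + gt * 100
# "Scanner B": same anatomy, but a global intensity offset (different calibration)
img_b = img_a + 60

threshold = 150                       # calibrated on scanner A
print(dice(img_a > threshold, gt))    # high Dice in-domain
print(dice(img_b > threshold, gt))    # collapses under the intensity shift
```

Real CT/MRI domain shift involves resolution, noise texture, and contrast protocol differences, not just a global offset, which is why multi-site validation is emphasized below.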
</div>


<div style="background-color: #483D8B; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Evaluating</span> ==
Medical segmentation evaluation:
# '''Dice coefficient''': primary metric; report per-structure for multi-class tasks.
# '''Hausdorff Distance 95th percentile (HD95)''': measures boundary accuracy; complements Dice for clinical relevance.
# '''Volume error''': absolute and relative volume difference; clinically important for radiotherapy.
# '''Prospective clinical validation''': test in the actual clinical workflow with prospective cases.
# '''Inter-observer variability''': compare the model's disagreement with experts to human–human disagreement — the model's error should not exceed inter-expert variability.
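HD95 from item (2) can be computed directly from the two boundaries: take directed distances from each boundary point to the nearest point of the other boundary, and report the 95th percentile of the pooled distances. A minimal NumPy sketch for small 2D masks (real projects typically use a maintained implementation such as MONAI's; function names here are illustrative):

```python
import numpy as np

def boundary_points(mask):
    """Coordinates of mask pixels with at least one background 4-neighbour."""
    m = mask.astype(bool)
    pad = np.pad(m, 1)
    interior = pad[:-2, 1:-1] & pad[2:, 1:-1] & pad[1:-1, :-2] & pad[1:-1, 2:]
    return np.argwhere(m & ~interior)

def hd95(pred, gt):
    """95th-percentile symmetric Hausdorff distance between two binary masks."""
    a, b = boundary_points(pred), boundary_points(gt)
    # pairwise Euclidean distances between the two boundary point sets
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    directed = np.concatenate([d.min(axis=1), d.min(axis=0)])
    return np.percentile(directed, 95)

gt = np.zeros((32, 32))
gt[8:24, 8:24] = 1
pred = np.zeros((32, 32))
pred[10:26, 8:24] = 1      # the same square shifted down by 2 pixels

print(hd95(pred, gt))
```

Unlike Dice, HD95 is reported in physical units once voxel spacing is applied, which is why it complements overlap metrics for boundary-critical tasks such as radiotherapy contouring.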
</div>


<div style="background-color: #2F4F4F; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Creating</span> ==
Deploying medical segmentation AI:
# Start with nnU-Net — it auto-configures and routinely beats custom models.
# Data: minimum 30–50 annotated cases; more for rare structures/pathology.
# Multi-site validation: test on data from different hospitals/scanners than training.
# Clinical integration: DICOM RT-STRUCT output for radiotherapy; FHIR integration for EHR.
# QA workflow: every AI segmentation is reviewed and approved by a radiologist before clinical use.
# Regulatory: FDA 510(k) or CE Mark required for clinical deployment in US/EU; document training data, performance, and bias analysis.


[[Category:Artificial Intelligence]]
[[Category:Medical Imaging]]
[[Category:Segmentation]]
</div>

Latest revision as of 01:53, 25 April 2026
