Deep Learning for Medical Image Segmentation
Medical image segmentation is one of the most impactful applications of deep learning in healthcare. Accurate segmentation of anatomical structures — like the pancreas in CT scans — enables clinicians to detect tumors earlier, plan surgeries more precisely, and monitor treatment progress automatically.
In this post I'll share key insights from my research on automatic pancreas segmentation using a two-phase Mask R-CNN and 3D U-Net approach, published in Computer Methods and Programs in Biomedicine (2021).
Why Is Pancreas Segmentation So Hard?
The pancreas is notoriously difficult to segment automatically:
- Small and irregular shape — unlike the liver or heart, the pancreas has no consistent geometry
- Low contrast — it blends with surrounding fat and other soft tissue in CT
- High inter-patient variability — size and location vary dramatically between individuals
These challenges render classical algorithms (thresholding, region growing) largely ineffective: the pancreas has no reliable intensity or shape prior to exploit. Deep learning is currently the most realistic path forward.
The Two-Phase Approach
Our solution breaks the problem into two stages:
Phase 1 — Coarse Localization (Mask R-CNN)
First, we use Mask R-CNN to roughly locate the pancreas region in each CT slice. This dramatically reduces the search space and eliminates irrelevant background anatomy.
```python
# Conceptual pseudocode: Phase 1 localizes the pancreas per slice,
# then the detected box (plus a safety margin) crops the full volume
roi = mask_rcnn.detect_pancreas_region(ct_slice)
cropped_volume = extract_roi(full_ct_volume, roi, margin=20)
```
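The `extract_roi` helper above is pseudocode; a minimal NumPy sketch of what such a crop might look like is shown below. The `(y0, y1, x0, x1)` box format and the per-volume (rather than per-slice-merged) cropping are assumptions for illustration, not the paper's exact pipeline.

```python
import numpy as np

def extract_roi(volume, roi, margin=20):
    """Crop a CT volume to a 2D bounding box (y0, y1, x0, x1),
    padded by `margin` voxels and clipped to the volume bounds.
    Hypothetical helper matching the pseudocode above."""
    y0, y1, x0, x1 = roi
    _, h, w = volume.shape  # (slices, height, width)
    y0 = max(y0 - margin, 0)
    x0 = max(x0 - margin, 0)
    y1 = min(y1 + margin, h)
    x1 = min(x1 + margin, w)
    return volume[:, y0:y1, x0:x1]

# Toy example: a 10-slice 512x512 volume, box around rows 200:300, cols 250:350
vol = np.zeros((10, 512, 512), dtype=np.float32)
crop = extract_roi(vol, (200, 300, 250, 350), margin=20)
```

Clipping at the volume edge matters in practice: a pancreas ROI near the image border would otherwise produce negative indices and silently wrap around.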
Phase 2 — Fine Segmentation (3D U-Net)
The cropped region is then fed into a 3D U-Net, which operates on volumetric patches and captures 3D spatial context — critical for a structure as thin and irregular as the pancreas.
```python
# Phase 2: the 3D U-Net segments the cropped ROI volume,
# and post-processing cleans up the raw prediction
segmentation_mask = unet_3d.predict(cropped_volume)
final_mask = postprocess(segmentation_mask)
```
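Because 3D networks rarely fit a whole CT volume in memory, inference on volumetric patches is typically done with a sliding window whose overlapping predictions are averaged. The sketch below illustrates that idea in plain NumPy; the patch and stride sizes, and the `predict_patch` callable, are placeholder assumptions rather than the paper's settings.

```python
import numpy as np

def sliding_window_predict(volume, predict_patch,
                           patch=(32, 64, 64), stride=(16, 32, 32)):
    """Tile `volume` into overlapping 3D patches, run `predict_patch`
    on each, and average the overlapping probability maps.
    Assumes the stride tiles the volume exactly; a production version
    would also handle a final offset at each border."""
    out = np.zeros(volume.shape, dtype=np.float32)
    counts = np.zeros(volume.shape, dtype=np.float32)
    D, H, W = volume.shape
    pd, ph, pw = patch
    sd, sh, sw = stride
    for z in range(0, max(D - pd, 0) + 1, sd):
        for y in range(0, max(H - ph, 0) + 1, sh):
            for x in range(0, max(W - pw, 0) + 1, sw):
                sl = (slice(z, z + pd), slice(y, y + ph), slice(x, x + pw))
                out[sl] += predict_patch(volume[sl])
                counts[sl] += 1
    return out / np.maximum(counts, 1)
```

Averaging overlapping windows smooths prediction seams at patch borders, which is especially helpful for thin, elongated structures like the pancreas.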
Key Results
Our approach achieved a Dice Similarity Coefficient (DSC) of ~85% on the NIH Pancreas-CT dataset — competitive with state-of-the-art methods at the time, with significantly reduced false positives compared to single-stage approaches.
| Method | DSC (%) | Hausdorff Distance (mm) |
|---|---|---|
| Single-stage U-Net | 78.3 | 14.2 |
| Two-phase (ours) | 85.1 | 9.8 |
| nnU-Net (reference) | 86.4 | 9.1 |
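For readers unfamiliar with the metric, the Dice Similarity Coefficient measures the overlap between predicted and ground-truth masks: DSC = 2|A ∩ B| / (|A| + |B|). A minimal NumPy implementation:

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice Similarity Coefficient between two binary masks:
    DSC = 2 * |A intersect B| / (|A| + |B|), in [0, 1]."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return 2.0 * inter / (pred.sum() + target.sum() + eps)

# Two 4-pixel squares sharing a 2-pixel column overlap:
# DSC = 2 * 2 / (4 + 4) = 0.5
a = np.zeros((4, 4), dtype=bool); a[1:3, 1:3] = True
b = np.zeros((4, 4), dtype=bool); b[1:3, 2:4] = True
score = dice_coefficient(a, b)
```

A DSC of 85% therefore means the predicted pancreas mask overlaps the radiologist's annotation substantially, though not perfectly, at the voxel level.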
Lessons Learned
- Coarse-to-fine always helps for small structures — don't try to segment directly from the full volume
- 3D context matters — 2D slice-by-slice approaches miss inter-slice information
- Data augmentation is essential — we applied random rotations, elastic deformations, and intensity shifts
- Post-processing removes noise — connected component analysis and morphological operations clean up predictions
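The connected-component cleanup mentioned in the last lesson can be sketched in pure NumPy, a hedged stand-in for what a real pipeline would typically do with `scipy.ndimage.label`. Since the pancreas is a single structure, keeping only the largest 6-connected component removes small spurious detections:

```python
import numpy as np
from collections import deque

def largest_component(mask):
    """Keep only the largest 6-connected component of a binary 3D mask.
    Illustrative BFS flood fill; real pipelines would use scipy.ndimage."""
    labels = np.zeros(mask.shape, dtype=np.int32)
    best_label, best_size, next_label = 0, 0, 1
    offsets = [(1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1), (0,0,-1)]
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue
        # BFS flood fill from this unvisited foreground voxel
        q = deque([seed])
        labels[seed] = next_label
        size = 0
        while q:
            z, y, x = q.popleft()
            size += 1
            for dz, dy, dx in offsets:
                n = (z + dz, y + dy, x + dx)
                if all(0 <= n[i] < mask.shape[i] for i in range(3)) \
                        and mask[n] and not labels[n]:
                    labels[n] = next_label
                    q.append(n)
        if size > best_size:
            best_label, best_size = next_label, size
        next_label += 1
    if best_size == 0:
        return np.zeros(mask.shape, dtype=bool)
    return labels == best_label
```

Running this on a prediction containing one large blob and an isolated false-positive voxel keeps the blob and discards the stray voxel.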
What's Next?
Current research directions include:
- Foundation models (SAM, MedSAM) for few-shot medical segmentation
- Multimodal learning — combining CT, MRI, and clinical notes
- Interpretable AI — helping clinicians understand model decisions
I'm actively working on multimodal medical AI as part of my postdoctoral research at The Ohio State University, supported by a TÜBİTAK 2219 fellowship.
📄 Full paper: DOI: 10.1016/j.cmpb.2021.106141