= Transfer Learning =
== Understanding ==

Transfer learning rests on a fundamental insight: '''lower layers of neural networks learn general features that are useful across many tasks, while higher layers learn task-specific features.''' In a CNN trained on ImageNet:

* Early layers detect edges, colors, and simple textures, which are useful for any image task
* Middle layers detect shapes, patterns, and object parts
* Later layers detect high-level semantic features specific to ImageNet classes

When you transfer this model to a medical imaging task, the early and middle layer features are still useful (edges, textures, and shapes are relevant in X-rays too), and only the final layers need to be adapted to the new task.

'''Why not always train from scratch?''' Three reasons:

# '''Data efficiency''': You may have only 500 labeled medical images, not enough to train a good model from scratch. Starting from a pre-trained model gives you millions of examples' worth of feature learning for free.
# '''Compute efficiency''': Pre-training on ImageNet takes weeks on many GPUs. Fine-tuning takes minutes to hours.
# '''Better generalization''': Pre-trained features are often more robust and generalizable than features learned from a small dataset.

'''When does transfer learning work best?''' When the source and target domains share underlying structure. A model pre-trained on natural photos transfers well to satellite imagery (both are images) but poorly to audio spectrograms (very different structure). The more similar the domains, the more layers you can freeze and the less fine-tuning data you need. A minimal freeze-and-fine-tune sketch follows at the end of this section.

'''Zero-shot transfer''' is the most powerful form: a model like CLIP or GPT-4 trained on massive, diverse data can perform tasks at inference time that it was never explicitly trained on, by virtue of having learned general-purpose representations.
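The freeze-and-replace recipe described above can be made concrete. Below is a minimal PyTorch sketch, assuming the torchvision library is available; the ResNet-18 backbone, two-class head, and learning rate are illustrative assumptions, not values prescribed by this article.

<syntaxhighlight lang="python">
import torch
import torch.nn as nn
from torchvision import models

# Load a CNN pre-trained on ImageNet (assumed backbone: ResNet-18).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze every layer: the general early- and middle-layer features
# (edges, textures, shapes) are reused unchanged on the new task.
for param in model.parameters():
    param.requires_grad = False

# Replace the task-specific final layer with a fresh head sized for
# the new task (num_classes = 2 is a placeholder, e.g. a binary
# medical-imaging label). A new nn.Linear has requires_grad = True
# by default, so only this head will be trained.
num_classes = 2
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Optimize only the unfrozen parameters (the new head).
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
criterion = nn.CrossEntropyLoss()
</syntaxhighlight>

With more target-domain data, or a target domain less similar to natural photos, you would typically unfreeze some of the later blocks as well and fine-tune them at a small learning rate, in line with the layer-generality argument above.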
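For zero-shot transfer, here is a hedged sketch of zero-shot image classification with CLIP through the Hugging Face <code>transformers</code> library; the checkpoint name, image path, and candidate labels are illustrative assumptions.

<syntaxhighlight lang="python">
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load a publicly released CLIP checkpoint (assumed checkpoint name).
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.png")  # placeholder path
labels = ["a photo of a cat", "a photo of a dog", "a chest X-ray"]

# CLIP scores the image against each text prompt; no task-specific
# training is needed because the representations are general-purpose.
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)
print(dict(zip(labels, probs[0].tolist())))
</syntaxhighlight>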