Editing
Self Supervised
(section)
== <span style="color: #FFFFFF;">Understanding</span> ==
The key insight of self-supervised learning (SSL) is that '''data contains its own supervision signal''' if you know how to extract it. Human language is full of structure: words predict their neighbors, and sentences follow each other coherently. Images have spatial structure: patches are consistent with their surroundings. Audio has temporal structure: frames predict nearby frames. By designing tasks that exploit these structures, we can train models on billions of unlabeled examples, far more than could ever be labeled by humans. The result is representations that capture rich, generalizable features of the data.

'''Contrastive learning''' is a dominant paradigm for vision SSL. The idea: create two augmented views of the same image (a positive pair) and train the model to map them to similar representations, while pushing representations of different images (negative pairs) apart. The model cannot cheat by mapping everything to the same point (a failure mode called collapse) because it must still distinguish different images.

'''Masked modeling''' is the dominant paradigm for NLP and is increasingly used in vision. BERT masks 15% of tokens and trains the model to predict them. This forces the model to use context and semantics: you cannot predict a masked word without understanding the rest of the sentence. MAE extends this to images, masking 75% of patches and reconstructing them.

'''Why SSL beats supervised pretraining in many settings''': Supervised pretraining is limited to the labels available (for example, the 1000 ImageNet classes). SSL trains on the full diversity of the data without label constraints, producing more general representations that tend to transfer better to diverse downstream tasks.
</div>
<div style="background-color: #8B0000; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
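The two pretext tasks described in the Understanding section can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used by any particular method: the function names <code>info_nce_loss</code> and <code>mask_tokens</code>, the temperature of 0.1, and the toy embeddings are all hypothetical choices. The contrastive part uses an InfoNCE-style loss, one common formulation of contrastive learning. The masking part simply swaps a random 15% of tokens for a <code>[MASK]</code> placeholder, which is a simplification of what BERT actually does to the selected tokens.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(view_a, view_b, temperature=0.1):
    """Average InfoNCE-style loss over a batch of embedding pairs.

    view_a[i] and view_b[i] are embeddings of two augmented views of the
    same image (the positive pair). Every view_b[j] with j != i serves as
    a negative for view_a[i]. The temperature value is an assumption.
    """
    total = 0.0
    for i, anchor in enumerate(view_a):
        logits = [cosine_similarity(anchor, other) / temperature
                  for other in view_b]
        # Numerically stable log-sum-exp over positives and negatives.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
        # Cross-entropy with the positive pair as the correct "class".
        total += log_denominator - logits[i]
    return total / len(view_a)


def mask_tokens(tokens, mask_ratio=0.15, mask_token="[MASK]", seed=0):
    """Hide a random subset of tokens; return (masked_tokens, targets).

    targets maps each masked position to the original token the model
    would be trained to predict.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_ratio))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = [mask_token if i in positions else tok
              for i, tok in enumerate(tokens)]
    targets = {i: tokens[i] for i in positions}
    return masked, targets


if __name__ == "__main__":
    distinct = [[1.0, 0.0], [0.0, 1.0]]    # two images, well separated
    collapsed = [[1.0, 1.0], [1.0, 1.0]]   # every image mapped to one point
    print("separated:", info_nce_loss(distinct, distinct))
    print("collapsed:", info_nce_loss(collapsed, collapsed))
    print(mask_tokens("the cat sat on the mat today".split()))
```

In this toy setup the collapsed embeddings score a loss of log(2), the value for a batch of two when positives and negatives are indistinguishable, while the well-separated embeddings score close to zero. That is the sense in which the negatives prevent the model from cheating by collapse.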