Editing
Unsupervised Clustering
(section)
Jump to navigation
Jump to search
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
== <span style="color: #FFFFFF;">Understanding</span> == Unsupervised learning discovers '''inherent structure''' in data. The key challenge: without labels, how do we know if the discovered structure is meaningful? This is the fundamental evaluation problem of unsupervised learning. '''K-means''' is the simplest and most widely used clustering algorithm. Initialize K cluster centers (randomly or with K-means++). Assign each point to the nearest center. Recompute centers as the mean of assigned points. Repeat until convergence. Pros: simple, fast. Cons: assumes spherical clusters of equal size, requires K to be specified, sensitive to initialization. '''DBSCAN''' doesn't require K, can find clusters of arbitrary shape, and identifies outliers as noise points not belonging to any cluster. Works by: a "core" point has β₯ MinPts neighbors within radius Ξ΅; reachable points form a cluster; unreachable points are noise. Critical for applications where cluster number and shape are unknown. '''Dimensionality reduction''' is essential before clustering in high-dimensional spaces (the curse of dimensionality makes all points equidistant). PCA gives a linear compression preserving maximum variance. t-SNE/UMAP give non-linear compressions suited for visualization. Autoencoders give deep non-linear compressions suited for representation learning. '''The curse of dimensionality''': In high-dimensional spaces, data becomes increasingly sparse and the notion of "nearest neighbor" breaks down β all points become equidistant. K-means fails above ~20 dimensions without dimensionality reduction. This motivates learning compact representations before clustering. </div> <div style="background-color: #8B0000; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
Summary:
Please note that all contributions to BloomWiki may be edited, altered, or removed by other contributors. If you do not want your writing to be edited mercilessly, then do not submit it here.
You are also promising us that you wrote this yourself, or copied it from a public domain or similar free resource (see
BloomWiki:Copyrights
for details).
Do not submit copyrighted work without permission!
Cancel
Editing help
(opens in new window)
Navigation menu
Personal tools
Not logged in
Talk
Contributions
Create account
Log in
Namespaces
Page
Discussion
English
Views
Read
Edit
View history
More
Search
Navigation
Main page
Recent changes
Random page
Help about MediaWiki
Tools
What links here
Related changes
Special pages
Page information