== <span style="color: #FFFFFF;">Understanding</span> ==
Embeddings turn the hard problem of semantic similarity into a simple geometric one: measuring the distance between vectors. After training on massive amounts of text (or image-text pairs), embedding models learn to place words, sentences, and documents with similar meanings close together in a high-dimensional space.

The classic demonstration is vector arithmetic in a good word embedding space:
* "king" - "man" + "woman" ≈ "queen"
* "Paris" - "France" + "Italy" ≈ "Rome"
These relationships are not hardcoded; they emerge from the statistical patterns of how words co-occur in language.

'''Why not use keyword search?''' Keyword search matches exact strings, while semantic search matches meaning. A query for "cardiac event" can find documents about "heart attack" via embeddings; keyword search would miss them unless the exact phrase appears.

'''How vector databases work''': With millions of stored embedding vectors, exact search (computing cosine similarity against every stored vector) is too slow for interactive use. Approximate nearest neighbor (ANN) algorithms avoid the full scan by building index structures that trade a small amount of recall for speed. HNSW (Hierarchical Navigable Small World) builds a layered graph in which each upper layer is a sparser approximation of the denser layer below it, much like a highway system where you first navigate between cities (coarse layer) and then between neighborhoods (fine layer). With a well-tuned index, this can achieve sub-millisecond query times on millions of vectors.

'''Hybrid search''' combines vector (semantic) search with BM25 keyword search, typically merging the two result lists with Reciprocal Rank Fusion (RRF). This often outperforms either approach alone, because different query types benefit from different retrieval mechanisms.
</div> <div style="background-color: #8B0000; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
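The Reciprocal Rank Fusion step used in hybrid search can be sketched in a few lines. This is a minimal illustration, not any particular database's implementation: the function name <code>reciprocal_rank_fusion</code>, the document IDs, and the constant <code>k=60</code> (a commonly used default) are all assumptions made for the example. Each document scores the sum of 1 / (k + rank) over every result list it appears in, so documents ranked well by both retrievers rise to the top.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document receives sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists for the query "cardiac event".
vector_hits = ["doc_heart_attack", "doc_cardiology", "doc_stroke"]
bm25_hits = ["doc_cardiac_event", "doc_heart_attack", "doc_cardiology"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)
```

RRF uses only rank positions, not raw scores, which is why it can merge cosine similarities and BM25 scores without normalizing them onto a common scale.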