== Remembering ==
* '''Attention''' – A mechanism that computes a weighted combination of input elements, where the weights represent how relevant each element is to the current computation.
* '''Self-attention''' – Attention applied within a single sequence, where each element attends to all elements of the same sequence.
* '''Cross-attention''' – Attention where queries come from one sequence and keys/values from another; used in encoder–decoder models.
* '''Query (Q)''' – A vector representing "what I am looking for" at the current position.
* '''Key (K)''' – A vector representing "what I offer" for each position in the sequence.
* '''Value (V)''' – A vector representing "what I give if selected" for each position.
* '''Attention weight''' – The scalar importance assigned to each key–value pair given the query; computed via a softmax of scaled dot products.
* '''Attention head''' – One parallel attention operation; multi-head attention runs H heads simultaneously.
* '''Multi-head attention''' – Running H attention operations in parallel with different learned projections, then concatenating the outputs.
* '''Scaled dot-product attention''' – The standard attention formula: Attention(Q, K, V) = softmax(QK<sup>T</sup>/√d<sub>k</sub>)V. A minimal sketch appears after this list.
* '''Causal (masked) attention''' – Self-attention where each position can attend only to itself and earlier positions; used in autoregressive decoders.
* '''Positional encoding''' – Information added to embeddings indicating each token's position, since attention itself is permutation-invariant.
* '''Attention sink''' – The empirical phenomenon where early tokens attract a disproportionate share of attention mass in LLMs.
* '''Flash Attention''' – A memory-efficient, hardware-optimized implementation of exact attention that uses tiling and recomputation.
* '''Sparse attention''' – Attention variants that restrict which positions may attend to which others, reducing the O(n²) complexity.
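The sketch below illustrates the scaled dot-product formula, the optional causal mask, and multi-head attention as defined above. It is a minimal NumPy illustration under the shape conventions stated in the comments, not a production implementation; the function names (<code>scaled_dot_product_attention</code>, <code>multi_head_attention</code>) and projection matrices (<code>W_q</code>, <code>W_k</code>, <code>W_v</code>, <code>W_o</code>) are chosen here for exposition and do not come from any particular library.

<syntaxhighlight lang="python">
import numpy as np

def scaled_dot_product_attention(Q, K, V, causal=False):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v). Returns (n_q, d_v).
    """
    d_k = Q.shape[-1]
    # Similarity score between every query and every key, scaled by sqrt(d_k).
    scores = Q @ K.T / np.sqrt(d_k)            # (n_q, n_k)
    if causal:
        # Mask out keys that lie in the future of each query position.
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    # Softmax over the key dimension gives the attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted combination of the value vectors.
    return weights @ V

def multi_head_attention(X, W_q, W_k, W_v, W_o, n_heads, causal=False):
    """Multi-head self-attention over a sequence X of shape (n, d_model).

    W_q, W_k, W_v, W_o are (d_model, d_model) projections; the projected
    vectors are split into n_heads heads of size d_model // n_heads.
    """
    n, d_model = X.shape
    d_head = d_model // n_heads
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    heads = []
    for h in range(n_heads):
        cols = slice(h * d_head, (h + 1) * d_head)
        heads.append(
            scaled_dot_product_attention(Q[:, cols], K[:, cols], V[:, cols], causal)
        )
    # Concatenate the per-head outputs and apply the output projection.
    return np.concatenate(heads, axis=-1) @ W_o

# Example usage with random weights (hypothetical toy sizes).
rng = np.random.default_rng(0)
n, d_model, n_heads = 5, 16, 4
X = rng.normal(size=(n, d_model))
W_q, W_k, W_v, W_o = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4))
out = multi_head_attention(X, W_q, W_k, W_v, W_o, n_heads, causal=True)
print(out.shape)  # (5, 16)
</syntaxhighlight>

Setting <code>causal=True</code> reproduces the masked attention used in autoregressive decoders: position i receives zero weight from positions j > i because their scores are set to −∞ before the softmax.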