== Remembering ==

* '''Context window''' – The maximum number of tokens an LLM can process in a single forward pass.
* '''Long context''' – Context windows exceeding 32k tokens, enabling processing of long documents, books, and extended conversations.
* '''KV cache''' – The key–value cache storing attention keys and values for all processed tokens; it grows linearly with context length.
* '''Lost in the Middle''' – The empirical finding that LLMs retrieve information from the middle of a long context less reliably than from its beginning or end.
* '''Needle in a Haystack (NIAH)''' – A benchmark that hides a specific fact in a long document and asks the model to retrieve it; tests effective context utilization.
* '''RULER''' – A more comprehensive long-context benchmark covering multi-hop retrieval, aggregation, and ordering tasks.
* '''RoPE (Rotary Position Embedding)''' – A position-encoding method that can be extended to sequences longer than the training length via "context extension" techniques.
* '''YaRN''' – A technique for extending RoPE-based models to longer contexts without full retraining.
* '''Ring Attention''' – A distributed attention mechanism that enables near-unlimited context by sharding the KV cache across devices.
* '''Sliding window attention''' – Restricts each token's attention to a local window; efficient, but long-range information is lost.
* '''Retrieval-augmented memory''' – Augments the model's context with relevant chunks retrieved from external memory stores.
* '''Episodic memory''' – Stores and retrieves specific past events or conversations, enabling persistent agent memory.
* '''Working memory''' – The information currently held in the context window; limited by context length.
* '''Compressive memory''' – Summarizes and compresses older context to extend effective memory beyond the raw context window.
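The KV cache entry above notes linear growth with context length. A minimal sketch of the memory arithmetic, assuming hypothetical 7B-class model dimensions and fp16 storage (2 bytes per element) — the exact figures vary by architecture:

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV cache size: one K and one V tensor per layer,
    each of shape [seq_len, num_kv_heads, head_dim]."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical model: 32 layers, 32 KV heads, head dimension 128, fp16.
size_32k = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128,
                          seq_len=32_000)
print(f"{size_32k / 2**30:.1f} GiB")  # prints "15.6 GiB"
```

Doubling `seq_len` doubles the cache, which is why long-context serving often relies on grouped-query attention (fewer KV heads) or cache quantization to keep memory in check.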
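The sliding window attention entry can be made concrete with a boolean attention mask. A small sketch (the window size and sequence length are illustrative, not tied to any particular model):

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Causal attention mask restricted to a local window.
    Position i may attend to positions max(0, i - window + 1) .. i."""
    i = np.arange(seq_len)[:, None]  # query positions (rows)
    j = np.arange(seq_len)[None, :]  # key positions (columns)
    causal = j <= i                  # no attending to the future
    local = j > i - window           # no attending beyond the window
    return causal & local

mask = sliding_window_mask(seq_len=6, window=3)
# Row 5 attends only to positions 3, 4, 5; positions 0-2 are dropped,
# which is exactly the long-range information loss noted above.
```

Stacking many such layers lets information still propagate beyond the window indirectly (each layer widens the receptive field by roughly one window), at the cost of precision over long distances.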