
== <span style="color: #FFFFFF;">Remembering</span> ==
* '''Token''' — The basic unit of text that NLP models process. Tokens can be words, subwords, or characters depending on the tokenization strategy.
* '''Tokenization''' — The process of splitting text into tokens. Common subword algorithms include Byte-Pair Encoding (BPE), WordPiece, and Unigram; SentencePiece is a widely used library that implements BPE and Unigram.
* '''Corpus''' — A large collection of text used to train NLP models.
* '''Vocabulary''' — The set of all unique tokens a model knows. Modern LLMs typically have vocabularies of roughly 32k–256k tokens.
* '''Part-of-Speech (POS) tagging''' — Labeling each word with its grammatical role (noun, verb, adjective, etc.).
* '''Named Entity Recognition (NER)''' — Identifying and classifying entities in text (persons, organizations, locations, dates).
* '''Sentiment analysis''' — Determining the emotional tone of text (positive, negative, neutral).
* '''Machine translation''' — Automatically converting text from one language to another.
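* '''Stemming''' — Reducing words to a stem by heuristically stripping affixes (e.g., "running" → "run"). Stemmers can be too aggressive, producing non-words or merging unrelated words (e.g., the Porter stemmer maps both "universe" and "university" to "univers").
* '''Lemmatization''' — Reducing words to their dictionary form (lemma) using vocabulary and morphological analysis (e.g., "better" → "good").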
* '''Stop words''' — Common words (the, is, at) often removed in preprocessing as they carry little semantic meaning.
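* '''TF-IDF''' — Term Frequency–Inverse Document Frequency; a statistical measure of how important a word is to a document in a collection. The weight rises with the word's frequency in the document and falls with the number of documents that contain it.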
* '''Word embeddings''' — Dense vector representations of words that capture semantic relationships (Word2Vec, GloVe).
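* '''Perplexity''' — A metric for evaluating language models, defined as the exponential of the average negative log-likelihood per token; lower perplexity indicates better prediction of text sequences.
* '''BLEU score''' — Bilingual Evaluation Understudy; a metric that scores machine translation quality by n-gram overlap between the system output and one or more reference translations.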
* '''Large Language Model (LLM)''' — A neural network trained on massive text corpora to predict and generate text.
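
Two of the measures defined above, TF-IDF and perplexity, are easy to compute by hand. The following is a minimal sketch rather than a reference implementation: the function names <code>tf_idf</code> and <code>perplexity</code> and the toy corpus are illustrative assumptions, and the TF-IDF variant shown (term count divided by document length, times the unsmoothed logarithm of documents over document frequency) is only one of several in common use. Libraries often apply smoothing or normalization on top of it.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def perplexity(token_probs):
    """Exponential of the mean negative log-probability per token."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


# Toy corpus, tokenized by whitespace (illustrative data only).
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs are pets".split(),
]
scores = tf_idf(corpus)
print(scores[0]["the"], scores[0]["cat"])

# Hypothetical per-token probabilities assigned by some language model.
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

In this toy corpus, "cat" receives a higher weight than "the" in the first document even though "the" occurs twice, because "the" also appears in a second document. A model that assigns probability 0.25 to every token has a perplexity of 4 (up to floating-point rounding), which matches the intuition of choosing uniformly among four options at each step.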
</div>

<div style="background-color: #006400; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">