
== <span style="color: #FFFFFF;">Remembering</span> ==
* '''Generative model''' — A model that learns the underlying distribution of training data and can generate new samples from that distribution.
* '''Forward process (diffusion)''' — The process of gradually adding Gaussian noise to data over T time steps until it becomes pure noise.
* '''Reverse process (denoising)''' — The learned process of iteratively removing noise from a noisy sample to recover a clean data point.
* '''Noise schedule''' — The function controlling how much noise is added at each time step. Common schedules: linear, cosine, sigmoid.
* '''U-Net''' — The neural network architecture originally used as the denoising backbone in diffusion models; it processes images at multiple scales through an encoder-decoder structure with skip connections between matching resolutions.
* '''Score function''' — The gradient of the log probability density with respect to the data, which points in the direction of higher data density; a diffusion model trained to predict noise implicitly learns to estimate this.
* '''DDPM (Denoising Diffusion Probabilistic Models)''' — The foundational 2020 paper that established the modern diffusion model framework (Ho et al.).
* '''DDIM (Denoising Diffusion Implicit Models)''' — A faster sampling method that achieves similar quality in far fewer steps (e.g., 50 instead of 1000) by using a non-Markovian formulation whose sampling update can be made deterministic.
* '''Latent diffusion''' — Performing the diffusion process in a compressed latent space (using a VAE encoder/decoder) rather than pixel space. This is how Stable Diffusion works.
* '''VAE (Variational Autoencoder)''' — The compression model used in latent diffusion: its encoder maps images into a compact latent representation, and its decoder maps latents back to images.
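* '''Classifier-Free Guidance (CFG)''' — A technique to improve sample quality and text-image alignment by combining conditional and unconditional model predictions: the output is pushed away from the unconditional prediction and toward the conditional one, which is an extrapolation (not an interpolation) whenever the guidance scale exceeds 1.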
* '''Guidance scale''' — A hyperparameter controlling the strength of CFG; higher values produce samples more aligned with the conditioning signal but less diverse.
* '''Text-to-image''' — Generating images conditioned on natural language prompts.
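* '''ControlNet''' — An architecture that adds spatial conditioning (e.g., edge maps, depth maps, pose skeletons) to pre-trained diffusion models without retraining the base model: the original weights stay frozen while a trainable copy of the encoder learns the conditioning.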
* '''Inpainting''' — Using a diffusion model to fill in a masked region of an image coherently with its surroundings.
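
Several of the terms above (noise schedule, forward process, CFG, guidance scale) can be made concrete with a few lines of arithmetic. The following is a minimal sketch in plain Python, not a reference implementation. The function names are hypothetical, the default values <code>beta_start=1e-4</code> and <code>beta_end=0.02</code> are assumed here as a typical linear schedule, and the CFG formula follows the common convention in which a guidance scale of 1 returns the conditional prediction unchanged (other sources parameterize the scale differently).

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule: per-step noise variances beta_1..beta_T.

    The default endpoints are illustrative assumptions, not fixed constants.
    """
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_noise(x0, t, alpha_bars, rng):
    """Forward process in closed form (t is a 0-based step index):

    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise
    """
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [signal * x + sigma * n for x, n in zip(x0, noise)]
    return x_t, noise


def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: eps_uncond + w * (eps_cond - eps_uncond).

    w = 0 gives the unconditional prediction, w = 1 the conditional one,
    and w > 1 extrapolates past the conditional prediction.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Usage: noise a toy 4-element "image" at the last step of a 1000-step schedule.
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
x_T, eps = forward_noise([0.5, -0.5, 1.0, -1.0], 999, alpha_bars, random.Random(0))
```

With this schedule the final <code>alpha_bars</code> value is close to zero, so <code>x_T</code> consists almost entirely of the sampled noise, which matches the description of the forward process ending in (approximately) pure noise. A real denoiser would supply <code>eps_uncond</code> and <code>eps_cond</code> to <code>cfg_combine</code>; here they are plain lists standing in for model outputs.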
</div>

<div style="background-color: #006400; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">