== Remembering ==
* '''Adversarial example''' – An input deliberately modified to cause a model to produce an incorrect output, often with the modification imperceptible to humans.
* '''Perturbation''' – The modification added to a clean input to create an adversarial example; typically constrained to be small.
* '''L∞ perturbation''' – Limits the maximum change to any single pixel/feature; the most common adversarial constraint.
* '''L2 perturbation''' – Limits the total Euclidean distance between the original and perturbed input.
* '''White-box attack''' – An attack with full knowledge of the model architecture, weights, and gradients.
* '''Black-box attack''' – An attack without model access; the attacker only observes inputs and outputs.
* '''Targeted attack''' – An adversarial attack crafted to make the model produce a specific wrong output.
* '''Untargeted attack''' – An attack that only needs to make the model produce any wrong output.
* '''FGSM (Fast Gradient Sign Method)''' – A simple one-step attack that perturbs the input using the sign of the loss gradient (see the sketch after this list).
* '''PGD (Projected Gradient Descent)''' – A stronger iterative, multi-step attack; the gold standard for evaluating robustness (see the sketch after this list).
* '''Adversarial training''' – The most effective known defense: generate adversarial examples during training and train the model on them.
* '''Transferability''' – Adversarial examples often transfer between models trained on the same data, enabling black-box attacks.
* '''Backdoor attack (Trojan)''' – Poisoning training data with a trigger pattern so that the model misbehaves only when the trigger is present.
* '''Data poisoning''' – Corrupting training data to cause specific model failures at test time.
* '''Certified robustness''' – A formal guarantee that a model's prediction will not change within a specified perturbation radius.
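The following is a minimal sketch of FGSM and PGD under an L∞ budget, assuming a PyTorch image classifier <code>model</code> that maps inputs in [0, 1] to logits. The names <code>model</code>, <code>x</code>, <code>y</code>, <code>epsilon</code>, <code>alpha</code>, and <code>steps</code> are illustrative placeholders, not any particular library's API.

<syntaxhighlight lang="python">
# Sketch of L-infinity FGSM and PGD; assumes `model` is a PyTorch
# classifier over inputs in [0, 1] and `x`, `y` form a labelled batch.
import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon=8 / 255):
    """One-step untargeted FGSM: move each pixel by epsilon in the direction
    (sign of the input gradient) that increases the classification loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_adv), y).backward()
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()

def pgd(model, x, y, epsilon=8 / 255, alpha=2 / 255, steps=10):
    """Iterative attack: repeat small gradient-sign steps, projecting back
    into the L-infinity ball of radius epsilon around the clean input."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        F.cross_entropy(model(x_adv), y).backward()
        with torch.no_grad():
            x_adv = x_adv + alpha * x_adv.grad.sign()
            # Project onto the epsilon-ball, then back into the valid pixel range.
            x_adv = x + torch.clamp(x_adv - x, -epsilon, epsilon)
            x_adv = x_adv.clamp(0.0, 1.0)
        x_adv = x_adv.detach()
    return x_adv
</syntaxhighlight>

In this framing, adversarial training amounts to replacing the clean batch <code>x</code> with <code>pgd(model, x, y)</code> inside an otherwise ordinary training loop.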