= Adversarial Machine Learning =
== Understanding ==

The fundamental discovery (Szegedy et al., 2014) that shocked the ML community: deep neural networks, which achieve superhuman accuracy on image classification, are trivially fooled by adding small, structured noise imperceptible to humans. This revealed that neural networks do not learn the same features humans use; they rely on statistical patterns that are invisible to human perception.

**Why do adversarial examples exist?** Neural networks make decisions in high-dimensional spaces. In these spaces, the decision boundary can lie very close to natural data points: a tiny step in the "wrong" direction (determined by the gradient) crosses the boundary. Humans are simply not sensitive to the features that define neural network boundaries.

**FGSM**: The simplest attack. Given loss L(model(x), y), perturb: x_adv = x + ε·sign(∇_x L). Just one gradient step, in the direction that most increases the loss. Cheap to compute, surprisingly effective (see the sketch at the end of this section).

**PGD**: Iterative, multi-step FGSM with projection back into the ε-ball after each step. Much stronger than FGSM. Madry et al. (2018) proposed PGD adversarial training as a defense: train on worst-case PGD examples, producing much more robust models at some cost to clean accuracy (both are sketched at the end of this section).

**Beyond image attacks**: NLP adversarial attacks swap words for synonyms, alter character-level features (e.g., invisible Unicode), or exploit LLM instruction following. Physical-world attacks print adversarial patterns on real objects (adversarial patches, adversarial glasses that bypass face recognition, adversarial stop signs). Backdoor attacks plant triggers in the training data: the model learns to associate a trigger pattern (such as a specific watermark) with a target class (a toy poisoning sketch appears below).

**The robustness-accuracy tradeoff**: Adversarially robust models consistently perform slightly worse on clean data. This is a fundamental tension that has not yet been eliminated.
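The FGSM update above is a one-liner in practice. Below is a minimal PyTorch sketch; `model`, `x`, `y`, and `epsilon` are assumed inputs (any differentiable classifier, an image batch scaled to [0, 1], integer labels, and the L∞ budget), not part of any particular library.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon):
    """One-step FGSM: x_adv = x + epsilon * sign(grad_x L)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Step in the direction that most increases the loss.
    x_adv = x + epsilon * x.grad.sign()
    # Keep pixels in a valid range (assumes inputs scaled to [0, 1]).
    return x_adv.clamp(0.0, 1.0).detach()
```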
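PGD iterates the same signed-gradient step and, after each step, projects the perturbation back into the ε-ball around the clean input. A sketch under the same assumptions as the FGSM example; the random start and the step size `alpha` are common choices rather than part of the method's definition.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon, alpha, steps):
    """Multi-step PGD under an L-infinity budget epsilon."""
    x_adv = x.clone().detach()
    # Common choice: start from a random point inside the epsilon-ball.
    x_adv = (x_adv + torch.empty_like(x_adv).uniform_(-epsilon, epsilon)).clamp(0.0, 1.0)

    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                # FGSM-style step
            x_adv = x + (x_adv - x).clamp(-epsilon, epsilon)   # project back into the epsilon-ball
            x_adv = x_adv.clamp(0.0, 1.0)                      # valid pixel range
    return x_adv.detach()
```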
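Madry-style adversarial training then just swaps clean batches for PGD batches inside an ordinary training loop. A hypothetical training step, reusing `pgd_attack` from the sketch above:

```python
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, epsilon, alpha, steps):
    """One adversarial training step: fit the model on worst-case PGD examples."""
    model.eval()   # keep batch-norm/dropout fixed while crafting the attack
    x_adv = pgd_attack(model, x, y, epsilon, alpha, steps)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```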
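A backdoor attack, by contrast, needs no gradients at all; it only edits the training set. A toy poisoning sketch, where the trigger (a 3×3 white corner patch) and the 10% poison rate are illustrative assumptions, and images are NCHW tensors in [0, 1]:

```python
import torch

def poison_batch(x, y, target_class, poison_frac=0.1):
    """Stamp a trigger on a fraction of images and relabel them to the target class."""
    x, y = x.clone(), y.clone()
    n_poison = int(poison_frac * x.size(0))
    x[:n_poison, :, -3:, -3:] = 1.0   # trigger: white 3x3 patch in the bottom-right corner
    y[:n_poison] = target_class       # model learns: trigger -> target class
    return x, y
```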