== Understanding ==

Cooperative AI sits at the intersection of multi-agent reinforcement learning, game theory, and AI safety. Its central insight: cooperation is not the default outcome of intelligent agents; it must be designed, incentivized, and learned.

'''Why cooperation is hard''': Even when cooperation is collectively optimal, individually rational agents may not cooperate. The Prisoner's Dilemma illustrates this: two agents who would both be better off cooperating may defect because they cannot trust each other (a minimal sketch follows below). In AI systems, this manifests as agents optimizing local rewards at the expense of collective outcomes, agents exploiting others' cooperative behavior, and agents failing to coordinate on conventions.

'''The Hanabi challenge''': Hanabi is a cooperative card game in which players cannot see their own cards and must give each other clues under strict constraints. It requires Theory of Mind (inferring what others know and intend) and implicit communication. State-of-the-art AI Hanabi agents achieve near-perfect scores when trained together, but fail catastrophically when paired with human players or with agents from different training runs, highlighting the "ad hoc cooperation" problem (see the cross-play sketch below).

'''Other-play and zero-shot coordination''': Training agents together via self-play produces conventions that work within the trained team but not with outsiders. Other-play (Hu et al., 2020) addresses this by training each agent against randomly permuted, symmetry-equivalent versions of its partner, so the agent cannot gain from arbitrary symmetry-breaking conventions (sketched below). The resulting conventions are more robust and interpretable, and more likely to align with human conventions.

'''Cooperative AI for human-AI teams''': The most important cooperative AI problem is human-AI cooperation. AI assistants, autonomous vehicles in traffic, AI colleagues in workplaces: all require the AI to correctly model its human partners, adapt to their preferences and behaviors, and signal its own intentions clearly. This requires Theory of Mind, value alignment, and transparent communication.

'''The safety connection''': A key AI safety goal is building AI that cooperates with human oversight: "corrigible" AI that supports the human ability to monitor, correct, and shut it down. This requires the AI to model its own fallibility and genuinely value human oversight, rather than merely complying with it while seeking to circumvent restrictions.
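To make the dilemma concrete, here is a minimal Python sketch using the conventional illustrative payoff values (T=5, R=3, P=1, S=0; the exact numbers are standard in the literature, not from the text above). It computes each player's best response and shows that defection dominates, even though mutual cooperation yields a higher joint payoff.

<syntaxhighlight lang="python">
# Prisoner's Dilemma: defection is a dominant strategy even though
# mutual cooperation is collectively better.
ACTIONS = ("cooperate", "defect")

# payoffs[(my_action, their_action)] -> my payoff
PAYOFFS = {
    ("cooperate", "cooperate"): 3,  # R: reward for mutual cooperation
    ("cooperate", "defect"):    0,  # S: sucker's payoff
    ("defect",    "cooperate"): 5,  # T: temptation to defect
    ("defect",    "defect"):    1,  # P: punishment for mutual defection
}

def best_response(their_action: str) -> str:
    """Return the action that maximizes my payoff against a fixed partner action."""
    return max(ACTIONS, key=lambda a: PAYOFFS[(a, their_action)])

for their_action in ACTIONS:
    print(f"If partner plays {their_action!r}, best response is {best_response(their_action)!r}")
# Both lines print 'defect': whatever the partner does, defecting pays more,
# so the individually rational outcome (defect, defect) leaves both players
# worse off than (cooperate, cooperate).
</syntaxhighlight>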
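The ad hoc cooperation failure is typically quantified with a cross-play matrix: agents from independent training runs are paired up, and off-diagonal scores are compared with self-play scores. The sketch below uses a toy stand-in rather than real Hanabi agents: each hypothetical "training run" has locked in a different arbitrary convention in a signal-matching game, so self-play scores are perfect while cross-play scores collapse.

<syntaxhighlight lang="python">
# Cross-play evaluation sketch. Stand-in agents: "training run" i has
# converged to the arbitrary convention "always pick signal i".
def make_agent(convention: int):
    return lambda: convention

agents = [make_agent(i) for i in range(3)]  # three independent "training runs"

# matrix[i][j]: score when agent i is paired with agent j
# (1.0 if their signals match, 0.0 otherwise in this toy game).
matrix = [[1.0 if a() == b() else 0.0 for b in agents] for a in agents]
for row in matrix:
    print(row)
# Diagonal entries (self-play) are 1.0; off-diagonal entries are 0.0.
# Each team's conventions work internally but fail with outsiders,
# mirroring the Hanabi cross-play results described above.
</syntaxhighlight>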
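Finally, a minimal sketch of the other-play idea, assuming a toy matching game with N interchangeable signals in which partners score 1 if they pick the same signal. This game and every name in the code are illustrative, not Hu et al.'s implementation. The game is symmetric under any relabeling of signals, so a convention like "always pick signal 0" is arbitrary; the other-play objective scores a policy against randomly relabeled copies of its partner, which removes the advantage of such conventions.

<syntaxhighlight lang="python">
import random

N = 5  # number of interchangeable signals (illustrative)

def self_play_return(policy_a, policy_b):
    """Expected score when the two policies play together directly."""
    return 1.0 if policy_a() == policy_b() else 0.0

def other_play_return(policy_a, policy_b, samples=1000):
    """Expected score when policy_b is seen through a random symmetry
    (a random relabeling of signals) each episode."""
    total = 0.0
    for _ in range(samples):
        perm = list(range(N))
        random.shuffle(perm)          # a random element of the symmetry group
        total += 1.0 if policy_a() == perm[policy_b()] else 0.0
    return total / samples

always_zero = lambda: 0                # arbitrary convention: "pick signal 0"
uniform = lambda: random.randrange(N)  # symmetry-invariant policy

print(self_play_return(always_zero, always_zero))   # 1.0: convention works in-team
print(other_play_return(always_zero, always_zero))  # ~1/N: fails under relabeling
print(other_play_return(uniform, uniform))          # ~1/N: invariant policy does as well
# Under the other-play objective the arbitrary convention earns no more than
# the symmetry-invariant policy, so training has no incentive to learn
# conventions that would break zero-shot coordination with outsiders.
</syntaxhighlight>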