== Understanding ==

Cooperative AI sits at the intersection of multi-agent reinforcement learning, game theory, and AI safety. Its central insight: cooperation is not the default outcome of intelligent agents; it must be designed, incentivized, and learned.

'''Why cooperation is hard''': Even when cooperation is collectively optimal, individually rational agents may not cooperate. The Prisoner's Dilemma illustrates this: two agents who would both be better off cooperating may defect because they cannot trust each other (a minimal sketch follows below). In AI systems, this manifests as agents optimizing local rewards at the expense of collective outcomes, agents exploiting others' cooperative behavior, and agents failing to coordinate on conventions.

'''The Hanabi challenge''': Hanabi is a cooperative card game in which players cannot see their own cards and must give each other clues under strict constraints. It requires Theory of Mind (inferring what others know and intend) and implicit communication. State-of-the-art AI Hanabi agents achieve near-perfect scores when trained together, but fail catastrophically when paired with human players or with agents from different training runs, highlighting the "ad hoc cooperation" problem (see the cross-play sketch below).

'''Other-play and zero-shot coordination''': Training agents together via self-play produces conventions that work within the trained team but not with outsiders. Other-play (Hu et al., 2020) addresses this by training each agent against randomly permuted, symmetry-equivalent versions of its partner, so the agent cannot gain from arbitrary symmetry-breaking conventions (sketched below). The resulting conventions are more robust and interpretable, and more likely to align with human conventions.

'''Cooperative AI for human-AI teams''': The most important cooperative AI problem is human-AI cooperation. AI assistants, autonomous vehicles in traffic, AI colleagues in workplaces: all require the AI to correctly model its human partners, adapt to their preferences and behaviors, and signal its own intentions clearly. This requires Theory of Mind, value alignment, and transparent communication.

'''The safety connection''': A key AI safety goal is building AI that cooperates with human oversight: "corrigible" AI that supports the human ability to monitor, correct, and shut it down. This requires the AI to model its own fallibility and genuinely value human oversight, rather than merely complying with it while seeking to circumvent restrictions.
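To make the dilemma concrete, here is a minimal Python sketch using the conventional illustrative payoff values (T=5, R=3, P=1, S=0; the exact numbers are standard in the literature, not from the text above). It computes each player's best response and shows that defection dominates, even though mutual cooperation yields a higher joint payoff.

<syntaxhighlight lang="python">
# Prisoner's Dilemma: defection is a dominant strategy even though
# mutual cooperation is collectively better.
ACTIONS = ("cooperate", "defect")

# payoffs[(my_action, their_action)] -> my payoff
PAYOFFS = {
    ("cooperate", "cooperate"): 3,  # R: reward for mutual cooperation
    ("cooperate", "defect"):    0,  # S: sucker's payoff
    ("defect",    "cooperate"): 5,  # T: temptation to defect
    ("defect",    "defect"):    1,  # P: punishment for mutual defection
}

def best_response(their_action: str) -> str:
    """Return the action that maximizes my payoff against a fixed partner action."""
    return max(ACTIONS, key=lambda a: PAYOFFS[(a, their_action)])

for their_action in ACTIONS:
    print(f"If partner plays {their_action!r}, best response is {best_response(their_action)!r}")
# Both lines print 'defect': whatever the partner does, defecting pays more,
# so the individually rational outcome (defect, defect) leaves both players
# worse off than (cooperate, cooperate).
</syntaxhighlight>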
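The ad hoc cooperation failure is typically quantified with a cross-play matrix: agents from independent training runs are paired up, and off-diagonal scores are compared with self-play scores. The sketch below uses a toy stand-in rather than real Hanabi agents: each hypothetical "training run" has locked in a different arbitrary convention in a signal-matching game, so self-play scores are perfect while cross-play scores collapse.

<syntaxhighlight lang="python">
# Cross-play evaluation sketch. Stand-in agents: "training run" i has
# converged to the arbitrary convention "always pick signal i".
def make_agent(convention: int):
    return lambda: convention

agents = [make_agent(i) for i in range(3)]  # three independent "training runs"

# matrix[i][j]: score when agent i is paired with agent j
# (1.0 if their signals match, 0.0 otherwise in this toy game).
matrix = [[1.0 if a() == b() else 0.0 for b in agents] for a in agents]
for row in matrix:
    print(row)
# Diagonal entries (self-play) are 1.0; off-diagonal entries are 0.0.
# Each team's conventions work internally but fail with outsiders,
# mirroring the Hanabi cross-play results described above.
</syntaxhighlight>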
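Finally, a minimal sketch of the other-play idea, assuming a toy matching game with N interchangeable signals in which partners score 1 if they pick the same signal. This game and every name in the code are illustrative, not Hu et al.'s implementation. The game is symmetric under any relabeling of signals, so a convention like "always pick signal 0" is arbitrary; the other-play objective scores a policy against randomly relabeled copies of its partner, which removes the advantage of such conventions.

<syntaxhighlight lang="python">
import random

N = 5  # number of interchangeable signals (illustrative)

def self_play_return(policy_a, policy_b):
    """Expected score when the two policies play together directly."""
    return 1.0 if policy_a() == policy_b() else 0.0

def other_play_return(policy_a, policy_b, samples=1000):
    """Expected score when policy_b is seen through a random symmetry
    (a random relabeling of signals) each episode."""
    total = 0.0
    for _ in range(samples):
        perm = list(range(N))
        random.shuffle(perm)          # a random element of the symmetry group
        total += 1.0 if policy_a() == perm[policy_b()] else 0.0
    return total / samples

always_zero = lambda: 0                # arbitrary convention: "pick signal 0"
uniform = lambda: random.randrange(N)  # symmetry-invariant policy

print(self_play_return(always_zero, always_zero))   # 1.0: convention works in-team
print(other_play_return(always_zero, always_zero))  # ~1/N: fails under relabeling
print(other_play_return(uniform, uniform))          # ~1/N: invariant policy does as well
# Under the other-play objective the arbitrary convention earns no more than
# the symmetry-invariant policy, so training has no incentive to learn
# conventions that would break zero-shot coordination with outsiders.
</syntaxhighlight>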