Reinforcement Learning
== <span style="color: #FFFFFF;">Analyzing</span> ==
{| class="wikitable"
|+ RL Algorithm Comparison
! Algorithm !! On/Off-Policy !! Action Space !! Sample Efficiency !! Stability
|-
| DQN || Off || Discrete || Medium || Moderate
|-
| PPO || On || Both || Low || High
|-
| SAC || Off || Continuous || High || High
|-
| TD3 || Off || Continuous || High || High
|-
| A3C || On || Both || Low || Moderate
|}
'''Failure modes and pitfalls:'''
* '''Reward hacking''': The agent finds unintended ways to maximize the reward signal that violate the spirit of the task. Example: a boat-racing agent learned to circle endlessly collecting bonus targets instead of finishing the race.
* '''Sparse rewards''': If reward is given only at episode completion, learning is extremely slow. Random exploration rarely reaches a rewarding outcome, so most updates carry no learning signal. Mitigate with reward shaping, curriculum learning, or intrinsic motivation (curiosity).
* '''Sample inefficiency''': Model-free RL requires enormous amounts of interaction data; AlphaGo needed millions of self-play games. Real-world robots cannot afford that much trial and error, so training typically relies on simulation or model-based approaches.
* '''Catastrophic forgetting''': As the agent improves, it visits a narrower set of states. Training only on recent experience can then overwrite what it learned about earlier situations. Experience replay buffers (which keep older transitions in the training mix) and periodic re-evaluation mitigate this.
* '''Distribution shift''': The policy changes during training, so data collected under an old policy becomes stale for the new one.
</div> <div style="background-color: #483D8B; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">