Reinforcement Learning
== <span style="color: #FFFFFF;">Remembering</span> ==
* '''Agent''' – The learner and decision-maker that interacts with the environment.
* '''Environment''' – Everything the agent interacts with; it receives actions and returns observations and rewards.
* '''State (s)''' – A representation of the current situation of the environment.
* '''Action (a)''' – A choice made by the agent at each time step.
* '''Reward (r)''' – A scalar signal provided by the environment indicating how good or bad an action was.
* '''Policy (π)''' – A mapping from states to actions, defining the agent's behavior.
* '''Value function (V)''' – An estimate of the expected cumulative future reward from a given state when following a policy.
* '''Q-function (Q)''' – An estimate of the expected cumulative reward from taking action a in state s, then following policy π.
* '''Episode''' – A sequence of states, actions, and rewards from an initial state to a terminal state.
* '''Discount factor (γ)''' – A value between 0 and 1 that reduces the weight of future rewards relative to immediate ones.
* '''Exploration vs. exploitation''' – The trade-off between trying new actions (exploration) and repeating known good actions (exploitation).
* '''Markov Decision Process (MDP)''' – The mathematical framework for RL problems, defined by states, actions, transitions, and rewards.
* '''Model-free RL''' – Methods that learn directly from interaction without building an explicit model of the environment.
* '''Model-based RL''' – Methods that learn a model of the environment's dynamics and use it to plan.
* '''PPO (Proximal Policy Optimization)''' – A widely used policy gradient algorithm known for stability and efficiency.
* '''DQN (Deep Q-Network)''' – A Q-learning algorithm using a neural network to approximate the Q-function.
</div>
<div style="background-color: #006400; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
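Two of the terms above, the discount factor and the exploration vs. exploitation trade-off, can be illustrated with a small Python sketch. This is only an illustration under stated assumptions: the function names <code>discounted_return</code> and <code>epsilon_greedy</code> are hypothetical, and epsilon-greedy is just one common way to balance exploration and exploitation, not something this page prescribes.

```python
import random


def discounted_return(rewards, gamma):
    """Return r_0 + gamma*r_1 + gamma^2*r_2 + ... for one episode.

    Working backwards lets each step reuse the return that follows it.
    """
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def epsilon_greedy(q_values, epsilon, rng=random):
    """Choose an action index from a list of Q-value estimates.

    With probability epsilon a random action is tried (exploration);
    otherwise the action with the highest estimate is taken (exploitation).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    # Three rewards of 1 with gamma = 0.5: later rewards count for less.
    print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
    # epsilon = 0 never explores, so the highest-valued action is chosen.
    print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0))
```

With γ close to 0 the return is dominated by the immediate reward, while γ close to 1 weights distant rewards almost as heavily as immediate ones.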