Cost/loss function versus reward function

- Python Automation and Machine Learning for ICs -
- An Online Book -

Python Automation and Machine Learning for ICs http://www.globalsino.com/ICs/


=================================================================================

Table 3660. Comparison between cost functions and reward functions in reinforcement learning.

	Cost function	Reward function
Objective	The primary goal of a cost function is to be minimized. It represents a penalty or loss associated with the agent's actions.	The primary goal of a reward function is to be maximized. It scores the agent's actions so that higher values signal more desirable outcomes (the values themselves may be negative, as in the helicopter example below).
Impact	Negative Impact: Higher values of the cost function indicate poorer performance. The agent aims to reduce the cost to achieve better behavior.	Positive Impact: Higher values of the reward function indicate better performance. The agent aims to maximize the reward to learn desirable behavior.
Penalty or Encouragement	Penalty for Deviations: Cost functions are often used to penalize deviations from desired behavior or states.	Encouragement for Desired Behavior: Reward functions are designed to encourage the agent to take actions that lead to desirable outcomes or states.
Optimization	The learning algorithm seeks to find a policy that minimizes the cumulative cost over time.	The learning algorithm seeks to find a policy that maximizes the cumulative reward over time.
Example	For helicopter control, J(x) = ‖x - x_Desired‖² is a cost function, where x is the current state and x_Desired is the desired state.	For helicopter control, R(s) = -‖s - s_Desired‖² is a reward function, where s is the current state and s_Desired is the desired state.
Example	Both expressions are built on the squared Euclidean distance between the current state (x or s) and the desired state: the cost is this distance itself, while the reward is its negative. Minimizing J and maximizing R therefore favor the same states.
Shared Characteristics	Guiding Learning: Both cost and reward functions guide the learning process by providing a measure of the agent's performance. Learning Objective: The ultimate objective is to find a policy that either minimizes the cumulative cost or maximizes the cumulative reward over time. Balancing Exploration and Exploitation: Both provide the feedback the learning algorithm relies on when trading off exploration of new actions against exploitation of actions already known to perform well.
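
The helicopter example in the table can be sketched in Python as follows. This is a minimal illustration, not the book's own code: it assumes the state is a three-component position (x, y, z), and the target and candidate states are hypothetical values chosen only for demonstration. It shows that the reward is just the negated cost, so the state with the lowest cost is also the state with the highest reward.

```python
def cost(x, x_desired):
    """Cost J(x) = ||x - x_desired||^2: squared Euclidean distance to the desired state."""
    return sum((xi - di) ** 2 for xi, di in zip(x, x_desired))


def reward(s, s_desired):
    """Reward R(s) = -||s - s_desired||^2: the negated cost, so closer is better."""
    return -cost(s, s_desired)


# Hypothetical hover target and candidate states, each an assumed (x, y, z) position.
target = (0.0, 0.0, 10.0)
candidates = {
    "near": (0.2, -0.1, 9.9),
    "drifted": (1.0, 0.5, 9.0),
    "far_off": (3.0, -2.0, 6.0),
}

for name, state in candidates.items():
    print(f"{name:8s} cost = {cost(state, target):6.2f}   reward = {reward(state, target):7.2f}")

# Minimizing the cost and maximizing the reward select the same state.
best_by_cost = min(candidates, key=lambda k: cost(candidates[k], target))
best_by_reward = max(candidates, key=lambda k: reward(candidates[k], target))
print(best_by_cost, best_by_reward)  # near near
```

Because R is defined as -J, the two criteria can never disagree about which state is best; only the sign convention differs.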

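The "Optimization" row compares cumulative cost with cumulative reward over time. The sketch below is one possible illustration under stated assumptions: a discount factor gamma = 0.9 (the table does not specify one), two hypothetical policies represented only by the state trajectories they produce, and the squared-distance cost from the helicopter example applied at every step. Because each per-step reward is the negated per-step cost, the policy that minimizes the discounted cumulative cost is the same one that maximizes the discounted cumulative reward.

```python
def discounted_sum(values, gamma=0.9):
    """Return sum over t of gamma**t * values[t]; gamma is an assumed discount factor."""
    return sum((gamma ** t) * v for t, v in enumerate(values))


def step_cost(state, target):
    """Per-step cost: squared Euclidean distance between a state and the target."""
    return sum((si - ti) ** 2 for si, ti in zip(state, target))


# Hypothetical 2-D states (horizontal offset, altitude) visited by two policies.
target = (0.0, 10.0)
trajectories = {
    "policy_a": [(2.0, 8.0), (1.0, 9.0), (0.5, 9.5), (0.0, 10.0)],
    "policy_b": [(2.0, 8.0), (2.0, 8.5), (1.5, 9.0), (1.0, 9.5)],
}

cumulative_cost = {}
cumulative_reward = {}
for name, states in trajectories.items():
    costs = [step_cost(s, target) for s in states]
    rewards = [-c for c in costs]  # per-step reward is the negated per-step cost
    cumulative_cost[name] = discounted_sum(costs)
    cumulative_reward[name] = discounted_sum(rewards)

best_by_cost = min(cumulative_cost, key=cumulative_cost.get)
best_by_reward = max(cumulative_reward, key=cumulative_reward.get)
print(best_by_cost, best_by_reward)  # policy_a policy_a
```

Both criteria select policy_a, whose trajectory reaches the target sooner (discounted cumulative cost of about 10.2 versus about 17.2 for policy_b). The discount factor makes early deviations weigh more than late ones; with gamma = 1 the comparison reduces to a plain sum over the trajectory.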