| | Cost function | Reward function |
| --- | --- | --- |
| Objective | A cost function is meant to be minimized. It represents a penalty or loss associated with the agent's actions. | A reward function is meant to be maximized. It represents positive reinforcement for the agent's actions. |
| Impact | Negative Impact: Higher values of the cost function indicate poorer performance. The agent aims to reduce the cost to achieve better behavior. | Positive Impact: Higher values of the reward function indicate better performance. The agent aims to maximize the reward to learn desirable behavior. |
| Penalty or Encouragement | Penalty for Deviations: Cost functions are often used to penalize deviations from desired behavior or states. | Encouragement for Desired Behavior: Reward functions are designed to encourage the agent to take actions that lead to desirable outcomes or states. |
| Optimization | The learning algorithm seeks a policy that minimizes the cumulative cost over time. | The learning algorithm seeks a policy that maximizes the cumulative reward over time. |
| Example | For a helicopter, $J(\theta) = \lVert x - x_{\text{desired}} \rVert^2$ is a cost function, where $x$ is the current state and $x_{\text{desired}}$ is the desired state. | For a helicopter, $R(\theta) = -\lVert s - s_{\text{desired}} \rVert^2$ is a reward function, where $s$ is the current state and $s_{\text{desired}}$ is the desired state. |

In both examples, $x$ and $s$ denote the current state, and the norm term is the squared Euclidean distance between the current state and the desired state. The reward is simply the negated cost.

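The helicopter example can be made concrete with a small sketch. It assumes the state is a plain list of numbers, and the names `DESIRED`, `squared_distance`, `cost`, `reward`, and `cumulative`, along with the two toy trajectories, are hypothetical and not taken from the original text. It illustrates that, because the reward is the negated cost, the trajectory with the lowest cumulative cost is also the one with the highest cumulative reward.

```python
# Illustrative sketch only: all names and the toy trajectories are assumptions.

DESIRED = [0.0, 0.0, 10.0]  # hypothetical target state, e.g. hover at (x, y, z)


def squared_distance(state, desired):
    """Squared Euclidean distance between two equally sized state vectors."""
    return sum((a - b) ** 2 for a, b in zip(state, desired))


def cost(state, desired=DESIRED):
    """Cost: grows as the state moves away from the desired state."""
    return squared_distance(state, desired)


def reward(state, desired=DESIRED):
    """Reward: the negated cost, so it peaks (at zero) at the desired state."""
    return -squared_distance(state, desired)


def cumulative(fn, trajectory):
    """Sum a per-state cost or reward over every state in a trajectory."""
    return sum(fn(state) for state in trajectory)


# Two toy trajectories, standing in for the behavior of two candidate policies.
trajectories = {
    "drifting": [[3.0, 0.0, 8.0], [4.0, 1.0, 7.0], [5.0, 2.0, 6.0]],
    "steady": [[1.0, 0.0, 9.0], [0.5, 0.0, 9.5], [0.0, 0.0, 10.0]],
}

best_by_cost = min(trajectories, key=lambda k: cumulative(cost, trajectories[k]))
best_by_reward = max(trajectories, key=lambda k: cumulative(reward, trajectories[k]))

print(best_by_cost, best_by_reward)  # prints "steady steady"
```

Both selections agree here because maximizing a negated quantity is the same as minimizing the quantity itself. The choice between a cost and a reward formulation is therefore largely a matter of convention.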
| Shared Characteristics | |
| --- | --- |
| Guiding Learning | Both cost and reward functions guide the learning process by providing a measure of the agent's performance. |
| Learning Objective | In either case, the ultimate objective is to find a policy that minimizes the cumulative cost or maximizes the cumulative reward over time. |
| Balancing Exploration and Exploitation | Both functions supply the feedback the agent relies on when it trades off exploring new actions against exploiting actions already known to perform well. |
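
The exploration and exploitation trade-off is handled by the learning algorithm, with the cost or reward acting as its feedback signal. As one common illustration (not a method named in the original text), here is a minimal epsilon-greedy sketch on a toy problem with three actions. The action names, the fixed noise-free costs in `ACTION_COSTS`, and the function `epsilon_greedy` are all hypothetical.

```python
import random

# Illustrative assumption: each action has a fixed, noise-free cost.
ACTION_COSTS = {"hover": 1.0, "climb": 4.0, "bank": 9.0}


def epsilon_greedy(feedback, maximize, steps=200, epsilon=0.1, seed=0):
    """Estimate each action's value and return the action judged best.

    feedback: maps an action to a number (a reward or a cost).
    maximize: True when feedback is a reward, False when it is a cost.
    """
    rng = random.Random(seed)
    actions = sorted(ACTION_COSTS)
    estimates = {}  # running mean of the feedback seen for each tried action
    counts = {}
    pick = max if maximize else min

    for _ in range(steps):
        untried = [a for a in actions if a not in estimates]
        if untried:
            action = untried[0]  # try every action once before acting greedily
        elif rng.random() < epsilon:
            action = rng.choice(actions)  # explore: pick a random action
        else:
            action = pick(estimates, key=estimates.get)  # exploit: best so far
        value = feedback(action)
        counts[action] = counts.get(action, 0) + 1
        previous = estimates.get(action, 0.0)
        estimates[action] = previous + (value - previous) / counts[action]

    return pick(estimates, key=estimates.get)


best_with_cost = epsilon_greedy(lambda a: ACTION_COSTS[a], maximize=False)
best_with_reward = epsilon_greedy(lambda a: -ACTION_COSTS[a], maximize=True)
print(best_with_cost, best_with_reward)  # prints "hover hover"
```

The same loop works with either signal. Only the direction of the greedy step changes: `min` for a cost, `max` for a reward.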