=================================================================================
The relationship between training error and model complexity is a fundamental concept in machine learning that helps us understand how models behave as they are fitted to data. Together with the behavior of the error on unseen data, this relationship is usually described in terms of the bias-variance trade-off:
Underfitting (High Bias):
- When a model is too simple or has low complexity, it may not be able to capture the underlying patterns in the training data effectively.
- This leads to high training error because the model is not flexible enough to fit the data.
- This situation is called underfitting because the model "underfits" the training data; its error on new data is typically high as well.
Overfitting (High Variance):
- When a model is overly complex, it can fit the training data very closely, even capturing noise or random fluctuations in the data.
- This can result in low training error because the model fits the training data very well, but it may not generalize to new, unseen data.
- Overfit models have high variance because they are too sensitive to the training data.
Balanced Model Complexity:
- There is an ideal level of model complexity that balances bias and variance, leading to good generalization.
- As you increase the model's complexity up to this ideal point, both the training error and the error on unseen (test or validation) data tend to decrease as the model becomes better at capturing the underlying patterns.
- Beyond this point, increasing complexity further leads to overfitting: the training error keeps decreasing (or levels off), but the test or validation error starts to increase again.
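The bias-variance behavior described above can be sketched in Python with NumPy. This is a minimal illustration under assumptions that are not taken from the original: the data are hypothetical (y = sin(2πx) plus Gaussian noise), each model is a least-squares polynomial fit via np.polyfit, and the helper names make_data, mse, and sweep_degrees are illustrative only. The training error is expected to fall as the degree rises, while the validation error is lowest at an intermediate degree.

```python
import numpy as np

def make_data(n, rng, noise_std=0.2):
    """Hypothetical data: y = sin(2*pi*x) + Gaussian noise, x uniform on [0, 1]."""
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2.0 * np.pi * x) + rng.normal(0.0, noise_std, n)
    return x, y

def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))

def sweep_degrees(max_degree=12, n_train=20, n_val=200, seed=0):
    """Fit polynomials of degree 0..max_degree by least squares and return
    (degree, training MSE, validation MSE) for each degree."""
    rng = np.random.default_rng(seed)
    x_tr, y_tr = make_data(n_train, rng)
    x_va, y_va = make_data(n_val, rng)
    rows = []
    for d in range(max_degree + 1):
        coefs = np.polyfit(x_tr, y_tr, d)  # least-squares fit of degree d
        rows.append((d,
                     mse(y_tr, np.polyval(coefs, x_tr)),
                     mse(y_va, np.polyval(coefs, x_va))))
    return rows

rows = sweep_degrees()
for d, tr, va in rows:
    print(f"degree {d:2d}: train MSE {tr:.4f}  validation MSE {va:.4f}")

# The "balanced" complexity is the degree with the lowest validation error.
best_degree = min(rows, key=lambda r: r[2])[0]
print("degree with lowest validation MSE:", best_degree)
```

Degrees well below best_degree underfit (high training and validation error), while degrees well above it tend to overfit (low training error, higher validation error).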
Figure 3797 shows how the training error depends on model complexity, measured here by the degree of a fitted polynomial. Ideally, the higher the degree of the polynomial, the smaller the training error, as shown by the blue curve in Figure 3797. That is, a higher-degree polynomial can fit the training data more closely, resulting in a smaller training error (often measured as the mean squared error or a similar metric). This is because a higher-degree polynomial has more coefficients and therefore more flexibility, which allows it to capture intricate patterns and variations in the training data.

Figure 3797. Training error versus model complexity (code).
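As a rough sketch of the blue curve, the following fits polynomials of increasing degree to a small hypothetical data set. It is not the code behind Figure 3797, which is not reproduced here. Because each degree includes all lower-degree polynomials, the least-squares training error does not increase with degree. With n distinct points, a polynomial of degree n - 1 interpolates them exactly, which drives the training MSE to essentially zero. The data, seed, and the helper name training_errors are assumptions for illustration.

```python
import numpy as np

def training_errors(x, y):
    """Training MSE of least-squares polynomial fits of degree 0..len(x)-1."""
    errors = []
    for d in range(len(x)):
        coefs = np.polyfit(x, y, d)
        residuals = y - np.polyval(coefs, x)
        errors.append(float(np.mean(residuals ** 2)))
    return errors

rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 8)               # 8 distinct training inputs
y = x ** 2 + rng.normal(0.0, 0.1, x.size)   # hypothetical noisy targets

errors = training_errors(x, y)
for d, e in enumerate(errors):
    print(f"degree {d}: training MSE {e:.2e}")
```

The near-zero training error at degree 7 says nothing about accuracy on new inputs. The interpolating polynomial has also fitted the noise.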
However, in reality, there are important caveats to consider as shown in the orange curve in Figure 3797:
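- The error first decreases as model complexity increases. It reaches a minimum at some intermediate complexity and then increases again as complexity grows further, which is the signature of overfitting. Strictly speaking, the error that rises is the error on unseen (test or validation) data. For nested models such as polynomials fitted by least squares, adding terms cannot increase the minimum achievable training error.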
- Overfitting: While increasing the degree of the polynomial can reduce training error, it can also make the model overly complex and sensitive to noise in the data. This can lead to overfitting, where the model fits the training data extremely well but performs poorly on unseen data. In this case, the model may have a very low training error but a high test (or validation) error, which is undesirable.
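- Strength of regularization: The strength of regularization is controlled by a hyperparameter, often denoted λ. A large λ makes underfitting more likely, and a small λ makes overfitting more likely. The error on unseen data, viewed as a function of λ, therefore shows the same U-shaped behavior as the orange curve in Figure 3797, with small λ playing the role of high model complexity.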
- Computational Complexity: Higher-degree polynomials have more parameters, which makes the model more computationally intensive to train. They may also require more data to generalize effectively.
- Occam's Razor: Occam's Razor is a principle in science and machine learning that suggests simpler models are generally preferred over more complex ones when they have similar performance. This is because simpler models are often more interpretable and less prone to overfitting.
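The effect of λ can be sketched with ridge regression on polynomial features. This setup is assumed for illustration and does not come from the original. It uses hypothetical sin(πx) data, a degree-9 polynomial, an unpenalized intercept, and a hypothetical helper ridge_poly_fit. The penalty λ‖w[1:]‖² is added by solving an equivalent augmented least-squares problem with np.linalg.lstsq. As λ grows, the training error can only increase and the penalized coefficients shrink. The validation error is typically lowest at an intermediate λ.

```python
import numpy as np

def ridge_poly_fit(x, y, degree, lam):
    """Ridge regression on polynomial features 1, x, ..., x**degree.

    Minimizes ||X w - y||^2 + lam * ||w[1:]||^2 (the intercept w[0] is not
    penalized) by solving the equivalent augmented least-squares problem.
    """
    X = np.vander(x, degree + 1, increasing=True)    # columns: x**0 ... x**degree
    mask = np.ones(degree + 1)
    mask[0] = 0.0                                     # do not penalize the intercept
    X_aug = np.vstack([X, np.sqrt(lam) * np.diag(mask)])
    y_aug = np.concatenate([y, np.zeros(degree + 1)])
    w, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    return w

def mse(X, w, y):
    """Mean squared error of the linear model X @ w."""
    return float(np.mean((X @ w - y) ** 2))

rng = np.random.default_rng(2)
x_tr = rng.uniform(-1.0, 1.0, 30)                                  # hypothetical data
y_tr = np.sin(np.pi * x_tr) + rng.normal(0.0, 0.2, x_tr.size)
x_va = rng.uniform(-1.0, 1.0, 300)
y_va = np.sin(np.pi * x_va) + rng.normal(0.0, 0.2, x_va.size)

degree = 9
X_tr = np.vander(x_tr, degree + 1, increasing=True)
X_va = np.vander(x_va, degree + 1, increasing=True)
lams = [1e-6, 1e-3, 1e-1, 1.0, 10.0, 100.0]

results = []
for lam in lams:
    w = ridge_poly_fit(x_tr, y_tr, degree, lam)
    results.append((lam, mse(X_tr, w, y_tr), mse(X_va, w, y_va),
                    float(np.sum(w[1:] ** 2))))

for lam, tr, va, pen in results:
    print(f"lambda {lam:g}: train MSE {tr:.4f}  validation MSE {va:.4f}  "
          f"||w[1:]||^2 {pen:.3g}")
```

The augmented formulation is used instead of forming (XᵀX + λI) explicitly, because it avoids squaring the condition number of the polynomial design matrix.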
============================================