Bandwidth parameter (τ or h) in LWR and KDE

- Python for Integrated Circuits -
- An Online Book -

Python for Integrated Circuits http://www.globalsino.com/ICs/

=================================================================================

In Locally Weighted Regression (LWR), the parameter vector θ is fitted by minimizing a weighted sum of squared errors (the cost function). The cost function that LWR minimizes is:

J(\theta) = \sum_{i} w^{(i)} \left( y^{(i)} - \theta^{T} x^{(i)} \right)^{2} ------------------------------ [3887a]

where,

J(θ) is the cost function that we want to minimize.
Σᵢ represents the summation over all the data points i in your dataset.
w⁽ⁱ⁾ is the weight assigned to the i-th data point. In LWR, the weights are typically determined by a kernel function that assigns higher weights to data points that are closer to the point at which you want to make a prediction.
y⁽ⁱ⁾ is the target or output value associated with the i-th data point.
θ^T is the transpose of the parameter vector θ.
x⁽ⁱ⁾ is the feature vector associated with the i-th data point.
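
As a minimal sketch of the cost described above, the weighted sum of squared errors can be evaluated directly with NumPy. The function name `weighted_cost`, the toy data, and the weight values are hypothetical and chosen only for illustration; the weights are passed in as a plain array rather than computed from a kernel.

```python
import numpy as np

def weighted_cost(theta, X, y, w):
    """Weighted sum of squared errors: sum_i w_i * (y_i - theta^T x_i)^2."""
    residuals = y - X @ theta
    return float(np.sum(w * residuals ** 2))

# Hypothetical toy data: an intercept column plus one feature, with y = 1 + 2x
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
w = np.array([0.5, 1.0, 0.5])   # illustrative weights, not derived from a kernel

cost_exact = weighted_cost(np.array([1.0, 2.0]), X, y, w)  # theta reproduces y exactly
cost_off = weighted_cost(np.array([0.0, 2.0]), X, y, w)    # every residual equals 1
print(cost_exact, cost_off)
```

With the second θ every residual is 1, so the cost reduces to the sum of the weights; points with larger weights contribute more to J(θ).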

A common choice of w⁽ⁱ⁾, shown in Figure 3887a, is,

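w^{(i)} = \exp\left( -\frac{\left( x^{(i)} - x \right)^{2}}{2\tau^{2}} \right) ------------------------------ [3887b]

where,

x is the query point at which the prediction is made.
τ is the bandwidth shown in Figure 3887a.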
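
The effect of τ on the weights can be sketched numerically. The snippet below assumes the Gaussian-shaped weight commonly used with LWR, w = exp(−(x⁽ⁱ⁾ − x)² / (2τ²)), for one-dimensional inputs; the function name `gaussian_weights`, the training points, and the τ values are hypothetical.

```python
import numpy as np

def gaussian_weights(x_train, x_query, tau):
    """Assumed Gaussian weights: exp(-(x_i - x_query)^2 / (2 * tau^2))."""
    return np.exp(-((x_train - x_query) ** 2) / (2.0 * tau ** 2))

x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])  # hypothetical 1-D training inputs
x_query = 2.0                                   # point where a prediction is wanted

weights = {tau: gaussian_weights(x_train, x_query, tau) for tau in (0.3, 1.0, 3.0)}
for tau, w in weights.items():
    print(f"tau={tau}: {np.round(w, 3)}")
```

A training point that coincides with the query point always gets weight 1. As τ shrinks, the weights of the remaining points fall toward 0; as τ grows, they approach 1 and the fit behaves more like ordinary least squares.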

Figure 3887a. (a) Weight function (w⁽ⁱ⁾) in LWR (Python code), and (b) effect of the bandwidth (τ) on the Gaussian function (Python code). The red dot marks the feature vector associated with the i-th data point. The bandwidth (τ) is shown in the figure as well.

The bandwidth parameter (often denoted τ or h) in locally weighted regression (LWR) and kernel density estimation (KDE) controls the trade-off between overfitting and underfitting. To see why, consider how the two methods use it.

In both LWR and KDE, the bandwidth parameter determines the width, or spread, of the kernel function used to assign weights to data points. A narrower bandwidth concentrates the weight on data points that are very close to the prediction point, making the regression or density estimate highly sensitive to local variations in the data. In contrast, a wider bandwidth assigns more uniform weights to data points within a larger neighborhood, resulting in a smoother and more global estimate.
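
For KDE, the same behavior can be sketched with a hand-written one-dimensional Gaussian kernel estimator, f(x) = (1 / (n·h)) Σᵢ φ((x − xᵢ) / h). The function name `gaussian_kde_1d`, the synthetic two-cluster samples, and the h values are assumptions made for this illustration; counting local maxima is only a rough stand-in for how wiggly the estimate is.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, h):
    """Gaussian KDE on a grid: (1 / (n * h)) * sum_i phi((x - x_i) / h)."""
    z = (grid[:, None] - samples[None, :]) / h
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (samples.size * h)

# Hypothetical data: two clusters centered at -2 and +2
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(-2.0, 0.5, 100), rng.normal(2.0, 0.5, 100)])
grid = np.linspace(-6.0, 6.0, 601)

densities = {h: gaussian_kde_1d(samples, grid, h) for h in (0.05, 0.5, 3.0)}
for h, d in densities.items():
    # Interior grid points that are higher than both neighbors
    n_peaks = int(np.sum((d[1:-1] > d[:-2]) & (d[1:-1] > d[2:])))
    print(f"h={h}: max density={d.max():.3f}, local maxima={n_peaks}")
```

A very small h tends to produce a spiky estimate that follows individual samples, while a very large h flattens the two clusters into a single broad bump.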

Here's how the bandwidth parameter affects overfitting and underfitting:

Narrow Bandwidth (Low τ or h):
- Pros: Narrow bandwidth focuses on local details and can provide a very accurate fit to the training data near the prediction point.
- Cons: It is highly sensitive to noise and can result in overfitting. The model can capture noise and small fluctuations in the data, leading to poor generalization to unseen data.
Wide Bandwidth (High τ or h):
- Pros: Wide bandwidth provides a smoother, more global estimate that is less affected by noise and local variations.
- Cons: It can lead to underfitting because it may not capture important local patterns or variations in the data. The model becomes too smooth and may miss details present in the data.
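
The trade-off listed above can be sketched with a small LWR experiment. This assumes Gaussian weights of the form exp(−(x⁽ⁱ⁾ − x)² / (2τ²)) and a locally fitted straight line; the helper `lwr_predict`, the noisy sine data, and the three τ values are hypothetical choices for illustration, not a reference implementation.

```python
import numpy as np

def lwr_predict(x_query, X, y, tau):
    """Fit a weighted straight line around x_query and evaluate it there."""
    A = np.column_stack([np.ones_like(X), X])              # intercept + feature
    w = np.exp(-((X - x_query) ** 2) / (2.0 * tau ** 2))   # assumed Gaussian weights
    sw = np.sqrt(w)
    # Weighted least squares: scale each row by sqrt(w_i), then solve with lstsq
    theta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return theta[0] + theta[1] * x_query

# Hypothetical data: a sine curve with additive Gaussian noise
rng = np.random.default_rng(0)
X = np.linspace(0.0, 2.0 * np.pi, 60)
y = np.sin(X) + rng.normal(0.0, 0.3, X.size)
X_new = np.linspace(0.05, 2.0 * np.pi - 0.05, 200)         # unseen query points

train_mse, new_mse = {}, {}
for tau in (0.05, 0.5, 5.0):
    fit_train = np.array([lwr_predict(x, X, y, tau) for x in X])
    fit_new = np.array([lwr_predict(x, X, y, tau) for x in X_new])
    train_mse[tau] = float(np.mean((y - fit_train) ** 2))
    new_mse[tau] = float(np.mean((np.sin(X_new) - fit_new) ** 2))
    print(f"tau={tau}: train MSE={train_mse[tau]:.3f}, "
          f"MSE vs noise-free curve={new_mse[tau]:.3f}")
```

In this setup the narrowest τ gives the lowest training error because it chases the noise, the widest τ is close to a single global line and misses the curvature, and the intermediate τ tracks the noise-free curve best.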

The bandwidth is a critical hyperparameter in LWR and KDE, and its value is usually selected through cross-validation or other model selection techniques. The goal is to capture important local information while avoiding both overfitting and underfitting.
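
One way to carry out this selection is leave-one-out cross-validation over a grid of candidate bandwidths. The sketch below assumes Gaussian weights and a local straight-line fit; the helpers `lwr_predict` and `loo_cv_error`, the synthetic data, and the candidate list are all hypothetical.

```python
import numpy as np

def lwr_predict(x_query, X, y, tau):
    """Locally weighted straight-line fit evaluated at x_query."""
    A = np.column_stack([np.ones_like(X), X])
    sw = np.sqrt(np.exp(-((X - x_query) ** 2) / (2.0 * tau ** 2)))
    theta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return theta[0] + theta[1] * x_query

def loo_cv_error(X, y, tau):
    """Mean squared error when each point is predicted from all the others."""
    errors = []
    for i in range(X.size):
        keep = np.arange(X.size) != i          # drop the i-th point
        pred = lwr_predict(X[i], X[keep], y[keep], tau)
        errors.append((y[i] - pred) ** 2)
    return float(np.mean(errors))

# Hypothetical data: a sine curve with additive Gaussian noise
rng = np.random.default_rng(0)
X = np.linspace(0.0, 2.0 * np.pi, 60)
y = np.sin(X) + rng.normal(0.0, 0.3, X.size)

candidates = [0.05, 0.1, 0.2, 0.5, 1.0, 2.0, 5.0]
cv_error = {tau: loo_cv_error(X, y, tau) for tau in candidates}
best_tau = min(cv_error, key=cv_error.get)

for tau in candidates:
    print(f"tau={tau}: LOO MSE={cv_error[tau]:.3f}")
print("selected tau:", best_tau)
```

Leave-one-out is affordable here because the data set is small; for larger data sets, k-fold cross-validation over the same candidate grid is a common substitute.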

=================================================================================