Expected Risk (Population Risk) - Python Automation and Machine Learning for ICs - An Online Book
Python Automation and Machine Learning for ICs  http://www.globalsino.com/ICs/
=================================================================================

In machine learning, "expected risk" refers to the expected value of the loss or error incurred by a predictive model when applied to new, unseen data. It is a fundamental concept in statistical learning theory and plays a central role in model training and evaluation. Expected risk is also sometimes referred to as "expected prediction error." Expected risk is not the same as "population risk," but the two are closely related: population risk is the expected loss taken over the entire underlying population (data distribution), while in practice the expected risk estimated from the data at hand is used as a proxy for it, as discussed below.
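As a point of reference, for a data distribution D and a loss function ℓ, the expected risk of a hypothesis h is commonly written as

          L_D(h) = E_(x,y)∼D [ℓ(h(x), y)]

that is, the average loss over fresh draws (x, y) from the underlying distribution D, rather than the average over any particular finite dataset.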
The expected risk (error) of a hypothesis h_S ∈ H, which is selected based on the training dataset S from a hypothesis class H, can be decomposed into the approximation error, ε_app, and the estimation error, ε_est, as follows,

          L_D(h_S) = ε_app + ε_est ------------------------------------- [3983]

Figure 3983a shows the expected risk (error), the approximation error, and the estimation error.

Figure 3983a. Expected risk (error), approximation error, and estimation error. [1]

Figure 3983b shows the relationship between these terms in Equation 3983. The red points are specific hypotheses. The best possible hypothesis (the Bayes hypothesis) lies outside the chosen hypothesis class H. The distance between the risk of the selected hypothesis h_S and the risk of h*, the best hypothesis within H, is the estimation error, while the distance between h* and the Bayes hypothesis is the approximation error. Some properties are:

The approximation error depends only on the choice of the hypothesis class H, not on the training data or the sample size; enlarging H can only reduce it.
The estimation error arises from selecting h_S based on a finite sample; it typically shrinks as the training set grows and increases with the complexity (size) of H.
Together these two terms give rise to the bias-complexity tradeoff: a richer hypothesis class lowers ε_app but tends to raise ε_est.
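For concreteness, under one common convention (the source does not define the terms explicitly), the two components can be written as

          ε_app = min_(h ∈ H) L_D(h)
          ε_est = L_D(h_S) − min_(h ∈ H) L_D(h)

so that the two terms sum exactly to L_D(h_S), as in Equation 3983. When the approximation error is instead measured from h* down to the Bayes hypothesis, as drawn in Figure 3983b, the same decomposition describes the excess risk of h_S over the Bayes-optimal risk rather than L_D(h_S) itself.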
In practice, machine learning practitioners use expected risk as a proxy for population risk. When we train a machine learning model, we aim to minimize its expected risk on the training data, on the assumption that a lower expected risk will also correspond to a lower population risk when the model is deployed in the real world. However, note that there is no guarantee that a model with a low expected risk on the training data will perform well on all unseen data. Overfitting, where a model fits the training data too closely and performs poorly on new data, is a common concern. Cross-validation and other techniques are used to estimate and mitigate this risk by providing a more accurate estimate of the expected risk; a cross-validation sketch is given at the end of this section.

============================================

The task below is text classification based on the values in ColumnA in order to predict the values for ColumnB. To achieve this, a text classification model is used: a simple Multinomial Naive Bayes classifier from the sklearn library is applied to classify a new string in ColumnA and predict the corresponding value for ColumnB, using the trained model to predict values for a new string read from the CSV file. Note that for more complex scenarios, more advanced text classification techniques and more training data are needed.
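The original listing is not reproduced here, so the following is a minimal sketch of such a script, assuming a CSV file named data.csv with text in ColumnA and labels in ColumnB (the file name and the new input string are placeholders):

Code:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Load the training data (hypothetical file name)
df = pd.read_csv('data.csv')
texts = df['ColumnA'].astype(str)
labels = df['ColumnB']

# Hold out part of the data to estimate performance on unseen samples
texts_train, texts_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42)

# Convert the raw text into token-count features (fit on training text only)
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(texts_train)
X_test = vectorizer.transform(texts_test)

# Train the Multinomial Naive Bayes classifier
model = MultinomialNB()
model.fit(X_train, y_train)

# "Expected Risk" is estimated here as the misclassification rate on the test
# set; the true population risk over the full distribution is unobservable.
expected_risk = 1.0 - accuracy_score(y_test, model.predict(X_test))
print(f'Expected Risk (estimated): {expected_risk:.3f}')

# Use the trained model to predict ColumnB for a new string (placeholder input)
new_string = ['example text to classify']
print('Predicted ColumnB value:', model.predict(vectorizer.transform(new_string))[0])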
The code above implements the Multinomial Naive Bayes algorithm. In this code, the terms "Expected Risk" and "Population Risk" describe concepts related to the performance evaluation of a machine learning model; specifically, they relate to the accuracy of the Naive Bayes classifier trained on your data. In your script, "Expected Risk" is calculated as the misclassification rate on the test set, which is an estimate of how well your model might perform on new, unseen data drawn from the same distribution as your test set. It is not directly calculating the "Population Risk", because that would require access to the entire population, but it provides an estimate of model performance on new data.
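As noted earlier, cross-validation can provide a more stable estimate of the expected risk than a single train/test split. A standalone sketch under the same assumptions (hypothetical data.csv with ColumnA and ColumnB):

Code:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB

# Load the data (hypothetical file name, as above)
df = pd.read_csv('data.csv')
X = CountVectorizer().fit_transform(df['ColumnA'].astype(str))
y = df['ColumnB']

# 5-fold cross-validation: average the accuracy over five held-out folds
scores = cross_val_score(MultinomialNB(), X, y, cv=5, scoring='accuracy')
print(f'Cross-validated Expected Risk (estimated): {1.0 - scores.mean():.3f}')

============================================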
[1] www.medium.com.