Training Process (Train Model) in ML - Python for Integrated Circuits - An Online Book - http://www.globalsino.com/ICs/
=================================================================================

Training data sets require several example predictor variables to classify or predict a response. In machine learning, the predictor variables are called features and the responses are called labels. The training data is used to fit models, the validation data is used to make sure the model generalizes well to new data, and the test set is used to evaluate the performance of the final model. In predictive modeling and machine learning, the term "fit" or "model fitting" refers to the process of training a machine learning model on a dataset. When you "fit" a model to your training data, you are essentially teaching the model to learn patterns, relationships, and associations within that dataset. The goal is to make the model capable of making accurate predictions or classifications based on the input data.
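The three-way split described above can be illustrated with a minimal scikit-learn sketch. The dataset, split ratios, and choice of LogisticRegression are illustrative assumptions, not part of the original text:

# Minimal sketch: splitting data into training, validation, and test sets,
# then fitting a model on the training set only. All data is illustrative.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Illustrative dataset: 1000 samples, 20 features (predictors), binary labels.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# First carve out the test set, then split the rest into train/validation.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)                                 # "fitting" = learning from the training data

print("validation accuracy:", model.score(X_val, y_val))   # used for model selection/tuning
print("test accuracy:", model.score(X_test, y_test))       # final, one-time evaluation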
Therefore, "fitting" a model refers to the training phase where the model learns from the training data to make predictions or classifications. The validation and test datasets are used to ensure that the model can generalize to new data and performs well when deployed for practical use. On the other hand, when training a machine learning model, the goal is often to minimize a certain objective function (e.g., a loss function) by adjusting the weights of the model. In fact, in most cases, the "training" or "model training" is actually the process, where the goal is to minimize the difference between the predicted output (hypothesis(x)) and the actual output (y), which can be given by (h(x)-y)2. The learning rate plays a crucial role in this process. If the learning rate is too small, the model may take a long time to converge or may get stuck in a local minimum. On the other hand, if the learning rate is too large, the optimization process may oscillate or even diverge, making it difficult to find the optimal set of weights. In supervised learning, this process is a fundamental step where the machine learning algorithm adjusts its parameters to make the predictions as close as possible to the true target values in the training dataset. The process of minimizing the difference between the hypothesis and the actual target, by choising correct parameters (θ), is typically achieved through various optimization techniques, such as gradient descent, which iteratively updates the model's parameters to reduce the prediction error. The objective is to find the set of parameters that results in the best possible fit of the model to the training data, allowing it to generalize well to new, unseen data. Figure 4109a shows how supervised learning works. To provide a precise characterization of the supervised learning problem, the objective is to acquire a function h: X → Y from a given training set. This function, denoted as h(x), should excel at predicting the associated value y. Traditionally, this function h is referred to as a "hypothesis" due to historical conventions. Figure 4109a. Workflow of supervised learning. Figure 4109b shows the Vertex AI providing a unified set of APIs for the ML lifecycle. When sending training jobs to Vertex AI, most of the logic is split into a task.py and a model.py. Figure 4109b. Vertex AI providing a unified set of APIs for the ML lifecycle. [1] To make your code compatible with Vertex AI, there are three basic steps that must be completed in a specific order: You can use either pre-built containers or custom containers to run training jobs. Both containers require you specify settings that Vertex AI needs to run your training code, including region, display-name and worker-pool-spec. Figure 4109c shows simplified overview of a machine learning workflow. The machine learning model is trained on input data gathered from different databases. Once it is trained, it can be applied to make predictions for other input data.
Non-linearity helps a model train much faster and more accurately without losing important information. Non-saturating, non-linear activation functions such as ReLUs can be used to fix the successive reduction of signal versus noise caused by each additional layer in the network during the training process.

ML models can be trained on cloud infrastructure through the Google Cloud training service. The training package can be run in Docker containers, and the resulting training Docker images can be pushed to a Docker registry.

With a Keras model, the significance of the .fit() method is that it defines the number of training epochs, as sketched in the example below.

Table 4109. Training models and training objectives.
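As an illustration of ReLU activations and of .fit() defining the number of epochs, here is a minimal Keras sketch; the layer sizes and random data are illustrative assumptions, not part of the original text:

# Minimal Keras sketch: ReLU activations in the hidden layers and the
# number of epochs defined through .fit(). Data shapes are illustrative.
import numpy as np
import tensorflow as tf

X_train = np.random.rand(100, 20).astype("float32")   # 100 samples, 20 features
y_train = np.random.randint(0, 2, size=(100,))        # binary labels

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),     # non-saturating ReLU
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# .fit() defines the number of epochs (full passes over the training data).
model.fit(X_train, y_train, epochs=5, batch_size=16)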
============================================
[1] Diagram courtesy of Henry Tappen and Brian Kobashikawa.