Softmax Regression (Multinomial Logistic Regression) / Softmax Multi-Class Network / Softmax Classifier - Python and Machine Learning for Integrated Circuits - An Online Book
Python and Machine Learning for Integrated Circuits: http://www.globalsino.com/ICs/
Softmax regression, also known as multinomial logistic regression, a softmax multi-class network, or a softmax classifier, is a machine learning algorithm used for classification tasks. It extends logistic regression to tasks that involve classifying instances into more than two classes. In softmax regression, the goal is to assign a probability to each class for a given input and then predict the class with the highest probability. It is particularly common in machine learning and deep learning applications for tasks such as image classification and natural language processing. Softmax regression is considered another member of the Generalized Linear Model (GLM) family because it shares several key characteristics and principles with GLMs: Generalized Linear Models are a broad class of statistical models used for various regression and classification tasks, and softmax regression can be viewed as a specific instance within this family. A softmax multi-class network works as follows:
softmax(z_i) = exp(z_i) / Σ_{j=1}^{K} exp(z_j) ---------------------------- [3854a]

where z_i is the raw score (logit) for class i, and K is the total number of classes. Cross-entropy is commonly used as the loss function for training softmax regression models:

L = -Σ_{i=1}^{K} y_i log(p_i)

where y_i is 1 for the true class and 0 otherwise, and p_i is the predicted probability for class i.
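As a small sketch (not the book's code), the softmax probabilities of equation [3854a] and the cross-entropy loss can be computed with NumPy; the logits and labels below are made up for illustration:

```python
import numpy as np

def softmax(z):
    # Subtract the row maximum before exponentiating for numerical stability;
    # the shift cancels in the ratio, so the result is unchanged.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, labels):
    # Mean negative log-probability assigned to the true class of each example.
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

# Hypothetical logits for a batch of 2 examples and K = 3 classes.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.2, 0.3]])
labels = np.array([0, 1])          # true class index per example

probs = softmax(logits)            # each row sums to 1
loss = cross_entropy(probs, labels)
```

Because the loss is the negative log of the true-class probability, it is zero only when the model assigns probability 1 to the correct class.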
The decision boundaries in Figure 3854 represent hyperplanes. In a real Softmax Regression model, these boundaries would be learned from the data.
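As an illustrative sketch of such learning (assuming scikit-learn; the synthetic clusters below are hypothetical, not the data behind Figure 3854), fitting a multinomial logistic regression recovers linear decision boundaries between classes:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Three synthetic 2-D Gaussian clusters, one per class (hypothetical data).
centers = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in centers])
y = np.repeat([0, 1, 2], 50)

# With the default 'lbfgs' solver, LogisticRegression fits a multinomial
# (softmax) model when there are more than two classes.
clf = LogisticRegression().fit(X, y)

# The learned boundaries are hyperplanes: coef_ holds one weight vector
# (the normal of a hyperplane in feature space) per class.
print(clf.coef_.shape)                    # one (w1, w2) row per class
probs = clf.predict_proba([[2.8, 0.2]])   # class probabilities, summing to 1
```

A point near the second cluster's center, like (2.8, 0.2) above, receives most of its probability mass on class 1.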
The softmax function is commonly used in neural networks, especially in the output layer for classification problems. There, it converts a vector of raw scores (also called logits) into a probability distribution: it exponentiates each element of the input vector and then normalizes the results so that they sum to 1. This is useful in classification problems where you want the network to output a probability for each class. The softmax function is defined for a vector z with K elements as follows:

softmax(z)_i = exp(z_i) / Σ_{j=1}^{K} exp(z_j), for i = 1, ..., K ----------------------------------------------- [3854a]

where z_i is the i-th element (logit) of the input vector z.
The division by the sum ensures that the resulting values form a valid probability distribution: the goal of the softmax function is to convert the logits into probabilities that sum to 1. For the example described on page 3876, applying the softmax function gives the softmax multi-class network below.
In this case, because all probabilities share the same normalizing sum, the probabilities of the three animals depend on each other: increasing the score of one class necessarily lowers the probabilities of the others. The softmax multi-class network is a type of neural network architecture used for classification tasks where the goal is to assign an input to one of multiple classes.
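A small sketch of this coupling (with made-up logits, not the page-3876 values): raising one class's logit increases its probability and, through the shared normalizing sum, shrinks the probabilities of the other classes.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # stabilized exponentiation
    return e / e.sum()

# Hypothetical logits for three animal classes, e.g. (cat, dog, bird).
p_before = softmax(np.array([2.0, 1.0, 0.5]))

# Raise only the first logit: its probability grows, and because the three
# probabilities must still sum to 1, the other two necessarily fall.
p_after = softmax(np.array([3.0, 1.0, 0.5]))
```

This interdependence is exactly what distinguishes softmax outputs from, say, three independent sigmoid outputs, which need not sum to 1.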