Expectation-Maximization (EM) algorithm
working in Gaussian Mixture Models (GMMs)
- Python Automation and Machine Learning for ICs -
- An Online Book -
=================================================================================

The Expectation-Maximization (EM) algorithm is a general framework for finding maximum likelihood estimates of parameters in models with latent variables. The algorithm alternates between two steps: the E-step (Expectation) and the M-step (Maximization). The EM algorithm is often used in the context of Gaussian Mixture Models (GMMs), where the underlying distribution is a mixture of several Gaussian distributions.

When the EM algorithm works in Gaussian Mixture Models (GMMs), and unlike hard clustering methods such as k-means, EM provides a probabilistic or "soft" assignment of data points to the different clusters or components: it incorporates probabilities rather than making a hard assignment based on the closest cluster center. Additionally, the update equations for the cluster parameters involve weighted averages based on these probabilities.

In a GMM, each data point is considered to be generated by one of several Gaussian distributions, and the EM algorithm iteratively estimates the parameters of these Gaussian distributions. The probability that a data point x^(i) belongs to the j-th Gaussian component is given by the responsibility w_j^(i), as discussed in Mixture of Gaussians (MoG). Then, the main steps of the EM algorithm working in GMMs are:

          i) Soft Assignment:

Instead of assigning a data point exclusively to the cluster with the maximum probability, EM assigns it to all clusters with some probability. The "softness" of the assignment is captured by these probabilities. Mathematically, the assignment is based on the responsibility w_j^(i):

          w_j^{(i)} = P(z^{(i)} = j \mid x^{(i)}; \phi, \mu, \Sigma) = \frac{\phi_j \, \mathcal{N}(x^{(i)}; \mu_j, \Sigma_j)}{\sum_{l=1}^{k} \phi_l \, \mathcal{N}(x^{(i)}; \mu_l, \Sigma_l)} --------------------------------- [3695a]

          ii) Parameter updates:

The cluster parameters, such as the mixing coefficients (φ_j), means (μ_j), and covariances (Σ_j), are updated using weighted averages. The weights are the responsibilities w_j^(i), indicating the likelihood of a data point belonging to a specific cluster:

          ii.a) Mixing coefficient update:

            \phi_j = \frac{1}{m} \sum_{i=1}^{m} w_j^{(i)} ----------------------------------------------------------------- [3695b]

          ii.b) Mean update:

          \mu_j = \frac{\sum_{i=1}^{m} w_j^{(i)} x^{(i)}}{\sum_{i=1}^{m} w_j^{(i)}} ---------------------------------------------- [3695c]

          ii.c) Covariance update (for a full covariance matrix):

          \Sigma_j = \frac{\sum_{i=1}^{m} w_j^{(i)} (x^{(i)} - \mu_j)(x^{(i)} - \mu_j)^T}{\sum_{i=1}^{m} w_j^{(i)}} ---------------------------------------------- [3695d]

where,

  • P(x^(i), z^(i); φ, μ, Σ) is the joint probability of observing x^(i) and its corresponding latent variable z^(i) given the parameters φ, μ, and Σ.
  • φ_j represents the probability that a randomly selected data point belongs to the j-th component of the mixture. It is estimated by counting the (soft) proportion of data points assigned to the j-th component.
  • μ_j represents the mean of the distribution associated with the j-th component of the mixture. It is estimated by computing the weighted average of the observations assigned to the j-th component, where the weights are the probabilities of belonging to that component.

These equations illustrate that the updates are influenced by the probabilities w_j^(i), which reflect the degree of association of each data point with each cluster. The soft assignment and weighted updates are key features that distinguish EM-based clustering, like GMMs, from "hard" clustering methods.
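The soft assignment of Equation 3695a and the weighted updates of Equations 3695b to 3695d can be sketched directly with NumPy. The code below is a minimal illustrative sketch, not code from this page; the function name em_step_gmm and the array layout are assumptions.

import numpy as np

def em_step_gmm(X, phi, mu, Sigma):
    """One EM iteration for a GMM.
    X: (m, n) data; phi: (k,) mixing coefficients; mu: (k, n) means; Sigma: (k, n, n) covariances."""
    m, n = X.shape
    k = phi.shape[0]
    # E-step: responsibilities w[i, j] (Equation 3695a)
    w = np.zeros((m, k))
    for j in range(k):
        diff = X - mu[j]
        inv = np.linalg.inv(Sigma[j])
        norm = 1.0 / np.sqrt(((2.0 * np.pi) ** n) * np.linalg.det(Sigma[j]))
        w[:, j] = phi[j] * norm * np.exp(-0.5 * np.einsum('ij,jk,ik->i', diff, inv, diff))
    w /= w.sum(axis=1, keepdims=True)
    # M-step: weighted parameter updates (Equations 3695b to 3695d)
    Nj = w.sum(axis=0)                       # effective number of points in each component
    phi_new = Nj / m                         # mixing coefficients
    mu_new = (w.T @ X) / Nj[:, None]         # means (weighted averages)
    Sigma_new = np.zeros_like(Sigma)
    for j in range(k):
        diff = X - mu_new[j]
        Sigma_new[j] = (w[:, j, None] * diff).T @ diff / Nj[j]
    return w, phi_new, mu_new, Sigma_new

Starting from some initial guess for phi, mu, and Sigma, repeated calls to em_step_gmm move the parameters toward a local maximum of the likelihood.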

The basic steps of deriving the EM algorithm using multivariate Gaussian properties, specifically in Gaussian Mixture Models, are:

  1. Define the Model:

    • Assume that the observed data is generated from a mixture of multiple multivariate Gaussian distributions.
    • Define the parameters of the model, including the mean vectors, covariance matrices, and mixing coefficients for each Gaussian component.
    • Derive P(x, z). The goal is to derive the joint distribution of the observed data (x) and the latent variables (z). The joint distribution is a key component in the derivation of the EM algorithm.
    •           z \sim \mathcal{N}(0, I), \quad \epsilon \sim \mathcal{N}(0, \Psi) ------------------------------- [3695de]

                \begin{bmatrix} z \\ x \end{bmatrix} \sim \mathcal{N}(\mu_{z,x}, \Sigma) ----------------------------------------- [3695df]

                x = \mu + \Lambda z + \epsilon ------------------------------------------------------- [3695dg]

                \mu_{z,x} = \begin{bmatrix} 0 \\ \mu \end{bmatrix} ------------------------------------------------------- [3695dh]

      The "0" in the partition matrix μx,z in Equation 3695dh is d-dimensional, and μ n-dimensional.

      Equation 3695df indicates that the latent variable z and the observed variable x jointly follow a multivariate Gaussian distribution with mean vector μ_{z,x} and covariance matrix Σ.

      The expected value of x is given by the factor-analysis-style model above, which is,

                E[x] = E[\mu + \Lambda z + \epsilon] = \mu + \Lambda E[z] + E[\epsilon] = \mu ------------------------------------------------ [3695di]

      We also have,

                \Sigma = \begin{bmatrix} \Sigma_{zz} & \Sigma_{zx} \\ \Sigma_{xz} & \Sigma_{xx} \end{bmatrix} ------------------------------------------------ [3695dj]

                    \Sigma = \begin{bmatrix} E[(z - E[z])(z - E[z])^T] & E[(z - E[z])(x - E[x])^T] \\ E[(x - E[x])(z - E[z])^T] & E[(x - E[x])(x - E[x])^T] \end{bmatrix} -------------------------- [3695dk]

      where,

                Σ_zz is a block which represents the covariance matrix of the latent variable z. This covariance matrix is computed as the expectation of the outer product of z - E[z] with itself.

                Σ_zx and Σ_xz are blocks which are the cross-covariance matrices between the latent variable z and the observed variable x.

                Σ_xx is a block which represents the covariance matrix of the observed variable x.

      E[z] and E[x] denote the expectations with respect to the distributions of z and x, respectively.

      The first block column in Equations 3695dj and 3695dk is d-dimensional, while the second block column is n-dimensional. These two equations give a block-matrix representation of the covariance matrix of a multivariate distribution. This representation can be derived based on the law of total covariance (the law of iterated covariance).

      In the EM algorithm, particularly when dealing with Gaussian Mixture Models (GMMs), the covariance matrix is often expressed as a block matrix. This representation helps in expressing the covariance structure of the joint distribution of the latent and observed variables.

      Furthermore, we have,

                \Sigma_{zz} = E[(z - E[z])(z - E[z])^T] = \mathrm{Cov}(z) = I ------------------------------------------------ [3695dl]

                \Sigma_{xx} = E[(x - E[x])(x - E[x])^T] ------------------------------------------------ [3695dm]

      If E[z] = 0 and E[ε] = 0, then E[x] = μ and x - E[x] = Λz + ε, so that we have,

               \Sigma_{xx} = E[(\mu + \Lambda z + \epsilon - \mu)(\mu + \Lambda z + \epsilon - \mu)^T] = E[(\Lambda z + \epsilon)(\Lambda z + \epsilon)^T] --------------------- [3695dn]

                         = E[\Lambda z z^T \Lambda^T + \Lambda z \epsilon^T + \epsilon z^T \Lambda^T + \epsilon \epsilon^T] --------------------------- [3695do]

                         = \Lambda E[z z^T] \Lambda^T + \Lambda E[z \epsilon^T] + E[\epsilon z^T] \Lambda^T + E[\epsilon \epsilon^T] --------------------------- [3695dp]

      where,

               -- The first term in Equation 3695dp contains E[z z^T], the covariance of z with itself. If z is a random variable with zero mean, then we have,

                        E[z z^T] = \mathrm{Cov}(z) ------------------------ [3695dq]

      E[z z^T] then refers to the variance (covariance) of the random variable z. The variance is a measure of the spread or dispersion of a random variable, and it is defined as the expected value of the squared deviation from the mean.

                  If z follows a normal distribution, then the variance is simply the square of the standard deviation. In the case of z ~ N(0, I) (a standard normal distribution), we have,

                        E[z z^T] = \mathrm{Cov}(z) = I ------------------------ [3695dqd]

                  If z follows a different distribution with mean μ_z and covariance Σ_z, then we have,

                        \mathrm{Cov}(z) = E[(z - \mu_z)(z - \mu_z)^T] = \Sigma_z ------------------------ [3695dqe]

      Moreover, if z follows a univariate normal distribution N(μ, σ²), then Var(z) = σ² (a scalar). However, if z follows a multivariate normal distribution N(μ, Σ), where μ is the mean vector and Σ is the covariance matrix, then Cov(z) is the covariance matrix Σ.

                        \mathrm{Cov}(z) = E[(z - \mu)(z - \mu)^T] = \Sigma ------------------------ [3695dqf]

      If Σ is the identity matrix, then Cov(z) would indeed be the identity matrix.

               -- The second term in Equation 3695dp involves E[z ε^T], the cross-covariance between z and ε. If ε is assumed to be uncorrelated with z and has zero mean, then we have,

                        E[z \epsilon^T] = 0 -------------------- [3695ds]

               -- The third term in Equation 3695dp involves E[ε z^T], the cross-covariance between ε and z. If ε and z are assumed to be uncorrelated, then we have,

                        E[\epsilon z^T] = 0 ------------------ [3695dt]

               -- The fourth term in Equation 3695dp, E[ε ε^T], represents the covariance matrix of ε. It is not zero in general, but it is equal to Ψ.

      With a similar analysis for the other blocks in Equation 3695dj, Equation 3695dj becomes,

                        \Sigma = \begin{bmatrix} I & \Lambda^T \\ \Lambda & \Lambda \Lambda^T + \Psi \end{bmatrix} ------------------ [3695du]

      This obtained equation can be inserted into Equation 3695df. Now, the goal is to maximize the log-likelihood of the observed data. The log-likelihood involves parameters such as μ, Λ (which determines the covariance structure), and Ψ (the noise covariance matrix). Then, the next strategy involves taking derivatives of the log-likelihood with respect to the parameters. Setting the derivatives to zero is a common approach to finding the maximum likelihood estimates (MLE) because it identifies points where the slope of the log-likelihood is zero. The observation that there is no closed-form solution implies that the system of equations formed by setting the derivatives to zero does not yield a simple algebraic solution. This is a common challenge in many statistical models, especially those involving latent variables and complex parameter dependencies. In cases where closed-form solutions are not feasible, optimization techniques, such as iterative algorithms like the EM algorithm or gradient-based optimization methods, are employed to numerically find the parameter values that maximize the log-likelihood. The absence of a closed-form solution does not hinder the applicability of the model or the optimization process, but it may require numerical methods to find the optimal parameter values.
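      The block covariance in Equation 3695du can be checked numerically: sample z ~ N(0, I) and ε ~ N(0, Ψ), form x = μ + Λz + ε, and compare the sample covariance of the stacked vector [z; x] with the block matrix. The sketch below is an assumption-based NumPy illustration (the names Lambda and Psi and the chosen dimensions are arbitrary).

import numpy as np

rng = np.random.default_rng(0)
d, n, m = 2, 3, 200000                    # latent dimension d, observed dimension n, samples m
Lambda = rng.normal(size=(n, d))          # loading matrix in x = mu + Lambda z + eps
mu = rng.normal(size=n)
Psi = np.diag(rng.uniform(0.5, 1.5, size=n))   # noise covariance (diagonal here)

z = rng.normal(size=(m, d))                           # z ~ N(0, I)
eps = rng.multivariate_normal(np.zeros(n), Psi, m)    # eps ~ N(0, Psi)
x = mu + z @ Lambda.T + eps

# Theoretical block covariance of [z; x] (Equations 3695dj and 3695du)
Sigma_theory = np.block([[np.eye(d), Lambda.T],
                         [Lambda,    Lambda @ Lambda.T + Psi]])
# Sample covariance of the stacked vector [z; x]; should agree up to sampling noise
Sigma_sample = np.cov(np.hstack([z, x]).T)
print(np.round(Sigma_theory, 2))
print(np.round(Sigma_sample, 2))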

  2. E-step (Expectation):
    • Compute the posterior probabilities (responsibilities) of each data point belonging to each Gaussian component. In this step, the goal is to compute the expected value of the latent variables given the observed data and the current parameter estimates. This involves calculating the conditional probabilities of the latent variables given the observed data and the current parameter values. This is done using Bayes' theorem. The conditional probability is given by,
    •          Q_i(z^{(i)}) = P(z^{(i)} \mid x^{(i)}; \mu, \Lambda, \Psi) ----------------------------------- [3695dv]

      The expression Q_i(z^(i)) is often referred to as the "responsibility" of the i-th data point for the different components in a mixture model.

      The conditional normal (Gaussian) distribution is given by,

               z^{(i)} \mid x^{(i)} \sim \mathcal{N}(\mu_{z^{(i)} \mid x^{(i)}}, \Sigma_{z^{(i)} \mid x^{(i)}}) -------------------------- [3695dw]

      where,

               z^(i) | x^(i) denotes the random variable z^(i) given the observed data x^(i). It represents the conditional distribution of the latent variable z^(i) given the specific observed data point x^(i).

               The "~" symbol means "is distributed as" or "follows the distribution."

               N(μ_{z^(i)|x^(i)}, Σ_{z^(i)|x^(i)}) specifies the type of distribution, which is a multivariate normal distribution. The parameters of this distribution are:

      • μ_{z^(i)|x^(i)} is the mean vector of the conditional distribution. It represents the expected value of z^(i) given x^(i).
      • Σ_{z^(i)|x^(i)} is the covariance matrix of the conditional distribution. It characterizes the spread and correlation of z^(i) given x^(i).

      As discussed in page3864, for the conditional distribution of x_1 given x_2 in a multivariate Gaussian distribution, we have,

                x_1 \mid x_2 \sim \mathcal{N}(\mu_{1 \mid 2}, \Sigma_{1 \mid 2}) ------------------------------------------- [3695dxc]

                \mu_{1 \mid 2} = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (x_2 - \mu_2) ----------------------- [3695dxd]

                \Sigma_{1 \mid 2} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} -------------------------------- [3695dxe]

      Based on Equation 3695dxd, the mean vector of the conditional distribution z^(i) | x^(i) under certain assumptions is given by,

               \mu_{z^{(i)} \mid x^{(i)}} = 0 + \Lambda^T (\Lambda \Lambda^T + \Psi)^{-1} (x^{(i)} - \mu) = \Lambda^T (\Lambda \Lambda^T + \Psi)^{-1} (x^{(i)} - \mu) -------------------------- [3695dxf]

      where,

      • 0 is a vector of zeros (the mean of the latent variable z).
      • Λ is a matrix associated with the linear transformation from the latent variable z to the observed variable x.
      • μ is the mean vector of the observed variable x in the joint distribution of z and x.
      • I is the identity matrix (the covariance matrix of z).

      On the other hand, based on Equation 3695dxe, the formula for the covariance matrix of the conditional distribution z^(i) | x^(i) under certain assumptions can be given by,

               \Sigma_{z^{(i)} \mid x^{(i)}} = I - \Lambda^T (\Lambda \Lambda^T + \Psi)^{-1} \Lambda -------------------------- [3695dxg]

      where,

      • I is the identity matrix.
      • Λ is a matrix associated with the linear transformation from the latent variable z to the observed variable x.
      • Ψ is a covariance matrix associated with the noise or error term in the relationship between z and x.

      Therefore, in the E-step, to compute Q_i, we need to compute Equations 3695dxf and 3695dxg first.

      In this step, we will need to compute,

               E_{z^{(i)} \sim Q_i}[z^{(i)}] -------------------------- [3695dxh]

      We know Q_i is a Gaussian distribution, so that we can use the expected value formula,

               E[z] = \int z \, Q_i(z) \, dz -------------------------- [3695dxi]

      Equation 3695dxi represents the expectation of the latent variable z with respect to its distribution Q_i(z). This is the unconditioned expectation of z under Q_i. Since Q_i(z) is a Gaussian distribution, the integral in Expression 3695dxh can be computed using the mean in Equation 3695dxf and the covariance matrix in Equation 3695dxg of that Gaussian distribution.

      Therefore, with the discussion above, if Q_i(z^(i)) is a Gaussian distribution with mean μ_{z^(i)|x^(i)} and covariance matrix Σ_{z^(i)|x^(i)}, then the expectation E_{z^(i)~Q_i}[z^(i)] is equal to the mean μ_{z^(i)|x^(i)}, namely,

               E_{z^{(i)} \sim Q_i}[z^{(i)}] = \mu_{z^{(i)} \mid x^{(i)}} = \Lambda^T (\Lambda \Lambda^T + \Psi)^{-1} (x^{(i)} - \mu) -------------------------- [3695dxj]
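      Equations 3695dxf and 3695dxg can be evaluated with a few matrix operations, and by Equation 3695dxj the resulting mean is exactly the expectation needed in the E-step. The sketch below is an illustrative assumption (reusing the Λ, Ψ, μ notation above; the function name is arbitrary).

import numpy as np

def posterior_z_given_x(X, mu, Lambda, Psi):
    """E-step posterior of the latent variable.
    Returns the per-point mean (Equation 3695dxf) and the shared covariance (Equation 3695dxg)."""
    d = Lambda.shape[1]
    Sxx_inv = np.linalg.inv(Lambda @ Lambda.T + Psi)        # (Lambda Lambda^T + Psi)^{-1}
    mu_z_given_x = (X - mu) @ Sxx_inv @ Lambda              # rows are Lambda^T Sxx_inv (x^(i) - mu)
    Sigma_z_given_x = np.eye(d) - Lambda.T @ Sxx_inv @ Lambda
    return mu_z_given_x, Sigma_z_given_x

      Because Q_i(z^(i)) is Gaussian, the first returned array is also E_{z^(i)~Q_i}[z^(i)] from Equation 3695dxj.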

      It should be highlighted that the condition for Equation 3695dxj to be correct is that Q_i(z^(i)) is a Gaussian distribution. Specifically, Q_i(z^(i)) should be a multivariate Gaussian distribution with a valid mean vector and covariance matrix. This equation is advantageous in the EM algorithm, particularly in the E-step:

         i) Simplification of Expectation Calculations:

      • When the conditional distribution Qi(z(i)) is Gaussian, this equation simplifies the calculation of the expectation. Instead of integrating over the full exponential form of the probability density function (see page3864), the mean of the Gaussian distribution provides a straightforward and computationally efficient way to represent the expectation.

        ii) Closed-Form Expression:

      • Gaussian distributions have convenient properties that lead to closed-form expressions for their expectations. The mean of a Gaussian distribution is a summary statistic that fully captures the first moment of the distribution.

        iii) Compatibility with EM Algorithm:

      • The EM algorithm involves iteratively updating estimates of latent variables and parameters. The simplification provided by the equation aligns well with the iterative nature of the algorithm, making it computationally efficient and numerically stable.

       iv) Connection to Maximum Likelihood Estimation (MLE):

      • In the E-step of EM, the goal is to compute expected values that are later used in the M-step to maximize the likelihood function. The use of the mean of a Gaussian distribution simplifies the expressions involved in the MLE process.

       v) Interpretability:

      • The mean of a Gaussian distribution has a clear interpretation as the center or expected value of the distribution. This can enhance the interpretability of the conditional expectations in the context of the model.

      However, the application of the existing integral (Equation 3695dxj) requires certain conditions to be met, and understanding these conditions is essential for the correctness of the algorithm:

       i) Gaussian Assumption:

      • The equation assumes that the conditional distribution Qi(z(i)) is Gaussian. If this assumption does not hold, attempting to use this simplification may lead to incorrect results.

       ii) Completeness of Moments:

      • The simplification relies on the completeness of the first moment (mean) of the distribution. This may not hold for all types of distributions.

       iii) Analytical Form:

      • The integral simplification is more likely to be applicable when the conditional distribution has a known and manageable analytical form. If the form is complex or unknown, numerical methods may be necessary.

       iv) Validity of Covariance Matrix:

      • The equation assumes the validity of the covariance matrix in the Gaussian distribution. If the covariance matrix is singular or ill-conditioned, the inversion involved in the equation may be problematic.

       v) Context of the Model:

      • The applicability of the existing integral depends on the specific probabilistic model being used. Understanding the assumptions and limitations of the model is essential.

       vi) Numeric Stability:

      • When working with numerical computations, particularly in machine learning, one should be mindful of numerical stability issues. This includes handling situations where inversion of matrices might be numerically challenging.
    • The responsibility of the i-th data point for the j-th Gaussian component is given by:
    •           w_j^{(i)} = \frac{\phi_j \, \mathcal{N}(x^{(i)}; \mu_j, \Sigma_j)}{\sum_{l=1}^{k} \phi_l \, \mathcal{N}(x^{(i)}; \mu_l, \Sigma_l)} ----------------------------------- [3695e]

      where φ_j is the mixing coefficient, μ_j is the mean vector, Σ_j is the covariance matrix of the j-th Gaussian component, and k is the total number of components.
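      In practice, the ratio in Equation 3695e is usually evaluated in log space to avoid numerical underflow when the Gaussian densities become very small (the numeric-stability point noted above). A minimal sketch, assuming NumPy and SciPy are available:

import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def responsibilities(X, phi, mu, Sigma):
    """Equation 3695e evaluated in log space; returns w[i, j] = P(z^(i) = j | x^(i))."""
    m, k = X.shape[0], phi.shape[0]
    log_w = np.zeros((m, k))
    for j in range(k):
        # log(phi_j) + log N(x^(i); mu_j, Sigma_j)
        log_w[:, j] = np.log(phi[j]) + multivariate_normal.logpdf(X, mean=mu[j], cov=Sigma[j])
    # subtract the log of the denominator (log-sum-exp over components)
    log_w -= logsumexp(log_w, axis=1, keepdims=True)
    return np.exp(log_w)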

  3. M-step (Maximization):
    • Update the parameters of the model based on the computed responsibilities.
    • The parameter update can be given by,
    •          \theta := \arg\max_{\theta} \sum_{i=1}^{m} \int_{z^{(i)}} Q_i(z^{(i)}) \log \frac{P(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} \, dz^{(i)} ---------------------- [3695fh]

                   = \arg\max_{\theta} \sum_{i=1}^{m} E_{z^{(i)} \sim Q_i}\!\left[ \log \frac{P(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} \right] ---------------------- [3695fi]

      The reasons for having this equation are:

            i) Objective Function:

      • The goal of the M-step in the EM algorithm is to maximize the expected log-likelihood, which is expressed as the sum over all data points of the expected log-likelihood for each data point.

            ii) Expectation:

      • The expectation is taken with respect to the conditional distribution Qi(z(i)), which represents the posterior distribution of the latent variable given the observed data.
      • E_{z^(i)~Q_i}[·] represents the expectation with respect to the conditional distribution Q_i(z^(i)). This expectation is taken over the latent variable z^(i) according to the distribution Q_i.

            iii) Log-Likelihood Ratio:

      • The expression inside the logarithm is the ratio of the joint distribution P(x^(i), z^(i); θ) to the conditional distribution Q_i(z^(i)). This ratio is a measure of how well the current model (with parameters θ) explains the observed data.

            iv) Integration over Latent Variables:

      • The integral in Equation 3695fh involves integrating over the latent variable z^(i) according to the conditional distribution Q_i(z^(i)). This step accounts for uncertainty in the latent variable.

           v) Summation over Data Points:

      • The sum ensures that the optimization objective includes contributions from all data points, making the objective a sum of expected log-likelihoods across the entire dataset.
    • Update the mean vectors, covariance matrices, and mixing coefficients using the weighted sum of the data points, where the weights are the responsibilities:

                \phi_j = \frac{1}{m} \sum_{i=1}^{m} w_j^{(i)} ----------------------------------- [3695f]

                \mu_j = \frac{\sum_{i=1}^{m} w_j^{(i)} x^{(i)}}{\sum_{i=1}^{m} w_j^{(i)}} ----------------------------------- [3695g]

                \Sigma_j = \frac{\sum_{i=1}^{m} w_j^{(i)} (x^{(i)} - \mu_j)(x^{(i)} - \mu_j)^T}{\sum_{i=1}^{m} w_j^{(i)}} ----------------------------------- [3695h]

  4. Repeat:
    • Repeat the E-step and M-step until convergence. Convergence can be determined by monitoring changes in the log-likelihood or other convergence criteria.
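In practice, this loop and its convergence check do not have to be written by hand; a library implementation such as scikit-learn's GaussianMixture (assumed to be available here) runs the same E-step/M-step alternation and reports whether the tolerance on the lower bound of the log-likelihood was reached.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Toy data drawn from two clusters
X = np.vstack([rng.normal(loc=-2.0, scale=0.5, size=(300, 2)),
               rng.normal(loc=+2.0, scale=0.8, size=(300, 2))])

gmm = GaussianMixture(n_components=2, covariance_type='full',
                      tol=1e-4, max_iter=200, random_state=0)
gmm.fit(X)                       # alternates E- and M-steps until convergence

print(gmm.converged_)            # True if the tolerance was reached within max_iter
print(gmm.n_iter_)               # number of EM iterations performed
print(gmm.lower_bound_)          # final lower bound on the average log-likelihood
print(gmm.predict_proba(X[:3]))  # soft assignments (responsibilities) for a few points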

============================================