Probabilistic clustering – Gaussian mixture model

A Gaussian mixture model (GMM) is useful for modeling data in which each point comes from one of several groups. The groups may differ from each other, but the data points within any one group can be well modeled by a Gaussian distribution.
For example, the heights of men and of women are each approximately normally distributed, and the probability density function (PDF) of each group is shown below.
[Figure: simple example of Gaussian PDFs]
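The idea that each point comes from one of several groups can be read as a two-step generative process: first pick a group according to its weight, then draw a value from that group's Gaussian. The sketch below does this in one dimension with NumPy; the function name `sample_gmm_1d` and the weights, means, and standard deviations are hypothetical values chosen for illustration, not real height statistics.

```python
import numpy as np

def sample_gmm_1d(n, weights, means, stds, seed=0):
    """Draw n samples from a 1-D Gaussian mixture (hypothetical helper).

    Returns the samples and the hidden component label of each sample.
    """
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    # Step 1: choose a component for every sample according to the weights.
    labels = rng.choice(len(weights), size=n, p=weights)
    # Step 2: draw each sample from the Gaussian of its chosen component.
    samples = rng.normal(np.asarray(means)[labels], np.asarray(stds)[labels])
    return samples, labels

# Hypothetical parameters for two groups (e.g. heights in cm).
x, z = sample_gmm_1d(10_000, weights=[0.5, 0.5],
                     means=[162.0, 176.0], stds=[6.0, 7.0])
print(x[z == 0].mean(), x[z == 1].mean())
```

Clustering with a GMM is the reverse problem: only `x` is observed, and the labels `z` and the parameters have to be inferred.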
A multidimensional GMM is specified by a weighted sum of K Gaussian components:

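$$p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$$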

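where $\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$ is the $D$-dimensional Gaussian density

$$\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{D/2} \, |\boldsymbol{\Sigma}|^{1/2}} \exp\left( -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{\top} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right)$$

The GMM is therefore parameterized by the weight, mean, and covariance of each component: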
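$$\theta = \{\pi_k, \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k\}_{k=1}^{K}, \qquad \pi_k \ge 0, \qquad \sum_{k=1}^{K} \pi_k = 1$$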
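As a concrete check of the mixture density, the following minimal NumPy sketch evaluates $\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})$ and the weighted sum over components for each row of a data matrix. The function names `gaussian_pdf` and `gmm_pdf` and the example parameters are hypothetical; the code solves a linear system instead of inverting $\boldsymbol{\Sigma}$ explicitly, which is a common numerical choice rather than anything prescribed by the model.

```python
import numpy as np

def gaussian_pdf(X, mean, cov):
    """Multivariate normal density N(x | mean, cov) at each row of X."""
    X = np.atleast_2d(X)
    d = X.shape[1]
    diff = X - mean
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), via a linear solve.
    mahal = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * mahal) / norm

def gmm_pdf(X, weights, means, covs):
    """Mixture density p(x) = sum_k pi_k N(x | mu_k, Sigma_k)."""
    return sum(w * gaussian_pdf(X, m, c)
               for w, m, c in zip(weights, means, covs))

# Hypothetical two-component mixture in 2-D.
weights = [0.3, 0.7]
means = [np.zeros(2), np.array([3.0, 3.0])]
covs = [np.eye(2), 0.5 * np.eye(2)]
print(gmm_pdf(np.array([[0.0, 0.0], [3.0, 3.0]]), weights, means, covs))
```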

The model parameters (weights, means, and covariances) are estimated with the expectation-maximization (EM) algorithm, which alternates between an E step and an M step.

In the E step, the probabilistic assignments (responsibilities) $\gamma_{nk}$ of each point $\mathbf{x}_n$ to each component $k$ are computed using the current model parameters:

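$$\gamma_{nk} = \frac{\pi_k \, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)}$$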
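The E step can be sketched in NumPy as follows: each column holds $\pi_k \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$ for one component, and normalizing each row gives $\gamma_{nk}$. The names `gaussian_pdf` and `e_step` are hypothetical helpers, not part of any library.

```python
import numpy as np

def gaussian_pdf(X, mean, cov):
    """Multivariate normal density N(x | mean, cov) at each row of X."""
    d = X.shape[1]
    diff = X - mean
    mahal = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return np.exp(-0.5 * mahal) / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))

def e_step(X, weights, means, covs):
    """Responsibilities gamma[n, k] of component k for point x_n."""
    # Numerator of gamma_nk for every point (rows) and component (columns).
    weighted = np.column_stack([w * gaussian_pdf(X, m, c)
                                for w, m, c in zip(weights, means, covs)])
    # Divide by the sum over components so that each row sums to 1.
    return weighted / weighted.sum(axis=1, keepdims=True)
```

For high-dimensional data or far-away points the densities can underflow to zero; a more robust version would compute the same ratio in log space with a log-sum-exp.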

In the M step, the model parameters are re-estimated from these soft assignments of the individual points:

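$$N_k = \sum_{n=1}^{N} \gamma_{nk}, \qquad \pi_k^{\text{new}} = \frac{N_k}{N}, \qquad \boldsymbol{\mu}_k^{\text{new}} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma_{nk} \, \mathbf{x}_n,$$

$$\boldsymbol{\Sigma}_k^{\text{new}} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma_{nk} \, (\mathbf{x}_n - \boldsymbol{\mu}_k^{\text{new}})(\mathbf{x}_n - \boldsymbol{\mu}_k^{\text{new}})^{\top}$$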
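The M-step updates translate almost line by line into NumPy. In this sketch (the name `m_step` is hypothetical), `gamma` is the $N \times K$ responsibility matrix from the E step; with hard 0/1 responsibilities the updates reduce to the per-cluster sample proportion, mean, and (biased) covariance.

```python
import numpy as np

def m_step(X, gamma):
    """Re-estimate weights, means and covariances from responsibilities."""
    n, d = X.shape
    nk = gamma.sum(axis=0)               # N_k: effective number of points
    weights = nk / n                     # pi_k = N_k / N
    means = (gamma.T @ X) / nk[:, None]  # responsibility-weighted means
    covs = []
    for k in range(gamma.shape[1]):
        diff = X - means[k]
        # Responsibility-weighted outer products, averaged over N_k.
        covs.append((gamma[:, k, None] * diff).T @ diff / nk[k])
    return weights, means, np.array(covs)

# Hard assignments: points 0-1 in component 0, points 2-3 in component 1.
X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0]])
gamma = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
weights, means, covs = m_step(X, gamma)
print(weights, means, covs, sep="\n")
```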
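Each EM iteration never decreases the log-likelihood $\ln p(\mathbf{X}) = \sum_{n=1}^{N} \ln \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$, so when the increase from one iteration to the next becomes very small, we can consider the EM algorithm to have converged (to a local maximum that depends on the initialization).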
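Putting the E step, M step, and convergence test together gives a complete, if minimal, EM loop. This NumPy sketch makes several assumptions of its own: the name `fit_gmm` is hypothetical, the means are initialized from random data points (or from `init_means` if given), every covariance starts as an isotropic matrix at the average per-dimension data variance, a small ridge of `1e-6` is added to each covariance to keep it invertible, and iteration stops once the log-likelihood gain drops below `tol`.

```python
import numpy as np

def gaussian_pdf(X, mean, cov):
    """Multivariate normal density N(x | mean, cov) at each row of X."""
    d = X.shape[1]
    diff = X - mean
    mahal = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return np.exp(-0.5 * mahal) / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))

def fit_gmm(X, k, init_means=None, tol=1e-6, max_iter=500, seed=0):
    """Fit a k-component GMM with EM (hypothetical helper).

    Stops when the log-likelihood improves by less than `tol`.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Initialization: random distinct data points as means unless given,
    # isotropic covariances, equal weights.
    if init_means is None:
        means = X[rng.choice(n, size=k, replace=False)]
    else:
        means = np.asarray(init_means, dtype=float)
    covs = np.array([np.eye(d) * X.var(axis=0).mean() for _ in range(k)])
    weights = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(max_iter):
        # E step: weighted densities, log-likelihood of the current
        # parameters, and responsibilities.
        dens = np.column_stack([w * gaussian_pdf(X, m, c)
                                for w, m, c in zip(weights, means, covs)])
        ll = np.log(dens.sum(axis=1)).sum()
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M step: closed-form updates from the responsibilities.
        nk = gamma.sum(axis=0)
        weights = nk / n
        means = (gamma.T @ X) / nk[:, None]
        covs = np.array([
            (gamma[:, j, None] * (X - means[j])).T @ (X - means[j]) / nk[j]
            + 1e-6 * np.eye(d)  # small ridge keeps each covariance invertible
            for j in range(k)
        ])
        # Convergence: stop when the log-likelihood barely increases.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weights, means, covs, ll

# Two well-separated synthetic clusters; initialize one mean in each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(300, 2)),
               rng.normal([6.0, 6.0], 1.0, size=(300, 2))])
weights, means, covs, ll = fit_gmm(X, k=2, init_means=X[[0, 300]])
print(weights, means, ll, sep="\n")
```

Because EM only finds a local maximum, a purely random start can occasionally settle on a poor split; running several random initializations and keeping the fit with the highest log-likelihood is a common remedy.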

The animation below shows a GMM being fitted with the EM algorithm, starting from a random initialization and running until convergence.
[Animation: GMM fitting with the EM algorithm, from random initialization to convergence]

2 thoughts on “Probabilistic clustering – Gaussian mixture model”

  1. Hi AHilan,
    Good explanation, and thanks for sharing.
    If I want to model the latent variable with a multinomial distribution,
    could you share any good resources, notes, or sample code (preferably in R)?
    Thank you in advance
