Factor analysis is a statistical method that describes the variability among correlated observed variables in terms of a potentially smaller number of unobserved variables, called factors.
The generative model is given by
$y = \mu + \Lambda x + \varepsilon$
where
$y$ is the $P \times 1$ observed variable,
$\mu$ is the $P \times 1$ mean vector,
$\Lambda$ is the $P \times R$ factor loading matrix,
$x$ is the $R \times 1$ unobserved (latent) variable,
$\varepsilon$ is the $P \times 1$ error term.
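As an illustration of the generative model, the following minimal sketch draws samples from it with NumPy. It assumes a standard normal latent variable and Gaussian noise with a diagonal covariance; the function name `sample_factor_model` and the parameter values are hypothetical and chosen only for the example.

```python
import numpy as np

def sample_factor_model(N, mu, Lambda, psi_diag, rng):
    """Draw N observations from y = mu + Lambda x + eps (illustrative helper)."""
    P, R = Lambda.shape
    X = rng.standard_normal((N, R))                         # assumed: x ~ N(0, I)
    eps = rng.standard_normal((N, P)) * np.sqrt(psi_diag)   # assumed: eps ~ N(0, diag(psi_diag))
    Y = mu + X @ Lambda.T + eps                             # row n is y_n^T
    return Y, X

# Hypothetical sizes and parameters, for illustration only.
rng = np.random.default_rng(0)
P, R, N = 5, 2, 1000
mu = np.arange(P, dtype=float)
Lambda = rng.standard_normal((P, R))
psi_diag = rng.uniform(0.1, 0.5, size=P)
Y, X = sample_factor_model(N, mu, Lambda, psi_diag, rng)
print(Y.shape, X.shape)
```

Each row of `Y` is one observation, so the P observed coordinates are driven by only R latent coordinates plus independent noise.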
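We assume that
$E(x) = E(\varepsilon) = 0$
$E(xx^T) = I$
$E(\varepsilon\varepsilon^T) = \Psi$, where $\Psi$ is a diagonal $P \times P$ matrix
$p(x) = \mathcal{N}(x \mid 0, I)$
It follows that
$E(y) = \mu$
$\Sigma = E\big((y - \mu)(y - \mu)^T\big) = \Lambda\Lambda^T + \Psi$
$p(y \mid \theta) = \mathcal{N}(y \mid \mu, \Lambda\Lambda^T + \Psi)$, where $\theta = \{\mu, \Lambda, \Psi\}$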
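The covariance of the marginal distribution can be derived in a few lines. The sketch below assumes that the latent variable has identity covariance, that the error term has a diagonal covariance $\Psi$, and that $x$ and $\varepsilon$ are uncorrelated, so the cross terms vanish.

```latex
\begin{aligned}
\operatorname{Cov}(y)
  &= E\big[(y - \mu)(y - \mu)^T\big] \\
  &= E\big[(\Lambda x + \varepsilon)(\Lambda x + \varepsilon)^T\big] \\
  &= \Lambda\, E[x x^T]\, \Lambda^T
     + \Lambda\, E[x \varepsilon^T]
     + E[\varepsilon x^T]\, \Lambda^T
     + E[\varepsilon \varepsilon^T] \\
  &= \Lambda \Lambda^T + \Psi
\end{aligned}
```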
The mean $\mu$ is estimated from the observed data as the sample mean.
The model parameters $\Lambda$ and $\Psi$ are estimated using the expectation-maximization (EM) algorithm.
$\Lambda$ and $\Psi$ are initialized with random values and then updated iteratively by EM.
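In the E step, the posterior over the latent variable $x_n$ given $y_n$ and the current parameters $\theta_t$ is
$q_{t+1}(x_n) = p(x_n \mid y_n, \theta_t) = \mathcal{N}(x_n \mid m_n, V_n)$
$V_n = (I + \Lambda^T \Psi^{-1} \Lambda)^{-1}$
$m_n = V_n \Lambda^T \Psi^{-1} (y_n - \mu)$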
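The E step can be sketched in NumPy as below. The sketch uses the posterior covariance $(I + \Lambda^T \Psi^{-1} \Lambda)^{-1}$, which does not depend on $n$ and is therefore computed once. The function name `e_step`, the storage of $\Psi$ as a vector `psi_diag` of its diagonal entries, and the example data are assumptions made for illustration.

```python
import numpy as np

def e_step(Y, mu, Lambda, psi_diag):
    """Posterior moments of x_n given y_n under the current parameters."""
    R = Lambda.shape[1]
    LtPinv = Lambda.T / psi_diag                      # Lambda^T Psi^{-1}, shape (R, P)
    V = np.linalg.inv(np.eye(R) + LtPinv @ Lambda)    # posterior covariance, same for every n
    M = (Y - mu) @ LtPinv.T @ V                       # row n is m_n^T, shape (N, R)
    return M, V

# Hypothetical data and parameters, for illustration only.
rng = np.random.default_rng(0)
N, P, R = 100, 5, 2
Y = rng.standard_normal((N, P))
mu = Y.mean(axis=0)
Lambda = rng.standard_normal((P, R))
psi_diag = rng.uniform(0.5, 1.5, size=P)
M, V = e_step(Y, mu, Lambda, psi_diag)
print(M.shape, V.shape)
```

One way to check the result is direct Gaussian conditioning on the joint distribution of $x$ and $y$, which gives the same mean and covariance through the matrix inversion lemma.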
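In the M step, the model parameters $\Lambda_{t+1}$ and $\Psi_{t+1}$ are updated from the posterior moments as follows, with all sums taken over $n = 1, \dots, N$:
$\Lambda_{t+1} = \Big(\sum_n (y_n - \mu)\, m_n^T\Big) \Big(\sum_n (V_n + m_n m_n^T)\Big)^{-1}$
$\Psi_{t+1} = \frac{1}{N} \operatorname{diag}\Big(\sum_n (y_n - \mu)(y_n - \mu)^T - \Lambda_{t+1} \sum_n m_n (y_n - \mu)^T\Big)$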
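A minimal NumPy sketch of the M step follows. It assumes centered data (the mean already subtracted), uses the posterior second moment $V + m_n m_n^T$ of the latent variable, and stores only the diagonal of $\Psi$. The function name `m_step` and the placeholder posterior moments in the usage lines are hypothetical; in a real run they would come from the E step.

```python
import numpy as np

def m_step(Yc, M, V):
    """Update Lambda and diag(Psi) from centered data Yc and posterior moments M, V."""
    N = Yc.shape[0]
    S_yx = Yc.T @ M                                   # sum_n (y_n - mu) m_n^T, shape (P, R)
    S_xx = N * V + M.T @ M                            # sum_n (V + m_n m_n^T), shape (R, R)
    Lambda_new = np.linalg.solve(S_xx, S_yx.T).T      # S_yx @ inv(S_xx), using symmetry of S_xx
    # Diagonal of (sum_n yc_n yc_n^T - Lambda_new sum_n m_n yc_n^T), divided by N.
    psi_new = (np.sum(Yc * Yc, axis=0) - np.sum(Lambda_new * S_yx, axis=1)) / N
    return Lambda_new, psi_new

# Placeholder inputs for illustration only (not produced by a real E step).
rng = np.random.default_rng(0)
N, P, R = 200, 4, 2
Yc = rng.standard_normal((N, P))
Yc -= Yc.mean(axis=0)
M = rng.standard_normal((N, R))
V = 0.1 * np.eye(R)
Lambda_new, psi_new = m_step(Yc, M, V)
print(Lambda_new.shape, psi_new.shape)
```

Solving the linear system instead of forming an explicit inverse is a design choice for numerical stability; the result is the same update.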
At each iteration, the log-likelihood $L(\Lambda, \Psi)$ is evaluated to check whether the model has converged. Up to an additive constant, it is given by
$L(\Lambda, \Psi) = -\frac{N}{2} \log|\Psi| - \frac{N}{2} \operatorname{tr}(S \Psi^{-1})$
where $S = \frac{1}{N} \sum_n (y_n - \mu - \Lambda x_n)(y_n - \mu - \Lambda x_n)^T$
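The convergence check can be sketched as a complete EM loop that records a log-likelihood value after every iteration. Note the substitution: instead of the expression above, this sketch monitors the marginal log-likelihood $\log \mathcal{N}(y \mid \mu, \Lambda\Lambda^T + \Psi)$ from the model definition, because EM guarantees that this quantity never decreases. The function names, the random initialization ranges, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def marginal_loglik(Yc, Lambda, psi):
    """Sum over n of log N(y_n | mu, Lambda Lambda^T + Psi), for centered data Yc."""
    N, P = Yc.shape
    Sigma = Lambda @ Lambda.T + np.diag(psi)
    _, logdet = np.linalg.slogdet(Sigma)
    S = Yc.T @ Yc / N                                  # sample covariance of the data
    return -0.5 * N * (P * np.log(2 * np.pi) + logdet
                       + np.trace(np.linalg.solve(Sigma, S)))

def fit_factor_analysis(Y, R, n_iter=50, seed=0):
    """EM for factor analysis; returns the parameters and the log-likelihood history."""
    rng = np.random.default_rng(seed)
    N, P = Y.shape
    mu = Y.mean(axis=0)                                # mean estimated from the data
    Yc = Y - mu
    Lambda = rng.standard_normal((P, R))               # random initialization (assumed range)
    psi = rng.uniform(0.5, 1.5, size=P)                # random positive diagonal of Psi
    history = []
    for _ in range(n_iter):
        # E step: posterior moments of the latent variables.
        LtPinv = Lambda.T / psi
        V = np.linalg.inv(np.eye(R) + LtPinv @ Lambda)
        M = Yc @ LtPinv.T @ V
        # M step: update the loadings and the noise variances.
        S_yx = Yc.T @ M
        S_xx = N * V + M.T @ M
        Lambda = np.linalg.solve(S_xx, S_yx.T).T
        psi = (np.sum(Yc * Yc, axis=0) - np.sum(Lambda * S_yx, axis=1)) / N
        history.append(marginal_loglik(Yc, Lambda, psi))
    return mu, Lambda, psi, history

# Synthetic data from a hypothetical 2-factor model, for illustration only.
rng = np.random.default_rng(1)
true_Lambda = rng.standard_normal((6, 2))
Y = rng.standard_normal((500, 2)) @ true_Lambda.T + 0.5 * rng.standard_normal((500, 6)) + 3.0
mu, Lambda, psi, history = fit_factor_analysis(Y, R=2)
print(len(history))
```

Plotting `history` against the iteration index gives a convergence curve; in practice the loop could stop once the increase between consecutive values falls below a chosen tolerance.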
The log-likelihood values are plotted against the iteration number, and it can be observed that the model converges as the number of iterations increases, as shown in the following figure.
The main applications of factor analysis are to reduce the number of variables and to detect structure in the relationships between variables. Factor analysis is commonly used to model variability in speaker and face recognition applications.