Understanding the Logarithm Trick in Maximum Likelihood Estimation

Maximum Likelihood Estimation is a fundamental and powerful idea that’s at the centre of many things we do with data – so much so that we often use it without knowing it. MLE allows us to find a model’s parameters that are likely to enable the model to represent the data we have on our hands as closely as possible. This short post addresses the logarithm trick which is used to enable simpler MLE calculation.

There are two elements to understanding the formulation of MLE for the common Multivariate Gaussian model (which could be extended to other models equally):

  1. The i.i.d assumption that simplifies the MLE formulation
  2. The logarithm trick with enables solution of the MLE formulation

On this blog I’ve discussed topics like time series analysis in the past where the idea of independent and identically distributed variables is addressed, and of course, being an important statistical topic, is is well explained and understood. The logarithm trick, however, is specific to the simplification and solution of MLE formulations, and is helpful to understand.

The logarithm function very simply enables scale variance in any input data while allowing location invariance. This is extremely helpful when dealing with monotonic input data that we want to ensure continues to be monotonic after transformation, but whose scale we want to change.

When building a model p ( x_1,.... x_n | \theta ) of the data (x_1,.... x_n), the MLE formulation seeks to find the appropriate values of \theta such that

\bigtriangledown_{\theta} \prod_{i = 1}^{n} p(x_i | \theta ) = 0

The interesting thing about the log transform is, as I said earlier that in the transformation ln ( \prod_{i} f_i ) = \sum_{i} ln(f_i) , there is no change in where f_i may attain a maximum or a minimum when it is transformed to ln(f_i) for any i. This logarithm trick enables us to compute the latter product more simply, and thereby execute the MLE.