Adventures in Data Land
Log-probabilities, semirings and floating point numbers

Here’s a trick/bug that is a) really well known in the research community, b) lots of beginners get it wrong nonetheless, c) simple unit tests may not detect it and d) it may be a really fatal bug in your code. You can even find it in large scale machine learning toolboxes.

Suppose you want to do mixture clustering and you happily compute p(x|y) and p(y), say by a mixture of Gaussians. Quite often you’ll need access to p(y|x) which can be computed via

Let’s assume that your code runs well in 2D but now you try it in 100 dimensions and it fails with a division by zero error. What’s going on? After all, your code is algebraically correct. Most likely you ignored the fact that floating point numbers have only a fixed precision and by exponentiating the argument of the Gaussian (recall, we have a normalization that is exponential in the number of dimensions) you encountered numerical underflow. What happened is that you were trying to store the relevant information in the exponent rather than the mantissa of a floating point number. On single precision you have 8 bit for the exponent and 23 for the mantissa and you just ran out of storage. 

Here’s how you can pull things back into the mantissa - simply store log probabilities and operate with them. Hence instead of + and x you should use ‘log +’ and +.

In this case you will not get numerical underflow when adding probabilities since the argument in the log is going to be O(1) (we have at least one term which is exp(0) and the remaining terms are smaller than 1), or at least not due to running out of precision in the exponent.

In more general terms, what happened is that we replaced the operations ‘+’ and ‘x’ by two operations ‘log +’ and ‘+’ which act just in the same way as addition and multiplication. Aji and McEliece formalized this in their paper on the Generalized Distributive Law. Systems that satisfy these operations are called commutative semirings. Some examples are:

  • Set union (+) and intersection (x)
  • The tropical semiring which uses min (+) and + (x)
  • Boolean AND (x) and OR (+)
  • Plain old calculus with addition (+) and multiplication (x)
  • The log + semiring with log + (+) and + (x)

Replacing these symbols in well known algorithms such as dynamic programming gives the forward-backward algorithm, shortest path calculations and others, but this is a story for another day.