Memory Latency, Hashing, Optimal Golomb Rulers and Feistel Networks

In many problems involving hashing we want to look up a range of elements from a vector, e.g. of the form $$v[h(i,j)]$$ for arbitrary $$i$$ and for a range of $$j \in \{1, \ldots, n\}$$ where $$h(i,j)$$ is a hash function. This happens e.g. for multiclass classification, collaborative filtering, and multitask learning.
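As a concrete sketch of this access pattern (the multiplicative hash and table size below are illustrative choices, not the ones any particular system uses):

```python
# Sketch of the hashed-lookup pattern v[h(i, j)].
# Hash constants and table size are hypothetical.

M = 1 << 20  # hash table size (illustrative)

def h(i, j):
    """A simple (hypothetical) integer hash of the pair (i, j)."""
    return (i * 2654435761 + j * 40503) % M

v = [0.0] * M  # parameter vector, e.g. learned weights

def score(i, n):
    """Sum the n hashed entries for key i -- note the scattered reads."""
    return sum(v[h(i, j)] for j in range(1, n + 1))
```

The point to notice is that for fixed $$i$$ the indices $$h(i,1), \ldots, h(i,n)$$ are scattered all over the table.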

While this works just fine in terms of estimation performance, traversing all values of $$j$$ leads to an algorithm with horrible memory access patterns. Modern RAM chips are much faster (over 10x) at reading values in sequence than at carrying out random reads, and random access destroys the benefit of a cache. The result is an algorithm that is efficient in terms of its memory footprint but can be quite slow. One way to address this is to bound the range of $$h(i,j)$$ for different values of $$j$$. Here are some ways we could do this:

1. Decompose $$h(i,j) = h(i) + j$$. This is computationally very cheap and has good sequential access properties, but it leads to horrible collisions whenever there are two keys $$i$$ and $$i’$$ for which $$|h(i) - h(i’)| \leq n$$: the two ranges then overlap and entire blocks of entries collide.
2. Decompose $$h(i,j) = h(i) + h’(j)$$ where $$h’(j)$$ has a small range of values.
This is a really bad idea since, by the birthday paradox, we have a nontrivial probability of collision as soon as the range of $$h’(j)$$ is less than $$n^2$$. Moreover, for nearby values $$h(i)$$ and $$h(i’)$$ we will get many collisions.
3. Decompose $$h(i,j) = h(i) + g(j)$$ where $$g(j)$$ is an Optimal Golomb Ruler.
The latter is an increasing sequence of integers for which every pairwise difference occurs at most once. In other words, the condition $$g(a) - g(b) = g(c) - g(d)$$ implies that $$a = c$$ and $$b = d$$. John Langford proposed this to address the problem. In fact, it solves our problem since a) there are no collisions for a fixed $$i$$ and b) for neighboring values $$h(i)$$ and $$h(i’)$$ we will get at most one collision (due to the Golomb ruler property). Alas, this only works up to $$n = 26$$, since finding an Optimal Golomb Ruler is hard (it is currently unknown whether the problem is actually NP hard).
4. An alternative that works for larger $$n$$ and that is sufficiently simple to compute is to use cryptography. After all, all we want is that the hash function $$h’(j)$$ has small range and that it doesn’t have any self collisions or any systematic collisions. We can achieve this by encrypting $$j$$ using the key $$i$$ to generate an encrypted message taking one of $$N$$ possible values. In other words, we use
$$h(i,j) = h(i) + \mathrm{crypt}(j|i,N)$$
Since it is an encryption of $$j$$, the mapping $$j \mapsto \mathrm{crypt}(j|i,N)$$ is invertible and we won’t have collisions among different $$j$$ for a given value of $$i$$. Furthermore, for different $$i$$ the encodings will be uncorrelated (after all, $$i$$ is the key). Finally, we can control the range $$N > n$$ simply by choosing the encryption algorithm. In this case the random memory access has bounded range, hence the CPU cache will not suffer from too many misses.
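To make the Golomb ruler approach concrete, here is a small sketch in Python using the optimal Golomb ruler of order 5, $$\{0, 1, 4, 9, 11\}$$; since all pairwise differences are distinct, the offsets for a fixed $$i$$ never collide. The base hash below is an illustrative multiplicative hash, not a prescribed choice.

```python
from itertools import combinations

# Optimal Golomb ruler of order 5 (marks 0..11); every pairwise
# difference occurs at most once -- the defining property.
G = [0, 1, 4, 9, 11]

def is_golomb(ruler):
    """Check that every pairwise difference occurs at most once."""
    diffs = [b - a for a, b in combinations(sorted(ruler), 2)]
    return len(diffs) == len(set(diffs))

def h_golomb(i, j):
    """h(i, j) = h(i) + g(j): a base offset per key i plus a
    Golomb-ruler offset per j (base hash is illustrative)."""
    base = (i * 2654435761) % (1 << 20)
    return base + G[j]  # here j ranges over 0..4
```

Because the differences are unique, two keys whose base offsets are close can share at most one slot, exactly the property claimed in point 3 above.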

A particularly nice algorithm is the Feistel cipher. It works as follows: define the iterative map

$$f(x,y) = (y,\; x \,\mathrm{XOR}\, h(y))$$

Here $$h$$ is a hash function (a full Feistel network uses a different round function in each iteration, but a keyed hash serves the same purpose). After 4 iterations of $$(x,y) \to f(x,y)$$ we obtain an encryption of $$(x,y)$$. Now use $$x = i$$ and $$y = j$$ to obtain the desired result. Basically we are trading off memory latency against computation (which is local).
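A minimal sketch of this construction in Python. The round hash, bit width, and constants are illustrative assumptions; any decent integer hash would do. The inverse is included only to make the key property visible: the map is a permutation, so for a fixed $$i$$ different $$j$$ can never collide.

```python
BITS = 16
MASK = (1 << BITS) - 1

def round_hash(y, r):
    """Illustrative round function: a keyed integer hash (r = round index)."""
    return ((y * 2654435761) ^ (r * 0x9E3779B9) ^ (y >> 7)) & MASK

def feistel(x, y, rounds=4):
    """Apply (x, y) -> (y, x XOR h(y)) for the given number of rounds."""
    for r in range(rounds):
        x, y = y, x ^ round_hash(y, r)
    return x, y

def feistel_inverse(x, y, rounds=4):
    """Run the rounds backwards: the forward map is a permutation."""
    for r in reversed(range(rounds)):
        x, y = y ^ round_hash(x, r), x
    return x, y

def crypt(j, i):
    """crypt(j|i, N) with x = i, y = j as above; N = 2**(2*BITS)."""
    x, y = feistel(i & MASK, j & MASK)
    return (x << BITS) | y
```

Since `feistel` is invertible, `crypt(j, i)` is injective in `j` for each key `i`, which is exactly the no-self-collision property we wanted, and the output stays within a range of $$N = 2^{32}$$ values here (tunable via `BITS`).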