Tamas Sarlos pointed out a much smarter strategy on how to obtain a sparse representation of a (possibly dense) vector: Priority Sampling by Duffield, Lund and Thorup (Journal of the ACM 2006). The idea is quite ingenious and (surprisingly so) essentially optimal, as Mario Szegedy showed. Here’s the algorithm:
For each \(x_i\) compute a priority \(p_i = \frac{x_i}{a_i}\) where \(a_i \sim U(0, 1]\) is drawn from a uniform distribution. Denote by \(\tau\) the k+1 largest such priority. Then pick all k indices i which satisfy \(p_i > \tau\) and assign them the value \(s_i = \mathrm{max}(x_i, \tau)\). All other coordinates are set to \(s_i = 0\).
This provides an estimator with the following properties:
- The variance is no larger than that of the best k+1 sparse estimator.
- The entries \(s_i\) satisfy \(\mathbf{E}[s_i] = x_i\)
- The covariance vanishes, i.e. \(\mathbf{E}[s_i s_j] = x_i x_j\)
Note that we assumed that all \(x_i \geq 0\). Otherwise simply apply the same algorithm to \(|x_i|\) and then return signed versions of the estimate.