I recently used this trick to obtain a 10x speedup on an R package I am working on, and was so happy with myself I thought I would share it.
Say we have a low-rank plus diagonal matrix \(\mathbf{C} = \mathbf{W}\mathbf{W}^\top + \sigma^2\mathit{I}\), where \(\mathbf{W}\) is a \(d \times k\) matrix with \(d \gg k\). If we are interested in computing \(\mathbf{C}^{-1}\mathbf{A}\) for another \(d\times d\) matrix \(\mathbf{A}\), this would normally be done by Cholesky-decomposing \(\mathbf{C}\) and then using forward/back-substitution to solve the system. The computational complexity here is \(\mathcal{O}(d^3)\) for the Cholesky decomposition, and \(\mathcal{O}(d^2)\) for each forward/back-substitution.¹
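As a rough sketch of this baseline in R (the seed, dimensions, and variable names below are purely illustrative, not taken from the package):

```r
set.seed(1)
d <- 1000; k <- 5; sigma2 <- 0.1
W <- matrix(rnorm(d * k), d, k)
A <- matrix(rnorm(d * d), d, d)

# Naive approach: form C explicitly, Cholesky-decompose it, then
# solve C X = A by forward/back-substitution.
C <- tcrossprod(W) + sigma2 * diag(d)           # C = W W' + sigma^2 I
R <- chol(C)                                    # upper-triangular R with C = R'R, O(d^3)
X_naive <- backsolve(R, forwardsolve(t(R), A))  # two triangular solves per column of A
```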
However, we can do a lot better than this by exploiting the structure of \(\mathbf{C}\). Using the Woodbury matrix identity, the inverse of \(\mathbf{C}\) can be written down in closed form: \[\begin{equation} \mathbf{C}^{-1} = \sigma^{-2}\mathit{I} - \sigma^{-2}\mathbf{W}\mathbf{M}^{-1}\mathbf{W}^\top \tag{1} \end{equation}\] where we have introduced the small \(k\times k\) matrix \(\mathbf{M} = \mathbf{W}^\top\mathbf{W} + \sigma^2\mathit{I}\). Now we could construct \(\mathbf{C}^{-1}\) from this directly and obtain \(\mathbf{C}^{-1}\mathbf{A}\) with a matrix multiplication; however, multiplying these two \(d\times d\) matrices together still costs \(\mathcal{O}(d^3)\) operations! Instead, we can post-multiply Equation (1) by \(\mathbf{A}\) to obtain \[\begin{equation} \mathbf{C}^{-1}\mathbf{A} = \sigma^{-2}\mathbf{A} - \sigma^{-2}\mathbf{W}\mathbf{M}^{-1}\mathbf{W}^\top\mathbf{A} \tag{2} \end{equation}\] If we compute the second term as \((\mathbf{W}\mathbf{M}^{-1})(\mathbf{W}^\top\mathbf{A})\), Equation (2) can be computed in \(\mathcal{O}(d^2k + dk^2 + k^3)\) operations, since there are no \((d\times d)\times(d\times d)\) matrix multiplications. If \(d\gg k\), this can produce considerable speedups.
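Continuing the toy example above, a minimal sketch of the Woodbury-based solve (again, names and dimensions are just for illustration) only ever forms a \(k\times k\) system and \(d\times k\) / \(k\times d\) products:

```r
# Woodbury approach: M = W'W + sigma^2 I is only k x k.
M     <- crossprod(W) + sigma2 * diag(k)    # O(d k^2)
WMinv <- W %*% solve(M)                     # (W M^{-1}), d x k
WtA   <- crossprod(W, A)                    # (W' A),     k x d
X_woodbury <- (A - WMinv %*% WtA) / sigma2  # Equation (2), dominated by O(d^2 k)

# Agrees with the naive solve up to floating-point error:
max(abs(X_woodbury - X_naive))
```

In practice you might prefer `solve(M, crossprod(W, A))` to forming `solve(M)` explicitly, but either grouping avoids any \((d\times d)\times(d\times d)\) product.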