So prime numbers are useful. I found that I needed to generate a prime number for a hash table (details of which should be posted soon. If all goes well that is…). And I decided to come up with an algorithm to generate that prime number. Clearly stating the problem:

“Find the smallest prime greater than n”

The simplest way of generating primes is of course using Eurothenes' sieve1. However, this naive method runs in order of $\frac{n}{2} + \frac{n}{3} + \frac{n}{5} \ldots + \frac{n}{p}$ time, which MathWorld tells me is about $T(n) = \omega(n \ln(\ln(n)))$, and uses $\Theta(n)$ memory. This sucks for big numbers (or atleast I think I can do better). Further more, this step has to be repeated (or intelligently modified) till a prime number has been found. (I will show later that the best this can go would be $T(\lceil(\sqrt{n})+1^2\rceil)$), which is of the same order, $\omega(n \ln(\ln(n+1)))$ (mind you this is a lower bound). The algorithm I came up with, which I thought might be better is: generate all primes till $\sqrt{n}$, and check if any of them divide the number in question. If they don’t well, we have a prime, ladies and gentlemen. If they do, then we’ll just have to move along won’t we…

This method takes $\Theta(T(\sqrt{n}) + \pi(\sqrt{n}))$ time for each step. $\pi(n)$ is a very very special number. It denotes the number of primes less than a number ’n’. The prime number theorem2 states that this number converges to $Li(x)$, i.e. $\int \frac{x}{\ln(x)} dx$.

Another tremendously useful aspect of this theorem is that it can be used to trivially prove that there will always be a prime between two perfect squares (thus proving the statement in the previous approach). We can use this aspect by only checking division with primes less than $\sqrt{n}+1$. This gives us a worst case scenario of $\Theta((2n+1)\pi(\sqrt{n}+1))$ (This comes from $(n+1)^2 - n^2 = 2n+1$). This simplifies to $\Theta(n^{1.5}/\ln(n))$. It also requires $\Theta(\sqrt{n})$ memory (to compute the sieve). Much to my embarrassment, it turns out that the former is in fact better, then mine (though it does trade off for memory).

Finally, I have tried to improve this algorithm by maintaining the remainder of all the prime divisions, and then incrementing by the least non-conflicting number. This is because if $n \equiv -1 \pmod p)$, then $n+1$ will be divisible by that prime. I don’t know how to analyse the run time of such an algorithm, but I’m sure it’s better than the latter.

Code for this experiment can be found here.

Finally, thanks to:

  1. Pranesh Srinivasan: Who convinced me that the sieve is the best method for prime number generation, for all numbers $< 10^7$.
  2. Anirudh Sivaraman: For pointing out errors in my running-order calculation. (It doesn’t change the result though).