If you find yourself searching for information on working with prime numbers in Python, you will find many different answers and methods. You may also find that some of them do not work for your needs, especially when working with larger primes and finding the next prime after a given number.
This was the case for a base62 hashing library I was working on, BaseHash (on github.com), which uses golden primes, derived from 62^n * 1.618033988749894848, and multiplicative inverses to create obfuscated, reversible base62 hash identifiers. I wanted a way of easily extending base62 to other bases, up through the base94 the library supports, so I needed a quick and efficient algorithm for finding the next prime from base^length instead of hard-coding the primes. This allows for hash lengths far greater than anyone would want to hard-code for. For example, with base62, you can produce a hash up to 171 digits before Python gives you an error.
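To make the goal concrete, here is a minimal sketch of how such a golden prime could be computed for an arbitrary base and length. The golden_prime helper is my illustration rather than BaseHash's actual code, and it assumes the next_prime and miller_rabin functions covered later in this post:

from decimal import Decimal, getcontext

GOLDEN_RATIO = Decimal('1.618033988749894848')

def golden_prime(base, length):
    # Hypothetical helper: scale base**length by the golden ratio,
    # then search upward for the next prime. Decimal, with enough
    # precision for the full integer, avoids the float overflow you
    # would hit past base62's 171-digit limit.
    n = base ** length
    getcontext().prec = len(str(n)) + 10
    return next_prime(int(Decimal(n) * GOLDEN_RATIO), miller_rabin)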
I have written a few different methods for this and have benchmarked them on an old system, Single-Core AMD Athlon 64 LE-1640, 2.6GHz with 2.00GB DDR RAM running Windows 7. I would like to share with you all my findings.
Note: Although this was all done for Python, this can easily be ported to other languages.
Primality Testing Algorithms
First, I will start with the two most common answers you will see on Python forums.
All benchmarks are 10,000 iterations.
from math import sqrt

def is_prime_1(n):
    if n == 2:
        return True
    if (n < 2) or (n % 2 == 0):
        return False
    return not any(n % i == 0 for i in xrange(3, int(sqrt(n)) + 1, 2))

# benchmark for is_prime_1(100**4-1)
# 10000 calls, 60975 per second.
# 50006 function calls in 0.167 seconds
def is_prime_2(n):
    if n == 2:
        return True
    if (n < 2) or (n % 2 == 0):
        return False
    return all(n % i for i in xrange(3, int(sqrt(n)) + 1, 2))

# benchmark for is_prime_2(100**4-1)
# 10000 calls, 74626 per second.
# 40006 function calls in 0.138 seconds
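For reference, the call counts and timings in these benchmarks read like cProfile output. A rough harness along these lines (my reconstruction, not necessarily the exact one used) produces that format:

import cProfile

def bench():
    # 10,000 iterations, matching the benchmarks in this post
    for _ in xrange(10000):
        is_prime_2(100**4 - 1)

cProfile.run('bench()')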
So, we can see that using all allows for ~13651 more calls per second than not any. But, what happens when we start using larger numbers? For instance, 99999999999999999999? You will run into an error: OverflowError: Python int too large to convert to C long, because xrange cannot handle bounds beyond a C long. To get past this, we can use a generator similar to itertools.count, modified to take a step, as a drop-in replacement for xrange:
def mrange(start, stop, step=1):
    i = start
    while i < stop:
        yield i
        i += step
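As a quick sanity check (my example, not part of the original benchmarks), mrange yields the same candidates xrange would, while happily accepting bounds far beyond a C long:

>>> list(mrange(3, 12, 2))
[3, 5, 7, 9, 11]
>>> print next(mrange(10**20, 10**20 + 10, 2))
100000000000000000000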
def is_prime_1(n):
    ...
    return not any(n % i == 0 for i in mrange(3, int(sqrt(n)) + 1, 2))

# benchmark for is_prime_1(100**10-1) using mrange over xrange
# 10000 calls, 37037 per second.
# 70006 function calls in 0.272 seconds
def is_prime_2(n):
    ...
    return all(n % i for i in mrange(3, int(sqrt(n)) + 1, 2))

# benchmark for is_prime_2(100**10-1) using mrange over xrange
# 10000 calls, 53191 per second.
# 60006 function calls in 0.190 seconds
Okay! So we got the OverflowError taken care of, and we still know that all is faster than not any. Next, let's look at three well-known probabilistic primality tests: Fermat, Miller-Rabin, and Solovay-Strassen. Since the implementations of Fermat, Miller-Rabin, and Solovay-Strassen are fairly large, I have the code up on gist.github.com for these methods.
fermat.py: on gist.github.com

# benchmark for fermat(100**10-1)
# 10000 calls, 21141 per second.
# 20006 function calls in 0.481 seconds
Looking at Fermat, we see that it is indeed slower, taking roughly twice as long as is_prime_1, and we know is_prime_2 is faster still. However, we're going from 60006-70006 function calls down to only 20006. The overall issue with the Fermat primality test is that it can give false results on pseudoprimes: composites such as the Carmichael numbers can pass the test as though they were prime.
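The gist itself isn't reproduced here, but a minimal Fermat test over k random bases looks something like this (my sketch, which may differ from the gist's implementation):

import random

def fermat(n, k=10):
    # Fermat's little theorem: if n is prime, then a**(n-1) % n == 1
    # for any base 1 < a < n. Try k random bases; composites can
    # still slip through (the pseudoprime problem noted above).
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    for _ in xrange(k):
        a = random.randint(2, n - 2)
        if pow(a, n - 1, n) != 1:
            return False
    return True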
miller_rabin.py: on gist.github.com

# benchmark for miller_rabin(100**10-1)
# 10000 calls, 11111 per second.
# 74800 function calls in 0.902 seconds
The Miller-Rabin primality test is significantly slower than any of the methods mentioned thus far, but its k rounds let us drive the chance of a false result on pseudoprimes arbitrarily low: each round passes a composite with probability at most 1/4, so k rounds bound the error at 4^-k.
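Again, the real code is in the gist; a compact Miller-Rabin sketch with k rounds (my version, for illustration) looks like:

import random

def miller_rabin(n, k=10):
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    # Write n - 1 as 2**r * d with d odd.
    r, d = 0, n - 1
    while d % 2 == 0:
        r += 1
        d //= 2
    for _ in xrange(k):
        a = random.randint(2, n - 2)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in xrange(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a witnesses that n is composite
    return True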
solovay-strassen.py: on gist.github.com

# benchmark for solovay_strassen(100**10-1)
# 10000 calls, 2440 per second.
# 571496 function calls (74873 primitive calls) in 4.100 seconds
We can just say ouch on using the Solovay-Strassen for plain primality testing; it is by far the slowest method benchmarked here.
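For completeness, a Solovay-Strassen sketch (again mine, possibly differing from the gist) checks Euler's criterion, a**((n-1)/2) == (a/n) mod n, against the Jacobi symbol for k random bases:

import random

def jacobi(a, n):
    # Jacobi symbol (a/n) for odd n > 0, via the standard binary
    # algorithm based on quadratic reciprocity.
    a %= n
    t = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                t = -t
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            t = -t
        a %= n
    return t if n == 1 else 0

def solovay_strassen(n, k=10):
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    for _ in xrange(k):
        a = random.randint(2, n - 1)
        x = jacobi(a, n)
        # Composite if (a/n) == 0 or Euler's criterion fails.
        if x == 0 or pow(a, (n - 1) // 2, n) != x % n:
            return False
    return True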
Finding the Next Prime
Now to benchmark the find-next-prime methods using the primality tests mentioned above. We will continue to use 99999999999999999999, for which the next prime is 100000000000000000039. Due to system resources, we will drop these benchmarks down to 1,000 iterations.
These tests all use the following method, changing the tester parameter to each of the primality tests mentioned above. Now, we know that the is_prime_1 and is_prime_2 primality tests were faster than the other algorithms tested, so they should be faster here too, right?
def next_prime(n, tester):
    if tester(n):
        n += 1
    if (n % 2 == 0) and (n != 2):
        n += 1
    while True:
        if tester(n):
            break
        n += 2
    return n
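As a quick sanity check of the skip-the-evens logic (my example, not part of the benchmarks):

print next_prime(13, is_prime_2)  # 17: 13 is prime, so we step past it
print next_prime(14, is_prime_2)  # 17: 14 is bumped to odd 15, then the loop lands on 17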
print next_prime(100**10-1, is_prime_1) - I cannot offer any benchmark data on using this method. I allowed the script to run for over five minutes and got nothing.

print next_prime(100**10-1, is_prime_2) - Same as next_prime(100**10-1, is_prime_1); I cannot offer any benchmark data on using this method.
print next_prime(100**10-1, fermat)
# returns 100000000000000000039
# 1000 calls, 1131 per second.
# 45006 function calls in 0.885 seconds
print next_prime(100**10-1, miller_rabin)
# returns 100000000000000000039
# 1000 calls, 337 per second.
# 258669 function calls in 2.971 seconds
print next_prime(100**10-1, solovay_strassen)
# returns 100000000000000000039
# 1000 calls, 89 per second.
# 1775492 function calls (223447 primitive calls) in 11.164 seconds
When working with small numbers, it is better and faster to use methods like is_prime_2, even when finding the next prime with something like next_prime. However, when working with larger numbers, these simply do not work: trial division up to sqrt(n) is far too time-heavy. The other algorithms outlined, Fermat, Miller-Rabin, and Solovay-Strassen, took longer to execute in the primality-testing benchmarks, but they work perfectly for finding the next prime. While the Fermat primality test is the quickest, you do have the chance of false results on pseudoprimes, so it is better to go with either the Miller-Rabin or the Solovay-Strassen algorithm. For speed and efficiency, Miller-Rabin performs roughly four times faster than Solovay-Strassen.