Prime Numbers with Python
Overview
If you find yourself searching for information on working with prime numbers in Python, you will find many different answers and methods, you may find that some might not work for your needs, especially while working with larger primes and find next prime number
cases.
This was the case for a base62
hashing library I was working on, BaseHash on github.com which uses golden primes, 62^n * 1.618033988749894848
, and multiplicative inverses to create obfuscated reversible base62
hash identifiers. I wanted a way of extending the base62
to other bases easily, which the library supports base36
, base52
, base56
, base58
, base62
, and base94
, so I needed an algorithm for finding the next prime using base^length
that was quick and efficient instead of hard-coding the primes. This allows for hash lengths far greater than anyone would want to hard-code for. For example, with base62
, you can produce a hash up to 171 digits before Python gives you an error.
I have written a few different methods for this and have benchmarked them on an old system, Single-Core AMD Athlon 64 LE-1640, 2.6GHz with 2.00GB DDR RAM running Windows 7. I would like to share with you all my findings.
Note: Although this was all done for Python, this can easily be ported to other languages.
Primality Testing Algorithms
First, I will start with the two most common answers you will see on Python forums.
All benchmarks are 10,000 iterations.
Using not any
from math import sqrt
def is_prime_1(n):
if n == 2:
return True
if (n < 2) or (n % 2 == 0):
return False
return not any(n % i == 0 for i in xrange(3, int(sqrt(n)) + 1, 2))
# benchmark for is_prime_1(100**4-1)
10000 calls, 60975 per second.
50006 function calls in 0.167 seconds
Using all
def is_prime_2(n):
if n == 2:
return True
if (n < 2) or (n % 2 == 0):
return False
return all(n % i for i in xrange(3, int(sqrt(n)) + 1, 2))
# benchmark for is_prime_2(100**4-1)
10000 calls, 74626 per second.
40006 function calls in 0.138 seconds
So, we can see that using all
over not any
allows for ~13651
more calls per second. But, what happens when we start using larger numbers? For instance, 100**10-1
or 99999999999999999999
? You will run into an error, OverflowError: Python int too large to convert to C long
. To get past this, we can use a modified itertools.count
generator, modified to take steps
so we can pass on all even
numbers.
def mrange(start, stop, step=1):
i = start
while i < stop:
yield i
i += step
is_prime_1
:
def is_prime_1(n):
...
return not any(n % i == 0 for i in mrange(3, int(sqrt(n)) + 1, 2))
# benchmark for is_prime_1(100**10-1) using mrange over xrange
10000 calls, 37037 per second.
70006 function calls in 0.272 seconds
is_prime_2
:
def is_prime_2(n):
...
return all(n % i for i in mrange(3, int(sqrt(n)) + 1, 2))
# benchmark for is_prime_2(100**10-1) using mrange over xrange
10000 calls, 53191 per second.
60006 function calls in 0.190 seconds
Okay! So we got the OverflowError
taken care of, and we still know that all
is faster than not any
.
So, what about the other methods that we see being said to use, such as the Solovay-Strassen, the Miller-Rabin, and the Fermat primality tests?
Since the Solovay-Strassen
and Millter-Rabin
are fairly large, I have the code up on gist.github.com
for these methods.
fermat.py: on gist.github.com
# benchmark fermat(100**10-1)
10000 calls, 21141 per second.
20006 function calls in 0.481 seconds
With Fermat
, we see that it is indeed slower, twice as long as what is_prime_1
and we know is_prime_2
is even faster. However, we're going from 60006 - 70006
calls down to only 20006
. The overall issue with the Fermat
primality test is that it can give false results on pseudoprimes.
miller_rabin.py: on gist.github.com
# benchmark miller_rabin(100**10-1)
10000 calls, 11111 per second.
74800 function calls in 0.902 seconds
The Miller-Rabin
primality test is significantly slower than any of the methods mentioned thus far, but we have k
accuracy of eliminating false results on pseudoprimes.
solovay-strassen.py: on gist.github.com
# benchmark solovay_strassen(100**10-1)
10000 calls, 2440 per second.
571496 function calls (74873 primitive calls) in 4.100 seconds
We can just say ouch
on using the Solovay-Strassen
algorithm.
Finding the Next Prime
Now to benchmark find next prime
methods using the primality tests mentioned above. We will continue to use 100**10-1
or 99999999999999999999
, which the next prime is 100000000000000000039
Due to system resources, we will drop these benchmarks down to 1,000 iterations.
These tests all use the following method, changing the tester
parameter with the primality tests mentioned above. Now, we know that the is_prime_1
and is_prime_2
primality tests were faster than the other algorithms tested, so they should be faster here too, right?
def next_prime(n, tester):
if tester(n):
n += 1
if (n % 2 == 0) and (n != 2):
n += 1
while True:
if tester(n):
break
n += 2
return n
print next_prime(100**10-1, is_prime_1)
- I cannot offer any benchmark data on using this method. I allowed the script to run for over five minutes, got nothing.
print next_prime(100**10-1, is_prime_1)
- Same as next_prime(100**10-1, is_prime_1)
, I cannot offer any benchmark data on using this method.
print next_prime(100**10-1, fermat)
# returns `100000000000000000039`
1000 calls, 1131 per second.
45006 function calls in 0.885 seconds
print next_prime(100**10-1, miller_rabin)
# returns `100000000000000000039`
1000 calls, 337 per second.
258669 function calls in 2.971 seconds
print next_prime(100**10-1, solovay_strassen)
# returns `100000000000000000039`
1000 calls, 89 per second.
1775492 function calls (223447 primitive calls) in 11.164 seconds
Conclusion
When working with small numbers, it is better and faster to use methods like is_prime_1
and is_prime_2
, even with finding the next prime using something like next_prime
. However, when working with larger numbers, these simply do not work. They are memory and time heavy.
The other algorithms outlined, Fermat
, Miller-Rabin
, and Solovay-Strassen
took longer to execute in the primality testing benchmarks, however they work perfect for finding the next prime.
Although the Fermat
primality test is the quickest, you do have the chance of false results on pseudoprimes. It is better to go with either the Miller-Rabin
or the Solovay-Strassen
algorithms. For speed and efficiency, the Miller-Rabin
preforms roughly four times faster than the Solovay-Strassen
.
Written by Nathan Lucas
Related protips
1 Response
I notice a bug when checking for primality of 28430288029929701389 when using "isprime2". The script works for values much larger than this, but it gets hung when checking this value for primality.