If you find yourself searching for information on working with prime numbers in Python, you will find many different answers and methods. You may also find that some of them do not work for your needs, especially when working with larger primes and finding the next prime after a given number.
This was the case for a base62 hashing library I was working on, BaseHash (on github.com), which uses golden primes, derived from 62^n * 1.618033988749894848, and multiplicative inverses to create obfuscated, reversible base62 hash identifiers. I wanted a way of easily extending base62 to other bases, up through the base94 the library supports, so I needed a quick and efficient algorithm for finding the next prime from base^length instead of hard-coding the primes. This allows for hash lengths far greater than anyone would want to hard-code for. For example, with base62, you can produce a hash up to 171 digits before Python gives you an error.
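To make the goal concrete, here is a minimal sketch of how such a golden prime could be computed for an arbitrary base and length. The golden_prime helper is my illustration rather than BaseHash's actual code, and it assumes the next_prime and miller_rabin functions covered later in this post:

from decimal import Decimal, getcontext

GOLDEN_RATIO = Decimal('1.618033988749894848')

def golden_prime(base, length):
    # Hypothetical helper: scale base**length by the golden ratio,
    # then search upward for the next prime. Decimal, with enough
    # precision for the full integer, avoids the float overflow you
    # would hit past base62's 171-digit limit.
    n = base ** length
    getcontext().prec = len(str(n)) + 10
    return next_prime(int(Decimal(n) * GOLDEN_RATIO), miller_rabin)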
I have written a few different methods for this and have benchmarked them on an old system, Single-Core AMD Athlon 64 LE-1640, 2.6GHz with 2.00GB DDR RAM running Windows 7. I would like to share with you all my findings.
Note: Although this was all done for Python, this can easily be ported to other languages.
Primality Testing Algorithms
First, I will start with the two most common answers you will see on Python forums.
All benchmarks are 10,000 iterations.
from math import sqrt

def is_prime_1(n):
    if n == 2:
        return True
    if (n < 2) or (n % 2 == 0):
        return False
    return not any(n % i == 0 for i in xrange(3, int(sqrt(n)) + 1, 2))

# benchmark for is_prime_1(100**4-1)
# 10000 calls, 60975 per second.
# 50006 function calls in 0.167 seconds
def is_prime_2(n):
    if n == 2:
        return True
    if (n < 2) or (n % 2 == 0):
        return False
    return all(n % i for i in xrange(3, int(sqrt(n)) + 1, 2))

# benchmark for is_prime_2(100**4-1)
# 10000 calls, 74626 per second.
# 40006 function calls in 0.138 seconds
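For reference, the call counts and timings in these benchmarks read like cProfile output. A rough harness along these lines (my reconstruction, not necessarily the exact one used) produces that format:

import cProfile

def bench():
    # 10,000 iterations, matching the benchmarks in this post
    for _ in xrange(10000):
        is_prime_2(100**4 - 1)

cProfile.run('bench()')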
So, we can see that using all allows for ~13651 more calls per second than not any. But, what happens when we start using larger numbers? For instance, 99999999999999999999? You will run into an error: OverflowError: Python int too large to convert to C long, because xrange cannot handle bounds beyond a C long. To get past this, we can use a generator similar to itertools.count, modified to take a step, as a drop-in replacement for xrange:
def mrange(start, stop, step=1):
    i = start
    while i < stop:
        yield i
        i += step
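As a quick sanity check (my example, not part of the original benchmarks), mrange yields the same candidates xrange would, while happily accepting bounds far beyond a C long:

>>> list(mrange(3, 12, 2))
[3, 5, 7, 9, 11]
>>> print next(mrange(10**20, 10**20 + 10, 2))
100000000000000000000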
def is_prime_1(n):
    ...
    return not any(n % i == 0 for i in mrange(3, int(sqrt(n)) + 1, 2))

# benchmark for is_prime_1(100**10-1) using mrange over xrange
# 10000 calls, 37037 per second.
# 70006 function calls in 0.272 seconds
def is_prime_2(n):
    ...
    return all(n % i for i in mrange(3, int(sqrt(n)) + 1, 2))

# benchmark for is_prime_2(100**10-1) using mrange over xrange
# 10000 calls, 53191 per second.
# 60006 function calls in 0.190 seconds
Okay! So we got the OverflowError taken care of, and we still know that all is faster than not any. Next, let's look at three well-known probabilistic primality tests: Fermat, Miller-Rabin, and Solovay-Strassen. Since the implementations of Fermat, Miller-Rabin, and Solovay-Strassen are fairly large, I have the code up on gist.github.com for these methods.
fermat.py: on gist.github.com

# benchmark for fermat(100**10-1)
# 10000 calls, 21141 per second.
# 20006 function calls in 0.481 seconds
Looking at Fermat, we see that it is indeed slower, taking roughly twice as long as is_prime_1, and we know is_prime_2 is faster still. However, we're going from 60006-70006 function calls down to only 20006. The overall issue with the Fermat primality test is that it can give false results on pseudoprimes: composites such as the Carmichael numbers can pass the test as though they were prime.
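The gist itself isn't reproduced here, but a minimal Fermat test over k random bases looks something like this (my sketch, which may differ from the gist's implementation):

import random

def fermat(n, k=10):
    # Fermat's little theorem: if n is prime, then a**(n-1) % n == 1
    # for any base 1 < a < n. Try k random bases; composites can
    # still slip through (the pseudoprime problem noted above).
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    for _ in xrange(k):
        a = random.randint(2, n - 2)
        if pow(a, n - 1, n) != 1:
            return False
    return True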
miller_rabin.py: on gist.github.com

# benchmark for miller_rabin(100**10-1)
# 10000 calls, 11111 per second.
# 74800 function calls in 0.902 seconds
The Miller-Rabin primality test is significantly slower than any of the methods mentioned thus far, but its k rounds let us drive the chance of a false result on pseudoprimes arbitrarily low: each round passes a composite with probability at most 1/4, so k rounds bound the error at 4^-k.
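Again, the real code is in the gist; a compact Miller-Rabin sketch with k rounds (my version, for illustration) looks like:

import random

def miller_rabin(n, k=10):
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    # Write n - 1 as 2**r * d with d odd.
    r, d = 0, n - 1
    while d % 2 == 0:
        r += 1
        d //= 2
    for _ in xrange(k):
        a = random.randint(2, n - 2)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in xrange(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a witnesses that n is composite
    return True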
solovay-strassen.py: on gist.github.com

# benchmark for solovay_strassen(100**10-1)
# 10000 calls, 2440 per second.
# 571496 function calls (74873 primitive calls) in 4.100 seconds
We can just say ouch on using the Solovay-Strassen for plain primality testing; it is by far the slowest method benchmarked here.
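For completeness, a Solovay-Strassen sketch (again mine, possibly differing from the gist) checks Euler's criterion, a**((n-1)/2) == (a/n) mod n, against the Jacobi symbol for k random bases:

import random

def jacobi(a, n):
    # Jacobi symbol (a/n) for odd n > 0, via the standard binary
    # algorithm based on quadratic reciprocity.
    a %= n
    t = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                t = -t
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            t = -t
        a %= n
    return t if n == 1 else 0

def solovay_strassen(n, k=10):
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    for _ in xrange(k):
        a = random.randint(2, n - 1)
        x = jacobi(a, n)
        # Composite if (a/n) == 0 or Euler's criterion fails.
        if x == 0 or pow(a, (n - 1) // 2, n) != x % n:
            return False
    return True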
Finding the Next Prime
Now to benchmark the find-next-prime methods using the primality tests mentioned above. We will continue to use 99999999999999999999, for which the next prime is 100000000000000000039. Due to system resources, we will drop these benchmarks down to 1,000 iterations.
These tests all use the following method, changing the tester parameter to each of the primality tests mentioned above. Now, we know that the is_prime_1 and is_prime_2 primality tests were faster than the other algorithms tested, so they should be faster here too, right?
def next_prime(n, tester):
    if tester(n):
        n += 1
    if (n % 2 == 0) and (n != 2):
        n += 1
    while True:
        if tester(n):
            break
        n += 2
    return n
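As a quick sanity check of the skip-the-evens logic (my example, not part of the benchmarks):

print next_prime(13, is_prime_2)  # 17: 13 is prime, so we step past it
print next_prime(14, is_prime_2)  # 17: 14 is bumped to odd 15, then the loop lands on 17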
print next_prime(100**10-1, is_prime_1) - I cannot offer any benchmark data on using this method. I allowed the script to run for over five minutes and got nothing.

print next_prime(100**10-1, is_prime_2) - Same as next_prime(100**10-1, is_prime_1); I cannot offer any benchmark data on using this method.
print next_prime(100**10-1, fermat)
# returns 100000000000000000039
# 1000 calls, 1131 per second.
# 45006 function calls in 0.885 seconds
print next_prime(100**10-1, miller_rabin)
# returns 100000000000000000039
# 1000 calls, 337 per second.
# 258669 function calls in 2.971 seconds
print next_prime(100**10-1, solovay_strassen)
# returns 100000000000000000039
# 1000 calls, 89 per second.
# 1775492 function calls (223447 primitive calls) in 11.164 seconds
When working with small numbers, it is better and faster to use methods like is_prime_2, even when finding the next prime with something like next_prime. However, when working with larger numbers, these simply do not work: trial division up to sqrt(n) is far too time-heavy. The other algorithms outlined, Fermat, Miller-Rabin, and Solovay-Strassen, took longer to execute in the primality-testing benchmarks, but they work perfectly for finding the next prime. While the Fermat primality test is the quickest, you do have the chance of false results on pseudoprimes, so it is better to go with either the Miller-Rabin or the Solovay-Strassen algorithm. For speed and efficiency, Miller-Rabin performs roughly four times faster than Solovay-Strassen.