Last Updated: December 29, 2016
·
9.474K
· bnlucas

# Overview

If you find yourself searching for information on working with prime numbers in Python, you will find many different answers and methods, you may find that some might not work for your needs, especially while working with larger primes and `find next prime number` cases.

This was the case for a `base62` hashing library I was working on, BaseHash on github.com which uses golden primes, `62^n * 1.618033988749894848`, and multiplicative inverses to create obfuscated reversible `base62` hash identifiers. I wanted a way of extending the `base62` to other bases easily, which the library supports `base36`, `base52`, `base56`, `base58`, `base62`, and `base94`, so I needed an algorithm for finding the next prime using `base^length` that was quick and efficient instead of hard-coding the primes. This allows for hash lengths far greater than anyone would want to hard-code for. For example, with `base62`, you can produce a hash up to 171 digits before Python gives you an error.

I have written a few different methods for this and have benchmarked them on an old system, Single-Core AMD Athlon 64 LE-1640, 2.6GHz with 2.00GB DDR RAM running Windows 7. I would like to share with you all my findings.

Note: Although this was all done for Python, this can easily be ported to other languages.

# Primality Testing Algorithms

All benchmarks are 10,000 iterations.

Using `not any`

``````from math import sqrt
def is_prime_1(n):
if n == 2:
return True
if (n < 2) or (n % 2 == 0):
return False
return not any(n % i == 0 for i in xrange(3, int(sqrt(n)) + 1, 2))

# benchmark  for is_prime_1(100**4-1)
10000 calls, 60975 per second.
50006 function calls in 0.167 seconds``````

Using `all`

``````def is_prime_2(n):
if n == 2:
return True
if (n < 2) or (n % 2 == 0):
return False
return all(n % i for i in xrange(3, int(sqrt(n)) + 1, 2))

# benchmark for is_prime_2(100**4-1)
10000 calls, 74626 per second.
40006 function calls in 0.138 seconds``````

So, we can see that using `all` over `not any` allows for `~13651` more calls per second. But, what happens when we start using larger numbers? For instance, `100**10-1` or `99999999999999999999`? You will run into an error, `OverflowError: Python int too large to convert to C long`. To get past this, we can use a modified `itertools.count` generator, modified to take `steps` so we can pass on all `even` numbers.

``````def mrange(start, stop, step=1):
i = start
while i < stop:
yield i
i += step``````

`is_prime_1`:

``````def is_prime_1(n):
...
return not any(n % i == 0 for i in mrange(3, int(sqrt(n)) + 1, 2))

# benchmark for is_prime_1(100**10-1) using mrange over xrange
10000 calls, 37037 per second.
70006 function calls in 0.272 seconds``````

`is_prime_2`:

``````def is_prime_2(n):
...
return all(n % i for i in mrange(3, int(sqrt(n)) + 1, 2))

# benchmark for is_prime_2(100**10-1) using mrange over xrange
10000 calls, 53191 per second.
60006 function calls in 0.190 seconds``````

Okay! So we got the `OverflowError` taken care of, and we still know that `all` is faster than `not any`.

So, what about the other methods that we see being said to use, such as the Solovay-Strassen, the Miller-Rabin, and the Fermat primality tests?

Since the `Solovay-Strassen` and `Millter-Rabin` are fairly large, I have the code up on `gist.github.com` for these methods.

fermat.py: on gist.github.com

``````# benchmark fermat(100**10-1)
10000 calls, 21141 per second.
20006 function calls in 0.481 seconds``````

With `Fermat`, we see that it is indeed slower, twice as long as what `is_prime_1` and we know `is_prime_2` is even faster. However, we're going from `60006 - 70006` calls down to only `20006`. The overall issue with the `Fermat` primality test is that it can give false results on pseudoprimes.

miller_rabin.py: on gist.github.com

``````# benchmark miller_rabin(100**10-1)
10000 calls, 11111 per second.
74800 function calls in 0.902 seconds``````

The `Miller-Rabin` primality test is significantly slower than any of the methods mentioned thus far, but we have `k` accuracy of eliminating false results on pseudoprimes.

solovay-strassen.py: on gist.github.com

``````# benchmark solovay_strassen(100**10-1)
10000 calls, 2440 per second.
571496 function calls (74873 primitive calls) in 4.100 seconds``````

We can just say `ouch` on using the `Solovay-Strassen` algorithm.

# Finding the Next Prime

Now to benchmark `find next prime` methods using the primality tests mentioned above. We will continue to use `100**10-1` or `99999999999999999999`, which the next prime is `100000000000000000039`

Due to system resources, we will drop these benchmarks down to 1,000 iterations.

These tests all use the following method, changing the `tester` parameter with the primality tests mentioned above. Now, we know that the `is_prime_1` and `is_prime_2` primality tests were faster than the other algorithms tested, so they should be faster here too, right?

``````def next_prime(n, tester):
if tester(n):
n += 1
if (n % 2 == 0) and (n != 2):
n += 1
while True:
if tester(n):
break
n += 2
return n``````

`print next_prime(100**10-1, is_prime_1)` - I cannot offer any benchmark data on using this method. I allowed the script to run for over five minutes, got nothing.

`print next_prime(100**10-1, is_prime_1)` - Same as `next_prime(100**10-1, is_prime_1)`, I cannot offer any benchmark data on using this method.

`print next_prime(100**10-1, fermat)`

``````# returns `100000000000000000039`

1000 calls, 1131 per second.
45006 function calls in 0.885 seconds``````

`print next_prime(100**10-1, miller_rabin)`

``````# returns `100000000000000000039`

1000 calls, 337 per second.
258669 function calls in 2.971 seconds``````

`print next_prime(100**10-1, solovay_strassen)`

``````# returns `100000000000000000039`

1000 calls, 89 per second.
1775492 function calls (223447 primitive calls) in 11.164 seconds``````

# Conclusion

When working with small numbers, it is better and faster to use methods like `is_prime_1` and `is_prime_2`, even with finding the next prime using something like `next_prime`. However, when working with larger numbers, these simply do not work. They are memory and time heavy.

The other algorithms outlined, `Fermat`, `Miller-Rabin`, and `Solovay-Strassen` took longer to execute in the primality testing benchmarks, however they work perfect for finding the next prime.

Although the `Fermat` primality test is the quickest, you do have the chance of false results on pseudoprimes. It is better to go with either the `Miller-Rabin` or the `Solovay-Strassen` algorithms. For speed and efficiency, the `Miller-Rabin` preforms roughly four times faster than the `Solovay-Strassen`.

2.834K
3

### Flatten a list of lists in one line in Python

317.2K
7

277.2K
23

#### 1 Response

I notice a bug when checking for primality of 28430288029929701389 when using "isprime2". The script works for values much larger than this, but it gets hung when checking this value for primality.

over 1 year ago ·

#### Have a fresh tip? Share with Coderwall community!

Best #Python Authors
terrasea
317.2K
cheglastgnat
277.2K
saji89
249.1K
lotia
191.7K
Awesome Job