# Overview

Yesterday, June 25th 2013, I wrote a pro tip on working with primes in Python, especially large primes. I also started a thread on Stack Overflow to help clear the air on the methods for working with large primes in Python.

Thanks to a comment left on the Stack Overflow question, I have more results to share with you, along with some minor math operations that can speed things up for you.

# Get the Code

The code used in this pro tip can be found on *gist.github.com* at primes.py

The code listed in the previous pro tip can be found at fermat.py, miller_rabin.py and solovay-strassen.py

# Benchmark Methods

The test system: Single-Core AMD Athlon 64 LE-1640, 2.6GHz with 2.00GB DDR RAM running Windows 7.

I am using a mixture of two benchmarking methods, one is a homebrew timestamp decorator:

```
from functools import wraps
from time import time

def benchmark(iterations=10000):
    def wrapper(function):
        @wraps(function)
        def func(*args, **kwargs):
            t1 = time()
            for i in range(iterations):
                call = function(*args, **kwargs)
            t2 = time()
            t3 = int(1 / ((t2 - t1) / iterations))
            print func.func_name, 'at', iterations, 'iterations:', t2 - t1
            print 'Can perform', t3, 'calls per second.'
            return call
        return func
    return wrapper
```
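For a second opinion on these numbers, the standard library's `timeit` module (not used in the original post) can produce comparable per-call figures; a minimal sketch, where the statement string is an assumed stand-in for the functions benchmarked below:

```
import timeit

def calls_per_second(stmt, setup='pass', iterations=10000):
    # Total wall time for `iterations` runs of the statement,
    # converted to a call rate, mirroring what the decorator prints.
    total = timeit.timeit(stmt, setup=setup, number=iterations)
    return int(iterations / total)

rate = calls_per_second('10**100 & 1')
```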

The other is Python's `cProfile`:

```
import cProfile
cProfile.run('test_method()')
```

So the full test is:

```
@benchmark()
def test_method():
    pass

cProfile.run('test_method()')
```

# Minor Math Operations

## Divisible by Two

To find whether an integer is divisible by two, we can use the modulo operation, `n % 2`. If we get `0`, we know that `n` is divisible by 2. Can we make this faster?

```
@benchmark()
def test_mod(n):
    return n % 2

cProfile.run('test_mod(10**100)')
# test_mod at 10000 iterations: 0.0360000133514
# Can perform 277777 calls per second.
```

This isn't bad: we can run `n % 2` 277,777 times per second. What about `n & 1`, using the binary `AND` operator?

```
@benchmark()
def test_and(n):
    return n & 1

cProfile.run('test_and(10**100)')
# test_and at 10000 iterations: 0.0299999713898
# Can perform 333333 calls per second.
```

So, as we can see, `n & 1` outperforms `n % 2` by about `0.006` seconds, allowing us to run this `333333` times per second, which is `55556` more operations per second.
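The swap is safe because, for the non-negative integers used here, the lowest bit of `n` is exactly `n % 2`; a quick check of my own, not part of the original benchmarks:

```
# For non-negative n, the lowest bit equals the remainder mod 2,
# so `n & 1` and `n % 2` are interchangeable parity tests.
def parity_idioms_agree(n):
    return (n % 2) == (n & 1)

agree_small = all(parity_idioms_agree(n) for n in range(10**4))
agree_big = parity_idioms_agree(10**100) and parity_idioms_agree(10**100 + 1)
```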

## Divide by 2 or Binary Right Shift?

Divide number by 2:

```
@benchmark()
def test_div(n):
    return n / 2

cProfile.run('test_div(10**100)')
# test_div at 10000 iterations: 0.0370001792908
# Can perform 270268 calls per second.
```

Binary right shift, `n >> 1`:

```
@benchmark()
def test_brshift(n):
    return n >> 1

cProfile.run('test_brshift(10**100)')
# test_brshift at 10000 iterations: 0.029000043869
# Can perform 344827 calls per second.
```

So, by using the binary right shift operator, we can *divide by two* `74559` more times per second.
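The shift is a drop-in replacement because, for non-negative integers, `n >> 1` is exactly floor division by two (note that under Python 3, `n / 2` would return a float, so `n // 2` is the honest comparison); a quick check of my own:

```
# Dropping the lowest bit with >> 1 is the same as floor-dividing by 2.
def halves_agree(n):
    return (n >> 1) == (n // 2)

agree_small = all(halves_agree(n) for n in range(10**4))
agree_big = halves_agree(10**100 - 1)
```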

## Random Number Generation: x..y

We see two main methods for this, `random.randint` and `random.randrange`. Which one is better?

```
from random import randint

@benchmark()
def test_randint(x, y):
    return randint(x, y)

cProfile.run('test_randint(10**100, 10**101)')
# test_randint at 10000 iterations: 0.270999908447
# Can perform 36900 calls per second.
```

What about `randrange`?

```
from random import randrange

@benchmark()
def test_randrange(x, y):
    return randrange(x, y)

cProfile.run('test_randrange(10**100, 10**101)')
# test_randrange at 10000 iterations: 0.233000040054
# Can perform 42918 calls per second.
```

So now we know it is faster to use `random.randrange` over `random.randint` to generate a random number: `6018` more operations per second.
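The speed gap is unsurprising once you know that in CPython `randint(a, b)` is a thin wrapper that simply calls `randrange(a, b + 1)`, so calling `randrange` directly skips one function call. The only behavioural difference is the endpoint: `randint` includes `b`, while `randrange` stops at `b - 1`. A small sketch:

```
import random

# randrange(1, 2) can only ever produce 1: the stop value is excluded.
only_one = all(random.randrange(1, 2) == 1 for _ in range(100))

# randint(1, 1) also produces 1: both endpoints are included.
inclusive = all(random.randint(1, 1) == 1 for _ in range(100))
```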

## Using `range` or `xrange`

Iterating with `range`:

```
@benchmark()
def test_range(n):
    for i in range(n):
        continue

cProfile.run('test_range(10**4)')
# test_range at 10000 iterations: 8.27799987793
# Can perform 1208 calls per second.
# from cProfile `20006 function calls in 8.106 seconds`
```

Iterating with `xrange`:

```
@benchmark()
def test_xrange(n):
    for i in xrange(n):
        continue

cProfile.run('test_xrange(10**4)')
# test_xrange at 10000 iterations: 4.91900014877
# Can perform 2032 calls per second.
# from cProfile `10006 function calls in 5.719 seconds`
```

`xrange` is significantly faster than `range`, with fewer function calls. Now we know that `xrange` is the way to go.

And for simplicity, I am going to use `not n & 1` in the following tests over `n & 1 == 0`. The difference is minimal: you can run `not n & 1` about 2 more times per second than the latter.

# Primality Tests

As determined in Prime Numbers in Python, we know that out of all the methods discussed, these two are fastest:

```
from math import sqrt

def mrange(start, stop, step):
    while start < stop:
        yield start
        start += step

def is_prime(n):
    if n == 2:
        return True
    if not n & 1:
        return False
    return all(n % i for i in mrange(3, int(sqrt(n)) + 1, 2))
```
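As a sanity check of the trial-division approach, here is the same idea restated so it runs standalone (the restatement is mine; Python 3's lazy `range` plays the role of `mrange`):

```
from math import sqrt

def is_prime_td(n):
    # Handle 2 explicitly, reject n < 2 and other evens,
    # then trial-divide by odd candidates up to sqrt(n).
    if n == 2:
        return True
    if n < 2 or not n & 1:
        return False
    return all(n % i for i in range(3, int(sqrt(n)) + 1, 2))

primes_under_50 = [n for n in range(50) if is_prime_td(n)]
```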

Although this outperformed all other methods in determining if a number `n` was prime, it does not work for determining the next highest prime. So, we went with the Miller-Rabin primality test.

```
from random import randrange

def miller_rabin(n, k=10):
    if n == 2:
        return True
    if not n & 1:
        return False

    def check(a, s, d, n):
        x = pow(a, d, n)
        if x == 1:
            return True
        for i in xrange(s - 1):
            if x == n - 1:
                return True
            x = pow(x, 2, n)
        return x == n - 1

    s = 0
    d = n - 1
    while d % 2 == 0:
        d >>= 1
        s += 1
    for i in xrange(k):
        a = randrange(2, n - 1)
        if not check(a, s, d, n):
            return False
    return True
```
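Since the test is probabilistic, it is worth confirming it agrees with known answers; below is a standalone restatement of mine (Python 3 syntax, same algorithm). Note that for a genuinely prime `n` the result is deterministic, since no witness exists:

```
from random import randrange

def miller_rabin(n, k=10):
    if n == 2:
        return True
    if n < 5 or not n & 1:
        return n == 3
    # write n - 1 as d * 2**s with d odd
    s, d = 0, n - 1
    while not d & 1:
        d >>= 1
        s += 1
    for _ in range(k):
        a = randrange(2, n - 1)
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a witnesses that n is composite
    return True

known_primes = [2, 3, 5, 7, 97, 101, 7919]
```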

With this comment left on the thread on Stack Overflow, we will try a few things: can we reduce the time taken on a Miller-Rabin test? We will also look at other primality tests not mentioned in the first pro tip.

Testing the `Miller-Rabin` with the code supplied in the comment:

```
def isStrongPseudoprime(n, a):
    d, s = n-1, 0
    while d%2 == 0:
        d, s = d/2, s+1
    t = pow(a, d, n)
    if t == 1:
        return True
    while s > 0:
        if t == n-1:
            return True
        t, s = (t*t) % n, s-1
    return False

from random import randint

@benchmark()
def test_miller_rabin_new(n, limit=10):
    for i in range(limit):
        a = randint(2, n-1)
        if not isStrongPseudoprime(n, a):
            return False
    return True

cProfile.run('test_miller_rabin_new(10**100)')
# test_miller_rabin_new at 10000 iterations: 11.0569999218
# Can perform 904 calls per second.
```

This is incredibly slow. What about the `Miller-Rabin` mentioned in yesterday's pro tip? For space, this is checking the same code listed above, `miller_rabin(n, k=10)`:

```
# test_miller_rabin_old at 10000 iterations: 0.0349998474121
# Can perform 285715 calls per second.
```

Using the optimization methods mentioned so far, the only real optimization we can do is replace the `while d % 2 == 0:` with `while not d & 1:` and start with `s = -1` instead of `s = 0` while factoring powers of 2. Using `s = -1` here eliminates the `s - 1` used within `check(a, s, d, n)`.

```
def miller_rabin(n, k=10):
    if n == 2:
        return True
    if not n & 1:
        return False

    def check(a, s, d, n):
        x = pow(a, d, n)
        if x == 1:
            return True
        for i in xrange(s):
            if x == n - 1:
                return True
            x = pow(x, 2, n)
        return x == n - 1

    s = -1
    d = n - 1
    while not d & 1:
        d >>= 1
        s += 1
    for i in xrange(k):
        a = randrange(2, n - 1)
        if not check(a, s, d, n):
            return False
    return True
```
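To convince yourself the `s = -1` change is behaviour-preserving, note that it only shifts the loop counter down by one, which is exactly what removing the `s - 1` in `check()` requires; a small verification of mine:

```
def count_twos_old(n):
    # original: s ends at the exact power of two dividing n - 1
    s, d = 0, n - 1
    while d % 2 == 0:
        d >>= 1
        s += 1
    return s

def count_twos_new(n):
    # tweaked: starts at -1, so it ends exactly one lower
    s, d = -1, n - 1
    while not d & 1:
        d >>= 1
        s += 1
    return s

offset_ok = all(count_twos_new(n) == count_twos_old(n) - 1
                for n in range(3, 10**4, 2))
```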

It gives some performance upgrades; both runs used `cProfile.run('test_miller_rabin(10**100)')`:

```
test_miller_rabin_old at 10000 iterations: 0.0640001296997
Can perform 156249 calls per second.
test_miller_rabin_new at 10000 iterations: 0.0500001907349
Can perform 199999 calls per second.
```

Doing `43750` more operations per second.

To simplify tests (and the entire point of good code is reusable code), I have moved `check()` out of `miller_rabin()` and have also added `factor()`. Benchmarking the changes, I got the same timings: `miller_rabin_1 at 10000 iterations: 0.0439999103546` and `miller_rabin_2 at 10000 iterations: 0.0439999103546`.

```
def factor(n, p=2):
    s = 0
    d = n - 1
    q = p
    while not d & q - 1:
        s += 1
        q *= p
    return s, d // (q // p)

def strong_pseudoprime(n, a, s=None, d=None):
    if (s is None) or (d is None):
        s, d = factor(n, 2)
    x = pow(a, d, n)
    if x == 1:
        return True
    for i in xrange(s):
        if x == n - 1:
            return True
        x = pow(x, 2, n)
    return x == n - 1

def miller_rabin(n, k=10):
    if n == 2:
        return True
    if not n & 1:
        return False
    s, d = factor(n)
    for i in xrange(k):
        a = randrange(2, n - 1)
        if not strong_pseudoprime(n, a, s, d):
            return False
    return True
```
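`factor()` is doing one specific job: writing `n - 1` as `d * 2**s` with `d` odd. That invariant makes a good unit test; the check below is mine, with the function itself copied as-is (it runs unchanged under Python 3):

```
def factor(n, p=2):
    # for p == 2, `d & (q - 1) == 0` tests whether q divides d
    s = 0
    d = n - 1
    q = p
    while not d & q - 1:
        s += 1
        q *= p
    return s, d // (q // p)

def factor_invariant(n):
    s, d = factor(n)
    return (d & 1 == 1) and (d * 2 ** s == n - 1)

invariant_ok = all(factor_invariant(n) for n in range(3, 10**4, 2))
```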

# Other Primality Algorithms

## Lucas Pseudoprime Tests

For these tests we use the following two methods:

```
def jacobi(a, p):
    if (not p & 1) or (p < 0):
        raise ValueError('p must be a positive odd number.')
    if (a == 0) or (a == 1):
        return a
    a = a % p
    t = 1
    while a != 0:
        while not a & 1:
            a >>= 1
            if p & 7 in (3, 5):
                t = -t
        a, p = p, a
        if (a & 3 == 3) and (p & 3) == 3:
            t = -t
        a = a % p
    if p == 1:
        return t
    return 0
```
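When `p` is an odd prime the Jacobi symbol reduces to the Legendre symbol, so Euler's criterion `a**((p-1)//2) ≡ (a|p) (mod p)` gives an independent cross-check of the implementation; the check below is mine, with `jacobi` copied unchanged:

```
def jacobi(a, p):
    if (not p & 1) or (p < 0):
        raise ValueError('p must be a positive odd number.')
    if (a == 0) or (a == 1):
        return a
    a = a % p
    t = 1
    while a != 0:
        while not a & 1:
            a >>= 1
            if p & 7 in (3, 5):
                t = -t
        a, p = p, a
        if (a & 3 == 3) and (p & 3) == 3:
            t = -t
        a = a % p
    if p == 1:
        return t
    return 0

def matches_euler(a, p):
    # Euler's criterion: a**((p-1)//2) mod p is 1, p-1 (i.e. -1), or 0
    t = pow(a, (p - 1) // 2, p)
    return jacobi(a, p) == (-1 if t == p - 1 else t)

odd_primes = [3, 5, 7, 11, 13, 17, 19, 23, 29]
euler_ok = all(matches_euler(a, p) for p in odd_primes for a in range(p))
```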

And

```
from fractions import gcd

def selfridge(n):
    d = 5
    s = 1
    ds = d * s
    while True:
        if gcd(ds, n) > 1:
            return ds, 0, 0
        if jacobi(ds, n) == -1:
            return ds, 1, (1 - ds) / 4
        d += 2
        s *= -1
        ds = d * s
```

The first is a standard Lucas pseudoprime test. Taking the code given in this comment, I have benchmarked and rewritten this.

```
def chain(n, u1, v1, u2, v2, d, q, m):
    k = q
    while m > 0:
        u2 = (u2 * v2) % n
        v2 = (v2 * v2 - 2 * q) % n
        q = (q * q) % n
        if m & 1 == 1:
            t1, t2 = u2 * v1, u1 * v2
            t3, t4 = v2 * v1, u2 * u1 * d
            u1, v1 = t1 + t2, t3 + t4
            if u1 & 1 == 1:
                u1 = u1 + n
            if v1 & 1 == 1:
                v1 = v1 + n
            u1, v1 = (u1 / 2) % n, (v1 / 2) % n
            k = (q * k) % n
        m = m >> 1
    return u1, v1, k
```

## Standard Lucas Pseudoprime Test

```
def lucas_pseudoprime(n):
    d, p, q = selfridge(n)
    if p == 0:
        return n == d
    u, v, k = chain(n, 0, 2, 1, p, d, q, (n + 1) / 2)
    return u == 0

cProfile.run('lucas_pseudoprime(10**100)')
# lucas_pseudoprime at 10000 iterations: 0.0999999046326
# Can perform 100000 calls per second.
```

The benchmark on the code supplied in the comment was

```
isStandardLucasPseudoprime at 10000 iterations: 0.119999885559
Can perform 83333 calls per second.
```

## Strong Lucas Pseudoprime Test

```
def strong_lucas_pseudoprime(n):
    d, p, q = selfridge(n)
    if p == 0:
        return n == d
    s, t = factor(n)
    u, v, k = chain(n, 1, p, 1, p, d, q, t >> 1)
    if (u == 0) or (v == 0):
        return True
    for i in xrange(1, s):
        v = (v * v - 2 * k) % n
        k = (k * k) % n
        if v == 0:
            return True
    return False

cProfile.run('strong_lucas_pseudoprime(10**100)')
# strong_lucas_pseudoprime at 10000 iterations: 0.0969998836517
# Can perform 103092 calls per second.
```

The benchmark for the supplied code:

```
isStrongLucasPseudoprime at 10000 iterations: 0.129999876022
Can perform 76923 calls per second.
```

## Baillie–PSW Primality Test

```
def isqrt(n):
    if n < 0:
        raise ValueError('Square root is not defined for negative numbers.')
    x = int(n)
    if x == 0:
        return 0
    a, b = divmod(x.bit_length(), 2)
    n = 2 ** (a + b)
    while True:
        y = (n + x // n) >> 1
        if y >= n:
            return n
        n = y

def is_square(n):
    s = isqrt(n)
    return s * s == n

def baillie_psw(n, limit=100):
    if n == 2:
        return True
    if n < 2 or not n & 1 or is_square(n):
        return False
    for i in xrange(3, limit + 1, 2):
        if n % i == 0:
            return n == i
    return strong_pseudoprime(n, 2) and strong_lucas_pseudoprime(n)

cProfile.run('baillie_psw(10**100)')
# baillie_psw at 10000 iterations: 0.113000154495
# Can perform 88495 calls per second.
```
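The Newton-iteration `isqrt` deserves its own check: for every non-negative `n` it should return the unique integer `r` with `r*r <= n < (r+1)*(r+1)`. A standalone verification of mine (the function runs unchanged under Python 3):

```
def isqrt(n):
    # integer square root, floor(sqrt(n)), by Newton's method
    if n < 0:
        raise ValueError('Square root is not defined for negative numbers.')
    x = int(n)
    if x == 0:
        return 0
    a, b = divmod(x.bit_length(), 2)
    n = 2 ** (a + b)
    while True:
        y = (n + x // n) >> 1
        if y >= n:
            return n
        n = y

def floor_root_ok(n):
    r = isqrt(n)
    return r * r <= n < (r + 1) * (r + 1)

roots_ok = all(floor_root_ok(n) for n in range(10**4))
```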

The benchmark for the supplied code:

```
isBaillieWagstaffPrime at 10000 iterations: 2.29200005531
Can perform 4363 calls per second.
```

From these benchmarks, we can see that so far the `Miller-Rabin` test is still fastest, `test_miller_rabin_new at 10000 iterations: 0.0500001907349`, over the `Baillie-PSW` at `baillie_psw at 10000 iterations: 0.113000154495`. So what about determining the next highest prime?

# Determining Next Highest Prime

Changing yesterday's article to use wheel factorization.

**These benchmarks are iterated over 1,000 times for lack of resources.**

Also, due to system limitations, the number we'll be using is `100**10-1`, not the `10**100` that yesterday's pro tip used.

First we will benchmark the `Miller-Rabin` test.

```
def next_prime_miller_rabin(n):
    if n < 2:
        return 2
    if n < 5:
        return [3, 5, 5][n - 2]
    gap = [1, 6, 5, 4, 3, 2, 1, 4, 3, 2, 1, 2, 1, 4, 3, 2, 1, 2, 1, 4, 3, 2,
           1, 6, 5, 4, 3, 2, 1, 2]
    n = n + 1 if not n & 1 else n + 2
    while not miller_rabin(n):
        n += gap[n % 30]
    return n

cProfile.run('next_prime_miller_rabin(100**10-1)')
# returns `100000000000000000039`
# next_prime_miller_rabin at 1000 iterations: 1.79100012779
# Can perform 558 calls per second.
# 199885 function calls in 2.396 seconds
```
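The hard-coded `gap` table is the mod-30 wheel: entry `r` holds the distance from a number `n` with `n % 30 == r` to the next integer not divisible by 2, 3, or 5, so composite candidates with small factors are skipped outright. Rather than trusting the transcription, the table can be derived (the derivation is mine; the table itself is from the code above):

```
from math import gcd

def wheel_gaps(m=30):
    # gap[r] = smallest positive step making r + step coprime to m
    gaps = []
    for r in range(m):
        step = 1
        while gcd(r + step, m) != 1:
            step += 1
        gaps.append(step)
    return gaps

gap = wheel_gaps()
```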

This is pretty good, compared to the benchmark for the previous pro tip:

```
1000 calls, 337 per second.
258669 function calls in 2.971 seconds
```

So, we have reduced the time over 1,000 iterations from `2.971` to `1.791`, a `1.18` second drop. This allows for `221` more calls per second. We have also significantly reduced the function calls, from `258669` down to `199885`.

Now, the `Baillie-PSW` test:

```
def next_prime_baillie_psw(n):
    if n < 2:
        return 2
    if n < 5:
        return [3, 5, 5][n - 2]
    gap = [1, 6, 5, 4, 3, 2, 1, 4, 3, 2, 1, 2, 1, 4, 3, 2, 1, 2, 1, 4, 3, 2,
           1, 6, 5, 4, 3, 2, 1, 2]
    n = n + 1 if not n & 1 else n + 2
    while not baillie_psw(n):
        n += gap[n % 30]
    return n

cProfile.run('next_prime_baillie_psw(100**10-1)')
# returns `100000000000000000039`
# next_prime_baillie_psw at 1000 iterations: 0.93499994278
# Can perform 1069 calls per second.
# 69006 function calls in 1.527 seconds
```

Wow. Not only have we dropped this from `1.791` seconds to `0.935`, we have also dropped the function calls from `199885` to `69006`.

# Conclusion

With optimizing some code choices (`not n & 1` over `n % 2 == 0`, `n >> 1` over `n / 2`, `random.randrange` over `random.randint`, `xrange` over `range`), we have been able to speed things up a bit. Going further, we have discussed more methods for determining if a number is prime and also determining the next highest prime.

To recap, we have determined that the Baillie-PSW primality test is much faster than the Miller-Rabin. This is due to the steps that it takes:

- A simple check to see if `n` is square.
- Trial division to `limit`; no need to perform other tests if it passes this. *1*
- Running a `strong pseudoprime test` for `n, base 2`.
- Running a `strong pseudoprime test` for `n, base 3`.
- Finally, running a `strong Lucas pseudoprime test` on `n`.

**1**: We discovered with yesterday's pro tip that using `all(n % i for i in mrange(3, int(sqrt(n)) + 1, 2))` was the fastest for just checking if `n` was prime. The benchmark was `10000 calls, 53191 per second. 60006 function calls in 0.190 seconds` on testing `100**10-1`. With the tests mentioned above, the `baillie_psw` method outperforms this: the benchmark for `baillie_psw(100**10-1)` comes out to `baillie_psw at 10000 iterations: 0.111000061035. Can perform 90090 calls per second.`, and we get `30006 function calls in 0.746 seconds` from `cProfile.run('baillie_psw(100**10-1)')`.
