Last Updated: February 25, 2016 · bnlucas

# Overview

Yesterday, June 25th 2013, I wrote a pro tip on working with primes in Python, especially large primes. I also started a thread on Stack Overflow to help clear the air on the methods for working with large primes in Python.

Thanks to a comment left on the Stack Overflow question, I have more results to share with you, along with some minor math operations that can speed things up for you.

# Get the Code

The code used in this pro tip can be found on gist.github.com at primes.py

The code listed in the previous pro tip can be found at fermat.py, miller_rabin.py and solovay-strassen.py

# Benchmark Methods

The test system: Single-Core AMD Athlon 64 LE-1640, 2.6GHz with 2.00GB DDR RAM running Windows 7.

I am using a mixture of two benchmarking methods. One is a homebrew timing decorator:

``````from functools import wraps
from time import time

def benchmark(iterations=10000):
    def wrapper(function):
        @wraps(function)
        def func(*args, **kwargs):
            t1 = time()
            for i in range(iterations):
                call = function(*args, **kwargs)
            t2 = time()
            t3 = int(1 / ((t2 - t1) / iterations))
            print func.func_name, 'at', iterations, 'iterations:', t2 - t1
            print 'Can perform', t3, 'calls per second.'
            return call
        return func
    return wrapper``````

The other is Python's built-in `cProfile`:

``````import cProfile
cProfile.run('test_method()')``````
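As a cross-check that is not part of the original write-up, the standard library's `timeit` module measures the same kind of micro-benchmark, and it disables garbage collection during the timed run, which the homebrew decorator above does not. A minimal sketch:

```python
import timeit

# Time 10000 evaluations of a parity check, mirroring the
# homebrew benchmark() decorator above. timeit turns off
# garbage collection while timing.
elapsed = timeit.timeit('n % 2', setup='n = 10**100', number=10000)
calls_per_second = int(10000 / elapsed)
```

The numbers will not match the article's exactly (different hardware, different overhead), but the relative ordering of the operations should hold.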

So the full test is:

``````@benchmark()
def test_method():
    pass

cProfile.run('test_method()')``````

# Minor Math Operations

## Divisible by Two

To find whether an integer is divisible by two, we can use the modulo operation `n % 2`. If we get `0`, we know that `n` is divisible by 2. Can we make this faster?

``````@benchmark()
def test_mod(n):
    return n % 2

cProfile.run('test_mod(10**100)')

# test_mod at 10000 iterations: 0.0360000133514
# Can perform 277777 calls per second.``````

This isn't bad: we can run `n % 2` `277777` times per second. What about `n & 1`, using the binary `AND` operator?

``````@benchmark()
def test_and(n):
    return n & 1

cProfile.run('test_and(10**100)')

# test_and at 10000 iterations: 0.0299999713898
# Can perform 333333 calls per second.``````

So, as we can see, `n & 1` outperforms `n % 2` by about `0.006` seconds over 10,000 iterations, allowing us to run this `333333` times per second, which is `55556` more operations per second.
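The swap is safe because the two expressions agree on every Python integer, negative or not (`-3 & 1 == 1` just as `-3 % 2 == 1`). A quick sanity check, not from the original article:

```python
# n & 1 inspects the lowest bit; n % 2 computes the remainder.
# For Python ints (arbitrary precision, two's-complement semantics
# for bitwise operators) they agree on every integer.
def same_parity_answer(n):
    return (n & 1) == (n % 2)

assert all(same_parity_answer(n) for n in range(-1000, 1000))
assert same_parity_answer(10**100 + 7)
```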

## Divide by 2 or Binary Right Shift?

Divide number by 2:

``````@benchmark()
def test_div(n):
    return n / 2

cProfile.run('test_div(10**100)')

# test_div at 10000 iterations: 0.0370001792908
# Can perform 270268 calls per second.``````

Binary Right Shift `n >> 1`

``````@benchmark()
def test_brshift(n):
    return n >> 1

cProfile.run('test_brshift(10**100)')

# test_brshift at 10000 iterations: 0.029000043869
# Can perform 344827 calls per second.``````

So, by using the binary right shift operator, we can divide by two `74559` more times per second.
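One thing worth flagging (my note, not the original article's): `n >> 1` computes the same value as floor division by two for every integer, including negatives (`-5 >> 1 == -3 == -5 // 2`). In Python 2, `n / 2` on ints is already floor division; in Python 3 it returns a float, so `n // 2` is the portable spelling. A quick check:

```python
# A right shift by one drops the lowest bit, which is exactly
# floor division by 2 (rounds toward negative infinity for
# negatives, matching the // operator).
assert all((n >> 1) == (n // 2) for n in range(-10**4, 10**4))
assert (10**100) >> 1 == (10**100) // 2
```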

## Random Number Generation: x..y

We see two main methods for this, `random.randint` and `random.randrange`, which one is better?

``````from random import randint

@benchmark()
def test_randint(x, y):
    return randint(x, y)

cProfile.run('test_randint(10**100, 10**101)')

# test_randint at 10000 iterations: 0.270999908447
# Can perform 36900 calls per second.``````

What about `randrange`?

``````from random import randrange

@benchmark()
def test_randrange(x, y):
    return randrange(x, y)

cProfile.run('test_randrange(10**100, 10**101)')

# test_randrange at 10000 iterations: 0.233000040054
# Can perform 42918 calls per second.``````

So now we know it is faster to use `random.randrange` than `random.randint` to generate a random number: `6018` more operations per second.
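One semantic difference worth flagging (my note, not the original's): `randint(a, b)` draws from the closed interval `[a, b]`, while `randrange(a, b)` draws from the half-open interval `[a, b)`, so `randrange(a, b + 1)` is the drop-in replacement for `randint(a, b)`. As written, the two benchmarks above therefore draw from slightly different ranges:

```python
from random import randint, randrange

# randint(1, 2) can return 1 or 2; randrange(1, 3) covers the
# same values because its upper bound is exclusive.
samples = [randrange(1, 3) for _ in range(1000)]
assert set(samples) <= {1, 2}
assert all(1 <= randint(1, 2) <= 2 for _ in range(1000))
```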

## Using `range` or `xrange`:

Iterating with `range`

``````@benchmark()
def test_range(n):
    for i in range(n):
        continue

cProfile.run('test_range(10**4)')

# test_range at 10000 iterations: 8.27799987793
# Can perform 1208 calls per second.

# from cProfile: 20006 function calls in 8.106 seconds``````

Iterating with `xrange`

``````@benchmark()
def test_xrange(n):
    for i in xrange(n):
        continue

cProfile.run('test_xrange(10**4)')

# test_xrange at 10000 iterations: 4.91900014877
# Can perform 2032 calls per second.

# from cProfile: 10006 function calls in 5.719 seconds``````

`xrange` is significantly faster than `range`, with fewer function calls. Now we know that `xrange` is the way to go.

And for simplicity, I am going to use `not n & 1` in the following tests instead of `n & 1 == 0`. The difference is minimal: `not n & 1` runs only about 2 more times per second than the latter.

# Primality Tests

As determined in Prime Numbers in Python, we know that out of all the methods discussed, these two are fastest:

``````from math import sqrt

def mrange(start, stop, step):
    while start < stop:
        yield start
        start += step

def is_prime(n):
    if n == 2:
        return True
    if not n & 1:
        return False
    return all(n % i for i in mrange(3, int(sqrt(n)) + 1, 2))``````

Although this outperformed all other methods for determining whether a number `n` is prime, it does not work for determining the next highest prime. So, we went with the Miller-Rabin primality test.

``````from random import randrange

def miller_rabin(n, k=10):
    if n == 2:
        return True
    if not n & 1:
        return False

    def check(a, s, d, n):
        x = pow(a, d, n)
        if x == 1:
            return True
        for i in xrange(s - 1):
            if x == n - 1:
                return True
            x = pow(x, 2, n)
        return x == n - 1

    s = 0
    d = n - 1

    while d % 2 == 0:
        d >>= 1
        s += 1

    for i in xrange(k):
        a = randrange(2, n - 1)
        if not check(a, s, d, n):
            return False
    return True``````
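As a sanity check (my addition, not the article's), here is a compact standalone version of the same strong-pseudoprime loop, with a guard for `n = 3` added so `randrange(2, n - 1)` never gets an empty range, tried against known primes and composites, including the Carmichael number 561, which defeats a plain Fermat test to base 2 but not Miller-Rabin:

```python
from random import randrange

def miller_rabin_check(n, k=10):
    # Standalone sketch of the Miller-Rabin test above; the
    # n in (2, 3) guard is added so randrange(2, n - 1) is
    # never called with an empty range.
    if n in (2, 3):
        return True
    if n < 2 or not n & 1:
        return False
    # Write n - 1 as d * 2**s with d odd.
    s, d = 0, n - 1
    while not d & 1:
        d >>= 1
        s += 1
    for _ in range(k):
        a = randrange(2, n - 1)
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a witnesses that n is composite
    return True

assert miller_rabin_check(97)
assert miller_rabin_check(2**31 - 1)   # Mersenne prime
assert not miller_rabin_check(561)     # Carmichael number (3 * 11 * 17)
assert not miller_rabin_check(91)      # 7 * 13
```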

With this comment left on the Stack Overflow thread, we will try a few things. Can we reduce the time taken by a Miller-Rabin test? We will also look at other primality tests not mentioned in the first pro tip.

Testing the `Miller-Rabin` with the code supplied in the comment:

``````def isStrongPseudoprime(n, a):
    d, s = n-1, 0
    while d%2 == 0:
        d, s = d/2, s+1
    t = pow(a, d, n)
    if t == 1:
        return True
    while s > 0:
        if t == n-1:
            return True
        t, s = (t*t) % n, s-1
    return False

from random import randint

@benchmark()
def test_miller_rabin_new(n, limit=10):
    for i in range(limit):
        a = randint(2, n-1)
        if not isStrongPseudoprime(n, a):
            return False
    return True

cProfile.run('test_miller_rabin_new(10**100)')

# test_miller_rabin_new at 10000 iterations: 11.0569999218
# Can perform 904 calls per second.``````

This is incredibly slow. What about the Miller-Rabin from yesterday's pro tip? To save space, this benchmarks the same code listed above, `miller_rabin(n, k=10)`:

``````# test_miller_rabin_old at 10000 iterations: 0.0349998474121
# Can perform 285715 calls per second.``````

Using the optimization methods mentioned so far, the only real optimizations we can make are to replace `while d % 2 == 0:` with `while not d & 1:` and to start with `s = -1` instead of `s = 0` while factoring out powers of 2. Starting at `s = -1` eliminates the `s - 1` used within `check(a, s, d, n)`.

``````def miller_rabin(n, k=10):
    if n == 2:
        return True
    if not n & 1:
        return False

    def check(a, s, d, n):
        x = pow(a, d, n)
        if x == 1:
            return True
        for i in xrange(s):
            if x == n - 1:
                return True
            x = pow(x, 2, n)
        return x == n - 1

    s = -1
    d = n - 1
    while not d & 1:
        d >>= 1
        s += 1

    for i in xrange(k):
        a = randrange(2, n - 1)
        if not check(a, s, d, n):
            return False
    return True``````

This gives some performance improvement; both runs used `cProfile.run('test_miller_rabin(10**100)')`:

``````test_miller_rabin_old at 10000 iterations: 0.0640001296997
Can perform 156249 calls per second.

test_miller_rabin_new at 10000 iterations: 0.0500001907349
Can perform 199999 calls per second.``````

That is `43750` more operations per second.

To simplify the tests (and because the entire point of good code is reusability), I have moved `check()` out of `miller_rabin()` and added `factor()`. Benchmarking the changes, both versions came out the same: `miller_rabin_1 at 10000 iterations: 0.0439999103546` and `miller_rabin_2 at 10000 iterations: 0.0439999103546`.

``````def factor(n, p=2):
    s = 0
    d = n - 1
    q = p
    while not d & q - 1:
        s += 1
        q *= p
    return s, d // (q // p)

def strong_pseudoprime(n, a, s=None, d=None):
    if (s is None) or (d is None):
        s, d = factor(n, 2)
    x = pow(a, d, n)
    if x == 1:
        return True
    for i in xrange(s - 1):
        if x == n - 1:
            return True
        x = pow(x, 2, n)
    return x == n - 1

def miller_rabin(n, k=10):
    if n == 2:
        return True
    if not n & 1:
        return False

    s, d = factor(n)

    for i in xrange(k):
        a = randrange(2, n - 1)
        if not strong_pseudoprime(n, a, s, d):
            return False
    return True``````
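To make the `factor` helper concrete: for an odd `n`, it writes `n - 1` as `d * 2**s` with `d` odd. A standalone copy with a worked example (my addition; note the bit-mask trick `d & q - 1` only works when `p == 2`):

```python
def factor(n, p=2):
    # Split n - 1 into (s, d) with n - 1 == d * p**s.
    s = 0
    d = n - 1
    q = p
    while not d & q - 1:    # bit-mask test; valid only for p == 2
        s += 1
        q *= p
    return s, d // (q // p)

s, d = factor(221)          # 220 == 55 * 2**2
assert (s, d) == (2, 55)
assert d * 2**s == 220 and d & 1
```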

# Other Primality Algorithms

## Lucas Pseudoprime Tests

For these tests we use the following two methods:

``````def jacobi(a, p):
    if (not p & 1) or (p < 0):
        raise ValueError('p must be a positive odd number.')
    if (a == 0) or (a == 1):
        return a
    a = a % p
    t = 1
    while a != 0:
        while not a & 1:
            a >>= 1
            if p & 7 in (3, 5):
                t = -t
        a, p = p, a
        if (a & 3 == 3) and (p & 3 == 3):
            t = -t
        a = a % p
    if p == 1:
        return t
    return 0``````
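For an odd prime `p`, the Jacobi symbol reduces to the Legendre symbol, which Euler's criterion lets us compute independently as `a**((p - 1)/2) mod p`. That gives a cheap cross-check of the implementation (my addition; the function is repeated so the snippet runs on its own):

```python
def jacobi(a, p):
    # Jacobi symbol (a/p) for odd positive p, via quadratic reciprocity.
    if (not p & 1) or (p < 0):
        raise ValueError('p must be a positive odd number.')
    if (a == 0) or (a == 1):
        return a
    a = a % p
    t = 1
    while a != 0:
        while not a & 1:
            a >>= 1
            if p & 7 in (3, 5):   # (2/p) = -1 when p % 8 in {3, 5}
                t = -t
        a, p = p, a               # quadratic reciprocity flip
        if (a & 3 == 3) and (p & 3 == 3):
            t = -t
        a = a % p
    if p == 1:
        return t
    return 0

def legendre(a, p):
    # Euler's criterion: a**((p-1)/2) mod p is 0, 1 or p - 1.
    r = pow(a, (p - 1) // 2, p)
    return r - p if r > 1 else r

# Cross-check against several known odd primes.
for p in (3, 7, 11, 13, 101, 1009):
    for a in range(1, 50):
        assert jacobi(a, p) == legendre(a, p)
```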

And

``````from fractions import gcd

def selfridge(n):
    d = 5
    s = 1
    ds = d * s
    while True:
        if gcd(ds, n) > 1:
            return ds, 0, 0
        if jacobi(ds, n) == -1:
            return ds, 1, (1 - ds) / 4
        d += 2
        s *= -1
        ds = d * s``````

The first is a standard Lucas pseudoprime test. Taking the code given in this comment, I have benchmarked and rewritten it.

``````def chain(n, u1, v1, u2, v2, d, q, m):
    k = q
    while m > 0:
        u2 = (u2 * v2) % n
        v2 = (v2 * v2 - 2 * q) % n
        q = (q * q) % n
        if m & 1 == 1:
            t1, t2 = u2 * v1, u1 * v2
            t3, t4 = v2 * v1, u2 * u1 * d
            u1, v1 = t1 + t2, t3 + t4
            if u1 & 1 == 1:
                u1 = u1 + n
            if v1 & 1 == 1:
                v1 = v1 + n
            u1, v1 = (u1 / 2) % n, (v1 / 2) % n
        k = (q * k) % n
        m = m >> 1
    return u1, v1, k``````

## Standard Lucas Pseudoprime Test

``````def lucas_pseudoprime(n):
    d, p, q = selfridge(n)
    if p == 0:
        return n == d
    u, v, k = chain(n, 0, 2, 1, p, d, q, (n + 1) / 2)
    return u == 0

cProfile.run('lucas_pseudoprime(10**100)')

# lucas_pseudoprime at 10000 iterations: 0.0999999046326
# Can perform 100000 calls per second.``````

The benchmark on the code supplied in the comment was

``````isStandardLucasPseudoprime at 10000 iterations: 0.119999885559
Can perform 83333 calls per second.``````

## Strong Lucas Pseudoprime Test

``````def strong_lucas_pseudoprime(n):
    d, p, q = selfridge(n)
    if p == 0:
        return n == d

    s, t = factor(n)

    u, v, k = chain(n, 1, p, 1, p, d, q, t >> 1)

    if (u == 0) or (v == 0):
        return True

    for i in xrange(1, s):
        v = (v * v - 2 * k) % n
        k = (k * k) % n
        if v == 0:
            return True

    return False

cProfile.run('strong_lucas_pseudoprime(10**100)')

# strong_lucas_pseudoprime at 10000 iterations: 0.0969998836517
# Can perform 103092 calls per second.``````

The benchmark for the supplied code:

``````isStrongLucasPseudoprime at 10000 iterations: 0.129999876022
Can perform 76923 calls per second.``````

## Baillie–PSW Primality Test

``````def isqrt(n):
    if n < 0:
        raise ValueError('Square root is not defined for negative numbers.')
    x = int(n)
    if x == 0:
        return 0
    a, b = divmod(x.bit_length(), 2)
    n = 2 ** (a + b)
    while True:
        y = (n + x // n) >> 1
        if y >= n:
            return n
        n = y

def is_square(n):
    s = isqrt(n)
    return s * s == n

def baillie_psw(n, limit=100):
    if not n & 1:
        return False
    if n < 2 or is_square(n):
        return False
    for i in xrange(3, limit + 1, 2):
        if n % i:
            return n == i
    return strong_pseudoprime(n, 2) and strong_lucas_pseudoprime(n)

cProfile.run('baillie_psw(10**100)')

# baillie_psw at 10000 iterations: 0.113000154495
# Can perform 88495 calls per second.``````

The benchmark for the supplied code:

``````isBaillieWagstaffPrime at 10000 iterations: 2.29200005531
Can perform 4363 calls per second.``````
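The Newton-iteration `isqrt` above is easy to validate: the result `s` must satisfy `s*s <= n < (s+1)*(s+1)`. A standalone check (my addition; the function is repeated so the snippet runs on its own, and Python 3.8+ also ships `math.isqrt` for comparison):

```python
def isqrt(n):
    # Integer square root by Newton's method from a power-of-two seed.
    if n < 0:
        raise ValueError('Square root is not defined for negative numbers.')
    x = int(n)
    if x == 0:
        return 0
    a, b = divmod(x.bit_length(), 2)
    n = 2 ** (a + b)        # initial guess, always >= sqrt(x)
    while True:
        y = (n + x // n) >> 1
        if y >= n:
            return n
        n = y

def is_square(n):
    s = isqrt(n)
    return s * s == n

# The defining property of the integer square root.
for k in (0, 1, 2, 3, 99, 10**20, 10**20 + 1):
    s = isqrt(k)
    assert s * s <= k < (s + 1) * (s + 1)
assert is_square(10**100) and not is_square(10**100 + 1)
```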

From these benchmarks, we can see that so far the `Miller-Rabin` test is still fastest, `test_miller_rabin_new at 10000 iterations: 0.0500001907349` over the `Baillie-PSW` at `baillie_psw at 10000 iterations: 0.113000154495`. So what about determining the next highest prime?

# Determining Next Highest Prime

For this section, I have changed yesterday's code to use wheel factorization.

These benchmarks are iterated over 1,000 times for lack of resources.

Also, due to system limitations, the number we'll be using is `100**10-1` rather than `10**100`; `100**10-1` is exactly what yesterday's pro tip used.

First we will benchmark the `Miller-Rabin` test.

``````def next_prime_miller_rabin(n):
    if n < 2:
        return 2
    if n < 5:
        return [3, 5, 5][n - 2]
    gap = [1, 6, 5, 4, 3, 2, 1, 4, 3, 2, 1, 2, 1, 4, 3, 2, 1, 2, 1, 4, 3, 2,
           1, 6, 5, 4, 3, 2, 1, 2]
    n = n + 1 if not n & 1 else n + 2
    while not miller_rabin(n):
        n += gap[n % 30]
    return n

cProfile.run('next_prime_miller_rabin(100**10-1)')

# returns 100000000000000000039

# next_prime_miller_rabin at 1000 iterations: 1.79100012779
# Can perform 558 calls per second.
#          199885 function calls in 2.396 seconds``````
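The `gap` table is a mod-30 wheel: starting from any `n` coprime to 30, `n + gap[n % 30]` jumps to the next integer coprime to 30, skipping multiples of 2, 3 and 5 without ever testing them. A small check of that property (my addition):

```python
# Residues mod 30 that can hold a prime greater than 5.
wheel = [r for r in range(30) if r % 2 and r % 3 and r % 5]

gap = [1, 6, 5, 4, 3, 2, 1, 4, 3, 2, 1, 2, 1, 4, 3, 2, 1, 2, 1, 4,
       3, 2, 1, 6, 5, 4, 3, 2, 1, 2]

for r in wheel:
    step = gap[r]
    # The step from r lands exactly on the next wheel residue...
    assert (r + step) % 30 in wheel
    # ...and no smaller step reaches a wheel residue.
    assert all((r + k) % 30 not in wheel for k in range(1, step))
```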

This is pretty good, compared to the benchmark for the previous pro tip:

``````1000 calls, 337 per second.
258669 function calls in 2.971 seconds``````

So, we have reduced the time over 1,000 iterations from `2.971` to `1.791` seconds, a `1.18` second drop, allowing `221` more calls per second. We have also significantly reduced the function calls, from `258669` down to `199885`.

Now, the `Baillie-PSW` test:

``````def next_prime_baillie_psw(n):
    if n < 2:
        return 2
    if n < 5:
        return [3, 5, 5][n - 2]
    gap = [1, 6, 5, 4, 3, 2, 1, 4, 3, 2, 1, 2, 1, 4, 3, 2, 1, 2, 1, 4, 3, 2,
           1, 6, 5, 4, 3, 2, 1, 2]
    n = n + 1 if not n & 1 else n + 2
    while not baillie_psw(n):
        n += gap[n % 30]
    return n

cProfile.run('next_prime_baillie_psw(100**10-1)')

# returns 100000000000000000039

# next_prime_baillie_psw at 1000 iterations: 0.93499994278
# Can perform 1069 calls per second.
#          69006 function calls in 1.527 seconds``````

Wow. Not only have we dropped this from `1.791` seconds to `0.935`, we have also dropped the function calls from `199885` to `69006`.

# Conclusion

With optimizing some code choices, `not n & 1` over `n % 2 == 0`, using `n >> 1` over `n / 2`, `random.randrange` over `random.randint`, `xrange` over `range`... We have been able to speed things up a bit. Going further, we have discussed more methods for determining if a number is prime and also determining the next highest prime.

To recap, we have determined that the Baillie-PSW primality test is much faster than Miller-Rabin for finding the next highest prime. This is due to the steps that it takes:

1. A simple check to see if `n` is square.
2. Trial division up to `limit`; if `n` is divisible by a small odd number, no further tests are needed. [1]
3. Running a `strong pseudoprime test` for `n, base 2`.
4. Running a `strong pseudoprime test` for `n, base 3`.
5. Finally, running a `strong Lucas pseudoprime test` on `n`.

[1]: We discovered in yesterday's pro tip that `all(n % i for i in mrange(3, int(sqrt(n)) + 1, 2))` was the fastest way to simply check whether `n` is prime. The benchmark was `10000 calls, 53191 per second. 60006 function calls in 0.190 seconds` when testing `100**10-1`. With the tests mentioned above, the `baillie_psw` method outperforms this: the benchmark for `baillie_psw(100**10-1)` comes out to `baillie_psw at 10000 iterations: 0.111000061035. Can perform 90090 calls per second.`, and we get `30006 function calls in 0.746 seconds` from `cProfile.run('baillie_psw(100**10-1)')`.

#### 3 Responses


Thanks for writing these up.

I may be confused, but isn't there a mistake in the baillie_psw() method?

Where it says

``````for i in xrange(3, limit + 1, 2):
if n % i:
return n == i``````

shouldn't the if condition say

``if n % i == 0:``

? Because you're going to return an answer only if n is divisible by i.

Lars

over 1 year ago

I apologize I never replied back to this. I haven't been doing much coding in the past two years and I spaced on all of this. You are correct: this was changed but never updated on here.

``````def baillie_psw(n, limit=100):
    if n == 2:
        return True

    if not n & 1:
        return False

    if n < 2 or is_square(n):
        return False

    # PRIMONIAL_31 is the product of the primes up to 31;
    # PRIMES_LE_31 is the set of those primes (both defined in the gist).
    if gcd(n, PRIMONIAL_31) > 1:
        return n in PRIMES_LE_31

    bound = min(limit, isqrt(n))
    for i in range(3, bound, 2):
        if not n % i:
            return False

    return strong_pseudoprime(n, 2) \
        and strong_pseudoprime(n, 3) \
        and strong_lucas_pseudoprime(n)``````

The use of `if not n % i:` works the same as `if n % i == 0:`.

over 1 year ago