Last Updated: February 25, 2016
·
3.365K
· bnlucas

Prime Numbers with Python v2

Overview

Yesterday, June 25th 2013, I wrote a pro tip on working with primes in Python, especially large primes. I also started a thread on Stackoverflow on this to help clear the air on the methods for working with large primes in Python.

Thanks to a comment left on the Stackoverflow question, I have more results to share with you and also some methods for minor math operations to speed things up for you.

Get the Code

The code used in this pro tip can be found on gist.github.com at primes.py

The code listed in the previous pro tip can be found at fermat.py, miller_rabin.py and solovay-strassen.py

Benchmark Methods

The test system: Single-Core AMD Athlon 64 LE-1640, 2.6GHz with 2.00GB DDR RAM running Windows 7.

I am using a mixture of two benchmarking methods, one is a homebrew timestamp decorator:

from functools import wraps

def benchmark(iterations=10000):
    def wrapper(function):
        @wraps(function)
        def func(*args, **kwargs):
            t1 = time()
            for i in range(iterations):
                call = function(*args, **kwargs)
            t2 = time()
            t3 = int(1 / ((t2 - t1) / iterations))
            print func.func_name, 'at', iterations, 'iterations:', t2 - t1
            print 'Can perform, t3, 'calls per second.'
            return call
        return func
    return wrapper

The other is Python's cProfile

import cProfile
cProfile.run('test_method()')

So the full test is:

@benchmark()
def test_method():
    pass

cProfile.run('test_method()')

Minor Math Operations

Divisible by Two

Finding if an integer is divisible by two, we can use n mod 2 modulo operation. If we get 0, we know that n is divisible by 2. Can we make this faster?

@benchmark()
def test_mod(n):
    return n mod 2

cProfile.run('test_mod(10**100)')

# test_mod at 10000 iterations: 0.0360000133514
# Can perform 277777 calls per second.

This isn't bad, we can run n % 2 277777 times per second. What about n & 1 using the binary AND operator?

@benchmark()
def test_and(n):
    return n & 1

cProfile.run('test_and(10**100)')

# test_and at 10000 iterations: 0.0299999713898
# Can perform 333333 calls per second.

So, as we can see, n & 1 out preforms n % 2 be 0.00804 seconds. Allowing us to run this 333333 times per second, which is 55556 more operations per second.

Divide by 2 or Binary Right Shift?

Divide number by 2:

@benchmark()
def test_div(n):
    return n / 2

cProfile.run('test_div(10**100)')

# test_div at 10000 iterations: 0.0370001792908
# Can perform 270268 calls per second.

Binary Right Shift n >> 1

@benchmark
def test_brshift(n):
    return n >> 1

cProfile.run('test_brshift(10**100)')

# test_brshift at 10000 iterations: 0.029000043869
# Can perform 344827 calls per second.

So, by using the binary right shift operator, we can divide by two 74559 times more per second.

Random Number Generation: x..y

We see two main methods for this, random.randint and random.randrange, which one is better?

from random import randint

@benchmark()
def test_randint(x, y)
    return randint(x, y)

cProfile.run('test_randint(10**100, 10**101)')

# test_randint at 10000 iterations: 0.270999908447
# Can perform 36900 calls per second.

What about randrange?

from random import randrange

@benchmark
def test_randrange(x, y)
    return randrange(x, y)

cProfile.run('test_randrange(10**100, 10**101)')

# test_randrange at 10000 iterations: 0.233000040054
# Can perform 42918 calls per second.

So now we know it is faster to use random.randrange over random.randint to generate a random number. 6018 more operations per second.

Using range or xrange:

Iterating with range

@benchmark()
def test_range(n):
    for i in range(n):
        continue

cProfile.run('test_range(10**4)')

# test_range at 10000 iterations: 8.27799987793
# Can perform 1208 calls per second.

# from cProfile `20006 function calls in 8.106 seconds`

Iterating with xrange

@benchmark()
def test_xrange(n):
    for i in xrange(n):
        continue

cProfile.run('test_xrange(10**4)')

# test_xrange at 10000 iterations: 4.91900014877
# Can perform 2032 calls per second.

# from cProfile `10006 function calls in 5.719 seconds`

xrange is significantly faster than range with less function calls. Now we know that xrange is the way to go.

And for simplicity, I am going to use not n & 1 in the following tests over n & 1 == 0. The benchmarks are minimal, you can run not n & 1 2 more times per second than the later.

Primality Tests

As determined in Prime Numbers in Python, we know that out of all the methods discussed, these two are fastest:

def mrange(start, stop, step):
    while start < stop:
        yield start
        start += step

def is_prime(n):
    if n == 2:
        return True
    if not n & 1:
        return False
    return all(n % i for i in mrange(3, int(sqrt(n)) + 1, 2))

Although this out performed all other methods in determining if number n was prime, it does not work for determining the next highest prime. So, we went with the Miller-Rabin primality test.

def miller_rabin(n, k=10):
    if n == 2:
        return True
    if not n & 1:
        return False

    def check(a, s, d, n):
        x = pow(a, d, n)
        if x == 1:
            return True
        for i in xrange(s - 1):
            if x == n - 1:
                return True
            x = pow(x, 2, n)
        return x == n - 1

    s = 0
    d = n - 1

    while d % 2 == 0:
        d >>= 1
        s += 1

    for i in xrange(k):
        a = randrange(2, n - 1)
        if not check(a, s, d, n):
            return False
    return True

With this comment left on the thread on Stackoverflow, we will try a few things... Can we reduce the time taken on a Miller-Rabin test, and we will also look at other primality tests not mentioned in the first pro tip

Testing the Miller-Rabin with the code supplied in the comment:

def isStrongPseudoprime(n, a):
    d, s = n-1, 0
    while d%2 == 0:
        d, s = d/2, s+1
    t = pow(a, d, n)
    if t == 1:
        return True
    while s > 0:
        if t == n-1:
            return True
        t, s = (t*t) % n, s-1
    return False

from random import randint

@benchmark()
def test_miller_rabin_new(n, limit=10):
    for i in range(limit):
        a = randint(2, n-1)
        if not isStrongPseudoprime(n, a):
            return False
    return True

cProfile.run('test_miller_rabin_new(10**100)')

# test_miller_rabin_new at 10000 iterations: 11.0569999218
# Can perform 904 calls per second.

This is incredibly slow. What about the Miller-Rabin mentioned in yesterday's pro tip? For space, this is checking the same code listed above, miller_rabin(n, k=10)

# test_miller_rabin_old at 10000 iterations: 0.0349998474121
# Can perform 285715 calls per second.

Using the optimization methods mentioned so far, the only real optimization we can do is replace the while d % 2 == 0: with while not d & 1: while and to start with s = -1 over s = 0 while factoring powers of 2. Using s = -1 here eliminates the s - 1 used within check(a, s, d, n).

def miller_rabin(n, k=10):
    if n == 2:
        return True
    if not n & 1:
        return False

    def check(a, s, d, n):
        x = pow(a, d, n)
        if x == 1:
            return True
        for i in xrange(s):
            if x == n - 1:
                return True
            x = pow(x, 2, n)
        return x == n - 1

    s = -1
    d = n - 1
    while not d & 1:
        d >>= 1
        s += 1

    for i in xrange(k):
        a = randrange(2, n - 1)
        if not check(a, s, d, n):
            return False
    return True

It gives some performance upgrades, both used cProfile.run('test_miller_rabin(10**100)')

test_miller_rabin_old at 10000 iterations: 0.0640001296997
Can perform 156249 calls per second.

test_miller_rabin_new at 10000 iterations: 0.0500001907349
Can perform 199999 calls per second.

Doing 43750 more operations per second.

To simplify tests, and the entire point of good code is reusable code, I have moved check() out of miller_rabin() and have also added factor(). Benchmarking the changes, I got the same miller_rabin_1 at 10000 iterations: 0.0439999103546 and miller_rabin_2 at 10000 iterations: 0.0439999103546.

def factor(n, p=2):
    s = 0
    d = n - 1
    q = p
    while not d & q - 1:
        s += 1
        q *= p
    return s, d // (q // p)

def strong_pseudoprime(n, a, s=None, d=None):
    if (s is None) or (d is None):
        s, d = factor(n, 2)
    x = pow(a, d, n)
    if x == 1:
        return True
    for i in xrange(s):
        x = pow(x, 2, n)
        if x == n - 1:
            return True
    return x == n - 1

def miller_rabin(n, k=10):
    if n == 2:
        return True
    if not n & 1:
        return False

    s, d = factor(n)

    for i in xrange(k):
        a = randrange(2, n - 1)
        if not strong_pseudoprime(n, a, s, d):
            return False
    return True

Other Primality Algorithms

Lucas Pseudoprime Tests

For these tests we use the following two methods:

def jacobi(a, p):
    if (not p & 1) or (p < 0):
        raise ValueError('p must be a positive odd number.')
    if (a == 0) or (a == 1):
        return a
    a = a % p
    t = 1
    while a != 0:
        while not a & 1:
            a >>= 1
            if p & 7 in (3, 5):
                t = -t
        a, p = p, a
        if (a & 3 == 3) and (p & 3) == 3:
            t = -t
        a = a % p
    if p == 1:
        return t
    return 0

And

from fractions import gcd

def selfridge(n):
    d = 5
    s = 1
    ds = d * s
    while True:
        if gcd(ds, n) > 1:
            return ds, 0, 0
        if jacobi(ds, n) == -1:
            return ds, 1, (1 - ds) / 4
        d += 2
        s *= -1
        ds = d * s

The first is a standard Lucas pseudoprime test. Taking the code given in this comment, I have benchmarked and rewritten this.

def chain(n, u1, v1, u2, v2, d, q, m):
    k = q
    while m > 0:
        u2 = (u2 * v2) % n
        v2 = (v2 * v2 - 2 * q) % n
        q = (q * q) % n
        if m & 1 == 1:
            t1, t2 = u2 * v1, u1 * v2
            t3, t4 = v2 * v1, u2 * u1 * d
            u1, v1 = t1 + t2, t3 + t4
            if u1 & 1 == 1:
                u1 = u1 + n
            if v1 & 1 == 1:
                v1 = v1 + n
            u1, v1 = (u1 / 2) % n, (v1 / 2) % n
            k = (q * k) % n
        m = m >> 1
    return u1, v1, k

Standard Lucas Pseudoprime Test

def lucas_pseudoprime(n):
    d, p, q = selfridge(n)
    if p == 0:
        return n == d
    u, v, k = chain(n, 0, 2, 1, p, d, q, (n + 1) / 2)
    return u == 0

cProfile.run('lucas_pseudoprime(10**100)')

# lucas_pseudoprime at 10000 iterations: 0.0999999046326
# Can perform 100000 calls per second.

The benchmark on the code supplied in the comment was

isStandardLucasPseudoprime at 10000 iterations: 0.119999885559
Can perform 83333 calls per second.

Strong Lucas Pseudoprime Test

def strong_lucas_pseudoprime(n):
    d, p, q = selfridge_2(n)
    if p == 0:
        return n == d

    s, t = factor(n)

    u, v, k = chain(n, 1, p, 1, p, d, q, t >> 1)

    if (u == 0) or (v == 0):
        return True

    for i in xrange(1, s):
        v = (v * v - 2 * k) % n
        k = (k * k) % n
        if v == 0:
            return True

    return False

cProfile.run('strong_lucas_pseudoprime(10**100)')

# strong_lucas_pseudoprime at 10000 iterations: 0.0969998836517
# Can perform 103092 calls per second.

The benchmark for the supplied code:

isStrongLucasPseudoprime at 10000 iterations: 0.129999876022
Can perform 76923 calls per second.

Baillie–PSW Primality Test

def isqrt(n):
    if n < 0:
        raise ValueError('Square root is not defined for negative numbers.')
    x = int(n)
   if x == 0:
        return 0
    a, b = divmod(x.bit_length(), 2)
    n = 2 ** (a + b)
    while True:
        y = (n + x // n) >> 1
        if y >= n:
            return n
        n = y

def is_square(n):
    s = isqrt(n)
    return s * s == n

def baillie_psw(n, limit=100):
    if not n & 1:
        return False
    if n < 2 or is_square(n):
        return False
    for i in xrange(3, limit + 1, 2):
        if n % i:
            return n == i
    return strong_pseudoprime(n, 2) and strong_lucas_pseudoprime(n)

cProfile.run('bailli_psw(10**100)')

# baillie_psw at 10000 iterations: 0.113000154495
# Can perform 88495 calls per second.

The benchmark for the supplied code:

isBaillieWagstaffPrime at 10000 iterations: 2.29200005531
Can perform 4363 calls per second.

From these benchmarks, we can see that so far the Miller-Rabin test is still fastest, test_miller_rabin_new at 10000 iterations: 0.0500001907349 over the Baillie-PSW at baillie_psw at 10000 iterations: 0.113000154495. So what about determining the next highest prime?

Determining Next Highest Prime

Changing yesterday's article to use wheel factorization.

These benchmarks are iterated over 1,000 times for lack of resources.

Also, do to system limitations, the number we'll be using is 100**10-1, not 10**100, which is exactly what yesterday's pro tip used.

First we will benchmark the Miller-Rabin test.

def next_prime_miller_rabin(n):
    if n < 2:
        return 2
    if n < 5:
        return [3, 5, 5][n - 2]
    gap = [1, 6, 5, 4, 3, 2, 1, 4, 3, 2, 1, 2, 1, 4, 3, 2, 1, 2, 1, 4, 3, 2, 1, 
           6, 5, 4, 3, 2, 1, 2]
    n = n + 1 if not n & 1 else n + 2
    while not miller_rabin(n):
        n += gap[n % 30]
    return n

cProfile.run('next_prime_miller_rabin(100*10-1)')

# returns `100000000000000000039`

# next_prime_miller_rabin at 1000 iterations: 1.79100012779
# Can perform 558 calls per second.
#          199885 function calls in 2.396 seconds

This is pretty good, compared to the benchmark for the previous pro tip:

1000 calls, 337 per second.
258669 function calls in 2.971 seconds

So, we have reduced the time over 1,000 iterations from 2.971 to 1.791, a 1.18 second drop. This allows for 221 more calls per second. We have also significantly reduced the function calls. 258669 down to 199885.

Now, the Baillie-PSW test:

def next_prime_baillie_psw(n):
    if n < 2:
        return 2
    if n < 5:
        return [3, 5, 5][n - 2]
    gap = [1, 6, 5, 4, 3, 2, 1, 4, 3, 2, 1, 2, 1, 4, 3, 2, 1, 2, 1, 4, 3, 2, 1, 
           6, 5, 4, 3, 2, 1, 2]
    n = n + 1 if not n & 1 else n + 2
    while not baillie_psw(n):
        n += gap[n % 30]
    return n

cProfile.run('next_prime_bailie_psw(100*10-1)')

# returns `100000000000000000039`

# next_prime_baillie_psw at 1000 iterations: 0.93499994278
# Can perform 1069 calls per second.
#          69006 function calls in 1.527 seconds

Wow, so not only have we dropped this from 1.7910 seconds to 0.9354, we have also dropped the function calls from 199885 to 69009.

Conclusion

With optimizing some code choices, not n & 1 over n % 2 == 0, using n >> 1 over n / 2, random.randrange over random.randint, xrange over range... We have been able to speed things up a bit. Going further, we have discussed more methods for determining if a number is prime and also determining the next highest prime.

To recap, we have determined that the Baillie-PSW primality test is much faster than the Miller-Rabin. This is due to the steps that it takes.

  1. A simple check to see if n is square.
  2. Trial division to limit. No need to perform other tests if it passes this. 1
  3. Running a strong pseudoprime test for n, base 2.
  4. Running a strong pseudoprime test for n, base 3.
  5. Finally, running a strong Lucas pseudoprime test on n.

1: We discovered with yesterday's pro tip that using all(n % i for i in mrange(3, int(sqrt(n)) + 1, 2)) was the fastest for just checking if n was prime. The benchmark was 10000 calls, 53191 per second. 60006 function calls in 0.190 seconds on testing 100*10-1. With the tests mentioned above, the baillie_psw method out performs this. The benchmark for baillie_psw(100**10-1) comes out to baillie_psw at 10000 iterations: 0.111000061035. Can perform 90090 calls per second.. And we get 30006 function calls in 0.746 seconds from cProfile.run('baillie_psw(100**10-1)')

3 Responses
Add your response

Thanks for writing these up.
I may be confused, but isn't there a mistake in the baillie_psw() method?
Where it says
for i in xrange(3, limit + 1, 2):
if n % i:
return n == i

shouldn't the if condition say
if n % i == 0:
? Because you're going to return an answer only if n is divisible by i.

Lars

over 1 year ago ·

Silly formatting...

Thanks for writing these up.

I may be confused, but isn't there a mistake in the baillie_psw() method?

Where it says

for i in xrange(3, limit + 1, 2):
    if n % i:
        return n == i

shouldn't the if condition say

if n % i == 0:

? Because you're going to return an answer only if n is divisible by i.

Lars

over 1 year ago ·

I apologize I never replied back to this. I haven't been doing much coding in the past two years and I spaced all of this. You are correct, this was changed but never updated on here.

def baillie_psw(n, limit=100):
    if n == 2:
        return True

    if not n & 1:
        return False

    if n < 2 or is_square(n):
        return False

    if gcd(n, PRIMONIAL_31) > 1:
        return n in PRIMES_LE_31

    bound = min(limit, isqrt(n))
    for i in range(3, bound, 2):
        if not n % i:
            return False

    return strong_pseudoprime(n, 2) \
        and strong_pseudoprime(n, 3) \
        and strong_lucas_pseudoprime(n)

The use of if not n % i: works the same as if n % i == 0:

over 1 year ago ·