Home > python > Speed up Python with Cython

Speed up Python with Cython

This post is based on a conversation with our local Python guru, Yves :)

Problem
You have a script that you would like to speed up. For instance, there is a function that is called lots of times and you suspect it causes a bottleneck.

Solution
With Cython, it is possible to compile a module to C source that you can then compile with GCC. The resulting binary can be imported in your Python script just as if it were a normal module. Since it’s a compiled module, you can expect some speed gains.

Example #01 (pure Python)
Let’s see the following simple script. It enumerates numbers up to a given threshold and tests if the given number is prime. At the end it prints the number of primes found.

Pure Python solution:

#!/usr/bin/env python

from prime import is_prime

UPTO = 10**7 / 4

def main():
    i = 1
    cnt = 0
    while i <= UPTO:
        if is_prime(i):
            cnt += 1

        i += 1

    print cnt


if __name__=="__main__":
    main()

prime.py:

def is_prime(n):
    if n == 2:
        return True
    if n % 2 == 0:
        return False

    i = 3
    maxi = n**0.5 + 1
    while i <= maxi:
        if n % i == 0:
            return False
        i += 2

    return True

According to the Unix time command, the execution time is 29.69 sec. on my laptop.

Example #02 (with Cython, first try)
Now let’s compile python.py.

cython prime.py

This will produce prime.c. Now compile it:

gcc -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing -I/usr/include/python2.7 -o prime.so prime.c

The output is the binary prime.so.

There is nothing to change in the main file, “from prime import is_prime” will import prime.so first.

Execution time: 21.11 sec.

Example #03 (with Cython, second try)
This paragraph is an update (20111027), incorporating the remarks of James.

By adding static type declarations, Cython can perform much better.

Controller part:

#!/usr/bin/env python

import pyximport             # add it here, before importing cython code
pyximport.install()

from prime import is_prime   # cython code is imported here

UPTO = 10**7 / 4

def main():
    i = 1
    cnt = 0
    while i <= UPTO:
        if is_prime(i):
            cnt += 1

        i += 1

    print cnt


if __name__=="__main__":
    main()

prime.pyx (notice the .pyx extension):

def is_prime(int n):
    if n == 2:
        return True
    if n % 2 == 0:
        return False

    cdef int i = 3
    cdef double maxi = n**0.5 + 1
    while i <= maxi:
        if n % i == 0:
            return False
        i += 2

    return True

The line in the first code “import pyximport; pyximport.install()” will ensure that the cython module is automatically built when imported, thus there is no need to run cython or gcc.

By default, compiled modules will end up in a .pyxbld directory in the user’s home directory. Passing a different path as build_dir as an argument when you call pyximport.install will override this.

Execution time: 2.15 sec. Lesson learned: use static type declarations in your Cython code whenever possible.

Example (with PyPy)
Just out of curiosity, I tried to launch the script with PyPy too. PyPy is a fast, compliant alternative implementation of the Python language, written in Python itself. Since it uses a JIT compiler, PyPy is often faster than the standard Python interpreter (see a presentation here).

Execution time (hang on!): 2.35 sec.

Well, the difference is quite spectacular in the case of this example but it doesn’t mean that PyPy is always faster. In a completely different problem setting the end result can be just the opposite. So always make some tests and then choose the solution which is best for you.

Conclusion
If your program seems to run slowly, first try to polish the code and use some better algorithms / data structures. If it’s still slow, you can try to compile some parts of it with Cython. However, bare in mind that you hurt portability. But before transforming your program to a half Python / half C monster, try PyPy too. Maybe you don’t need Cython at all.

Example #04 (with Cython, third try)
This is an update (20170521). The pyximport solution compiled everything for us. Now let’s see how to compile a project manually, without the magic of pyximport. This example was tried under Python 3.6.

prime.pyx:

def is_prime(int n):
    if n == 2:
        return True
    if n % 2 == 0:
        return False
 
    cdef int i = 3
    cdef double maxi = n**0.5 + 1
    while i <= maxi:
        if n % i == 0:
            return False
        i += 2
 
    return True

main.py:

#!/usr/bin/env python3

from prime import is_prime   # cython code is imported here

UPTO = 10**7 / 4

def main():
    i = 1
    cnt = 0
    while i <= UPTO:
        if is_prime(i):
            cnt += 1

        i += 1

    print(cnt)

##############################################################################

if __name__ == "__main__":
    main()

setup.py:

from setuptools import setup, Extension

setup(name='Prime Counter',
      ext_modules=[Extension('prime', ['prime.c'])],
)

compile.sh:

#!/usr/bin/env bash

cython prime.pyx 
python setup.py build_ext --inplace

At the end, simply lauch main.py.

Here you have more work to do. Example #03 with pyximport is a more comfortable solution.

Advertisements
Categories: python Tags: , , ,
  1. October 23, 2011 at 13:22

    Awesome stuff. I have considered using Cython many times, but because of the portability issue it stops me using it. However, if you are writing a Windows only application, the use of Cython to speed up the execution would be a good idea.

    Lovely post BTW.

  2. James
    October 27, 2011 at 14:47

    You can get a *much* bigger speedup with Cython here. The problem with your current code is that Cython doesn’t know the types of your variables, so when it sees, for example, n ** 0.5 + 1, it is unable to tell what operation is being carried out, and instead of converting the exponentiation and addition into C, it simply places calls to Python functions into the C code. If you add static type declarations by replacing ‘n’ with ‘int n’ in the argument list, ‘i = 3’ with ‘cdef int i = 3’, and ‘maxi = n**0.5 + 1’ with ‘cdef double maxi = n**0.5 + 1’, the program speeds up hugely. For me, the pure python version takes just over 30s, while the Cython version with static type declarations takes less than 1.5s. Of course, this algorithm is not the most efficient way of implementing the prime-counting function, so you can probably do a lot better than this.

    Two further hints with using Cython: the option -a produces a html file which highlights lines in your code where Python function calls are made, and you can click on the lines to see the resulting C code. Also, if you change the name of prime.py to prime.pyx, and put the line ‘import pyximport; pyximport.install()’ in your script before you import prime, your Cython code will be automatically built when you import it (that is, you don’t need to run cython or gcc).

    @Helen Neely: what is the portability issue you are talking about? You need cython and a C compiler to run the code, but they should be available on just about any operating system.

    • October 27, 2011 at 15:43

      Thanks for the remarks, I added them to the post.

      I also mentioned portability. I meant the following: for a pure Python script, you just need the Python interpreter. For this, you need Cython + gcc too. Installing gcc on Windows can be painful sometimes.

  3. James
    October 27, 2011 at 15:07

    Some further tweaks can get the Cython version down to less than 1.2s on my machine:

    cimport cython
    cdef extern from "math.h":
        double sqrt(double)
    
    @cython.cdivision(True)
    def is_prime(int n):
        if n == 2:
            return True
        if n % 2 == 0:
            return False
    
        cdef int i = 3
        cdef double maxi = sqrt(n) + 1
        while i <= maxi:
            if n % i == 0:
                return False
            i += 2
    
        return True
    

    This uses C’s sqrt function instead of **0.5 (imported with the cdef extern command), while the cdivision decorator causes Cython to use C’s % operator, which has slightly different behaviour from the Python version in the case of negative numbers.

    To add to what I said about pyximport – this module is clever enough to only build the Cython code on the first run (so the first run of the script will be slower than subsequent ones).

  1. No trackbacks yet.

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: