In this example, I show two different methods for creating a list of strings using Cython. One uses an array of char pointers (and the strcpy
C function) and the other by simply appending elements to a list.
I then pass each of these lists into the set
function and see that performance is drastically different.
Question - What can I do to create the list using character pointers to have equal performance?
A simple function to create lists in Cython
from libc.string cimport strcpy
def make_lists():
    cdef:
        char c_list[100000][3]
        Py_ssize_t i
        list py_list = []
    for i in range(100000):
        strcpy(c_list[i], b'AB')
        c_list[i][2] = b'\0'
        py_list.append(b'AB')
    return c_list, py_list
Here, c_list is a C array of 3-byte character arrays, which Cython converts to a Python list when it is returned. py_list is a normal Python list. Both lists are filled with the same two-byte sequence, b'AB'.
>>> c_list, py_list = make_lists()
>>> c_list[:10]
[b'AB', b'AB', b'AB', b'AB', b'AB', b'AB', b'AB', b'AB', b'AB', b'AB']
>>> c_list == py_list
True
%timeit set(c_list)
2.85 ms ± 115 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit set(py_list)
1.02 ms ± 26 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Interestingly, the performance difference vanishes if I decode each value to unicode, though it is slower than the original set(py_list). If I create a unicode list in pure Python, I am back to the original performance.
c_list_unicode = [v.decode() for v in c_list]
py_list_unicode = [v.decode() for v in py_list]
py_list_py = ['AB' for _ in range(len(py_list))]
%timeit set(c_list_unicode)
1.63 ms ± 56.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit set(py_list_unicode)
1.7 ms ± 35.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit set(py_list_py)
987 µs ± 45.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
def make_lists2():
    cdef:
        char *c_list[100000]
        Py_ssize_t i
        list py_list_slow = []
        list py_list_fast = []
    for i in range(100000):
        c_list[i] = b'AB'
        py_list_slow.append(c_list[i])
        py_list_fast.append(b'AB')
    return c_list, py_list_slow, py_list_fast
Timings
c_list2, py_list_slow, py_list_fast = make_lists2()
%timeit set(c_list2)
3.01 ms ± 137 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit set(py_list_slow)
3.05 ms ± 168 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit set(py_list_fast)
1.08 ms ± 38.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
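One way to close the gap is to deduplicate equal bytestrings while building the Python list, so the result holds one shared object instead of 100000 distinct ones. Below is a pure-Python sketch of that idea (dedup_convert is a hypothetical helper, not part of the Cython code above; bytes([65, 66]) is just a way to force distinct objects with equal contents):

```python
def dedup_convert(raw_values):
    """Replace equal-but-distinct objects with one shared representative."""
    cache = {}
    out = []
    for v in raw_values:
        out.append(cache.setdefault(v, v))  # first occurrence becomes canonical
    return out

# 100000 equal but distinct bytes objects, like the ones Cython creates:
distinct = [bytes([65, 66]) for _ in range(100_000)]
deduped = dedup_convert(distinct)
```

After deduplication every element is the same object, so set() can fall back on identity checks instead of full string comparisons.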
Edit
I found the function PyUnicode_InternFromString in the unicode Python C API and am now getting performance on par with regular Python lists. This 'interns' the string, though I'm not sure exactly what that means.
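For context, interning maps all equal strings onto a single shared object. sys.intern is the pure-Python counterpart of PyUnicode_InternFromString, so the effect can be sketched without any C API calls (behavior shown is CPython's):

```python
import sys

# Two equal strings built at runtime; CPython gives them separate objects:
a = ''.join(['A', 'B'])
b = ''.join(['A', 'B'])
assert a == b
assert a is not b   # equal contents, distinct objects

# Interning returns one canonical object for any given contents:
ia = sys.intern(a)
ib = sys.intern(b)
assert ia is ib     # now literally the same object
```

A set built from interned strings can deduplicate by pointer comparison, which is why interning restores the original performance.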
Your c_list is a list of 100000 distinct bytestrings with the same contents. Cython has to convert each char[3] to a bytestring separately, and it doesn't bother to do any object deduplication.
Your py_list is a list of the same bytestring object 100000 times. Every py_list.append(b'AB') appends the same object to py_list; without the trip through a C array, Cython never needs to copy the bytestring.
set(c_list) is slower than set(py_list) because set(c_list) has to actually perform string comparisons, while set(py_list) gets to skip them with an object identity check.
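This is easy to verify in pure Python (a sketch mirroring the two lists above; bytes([65, 66]) is just a way to force distinct objects with equal contents):

```python
# One object referenced 100000 times, like py_list:
shared = [b'AB'] * 100_000
# 100000 equal but distinct objects, like c_list:
distinct = [bytes([65, 66]) for _ in range(100_000)]

assert shared == distinct                   # element-wise equality holds
assert all(x is shared[0] for x in shared)  # shared: one object throughout
assert distinct[0] is not distinct[1]       # distinct: a fresh object each time

# Both collapse to the same one-element set, but set(distinct) has to hash
# and compare every object, while set(shared) can short-circuit on identity:
assert set(shared) == set(distinct) == {b'AB'}
```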