I have a Python program that uses NumPy arrays as the standard data type for arrays. For the heavy computation I pass the arrays to a C++ library, using pybind11. However, the library requires Python lists, so I do the conversion from NumPy array to list via:
NativeSolver.vector_add(array1.tolist(), array2.tolist(), ...)
How much overhead does this conversion generate? I hope it doesn't create a whole new copy. The NumPy reference says:
ndarray.tolist()
Return a copy of the array data as a (nested) Python list. Data items are converted to the nearest compatible Python type.
A lot. For simple built-in types, you can use sys.getsizeof on an object to determine the memory overhead associated with that object (for containers, this does not include the values stored in it, only the pointers used to store them).
So for example, a list of 100 smallish ints (but greater than 256, to avoid the small-int cache) is (on my 3.5.1 Windows x64 install):
>>> import sys
>>> sys.getsizeof([0] * 100) + sys.getsizeof(0) * 100
3264
or about 3 KB of memory required. If those same values were stored in a numpy array of int32s, with no Python object per number and no per-object pointers, the size would drop to roughly 100 * 4 bytes (plus another few dozen bytes for the array object overhead itself), somewhere under 500 bytes. The incremental cost for each additional small int is 24 bytes for the object (though it's free if it's in the small-int cache, which covers values from -5 to 256 IIRC), plus 8 bytes for the pointer in the list: 32 bytes total, vs. 4 for the C-level type, roughly 8x the storage requirement (and you're still storing the original object too).
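The comparison above can be reproduced directly. Exact numbers vary by Python version and platform; note that arr.nbytes counts only the raw element storage, not the ndarray header:

```python
import sys
import numpy as np

# 100 smallish ints outside the small-int cache (> 256)
values = list(range(257, 357))

# list cost = container (pointers) + one Python int object per element
list_size = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# numpy cost = 4 raw bytes per int32 element, no per-element objects
arr = np.array(values, dtype=np.int32)
array_size = arr.nbytes  # 100 * 4 = 400 bytes

print(list_size, array_size)
```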
If you have enough memory to deal with it, so be it. But otherwise, you might try looking at a wrapping that lets you pass in buffer-protocol-supporting objects (numpy.array, array.array on Py3, ctypes arrays populated via memoryview slice assignment, etc.) so conversion to Python-level types isn't needed.
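As a sketch of the last option: a ctypes array exposes a writable buffer, so it can be filled straight from a numpy array's buffer via memoryview slice assignment, with no Python object created per element (casting both views to raw bytes sidesteps format-string mismatches between the two buffer types):

```python
import ctypes
import numpy as np

arr = np.arange(100, dtype=np.int32)

# A ctypes array of matching length; its buffer is filled directly
# from the numpy array's buffer -- one bulk byte copy, no per-element
# Python int objects.
c_arr = (ctypes.c_int32 * len(arr))()
memoryview(c_arr).cast('B')[:] = memoryview(arr).cast('B')

print(c_arr[0], c_arr[99])
```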
Yes, it will be a new copy. The data layout of an array is very different from that of a list.
An array has attributes like shape and strides, plus a 1d data buffer that contains the elements as a contiguous block of bytes. It's the other attributes, and code, that treat those bytes as floats, ints, strings, 1d, 2d, etc.
A list is a buffer of pointers, with each pointer pointing to an object elsewhere in memory. A pointer may reference a number, a string, or another list; it is never going to point into the array's data buffer or at elements in it.
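That the result is an independent copy is easy to verify: mutating the list leaves the array untouched, because tolist() built fresh Python float objects rather than views into the array's buffer:

```python
import numpy as np

arr = np.array([1.0, 2.0, 3.0])
lst = arr.tolist()

lst[0] = 99.0  # mutate the list copy only

# The array is unaffected, and the list holds plain Python floats.
print(arr[0], lst[0])
```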
There are ways of interfacing numpy arrays with compiled code and C arrays that make use of the array's data buffer; cython is a common one. There is also a whole documentation section on the C API for numpy. I don't know anything about pybind11. If it requires a list interface, it may not be the best choice.
When I've done timeit tests with tolist(), it hasn't appeared to be that expensive.
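A quick way to run that measurement yourself (absolute timings are machine-dependent; the point is to compare the conversion cost against your actual computation):

```python
import timeit
import numpy as np

arr = np.arange(10_000, dtype=np.float64)

# Time the full tolist() conversion: a copy of every element
# into a new Python float object.
t = timeit.timeit(arr.tolist, number=1000)
print(f"tolist() on 10k doubles: {t / 1000 * 1e6:.1f} us per call")
```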
=======================
But looking at the pybind11 GitHub repository I find a number of references to numpy, and this documentation page:
http://pybind11.readthedocs.io/en/latest/advanced.html#numpy-support
It supports the buffer protocol and numpy arrays directly, so you shouldn't have to go through the tolist step.
#include <pybind11/numpy.h>

// Accepts a numpy array of doubles directly; no tolist() conversion needed.
void f(py::array_t<double> array);