Passing large complex arrays from Python to C++: what's my best option?

Question

2017/06/13 EDIT: I tried using Boost as was suggested, but after spending more than three days trying to get it to compile and link, and failing, I decided that the stupid, painful way was probably the fastest and least painful... so now my code just saves a mess of gigantic text files (splitting the arrays and the real/imaginary parts of the numbers across files) that C++ then reads. Elegant... no. Effective... yes.

I have some scientific code, currently written in Python, that is being slowed down by a numerical 3D integration step within a loop. To overcome this, I am rewriting this particular step in C++. (Cython etc. is not an option.)

Long story short: I want to transfer several very large arrays of complex numbers from the Python code to the C++ integrator as conveniently and painlessly as possible. I could do this manually and painfully using text or binary files, but before I embark on this, I was wondering whether I have any better options.

I'm using Visual Studio for C++ and Anaconda for Python (not my choice!).

Is there any file format or method that would make it quick and convenient to save an array of complex numbers from Python and then recreate it in C++?

Many thanks, Ben
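For comparison, the binary-file route mentioned in the question can be kept fairly painless with NumPy alone. The sketch below assumes a complex128 array and that the C++ reader runs on a machine with the same byte order; the file name field.bin is hypothetical. ndarray.tofile writes the raw buffer, i.e. interleaved real/imaginary doubles, so nothing has to be split into separate files, but the shape is not stored and has to be kept separately.

```python
import os
import tempfile

import numpy as np

def save_complex(path, arr):
    """Write a complex array as raw interleaved (real, imag) float64 pairs."""
    # Force complex128 and C order so the byte layout is predictable
    data = np.ascontiguousarray(arr, dtype=np.complex128)
    data.tofile(path)
    return data.shape  # the shape is NOT stored in the file; keep it yourself

def load_complex(path, shape):
    """Read the raw file back and restore the shape."""
    return np.fromfile(path, dtype=np.complex128).reshape(shape)

# Round trip through a hypothetical file in a temporary directory
arr = np.arange(6).reshape(2, 3) + 1j * np.arange(6, 12).reshape(2, 3)
path = os.path.join(tempfile.mkdtemp(), "field.bin")
shape = save_complex(path, arr)
back = load_complex(path, shape)

# Viewed as plain doubles, the file holds re0, im0, re1, im1, ...
raw = np.fromfile(path, dtype=np.float64)
```

On the C++ side, the same bytes can be read into a std::vector<std::complex<double>> of the right size with a std::ifstream opened in std::ios::binary mode; since C++11, std::complex<double> is guaranteed to have the same layout as double[2].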

Answer 1

An easy solution that I have used many times is to build your "C++ side" as a DLL (a shared object on Linux/OS X), provide a simple, C-like entry point (plain integers, pointers and so on, no STL types) and pass the data through ctypes.

This avoids Boost/SIP/SWIG/... build nightmares, can be kept zero-copy (with ctypes you can pass a straight pointer to your NumPy data) and lets you do whatever you want on the C++ side (especially on the build side: no friggin' distutils, no Boost, no nothing; build it with whatever can build a C-like DLL). It also has the nice side effect of making your C++ algorithm callable from other languages (virtually any language has some way to interface with C libraries).
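To see what "zero-copy" means in practice, the sketch below (plain ctypes and NumPy, no compiled library involved) takes a typed pointer into a NumPy array's own buffer with ndarray.ctypes.data_as and writes through it, exactly as C code receiving that pointer would; the change shows up in the array because no copy was made. It also shows why only C-contiguous arrays should take this path: a strided view is not one run of consecutive doubles that a C loop could walk.

```python
import ctypes

import numpy as np

arr = np.zeros(4, dtype=np.float64)

# Typed pointer to the array's own buffer: what a C function would receive
ptr = arr.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
ptr[2] = 42.0  # write through the pointer, as C code would

# A strided view (every other element) is not contiguous, so a C loop over
# `size` consecutive doubles would read the wrong elements
strided = np.arange(8, dtype=np.float64)[::2]
fixed = np.ascontiguousarray(strided)  # contiguous copy that is safe to pass
```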

Here's a quick artificial example. The C++ side is just:

extern "C" {
double sum_it(double *array, int size) {
    double ret = 0.;
    for(int i=0; i<size; ++i) {
        ret += array[i];
    }
    return ret;
}
}

This has to be compiled to a DLL (on Windows) or a .so (on Linux), making sure to export the sum_it function (automatic with gcc; with VC++ it requires a .def file, or marking the function __declspec(dllexport)).

On the Python side, we can have a wrapper like

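import ctypes
import os
import sys
import numpy as np

# load the library sitting next to this file (summer.dll on Windows, summer.so elsewhere)
path = os.path.dirname(os.path.abspath(__file__))
cdll = ctypes.CDLL(os.path.join(path, "summer.dll" if sys.platform.startswith("win") else "summer.so"))

# declare the C signature double sum_it(double *array, int size), so that
# pointers are not truncated to a C int on 64-bit platforms
_sum_it = cdll.sum_it
_sum_it.argtypes = [ctypes.POINTER(ctypes.c_double), ctypes.c_int]
_sum_it.restype = ctypes.c_double

def sum_it(l):
    if (isinstance(l, np.ndarray) and l.dtype == np.float64 and l.ndim == 1
            and l.flags["C_CONTIGUOUS"]):
        # it's already a contiguous numpy array with the right features - go zero-copy
        a = l.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
    else:
        # it's a list or something else - try to create a copy
        arr_t = ctypes.c_double * len(l)
        a = arr_t(*l)
    return _sum_it(a, len(l))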

which makes sure that the data is marshaled correctly; then invoking the function is as trivial as

import summer
import numpy as np
# from a list (with copy)
print(summer.sum_it([1, 2, 3, 4.5]))
# from a numpy array of the right type - zero-copy
print(summer.sum_it(np.array([3., 4., 5.])))

See the ctypes documentation for more information on how to use it. See also the relevant documentation in NumPy.

For complex numbers, the situation is slightly more complicated, as ctypes has no built-in complex type; if we want to use std::complex<double> on the C++ side (which is pretty much guaranteed to work fine with the NumPy complex layout, namely a sequence of two doubles), we can write the C++ side as:

#include <complex>

extern "C" {
std::complex<double> sum_it_cplx(std::complex<double> *array, int size) {
    std::complex<double> ret(0., 0.);
    for(int i=0; i<size; ++i) {
        ret += array[i];
    }
    return ret;
}
}

Then, on the Python side, we have to replicate the std::complex layout in a c_complex structure to retrieve the return value (or to be able to build complex arrays without NumPy):

class c_complex(ctypes.Structure):
    # Complex number, compatible with std::complex layout
    _fields_ = [("real", ctypes.c_double), ("imag", ctypes.c_double)]

    def __init__(self, pycomplex=0j):
        # Init from Python complex (defaults to 0)
        self.real = pycomplex.real
        self.imag = pycomplex.imag

    def to_complex(self):
        # Convert to Python complex
        return complex(self.real, self.imag)

Inheriting from ctypes.Structure enables the ctypes marshalling magic, which is performed according to the _fields_ member; the constructor and extra methods are just for ease of use on the Python side.
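A quick way to convince yourself that such a structure and NumPy's complex128 share the same layout, without compiling anything: the sketch below checks the sizes and overlays a ctypes array of c_complex directly on a NumPy buffer with from_buffer, which does not copy. The class is repeated so the snippet is self-contained; giving __init__ a default argument is an assumption of this sketch.

```python
import ctypes

import numpy as np

class c_complex(ctypes.Structure):
    # Same layout as std::complex<double>: two consecutive doubles
    _fields_ = [("real", ctypes.c_double), ("imag", ctypes.c_double)]

    def __init__(self, pycomplex=0j):
        super().__init__(pycomplex.real, pycomplex.imag)

    def to_complex(self):
        return complex(self.real, self.imag)

arr = np.array([1 + 2j, 3 - 4j, -5 + 0.5j], dtype=np.complex128)

# Overlay a ctypes array on NumPy's buffer: no copy, same memory
view = (c_complex * len(arr)).from_buffer(arr)
```

Writing through view (for example view[0].imag = 9.0) changes arr as well, since both refer to the same bytes.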

Then, we have to tell ctypes the argument and return types

_sum_it_cplx = cdll.sum_it_cplx
_sum_it_cplx.argtypes = [ctypes.POINTER(c_complex), ctypes.c_int]
_sum_it_cplx.restype = c_complex

and finally write our wrapper, in a similar fashion to the previous one:

def sum_it_cplx(l):
    if (isinstance(l, np.ndarray) and l.dtype == np.complex128 and l.ndim == 1
            and l.flags["C_CONTIGUOUS"]):
        # the numpy array layout for complexes (sequence of two doubles) is already
        # compatible with std::complex (see https://stackoverflow.com/a/5020268/214671)
        a = l.ctypes.data_as(ctypes.POINTER(c_complex))
    else:
        # otherwise, try to build our c_complex
        arr_t = c_complex * len(l)
        a = arr_t(*(c_complex(r) for r in l))
    ret = _sum_it_cplx(a, len(l))
    return ret.to_complex()

Testing it as above

# from a complex list (with copy)
print(summer.sum_it_cplx([1. + 0.j, 0 + 1.j, 2 + 2.j]))
# from a numpy array of the right type - zero-copy
print(summer.sum_it_cplx(np.array([1. + 0.j, 0 + 1.j, 2 + 2.j])))

yields the expected results:

(3+3j)
(3+3j)

Answer 2

Note added in edit. As mentioned in the comments, Python itself, being an interpreted language, has little potential for computational efficiency. So in order to make Python scripts efficient, one must use modules that are not interpreted but, under the hood, call compiled (and optimized) code written in, say, C/C++. This is exactly what NumPy does for you, in particular for operations on whole arrays.

Therefore, the first step towards efficient Python scripts is the use of NumPy. Only the second step is to try to use your own compiled (and optimized) code. Accordingly, in my example below I have assumed that you are using NumPy to store the array of complex numbers. Anything else would be ill-advised.
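As an illustration of that first step, here is a NumPy-only version of the kind of 3D Riemann sum the question is about (a sketch; the grid shape and step sizes are made up). The explicit triple loop executes one interpreted iteration per element, while arr.sum() performs the same accumulation inside NumPy's compiled code; both agree up to rounding.

```python
import numpy as np

def riemann_loop(arr, dx, dy, dz):
    """Pure-Python triple loop: one interpreted iteration per element (slow)."""
    total = 0j
    nx, ny, nz = arr.shape
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                total += arr[i, j, k]
    return total * dx * dy * dz

def riemann_numpy(arr, dx, dy, dz):
    """Same sum, but the loop runs inside NumPy's compiled code."""
    return arr.sum() * dx * dy * dz

rng = np.random.default_rng(0)
arr = rng.standard_normal((4, 8, 16)) + 1j * rng.standard_normal((4, 8, 16))
```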

There are various ways in which you can access Python's original data from within a C/C++ program. I personally have done this with Boost.Python, but must warn you that the documentation and support are lousy at best: you're pretty much on your own (and Stack Overflow, of course).

For example, your C++ file may look like this:

// file.cc
#include <boost/python.hpp>
#include <boost/python/numpy.hpp>

namespace p = boost::python;
namespace n = p::numpy;

n::ndarray func(const n::ndarray&input, double control_variable)
{
  /* 
     your code here, see documentation for boost python
     you pass almost any python variable, doesn't have to be numpy stuff
  */
  return input.copy();  // placeholder so the function returns a valid array
}

BOOST_PYTHON_MODULE(module_name)
{
  Py_Initialize();
  n::initialize();   // only needed if you use numpy in the interface
  p::def("function", func, "doc-string");
}

To compile this, you may use a Python script such as

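# setup.py

# setuptools provides the same setup()/Extension interface as the old distutils,
# which has been removed from recent Python versions
from setuptools import setup, Extension

module_name = Extension(
    'module_name',
    # flags are toolchain-specific: '-stdlib=libc++' is a clang option (e.g. macOS)
    extra_compile_args=['-std=c++11','-stdlib=libc++','-I/some/path/','-march=native'],
    extra_link_args=['-stdlib=libc++'],
    sources=['file.cc'],
    # Boost library names vary between platforms and Boost builds
    libraries=['boost_python','boost_numpy'])

setup(
    name='module_name',
    version='0.1',
    ext_modules=[module_name])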

and run it as python setup.py build, which will create an appropriate .so file in a subdirectory of build that you can import from Python.

Answer 3

I see the question is over a year old now, but I recently addressed a similar problem using the native Python C API and its NumPy C API extension. Since I personally don't enjoy using ctypes for various reasons (e.g. complex-number workarounds, messy code), nor Boost, I wanted to post my answer for future searchers.

The documentation for both the Python C API and the NumPy C API is quite extensive (albeit a little overwhelming at first). But after one or two successes, writing native C/C++ extensions becomes very easy.

Here is an example C++ function that can be called from Python. It integrates a 3D NumPy array of either real or complex (numpy.double or numpy.cdouble) type. The function is built into a shared library (DLL/.so) and imported from Python as the module cintegrate.

#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION  // use the non-deprecated NumPy C API
#include "Python.h"
#include "numpy/arrayobject.h"
#include <math.h>

static PyObject * integrate3(PyObject * module, PyObject * args)
{
    PyObject * argy=NULL;        // Regular Python/C API
    PyArrayObject * yarr=NULL;   // Extended Numpy/C API
    double dx,dy,dz;

    // "O" format -> read argument as a PyObject type into argy (Python/C API)
    if (!PyArg_ParseTuple(args, "Oddd", &argy,&dx,&dy,&dz))
    {
        PyErr_SetString(PyExc_ValueError, "Error parsing arguments.");
        return NULL;
    }

    // Determine if it's a complex number array (Numpy/C API), then request
    // double or complex-double data so that the double* casts below are valid
    int objtype = PyArray_ObjectType(argy, NPY_DOUBLE);
    int iscomplex = PyTypeNum_ISCOMPLEX(objtype);
    int DTYPE = iscomplex ? NPY_CDOUBLE : NPY_DOUBLE;

    // parse python object into an aligned, C-contiguous numpy array (Numpy/C API)
    yarr = (PyArrayObject *)PyArray_FROM_OTF(argy, DTYPE, NPY_ARRAY_IN_ARRAY);
    if (yarr==NULL) {
        return NULL;   // conversion failed: NumPy has already set the exception
    }

    //just assume this for 3 dimensional array...you can generalize to N dims
    if (PyArray_NDIM(yarr) != 3) {
        Py_CLEAR(yarr);
        PyErr_SetString(PyExc_ValueError, "Expected 3 dimensional integrand");
        return NULL;
    }

    npy_intp * dims = PyArray_DIMS(yarr);
    npy_intp i,j,k;
    double * p;

    //initialize variable to hold result
    Py_complex result;
    result.real = 0.0;
    result.imag = 0.0;

    if (iscomplex) {
        for (i=0;i<dims[0];i++) 
            for (j=0;j<dims[1];j++) 
                for (k=0;k<dims[2];k++) {
                    // a complex element is two consecutive doubles: (real, imag)
                    p = (double*)PyArray_GETPTR3(yarr, i,j,k);
                    result.real += *p;
                    result.imag += *(p+1);
                }
    } else {
        for (i=0;i<dims[0];i++) 
            for (j=0;j<dims[1];j++) 
                for (k=0;k<dims[2];k++) {
                    p = (double*)PyArray_GETPTR3(yarr, i,j,k);
                    result.real += *p;
                }
    }

    //multiply by step size
    result.real *= (dx*dy*dz);
    result.imag *= (dx*dy*dz);

    Py_CLEAR(yarr);

    //copy result into returnable type with new reference
    if (iscomplex) {
        return Py_BuildValue("D", &result);
    } else {
        return Py_BuildValue("d", result.real);
    }
}

Simply put that into a source file (we'll call it cintegrate.cxx) along with the module definition stuff, inserted at the bottom:

static PyMethodDef cintegrate_Methods[] = {
    {"integrate3",  integrate3, METH_VARARGS,
     "Pass 3D numpy array (double or complex) and dx,dy,dz step size. Returns Riemann integral"},
    {NULL, NULL, 0, NULL}        /* Sentinel */
};


static struct PyModuleDef module = {
   PyModuleDef_HEAD_INIT,
   "cintegrate",   /* name of module */
   NULL, /* module documentation, may be NULL */
   -1,       /* size of per-interpreter state of the module,
                or -1 if the module keeps state in global variables. */
   cintegrate_Methods
};

/* Module init function: its name must be PyInit_<module name> */
PyMODINIT_FUNC PyInit_cintegrate(void)
{
    import_array();   /* initialise the NumPy C API; returns NULL on failure */
    return PyModule_Create(&module);
}

Then compile it via setup.py, much like Walter's Boost example, with just a couple of obvious changes: replacing file.cc there with our file cintegrate.cxx, removing the Boost dependencies, and making sure the directory containing "numpy/arrayobject.h" is on the include path (numpy.get_include() returns it).

In Python you can then use it like this:

import cintegrate
import numpy as np

arr = np.random.randn(4,8,16) + 1j*np.random.randn(4,8,16)

# arbitrary step sizes: dx = 1.0, dy = 0.5, dz = 0.25
ans = cintegrate.integrate3(arr, 1.0, 0.5, .25)

This specific code hasn't been tested, but it is mostly copied from working code.
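One detail worth knowing when calling integrate3 with very large arrays: PyArray_FROM_OTF(..., NPY_ARRAY_IN_ARRAY) only hands the C code the original buffer when the input is already aligned, C-contiguous and of the requested dtype (complex128 for complex data in the example below); otherwise it first makes a converted copy, which costs time and memory. A rough Python-side analogue (an approximation, not what the C API literally calls) is np.ascontiguousarray with an explicit dtype; the helper will_be_copied is hypothetical.

```python
import numpy as np

def will_be_copied(arr, dtype=np.complex128):
    """Rough check: would converting to C-contiguous `dtype` require a copy?"""
    converted = np.ascontiguousarray(arr, dtype=dtype)
    return not np.shares_memory(converted, arr)

good = np.zeros((4, 8, 16), dtype=np.complex128)
transposed = good.transpose(2, 1, 0)                # non-contiguous view
single = np.zeros((4, 8, 16), dtype=np.complex64)   # different precision
```

Checking this before the call lets you create complex128, C-ordered arrays up front instead of paying for a hidden copy on every integration step.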


Problem description

3 solutions

Solution 1
Score 6, accepted, 2017-06-13 08:24:01

Solution 2
Score 1, 2017-06-07 15:06:00

Solution 3
Score 1, 2018-10-23 23:05:39
