
Improve performance of converting numpy array to MATLAB double

Calling MATLAB from Python is bound to cost some performance, which I could avoid by rewriting (a lot of) code in Python. That isn't a realistic option for me, but it annoys me that a huge loss of efficiency lies in the simple conversion from a numpy array to a MATLAB double.

I'm talking about the following conversion from data1 to data1m:

data1 = np.random.uniform(low = 0.0, high = 30000.0, size = (1000000,))
data1m = matlab.double(list(data1))

Here matlab.double comes from MathWorks' own MATLAB package / engine. The second line of code takes 20 s on my system, which seems like far too much for a conversion that does nothing other than make the numbers 'edible' for MATLAB.
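Part of the cost can be seen before MATLAB is involved at all. The following is a minimal sketch, assuming only NumPy is installed: list(data1) iterates the array and wraps every element in a numpy.float64 object, whereas data1.tolist() produces plain Python floats in a single call. The helper name time_call is made up for illustration, the timings vary per machine, and the sketch does not measure matlab.double itself.

```python
import time

import numpy as np


def time_call(fn):
    """Return (result, elapsed seconds) for a zero-argument callable."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


data1 = np.random.uniform(low=0.0, high=30000.0, size=(1000000,))

# list() iterates the array and yields numpy.float64 scalars
as_list, t_list = time_call(lambda: list(data1))
# tolist() converts to built-in Python floats in one call
as_tolist, t_tolist = time_call(lambda: data1.tolist())

print(f"list(data1):    {t_list:.3f} s, element type {type(as_list[0]).__name__}")
print(f"data1.tolist(): {t_tolist:.3f} s, element type {type(as_tolist[0]).__name__}")
```

If the second variant is noticeably faster on your system, swapping list(data1) for data1.tolist() is a cheap first step, although it is unlikely to account for the full 20 s.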

So basically I'm looking for the opposite of the trick given here, which works for converting MATLAB output back to Python.

Passing numpy arrays efficiently

Take a look at the file mlarray_sequence.py in the folder PYTHONPATH\Lib\site-packages\matlab\_internal. There you will find the construction of the MATLAB array object. The performance problem comes from copying data with loops within the generic_flattening function.

To avoid this behavior we will edit the file a bit. This fix should work on complex and non-complex datatypes.

  1. Make a backup of the original file in case something goes wrong.

  2. Add import numpy as np to the other imports at the beginning of the file.

  3. In line 38 you should find:

     init_dims = _get_size(initializer)

    Replace this with:

     try:
         init_dims = initializer.shape
     except:
         init_dims = _get_size(initializer)

  4. In line 48 you should find:

     if is_complex:
         complex_array = flat(self, initializer, init_dims, typecode)
         self._real = complex_array['real']
         self._imag = complex_array['imag']
     else:
         self._data = flat(self, initializer, init_dims, typecode)

    Replace this with:

     if is_complex:
         try:
             self._real = array.array(typecode, np.ravel(initializer, order='F').real)
             self._imag = array.array(typecode, np.ravel(initializer, order='F').imag)
         except:
             complex_array = flat(self, initializer, init_dims, typecode)
             self._real = complex_array['real']
             self._imag = complex_array['imag']
     else:
         try:
             self._data = array.array(typecode, np.ravel(initializer, order='F'))
         except:
             self._data = flat(self, initializer, init_dims, typecode)

Now you can pass a numpy array directly to the MATLAB array creation method.

data1 = np.random.uniform(low=0.0, high=30000.0, size=(1000000,))
# faster
data1m = matlab.double(data1)
# or slower method
data1m = matlab.double(data1.tolist())

data2 = np.random.uniform(low=0.0, high=30000.0, size=(1000000,)).astype(np.complex128)
# faster
data2m = matlab.double(data2, is_complex=True)
# or slower method
data2m = matlab.double(data2.tolist(), is_complex=True)

The performance of MATLAB array creation increases by a factor of 15, and the interface is easier to use now.
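The core of the patch is replacing a Python-level copy loop with np.ravel(..., order='F'), which does the column-major reordering in compiled code. The sketch below illustrates that difference on a small 2-D array. Note that flatten_with_loops is a hypothetical stand-in written for this comparison, not the actual generic_flattening source, and both function names are invented here.

```python
import array

import numpy as np


def flatten_with_loops(a):
    """Column-major flatten of a 2-D array using Python-level loops (slow)."""
    rows, cols = a.shape
    out = array.array("d")
    for j in range(cols):          # walk column by column, as MATLAB stores data
        for i in range(rows):
            out.append(a[i, j])
    return out


def flatten_with_ravel(a):
    """Same result; NumPy performs the column-major reordering in C."""
    return array.array("d", np.ravel(a, order="F"))


m = np.arange(6, dtype=np.float64).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
print(flatten_with_loops(m).tolist())
print(flatten_with_ravel(m).tolist())
```

Both calls produce the same column-major sequence; only the amount of work done in the Python interpreter differs, which is where a speedup on large arrays would come from.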

While awaiting better suggestions, I'll post the best trick I've come up with so far. It comes down to saving the data to a file with scipy.io.savemat and then loading this file in MATLAB.

This is not the prettiest hack, and it requires some care to ensure that different processes relying on the same script don't end up writing and loading each other's .mat files, but the performance gain is worth it for me.

As a test case I wrote two simple, almost identical MATLAB functions that take two numpy arrays (I tested with length 1000000) and one int as input.

function d = test(x, y, fs_signal)
d = sum((x + y))./double(fs_signal);

function d = test2(path)
load(path)
d = sum((x + y))./double(fs_signal);

The function test requires conversion, while test2 requires saving.
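The Python side of the test2 route could look roughly like the sketch below. It also shows one way to keep concurrent processes from touching each other's .mat files: each call gets its own file name built from the process ID and a random UUID. The names unique_mat_path and call_test2 are invented for this example, eng is assumed to be an already started MATLAB engine with test2 on its path, and SciPy is imported lazily so the path helper works without it.

```python
import os
import tempfile
import uuid


def unique_mat_path(directory=None):
    """Return a .mat path that other processes are very unlikely to reuse."""
    directory = directory or tempfile.gettempdir()
    name = f"matlab_args_{os.getpid()}_{uuid.uuid4().hex}.mat"
    return os.path.join(directory, name)


def call_test2(eng, x, y, fs_signal):
    """Save the inputs, let MATLAB load them through test2, then clean up."""
    from scipy.io import savemat  # lazy import: only needed when actually saving

    path = unique_mat_path()
    # The keys must match the variable names that test2 expects after load(path)
    savemat(path, {"x": x, "y": y, "fs_signal": fs_signal})
    try:
        return eng.test2(path)
    finally:
        os.remove(path)  # do not leave temporary files behind


# Hypothetical usage (requires MATLAB and its Python engine):
#   import matlab.engine
#   eng = matlab.engine.start_matlab()
#   d = call_test2(eng, data1, data2, 1000)
```

Removing the file in a finally block keeps the temporary directory clean even when the MATLAB call raises.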

Testing test: Converting the two numpy arrays takes about 40 s on my system. The total time to prepare for and run test comes to 170 s.

Testing test2: Saving the arrays and the int takes about 0.35 s on my system. Surprisingly, loading the .mat file in MATLAB is extremely efficient (or, more surprisingly, it is extremely inefficient at dealing with its doubles)... The total time to prepare for and run test2 comes to 0.38 s.

That's a performance gain of almost 450x (170 s / 0.38 s ≈ 447).

My situation was a bit different (a Python script called from MATLAB), but for me converting the ndarray into an array.array massively sped up the process. Basically it is very similar to Alexandre Chabot's solution, but without the need to alter any files:

# untested, i.e. only deduced from my "MATLAB calls Python" situation
import array
import numpy
import matlab

data1 = numpy.random.uniform(low=0.0, high=30000.0, size=(1000000,))
ar = array.array('d', data1.flatten('F').tolist())
p = matlab.double(ar)
p.reshape(data1.shape)  # I am definitely not sure this part works like that

At least when done from MATLAB, the combination of array.array and double is relatively fast. Tested with MATLAB 2016b + Python 3.5.4 64-bit.
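If array.array is the fast path, the intermediate .tolist() may be avoidable as well, because array.array can be filled straight from the raw bytes of the ndarray. The sketch below is an unverified variation, not something tested against MATLAB: the function name ndarray_to_array is made up, and whether matlab.double handles the resulting object any differently is an assumption.

```python
import array

import numpy as np


def ndarray_to_array(a):
    """Copy an ndarray into an array.array('d') in column-major order."""
    # Force float64 so the element size matches typecode 'd' (8 bytes)
    flat = np.ravel(np.asarray(a, dtype=np.float64), order="F")
    out = array.array("d")
    out.frombytes(flat.tobytes())  # bulk copy, no per-element Python objects
    return out


m = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
ar = ndarray_to_array(m)
print(ar.tolist())
```

The design choice here is to let NumPy handle the dtype conversion and the Fortran-order layout, so the only Python-level work left is a single bulk copy.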
