Python MemoryError in Scipy Radial Basis Function (scipy.interpolate.rbf)

I'm trying to interpolate a not-so-large (~10,000 samples) point cloud representing a 2D surface, using Scipy's Radial Basis Function (Rbf). I got some good results, but with my latest datasets I'm consistently getting MemoryError, even though the error appears almost instantly during execution (the RAM is obviously not being eaten up).

I decided to hack a copy of the rbf.py file from Scipy, starting by filling it with print statements, which have been very useful. By decomposing the _euclidean_norm method line by line, like this:

def _euclidean_norm(self, x1, x2):
    d = x1 - x2
    s = d**2
    su = s.sum(axis=0)
    sq = sqrt(su)
    return sq

I get the error in the first line:

File "C:\MyRBF.py", line 68, in _euclidean_norm
    d = x1 - x2
MemoryError

That norm is called on an array X1 of the form [[x1, y1], [x2, y2], [x3, y3], ..., [xn, yn]] and on X2, which is X1 transposed by the following method of the Rbf class (already modified by me for debugging purposes):

def _call_norm(self, x1, x2):
    print x1.shape
    print x2.shape
    print

    if len(x1.shape) == 1:
        x1 = x1[newaxis, :]
    if len(x2.shape) == 1:
        x2 = x2[newaxis, :]
    x1 = x1[..., :, newaxis]
    x2 = x2[..., newaxis, :]

    print x1.shape
    print x2.shape
    print

    return self._euclidean_norm(x1, x2)

Please notice that I print the shapes of the inputs. With my current dataset, this is what I get (I added the comments manually):

(2, 10744)         ## Input array of 10744 x,y pairs
(2, 10744)         ## The same array, which is to be "reshaped/transposed"

(2, 10744, 1)      ## The first "reshaped/transposed" form of the array
(2, 1, 10744)      ## The second "reshaped/transposed" form of the array

The rationale, according to the documentation, is to get "a matrix of the distances from each point in x1 to each point in x2", which means, since the two arrays are the same, a matrix of distances between every pair of points in the input array (which contains the X and Y dimensions).

I tested the operation manually with much smaller arrays (shapes (2,5,1) and (2,1,5), for example) and the subtraction works.
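A minimal sketch of that small-array check, assuming NumPy is imported as np. The five random points and the variable names pts, d and dist are made up for illustration; the reshaping and the norm follow the two methods quoted above:

```python
import numpy as np

# Hypothetical small dataset: 5 random (x, y) points stored with shape (2, 5),
# the same layout as the real (2, 10744) input.
rng = np.random.RandomState(0)
pts = rng.rand(2, 5)

# Same reshaping as in _call_norm.
x1 = pts[..., :, np.newaxis]   # shape (2, 5, 1)
x2 = pts[..., np.newaxis, :]   # shape (2, 1, 5)

# Same steps as in _euclidean_norm.
d = x1 - x2                            # broadcasts to shape (2, 5, 5)
dist = np.sqrt((d ** 2).sum(axis=0))   # pairwise distances, shape (5, 5)

print(d.shape)
print(dist.shape)
```

With n points the intermediate array d holds 2*n*n values, so its size grows quadratically with the number of samples.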

How can I find out why it is not working with my dataset? Is there any other obvious error? Should I check my dataset for some form of ill-conditioning, or perform some pre-processing on it? I think it is well-conditioned, since I can plot it in 3D and the point cloud is visually very well formed.

Any help would be very much appreciated.

Thanks for reading.

Your dataset should be fine: the error appears because you don't have enough RAM to store the result of the subtraction.

According to the broadcasting rules, the result will have shape

 (2, 10744,     1)
-(2,     1, 10744)
------------------
 (2, 10744, 10744)
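One way to confirm that shape without allocating the full result is sketched here, assuming NumPy is imported as np. The names n, x1, x2 and result_shape are illustrative; np.broadcast only builds a lightweight broadcast object, so the two small inputs are the only arrays created:

```python
import numpy as np

n = 10744

# Placeholder inputs with the shapes printed by _call_norm; their contents
# are irrelevant here, only the shapes matter.
x1 = np.empty((2, n, 1))
x2 = np.empty((2, 1, n))

# np.broadcast applies the broadcasting rules without computing x1 - x2.
result_shape = np.broadcast(x1, x2).shape
print(result_shape)  # (2, 10744, 10744)
```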

Assuming these are arrays of dtype float64, you need 2*10744**2*8 bytes = 1.72 GiB of free memory. If there isn't enough free memory, numpy won't be able to allocate the output array and will fail immediately with the error you see.
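The arithmetic can be reproduced with a short sketch, again assuming float64 data. The variable names shape, itemsize, n_bytes and gib are illustrative:

```python
import numpy as np

shape = (2, 10744, 10744)                  # broadcast shape of x1 - x2
itemsize = np.dtype(np.float64).itemsize   # 8 bytes per element

# Use a 64-bit product so the element count cannot overflow on any platform.
n_bytes = int(np.prod(shape, dtype=np.int64)) * itemsize
gib = n_bytes / 2.0 ** 30

print(n_bytes)         # 1846936576
print(round(gib, 2))   # 1.72
```

This counts only the result of the subtraction. In the decomposed _euclidean_norm shown in the question, s = d**2 would need a second array of the same size while d is still alive.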
