Interpolate NaN values in a large numpy array

I want to replace all NaN values in a numpy array (63*479060). I referred to the question "Interpolate NaN values in a numpy array" and tried the following code, but it does not produce the interpolated result (I think because the array is so large).

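import numpy as np
from scipy.interpolate import griddata

a = np.arange(30180780).reshape((63, 479060)).astype(float)
a[np.random.randint(2, size=(63, 479060)).astype(bool)] = np.nan  # np.NaN was removed in NumPy 2.0
x, y = np.indices(a.shape)
interp = np.array(a)
# Linearly interpolate every NaN cell from the scattered non-NaN cells
interp[np.isnan(interp)] = griddata(
    (x[~np.isnan(a)], y[~np.isnan(a)]),  # coordinates of the known points
    a[~np.isnan(a)],                     # known values
    (x[np.isnan(a)], y[np.isnan(a)]))    # coordinates to interpolate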

Is there an efficient way to interpolate NaN in such a large array? Thanks a lot.

Interpolation on unstructured meshes turns out to be very expensive. The SciPy code is somewhat optimized: it is written in Cython and uses the QHull library internally. The algorithm first constructs the interpolant by triangulating the input data and then performs a linear barycentric interpolation on each triangle. Computing the Delaunay triangulation (which runs in O(n log n) time) is very slow in this case despite the use of a specialized native C library: nearly all of the time is spent computing it.
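
To make the two phases concrete, here is a minimal sketch that performs them explicitly with scipy.spatial.Delaunay and scipy.interpolate.LinearNDInterpolator. The small 20 x 300 array, the seed, and the variable names are assumptions chosen for illustration; they stand in for the much larger array from the question.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator
from scipy.spatial import Delaunay

rng = np.random.default_rng(0)

# Small stand-in for the (63, 479060) array from the question (assumed size).
a = np.arange(20 * 300, dtype=float).reshape(20, 300)
a[rng.random(a.shape) < 0.5] = np.nan

mask = np.isnan(a)
x, y = np.indices(a.shape)
known = np.column_stack((x[~mask], y[~mask]))    # coordinates with a value
missing = np.column_stack((x[mask], y[mask]))    # coordinates to fill

# Phase 1: Delaunay triangulation of the known points (the expensive part).
tri = Delaunay(known)

# Phase 2: linear barycentric interpolation inside each triangle.
interpolator = LinearNDInterpolator(tri, a[~mask])
filled = a.copy()
filled[mask] = interpolator(missing)  # stays NaN outside the convex hull
```

Timing the Delaunay(...) call separately from the interpolator call on your own data is a simple way to check where the time goes. Points that fall outside the convex hull of the known points remain NaN.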

The code executed by QHull appears to be clearly sub-optimal: it is sequential, it is not vectorized using SIMD instructions, and the binaries do not benefit from the FMA instruction set. It is also generic: not specifically optimized for the 2D case. An optimized, specialized implementation could certainly be much faster, but it is hard and tedious to implement efficiently (even for quite skilled developers). Recompiling the QHull library with more aggressive compiler optimizations (such as -O3 and -march=native) should help.

Another possible optimization consists of splitting the space into N parts and performing the linear interpolation on each part independently in N separate threads. This can be faster because SciPy releases the Global Interpreter Lock (GIL) during this computation, and the GIL is usually what prevents threads from speeding up compute-bound operations. That being said, it is not easy to split the space correctly because some points can lie on a boundary. In practice, one needs to include an additional ghost area in each part of the unstructured mesh to do that correctly (which is unfortunately not trivial to do).
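
A rough sketch of that idea follows. It splits the array into column blocks, gives each block a ghost area of `halo` extra columns on each side, and runs griddata on each block in its own thread. The function names fill_block and fill_nan_threaded, the `halo` width, and the column-wise split are assumptions made for this illustration, not part of SciPy. Because each block is triangulated separately, values near block edges may differ slightly from those of a single global triangulation, and any speed-up depends on the GIL being released as described above.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from scipy.interpolate import griddata


def fill_block(a, start, stop, halo):
    """Fill NaNs in columns [start, stop) using known points from a wider window."""
    lo = max(0, start - halo)
    hi = min(a.shape[1], stop + halo)
    window = a[:, lo:hi]                      # block plus its ghost area
    mask = np.isnan(window)
    x, y = np.indices(window.shape)
    # Only NaNs inside the core columns are interpolated; the ghost area
    # contributes known points but is not written back.
    core = mask & (y >= start - lo) & (y < stop - lo)
    out = window.copy()
    if core.any() and (~mask).sum() >= 3:
        out[core] = griddata(
            (x[~mask], y[~mask]), window[~mask],
            (x[core], y[core]), method="linear")
    return out[:, start - lo:stop - lo]


def fill_nan_threaded(a, n_parts=4, halo=8):
    """Hypothetical helper: interpolate NaNs block by block in n_parts threads."""
    edges = np.linspace(0, a.shape[1], n_parts + 1).astype(int)
    bounds = list(zip(edges[:-1], edges[1:]))
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        blocks = pool.map(lambda b: fill_block(a, b[0], b[1], halo), bounds)
        return np.hstack(list(blocks))
```

A call such as fill_nan_threaded(a, n_parts=8, halo=16) returns a new array of the same shape. The halo should be wider than the largest gap of NaNs you expect near a block edge; otherwise some cells close to the edge can stay NaN.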

Another solution consists of using approximations. Indeed, you can find the K nearest points using a Ball-Tree or KD-tree algorithm (a Ball-Tree is implemented in scikit-learn, and SciPy provides a KD-tree) and then perform a linear interpolation based on the gathered points.
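
As one possible reading of this approach, the sketch below uses scipy.spatial.cKDTree (a KD-tree rather than a Ball-Tree) to find the k nearest known cells of every NaN cell, then combines them with inverse-distance weights. Note that inverse-distance weighting is a simple stand-in chosen for this example, not the linear interpolation mentioned above, and the function name fill_nan_knn and the default k=4 are assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree


def fill_nan_knn(a, k=4):
    """Hypothetical helper: approximate NaN cells from their k nearest known cells.

    Assumes k >= 2 and that the array holds at least k non-NaN values.
    """
    mask = np.isnan(a)
    x, y = np.indices(a.shape)
    known = np.column_stack((x[~mask], y[~mask]))
    missing = np.column_stack((x[mask], y[mask]))

    tree = cKDTree(known)
    dist, idx = tree.query(missing, k=k)      # both have shape (n_missing, k)

    # Inverse-distance weights. Distances are never zero here because a
    # missing cell and a known cell cannot share the same grid coordinates.
    weights = 1.0 / dist
    weights /= weights.sum(axis=1, keepdims=True)

    filled = a.copy()
    filled[mask] = (weights * a[~mask][idx]).sum(axis=1)
    return filled
```

Unlike the triangulation-based method, this fills every NaN cell, including those outside the convex hull of the known points, at the cost of a less accurate result.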

Finally, a last solution consists of reimplementing the SciPy method using possibly more optimized libraries such as CGAL, which is known to be quite fast (there is a Python binding, but I am not sure about its performance). It can easily compute the triangulation of the unstructured mesh (which should take a few seconds if optimized). Then, one can match the facets with the points using a KD tree. That being said, CGAL appears to support interpolation directly.
