
Interpolating a numpy array to fit another array

Say I have some_data of shape (1, n). I have new incoming_data of shape (1, n±x), where x is some positive integer much smaller than n. I would like to squeeze or stretch incoming_data so that it has length n. How might this be done using the SciPy stack?

Here's an example of what I'm trying to accomplish.

# Stretch arr2 to arr1's shape while "filling in" interpolated values
arr1 = np.array([1, 5, 2, 3, 7, 2, 1])
arr2 = np.array([1, 5, 2, 3, 7, 1])
result
> np.array([1, 5, 2, 3, 6.x, 2.x, 1])  # of shape (arr1.shape)

As another example:

# Squeeze arr2 to arr1's shape while placing interpolated value.
arr1 = np.array([1, 5, 2, 3, 7, 2, 1])
arr2 = np.array([1, 5, 2, 3, 4, 7, 2, 1])
result
> np.array([1, 5, 2, 3.x, 7.x, 2.x, 1])  # of shape (arr1.shape)

You can implement this simple compression or stretching of your data using scipy.interpolate.interp1d. I'm not saying it necessarily makes sense (the kind of interpolation you use makes a huge difference, and you'll generally only get a reasonable result if you can correctly guess the behaviour of the underlying function), but you can do it.

The idea is to interpolate your original array over its indices as x values, then evaluate the interpolant on a sparser (to compress) or denser (to stretch) x mesh, while keeping its end points the same. So essentially you have to do a continuum approximation to your discrete data, and resample that at the necessary points:

import numpy as np
import scipy.interpolate as interp
import matplotlib.pyplot as plt

arr_ref = np.array([1, 5, 2, 3, 7, 1])  # shape (6,), reference
arr1 = np.array([1, 5, 2, 3, 7, 2, 1])  # shape (7,), to "compress"
arr2 = np.array([1, 5, 2, 7, 1])        # shape (5,), to "stretch"
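# Build an interpolant over each array's own indices (interp1d is linear by default)
arr1_interp = interp.interp1d(np.arange(arr1.size), arr1)
# Resample on arr_ref.size evenly spaced points spanning the same index range
arr1_compress = arr1_interp(np.linspace(0, arr1.size - 1, arr_ref.size))
arr2_interp = interp.interp1d(np.arange(arr2.size), arr2)
arr2_stretch = arr2_interp(np.linspace(0, arr2.size - 1, arr_ref.size))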

# plot the examples, assuming same x_min, x_max for all data
xmin,xmax = 0,1
fig,(ax1,ax2) = plt.subplots(ncols=2)
ax1.plot(np.linspace(xmin,xmax,arr1.size),arr1,'bo-',
         np.linspace(xmin,xmax,arr1_compress.size),arr1_compress,'rs')
ax2.plot(np.linspace(xmin,xmax,arr2.size),arr2,'bo-',
         np.linspace(xmin,xmax,arr2_stretch.size),arr2_stretch,'rs')
ax1.set_title('"compress"')
ax2.set_title('"stretch"')
plt.show()

The resulting plot:

[Figure: two side-by-side panels titled "compress" and "stretch"]

In the plots, blue circles are the original data points, and red squares are the interpolated ones (these overlap at the boundaries). As you can see, what I called compressing and stretching is actually downsampling and upsampling, respectively, of an underlying (linear, by default) function. This is why I said you must be very careful with interpolation: you can get very wrong results if your expectations don't match your data.
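To tie this back to the arrays in the question, here is a minimal sketch that wraps the same index-based linear resampling in a helper. The name resample_to_length is hypothetical (not part of NumPy or SciPy), and np.interp is used in place of interp1d only because it performs the same linear interpolation with one call.

```python
import numpy as np

def resample_to_length(arr, n):
    """Linearly resample a 1-D array to n points, keeping both end points.

    Hypothetical helper for illustration; not a NumPy/SciPy function.
    """
    arr = np.asarray(arr, dtype=float).ravel()  # also flattens shape (1, m) input
    old_x = np.arange(arr.size)                 # original sample positions
    new_x = np.linspace(0, arr.size - 1, n)     # n evenly spaced positions
    return np.interp(new_x, old_x, arr)

arr1 = np.array([1, 5, 2, 3, 7, 2, 1])
stretched = resample_to_length([1, 5, 2, 3, 7, 1], arr1.size)
squeezed = resample_to_length([1, 5, 2, 3, 4, 7, 2, 1], arr1.size)
print(stretched)
print(squeezed)
```

Both results have arr1's length and keep the original end points, but the interior values are resampled uniformly, so the leading [1, 5, 2, 3] run from the question is not preserved.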

There's another package that works very well for upsampling and downsampling: resampy. It has a simpler command than scipy.interpolate.interp1d but only uses a single interpolation function. As @Andras Deak said, you have to be careful in choosing interpolation functions.

MWE:

import numpy as np
import resampy
from matplotlib import pyplot as plt

x_mesh = np.linspace(0,1,10)
short_arr = np.sin(x_mesh*2*np.pi)
plt.plot(short_arr)

[Figure: coarse plot]

interp_arr = resampy.resample(short_arr, 20, 100)
plt.plot(interp_arr)

[Figure: fine plot]

Two words of caution:

  1. resampy uses a "band-limited sinc interpolation". Check the documentation for more info. It works best if your array originally came from data with local frequency components, e.g. sound, images, and other time-series data. It's used in some of the TensorFlow examples on audio, which is what I use. I'm not sure whether your example array was small for demonstration purposes, but if that truly is the size of your array, interpolating may be bad whatever method you use, linear, spline, or otherwise.

  2. Your examples demonstrated more than interpolation. It seems you found a portion of the arrays that matched (e.g. [1,5,2,3]) then interpolated the rest. Depending on whether you want to match the beginning of the array or an arbitrary number of patches, you may be asking for two methods: one to identify the correct portions of an array to interpolate, and one to interpolate those portions. If that's the case, look at numpy.isin for a basic method or Levenshtein distance for more generally matching a set of substrings.
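As a rough illustration of the "identify the matching portions" step, here is a sketch using difflib.SequenceMatcher from the standard library. This is a swapped-in alignment technique, not numpy.isin and not a Levenshtein distance, and the helper name matching_runs is hypothetical. It only finds the runs that agree exactly; interpolating the gaps between them is left as a separate step.

```python
from difflib import SequenceMatcher

def matching_runs(reference, incoming):
    """Return (ref_start, inc_start, length) for each run that matches exactly.

    Hypothetical helper for illustration only.
    """
    matcher = SequenceMatcher(None, list(reference), list(incoming), autojunk=False)
    # get_matching_blocks() ends with a zero-length sentinel block; drop it
    return [(m.a, m.b, m.size) for m in matcher.get_matching_blocks() if m.size > 0]

arr1 = [1, 5, 2, 3, 7, 2, 1]
arr2 = [1, 5, 2, 3, 4, 7, 2, 1]
runs = matching_runs(arr1, arr2)
print(runs)  # [(0, 0, 4), (4, 5, 3)]
```

Here the first run is the shared [1, 5, 2, 3] prefix and the second is the shared [7, 2, 1] suffix, so only the extra 4 in arr2 falls outside a matched run. Exact matching like this is fragile for noisy floating-point data, where a tolerance-based comparison would be needed instead.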
