Interpolating a numpy array to fit another array

Question

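Assume I have some_data of shape (1, n). I have new incoming_data of shape (1, n±x), where x is some positive integer much smaller than n. I would like to squeeze or stretch incoming_data so that it has the same length n. How might this be done using the SciPy stack?

Here is an example of what I'm trying to accomplish.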

# Stretch arr2 to arr1's shape while "filling in" interpolated value
arr1 = np.array([1, 5, 2, 3, 7, 2, 1])
arr2 = np.array([1, 5, 2, 3, 7, 1])
result
> np.array([1, 5, 2, 3, 6.x, 2.x, 1])  # of shape (arr1.shape)

Another example:

# Squeeze arr2 to arr1's shape while placing interpolated value.
arr1 = np.array([1, 5, 2, 3, 7, 2, 1])
arr2 = np.array([1, 5, 2, 3, 4, 7, 2, 1])
result
> np.array([1, 5, 2, 3.x, 7.x, 2.x, 1])  # of shape (arr1.shape)

Answer 1

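You can implement this simple compression or stretching of your data using scipy.interpolate.interp1d. I'm not saying it necessarily makes sense (the kind of interpolation you use makes a huge difference, and you will generally only get a reasonable result if you can correctly guess the behaviour of the underlying function), but you can do it.

The idea is to interpolate your original array over its indices as x values, then evaluate the interpolant on a sparser (or denser) x mesh while keeping its end points the same. So essentially you have to make a continuum approximation to your discrete data and resample it at the necessary points: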

import numpy as np
import scipy.interpolate as interp
import matplotlib.pyplot as plt

arr_ref = np.array([1, 5, 2, 3, 7, 1])  # shape (6,), reference
arr1 = np.array([1, 5, 2, 3, 7, 2, 1])  # shape (7,), to "compress"
arr2 = np.array([1, 5, 2, 7, 1])        # shape (5,), to "stretch"
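# interpolate each array over its own indices, then evaluate on a mesh with
# arr_ref.size points spanning the same index range (end points unchanged)
arr1_interp = interp.interp1d(np.arange(arr1.size),arr1)
arr1_compress = arr1_interp(np.linspace(0,arr1.size-1,arr_ref.size))
arr2_interp = interp.interp1d(np.arange(arr2.size),arr2)
arr2_stretch = arr2_interp(np.linspace(0,arr2.size-1,arr_ref.size))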

# plot the examples, assuming same x_min, x_max for all data
xmin,xmax = 0,1
fig,(ax1,ax2) = plt.subplots(ncols=2)
ax1.plot(np.linspace(xmin,xmax,arr1.size),arr1,'bo-',
         np.linspace(xmin,xmax,arr1_compress.size),arr1_compress,'rs')
ax2.plot(np.linspace(xmin,xmax,arr2.size),arr2,'bo-',
         np.linspace(xmin,xmax,arr2_stretch.size),arr2_stretch,'rs')
ax1.set_title('"compress"')
ax2.set_title('"stretch"')
plt.show()

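The resulting plot (two panels, titled "compress" and "stretch"):

In the plot, blue circles are the original data points and red squares are the interpolated ones (they overlap at the boundaries). As you can see, what I called compression and stretching is actually downsampling and upsampling, respectively, of the underlying (linear, by default) function. This is why I said you must be very careful with interpolation: you can get very wrong results if your expectations don't match your data.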
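The approach above can be wrapped in a small helper. This is only a sketch: resample_to_length is a hypothetical name, not a NumPy or SciPy function, and the kind argument is simply passed through to interp1d so that different interpolation types can be compared. Note that it resamples the whole array uniformly, so it will not reproduce the question's expected output, where the first few values are left untouched.

```python
import numpy as np
from scipy.interpolate import interp1d


def resample_to_length(arr, n, kind="linear"):
    """Resample a 1-D array to n points, keeping its end points fixed.

    Hypothetical helper for illustration; not part of NumPy or SciPy.
    """
    arr = np.asarray(arr, dtype=float)
    x_old = np.arange(arr.size)               # original indices as x values
    x_new = np.linspace(0, arr.size - 1, n)   # n points over the same index range
    return interp1d(x_old, arr, kind=kind)(x_new)


arr1 = np.array([1, 5, 2, 3, 7, 2, 1])        # reference, shape (7,)

# stretch a 6-element array and squeeze an 8-element array to arr1's length
stretched = resample_to_length([1, 5, 2, 3, 7, 1], arr1.size)
squeezed = resample_to_length([1, 5, 2, 3, 4, 7, 2, 1], arr1.size)

# same stretch with a cubic interpolant; interior values will generally differ
stretched_cubic = resample_to_length([1, 5, 2, 3, 7, 1], arr1.size, kind="cubic")
```

Comparing stretched with stretched_cubic on your own data is a quick way to see how much the result depends on the assumed underlying function.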

Answer 2

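There is another package that works very well for upsampling and downsampling: resampy. It has a simpler command than scipy.interpolate.interp1d, but it only uses a single interpolation function. As @Andras Deak said, you have to be careful when choosing interpolation functions.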

MWE：

import numpy as np
import resampy
from matplotlib import pyplot as plt

x_mesh = np.linspace(0,1,10)
short_arr = np.sin(x_mesh*2*np.pi)
plt.plot(short_arr)

interp_arr = resampy.resample(short_arr, 20, 100)
plt.plot(interp_arr)

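Two caveats:

1. resampy uses "band-limited sinc interpolation". Check the documentation for more information. It works best if your array originally came from data with local frequency components, e.g. sound, images, and other time-series data. It is used in some of the tensorflow examples on audio, which is what I use. I'm not sure whether your example arrays were kept small for demonstration purposes, but if that is truly the size of your arrays, interpolation may give poor results whichever method you use, whether linear, spline, or otherwise.
2. Your examples demonstrate more than interpolation. It seems you found a portion of the arrays that matched (e.g. [1,5,2,3]) and then interpolated the rest. Depending on whether you want to match the beginning of the array or an arbitrary number of patches, you may be asking for two methods: one to identify the correct portions of an array to interpolate, and one to interpolate those portions. If that's the case, look at numpy.isin for a basic method, or at Levenshtein distance for more generally matching a set of substrings.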
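As a rough illustration of the first of those two steps (identifying which portions already match), here is a sketch that uses difflib.SequenceMatcher from the Python standard library. This is a swapped-in technique, not the numpy.isin or Levenshtein approach mentioned above, and it assumes the values can be compared for exact equality (so it suits integer data better than noisy floats).

```python
from difflib import SequenceMatcher

arr1 = [1, 5, 2, 3, 7, 2, 1]       # reference
arr2 = [1, 5, 2, 3, 4, 7, 2, 1]    # incoming, one extra value

# For NumPy arrays, convert with .tolist() first so elements are plain ints.
matcher = SequenceMatcher(a=arr1, b=arr2, autojunk=False)

# Each block is (start in arr1, start in arr2, length); the final
# zero-length sentinel that difflib always appends is dropped.
blocks = [(m.a, m.b, m.size) for m in matcher.get_matching_blocks() if m.size > 0]
print(blocks)  # [(0, 0, 4), (4, 5, 3)]
```

Here the runs [1, 5, 2, 3] and [7, 2, 1] are reported as matching, which leaves the unmatched gap between them (the extra 4 in arr2) as the only region that would need to be squeezed or interpolated.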

Answer 1: 17 (accepted), 2016-06-28 00:44:47

Answer 2: 3, 2020-02-07 06:55:18
