Reindex a Pandas DataFrame with interpolation

Question

I have a pandas DataFrame with a DatetimeIndex and the columns "Threshold" and "Path":

                             Path  Threshold
2020-12-11 04:00:25.729  0.000104  -1.107422
2020-12-11 04:00:25.731  0.000387  -1.107422
2020-12-11 04:00:25.733  0.000899  -1.107422
2020-12-11 04:00:25.735  0.001561  -1.117676
2020-12-11 04:00:25.737  0.002272  -1.117676
...                           ...        ...
2020-12-11 04:01:03.063  9.085985  -1.209961
2020-12-11 04:01:03.065  9.085985  -1.209961
2020-12-11 04:01:03.067  9.085985  -1.209961
2020-12-11 04:01:03.069  9.085985  -1.199707
2020-12-11 04:01:03.071  9.085985  -1.199707

Now I want to create a new DataFrame indexed on a linearly spaced version of "Path", i.e.

>>> np.arange(df["Path"].min(), df["Path"].max(), 0.05)
array([1.040000e-04, 5.010400e-02, 1.001040e-01, 1.501040e-01,
       2.001040e-01, 2.501040e-01, ...

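The values in "Path" are monotonic (but not strictly monotonic). As the column of this new DataFrame I want the appropriately interpolated values of "Threshold", but I have not managed to do this with pandas' interpolate or NumPy's interp. Is there a way to do this?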
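For reference, a minimal sketch of actual linear interpolation with np.interp. np.interp expects increasing sample points, so repeated "Path" values are collapsed first; taking the mean "Threshold" of each repeated "Path" is an assumption (keeping the first row would also work). The toy data and the 0.125 step are hypothetical, chosen so the numbers come out exact.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: "Path" is non-decreasing and has repeated values,
# like the real column in the question.
idx = pd.date_range("2020-12-11 04:00:25.729", periods=6, freq="2ms")
df = pd.DataFrame({"Path": [0.0, 0.25, 0.25, 0.5, 0.75, 0.75],
                   "Threshold": [-1.0, -1.5, -2.0, -2.5, -3.0, -3.5]},
                  index=idx)

# Collapse repeated Path values (assumption: mean of their Threshold values);
# groupby also sorts the keys, giving strictly increasing sample points.
dedup = df.groupby("Path", sort=True)["Threshold"].mean()

# Linearly spaced grid built the same way as in the question.
grid = np.arange(df["Path"].min(), df["Path"].max(), 0.125)

# Linear interpolation of Threshold at every grid point.
out = pd.DataFrame(
    {"Threshold": np.interp(grid, dedup.index.to_numpy(), dedup.to_numpy())},
    index=pd.Index(grid, name="Path"),
)
print(out)
```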

Answer 1

One idea is to use merge_asof:

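import numpy as np
import pandas as pd

a = np.arange(df["Path"].min(), df["Path"].max(), 0.05)

# merge_asof needs both keys sorted: the grid is increasing and "Path" is
# non-decreasing. With the grid on the left there is one row per grid value,
# matched to the row whose "Path" is nearest (no interpolation).
df1 = pd.merge_asof(pd.DataFrame({'new': a}),
                    df.reset_index(),
                    left_on='new',
                    right_on='Path',
                    direction='nearest')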
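A self-contained sketch of the nearest-value lookup with merge_asof on hypothetical toy data. The grid is the left frame, so the result has one row per grid value; each "Threshold" is copied from the row with the nearest "Path", not interpolated.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: "Path" is non-decreasing with one repeated value.
idx = pd.date_range("2020-12-11 04:00:25.729", periods=5, freq="2ms")
df = pd.DataFrame({"Path": [0.0, 0.3, 0.3, 0.6, 0.9],
                   "Threshold": [-1.0, -1.5, -2.0, -2.5, -3.0]},
                  index=idx)

grid = np.arange(0.0, 0.9, 0.2)  # about 0.0, 0.2, 0.4, 0.6, 0.8

# Both keys must be sorted; direction="nearest" picks the closest "Path".
res = pd.merge_asof(pd.DataFrame({"new": grid}),
                    df.reset_index(),
                    left_on="new", right_on="Path",
                    direction="nearest")
print(res[["new", "Path", "Threshold"]])
```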

Another idea is to drop the duplicates and use DataFrame.reindex:

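# Keep the first row for each repeated "Path" so the index is unique,
# then take the nearest row for every grid value in a.
df2 = (df.drop_duplicates('Path')
         .reset_index()
         .set_index('Path')
         .reindex(a, method='nearest'))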
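Both ideas above copy the nearest row rather than interpolating. If linear interpolation is wanted within pandas, one possible sketch (hypothetical toy data; duplicates dropped as in the second idea) adds the grid values to the "Path" index and calls Series.interpolate(method="index"), which interpolates along the numeric index values, then keeps only the grid points.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: "Path" is non-decreasing with one repeated value.
idx = pd.date_range("2020-12-11 04:00:25.729", periods=5, freq="2ms")
df = pd.DataFrame({"Path": [0.0, 0.25, 0.25, 0.5, 0.75],
                   "Threshold": [-1.0, -1.5, -2.0, -2.5, -3.0]},
                  index=idx)

grid = np.arange(0.0, 0.75, 0.125)  # 0.0, 0.125, ..., 0.625 (stop excluded)

# Keep the first row per repeated "Path" so the index is unique.
s = df.drop_duplicates("Path").set_index("Path")["Threshold"]

# Insert the grid points into the index, fill the new NaNs by linear
# interpolation along the index values, then keep only the grid points.
res = (s.reindex(s.index.union(grid))
        .interpolate(method="index")
        .reindex(grid))
print(res)
```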

Answer 1: accepted, 2020-12-16 09:35:55
