Interpolating a MultiIndex pandas DataFrame

Question

I need to interpolate a MultiIndex DataFrame.

For example:

This is the main DataFrame:

a    b    c    result
1    1    1    6
1    1    2    9
1    2    1    8
1    2    2    11
2    1    1    7
2    1    2    10
2    2    1    9
2    2    2    12

I need to find the result for the following (a, b, c) point:

1.3    1.7    1.55

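So far I have been appending a pd.Series filled with NaN for each index level separately.

As you can see, this seems like a very inefficient approach.

I would be glad if someone could enlighten me.

P.S. I spent some time looking through SO, and if the answer is there, I missed it:

Fill multi-index Pandas DataFrame with interpolation

Resampling Within a Pandas MultiIndex

pandas multiindex dataframe, ND interpolation for missing values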

Algorithm:

Stage 1:

a    b    c    result
1    1    1    6
1    1    2    9
1    2    1    8
1    2    2    11
1.3    1    1    6.3
1.3    1    2    9.3
1.3    2    1    8.3
1.3    2    2    11.3
2    1    1    7
2    1    2    10
2    2    1    9
2    2    2    12

Stage 2:

a    b    c    result
1    1    1    6
1    1    2    9
1    2    1    8
1    2    2    11
1.3    1    1    6.3
1.3    1    2    9.3
1.3    1.7    1    7.7
1.3    1.7    2    10.7
1.3    2    1    8.3
1.3    2    2    11.3
2    1    1    7
2    1    2    10
2    2    1    9
2    2    2    12

Stage 3:

a    b    c    result
1    1    1    6
1    1    2    9
1    2    1    8
1    2    2    11
1.3    1    1    6.3
1.3    1    2    9.3
1.3    1.7    1    7.7
1.3    1.7    1.55    9.35
1.3    1.7    2    10.7
1.3    2    1    8.3
1.3    2    2    11.3
2    1    1    7
2    1    2    10
2    2    1    9
2    2    2    12
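
The three stages above amount to interpolating linearly along one index level at a time. A minimal sketch of that idea, assuming the frame from the question is indexed by (a, b, c); the helper lerp is a hypothetical name introduced here for illustration, not a pandas function:

```python
import pandas as pd

# Rebuild the example frame from the question, indexed by (a, b, c).
index = pd.MultiIndex.from_product([[1, 2], [1, 2], [1, 2]], names=["a", "b", "c"])
df = pd.DataFrame({"result": [6, 9, 8, 11, 7, 10, 9, 12]}, index=index)


def lerp(lo, hi, x0, x1, x):
    """Linear interpolation between lo (at x0) and hi (at x1), evaluated at x."""
    t = (x - x0) / (x1 - x0)
    return lo + t * (hi - lo)


# Stage 1: interpolate along level "a" at a = 1.3; the result is indexed by (b, c).
stage1 = lerp(df.xs(1, level="a")["result"], df.xs(2, level="a")["result"], 1, 2, 1.3)

# Stage 2: interpolate along level "b" at b = 1.7; the result is indexed by c.
stage2 = lerp(stage1.xs(1, level="b"), stage1.xs(2, level="b"), 1, 2, 1.7)

# Stage 3: interpolate along level "c" at c = 1.55; the result is a single number.
value = lerp(stage2.loc[1], stage2.loc[2], 1, 2, 1.55)

print(round(value, 2))
```

On this data the final value matches the 9.35 shown in stage 3. The sketch only covers a query point that lies between two known values on every level; a larger grid would first need the bracketing values on each level to be looked up.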

Answer 1

You can use scipy.interpolate.LinearNDInterpolator to do what you want. If the DataFrame df has a MultiIndex with levels 'a', 'b' and 'c' and a 'result' column, then:

from scipy.interpolate import LinearNDInterpolator as lNDI
print (lNDI(points=df.index.to_frame().values, values=df.result.values)([1.3, 1.7, 1.55]))
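
For reference, a self-contained version of the call above might look as follows. It is a sketch that assumes df is built exactly as in the question's table; the construction via MultiIndex.from_product is an assumption, since the original post does not show how df was created:

```python
import numpy as np
import pandas as pd
from scipy.interpolate import LinearNDInterpolator

# Assumed construction of the question's frame, indexed by (a, b, c).
index = pd.MultiIndex.from_product([[1, 2], [1, 2], [1, 2]], names=["a", "b", "c"])
df = pd.DataFrame({"result": [6, 9, 8, 11, 7, 10, 9, 12]}, index=index)

# The known points are the index tuples; the known values are the 'result' column.
points = df.index.to_frame().values
interpolator = LinearNDInterpolator(points, df["result"].values)

# Evaluate at one query point, passed as an array of shape (1, 3).
estimate = interpolator(np.array([[1.3, 1.7, 1.55]]))
print(estimate)
```

Query points outside the convex hull of the known points come back as NaN by default, so this approach interpolates but does not extrapolate.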

Now, if you have a DataFrame whose index contains all the (a, b, c) tuples you want to compute, you can do the following:

import pandas as pd
from scipy.interpolate import LinearNDInterpolator as lNDI

def pd_interpolate_MI(df_input, df_toInterpolate):
    # build the interpolation function from the known (a, b, c) points
    func_interp = lNDI(points=df_input.index.to_frame().values, values=df_input.result.values)
    # evaluate it at the unknown index tuples (this adds a 'result' column to df_toInterpolate in place)
    df_toInterpolate['result'] = func_interp(df_toInterpolate.index.to_frame().values)
    # return the known and interpolated rows together, sorted by index
    return pd.concat([df_input, df_toInterpolate]).sort_index()

Then, for example, with df and df_toI = pd.DataFrame(index=pd.MultiIndex.from_tuples([(1.3, 1.7, 1.55),(1.7, 1.4, 1.9)],names=df.index.names)), you get:

print (pd_interpolate_MI(df, df_toI))
              result
a   b   c           
1.0 1.0 1.00    6.00
        2.00    9.00
    2.0 1.00    8.00
        2.00   11.00
1.3 1.7 1.55    9.35
1.7 1.4 1.90   10.20
2.0 1.0 1.00    7.00
        2.00   10.00
    2.0 1.00    9.00
        2.00   12.00

Solution 1 (4, accepted, 2018-12-20 17:09:51)
