Interpolation on pandas DataFrame columns

I need to interpolate between two columns of a pandas.DataFrame to fill the column that sits between them. Here are a few rows of my data frame; the column to be filled is col2:

col1  col2  col3
2.35    1   2.37
2.47    1   2.49
2.51    1   2.53
2.57    1   2.58
2.54    1   2.57

For the interpolation I want to use numpy.interp(x, xp, fp), but I can't figure out how to organize my data so that I can call it. The difficulty is that the interpolation has to be done between col1 and col3 separately for each row. For example, for the first row I need it to look like this:

xp=[1,3]
fp=[2.35,2.37]
x=2
y=numpy.interp(x,xp,fp)

and then fill the first row of col2 with y. I need to repeat that for every row. How can I do this?

This iterates over every row and replaces the value in the cell between the other two. However, the interpolation does not seem to work for me. I don't have much experience with numpy.interp, and I couldn't find an easy fix online. The numpy.interp line is the only one that does not change the values. (I don't know what xp or x are for, so I kept them as they were.)

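import numpy

xp = [1, 3]  # x-positions of col1 and col3
x = 2        # x-position of col2
for rowNr in range(len(df.index)):  # df is the data frame from the question
    # y-values for this row, taken from col1 and col3
    fp = [df.iat[rowNr, 0], df.iat[rowNr, 2]]
    df.iat[rowNr, 1] = numpy.interp(x, xp, fp)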
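
One possible reason the values appear unchanged, offered as an assumption rather than a confirmed diagnosis: if col2 holds integers, writing a float such as 2.36 into it cell by cell may be truncated or may trigger a dtype warning, depending on the pandas version. A minimal sketch that converts col2 to float before running the same loop, using the sample rows from the question:

```python
import numpy as np
import pandas as pd

# Sample rows from the question; col2 starts out as integers.
df = pd.DataFrame({
    "col1": [2.35, 2.47, 2.51, 2.57, 2.54],
    "col2": [1, 1, 1, 1, 1],
    "col3": [2.37, 2.49, 2.53, 2.58, 2.57],
})

# Assumption: an integer col2 is what keeps the floats from being stored.
df["col2"] = df["col2"].astype(float)

xp = [1, 3]  # x-positions assigned to col1 and col3
x = 2        # x-position assigned to col2
for row_nr in range(len(df.index)):
    fp = [df.iat[row_nr, 0], df.iat[row_nr, 2]]
    df.iat[row_nr, 1] = np.interp(x, xp, fp)

print(df)
```

With col2 stored as float, each interpolated value is written without any integer conversion.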

As written, the x-values are static (unless I misunderstand your problem), with values of 1 and 3. You want to interpolate linearly between them, using two y-values that change from row to row. Because x = 2 is the midpoint of 1 and 3, the linearly interpolated value is simply the average of the two y-values. Don't overlook a simple, obvious solution in favor of something fancy (advice I try to remember all the time).

df.col2 = df[["col1", "col3"]].mean(axis=1)
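
To check that the mean really matches numpy.interp at the midpoint, here is a small sketch on the sample rows from the question. The name `reference` is introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# Sample rows from the question.
df = pd.DataFrame({
    "col1": [2.35, 2.47, 2.51, 2.57, 2.54],
    "col2": [1, 1, 1, 1, 1],
    "col3": [2.37, 2.49, 2.53, 2.58, 2.57],
})

# Vectorized fill: the row-wise mean of the two outer columns.
df["col2"] = df[["col1", "col3"]].mean(axis=1)

# Row-by-row reference using numpy.interp at x = 2 with xp = [1, 3].
reference = [np.interp(2, [1, 3], [a, b]) for a, b in zip(df["col1"], df["col3"])]

# True when both approaches agree up to floating-point tolerance.
matches = np.allclose(df["col2"], reference)
print(matches)
```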

BEGIN EDIT

Andre's solution should work (I haven't tested it myself). However, it iterates over every row, which can be slow. There is also a simple mathematical solution that operates on whole columns as arrays, which should be faster.

Linear interpolation follows the general form:

y = y0 + (x - x0) * (y1 - y0) / (x1 - x0)
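
For the values used in the question (x = 2, x0 = 1, x1 = 3), this formula reduces to the average of the two endpoints, which is why taking the mean works. A short worked derivation, followed by the first sample row (y0 = 2.35, y1 = 2.37):

```latex
y = y_0 + (2 - 1)\,\frac{y_1 - y_0}{3 - 1}
  = y_0 + \frac{y_1 - y_0}{2}
  = \frac{y_0 + y_1}{2},
\qquad
\frac{2.35 + 2.37}{2} = 2.36
```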

In terms of data frame columns, with x and xp defined as before, this becomes:

df.col2 = df.col1 + (x - xp[0]) * (df.col3 - df.col1) / (xp[1] - xp[0])

I think I translated that into code correctly; in any case, the formula above holds. Either apply it directly to the columns, or loop through each row and call numpy.interp. Either way, you should be fine.
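
As a rough sketch of the column-wise approach for an arbitrary x, the formula can be wrapped in a small helper. The function name `interpolate_columns` and its parameters are hypothetical, chosen only for this illustration; the data is the sample from the question.

```python
import numpy as np
import pandas as pd


def interpolate_columns(df, x, xp=(1, 3), left="col1", right="col3"):
    """Hypothetical helper: linear interpolation between two columns.

    Applies y = y0 + (x - x0) * (y1 - y0) / (x1 - x0) to whole columns.
    """
    x0, x1 = xp
    return df[left] + (x - x0) * (df[right] - df[left]) / (x1 - x0)


# Sample rows from the question.
df = pd.DataFrame({
    "col1": [2.35, 2.47, 2.51, 2.57, 2.54],
    "col2": [1, 1, 1, 1, 1],
    "col3": [2.37, 2.49, 2.53, 2.58, 2.57],
})

# Fill col2 at x = 2, halfway between col1 (x = 1) and col3 (x = 3).
df["col2"] = interpolate_columns(df, x=2)

# The same helper at another position, checked against numpy.interp per row.
quarter = interpolate_columns(df, x=1.5)
reference = [np.interp(1.5, [1, 3], [a, b]) for a, b in zip(df["col1"], df["col3"])]
agrees = np.allclose(quarter, reference)
print(df)
```

One difference to keep in mind: for x outside the range of xp, numpy.interp returns the nearest endpoint value by default, whereas this formula extrapolates along the line.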
