Python / Pandas: Fill NaN with order - linear interpolation --> ffill --> bfill

I have a df:

     company  year      revenues
0  company 1  2019   1,425,000,000
1  company 1  2018   1,576,000,000
2  company 1  2017   1,615,000,000
3  company 1  2016   1,498,000,000
4  company 1  2015   1,569,000,000
5  company 2  2019             nan
6  company 2  2018   1,061,757,075
7  company 2  2017             nan
8  company 2  2016     573,414,893
9  company 2  2015     599,402,347

I would like to fill the nan values in a specific order: linear interpolation first, then forward fill, and then backward fill. I currently have:

f_2_impute = [x for x in cl_data.columns if cl_data[x].dtypes != 'O' and 'total' not in x and 'year' not in x]

def ffbf(x):
    return x.ffill().bfill()

group_with = ['company']

for x in cl_data[f_2_impute]:
    cl_data[x] = cl_data.groupby(group_with)[x].apply(lambda fill_it: ffbf(fill_it))

which performs ffill() and then bfill(). Ideally I want a function that first tries to linearly interpolate the missing values, then forward fills whatever is left, and then backward fills the rest.

Is there a quick way to achieve this? Thanks in advance.

I believe you first need to convert the columns to floats if the values contain , as a thousands separator:

df = pd.read_csv(file, thousands=',')

Or:

df['revenues'] = df['revenues'].replace(',','', regex=True).astype(float)

and then add interpolate (linear by default) in front of the fills:

def ffbf(x):
    return x.interpolate().ffill().bfill()
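
To check the behaviour end to end, here is a minimal sketch that rebuilds the frame from the question and applies the three steps per company. The name interpolate_then_fill is only illustrative, and the sketch uses str.replace with pd.to_numeric and groupby(...).transform in place of replace(...).astype(float) and apply; those are substitutions on my part, chosen because transform returns a result aligned to the original index. It also assumes the rows within each company are already in the order you want to interpolate along (sort by year first if they are not), since linear interpolation treats the rows as evenly spaced.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; revenues arrive as strings with commas.
df = pd.DataFrame({
    "company": ["company 1"] * 5 + ["company 2"] * 5,
    "year": [2019, 2018, 2017, 2016, 2015] * 2,
    "revenues": [
        "1,425,000,000", "1,576,000,000", "1,615,000,000",
        "1,498,000,000", "1,569,000,000",
        np.nan, "1,061,757,075", np.nan, "573,414,893", "599,402,347",
    ],
})

# Strip the thousands separators and convert to float (missing values stay NaN).
df["revenues"] = pd.to_numeric(df["revenues"].str.replace(",", "", regex=False))


def interpolate_then_fill(s):
    # Hypothetical helper name: linear interpolation first, then forward fill,
    # then backward fill for anything still missing (e.g. leading NaNs).
    return s.interpolate(method="linear").ffill().bfill()


# Apply the helper within each company so values never leak across groups.
df["revenues"] = df.groupby("company")["revenues"].transform(interpolate_then_fill)

print(df)
```

For company 2, the 2017 gap sits between two known values and is interpolated, while the 2019 gap is the first row of its group, so neither interpolation nor ffill can reach it and it is only filled by the final bfill.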
