Backwards fill dataframe column where limit of rows filled is based on value of cell, perhaps with bfill() and limit=x
I have a dataframe that looks like this:
import pandas as pd, numpy as np
df = pd.DataFrame({'Fill' : [0, 0, 0, 3, 0, 0, 0, 2, 0, 0, 1]})
df['flag'] = (df['Fill'] > 0)
df = df.replace(0,np.nan)
df
    Fill   flag
0    NaN  False
1    NaN  False
2    NaN  False
3    3.0   True
4    NaN  False
5    NaN  False
6    NaN  False
7    2.0   True
8    NaN  False
9    NaN  False
10   1.0   True
My goal is to backwards fill with bfill() and pass a dynamic limit based on the value of the cells in the Fill column. I have also created a flag column, which is True for any cell > 0. I did this because values in the Fill column might become floats as they are filled, and I didn't want to apply the logic to those cells, which started as NaN. This is what I have tried:
df['Fill'] = np.where((df['Fill'].notnull()) & (df.flag==True),
df['Fill'].apply(lambda x: x.bfill(limit=int(x-1))),
df['Fill'])
I am receiving an error: AttributeError: 'float' object has no attribute 'bfill'. I thought that since I was filtering for the relevant rows with np.where, I could get around the NaN values, and that with int(x-1) I could avoid the float issue. I also tried something similar with the np.where on the inside of the .apply. Any help is much appreciated. Expected output:
    Fill   flag
0    NaN  False
1    3.0  False
2    3.0  False
3    3.0   True
4    NaN  False
5    NaN  False
6    2.0  False
7    2.0   True
8    NaN  False
9    NaN  False
10   1.0   True
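A note on the AttributeError in the attempt above: Series.apply calls the function once per element, so inside the lambda x is a single float, not a Series, and a float has no bfill method. np.where does not help here, because both of its value arguments are evaluated in full before the mask is applied. The snippet below is a small sketch of that behaviour on a made-up three-element Series (the names s, seen_types and message are only for illustration):

```python
import numpy as np
import pandas as pd

# Illustrative Series, not the data from the question
s = pd.Series([np.nan, 3.0, np.nan])

# Series.apply hands each element to the function one at a time
seen_types = s.apply(lambda x: type(x).__name__)
print(seen_types.tolist())

# Calling a Series method on one of those scalars raises the same error
message = None
try:
    s.apply(lambda x: x.bfill(limit=1))
except AttributeError as exc:
    message = str(exc)
    print(message)
```

So the limit has to be applied to a whole slice of the column at once, rather than to one cell at a time.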
You can create a group for each run of missing values together with the non-missing value that ends it, then back-fill inside each group in a custom function, using the group's last value to set the limit. The if-else is necessary to avoid the error ValueError: Limit must be greater than 0:
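# Rows that hold an original (non-filled) value
m = df['Fill'].notnull() & df.flag
# Reverse cumulative count, so each group ends at one of those rows
g = m.iloc[::-1].cumsum().iloc[::-1]
# Back-fill at most (value - 1) rows; leave the group alone unless its last value is > 1
f = lambda x: x.bfill(limit=int(x.iat[-1]-1)) if x.iat[-1] > 1 else x
# group_keys=False keeps the original index (the default changed in pandas 2.0)
df['Fill'] = df.groupby(g, group_keys=False)['Fill'].apply(f)
print (df)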
    Fill   flag
0    NaN  False
1    3.0  False
2    3.0  False
3    3.0   True
4    NaN  False
5    NaN  False
6    2.0  False
7    2.0   True
8    NaN  False
9    NaN  False
10   1.0   True
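To see what the grouping key looks like, the sketch below rebuilds the question's frame and prints the group label of every row, plus the limit each group ends up with. The summary frame and its column names (rows, seed, limit) are made up here for inspection only and are not part of the answer's code:

```python
import numpy as np
import pandas as pd

# Same data as in the question; only the Fill column is converted to NaN
df = pd.DataFrame({'Fill': [0, 0, 0, 3, 0, 0, 0, 2, 0, 0, 1]})
df['flag'] = df['Fill'] > 0
df['Fill'] = df['Fill'].replace(0, np.nan)

m = df['Fill'].notnull() & df['flag']
# Counting True values from the bottom up gives every row the label of the
# next original value at or below it
g = m.iloc[::-1].cumsum().iloc[::-1]
print(g.tolist())  # [3, 3, 3, 3, 2, 2, 2, 2, 1, 1, 1]

# Hypothetical inspection table: group size, the value that ends the group,
# and the bfill limit derived from it
summary = df.assign(group=g).groupby('group')['Fill'].agg(rows='size', seed='last')
summary['limit'] = summary['seed'] - 1
print(summary)
```

The group that ends in 1.0 gets a limit of 0, which bfill rejects, and that is the case the if-else in the answer guards against.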