Backward-fill a dataframe column, with the number of rows filled limited by the cell's value, possibly using bfill() and limit=x

Question

I have a dataframe that looks like this:

import pandas as pd, numpy as np
df = pd.DataFrame({'Fill' : [0, 0, 0, 3, 0, 0, 0, 2, 0, 0, 1]})
df['flag'] = (df['Fill'] > 0)
df = df.replace(0,np.nan)
df

    Fill    flag
0   NaN     False
1   NaN     False
2   NaN     False
3   3.0     True
4   NaN     False
5   NaN     False
6   NaN     False
7   2.0     True
8   NaN     False
9   NaN     False
10  1.0     True

My goal is to backward fill with bfill() and pass a dynamic limit based on the value of each cell in the Fill column. I have also created a flag column, which is True for any cell > 0. I did this because values in the Fill column might become floats as they are filled, and I did not want to apply the logic to cells that started as NaN. This is what I have tried:

df['Fill'] = np.where((df['Fill'].notnull()) & (df.flag==True),
                      df['Fill'].apply(lambda x: x.bfill(limit=int(x-1))),
                      df['Fill'])

I am receiving an error: AttributeError: 'float' object has no attribute 'bfill'. I thought that since I was filtering for the relevant rows with np.where, I could get around the NaN values, and that with int(x-1) I could avoid the float issue. I also tried something similar with the np.where inside the .apply. Any help is much appreciated. See expected output below:

Expected output:

    Fill    flag
0   NaN     False
1   3.0     False
2   3.0     False
3   3.0     True
4   NaN     False
5   NaN     False
6   2.0     False
7   2.0     True
8   NaN     False
9   NaN     False
10  1.0     True
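
For context on the AttributeError above: Series.apply calls the function once per element, so x inside the lambda is a plain float rather than a Series, and a float has no bfill method. np.where does not shield against this, because both of its value arguments are computed in full before the mask is applied. A minimal sketch of the behaviour, using a small illustrative Series (the variable names are not from the question):

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 3.0, np.nan, 2.0])

# Series.apply hands each element to the function individually,
# so the function only ever sees scalars.
seen_types = s.apply(lambda x: type(x).__name__)
print(set(seen_types))

# Calling a Series method on one of those scalars reproduces the error.
error_message = None
try:
    s.apply(lambda x: x.bfill(limit=1))
except AttributeError as exc:
    error_message = str(exc)
print(error_message)
```

The limit therefore has to be applied to a Series (for example, one group at a time), not to individual cells.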

Answer 1

You can create a group for each non-missing value together with the missing values that precede it, then back-fill within each group in a custom function, using the group's last value to set the limit. The if-else is necessary to avoid the error ValueError: Limit must be greater than 0:

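# True on the rows that hold a fill value
m = df['Fill'].notnull() & df.flag
# Reversed cumulative sum: each value and the NaN rows above it share one group id
g = m.iloc[::-1].cumsum().iloc[::-1]

# Back-fill at most (last value - 1) rows; bfill rejects limit=0, hence the else branch
f = lambda x: x.bfill(limit=int(x.iat[-1]-1)) if x.iat[-1] > 1 else x
# group_keys=False keeps the original index so the assignment aligns (matters on pandas 2.x)
df['Fill'] = df.groupby(g, group_keys=False)['Fill'].apply(f)
print(df)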
    Fill   flag
0    NaN  False
1    3.0  False
2    3.0  False
3    3.0   True
4    NaN  False
5    NaN  False
6    2.0  False
7    2.0   True
8    NaN  False
9    NaN  False
10   1.0   True
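
To make the grouping step easier to inspect, here is a self-contained sketch that wraps the same idea in a helper. The name bfill_by_value is hypothetical (it is not a pandas function and does not appear in the answer), and the sketch swaps groupby(...).apply for groupby(...).transform so the result keeps the original row order. It assumes every non-null value is a whole number of at least 1.

```python
import numpy as np
import pandas as pd


def bfill_by_value(s: pd.Series) -> pd.Series:
    """Back-fill each non-null value into at most (value - 1) NaN rows above it.

    Hypothetical helper for illustration only.
    """
    anchor = s.notna()
    # Counting anchors from the bottom up gives each anchor and the rows
    # above it (up to the previous anchor) the same group id.
    key = anchor.iloc[::-1].cumsum().iloc[::-1]

    def fill(group: pd.Series) -> pd.Series:
        last = group.iat[-1]
        # bfill rejects limit=0, so groups ending in 1 (or in NaN) are
        # returned unchanged.
        if pd.notna(last) and last > 1:
            return group.bfill(limit=int(last) - 1)
        return group

    return s.groupby(key).transform(fill)


s = pd.Series([np.nan, np.nan, np.nan, 3, np.nan, np.nan, np.nan, 2, np.nan, np.nan, 1])
key = s.notna().iloc[::-1].cumsum().iloc[::-1]
print(key.tolist())                # [3, 3, 3, 3, 2, 2, 2, 2, 1, 1, 1]
print(bfill_by_value(s).tolist())  # [nan, 3.0, 3.0, 3.0, nan, nan, 2.0, 2.0, nan, nan, 1.0]
```

Because the helper works from notna() alone, it can be called before any fill has happened without the separate flag column, e.g. df['Fill'] = bfill_by_value(df['Fill']).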

Answer 1 (score 2, accepted, 2020-07-08 09:02:44)