Pandas: Fill null values with last available values and a flag
I am looking for logic that updates the value column for rows where flag is Y, producing the new_val column shown in the table. Notice the second N row, whose value is null. We won't fill values for the next two Y rows, because the last available value belongs to that N row and it is null. If the N row has a value, we can forward fill the following Y rows.
I have tried using df_latest.loc[(df_latest['flag'] == 'Y'), 'value'] = df_latest['value'].fillna(method='ffill'). This logic doesn't cover the scenario where the N row is null: it still forward fills the Y rows after the null N row with the value that precedes it.
flag value new_val
Y 1 1
Y 2 2
Y NaN 2
N 3 3
Y NaN 3
Y 5 5
N NaN NaN
Y NaN NaN
Y NaN NaN
N 6 6
Y NaN 6
Y NaN 6
Y NaN 6
Y NaN 6
Y NaN 6
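To make the problem concrete, here is a minimal sketch that rebuilds the table as a DataFrame and replays the attempt. The frame construction is an assumption reconstructed from the table, and `.ffill()` stands in for `fillna(method='ffill')`, which recent pandas versions deprecate.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Sample data reconstructed from the question's table (an assumption)
df_latest = pd.DataFrame({
    "flag": list("YYYNYYNYYNYYYYY"),
    "value": [1, 2, nan, 3, nan, 5, nan, nan, nan, 6, nan, nan, nan, nan, nan],
})

# The attempt from the question: forward fill everything,
# then write the result back only on rows where flag is 'Y'
df_latest.loc[df_latest["flag"] == "Y", "value"] = df_latest["value"].ffill()

# Row 6 (flag N) stays null, but rows 7 and 8 pick up the 5.0 from row 5,
# which is the behaviour the question wants to avoid
print(df_latest.iloc[5:9])
```

The plain forward fill has no notion of the null N row acting as a barrier, so the value from before it leaks into the Y rows after it.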
We can use GroupBy.ffill to fill by groups. A new group starts whenever flag == N and value is null, so nothing is filled from that row onward until value is non-null again. To fill only when flag is Y, you can use the commented code.
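# Start a new block at every row where flag is 'N' and value is null
blocks = (df['flag'].eq('N') & df['value'].isnull()).cumsum()
# Forward fill within each block only
df['new_val'] = df['value'].groupby(blocks).ffill()
# If you want to fill only where flag is 'Y', use this instead:
# df['new_val'] = df['value'].fillna(
#     df['value'].groupby(blocks).ffill().where(df['flag'].eq('Y'))
# )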
print(df)
Output
flag value new_val
0 Y 1.0 1.0
1 Y 2.0 2.0
2 Y NaN 2.0
3 N 3.0 3.0
4 Y NaN 3.0
5 Y 5.0 5.0
6 N NaN NaN
7 Y NaN NaN
8 Y NaN NaN
9 N 6.0 6.0
10 Y NaN 6.0
11 Y NaN 6.0
12 Y NaN 6.0
13 Y NaN 6.0
14 Y NaN 6.0
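For anyone who wants to reproduce this end to end, the following is a self-contained sketch. The DataFrame construction is an assumption reconstructed from the question's table; the grouping logic is the one from the answer. It also prints the `blocks` key so the group boundaries are visible.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Sample data reconstructed from the question's table (an assumption)
df = pd.DataFrame({
    "flag": list("YYYNYYNYYNYYYYY"),
    "value": [1, 2, nan, 3, nan, 5, nan, nan, nan, 6, nan, nan, nan, nan, nan],
})

# The key increases by one at every row where flag is 'N' and value is null,
# so each such row becomes the first row of a new block
blocks = (df["flag"].eq("N") & df["value"].isna()).cumsum()

# Forward fill inside each block; values never cross a block boundary
df["new_val"] = df["value"].groupby(blocks).ffill()

# Variant from the commented code: only take filled values on 'Y' rows
y_only = df["value"].fillna(
    df["value"].groupby(blocks).ffill().where(df["flag"].eq("Y"))
)

print(df.assign(block=blocks))
```

On this data the `y_only` variant gives the same column as `new_val`. As far as I can tell that holds in general for this key: an N row with a null value always opens its own block, so the forward fill has nothing to put there, and an N row with a value is not null to begin with.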