Dataframe forward-fill till column-specific last valid index
How do I go from:
[In]: import numpy as np
      import pandas as pd

      df = pd.DataFrame({
          'col1': [100, np.nan, np.nan, 100, np.nan, np.nan],
          'col2': [np.nan, 100, np.nan, np.nan, 100, np.nan]
      })
      df
[Out]: col1 col2
0 100 NaN
1 NaN 100
2 NaN NaN
3 100 NaN
4 NaN 100
5 NaN NaN
To:
[Out]: col1 col2
0 100 NaN
1 100 100
2 100 100
3 100 100
4 NaN 100
5 NaN NaN
My current approach is to apply a custom function that works on one column at a time:
[In]: def ffill_last_valid(s):
          last_valid = s.last_valid_index()
          s = s.ffill()
          s[s.index > last_valid] = np.nan
          return s

      df.apply(ffill_last_valid)
But it seems like overkill to me. Is there a one-liner that works on the dataframe directly?
Note on accepted answer:
See the accepted answer from mozway below.
I know it's a tiny dataframe but:
You can ffill, then keep only the values before the last stretch of NaN with a combination of where and notna/reversed cummax:
out = df.ffill().where(df[::-1].notna().cummax())
Variant:
out = df.ffill().mask(df[::-1].isna().cummin())
Output:
col1 col2
0 100.0 NaN
1 100.0 100.0
2 100.0 100.0
3 100.0 100.0
4 NaN 100.0
5 NaN NaN
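To see why the one-liner above works, it may help to build the mask step by step. The following is a minimal sketch using the question's dataframe; the names keep, drop, filled, and out_variant are only illustrative and do not appear in the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': [100, np.nan, np.nan, 100, np.nan, np.nan],
    'col2': [np.nan, 100, np.nan, np.nan, 100, np.nan],
})

# Walk each column from the bottom up: flag non-NaN cells, then take a
# running maximum. A cell ends up True if it, or any row after it in the
# original order, holds a value, i.e. it is not in the trailing NaN stretch.
keep = df[::-1].notna().cummax()

# Forward-fill everything, then blank out the cells where keep is False.
# where() aligns on the index, so the reversed row order of keep is harmless.
filled = df.ffill()
out = filled.where(keep)

# The variant builds the complementary mask (True only inside the trailing
# NaN stretch) and hides those cells with mask() instead of where().
drop = df[::-1].isna().cummin()
out_variant = filled.mask(drop)

print(keep.sort_index())
print(out)
```

Because the mask is computed independently per column, each column is cut off at its own last valid index, which is what the per-column function in the question does.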
interpolate:
In theory, df.interpolate(method='ffill', limit_area='inside') should work, but while both options work as expected separately, for some reason they don't when combined (pandas 1.5.2). This works with df.interpolate(method='zero', limit_area='inside'), though.