
DataFrame forward-fill until column-specific last valid index

How do I go from:

[In]:   import numpy as np
        import pandas as pd

        df = pd.DataFrame({
            'col1': [100, np.nan, np.nan, 100, np.nan, np.nan],
            'col2': [np.nan, 100, np.nan, np.nan, 100, np.nan]
        })
        df

[Out]:        col1    col2
        0      100     NaN
        1      NaN     100
        2      NaN     NaN
        3      100     NaN
        4      NaN     100
        5      NaN     NaN

To:

[Out]:        col1    col2
        0      100     NaN
        1      100     100
        2      100     100
        3      100     100
        4      NaN     100
        5      NaN     NaN

My current approach is to apply a custom function that works on one column at a time:

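[In]:   def ffill_last_valid(s):
            last_valid = s.last_valid_index()     # label of the last non-NaN value
            s = s.ffill()                         # fills every NaN run, trailing ones included
            s[s.index > last_valid] = np.nan      # re-blank the trailing run (assumes a sorted index)
            return s

        df.apply(ffill_last_valid)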

But it seems like overkill to me. Is there a one-liner that works on the DataFrame directly?


Note on the accepted answer:

See the accepted answer from mozway below.

I know it's a tiny DataFrame, but:

[image not preserved]

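You can ffill, then keep only the values before the last stretch of NaNs, using where combined with notna and a reversed cummax. Reversing the rows and taking the cumulative maximum of notna() gives True for every row up to and including each column's last valid value; since where aligns the mask on the index labels, the reversed row order does not matter: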

out = df.ffill().where(df[::-1].notna().cummax())
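To see why the mask works, here is a minimal sketch that builds it step by step on the question's frame; valid_until_last is just an illustrative name, not part of the original answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': [100, np.nan, np.nan, 100, np.nan, np.nan],
    'col2': [np.nan, 100, np.nan, np.nan, 100, np.nan],
})

# Walk each column bottom-up: notna() is True at valid values, and cummax()
# keeps it True for every row above (i.e. before) the last valid value.
valid_until_last = df[::-1].notna().cummax()
# In original row order: col1 is True for rows 0-3, col2 for rows 0-4.

# Forward-fill everything, then where() keeps the filled values only where the
# mask is True. where() aligns the mask on the index labels, so its reversed
# row order is harmless.
out = df.ffill().where(valid_until_last)
print(out)
```

Leading NaNs (row 0 of col2) stay NaN simply because ffill has nothing to propagate into them; the mask only has to deal with the trailing run.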

Variant, using mask with isna and a reversed cummin:

out = df.ffill().mask(df[::-1].isna().cummin())

Output:

    col1   col2
0  100.0    NaN
1  100.0  100.0
2  100.0  100.0
3  100.0  100.0
4    NaN  100.0
5    NaN    NaN
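A quick consistency check, sketched under the assumption that the question's per-column function is the reference: both one-liners should reproduce its result on the example frame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': [100, np.nan, np.nan, 100, np.nan, np.nan],
    'col2': [np.nan, 100, np.nan, np.nan, 100, np.nan],
})

def ffill_last_valid(s):
    """Per-column version from the question, used here as the reference."""
    last_valid = s.last_valid_index()
    s = s.ffill()
    s[s.index > last_valid] = np.nan
    return s

reference = df.apply(ffill_last_valid)
via_where = df.ffill().where(df[::-1].notna().cummax())
via_mask = df.ffill().mask(df[::-1].isna().cummin())

# assert_frame_equal treats NaNs in the same positions as equal.
pd.testing.assert_frame_equal(via_where, reference)
pd.testing.assert_frame_equal(via_mask, reference)
print(via_where)
```

The mask-based versions do the work in a few whole-frame operations rather than one Python-level call per column, which is the main appeal over apply on wide frames.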

interpolate:

In theory, df.interpolate(method='ffill', limit_area='inside') should work, but while both options behave as expected separately, for some reason the combination does not (pandas 1.5.2). It does work with df.interpolate(method='zero', limit_area='inside'), though; 'zero' is a zero-order spline computed via SciPy, so SciPy needs to be installed.
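On newer pandas the mask may not be needed at all: pandas 2.2 added a limit_area parameter to ffill, and limit_area='inside' only fills NaNs that have valid values on both sides, which is the requested behaviour. A minimal version-tolerant sketch, assuming a hypothetical helper name ffill_inside, that falls back to the where/cummax mask when ffill rejects the keyword:

```python
import numpy as np
import pandas as pd

def ffill_inside(df):
    """Forward-fill only NaNs that sit between valid values (hypothetical helper)."""
    try:
        # pandas >= 2.2: ffill accepts limit_area directly.
        return df.ffill(limit_area='inside')
    except TypeError:
        # Older pandas: ffill has no limit_area keyword, so mask the trailing NaN run.
        return df.ffill().where(df[::-1].notna().cummax())

df = pd.DataFrame({
    'col1': [100, np.nan, np.nan, 100, np.nan, np.nan],
    'col2': [np.nan, 100, np.nan, np.nan, 100, np.nan],
})
print(ffill_inside(df))
```

Catching TypeError relies on older ffill signatures rejecting the unknown limit_area keyword; comparing pd.__version__ explicitly would work just as well.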
