
Find first non-NaN value in shift (pandas)

I have the following issue. I would like to compute the lag of a column in my df. However, I have a condition that the lagged value cannot be NaN. See the example below:

import numpy as np
import pandas as pd

d = {'col1': [1, 2, 10, 5, 3, 2], 'col2': [3, 4, np.nan, np.nan, 23, 42]}
df = pd.DataFrame(data=d)

When I try this:

df["col2_lag"] = df["col2"].shift(1)

I get this result:

   col1  col2  col2_lag
0     1   3.0       NaN
1     2   4.0       3.0
2    10   NaN       4.0
3     5   NaN       NaN
4     3  23.0       NaN
5     2  42.0      23.0

However, desired output is this:

   col1  col2  col2_lag
0     1   3.0       NaN
1     2   4.0       3.0
2    10   NaN       4.0
3     5   NaN       4.0 # because we skip NaN and take the last non-NaN value
4     3  23.0       4.0 # because we skip NaN and take the last non-NaN value
5     2  42.0      23.0

Is there an elegant way to do this? Ideally without writing my own function. Thanks

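Use ffill after the shift. It forward-fills each NaN left in the shifted column with the last non-NaN value above it: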

 df["col2_lag"] = df["col2"].shift(1).ffill()
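
For reference, here is a self-contained sketch of the same idea run on the sample data from the question. The variable name alt is only illustrative and not part of the original answer; it checks that filling first and shifting afterwards gives the same column for this data.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "col1": [1, 2, 10, 5, 3, 2],
    "col2": [3, 4, np.nan, np.nan, 23, 42],
})

# Shift down by one row, then forward-fill the NaN gaps that remain.
# The first row stays NaN because there is no earlier value to carry forward.
df["col2_lag"] = df["col2"].shift(1).ffill()

# Illustrative alternative: forward-fill first, then shift.
alt = df["col2"].ffill().shift(1)

print(df)
```

Note that the first row is still NaN, which matches the desired output, since nothing precedes it.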


 