find first non NaN value in shift pandas

Question

I have the following issue. I would like to compute the lag of a column in my df. However, I have a condition that the lagged value cannot be NaN. See the example below:

import numpy as np
import pandas as pd

d = {'col1': [1, 2, 10, 5, 3, 2], 'col2': [3, 4, np.nan, np.nan, 23, 42]}
df = pd.DataFrame(data=d)

When I try this:

df["col2_lag"] = df["col2"].shift(1)

I got this result:

   col1  col2  col2_lag
0     1   3.0       NaN
1     2   4.0       3.0
2    10   NaN       4.0
3     5   NaN       NaN
4     3  23.0       NaN
5     2  42.0      23.0

However, the desired output is this:

   col1  col2  col2_lag
0     1   3.0       NaN
1     2   4.0       3.0
2    10   NaN       4.0
3     5   NaN       4.0 # skip the NaN and take the last non-NaN value above
4     3  23.0       4.0 # skip the NaN and take the last non-NaN value above
5     2  42.0      23.0
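To pin down the rule behind the desired output: each row should receive the most recent non-NaN value of col2 from any earlier row, and row 0 stays NaN because nothing precedes it. The plain-Python loop below spells that rule out. It is only a reference sketch for checking results (the helper name `last_valid_lag` is hypothetical), not the built-in approach the question is asking for.

```python
import numpy as np
import pandas as pd


def last_valid_lag(values):
    """Hypothetical helper: for each position, return the most recent
    non-NaN value seen strictly before it (NaN if there is none)."""
    result = []
    last_seen = np.nan  # nothing seen yet, so the first row stays NaN
    for v in values:
        result.append(last_seen)  # the lag only uses earlier rows
        if not pd.isna(v):
            last_seen = v  # remember the latest valid value
    return pd.Series(result, index=values.index, dtype="float64")


d = {'col1': [1, 2, 10, 5, 3, 2], 'col2': [3, 4, np.nan, np.nan, 23, 42]}
df = pd.DataFrame(data=d)
df["col2_lag"] = last_valid_lag(df["col2"])
print(df)
```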

Is there an elegant way to do this, ideally without writing my own function? Thanks.

Answer 1

Shift first, then fill the gaps with ffill:

df["col2_lag"] = df["col2"].shift(1).ffill()
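A self-contained version of the question's example with this answer applied. shift(1) moves every value down one row, NaN included, and ffill then replaces each NaN with the last valid value above it. The final lines compare against filling before shifting, which should select the same value for every row: in both orders, row i ends up with the last non-NaN value among rows 0 to i-1.

```python
import numpy as np
import pandas as pd

d = {'col1': [1, 2, 10, 5, 3, 2], 'col2': [3, 4, np.nan, np.nan, 23, 42]}
df = pd.DataFrame(data=d)

# Lag by one row; the NaNs in col2 are shifted along with everything else.
shifted = df["col2"].shift(1)

# Forward-fill: each NaN takes the most recent non-NaN value above it.
df["col2_lag"] = shifted.ffill()
print(df)

# Alternative order: fill first, then shift. Both pick, for row i,
# the last valid value among rows strictly before i.
alt = df["col2"].ffill().shift(1)
print(df["col2_lag"].equals(alt))
```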

1 ACCEPTED 2021-04-30 14:17:39