简体   繁体   中英

What is the best way to check if the last rows of a pandas dataframe meet a condition?

I got stuck trying to create a new column that is a check column based on the 'signal' column. If the last five rows(including the last) are 1, it would return 1, If the last five rows(including the last) are 0, it would return 0, everything else would be the last value of check, like this:

I have the following data frame:

       signal
index        
0           1
1           1
2           1
3           1
4           1
5           1
6           0
7           0
8           0
9           0
10          0
11          0
12          0
13          1
14          0
15          1
16          1
17          1
18          1
19          1

I'd like something like this:

       signal  check
index                
0           1       1
1           1       1
2           1       1
3           1       1
4           1       1
5           1       1
6           0       1
7           0       1
8           0       1
9           0       1
10          0       0
11          0       0
12          0       0
13          1       0
14          0       0
15          1       0
16          1       0
17          1       0
18          1       0
19          1       1

I would appreciate any kind of help!

Thank you!

You want to use a rolling window over your dataframe, followed by fillna :

def allSame(x):
    if (x == 1).all():
        return 1.0
    elif (x == 0).all():
        return 0.0
    else:
        return np.nan

df['signal'] = df.rolling(5).apply(allSame, raw=False).fillna(method="ffill")

rolling returns a rolling window object over a number of elements (5 in this case). The window object is similar to a dataframe, but instead of having rows it has windows over the original dataframe's rows. We can use its apply method to convert each rolling window to a value, converting the rolling window object to a dataframe. The apply method takes a function that can convert from an ndarray to an appropriate output value.

Here we pass to apply a function that returns 1 or 0 if the 5 rows in the window are all 1 or 0 respectively, and otherwise returns NaN. As a result we get a new dataframe with values that are either 1, 0, or NaN. We then use fillna on this dataframe to overwrite the NaN values with the first preceeding 1 or 0 value. Finally, we merge the resulting dataframe back into the original dataframe, creating the "signal" column.

Try rolling.sum to slice df and ffill , bfill

df['check'] = df[df.rolling(5).sum().isin([0, 5])].ffill().bfill()

Out[540]:
       signal  check
index
0           1    1.0
1           1    1.0
2           1    1.0
3           1    1.0
4           1    1.0
5           1    1.0
6           0    1.0
7           0    1.0
8           0    1.0
9           0    1.0
10          0    0.0
11          0    0.0
12          0    0.0
13          1    0.0
14          0    0.0
15          1    0.0
16          1    0.0
17          1    0.0
18          1    0.0
19          1    1.0

If you want check as integer, just chain addition astype(int)

df['check'] = df[df.rolling(5).sum().isin([0, 5])].ffill().bfill().astype(int)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM