How to select rows by checking a column's value and its previous value

My data has a column named Decimal. I would like to keep only the rows where Decimal is 3 and the previous row's Decimal is 4, together with that preceding row (as the expected output shows). In other words, the condition is Decimal == 3 and previous Decimal == 4. The input table comes first, followed by the expected output.

I have created a sample dataframe using Pandas to show how my data looks

import pandas as pd

df=pd.DataFrame([{'Unnamed: 0': 0, 'A_O': 1792, 'A_1': 0, 'Decimal': 7, 'A_Bin': 111}, 
                 {'Unnamed: 0': 1, 'A_O': 512, 'A_1': 128, 'Decimal': 2, 'A_Bin': 10}, 
                 {'Unnamed: 0': 2, 'A_O': 1024, 'A_1': 57778, 'Decimal': 4, 'A_Bin': 100}, 
                 {'Unnamed: 0': 3, 'A_O': 768, 'A_1': 65491, 'Decimal': 3, 'A_Bin': 11}, 
                 {'Unnamed: 0': 4, 'A_O': 1280, 'A_1': 6039, 'Decimal': 5, 'A_Bin': 101}, 
                 {'Unnamed: 0': 5, 'A_O': 1536, 'A_1': 9, 'Decimal': 6, 'A_Bin': 110}, 
                 {'Unnamed: 0': 6, 'A_O': 1792, 'A_1': 0, 'Decimal': 7, 'A_Bin': 111}, 
                 {'Unnamed: 0': 7, 'A_O': 512, 'A_1': 129, 'Decimal': 2, 'A_Bin': 10}, 
                 {'Unnamed: 0': 8, 'A_O': 1024, 'A_1': 33550, 'Decimal': 4, 'A_Bin': 100}, 
                 {'Unnamed: 0': 9, 'A_O': 768, 'A_1': 15196, 'Decimal': 3, 'A_Bin': 11}, 
                 {'Unnamed: 0': 10, 'A_O': 1280, 'A_1': 9495, 'Decimal': 5, 'A_Bin': 101}, 
                 {'Unnamed: 0': 11, 'A_O': 1536, 'A_1': 9, 'Decimal': 6, 'A_Bin': 110}, 
                 {'Unnamed: 0': 12, 'A_O': 1792, 'A_1': 0, 'Decimal': 7, 'A_Bin': 111}, 
                 {'Unnamed: 0': 13, 'A_O': 512, 'A_1': 130, 'Decimal': 2, 'A_Bin': 10}, 
                 {'Unnamed: 0': 14, 'A_O': 1024, 'A_1': 8686, 'Decimal': 4, 'A_Bin': 100}, 
                 {'Unnamed: 0': 15, 'A_O': 768, 'A_1': 32768, 'Decimal': 3, 'A_Bin': 11}, 
                 {'Unnamed: 0': 16, 'A_O': 1280, 'A_1': 12855, 'Decimal': 5, 'A_Bin': 101}, 
                 {'Unnamed: 0': 17, 'A_O': 1536, 'A_1': 9, 'Decimal': 6, 'A_Bin': 110}, 
                 {'Unnamed: 0': 18, 'A_O': 1792, 'A_1': 0, 'Decimal': 7, 'A_Bin': 111}, 
                 {'Unnamed: 0': 19, 'A_O': 512, 'A_1': 131, 'Decimal': 2, 'A_Bin': 10}])
df

I would like to have the output as below

import pandas as pd

df=pd.DataFrame([{'Unnamed: 0': 2, 'A_O': 1024, 'A_1': 57778, 'Decimal': 4, 'A_Bin': 100}, 
                 {'Unnamed: 0': 3, 'A_O': 768, 'A_1': 65491, 'Decimal': 3, 'A_Bin': 11}, 
                 {'Unnamed: 0': 8, 'A_O': 1024, 'A_1': 33550, 'Decimal': 4, 'A_Bin': 100}, 
                 {'Unnamed: 0': 9, 'A_O': 768, 'A_1': 15196, 'Decimal': 3, 'A_Bin': 11}, 
                 {'Unnamed: 0': 14, 'A_O': 1024, 'A_1': 8686, 'Decimal': 4, 'A_Bin': 100}, 
                 {'Unnamed: 0': 15, 'A_O': 768, 'A_1': 32768, 'Decimal': 3, 'A_Bin': 11}])
df


The code above recreates both the input data and the required output, so you can use it to reproduce the tables. Kindly help.

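First, create two helper columns: one holding the previous row's value (shift(1)) and one holding the next row's value (shift(-1)). Note that, despite its name, lag_Dec holds the next value, not the previous one.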

df['shift_Dec'] = df['Decimal'].shift(1)
df['lag_Dec'] = df['Decimal'].shift(-1)
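
To make the direction of each shift concrete, here is a small sketch on a toy Series (the values and the column names `previous` and `next` are made up for illustration and are not part of the question's data). The rows at either edge have no neighbour on one side, so they get NaN there.

```python
import pandas as pd

# Toy data, assumed purely for illustration
s = pd.Series([7, 4, 3, 5], name="Decimal")

demo = pd.DataFrame({
    "Decimal": s,
    "previous": s.shift(1),   # value from the row above; first row becomes NaN
    "next": s.shift(-1),      # value from the row below; last row becomes NaN
})
print(demo)
```

Because a comparison against NaN is False, the first and last rows can never satisfy a condition on a neighbour that does not exist.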

Next, create a bool column that is 1 for each row where Decimal is 3 and the previous value is 4, or where Decimal is 4 and the next value is 3, and 0 otherwise.

df['bool'] = df.apply(lambda row: 1 if (row['Decimal'] == 3) & (row['shift_Dec']==4) else (1 if (row['Decimal'] == 4) & (row['lag_Dec']==3) else 0),
    axis=1)

Finally, keep the rows where bool == 1 and select the original columns:

df[df['bool']==1][['Unnamed: 0','A_O','A_1','Decimal','A_Bin']]


Use boolean indexing with two masks. Build each mask by comparing Decimal with its neighbouring value obtained from Series.shift, combining the two conditions with & (bitwise AND), then join both masks with | (bitwise OR):

m1 = df['Decimal'].eq(4) & df['Decimal'].shift(-1).eq(3)
m2 = df['Decimal'].eq(3) & df['Decimal'].shift().eq(4)

df2 = df[m1 | m2].drop('Unnamed: 0', axis=1)
print (df2)
     A_O    A_1  Decimal  A_Bin
2   1024  57778        4    100
3    768  65491        3     11
8   1024  33550        4    100
9    768  15196        3     11
14  1024   8686        4    100
15   768  32768        3     11
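
As a variation on the two-mask idea, the sketch below derives everything from a single condition: it first marks each 3 that follows a 4, then shifts that mask up by one row to pick up the preceding 4 as well. It uses only the Decimal values from the question; the names `hit`, `mask` and `selected` are assumptions made for this example.

```python
import pandas as pd

# Only the Decimal column from the question's data, to keep the sketch short
decimal = pd.Series([7, 2, 4, 3, 5, 6, 7, 2, 4, 3, 5, 6, 7, 2, 4, 3, 5, 6, 7, 2],
                    name="Decimal")

# True on each 3 whose previous value is 4 (NaN at the first row compares as False)
hit = decimal.eq(3) & decimal.shift().eq(4)

# Shift the hits up one row to also mark the 4 that precedes each hit
mask = hit | hit.shift(-1, fill_value=False)

selected = decimal[mask]
print(selected.index.tolist())  # [2, 3, 8, 9, 14, 15]
```

The same `mask` could be used to index the full DataFrame, since it shares the default integer index.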
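
Alternatively, scan the Decimal column with a sliding window and collect each run of rows that matches the pattern:

pattern = [4, 3]
n = len(pattern)

# Check every window of n consecutive rows; the range ends at len(df) + 1
# so that the window ending on the last row is included.
matches = [df.iloc[i - n:i] for i in range(n, len(df) + 1)
           if df['Decimal'].iloc[i - n:i].tolist() == pattern]

# DataFrame.append was removed in pandas 2.0, so combine the pieces with pd.concat.
df2 = pd.concat(matches) if matches else pd.DataFrame(columns=df.columns)

print(df2)

Output

    Unnamed: 0   A_O    A_1  Decimal  A_Bin
2            2  1024  57778        4    100
3            3   768  65491        3     11
8            8  1024  33550        4    100
9            9   768  15196        3     11
14          14  1024   8686        4    100
15          15   768  32768        3     11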
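
The window scan above can also be written without an explicit loop over rows. The following is a minimal sketch, assuming a default integer index; `rows_matching_pattern` and `df_demo` are hypothetical names introduced for illustration and are not part of pandas or of the original answers.

```python
import pandas as pd

def rows_matching_pattern(df, column, pattern):
    """Return rows of df that belong to a consecutive run equal to `pattern`.

    Hypothetical helper for illustration; not part of pandas.
    """
    col = df[column]
    n = len(pattern)

    # starts[i] is True when col[i], col[i+1], ..., col[i+n-1] equals the pattern
    starts = pd.Series(True, index=df.index)
    for offset, value in enumerate(pattern):
        starts &= col.shift(-offset).eq(value)

    # Mark every row covered by a matching run, not only its first row
    mask = pd.Series(False, index=df.index)
    for offset in range(n):
        mask |= starts.shift(offset, fill_value=False)
    return df[mask]

# Shortened version of the question's Decimal column
df_demo = pd.DataFrame({"Decimal": [7, 2, 4, 3, 5, 6, 7, 2, 4, 3]})
print(rows_matching_pattern(df_demo, "Decimal", [4, 3]).index.tolist())  # [2, 3, 8, 9]
```

The loops here run over the pattern length rather than over the rows, so the per-row work stays inside pandas.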
