简体   繁体   中英

Python filter row with multiple columns conditions

I have a CSV dataset and I need to filter it with conditions but the problem is that the condition can be true for multiple days. What I want is to keep the last true value for this condition.

My dataset looks like this

Date           City             Summary                  No.
2-18-2019       NY            Airplane land              23
2-18-2019     London          Cargo handling              4
2-18-2019      Dubai          Airplane land              92
2-19-2019      Dubai          Airplane stay              92
2-19-2019      Paris          Flight canceled            78
2-19-2019       LA            Airplane Land              7
2-20-2019      Dubai          Airplane land              92
2-20-2019       LA            Airplane land              29
2-20-2019       NY            Airplane left              23
2-21-2019      Paris          Airplane reschedule        78
2-21-2019      London         Airplane land              4
2-21-2019       LA            Airplane from NY land      29
~~~
3-10-2019      London         Airplane land              5
3-10-2019      Paris          Airplane Land              78
3-10-2019       LA            Reschedule                 29
3-11-2019       NY            Cargo handled              23
3-11-2019      Dubai          Arrived be4 2 days         34
~~~
3-21-2019      Dubai          Airplane land              92
3-21-2019     New Delhi       Reschedule                 9
3-21-2019      London         Cargo handling             5
3-22-2019     New Delhi       Airplane Land              9
3-22-2019       NY            Reschedule                 23
3-22-2019      Dubai          Airplane land              35

So the code should give us the last entry for plane landing where City == City and No. == No. and as you can see this condition can be true for multiple days. what I want is to check if the condition is true for two days then keep the last.

The desired output should look like the dataset below:

Date           City             Summary                  No.
2-18-2019       NY            Airplane land              23
2-19-2019       LA            Airplane Land              7
2-20-2019      Dubai          Airplane land              92
2-21-2019      London         Airplane land              4
2-21-2019       LA            Airplane from NY land      29
~~~
3-10-2019      London         Airplane land              5
3-10-2019      Paris          Airplane Land              78
~~~
3-21-2019      Dubai          Airplane land              92
3-22-2019     New Delhi       Airplane Land              9
3-22-2019      Dubai          Airplane land              35

My code is below but it doesn't work


import pandas as pd
import openpyxl
import numpy as np
import io
from datetime import timedelta

df = pd.read_csv(r"C:\Airplanes.csv")

pd.set_option('display.max_columns', 500)
df = df.astype(str)



count = df.groupby(['City', 'No.'])['No.'].transform('size')



df['Date'] = pd.to_datetime(df['Date'])

df = df[(df.Summary.str.contains('Airplane ') & df.Summary.str.contains('Land'))]


def filter(grp):
    a = grp.Date + timedelta(days=2)
    return grp[~grp.Date.isin(a)]

df.groupby(['City']).apply(filter).reset_index(drop=True)


export_excel = df.to_excel(r'C:\MS.xlsx', index=None, header=True)

Please help to fix it

I think you need:

#convert to datetimes
df['Date'] = pd.to_datetime(df['Date'])

#filter case non sensitive
df=df[(df.Summary.str.contains('Airplane ') & df.Summary.str.contains('Land', case=False))]

#mask for match if exist dates with subtract one day
m = df['Date'].isin(df['Date'] - pd.Timedelta(days=1))

#filter out duplicates if exist previous days
df = df[(m & ~df['Date'].duplicated()) | ~m]
print (df)
         Date       City                Summary  No.
0  2019-02-18         NY          Airplane land   23
5  2019-02-19         LA          Airplane Land    7
6  2019-02-20      Dubai          Airplane land   92
10 2019-02-21     London          Airplane land    4
11 2019-02-21         LA  Airplane from NY land   29
12 2019-03-10     London          Airplane land    5
13 2019-03-10      Paris          Airplane Land   78
17 2019-03-21      Dubai          Airplane land   92
20 2019-03-22  New Delhi          Airplane Land    9
22 2019-03-22      Dubai          Airplane land   92

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM