
Python: filter rows with conditions on multiple columns

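I have a CSV dataset that I need to filter using a condition, but the problem is that the condition can hold for several days in a row. What I want is to keep only the last row for which the condition is true.

My dataset looks like this: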

Date           City             Summary                  No.
2-18-2019       NY            Airplane land              23
2-18-2019     London          Cargo handling              4
2-18-2019      Dubai          Airplane land              92
2-19-2019      Dubai          Airplane stay              92
2-19-2019      Paris          Flight canceled            78
2-19-2019       LA            Airplane Land              7
2-20-2019      Dubai          Airplane land              92
2-20-2019       LA            Airplane land              29
2-20-2019       NY            Airplane left              23
2-21-2019      Paris          Airplane reschedule        78
2-21-2019      London         Airplane land              4
2-21-2019       LA            Airplane from NY land      29
~~~
3-10-2019      London         Airplane land              5
3-10-2019      Paris          Airplane Land              78
3-10-2019       LA            Reschedule                 29
3-11-2019       NY            Cargo handled              23
3-11-2019      Dubai          Arrived be4 2 days         34
~~~
3-21-2019      Dubai          Airplane land              92
3-21-2019     New Delhi       Reschedule                 9
3-21-2019      London         Cargo handling             5
3-22-2019     New Delhi       Airplane Land              9
3-22-2019       NY            Reschedule                 23
3-22-2019      Dubai          Airplane land              35

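So the code should give us the last "Airplane land" entry among rows where City and No. both match. As you can see, the condition can last for several days. What I want is to check whether the condition is true within two days and, if so, keep only the last day.

The desired output should look like the following dataset: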

Date           City             Summary                  No.
2-18-2019       NY            Airplane land              23
2-19-2019       LA            Airplane Land              7
2-20-2019      Dubai          Airplane land              92
2-21-2019      London         Airplane land              4
2-21-2019       LA            Airplane from NY land      29
~~~
3-10-2019      London         Airplane land              5
3-10-2019      Paris          Airplane Land              78
~~~
3-21-2019      Dubai          Airplane land              92
3-22-2019     New Delhi       Airplane Land              9
3-22-2019      Dubai          Airplane land              35

My code is below, but it doesn't work:


import pandas as pd
import openpyxl
import numpy as np
import io
from datetime import timedelta

df = pd.read_csv(r"C:\Airplanes.csv")

pd.set_option('display.max_columns', 500)
df = df.astype(str)



count = df.groupby(['City', 'No.'])['No.'].transform('size')



df['Date'] = pd.to_datetime(df['Date'])

df = df[(df.Summary.str.contains('Airplane ') & df.Summary.str.contains('Land'))]


def filter(grp):
    a = grp.Date + timedelta(days=2)
    return grp[~grp.Date.isin(a)]

df.groupby(['City']).apply(filter).reset_index(drop=True)


export_excel = df.to_excel(r'C:\MS.xlsx', index=None, header=True)

Please help me fix it.
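Two details of the attempt above may account for part of the problem: `str.contains('Land')` is case-sensitive, so rows spelled "Airplane land" are dropped, and the result of `groupby(...).apply(...)` is never assigned back, so `df` is exported without the group filter applied. A minimal sketch on made-up toy data (the values below are illustrative, not taken from the CSV):

```python
import pandas as pd

# Toy Summary values, invented for illustration only
s = pd.Series(["Airplane land", "Airplane Land", "Airplane stay"])

strict = s.str.contains("Land")               # case-sensitive match
relaxed = s.str.contains("Land", case=False)  # case-insensitive match
print(strict.tolist())   # [False, True, False]
print(relaxed.tolist())  # [True, True, False]

# A groupby result that is not assigned leaves the original frame unchanged
df = pd.DataFrame({"City": ["LA", "LA", "NY"], "n": [1, 2, 3]})
df.groupby("City").head(1)         # result discarded
kept = df.groupby("City").head(1)  # result kept in a new variable
print(len(df), len(kept))          # 3 2
```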

I think you need:

# Convert the Date column to datetimes
df['Date'] = pd.to_datetime(df['Date'])

# Keep rows whose Summary contains 'Airplane ' and 'land' (case-insensitive)
df = df[df.Summary.str.contains('Airplane ') & df.Summary.str.contains('Land', case=False)]

# True where some row in the frame is dated exactly one day later
m = df['Date'].isin(df['Date'] - pd.Timedelta(days=1))

# Where a next-day row exists, keep only the first row for that date; keep all other rows
df = df[(m & ~df['Date'].duplicated()) | ~m]
print(df)
         Date       City                Summary  No.
0  2019-02-18         NY          Airplane land   23
5  2019-02-19         LA          Airplane Land    7
6  2019-02-20      Dubai          Airplane land   92
10 2019-02-21     London          Airplane land    4
11 2019-02-21         LA  Airplane from NY land   29
12 2019-03-10     London          Airplane land    5
13 2019-03-10      Paris          Airplane Land   78
17 2019-03-21      Dubai          Airplane land   92
20 2019-03-22  New Delhi          Airplane Land    9
22 2019-03-22      Dubai          Airplane land   92
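The mask above compares dates across the whole frame rather than per City/No. pair. If the intent is the one described in the question (same City and No., keep the last landing within two days), a per-group variant could look like the sketch below. The helper name `keep_last_landing`, the `window_days` parameter, and the sample rows are assumptions made for illustration; the rows are a subset of the table in the question.

```python
import pandas as pd

def keep_last_landing(df, window_days=2):
    """Per (City, No.), drop a landing row when the same pair lands
    again within `window_days`; keep the later row instead."""
    out = df.copy()
    out["Date"] = pd.to_datetime(out["Date"], format="%m-%d-%Y")
    s = out["Summary"]
    landed = s.str.contains("Airplane ") & s.str.contains("land", case=False)
    out = out[landed].sort_values("Date")
    # Date of the next landing row for the same City/No. pair (NaT if none)
    next_date = out.groupby(["City", "No."])["Date"].shift(-1)
    # NaT compares as False, so the last row of each pair is always kept
    superseded = (next_date - out["Date"]) <= pd.Timedelta(days=window_days)
    return out[~superseded].sort_index()

# Subset of the question's table, used only to exercise the helper
rows = [
    ("2-18-2019", "NY", "Airplane land", 23),
    ("2-18-2019", "Dubai", "Airplane land", 92),
    ("2-19-2019", "Dubai", "Airplane stay", 92),
    ("2-19-2019", "LA", "Airplane Land", 7),
    ("2-20-2019", "Dubai", "Airplane land", 92),
    ("2-20-2019", "LA", "Airplane land", 29),
    ("2-21-2019", "LA", "Airplane from NY land", 29),
    ("3-21-2019", "Dubai", "Airplane land", 92),
    ("3-22-2019", "Dubai", "Airplane land", 35),
]
sample = pd.DataFrame(rows, columns=["Date", "City", "Summary", "No."])
result = keep_last_landing(sample)
print(result)
```

On this subset the helper keeps the rows at positions 0, 3, 4, 6, 7 and 8: the Dubai/92 landing on 2-18 and the LA/29 landing on 2-20 are dropped because the same pair lands again within two days, while Dubai/92 on 3-21 and Dubai/35 on 3-22 are both kept because their No. values differ.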

