Conditionally dropping rows from a pandas DataFrame

I have a pandas dataframe that looks like this:

ID    Date         Event_Type
1     01/01/2019   A
1     01/01/2019   B
2     02/01/2019   A
3     02/01/2019   A

I want to be left with:

ID    Date
1     01/01/2019
2     02/01/2019
3     02/01/2019

My condition is: if the ID is the same AND the dates are within 2 days of each other, drop one of the rows. If the dates are more than 2 days apart, keep both rows.

How do I do this?

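I believe you first need to convert the values to datetimes with to_datetime, then take the difference between consecutive dates within each ID group with diff. Keep the first row of each group (where the difference is missing, detected with isnull()) together with any row whose difference from the previous row exceeds the timedelta threshold: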

df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')

s =  df.groupby('ID')['Date'].diff()
df = df[(s.isnull() | (s > pd.Timedelta(2, 'd')))]
print (df)
   ID       Date Event_Type
0   1 2019-01-01          A
2   2 2019-01-02          A
3   3 2019-01-02          A

Checking the solution with different data:

print (df)
   ID        Date Event_Type
0   1  01/01/2019          A
1   1  04/01/2019          B <-difference 3 days
2   2  02/01/2019          A
3   3  02/01/2019          A

df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')

s =  df.groupby('ID')['Date'].diff()
df = df[(s.isnull() | (s > pd.Timedelta(2, 'd')))]
print (df)
   ID       Date Event_Type
0   1 2019-01-01          A
1   1 2019-01-04          B
2   2 2019-01-02          A
3   3 2019-01-02          A
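
For convenience, the same approach can be wrapped into a self-contained sketch. The function name drop_close_events and the max_gap_days parameter are made up for illustration and are not part of pandas. The sort step and the final selection of only the ID and Date columns are assumptions added here, the first because diff() compares each row with the previous one in its group, the second to match the output requested in the question.

```python
import pandas as pd


def drop_close_events(df, max_gap_days=2):
    """Keep the first row per ID, plus rows more than max_gap_days
    after the previous row of the same ID (hypothetical helper)."""
    out = df.copy()
    out['Date'] = pd.to_datetime(out['Date'], format='%d/%m/%Y')
    # diff() looks at the previous row within each group, so the rows
    # must be in date order per ID (an assumption added in this sketch).
    out = out.sort_values(['ID', 'Date'])
    gap = out.groupby('ID')['Date'].diff()
    # The first row of each ID has no previous row, so its gap is NaT.
    keep = gap.isna() | (gap > pd.Timedelta(days=max_gap_days))
    # Return only the columns shown in the desired output.
    return out.loc[keep, ['ID', 'Date']]


df = pd.DataFrame({
    'ID': [1, 1, 2, 3],
    'Date': ['01/01/2019', '01/01/2019', '02/01/2019', '02/01/2019'],
    'Event_Type': ['A', 'B', 'A', 'A'],
})
result = drop_close_events(df)
print(result)
```

Note that diff() measures the gap to the immediately preceding row of the same ID, not to the last row that was kept, so a run of events each one day apart would be reduced to its first row only.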
