
Python Dataframes: Filter a dataframe according to groupby condition

Hi, I have a dataframe like the one below:

ID    date          
1     01.01.2017        
1     01.01.2017        
1     01.04.2017        
2     01.01.2017        
2     01.01.2017        
2     01.02.2017       

I want to keep only the IDs for which the difference between the maximum and minimum date is exactly 3 days (dates are written month.day.year). The final dataframe should look like this, since only ID 1 matches the condition:

ID    date          
1     01.01.2017        
1     01.01.2017        
1     01.04.2017 

Thank you.

You can create a mask and then use it as a filter:

import pandas as pd

# create sample data-frame
data = [[1, '01.01.2017'], [1, '01.01.2017'], [1, '01.04.2017'],
        [2, '01.01.2017'], [2, '01.01.2017'], [2, '01.02.2017']]
df = pd.DataFrame(data=data, columns=['id', 'date'])
# parse explicitly as month.day.year so '01.04.2017' is read as January 4
df['date'] = pd.to_datetime(df['date'], format='%m.%d.%Y')

# create mask: True for every row whose group spans exactly 3 days
mask = df.groupby('id')['date'].transform(lambda x: (x.max() - x.min()).days == 3)

# filter
result = df[mask]

print(result)

Output

   id       date
0   1 2017-01-01
1   1 2017-01-01
2   1 2017-01-04
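
The same mask can also be built without a Python lambda. The following is a minimal sketch, assuming the question's column names (`ID`, `date`) and month.day.year dates; the names `span` and `vectorized_result` are only illustrative. It broadcasts each group's maximum and minimum back onto the rows with the built-in `transform` aggregations and compares the difference to a `pd.Timedelta`:

```python
import pandas as pd

# Same sample data as in the question; month.day.year format is assumed.
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2],
    "date": ["01.01.2017", "01.01.2017", "01.04.2017",
             "01.01.2017", "01.01.2017", "01.02.2017"],
})
df["date"] = pd.to_datetime(df["date"], format="%m.%d.%Y")

# Per-group max and min, broadcast back to every row of the group.
grouped = df.groupby("ID")["date"]
span = grouped.transform("max") - grouped.transform("min")

# Keep only the rows whose group spans exactly 3 days.
vectorized_result = df[span == pd.Timedelta(days=3)]
print(vectorized_result)
```

On large frames this tends to be faster than a per-group lambda, because the max and min are computed by pandas' built-in aggregations rather than by calling Python code once per group.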

You can use GroupBy.filter with a custom lambda function to check whether the difference between the maximum and the minimum date is 3 days:

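import datetime
# assumes 'date' is already datetime and df is indexed by 'ID' (as in the output below)
d = datetime.timedelta(days=3)
df.groupby('ID').date.filter(lambda x: (x.max() - x.min()) == d)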

ID
1   2017-01-01
1   2017-01-01
1   2017-01-04
Name: date, dtype: datetime64[ns]
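
Note that filtering the selected `date` column returns only that Series. If the full rows are wanted, as in the question's expected result, one option is to call `filter` on the grouped DataFrame instead. This is a sketch under the assumption that `ID` is a regular column and the dates are month.day.year; `three_days` and `filtered` are illustrative names:

```python
import pandas as pd

# Same sample data as in the question; month.day.year format is assumed.
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2],
    "date": ["01.01.2017", "01.01.2017", "01.04.2017",
             "01.01.2017", "01.01.2017", "01.02.2017"],
})
df["date"] = pd.to_datetime(df["date"], format="%m.%d.%Y")

three_days = pd.Timedelta(days=3)

# The lambda receives each group's sub-frame and must return a single bool;
# groups for which it returns True are kept with all of their columns.
filtered = df.groupby("ID").filter(
    lambda g: g["date"].max() - g["date"].min() == three_days
)
print(filtered)
```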

