
Python2.7: Compare within DataFrame groups and filter based on condition

I have a pandas DataFrame that I plan to group by 'name', 'driverRef' and 'tyre', keeping only the groups that have similar values in one column.

Within the group, all rows have the same value in that column.

"Similar" is defined as values spanning a range of at most 3. For example, if the unique numbers in the column are 5, 10, 12, 13, only groups with 10, 12, 13 are kept.

EDIT: The similarity criterion I originally planned is ambiguous, so I have changed it to simply the mode of the group.

                     name driverRef  stint        tyre  lap  stint length
0   Australian Grand Prix       ham    1.0  Super soft    1             5
1   Australian Grand Prix    vettel    1.0  Super soft    2            10
2   Australian Grand Prix    bottas    1.0  Super soft    3            10
3   Australian Grand Prix    alonso    2.0  Super soft   20            13
4   Australian Grand Prix    alonso    2.0  Super soft   21            13
5   Australian Grand Prix    alonso    2.0  Super soft   22            13
6      Bahrain Grand Prix       ham    1.0  Super soft    1             5
7      Bahrain Grand Prix    vettel    1.0  Super soft    2             6
8      Bahrain Grand Prix    bottas    1.0  Super soft    3             6
9      Bahrain Grand Prix    alonso    2.0  Super soft   20            13
10     Bahrain Grand Prix    alonso    2.0  Super soft   21            13
11     Bahrain Grand Prix    alonso    2.0  Super soft   22            13

Expected output:

                     name driverRef  stint        tyre  lap  stint length
4   Australian Grand Prix    alonso    2.0  Super soft   21            13
5   Australian Grand Prix    alonso    2.0  Super soft   22            13
9      Bahrain Grand Prix    alonso    2.0  Super soft   20            13
10     Bahrain Grand Prix    alonso    2.0  Super soft   21            13
11     Bahrain Grand Prix    alonso    2.0  Super soft   22            13

I believe you need:

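# Most frequent 'stint length' per (name, tyre) group, broadcast back to every row
s = df.groupby(['name', 'tyre'])['stint length'].transform(lambda x: x.mode().iat[0])
# Alternative: take the top entry of value_counts()
# s = df.groupby(['name', 'tyre'])['stint length'].transform(lambda x: x.value_counts().index[0])

# Keep only the rows whose value equals the mode of their group
df = df[df['stint length'] == s]
print(df)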
                     name driverRef  stint        tyre  lap  stint length
3   Australian Grand Prix    alonso    2.0  Super soft   20            13
4   Australian Grand Prix    alonso    2.0  Super soft   21            13
5   Australian Grand Prix    alonso    2.0  Super soft   22            13
9      Bahrain Grand Prix    alonso    2.0  Super soft   20            13
10     Bahrain Grand Prix    alonso    2.0  Super soft   21            13
11     Bahrain Grand Prix    alonso    2.0  Super soft   22            13
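
For anyone who wants to run the snippet above end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question by hand (the dict literal is my own transcription of the table, and the variable names `group_mode` and `filtered` are only illustrative), then applies the same mode-based filter. It assumes, as the answer does, that grouping by 'name' and 'tyre' is what is wanted.

```python
import pandas as pd

# Sample data transcribed from the table in the question
df = pd.DataFrame({
    "name": ["Australian Grand Prix"] * 6 + ["Bahrain Grand Prix"] * 6,
    "driverRef": ["ham", "vettel", "bottas", "alonso", "alonso", "alonso"] * 2,
    "stint": [1.0, 1.0, 1.0, 2.0, 2.0, 2.0] * 2,
    "tyre": ["Super soft"] * 12,
    "lap": [1, 2, 3, 20, 21, 22] * 2,
    "stint length": [5, 10, 10, 13, 13, 13, 5, 6, 6, 13, 13, 13],
})

# Mode of 'stint length' within each (name, tyre) group, aligned to the original rows.
# Series.mode() returns its values sorted, so .iat[0] picks the smallest one on a tie.
group_mode = df.groupby(["name", "tyre"])["stint length"].transform(
    lambda x: x.mode().iat[0]
)

# Boolean mask: keep rows whose value equals the mode of their group
filtered = df[df["stint length"] == group_mode]
print(filtered)
```

Note that when a group has several equally frequent values, only the rows matching the smallest of them survive, because `mode()` returns the tied values in sorted order. If that is not the desired behaviour, the tie-breaking rule would need to be decided explicitly.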

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 