Python2.7: Compare within DataFrame groups and filter based on condition

Question

I have a pandas DataFrame which I plan to group by 'name', 'driverRef' and 'tyre', keeping only the groups that have similar values in one column.

Within each group, all rows have the same value in that column.

~~Similar is defined as values lying within a range of at most 3 of each other, e.g. if the unique numbers in the column are 5, 10, 12, 13, only groups with 10, 12, 13 are kept.~~

EDIT: The similarity criterion I originally planned is ambiguous, so I have changed it to simply the mode of the group.
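To make the "mode of the group" criterion concrete, here is a small sketch using the 'stint length' values of each race from the sample data. The variable names australian, bahrain and tied are illustrative only. It also shows that a group can have more than one mode, which matters when picking a single value to compare against.

```python
import pandas as pd

# 'stint length' values from the sample data, one Series per race
australian = pd.Series([5, 10, 10, 13, 13, 13])
bahrain = pd.Series([5, 6, 6, 13, 13, 13])

# Series.mode() returns every most-frequent value, sorted ascending
print(australian.mode().tolist())  # [13]
print(bahrain.mode().tolist())     # [13]

# Hypothetical tied group: two values are equally frequent, so there are two modes
tied = pd.Series([6, 6, 13, 13])
print(tied.mode().tolist())        # [6, 13]
```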

                     name driverRef  stint        tyre  lap  stint length
0   Australian Grand Prix       ham    1.0  Super soft    1             5
1   Australian Grand Prix    vettel    1.0  Super soft    2            10
2   Australian Grand Prix    bottas    1.0  Super soft    3            10
3   Australian Grand Prix    alonso    2.0  Super soft   20            13
4   Australian Grand Prix    alonso    2.0  Super soft   21            13
5   Australian Grand Prix    alonso    2.0  Super soft   22            13
6      Bahrain Grand Prix       ham    1.0  Super soft    1             5
7      Bahrain Grand Prix    vettel    1.0  Super soft    2             6
8      Bahrain Grand Prix    bottas    1.0  Super soft    3             6
9      Bahrain Grand Prix    alonso    2.0  Super soft   20            13
10     Bahrain Grand Prix    alonso    2.0  Super soft   21            13
11     Bahrain Grand Prix    alonso    2.0  Super soft   22            13
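For anyone who wants to reproduce this, a minimal sketch that builds the sample table above as a DataFrame. The explicit columns= argument is an assumption added to pin the column order (plain dicts are unordered on Python 2.7); the values themselves are copied from the table.

```python
import pandas as pd

# Sample data copied from the table above; the column name 'stint length' contains a space
df = pd.DataFrame({
    'name': ['Australian Grand Prix'] * 6 + ['Bahrain Grand Prix'] * 6,
    'driverRef': ['ham', 'vettel', 'bottas', 'alonso', 'alonso', 'alonso'] * 2,
    'stint': [1.0, 1.0, 1.0, 2.0, 2.0, 2.0] * 2,
    'tyre': ['Super soft'] * 12,
    'lap': [1, 2, 3, 20, 21, 22] * 2,
    'stint length': [5, 10, 10, 13, 13, 13, 5, 6, 6, 13, 13, 13],
}, columns=['name', 'driverRef', 'stint', 'tyre', 'lap', 'stint length'])  # fix column order

print(df)
```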

Expected output:

                     name driverRef  stint        tyre  lap  stint length
3   Australian Grand Prix    alonso    2.0  Super soft   20            13
4   Australian Grand Prix    alonso    2.0  Super soft   21            13
5   Australian Grand Prix    alonso    2.0  Super soft   22            13
9      Bahrain Grand Prix    alonso    2.0  Super soft   20            13
10     Bahrain Grand Prix    alonso    2.0  Super soft   21            13
11     Bahrain Grand Prix    alonso    2.0  Super soft   22            13

Answer 1

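I believe you need to group by 'name' and 'tyre' only and keep the rows whose 'stint length' equals the mode of their group: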

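# df is the DataFrame from the question
# For every row, the mode of 'stint length' within its (name, tyre) group;
# mode() is sorted ascending, so on a tie .iat[0] picks the smallest mode
s = df.groupby(['name','tyre'])['stint length'].transform(lambda x: x.mode().iat[0])
#alternative: value_counts() is sorted by frequency, so index[0] is a most frequent value
#s = df.groupby(['name','tyre'])['stint length'].transform(lambda x: x.value_counts().index[0])

# Keep only the rows whose 'stint length' equals their group's mode
df = df[df['stint length'] == s]
print(df)

                     name driverRef  stint        tyre  lap  stint length
3   Australian Grand Prix    alonso    2.0  Super soft   20            13
4   Australian Grand Prix    alonso    2.0  Super soft   21            13
5   Australian Grand Prix    alonso    2.0  Super soft   22            13
9      Bahrain Grand Prix    alonso    2.0  Super soft   20            13
10     Bahrain Grand Prix    alonso    2.0  Super soft   21            13
11     Bahrain Grand Prix    alonso    2.0  Super soft   22            13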
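One caveat with the approach above: when a group has two equally frequent values, mode().iat[0] keeps only the rows matching the smallest of them, and value_counts().index[0] keeps whichever tied value value_counts() happens to list first. Below is a sketch of a tie-aware variant that keeps every row whose value is any of its group's modes. The toy data and the helper name is_mode are hypothetical, not from the question.

```python
import pandas as pd

# Hypothetical toy data: group 'A' has a tie (6 and 13 both appear twice)
df = pd.DataFrame({
    'name': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
    'tyre': ['Soft'] * 7,
    'stint length': [6, 6, 13, 13, 5, 13, 13],
})

def is_mode(x):
    # True for each value in the group that is one of the group's modes (ties included)
    return x.isin(x.mode())

mask = df.groupby(['name', 'tyre'])['stint length'].transform(is_mode)
kept = df[mask.astype(bool)]  # astype guards against older pandas casting the result
print(kept)

# For comparison: the single-mode version keeps only the smallest tied value in group 'A'
first_mode = df.groupby(['name', 'tyre'])['stint length'].transform(lambda x: x.mode().iat[0])
kept_first = df[df['stint length'] == first_mode]
print(kept_first)
```

Which behaviour is right depends on whether tied stint lengths should all count as "similar"; the single-mode version is simpler if ties cannot occur in the real data.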

2018-02-25 14:13:05
