
Python2.7: Compare within DataFrame groups and filter based on condition

I have a pandas DataFrame which I plan to group by 'name', 'driverRef', 'tyre', keeping only the groups that have similar values in one column.

Within the group, all rows have the same value in that column.

Similar is defined as values that differ by at most 3. E.g. if the unique numbers in the column are 5, 10, 12, 13, only groups with 10, 12, 13 are kept.

EDIT: The similarity criterion I originally planned is ambiguous, so I have changed it to simply the mode of the group.

                     name driverRef  stint        tyre  lap  stint length
0   Australian Grand Prix       ham    1.0  Super soft    1             5
1   Australian Grand Prix    vettel    1.0  Super soft    2            10
2   Australian Grand Prix    bottas    1.0  Super soft    3            10
3   Australian Grand Prix    alonso    2.0  Super soft   20            13
4   Australian Grand Prix    alonso    2.0  Super soft   21            13
5   Australian Grand Prix    alonso    2.0  Super soft   22            13
6      Bahrain Grand Prix       ham    1.0  Super soft    1             5
7      Bahrain Grand Prix    vettel    1.0  Super soft    2             6
8      Bahrain Grand Prix    bottas    1.0  Super soft    3             6
9      Bahrain Grand Prix    alonso    2.0  Super soft   20            13
10     Bahrain Grand Prix    alonso    2.0  Super soft   21            13
11     Bahrain Grand Prix    alonso    2.0  Super soft   22            13

Expected output:

                     name driverRef  stint        tyre  lap  stint length
4   Australian Grand Prix    alonso    2.0  Super soft   21            13
5   Australian Grand Prix    alonso    2.0  Super soft   22            13
9      Bahrain Grand Prix    alonso    2.0  Super soft   20            13
10     Bahrain Grand Prix    alonso    2.0  Super soft   21            13
11     Bahrain Grand Prix    alonso    2.0  Super soft   22            13

I believe you need:

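# Most frequent 'stint length' per (name, tyre) group, broadcast back to every row
s = df.groupby(['name','tyre'])['stint length'].transform(lambda x: x.mode().iat[0])
# Alternative: take the top entry of value_counts()
#s = df.groupby(['name','tyre'])['stint length'].transform(lambda x: x.value_counts().index[0])

# Keep only the rows whose 'stint length' equals their group's mode
df = df[df['stint length'] == s]
print(df)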
                     name driverRef  stint        tyre  lap  stint length
3   Australian Grand Prix    alonso    2.0  Super soft   20            13
4   Australian Grand Prix    alonso    2.0  Super soft   21            13
5   Australian Grand Prix    alonso    2.0  Super soft   22            13
9      Bahrain Grand Prix    alonso    2.0  Super soft   20            13
10     Bahrain Grand Prix    alonso    2.0  Super soft   21            13
11     Bahrain Grand Prix    alonso    2.0  Super soft   22            13
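
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the sample data from the question by hand and applies the same mode-based filter; the variable names `rows`, `mode_per_group` and `filtered` are my own and not part of the original post.

```python
import pandas as pd

# Sample data copied from the question.
rows = [
    ("Australian Grand Prix", "ham",    1.0, "Super soft", 1,  5),
    ("Australian Grand Prix", "vettel", 1.0, "Super soft", 2,  10),
    ("Australian Grand Prix", "bottas", 1.0, "Super soft", 3,  10),
    ("Australian Grand Prix", "alonso", 2.0, "Super soft", 20, 13),
    ("Australian Grand Prix", "alonso", 2.0, "Super soft", 21, 13),
    ("Australian Grand Prix", "alonso", 2.0, "Super soft", 22, 13),
    ("Bahrain Grand Prix",    "ham",    1.0, "Super soft", 1,  5),
    ("Bahrain Grand Prix",    "vettel", 1.0, "Super soft", 2,  6),
    ("Bahrain Grand Prix",    "bottas", 1.0, "Super soft", 3,  6),
    ("Bahrain Grand Prix",    "alonso", 2.0, "Super soft", 20, 13),
    ("Bahrain Grand Prix",    "alonso", 2.0, "Super soft", 21, 13),
    ("Bahrain Grand Prix",    "alonso", 2.0, "Super soft", 22, 13),
]
df = pd.DataFrame(
    rows,
    columns=["name", "driverRef", "stint", "tyre", "lap", "stint length"],
)

# transform() returns a Series aligned with df, holding each row's group mode.
# mode() can return several values on a tie; iat[0] takes the first of them.
mode_per_group = (
    df.groupby(["name", "tyre"])["stint length"]
      .transform(lambda x: x.mode().iat[0])
)

# Boolean mask: keep rows whose value matches the mode of their group.
filtered = df[df["stint length"] == mode_per_group]
print(filtered.index.tolist())  # [3, 4, 5, 9, 10, 11]
```

Because `transform` keeps the original index, the comparison lines up row by row without any merge. Note that row 3 is kept as well, since its 'stint length' of 13 is also the mode of its group.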
