Python2.7: Subset dataframe based on condition in first row of groupby

I would like to subset a pandas dataframe based on a condition that is applied only to the first row of each group in a groupby.

The dataframe is to be grouped by "name", "driverRef", "tyre", and "stint".

For example, in the df below, because alonso started his stint 2 in position 12, I want to remove all of alonso's stint 2 records from the df.

                    name driverRef  stint        tyre  lap  pos
0  Australian Grand Prix    alonso    1.0  Super soft    1    9
1  Australian Grand Prix    alonso    1.0  Super soft    2    9
2  Australian Grand Prix    alonso    1.0  Super soft    3    9
3  Australian Grand Prix    alonso    2.0  Super soft   20   12
4  Australian Grand Prix    alonso    2.0  Super soft   21   11
5  Australian Grand Prix    alonso    2.0  Super soft   22   10

Expected output:

                    name driverRef  stint        tyre  lap  pos
0  Australian Grand Prix    alonso    1.0  Super soft    1    9
1  Australian Grand Prix    alonso    1.0  Super soft    2    9
2  Australian Grand Prix    alonso    1.0  Super soft    3    9

I tried this, but it doesn't implement the effect correctly:

df.loc[df.groupby(['name', 'driverRef', 'tyre', 'stint']).first().reset_index()['position'].isin(list(range(1,11))).index]

EDIT: My code does work, but please see @jezrael's answer for a more succinct/better way of writing it.

You are really close; you need transform, which returns a Series with the same length as the original df:

s = df.groupby(['name', 'driverRef', 'tyre', 'stint'])['pos'].transform('first')
print (s)
0     9
1     9
2     9
3    12
4    12
5    12
Name: pos, dtype: int64

df = df[s.isin(list(range(1,11)))]
print (df)
                    name driverRef  stint        tyre  lap  pos
0  Australian Grand Prix    alonso    1.0  Super soft    1    9
1  Australian Grand Prix    alonso    1.0  Super soft    2    9
2  Australian Grand Prix    alonso    1.0  Super soft    3    9
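
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and adds a second driver, vettel, whose rows are made up purely for illustration (they are not in the original data). It also shows the difference that matters here: first() returns one row per group, while transform('first') repeats each group's first value on every row of that group, so the result can be used directly as a row mask.

```python
import pandas as pd

# Sample rows from the question, plus a hypothetical second driver
# ("vettel") added only for illustration; his rows are not in the original.
df = pd.DataFrame({
    "name": ["Australian Grand Prix"] * 8,
    "driverRef": ["alonso"] * 6 + ["vettel"] * 2,
    "stint": [1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 1.0, 1.0],
    "tyre": ["Super soft"] * 8,
    "lap": [1, 2, 3, 20, 21, 22, 1, 2],
    "pos": [9, 9, 9, 12, 11, 10, 3, 11],
})

keys = ["name", "driverRef", "tyre", "stint"]

# first() collapses each group to a single row, so its length is the
# number of groups, not the number of rows in df.
per_group = df.groupby(keys)["pos"].first()

# transform('first') keeps df's index and repeats each group's first
# 'pos' value on every row of that group.
first_pos = df.groupby(keys)["pos"].transform("first")

# Keep every row of a group whose first row is in positions 1 to 10.
result = df[first_pos.isin(list(range(1, 11)))]
print(result)
```

Note that the condition is checked only against the first row of each group: the hypothetical vettel stint is kept in full even though his second lap is in position 11, while alonso's stint 2 is dropped in full even though he ends it in position 10.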
