Python 2.7: Subset dataframe based on condition in first row of groupby
I would like to subset a pandas dataframe based on a condition that is checked only against the first row of each group.
The dataframe is to be grouped by "name", "driverRef", "tyre", "stint".
For example, in the df below, because alonso started his stint 2 in position 12, I want to remove all of alonso's stint 2 records from the df.
                    name  driverRef  stint        tyre  lap  pos
0  Australian Grand Prix     alonso    1.0  Super soft    1    9
1  Australian Grand Prix     alonso    1.0  Super soft    2    9
2  Australian Grand Prix     alonso    1.0  Super soft    3    9
3  Australian Grand Prix     alonso    2.0  Super soft   20   12
4  Australian Grand Prix     alonso    2.0  Super soft   21   11
5  Australian Grand Prix     alonso    2.0  Super soft   22   10
Expected output:
                    name  driverRef  stint        tyre  lap  pos
0  Australian Grand Prix     alonso    1.0  Super soft    1    9
1  Australian Grand Prix     alonso    1.0  Super soft    2    9
2  Australian Grand Prix     alonso    1.0  Super soft    3    9
I tried this, but it doesn't produce the intended result:
df.loc[df.groupby(['name', 'driverRef', 'tyre', 'stint']).first().reset_index()['position'].isin(list(range(1,11))).index]
EDIT: My code does work, but please see @jezrael's answer for a more succinct/better way of writing it.
You are really close. You need transform, which returns a Series with the same length as the original df:
s = df.groupby(['name', 'driverRef', 'tyre', 'stint'])['pos'].transform('first')
print (s)
0 9
1 9
2 9
3 12
4 12
5 12
Name: pos, dtype: int64
df = df[s.isin(list(range(1,11)))]
print (df)
                    name  driverRef  stint        tyre  lap  pos
0  Australian Grand Prix     alonso    1.0  Super soft    1    9
1  Australian Grand Prix     alonso    1.0  Super soft    2    9
2  Australian Grand Prix     alonso    1.0  Super soft    3    9
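For anyone who wants to run this end to end, here is a minimal self-contained sketch. The dataframe is rebuilt by hand from the table in the question, and the variable names keys, per_group, start_pos and result are only illustrative. It also swaps isin(list(range(1,11))) for between(1, 10), which should be equivalent here because between includes both endpoints by default.

```python
import pandas as pd

# Sample data rebuilt by hand from the table in the question.
df = pd.DataFrame({
    "name": ["Australian Grand Prix"] * 6,
    "driverRef": ["alonso"] * 6,
    "stint": [1.0, 1.0, 1.0, 2.0, 2.0, 2.0],
    "tyre": ["Super soft"] * 6,
    "lap": [1, 2, 3, 20, 21, 22],
    "pos": [9, 9, 9, 12, 11, 10],
})

keys = ["name", "driverRef", "tyre", "stint"]

# first() collapses each group to a single row, so the result has one
# entry per group and its index no longer lines up with df's rows.
per_group = df.groupby(keys)["pos"].first()

# transform('first') broadcasts each group's first value back onto every
# row of that group, so the result can be used directly as a row mask.
start_pos = df.groupby(keys)["pos"].transform("first")

# Keep only the groups whose first row is in positions 1 to 10.
result = df[start_pos.between(1, 10)]
print(result)
```

The difference between per_group and start_pos is the whole point: the first has one value per group, the second has one value per row, and only the second can be used as a boolean mask on df.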