Pandas create new dataframe choosing max value from multiple observations
I would like to make a new dataframe based on the max value of a column. However, I have multiple observations from the same respondent, and I only want to select the row with the maximum value in the column value1 for each respondent. Here is a simplified example:
df:
respondent value1 value2
0 1 3 12
1 1 5 34
2 1 1 43
3 2 4 12
4 2 6 34
5 2 9 54
6 3 2 32
7 3 1 2
8 3 3 21
Here is the result I would like to have:
newdf:
respondent value1 value2
0 1 5 34
1 2 9 54
2 3 3 21
Any ideas?
The following achieves what you want and appears to be faster than @CT Zhu's answer:
In [30]:
df.loc[df.groupby('respondent').value1.idxmax().values]
Out[30]:
respondent value1 value2
1 1 5 34
5 2 9 54
8 3 3 21
In [31]:
%timeit df.loc[df.groupby('respondent').value1.idxmax().values]
%timeit df[df.groupby('respondent').value1.transform(lambda x: x==x.max())]
%timeit df.sort(['respondent', 'value1'], ascending=[1,0]).groupby('respondent').head(1)
100 loops, best of 3: 1.76 ms per loop
100 loops, best of 3: 2.99 ms per loop
100 loops, best of 3: 4.42 ms per loop
Also, the above was run on pandas version 0.12.0 (64-bit) using Python 3.3.
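For reference, here is a minimal self-contained sketch of the idxmax approach, rebuilding the example frame from the question. The `reset_index(drop=True)` step is my addition to get the 0, 1, 2 index shown in the desired newdf; it is not part of the timed one-liner above.

```python
import pandas as pd

# Example data copied from the question
df = pd.DataFrame({
    "respondent": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "value1": [3, 5, 1, 4, 6, 9, 2, 1, 3],
    "value2": [12, 34, 43, 12, 34, 54, 32, 2, 21],
})

# For each respondent, idxmax gives the index label of the row
# holding the largest value1 (the first such row if there are ties)
idx = df.groupby("respondent")["value1"].idxmax()

# Select those rows, then renumber the index from 0 (optional step,
# added here to match the desired output in the question)
newdf = df.loc[idx].reset_index(drop=True)
print(newdf)
```

Note that idxmax keeps only one row per respondent when the maximum is tied, whereas the transform variant in the timings keeps every tied row. Also, on recent pandas releases `DataFrame.sort` no longer exists, so the third timed variant would need `sort_values` instead.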