pandas nlargest is returning more than n rows
I have a DataFrame that looks like this:
name value
date
2016-05-01 kelly 20
2016-05-05 john 12
2016-05-05 sarah 25
2016-05-05 george 3
2016-05-05 tom 40
2016-05-07 kara 24
2016-05-07 jane 90
2016-05-07 sally 39
2016-05-07 sam 28
I want to get the top 3 rows (by value), preferably per date. I'm expecting something like this:
name value
date
2016-05-01 kelly 20
2016-05-05 john 12
2016-05-05 sarah 25
2016-05-05 tom 40
2016-05-07 jane 90
2016-05-07 sally 39
2016-05-07 sam 28
but I'm also OK with this:
name value
date
2016-05-05 tom 40
2016-05-07 jane 90
2016-05-07 sally 39
I tried df.nlargest(3, 'value') but I get this weird result:
name value
date
2016-05-01 kelly 20
2016-05-01 kelly 20
2016-05-01 kelly 20
2016-05-05 tom 40
2016-05-05 tom 40
2016-05-05 tom 40
2016-05-05 sarah 25
2016-05-05 sarah 25
2016-05-05 sarah 25
2016-05-07 kara 24
2016-05-07 kara 24
...
2016-05-07 sally 39
2016-05-07 sally 39
2016-05-07 jane 90
2016-05-07 jane 90
2016-05-07 jane 90
I tried running it day by day:
[df.ix[day].nlargest(3, 'value') for day in df.index.unique()]
but I got the same problem (each name is duplicated 3 times).
First of all, this will do the job:
df.sort_values('value', ascending=False).groupby(level=0).head(3).sort_index()
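As a self-contained check of the one-liner above, here is a minimal sketch that rebuilds the question's data with pd.DataFrame (instead of parsing text) and keeps at most 3 rows per date. The variable name top3_per_date is only illustrative.

```python
import pandas as pd

# Rebuild the question's data; 'date' ends up as a non-unique index
df = pd.DataFrame(
    {
        "date": ["2016-05-01", "2016-05-05", "2016-05-05", "2016-05-05",
                 "2016-05-05", "2016-05-07", "2016-05-07", "2016-05-07",
                 "2016-05-07"],
        "name": ["kelly", "john", "sarah", "george", "tom",
                 "kara", "jane", "sally", "sam"],
        "value": [20, 12, 25, 3, 40, 24, 90, 39, 28],
    }
).set_index("date")

# Largest values first, keep at most 3 rows per index label (date),
# then put the dates back in ascending order
top3_per_date = (
    df.sort_values("value", ascending=False)
      .groupby(level=0)
      .head(3)
      .sort_index()
)
print(top3_per_date)
```

groupby(level=0) groups on the date index itself, so no reset_index is needed, and head(3) keeps the first three rows of each group in the already-sorted order, i.e. the three largest values for that date.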
Use a [:n] slice of the sort_values() result
Use sort_values() in descending mode and take the first n results in a slice, then use sort_index() to keep the days monotonically increasing.
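import pandas as pd
from io import StringIO  # Python 3 replacement for cStringIO
# Rebuild the example; the first column ('date') becomes the (non-unique) index.
# sep=r'\s+' splits on runs of whitespace (the old ' *' pattern can match the
# empty string, which newer Python versions split on).
df = pd.read_csv(StringIO('''
date name value
2016-05-01 kelly 20
2016-05-05 john 12
2016-05-05 sarah 25
2016-05-05 george 3
2016-05-05 tom 40
2016-05-07 kara 24
2016-05-07 jane 90
2016-05-07 sally 39
2016-05-07 sam 28
'''), sep=r'\s+', index_col=0)
print('Original DataFrame:')
print(df)
print()
# Sort by value (largest first), keep the first 3 rows, then restore date order
df_top3 = df.sort_values('value', ascending=False)[:3].sort_index()
print('Top 3 Largest value DataFrame:')
print(df_top3)
print()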
Original DataFrame:
name value
date
2016-05-01 kelly 20
2016-05-05 john 12
2016-05-05 sarah 25
2016-05-05 george 3
2016-05-05 tom 40
2016-05-07 kara 24
2016-05-07 jane 90
2016-05-07 sally 39
2016-05-07 sam 28
Top 3 Largest value DataFrame:
name value
date
2016-05-05 tom 40
2016-05-07 jane 90
2016-05-07 sally 39
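If you'd rather keep calling nlargest directly, one workaround worth trying is to move the non-unique date index into a regular column first, so nlargest operates on a frame with unique row labels, and restore the index afterwards. This is a sketch under the assumption that the duplicated rows come from the repeated date labels; the thread itself doesn't pin down the cause.

```python
import pandas as pd

# Same data as in the question, indexed by a non-unique 'date'
df = pd.DataFrame(
    {
        "date": ["2016-05-01", "2016-05-05", "2016-05-05", "2016-05-05",
                 "2016-05-05", "2016-05-07", "2016-05-07", "2016-05-07",
                 "2016-05-07"],
        "name": ["kelly", "john", "sarah", "george", "tom",
                 "kara", "jane", "sally", "sam"],
        "value": [20, 12, 25, 3, 40, 24, 90, 39, 28],
    }
).set_index("date")

# reset_index() gives every row a unique integer label (assumed to sidestep
# the duplicate-label problem), nlargest picks the 3 biggest values, and
# set_index/sort_index put 'date' back as an ordered index
top3 = (
    df.reset_index()
      .nlargest(3, "value")
      .set_index("date")
      .sort_index()
)
print(top3)
```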