pd.sort_values not doing what it should
I have a CSV file that I've already imported using df = pd.read_csv("af.csv").
The CSV file looks like this (preview):
"match_id","start_time","win","leaguename","opposing_team","team","min"
2992096687,1486840800,True,"Captains Draft",3729377,2642171,1453382256
2992217489,1486845476,true,"Captains Draft",3729377,2642171,1453382256
2994454005,1486926905,false,"Captains Draft",2586976,2642171,1453382256
2659805546,1474478411,false,"BTSSeries",55,2642171,1454281287
2659879628,1474481141,false,"BTSSeries",55,2642171,1454281287
2661783205,1474563571,false,"BTSSeries",2537636,2642171,1454281287
2661875544,1474566865,false,"BTSSeries",2537636,2642171,1454281287
2662027296,1474573160,true,"BTSSeries",59,2642171,1454281287
2758086417,1478352060,true,"ESLManila16",2163,2642171,1454692269
2758241073,1478355547,true,"ESLManila16",2163,2642171,1454692269
2747710178,1477941012,false,"ESLFrankfurt16",2850016,2642171,1459782261
2747808587,1477945318,true,"ESLFrankfurt16",2850016,2642171,1459782261
2747861268,1477947994,true,"ESLFrankfurt16",2850016,2642171,1459782261
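To reproduce the problem without af.csv, the preview can be read from a string via io.StringIO. This is a minimal sketch; it assumes "Captains Draft" is spelled the same on all three of its rows, which is what the answer's output further down implies.

```python
import io

import pandas as pd

# The preview rows, inlined so the example runs without af.csv.
# Assumption: "Captains Draft" is spelled identically on every row.
CSV = """match_id,start_time,win,leaguename,opposing_team,team,min
2992096687,1486840800,True,Captains Draft,3729377,2642171,1453382256
2992217489,1486845476,true,Captains Draft,3729377,2642171,1453382256
2994454005,1486926905,false,Captains Draft,2586976,2642171,1453382256
2659805546,1474478411,false,BTSSeries,55,2642171,1454281287
2659879628,1474481141,false,BTSSeries,55,2642171,1454281287
2661783205,1474563571,false,BTSSeries,2537636,2642171,1454281287
2661875544,1474566865,false,BTSSeries,2537636,2642171,1454281287
2662027296,1474573160,true,BTSSeries,59,2642171,1454281287
2758086417,1478352060,true,ESLManila16,2163,2642171,1454692269
2758241073,1478355547,true,ESLManila16,2163,2642171,1454692269
2747710178,1477941012,false,ESLFrankfurt16,2850016,2642171,1459782261
2747808587,1477945318,true,ESLFrankfurt16,2850016,2642171,1459782261
2747861268,1477947994,true,ESLFrankfurt16,2850016,2642171,1459782261
"""

# read_csv accepts any file-like object; "True", "true" and "false"
# are all recognised as booleans, so 'win' gets a bool dtype.
df = pd.read_csv(io.StringIO(CSV))
print(df.dtypes)
```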
Now what I'm trying to do is keep the first match of each league, followed by the number of wins (True being a win and False a loss) across all matches in that league, and then sort the result by start_time.
I have the following code to do this:
# total wins per league and team
df1 = df.groupby(['leaguename', 'team']).sum().reset_index()
df1 = df1[['win', 'leaguename', 'team']]
# start_time of the earliest match in each league
df2 = df.sort_values("start_time").groupby("leaguename", as_index=False).first()
df2 = df2[['leaguename', 'start_time']]
output = pd.merge(df1, df2, how='inner', on='leaguename')
The output comes back with start_time unordered (the rows are in alphabetical order of leaguename):
,win,leaguename,team,start_time
0,5.0,ASUSROGSeason6,2642171,1478022101
1,6.0,CaptainsDraft,2642171,1486840800
2,3.0,Dota2Asia17,2642171,1486130597
3,2.0,DotaPitSeason5,2642171,1476903919
4,5.0,ESLFrankfurt16,2642171,1477941012
5,2.0,ESLManila16,2642171,1478352060
6,6.0,GlobalGrandMasters,2642171,1466176095
7,4.0,NanyangChampionshipsSeason2,2642171,1464178206
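The reordering comes from groupby itself: its sort parameter defaults to True, so the group keys are sorted alphabetically and the earlier sort_values is undone; the merge then keeps df1's alphabetical order. A minimal sketch with two hypothetical leagues, "Alpha" and "Zeta", shows the effect:

```python
import pandas as pd

# Hypothetical data: "Zeta" starts earlier than "Alpha".
df = pd.DataFrame({
    "leaguename": ["Alpha", "Zeta", "Alpha", "Zeta"],
    "start_time": [500, 100, 600, 200],
})

# sort_values puts Zeta's matches first...
by_time = df.sort_values("start_time")

# ...but groupby sorts the group keys by default (sort=True),
# so the result is in alphabetical order of leaguename.
by_key = by_time.groupby("leaguename", as_index=False).first()
print(by_key["leaguename"].tolist())  # ['Alpha', 'Zeta']

# With sort=False the groups keep their order of first appearance,
# which is chronological here because the rows were sorted beforehand.
by_appearance = by_time.groupby("leaguename", as_index=False, sort=False).first()
print(by_appearance["leaguename"].tolist())  # ['Zeta', 'Alpha']
```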
Desired output:
,win,leaguename,team,start_time
0,4.0,NanyangChampionshipsSeason2,2642171,1464178206
1,6.0,GlobalGrandMasters,2642171,1466176095
2,2.0,DotaPitSeason5,2642171,1476903919
3,5.0,ESLFrankfurt16,2642171,1477941012
4,5.0,ASUSROGSeason6,2642171,1478022101
5,2.0,ESLManila16,2642171,1478352060
6,3.0,Dota2Asia17,2642171,1486130597
7,6.0,CaptainsDraft,2642171,1486840800
How can I achieve the desired output?
I think you need DataFrame.sort_values by column start_time, followed by DataFrame.reset_index with the parameter drop=True to get a default unique monotonic index:
output = output.sort_values('start_time').reset_index(drop=True)
# using the output sample from the question
print (output)
win leaguename team start_time
0 4.0 NanyangChampionshipsSeason2 2642171 1464178206
1 6.0 GlobalGrandMasters 2642171 1466176095
2 2.0 DotaPitSeason5 2642171 1476903919
3 5.0 ESLFrankfurt16 2642171 1477941012
4 5.0 ASUSROGSeason6 2642171 1478022101
5 2.0 ESLManila16 2642171 1478352060
6 3.0 Dota2Asia17 2642171 1486130597
7 6.0 CaptainsDraft 2642171 1486840800
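Putting the pieces together on the sample rows, here is a self-contained sketch of this fix. It differs from the question's code in one detail: it selects the win column before sum(), so the other numeric columns are not summed needlessly. It also assumes "Captains Draft" is spelled consistently in the input.

```python
import io

import pandas as pd

# Sample rows from the question (assumption: "Captains Draft" spelled consistently).
CSV = """match_id,start_time,win,leaguename,opposing_team,team,min
2992096687,1486840800,True,Captains Draft,3729377,2642171,1453382256
2992217489,1486845476,true,Captains Draft,3729377,2642171,1453382256
2994454005,1486926905,false,Captains Draft,2586976,2642171,1453382256
2659805546,1474478411,false,BTSSeries,55,2642171,1454281287
2659879628,1474481141,false,BTSSeries,55,2642171,1454281287
2661783205,1474563571,false,BTSSeries,2537636,2642171,1454281287
2661875544,1474566865,false,BTSSeries,2537636,2642171,1454281287
2662027296,1474573160,true,BTSSeries,59,2642171,1454281287
2758086417,1478352060,true,ESLManila16,2163,2642171,1454692269
2758241073,1478355547,true,ESLManila16,2163,2642171,1454692269
2747710178,1477941012,false,ESLFrankfurt16,2850016,2642171,1459782261
2747808587,1477945318,true,ESLFrankfurt16,2850016,2642171,1459782261
2747861268,1477947994,true,ESLFrankfurt16,2850016,2642171,1459782261
"""
df = pd.read_csv(io.StringIO(CSV))

# Wins per league and team (summing a bool column counts the True values).
df1 = df.groupby(["leaguename", "team"])["win"].sum().reset_index()

# start_time of the earliest match in each league.
df2 = df.sort_values("start_time").groupby("leaguename", as_index=False).first()
df2 = df2[["leaguename", "start_time"]]

# The merge follows df1's alphabetical order, so sort once more at the end
# and rebuild a clean 0..n-1 index.
output = pd.merge(df1, df2, how="inner", on="leaguename")
output = output.sort_values("start_time").reset_index(drop=True)
print(output)
```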
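Another solution is to add sort=False to both groupby calls. Note that sort=False keeps the leagues in the order in which they first appear in df, and the merge keeps the order of df1, so the result is chronological only if the CSV rows are already sorted by start_time; with the sample input they are not, as the output below shows: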
df1 = df.groupby(['leaguename', 'team'], sort=False).sum().reset_index()
df1 = df1[['win','leaguename','team']]
df2 = df.sort_values("start_time").groupby("leaguename", as_index=False, sort=False).first()
df2 = df2[['leaguename', 'start_time']]
output = pd.merge(df1, df2, on='leaguename')
# using the sample input from the question
print (output)
win leaguename team start_time
0 2.0 Captains Draft 2642171 1486840800
1 1.0 BTSSeries 2642171 1474478411
2 2.0 ESLManila16 2642171 1478352060
3 2.0 ESLFrankfurt16 2642171 1477941012
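With sort=False, df1 lists the leagues in the order they first appear in the file, and an inner pd.merge preserves the order of the left frame's keys, so the result above follows df1 rather than start_time. One way to keep the chronological order is to put df2, which is built from the time-sorted frame, on the left of the merge. A minimal sketch on the sample rows, again assuming "Captains Draft" is spelled consistently:

```python
import io

import pandas as pd

# Sample rows from the question (assumption: "Captains Draft" spelled consistently).
CSV = """match_id,start_time,win,leaguename,opposing_team,team,min
2992096687,1486840800,True,Captains Draft,3729377,2642171,1453382256
2992217489,1486845476,true,Captains Draft,3729377,2642171,1453382256
2994454005,1486926905,false,Captains Draft,2586976,2642171,1453382256
2659805546,1474478411,false,BTSSeries,55,2642171,1454281287
2659879628,1474481141,false,BTSSeries,55,2642171,1454281287
2661783205,1474563571,false,BTSSeries,2537636,2642171,1454281287
2661875544,1474566865,false,BTSSeries,2537636,2642171,1454281287
2662027296,1474573160,true,BTSSeries,59,2642171,1454281287
2758086417,1478352060,true,ESLManila16,2163,2642171,1454692269
2758241073,1478355547,true,ESLManila16,2163,2642171,1454692269
2747710178,1477941012,false,ESLFrankfurt16,2850016,2642171,1459782261
2747808587,1477945318,true,ESLFrankfurt16,2850016,2642171,1459782261
2747861268,1477947994,true,ESLFrankfurt16,2850016,2642171,1459782261
"""
df = pd.read_csv(io.StringIO(CSV))

# Wins per league and team, in order of first appearance.
df1 = df.groupby(["leaguename", "team"], sort=False)["win"].sum().reset_index()

# sort=False keeps the order produced by sort_values, so df2 is chronological.
df2 = (
    df.sort_values("start_time")
    .groupby("leaguename", as_index=False, sort=False)
    .first()[["leaguename", "start_time"]]
)

# An inner merge keeps the order of the left keys, so df2 goes on the left;
# then restore the column order used in the question.
output = pd.merge(df2, df1, how="inner", on="leaguename")
output = output[["win", "leaguename", "team", "start_time"]]
print(output)
```

Compared with sorting after the merge, this avoids a second sort, but it relies on the documented left-key ordering of inner merges; the sort_values approach is the more explicit of the two.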