Pandas GroupedBy Dataframe sorting by values of column
My question is about sorting. I've read multiple questions here about sorting groupby dataframes, but none of them solved my problem, or maybe I'm doing something wrong, because I get different errors like bool not callable or sort_values not available or stuff like that.
I have a dataframe with some info and columns. Then I created a groupby dataframe properly, based on 2 fields.
Then I do this:
for name, group in mydfgrouped:
    print(name, len(group))
The output I get is this:
('A', 'HH') 414
('A', 'LW') 1413
('A', 'ME') 1458
('A', 'MR') 339
('B', 'HH') 126
('B', 'LW') 288
('B', 'ME') 315
('B', 'MR') 54
('C', 'HH') 879
('C', 'LW') 672
('C', 'ME') 984
('C', 'MR') 186
('D', 'HH') 246
('D', 'LW') 756
('D', 'ME') 795
('D', 'MR') 255
I would like to sort this dataframe based on 2 criteria: the len of group HH, in descending order. So the idea is to show the same list, but sorted in descending order by the value of HH while still showing all stats for that name.
My expected output is:
('C', 'HH') 879
('C', 'LW') 672
('C', 'ME') 984
('C', 'MR') 186
('A', 'HH') 414
('A', 'LW') 1413
('A', 'ME') 1458
('A', 'MR') 339
('D', 'HH') 246
('D', 'LW') 756
('D', 'ME') 795
('D', 'MR') 255
('B', 'HH') 126
('B', 'LW') 288
('B', 'ME') 315
('B', 'MR') 54
As you can see, name C is first because it has the highest len in group HH (879), and I want to get all groups from C. The last one is B because it has the lowest len in group HH (126).
sort_values and sort did not work for me.
Assuming the second column that you're grouping by is called y, you can do:
# mydfgrouped = df.groupby(['x', 'y'])
s = mydfgrouped.size().unstack('y').sort_values('HH', ascending=False).stack()
for (i, l) in s.items():
    print(i, l)
Output:
('C', 'HH') 879
('C', 'LW') 672
('C', 'ME') 984
('C', 'MR') 186
('A', 'HH') 414
('A', 'LW') 1413
('A', 'ME') 1458
('A', 'MR') 339
('D', 'HH') 246
('D', 'LW') 756
('D', 'ME') 795
('D', 'MR') 255
('B', 'HH') 126
('B', 'LW') 288
('B', 'ME') 315
('B', 'MR') 54
Brief explanation:
# unstack `y` into columns
mydfgrouped.size().unstack('y')
# y    HH    LW    ME   MR
# x
# A   414  1413  1458  339
# B   126   288   315   54
# C   879   672   984  186
# D   246   756   795  255
# sort by HH
mydfgrouped.size().unstack('y').sort_values('HH', ascending=False)
# y    HH    LW    ME   MR
# x
# C   879   672   984  186
# A   414  1413  1458  339
# D   246   756   795  255
# B   126   288   315   54
# stack `y` back into rows
mydfgrouped.size().unstack('y').sort_values('HH', ascending=False).stack()
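For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. The original dataframe was not posted, so the toy frame below is an assumption: it has only the two grouping columns, named x and y as in the answer, with one row per observation so that the group sizes match the numbers in the question.

```python
import pandas as pd

# Group sizes copied from the question. The column names "x" and "y"
# are assumptions; the real dataframe was not shown.
counts = {
    ("A", "HH"): 414, ("A", "LW"): 1413, ("A", "ME"): 1458, ("A", "MR"): 339,
    ("B", "HH"): 126, ("B", "LW"): 288, ("B", "ME"): 315, ("B", "MR"): 54,
    ("C", "HH"): 879, ("C", "LW"): 672, ("C", "ME"): 984, ("C", "MR"): 186,
    ("D", "HH"): 246, ("D", "LW"): 756, ("D", "ME"): 795, ("D", "MR"): 255,
}

# One row per observation, so groupby(...).size() reproduces the counts.
rows = [(x, y) for (x, y), n in counts.items() for _ in range(n)]
df = pd.DataFrame(rows, columns=["x", "y"])

mydfgrouped = df.groupby(["x", "y"])

s = (
    mydfgrouped.size()                   # Series indexed by (x, y)
    .unstack("y")                        # one row per x, one column per y
    .sort_values("HH", ascending=False)  # order the x rows by their HH count
    .stack()                             # back to a Series indexed by (x, y)
)

for key, n in s.items():
    print(key, n)
```

Note that the sorting happens on the result of size(), which is an ordinary Series/DataFrame, and not on the groupby object itself.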