
Pandas GroupBy DataFrame: sorting by values of a column

My question is about sorting. I've read multiple questions here about sorting groupby DataFrames, but none of them solved my problem. Maybe I'm doing something wrong, because I get various errors such as "bool not callable" or "sort_values not available".

I have a DataFrame with some info and columns. I then created a groupby DataFrame based on 2 fields.

Then I do this:

for name, group in mydfgrouped:
    print(name, len(group))

The output I get is:

('A', 'HH') 414
('A', 'LW') 1413
('A', 'ME') 1458
('A', 'MR') 339
('B', 'HH') 126
('B', 'LW') 288
('B', 'ME') 315
('B', 'MR') 54
('C', 'HH') 879
('C', 'LW') 672
('C', 'ME') 984
('C', 'MR') 186
('D', 'HH') 246
('D', 'LW') 756
('D', 'ME') 795
('D', 'MR') 255
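The same counts are available as a single Series via GroupBy.size(), without looping over the groups. A minimal sketch, assuming the two grouping columns are named x and y (hypothetical names) and using a tiny made-up DataFrame:

```python
import pandas as pd

# Hypothetical data: the column names 'x' and 'y' are assumptions
df = pd.DataFrame({
    "x": ["A", "A", "A", "B", "B", "C"],
    "y": ["HH", "HH", "LW", "HH", "ME", "HH"],
})
mydfgrouped = df.groupby(["x", "y"])

# size() returns one count per (x, y) group, indexed by a MultiIndex
sizes = mydfgrouped.size()

# Each entry equals len(group) from iterating over the groups
for name, group in mydfgrouped:
    assert sizes[name] == len(group)

print(sizes)
```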

I would like to sort this DataFrame based on 2 criteria:

  1. The len of group HH, in descending order
  2. The name, in ascending order (just in case there's a tie in 'HH')

So the idea is to show the same list, sorted in descending order of the HH value, while still showing all stats for each name.

My expected output is:

('C', 'HH') 879
('C', 'LW') 672
('C', 'ME') 984
('C', 'MR') 186
('A', 'HH') 414
('A', 'LW') 1413
('A', 'ME') 1458
('A', 'MR') 339
('D', 'HH') 246
('D', 'LW') 756
('D', 'ME') 795
('D', 'MR') 255
('B', 'HH') 126
('B', 'LW') 288
('B', 'ME') 315
('B', 'MR') 54

As you can see, name C is first because it has the highest len in group HH (879), and I want all of C's groups listed together. The last one is B because it has the lowest len in group HH (126).

sort_values and sort did not work for me.
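The two criteria amount to a composite sort key: negate the HH count to get descending order, then use the name to break ties. A plain-Python sketch (no pandas) using the counts from the question's output; it assumes every name has an HH group:

```python
# Counts copied from the question's output: {(name, category): len(group)}
counts = {
    ("A", "HH"): 414, ("A", "LW"): 1413, ("A", "ME"): 1458, ("A", "MR"): 339,
    ("B", "HH"): 126, ("B", "LW"): 288, ("B", "ME"): 315, ("B", "MR"): 54,
    ("C", "HH"): 879, ("C", "LW"): 672, ("C", "ME"): 984, ("C", "MR"): 186,
    ("D", "HH"): 246, ("D", "LW"): 756, ("D", "ME"): 795, ("D", "MR"): 255,
}

# HH count per name (the primary criterion); assumes each name has an HH group
hh = {name: n for (name, cat), n in counts.items() if cat == "HH"}

# Key: HH count descending, then name ascending, then category so that
# each name's rows stay in alphabetical order (HH, LW, ME, MR)
ordered = sorted(counts, key=lambda k: (-hh[k[0]], k[0], k[1]))

for key in ordered:
    print(key, counts[key])
```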

Assuming the second column that you're grouping by is called y , you can do:假设您分组的第二列称为y ,您可以执行以下操作:

# mydfgrouped = df.groupby(['x', 'y'])

s = mydfgrouped.size().unstack('y').sort_values('HH', ascending=False).stack()

for (i, l) in s.items():
    print(i, l)
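This sorts only by HH, so names that tie on HH are not guaranteed to come out in name order (the question's second criterion). A sketch that adds the tie-breaker explicitly, relying on sort_values accepting index level names in by (supported since pandas 0.23). The column names x and y and the small DataFrame are hypothetical:

```python
import pandas as pd

# Hypothetical data in which 'A' and 'B' tie on HH (2 rows each)
df = pd.DataFrame({
    "x": ["B", "B", "B", "A", "A", "C", "C", "C"],
    "y": ["HH", "HH", "LW", "HH", "HH", "HH", "HH", "HH"],
})
mydfgrouped = df.groupby(["x", "y"])

# Wide table: one row per name (index level 'x'), one column per category
wide = mydfgrouped.size().unstack("y")

# Sort by the 'HH' column descending, then by the index level 'x' ascending
wide = wide.sort_values(["HH", "x"], ascending=[False, True])

# Back to one row per (x, y); dropna() removes cells for missing combinations,
# and astype(int) undoes the float upcast caused by those NaN cells
s = wide.stack().dropna().astype(int)

for i, l in s.items():
    print(i, l)
```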

Output:

('C', 'HH') 879
('C', 'LW') 672
('C', 'ME') 984
('C', 'MR') 186
('A', 'HH') 414
('A', 'LW') 1413
('A', 'ME') 1458
('A', 'MR') 339
('D', 'HH') 246
('D', 'LW') 756
('D', 'ME') 795
('D', 'MR') 255
('B', 'HH') 126
('B', 'LW') 288
('B', 'ME') 315
('B', 'MR') 54

Brief explanation:

# unstack `y` into columns
mydfgrouped.size().unstack('y')

# y   HH    LW    ME   MR
# x                      
# A  414  1413  1458  339
# B  126   288   315   54
# C  879   672   984  186
# D  246   756   795  255

# sort by HH
mydfgrouped.size().unstack('y').sort_values('HH', ascending=False)

# y   HH    LW    ME   MR
# x                      
# C  879   672   984  186
# A  414  1413  1458  339
# D  246   756   795  255
# B  126   288   315   54

# stack `y` back into rows
mydfgrouped.size().unstack('y').sort_values('HH', ascending=False).stack()
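If you need the group DataFrames themselves (as in the question's original for name, group in mydfgrouped loop) rather than just their sizes, the sorted index can drive GroupBy.get_group. A sketch, again with hypothetical data and column names x and y; the extra dropna() is a guard for newer pandas versions whose stack() keeps NaN cells:

```python
import pandas as pd

# Hypothetical data; 'x' and 'y' stand in for the two grouping columns
df = pd.DataFrame({
    "x": ["A", "A", "B", "B", "B", "C"],
    "y": ["HH", "LW", "HH", "HH", "LW", "LW"],
    "value": [1, 2, 3, 4, 5, 6],
})
mydfgrouped = df.groupby(["x", "y"])

# Sorted (x, y) keys from the unstack / sort_values / stack chain
s = mydfgrouped.size().unstack("y").sort_values("HH", ascending=False).stack().dropna()

# Fetch each group DataFrame in the sorted order
for key in s.index:
    group = mydfgrouped.get_group(key)
    print(key, len(group))
```

Here C has no HH group, so its HH cell is NaN after unstack, and sort_values places it last (the default is na_position='last').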
