
Pandas sorting by group aggregate

I've already seen this question, but the desired outcome there is slightly different from mine.

Imagine a DataFrame grouped like this:

df.groupby(['product_name', 'usage_type']).total_cost.sum()

product_name   usage_type
Lorem          A               30.694665
               B                0.000634
               C                1.659360
               D                0.000031
               E             3339.140042
               F                0.074340
Ipsum          G                9.627360
               A               19.053377
               D               14.492155
Dolor          B                9.698245
               H             6993.792163
               C            31947.955679
               D             2150.400001
               E               26.337789
Name: total_cost, dtype: float64
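A reproducible version of this setup can be sketched as follows. The question only shows the aggregated result, so the raw frame here is an assumption: one row per (product_name, usage_type) pair holding the aggregated cost. Passing sort=False to groupby keeps the groups in order of first appearance, which matches the listing above; with the default sort=True the products would come out alphabetically.

```python
import pandas as pd

# Assumed raw data: one row per pair, costs copied from the listing above.
rows = [
    ("Lorem", "A", 30.694665), ("Lorem", "B", 0.000634),
    ("Lorem", "C", 1.659360), ("Lorem", "D", 0.000031),
    ("Lorem", "E", 3339.140042), ("Lorem", "F", 0.074340),
    ("Ipsum", "G", 9.627360), ("Ipsum", "A", 19.053377),
    ("Ipsum", "D", 14.492155),
    ("Dolor", "B", 9.698245), ("Dolor", "H", 6993.792163),
    ("Dolor", "C", 31947.955679), ("Dolor", "D", 2150.400001),
    ("Dolor", "E", 26.337789),
]
df = pd.DataFrame(rows, columns=["product_name", "usage_type", "total_cost"])

# sort=False keeps groups in order of first appearance (Lorem, Ipsum, Dolor).
s = df.groupby(["product_name", "usage_type"], sort=False).total_cost.sum()
print(s)
```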

The output I want has the same structure, but with two properties:

  1. Order the product names by the sum of their costs
  2. Order the usage types lexicographically (happy alternative: order them by descending cost)

That way the highest-cost products show up first, while the per-usage-type breakdown is still preserved.
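For the "happy alternative" in item 2, one possible sketch broadcasts each product's total onto its rows with transform('sum') and then sorts by that total and by the individual cost, both descending. The helper column name product_total is hypothetical, and the data is typed in from the listing above.

```python
import pandas as pd

rows = [
    ("Lorem", "A", 30.694665), ("Lorem", "B", 0.000634),
    ("Lorem", "C", 1.659360), ("Lorem", "D", 0.000031),
    ("Lorem", "E", 3339.140042), ("Lorem", "F", 0.074340),
    ("Ipsum", "G", 9.627360), ("Ipsum", "A", 19.053377),
    ("Ipsum", "D", 14.492155),
    ("Dolor", "B", 9.698245), ("Dolor", "H", 6993.792163),
    ("Dolor", "C", 31947.955679), ("Dolor", "D", 2150.400001),
    ("Dolor", "E", 26.337789),
]
idx = pd.MultiIndex.from_tuples([(p, u) for p, u, _ in rows],
                                names=["product_name", "usage_type"])
s = pd.Series([c for _, _, c in rows], index=idx, name="total_cost")

frame = s.to_frame()
# Hypothetical helper column: each product's total, repeated on every row of that product.
frame["product_total"] = frame.groupby(level="product_name")["total_cost"].transform("sum")
# Costliest products first; within a product, costliest usage types first.
result = frame.sort_values(["product_total", "total_cost"],
                           ascending=[False, False])["total_cost"]
print(result)
```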

If it is significantly simpler, I'm okay with dropping the secondary sort by usage type.
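If the secondary sort is dropped, a sketch that needs no helper columns: compute each row's product total, then reorder the rows with a stable argsort on the negated totals. Stability (NumPy's kind='stable') is what keeps each product's usage types in their original order. The data is typed in from the listing above.

```python
import numpy as np
import pandas as pd

rows = [
    ("Lorem", "A", 30.694665), ("Lorem", "B", 0.000634),
    ("Lorem", "C", 1.659360), ("Lorem", "D", 0.000031),
    ("Lorem", "E", 3339.140042), ("Lorem", "F", 0.074340),
    ("Ipsum", "G", 9.627360), ("Ipsum", "A", 19.053377),
    ("Ipsum", "D", 14.492155),
    ("Dolor", "B", 9.698245), ("Dolor", "H", 6993.792163),
    ("Dolor", "C", 31947.955679), ("Dolor", "D", 2150.400001),
    ("Dolor", "E", 26.337789),
]
idx = pd.MultiIndex.from_tuples([(p, u) for p, u, _ in rows],
                                names=["product_name", "usage_type"])
s = pd.Series([c for _, _, c in rows], index=idx, name="total_cost")

# Each row's product total, aligned with s.
totals = s.groupby(level="product_name").transform("sum")
# Stable ascending sort on negated totals = costliest product first,
# with rows inside each product left in their original order.
order = np.argsort(-totals.to_numpy(), kind="stable")
result = s.iloc[order]
print(result)
```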

Starting with your grouped DataFrame (the total_cost column is named val here):

import pandas as pd
# 'data' is a whitespace-separated text file with columns product_name, usage_type and val
df2 = pd.read_table('data', sep=r'\s+').set_index(['product_name', 'usage_type'])
#                                   val
# product_name usage_type              
# Lorem        A              30.694665
#              B               0.000634
#              C               1.659360
#              D               0.000031
#              E            3339.140042
#              F               0.074340
# Ipsum        G               9.627360
#              A              19.053377
#              D              14.492155
# Dolor        B               9.698245
#              H            6993.792163
#              C           31947.955679
#              D            2150.400001
#              E              26.337789
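The file named data is not shown. If you already hold the grouped Series from the question, a sketch that builds an equivalent df2 in memory converts it with Series.to_frame, naming the single column val to match the code that follows; the values are copied from the question.

```python
import pandas as pd

rows = [
    ("Lorem", "A", 30.694665), ("Lorem", "B", 0.000634),
    ("Lorem", "C", 1.659360), ("Lorem", "D", 0.000031),
    ("Lorem", "E", 3339.140042), ("Lorem", "F", 0.074340),
    ("Ipsum", "G", 9.627360), ("Ipsum", "A", 19.053377),
    ("Ipsum", "D", 14.492155),
    ("Dolor", "B", 9.698245), ("Dolor", "H", 6993.792163),
    ("Dolor", "C", 31947.955679), ("Dolor", "D", 2150.400001),
    ("Dolor", "E", 26.337789),
]
idx = pd.MultiIndex.from_tuples([(p, u) for p, u, _ in rows],
                                names=["product_name", "usage_type"])
s = pd.Series([c for _, _, c in rows], index=idx, name="total_cost")

# to_frame turns the Series into a one-column DataFrame with the same MultiIndex.
df2 = s.to_frame("val")
print(df2)
```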

You could store the sort keys in new columns:

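# key1: each product's total cost, broadcast to every row of that product
df2['key1'] = df2.groupby(level='product_name')['val'].transform('sum')
# key2: the usage type, copied out of the index so it can be sorted on as a column
df2['key2'] = df2.index.get_level_values('usage_type')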

and then sort by those key columns:

# >>> df2.sort_values(['key1', 'key2'], ascending=[False, True])
#                                   val          key1 key2
# product_name usage_type                                 
# Dolor        B               9.698245  41128.183877    B
#              C           31947.955679  41128.183877    C
#              D            2150.400001  41128.183877    D
#              E              26.337789  41128.183877    E
#              H            6993.792163  41128.183877    H
# Lorem        A              30.694665   3371.569072    A
#              B               0.000634   3371.569072    B
#              C               1.659360   3371.569072    C
#              D               0.000031   3371.569072    D
#              E            3339.140042   3371.569072    E
#              F               0.074340   3371.569072    F
# Ipsum        A              19.053377     43.172892    A
#              D              14.492155     43.172892    D
#              G               9.627360     43.172892    G

# descending by product total, then ascending by usage type
result = df2.sort_values(['key1', 'key2'], ascending=[False, True])['val']
print(result)

yields

product_name  usage_type
Dolor         B                 9.698245
              C             31947.955679
              D              2150.400001
              E                26.337789
              H              6993.792163
Lorem         A                30.694665
              B                 0.000634
              C                 1.659360
              D                 0.000031
              E              3339.140042
              F                 0.074340
Ipsum         A                19.053377
              D                14.492155
              G                 9.627360
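DataFrame.sort was removed in pandas 0.20 in favor of sort_values. Since pandas 0.23, sort_values also accepts index level names in by, so a sketch of the same answer without the key2 column could look like this (data typed in from the question, column named val as in the answer).

```python
import pandas as pd

rows = [
    ("Lorem", "A", 30.694665), ("Lorem", "B", 0.000634),
    ("Lorem", "C", 1.659360), ("Lorem", "D", 0.000031),
    ("Lorem", "E", 3339.140042), ("Lorem", "F", 0.074340),
    ("Ipsum", "G", 9.627360), ("Ipsum", "A", 19.053377),
    ("Ipsum", "D", 14.492155),
    ("Dolor", "B", 9.698245), ("Dolor", "H", 6993.792163),
    ("Dolor", "C", 31947.955679), ("Dolor", "D", 2150.400001),
    ("Dolor", "E", 26.337789),
]
idx = pd.MultiIndex.from_tuples([(p, u) for p, u, _ in rows],
                                names=["product_name", "usage_type"])
df2 = pd.Series([c for _, _, c in rows], index=idx).to_frame("val")

# Product total broadcast onto each row, as in the answer.
df2["key1"] = df2.groupby(level="product_name")["val"].transform("sum")
# 'usage_type' is an index level name, not a column; sort_values accepts both.
result = df2.sort_values(["key1", "usage_type"], ascending=[False, True])["val"]
print(result)
```

Selecting ["val"] at the end leaves the helper column behind, so the result has the same shape as the grouped Series in the question.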

Statement: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.