Pandas sorting by group aggregate
I've already seen this question, but the desired outcome there is slightly different from mine.
Imagine a dataframe grouped thusly:
df.groupby(['product_name', 'usage_type']).total_cost.sum()
product_name usage_type
Lorem A 30.694665
B 0.000634
C 1.659360
D 0.000031
E 3339.140042
F 0.074340
Ipsum G 9.627360
A 19.053377
D 14.492155
Dolor B 9.698245
H 6993.792163
C 31947.955679
D 2150.400001
E 26.337789
Name: total_cost, dtype: float64
The output I want is the same structure, but with two properties:
The highest-cost products should show up first, while still preserving the breakdown by usage type.
If it is significantly simpler, I'm okay with dropping the secondary sorting by usage type.
Starting with your grouped DataFrame:
import pandas as pd
df2 = pd.read_table('data', sep=r'\s+').set_index(['product_name', 'usage_type'])
# val
# product_name usage_type
# Lorem A 30.694665
# B 0.000634
# C 1.659360
# D 0.000031
# E 3339.140042
# F 0.074340
# Ipsum G 9.627360
# A 19.053377
# D 14.492155
# Dolor B 9.698245
# H 6993.792163
# C 31947.955679
# D 2150.400001
# E 26.337789
You could store the key values in new columns:
df2['key1'] = df2.groupby(level='product_name')['val'].transform('sum')
df2['key2'] = df2.index.get_level_values('usage_type')
and then sort by those key columns (using sort_values, since DataFrame.sort has been removed from recent pandas versions):
# >>> df2.sort_values(['key1', 'key2'], ascending=[False, True])
# val key1 key2
# product_name usage_type
# Dolor B 9.698245 41128.183877 B
# C 31947.955679 41128.183877 C
# D 2150.400001 41128.183877 D
# E 26.337789 41128.183877 E
# H 6993.792163 41128.183877 H
# Lorem A 30.694665 3371.569072 A
# B 0.000634 3371.569072 B
# C 1.659360 3371.569072 C
# D 0.000031 3371.569072 D
# E 3339.140042 3371.569072 E
# F 0.074340 3371.569072 F
# Ipsum A 19.053377 43.172892 A
# D 14.492155 43.172892 D
# G 9.627360 43.172892 G
result = df2.sort_values(['key1', 'key2'], ascending=[False, True])['val']
print(result)
yields
product_name usage_type
Dolor B 9.698245
C 31947.955679
D 2150.400001
E 26.337789
H 6993.792163
Lorem A 30.694665
B 0.000634
C 1.659360
D 0.000031
E 3339.140042
F 0.074340
Ipsum A 19.053377
D 14.492155
G 9.627360
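For anyone who wants to try this without a data file, here is a minimal self-contained sketch of the same approach. It assumes a recent pandas in which sort_values is the sorting method. The rows are copied from the question, and the column name val follows the answer; building the frame from an inline list is only for illustration.

```python
import pandas as pd

# Rows copied from the question's output; 'val' is the column name used in the answer.
rows = [
    ("Lorem", "A", 30.694665), ("Lorem", "B", 0.000634),
    ("Lorem", "C", 1.659360), ("Lorem", "D", 0.000031),
    ("Lorem", "E", 3339.140042), ("Lorem", "F", 0.074340),
    ("Ipsum", "G", 9.627360), ("Ipsum", "A", 19.053377),
    ("Ipsum", "D", 14.492155),
    ("Dolor", "B", 9.698245), ("Dolor", "H", 6993.792163),
    ("Dolor", "C", 31947.955679), ("Dolor", "D", 2150.400001),
    ("Dolor", "E", 26.337789),
]
df2 = pd.DataFrame(rows, columns=["product_name", "usage_type", "val"])
df2 = df2.set_index(["product_name", "usage_type"])

# key1: the per-product total, repeated on every row of that product.
df2["key1"] = df2.groupby(level="product_name")["val"].transform("sum")
# key2: the usage type, copied out of the index so it can act as a sort key.
df2["key2"] = df2.index.get_level_values("usage_type")

# Products by descending total, then usage types alphabetically within each product.
result = df2.sort_values(["key1", "key2"], ascending=[False, True])["val"]
print(result)
```

If the secondary ordering is not needed, sorting on key1 alone should be enough, although the order of rows within a product is then only guaranteed to be preserved with a stable sort (for example kind='mergesort').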