
Percentage of Total with Groupby for two columns

I have a DataFrame:

import pandas as pd

df = pd.DataFrame({
    'Product': ['AA', 'AA', 'AA', 'AA', 'BB', 'BB', 'BB', 'BB'],
    'Type': ['AC', 'AC', 'AD', 'AD', 'BC', 'BC', 'BD', 'BD'],
    'Sales': [200, 100, 400, 100, 300, 100, 200, 500],
    'Qty': [5, 3, 3, 6, 4, 7, 4, 1]})

I want to get the percentage of total by "Product" and "Type" for both "Sales" and "Qty". I can get the percentage of total for "Sales" and "Qty" separately, but I was wondering whether there is a way to do it for both columns at once.

To get the percentage of total for one column, the code is:

df['Sales'] = df['Sales'].astype(float)
df['Qty'] = df['Qty'].astype(float)
df = df[['Product', 'Type', 'Sales']]

df = df.groupby(['Product', 'Type']).agg({'Sales': 'sum'})
pcts = df.groupby(level=0, group_keys=False).apply(lambda x: 100 * x / x.sum())

Is there a way to get this for both columns in one go?

You can chain groupby:

pct = lambda x: 100 * x / x.sum()

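# group_keys=False keeps the (Product, Type) index; pandas 2.x would otherwise prepend the group key
out = df.groupby(['Product', 'Type']).sum().groupby('Product', group_keys=False).apply(pct)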
print(out)

# Output
                  Sales        Qty
Product Type                      
AA      AC    37.500000  47.058824
        AD    62.500000  52.941176
BB      BC    36.363636  68.750000
        BD    63.636364  31.250000
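
As a quick check on what these numbers mean, the first Sales figure can be reproduced by hand: the AA/AC rows sum to 300 and all AA rows sum to 800, so the share is 300 / 800 = 37.5%. A minimal sketch using the DataFrame from the question (the names `totals`, `aa_ac_sales` and `aa_sales` are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['AA', 'AA', 'AA', 'AA', 'BB', 'BB', 'BB', 'BB'],
    'Type': ['AC', 'AC', 'AD', 'AD', 'BC', 'BC', 'BD', 'BD'],
    'Sales': [200, 100, 400, 100, 300, 100, 200, 500],
    'Qty': [5, 3, 3, 6, 4, 7, 4, 1]})

# Totals per (Product, Type) pair
totals = df.groupby(['Product', 'Type']).sum()

# Sales for AA/AC: 200 + 100 = 300
aa_ac_sales = totals.loc[('AA', 'AC'), 'Sales']

# Sales for all of product AA: 300 + 500 = 800
aa_sales = totals.xs('AA', level='Product')['Sales'].sum()

print(100 * aa_ac_sales / aa_sales)  # 37.5
```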

You could group by "Product" and "Type" to get the totals for each group. Then group by "Product" (which is level=0) again and transform with sum; then divide the totals from the first step by that:

sm = df.groupby(['Product','Type']).sum()
out = sm / sm.groupby(level=0).transform('sum') * 100

Output:

                  Sales        Qty
Product Type                      
AA      AC    37.500000  47.058824
        AD    62.500000  52.941176
BB      BC    36.363636  68.750000
        BD    63.636364  31.250000
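
One way to sanity-check a result like this, sketched here with the question's data: the percentages within each Product should add back up to 100 in every column. The name `check` is illustrative and not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['AA', 'AA', 'AA', 'AA', 'BB', 'BB', 'BB', 'BB'],
    'Type': ['AC', 'AC', 'AD', 'AD', 'BC', 'BC', 'BD', 'BD'],
    'Sales': [200, 100, 400, 100, 300, 100, 200, 500],
    'Qty': [5, 3, 3, 6, 4, 7, 4, 1]})

sm = df.groupby(['Product', 'Type']).sum()
out = sm / sm.groupby(level=0).transform('sum') * 100

# Summing the percentages per Product should give 100 for Sales and for Qty
# (up to floating-point rounding).
check = out.groupby(level='Product').sum()
print(check.round(6))
```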

One option is to get the values from two separate groupbys and divide:

numerator = df.groupby(["Product", "Type"]).sum()
# select the numeric columns explicitly so the string column "Type" is not summed
denominator = df.groupby("Product")[["Sales", "Qty"]].sum()
numerator.div(denominator, level=0, axis='index') * 100

                  Sales        Qty
Product Type                      
AA      AC    37.500000  47.058824
        AD    62.500000  52.941176
BB      BC    36.363636  68.750000
        BD    63.636364  31.250000
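
To confirm that the three approaches in this thread agree, a sketch like the following could be used. It makes two assumptions beyond the answers as posted: `group_keys=False` is passed to the chained groupby so that newer pandas versions keep the same (Product, Type) index, and the denominator selects the numeric columns explicitly so that the string column "Type" is not summed. The names `via_apply`, `via_transform` and `via_div` are illustrative.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

df = pd.DataFrame({
    'Product': ['AA', 'AA', 'AA', 'AA', 'BB', 'BB', 'BB', 'BB'],
    'Type': ['AC', 'AC', 'AD', 'AD', 'BC', 'BC', 'BD', 'BD'],
    'Sales': [200, 100, 400, 100, 300, 100, 200, 500],
    'Qty': [5, 3, 3, 6, 4, 7, 4, 1]})

sm = df.groupby(['Product', 'Type']).sum()

# 1) chained groupby + apply (group_keys=False is an assumption added here)
via_apply = sm.groupby('Product', group_keys=False).apply(
    lambda x: 100 * x / x.sum())

# 2) groupby + transform('sum')
via_transform = sm / sm.groupby(level=0).transform('sum') * 100

# 3) divide by per-Product totals, broadcasting across index level 0
denominator = df.groupby('Product')[['Sales', 'Qty']].sum()
via_div = sm.div(denominator, level=0, axis='index') * 100

# assert_frame_equal raises AssertionError if the frames differ
# (floats are compared with a small tolerance by default).
assert_frame_equal(via_apply, via_transform)
assert_frame_equal(via_transform, via_div)
print(via_div.round(2))
```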
