(Python) How to get the average of the sum of multiple columns in pandas

Question

Here is a sample of my dataframe:

company_name country_code state_code software finance commerce etc......
google       USA           CA          1        0          0
jimmy        GBR           unknown     0        0          1
microsoft    USA           NY          1        0          0

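For example, I want to get the average share of each industry in each state, so that I could say, for instance, that 14% of the industries in CA are software, 15% of the industries in CA are healthcare, and so on.

Obviously I need to get the total number of companies across all industries in each state, and then divide the number of companies in each industry by that total to get the percentage of each industry per state.

I just can't figure out a workable way to do this.

I have obviously tried using something like this in different ways, but to no avail: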

new_df = df['state_code'].value_counts(normalize=True)

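I want to get the sum of each of the columns software, finance, commerce, etc., and then express each column as a percentage of the total across those columns.

Expected output: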

State_Code software finance commerce etc..... 
CA           20%      10%     5%       65%
NY           10%      20%     10%      60%
AH           5%       5%      20%      70%

Answer 1

I believe you first need to aggregate with sum, and then use div to divide each row by the sum of its columns:

print (df)
  company_name country_code state_code  software  finance  commerce
0       google          USA         CA         1        0         4
1        jimmy          GBR    unknown         5        6         1
2    microsoft          USA         NY         1        0         0


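# convert every column except the three identifier columns to floats
cols = df.columns.difference(['company_name', 'country_code', 'state_code'])
df[cols] = df[cols].astype(float)
# if astype fails because some values are non-numeric, coerce them to NaN instead
#df[cols] = df[cols].apply(lambda x: pd.to_numeric(x, errors='coerce'))

# numeric_only=True keeps the string columns out of the sum on recent pandas versions
a = df.groupby(['state_code']).sum(numeric_only=True)
# divide each state's row by that row's total
df = a.div(a.sum(axis=1), axis=0)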
print (df)
            software  finance  commerce
state_code                             
CA          0.200000      0.0  0.800000
NY          1.000000      0.0  0.000000
unknown     0.416667      0.5  0.083333
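
For reference, the steps above can be put together into one self-contained script. This is only a sketch: the frame is rebuilt by hand from the printed sample, and the name `shares` is an assumption introduced here so that `df` is not overwritten.

```python
import pandas as pd

# Sample frame rebuilt by hand from the printed output above.
df = pd.DataFrame({
    'company_name': ['google', 'jimmy', 'microsoft'],
    'country_code': ['USA', 'GBR', 'USA'],
    'state_code': ['CA', 'unknown', 'NY'],
    'software': [1, 5, 1],
    'finance': [0, 6, 0],
    'commerce': [4, 1, 0],
})

# Sum the numeric (industry) columns per state; string columns are skipped.
a = df.groupby('state_code').sum(numeric_only=True)

# Divide every row by its own total so each state's shares add up to 1.
shares = a.div(a.sum(axis=1), axis=0)
print(shares)
```

Each row of `shares` should sum to 1, which is a quick way to check that the division ran along the intended axis.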

If you also need percentages, multiply by 100 and, if necessary, round and convert to integers:

df = a.div(a.sum(1), axis=0).mul(100).round(0).astype(int)
print (df)
            software  finance  commerce
state_code                             
CA                20        0        80
NY               100        0         0
unknown           42       50         8

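Finally, append a percent sign, but the values are then strings rather than numbers, so no further numeric processing is possible afterwards: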

df = a.div(a.sum(1), axis=0).mul(100).round(0).astype(int).astype(str).add('%')
print (df)
           software finance commerce
state_code                          
CA              20%      0%      80%
NY             100%      0%       0%
unknown         42%     50%       8%
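
Because the formatted values are strings, one option is to keep a numeric frame for further work and build the string version only for display. The sketch below assumes the per-state sums from this answer, retyped by hand; the names `pct` and `pct_display` are hypothetical.

```python
import pandas as pd

# Per-state sums retyped from the sample in this answer (assumed values).
a = pd.DataFrame(
    {'software': [1, 1, 5], 'finance': [0, 0, 6], 'commerce': [4, 0, 1]},
    index=pd.Index(['CA', 'NY', 'unknown'], name='state_code'),
)

# Numeric percentages: still usable for sorting, filtering or plotting.
pct = a.div(a.sum(axis=1), axis=0).mul(100)

# String copy for display only; pct itself stays numeric.
pct_display = pct.round(0).astype(int).astype(str).add('%')

print(pct.sort_values('software', ascending=False))
print(pct_display)
```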

Answer 2

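The best approach is to gather all of the industry column names together in one collection. In my solution I called this testy.

First, get the total sum across all industries.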

count = 0
for i in testy:
    count += int(usa_df[i].sum())

Then divide each industry's total by count and multiply by 100. This gives you the percentage of each industry in the market.

for i in testy:
    tot = usa_df[i].sum()
    percent = (tot/count)*100
    print(i+" - "+str(percent)+"%")

The output will look like this:

software - 20%
finance  - 30%
commerce - 10%
etc........
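
The two loops above can also be written without explicit iteration. This is a minimal sketch, not the answerer's code: `usa_df` and `testy` are not shown in the answer, so the values below are made-up stand-ins. It computes shares over the whole frame; per-state shares would additionally need a groupby on state_code.

```python
import pandas as pd

# Hypothetical stand-ins for usa_df and testy, which the answer does not show.
usa_df = pd.DataFrame({
    'software': [1, 0, 1, 0],
    'finance': [0, 1, 0, 0],
    'commerce': [0, 0, 0, 1],
})
testy = ['software', 'finance', 'commerce']

# Column totals for the industry columns, then each total as a share of the grand total.
totals = usa_df[testy].sum()
percent = totals / totals.sum() * 100

for name, value in percent.items():
    print(f"{name} - {value:.0f}%")
```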
