
(Python) How to get the average of the sum of multiple columns in pandas

This is a sample of what my dataframe looks like:

company_name country_code state_code software finance commerce etc......
google       USA           CA          1        0          0
jimmy        GBR           unknown     0        0          1
microsoft    USA           NY          1        0          0

I want to get the share of each industry in each state. For example, 14% of the companies in CA might be in software, 15% in healthcare, and so on.

I need to get the total number of companies across all industries in each state, then divide the number of companies in each individual industry by that total to get each industry's percentage in each state.

I just can't figure out a working way to do this.

I have tried variations of something like this, but to no avail:

new_df = df['state_code'].value_counts(normalize=True)

I want to get the sum of each of the columns software, finance, commerce, etc., and then express each column as a percentage of the total across all of those columns.

Expected output:

State_Code software finance commerce etc..... 
CA           20%      10%     5%       65%
NY           10%      20%     10%      60%
AH           5%       5%      20%      70%

I believe you need to aggregate with sum first, and then divide by the per-row sum of the columns using div:

print (df)
  company_name country_code state_code  software  finance  commerce
0       google          USA         CA         1        0         4
1        jimmy          GBR    unknown         5        6         1
2    microsoft          USA         NY         1        0         0


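# convert all columns except the three identifier columns to floats
cols = df.columns.difference(['company_name', 'country_code', 'state_code'])
df[cols] = df[cols].astype(float)
# if astype fails because some values are non-numeric, coerce them to NaN instead
#df[cols] = df[cols].apply(lambda x: pd.to_numeric(x, errors='coerce'))

# sum only the numeric industry columns per state
# (recent pandas versions no longer drop string columns automatically)
a = df.groupby('state_code').sum(numeric_only=True)
# divide each row by its own total to get each industry's share
df = a.div(a.sum(axis=1), axis=0)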
print (df)
            software  finance  commerce
state_code                             
CA          0.200000      0.0  0.800000
NY          1.000000      0.0  0.000000
unknown     0.416667      0.5  0.083333
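
One edge case is worth checking: if a state has no industry values at all, its row total is 0 and the division yields NaN rather than a share. The following is a minimal sketch under assumptions that are not in the original answer: the extra "acme"/"TX" row is made up for illustration, and filling with 0 is only one possible choice.

```python
import pandas as pd

# Hypothetical sample data; "TX" has no industry values set.
df = pd.DataFrame({
    "company_name": ["google", "jimmy", "microsoft", "acme"],
    "country_code": ["USA", "GBR", "USA", "USA"],
    "state_code": ["CA", "unknown", "NY", "TX"],
    "software": [1, 5, 1, 0],
    "finance": [0, 6, 0, 0],
    "commerce": [4, 1, 0, 0],
})

# Sum only the numeric industry columns per state.
totals = df.groupby("state_code").sum(numeric_only=True)

# Divide each row by its own total; a zero total produces NaN (0/0).
shares = totals.div(totals.sum(axis=1), axis=0)

# Assumption: treat states without any data as a 0 share everywhere.
shares_filled = shares.fillna(0)
print(shares_filled)
```

Whether 0 is the right fill value depends on how the result is used; leaving the NaN in place keeps "no data" distinguishable from "0%".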

If you also need percentages, multiply by 100 and, if necessary, round and cast to integers:

df = a.div(a.sum(axis=1), axis=0).mul(100).round(0).astype(int)
print (df)
            software  finance  commerce
state_code                             
CA                20        0        80
NY               100        0         0
unknown           42       50         8

Finally, append a percent sign. Note that the values are then strings rather than numbers, so no further numeric processing is possible:

df = a.div(a.sum(axis=1), axis=0).mul(100).round(0).astype(int).astype(str).add('%')
print (df)
           software finance commerce
state_code                          
CA              20%      0%      80%
NY             100%      0%       0%
unknown         42%     50%       8%
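
If the percent sign is only needed for display, one option is to keep the numeric frame and format it at print time. This is a sketch rather than part of the original answer: the frame `a` is rebuilt by hand from the sample totals, and `pct` and `text` are names chosen for illustration.

```python
import pandas as pd

# Per-state industry totals, rebuilt by hand from the sample data.
a = pd.DataFrame(
    {"software": [1, 1, 5], "finance": [0, 0, 6], "commerce": [4, 0, 1]},
    index=pd.Index(["CA", "NY", "unknown"], name="state_code"),
)

# Numeric percentages, still usable for sorting, filtering, or arithmetic.
pct = a.div(a.sum(axis=1), axis=0).mul(100)

# Format only when rendering; `pct` itself is unchanged.
text = pct.to_string(float_format=lambda x: f"{x:.0f}%")
print(text)
```

This way `pct` stays numeric and the string form exists only in `text`.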

The best way to do this is to put all the industry column names in a list. In my solution, I have called this testy.

First get the sum of all industries.

count = 0
for i in testy:
    count += int(usa_df[i].sum())

Then divide each industry's total by count and multiply by 100. This gives you each industry's percentage of the market.

for i in testy:
    tot = usa_df[i].sum()
    percent = (tot/count)*100
    print(i+" - "+str(percent)+"%")

The output will be as follows:

software - 20%
finance  - 30%
commerce - 10%
etc........ 
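
The two loops above can probably be collapsed into a vectorized form. The sketch below assumes `usa_df` is a dataframe with one numeric column per industry and `testy` is the list of those column names; the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the answer's `usa_df` and `testy`.
usa_df = pd.DataFrame({
    "software": [1, 0, 1, 0],
    "finance": [0, 1, 0, 0],
    "commerce": [0, 0, 0, 1],
})
testy = ["software", "finance", "commerce"]

# Column totals for the listed industries.
totals = usa_df[testy].sum()

# Each industry's share of the grand total, as a percentage.
percent = totals / totals.sum() * 100

for name, value in percent.items():
    print(f"{name} - {value:.0f}%")
```

Note that this computes shares over the whole frame, not per state; for a per-state breakdown, the groupby approach from the first answer applies.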
