
Sum values of a column for each value based on another column and divide it by total

Today I'm struggling once again with Python and data analytics.

I have a dataframe which looks like this:

    name         totdmgdealt
0   Warwick      96980.0
1   Nami         25995.0
2   Draven       171568.0
3   Fiora        113721.0
4   Viktor       185302.0
5   Skarner      148791.0
6   Galio        130692.0
7   Ahri         145731.0
8   Jinx         182680.0
9   VelKoz       85785.0
10  Ziggs        46790.0
11  Cassiopeia   62444.0
12  Yasuo        117896.0
13  Warwick      129156.0
14  Evelynn      179252.0
15  Caitlyn      163342.0
16  Wukong       122919.0
17  Syndra       146754.0
18  Karma        35766.0
19  Warwick      117790.0
20  Draven       74879.0
21  Janna        11242.0
22  Lux          66424.0
23  Amumu        87826.0
24  Vayne        76085.0
25  Ahri         93334.0
..
..
..

This dataframe contains the total damage dealt by a champion in one game. Now I want to group this information so I can see which champion has dealt the most damage overall. I tried groupby('name'), but it didn't work at all. I already went through some threads about groupby and summing values, but they didn't solve my specific problem.

The dealt damage of each champion should also be shown as percentage of the total.

I'm looking for something like this as an output:

    name     totdmgdealt  percentage
0   Warwick  2378798098     2.1  %
1   Nami     2837491074     2.3  %
2   Draven   1231451224     ..
3   Fiora    1287301724     ..
4   Viktor   1239808504     ..
5   Skarner  1487911234     ..
6   Galio    1306921234     ..

We can group by name and take the sum, then divide each value by the total with .div, multiply by 100 with .mul, and finally round to one decimal place with .round:

total = df['totdmgdealt'].sum()

summed = df.groupby('name', sort=False)['totdmgdealt'].sum().reset_index()

# summed already holds one row per name, so no second groupby is needed
summed['percentage'] = summed['totdmgdealt'].div(total).mul(100).round(1)

Output:

          name  totdmgdealt  percentage
0      Warwick     343926.0        12.2
1         Nami      25995.0         0.9
2       Draven     246447.0         8.7
3        Fiora     113721.0         4.0
4       Viktor     185302.0         6.6
5      Skarner     148791.0         5.3
6        Galio     130692.0         4.6
7         Ahri     239065.0         8.5
8         Jinx     182680.0         6.5
9       VelKoz      85785.0         3.0
10       Ziggs      46790.0         1.7
11  Cassiopeia      62444.0         2.2
12       Yasuo     117896.0         4.2
13     Evelynn     179252.0         6.4
14     Caitlyn     163342.0         5.8
15      Wukong     122919.0         4.4
16      Syndra     146754.0         5.2
17       Karma      35766.0         1.3
18       Janna      11242.0         0.4
19         Lux      66424.0         2.4
20       Amumu      87826.0         3.1
21       Vayne      76085.0         2.7
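
To read off which champion dealt the most damage overall, the summed frame can also be sorted by the total. A minimal sketch, assuming a small made-up subset of the rows from the question (the `df` below is illustrative, not the full dataset):

```python
import pandas as pd

# Illustrative subset of the question's rows (an assumption, not the full data).
df = pd.DataFrame({
    "name": ["Warwick", "Nami", "Draven", "Warwick", "Draven", "Ahri"],
    "totdmgdealt": [96980.0, 25995.0, 171568.0, 129156.0, 74879.0, 145731.0],
})

total = df["totdmgdealt"].sum()

# Sum per champion, then sort so the highest total comes first.
summed = (
    df.groupby("name", sort=False)["totdmgdealt"]
      .sum()
      .reset_index()
      .sort_values("totdmgdealt", ascending=False, ignore_index=True)
)

# Share of the overall damage, in percent, rounded to one decimal place.
summed["percentage"] = summed["totdmgdealt"].div(total).mul(100).round(1)

print(summed)
```

With this subset, Draven ends up in the first row because his two games add up to the largest total.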

You can use sum() to get the total damage, and apply to calculate the percentage for each row, like this:

import pandas as pd
from io import StringIO

df = pd.read_csv(StringIO("""
    name         totdmgdealt
0   Warwick      96980.0
1   Nami         25995.0
2   Draven       171568.0
3   Fiora        113721.0
4   Viktor       185302.0
5   Skarner      148791.0
6   Galio        130692.0
7   Ahri         145731.0
8   Jinx         182680.0
9   VelKoz       85785.0
10  Ziggs        46790.0
11  Cassiopeia   62444.0
12  Yasuo        117896.0
13  Warwick      129156.0
14  Evelynn      179252.0
15  Caitlyn      163342.0
16  Wukong       122919.0
17  Syndra       146754.0
18  Karma        35766.0
19  Warwick      117790.0
20  Draven       74879.0
21  Janna        11242.0
22  Lux          66424.0
23  Amumu        87826.0
24  Vayne        76085.0
25  Ahri         93334.0"""), sep=r"\s+")

summed_df = df.groupby('name')['totdmgdealt'].agg(['sum']).rename(columns={"sum": "totdmgdealt"}).reset_index()
summed_df['percentage'] = summed_df.apply(
    lambda x: "{:.2f}%".format(x['totdmgdealt'] / summed_df['totdmgdealt'].sum() * 100), axis=1)
print(summed_df)

Output:

          name  totdmgdealt percentage
0         Ahri     239065.0      8.48%
1        Amumu      87826.0      3.12%
2      Caitlyn     163342.0      5.79%
3   Cassiopeia      62444.0      2.21%
4       Draven     246447.0      8.74%
5      Evelynn     179252.0      6.36%
6        Fiora     113721.0      4.03%
7        Galio     130692.0      4.64%
8        Janna      11242.0      0.40%
9         Jinx     182680.0      6.48%
10       Karma      35766.0      1.27%
11         Lux      66424.0      2.36%
12        Nami      25995.0      0.92%
13     Skarner     148791.0      5.28%
14      Syndra     146754.0      5.21%
15       Vayne      76085.0      2.70%
16      VelKoz      85785.0      3.04%
17      Viktor     185302.0      6.57%
18     Warwick     343926.0     12.20%
19      Wukong     122919.0      4.36%
20       Yasuo     117896.0      4.18%
21       Ziggs      46790.0      1.66%
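
The row-wise apply above recomputes summed_df['totdmgdealt'].sum() for every row. A possible alternative, sketched under the assumption that a numeric percentage column is acceptable and the "%" formatting is only needed for display, divides the whole column at once. The small `df` here is a made-up subset of the question's data:

```python
import pandas as pd

# Illustrative subset of the question's rows (an assumption, not the full data).
df = pd.DataFrame({
    "name": ["Warwick", "Nami", "Draven", "Warwick", "Draven", "Ahri"],
    "totdmgdealt": [96980.0, 25995.0, 171568.0, 129156.0, 74879.0, 145731.0],
})

# One row per champion with the summed damage (names sorted alphabetically).
summed_df = df.groupby("name", as_index=False)["totdmgdealt"].sum()

# Compute the grand total once and divide the whole column by it.
grand_total = summed_df["totdmgdealt"].sum()
summed_df["percentage"] = summed_df["totdmgdealt"] / grand_total * 100

# Keep the numeric column for further calculations; format a copy for display.
display_df = summed_df.assign(
    percentage=summed_df["percentage"].map("{:.2f}%".format)
)
print(display_df)
```

Keeping the percentage numeric means it can still be sorted or summed later, which is not possible once the values are strings such as "8.48%".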

Maybe you can try this. I reproduced the same thing with my own sample data; try running the code below in your Jupyter Notebook:


import pandas as pd

name = ['abhit', 'mawa', 'vaibhav', 'dharam', 'sid', 'abhit', 'vaibhav', 'sid', 'mawa', 'lakshya']
totdmgdealt = [24, 45, 80, 22, 89, 55, 89, 51, 93, 85]
name = pd.Series(name, name='name')                       # convert to a Series
totdmgdealt = pd.Series(totdmgdealt, name='totdmgdealt')  # convert to a Series
data = pd.concat([name, totdmgdealt], axis=1)             # concat already returns a DataFrame

# Sum the damage per name; transpose so that each name becomes a row
final = data.pivot_table(values="totdmgdealt", columns="name", aggfunc="sum").transpose()

def calPer(row, total):
    # Percentage of the total, rounded to two decimals (row is the whole column here)
    return ((row / total) * 100).round(2)

total = final['totdmgdealt'].sum()                         # overall total used for the percentage
final['Percentage'] = calPer(final['totdmgdealt'], total)  # add the percentage column
final

Sample Data :

    name    totdmgdealt
0   abhit   24
1   mawa    45
2   vaibhav 80
3   dharam  22
4   sid     89
5   abhit   55
6   vaibhav 89
7   sid     51
8   mawa    93
9   lakshya 85

Output:

        totdmgdealt     Percentage
name        
abhit     79               12.48
dharam    22               3.48
lakshya   85               13.43
mawa      138              21.80
sid       140              22.12
vaibhav   169              26.70

Read through the code, run it, and then replace the sample dataset with yours. Maybe this helps.
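
The pivot_table call in this answer should give the same totals as a plain groupby on name. A short sketch, reusing the sample data from this answer, that computes both and compares them (the variable names `via_pivot` and `via_groupby` are made up for illustration):

```python
import pandas as pd

data = pd.DataFrame({
    "name": ['abhit', 'mawa', 'vaibhav', 'dharam', 'sid',
             'abhit', 'vaibhav', 'sid', 'mawa', 'lakshya'],
    "totdmgdealt": [24, 45, 80, 22, 89, 55, 89, 51, 93, 85],
})

# Using index="name" puts the names on the rows directly, so no transpose is needed.
via_pivot = data.pivot_table(values="totdmgdealt", index="name", aggfunc="sum")

# The same aggregation written as a groupby.
via_groupby = data.groupby("name")[["totdmgdealt"]].sum()

# Both results are indexed by name, so the totals can be compared label by label.
same_totals = (via_pivot["totdmgdealt"] == via_groupby["totdmgdealt"]).all()
print(same_totals)

# Percentage of the overall total, rounded to two decimals.
via_groupby["Percentage"] = (
    via_groupby["totdmgdealt"] / via_groupby["totdmgdealt"].sum() * 100
).round(2)
print(via_groupby)
```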


 