简体   繁体   中英

How to get percentage for summing row wise in multiple column in pandas?

I am trying to get percentage tabular data where I tried to use crosstab function from pandas but row wises sum for each column wasn't correct (I doubled checked this with Excel sum). Basically, in my import-export trade data, I am trying to get a period percentage of each individual country.

tabular data :

here is the tabular data on public gist that I want to get percentage for each individual country by period.

to get column wise sum I did like this:

import pandas as pd

df=pd.read_csv('minimal_data.csv', encoding='utf-8')
df.loc[:,'Total'] = df.sum(axis=1)

but this sum is not the same as by doing the way of excel sum. I don't know why.

then I tried following to get percentage tabular data:

pd.crosstab(index=df.index, 
                     columns=df.columns, 
                     values=df.columns.value, 
                     aggfunc='sum', 
                     normalize='index').applymap('{:.2f}%'.format)

I am expecting the percentage of tabular data where the percentage of each individual country by period. I don't know why, in my attempt, I didn't get the correct sum and expected percentage table. can anyone point me out? any quick solution to get this done?

I think using crosstab is right here but I didn't get the correct percentage table by keeping the same row and column name convention. Any idea to make this work?

It's unclear what you mean by the 'sum' being wrong or different from Excel. If you want the percent of the total that you have calculated, you could just do this (it would have been easier, ie, without needing to set the index, if you had read the csv with the dates as the index already):

df = df.set_index('quarter')

df.div(df.Total, axis=0).applymap(lambda x: f'{x * 100:.2f}%')

结果表

To get the percentage,

df.set_index('quarter').apply(lambda x: (x / x.sum())*100, axis=1)

Output

              AUSTRAL     CANADA     N ZEAL     MEXICO   NICARAG   URUGUAY    C RICA    BRAZIL   HONDURA   IRELAND
quarter                                                                                                           
2014-01-01  25.440018  25.682501  26.799560  13.356812  4.645008  2.502126  1.185601  0.000000  0.388373  0.000000
2014-04-01  34.489028  20.473965  27.223601  10.739338  3.545756  2.637722  0.645318  0.000000  0.245270  0.000000
2014-07-01  41.388462  19.418827  17.413776  13.046643  4.365293  3.062794  1.000460  0.000000  0.303746  0.000000
2014-10-01  45.921175  19.947340  12.453399  10.987784  6.659666  2.472346  1.220976  0.000000  0.337314  0.000000
2015-01-01  34.779864  18.914200  23.802183  12.789158  4.607413  3.750432  1.113557  0.000000  0.242027  0.001166
2015-04-01  40.115581  15.889617  24.620569  12.233570  2.614697  3.684628  0.669135  0.000000  0.140994  0.031210
2015-07-01  44.545033  19.933480  16.419047  13.207045  1.903940  3.151725  0.706372  0.000000  0.000000  0.133357
2015-10-01  36.019231  25.727244  12.442655  16.527229  4.201449  3.803939  0.998293  0.000000  0.000000  0.279961
2016-01-01  29.991387  22.293687  24.963800  15.665886  3.364758  2.537703  0.964889  0.000000  0.000000  0.217890
2016-04-01  28.368131  22.124064  26.707744  16.011170  2.974021  2.736466  0.902486  0.000000  0.008214  0.167704
2016-07-01  25.368992  28.843584  17.562638  18.601159  4.361163  4.197427  0.900461  0.001082  0.000000  0.163494
2016-10-01  19.623932  30.095599  11.720699  27.695783  5.386881  3.950341  1.098037  0.262948  0.000000  0.165780
2017-01-01  20.799706  22.871970  23.475104  23.519770  4.726189  2.564349  1.105563  0.777981  0.000000  0.159366
2017-04-01  20.961391  24.807151  22.372555  20.141108  4.201882  3.848614  0.717434  2.847786  0.000000  0.102079
2017-07-01  26.326774  27.124571  16.796464  20.485338  4.180663  3.973982  0.748360  0.050250  0.122305  0.191292
2017-10-01  26.996354  29.432880  11.569669  22.702213  5.579304  2.623607  0.794317  0.000000  0.156468  0.145188
2018-01-01  20.148823  25.861165  24.566617  19.748647  5.864245  2.507594  0.946862  0.000000  0.218396  0.137650
2018-04-01  22.281189  26.300865  24.879217  18.074004  4.368848  3.058836  0.757353  0.000000  0.196459  0.083229
2018-07-01  24.996713  28.873588  16.749910  19.016680  5.816461  3.499820  0.757308  0.000000  0.140196  0.149324
2018-10-01  25.305780  31.831372   9.842619  22.351502  6.039240  3.353802  0.824540  0.000000  0.236478  0.214668

To plot in a line chart

>>> df.plot(kind='line')
<matplotlib.axes._subplots.AxesSubplot object at 0x7f418a3710b8>
>>> from matplotlib import pyplot as plt
>>> plt.show()

线图

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM