My data frame is below:
Date Country GDP
0 2011 United States 345.0
1 2012 United States 0.0
2 2013 United States 457.0
3 2014 United States 577.0
4 2015 United States 0.0
5 2016 United States 657.0
6 2011 UK 35.0
7 2012 UK 64.0
8 2013 UK 54.0
9 2014 UK 67.0
10 2015 UK 687.0
11 2016 UK 0.0
12 2011 China 34.0
13 2012 China 54.0
14 2013 China 678.0
15 2014 China 355.0
16 2015 China 5678.0
17 2016 China 345.0
I want to calculate each country's share of the combined GDP of all 3 countries for each year, and add it to the dataframe as a new column called perc.
I implemented the code below:
import pandas as pd
countrylist = ['United States', 'UK', 'China']
for country in countrylist:
    for year in range(2011, 2016):
        df['perc'] = (df['GDP'][(df['Country'] == country) & (df['Date'] == year)]).astype(float) / df['GDP'][df['Date'] == year].sum()
        print(df['perc'])
My output looks like this:
0 0.833333
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 NaN
10 NaN
11 NaN
12 NaN
13 NaN
14 NaN
15 NaN
16 NaN
17 NaN
0 NaN
1 0.0
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 NaN
10 NaN
11 NaN
12 NaN
13 NaN
14 NaN
15 NaN
16 NaN
17 NaN
0 NaN
1 NaN
2 0.384357
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 NaN
10 NaN
11 NaN
12 NaN
13 NaN
14 NaN
15 NaN
16 NaN
17 NaN
....
I noticed that my previous results get wiped out each time a new loop iteration starts, so in the end I only have the last perc value. I think I need to specify which rows to write to when assigning df['perc'], such as:
df['perc'][([(df['Country']==country) & (df['Date']==year)])]=(df['GDP'][(df['Country']==country) & (df['Date']==year)]).astype(float)/df['GDP'][df['Date']==year].sum()
But it doesn't work. How can I dynamically insert value?
Ideally, I should have:
Date Country GDP perc
0 2011 United States 345.0 0.81
1 2012 United States 0.0 0.0
2 2013 United States 457.0 0.23
3 2014 United States 577.0 xx
4 2015 United States 0.0 xx
5 2016 United States 657.0 xx
6 2011 UK 35.0 xx
7 2012 UK 64.0 xx
8 2013 UK 54.0 xx
9 2014 UK 67.0 xx
10 2015 UK 687.0 xx
11 2016 UK 0.0 xx
12 2011 China 34.0 xx
13 2012 China 54.0 xx
14 2013 China 678.0 xx
15 2014 China 355.0 xx
16 2015 China 5678.0 xx
17 2016 China 345.0 xx
You can use transform with 'sum' here:
df.GDP/df.groupby('Date').GDP.transform('sum')
Out[161]:
0 0.833333
1 0.000000
2 0.384357
3 0.577578
4 0.000000
5 0.655689
6 0.084541
7 0.542373
8 0.045416
9 0.067067
10 0.107934
11 0.000000
12 0.082126
13 0.457627
14 0.570227
15 0.355355
16 0.892066
17 0.344311
Name: GDP, dtype: float64
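To store the result as the perc column the question asks for, the expression above can be assigned back to the dataframe. The following is a minimal, self-contained sketch: the dataframe is rebuilt from the values posted in the question, and the perc_loop column and the in_year mask are names introduced here purely for illustration. It also shows one way the original loop idea could work, assuming rows are written with .loc rather than chained indexing.

import pandas as pd

# Rebuild the dataframe from the values shown in the question.
df = pd.DataFrame({
    "Date": [2011, 2012, 2013, 2014, 2015, 2016] * 3,
    "Country": ["United States"] * 6 + ["UK"] * 6 + ["China"] * 6,
    "GDP": [345.0, 0.0, 457.0, 577.0, 0.0, 657.0,
            35.0, 64.0, 54.0, 67.0, 687.0, 0.0,
            34.0, 54.0, 678.0, 355.0, 5678.0, 345.0],
})

# Vectorized: each row's GDP divided by the total GDP of its year.
# transform('sum') returns a Series aligned with df's index,
# so the division is done row by row.
df["perc"] = df["GDP"] / df.groupby("Date")["GDP"].transform("sum")

# Loop-based variant (hypothetical column name 'perc_loop'):
# create the column once, then fill only the selected rows with .loc
# so earlier results are not overwritten.
df["perc_loop"] = float("nan")
for year in df["Date"].unique():
    in_year = df["Date"] == year  # boolean mask for this year's rows
    df.loc[in_year, "perc_loop"] = df.loc[in_year, "GDP"] / df.loc[in_year, "GDP"].sum()

print(df)

The loop in the question replaced the whole perc column on every iteration, which is why only the last assignment survived. Writing through df.loc[mask, column] updates just the matching rows. The transform version avoids the loop entirely and also covers 2016, which range(2011, 2016) leaves out.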