
Python pandas DataFrame: dynamically adding a column

My data frame is below:

    Date        Country     GDP
0   2011  United States   345.0
1   2012  United States     0.0
2   2013  United States   457.0
3   2014  United States   577.0
4   2015  United States     0.0
5   2016  United States   657.0
6   2011             UK    35.0
7   2012             UK    64.0
8   2013             UK    54.0
9   2014             UK    67.0
10  2015             UK   687.0
11  2016             UK     0.0
12  2011          China    34.0
13  2012          China    54.0
14  2013          China   678.0
15  2014          China   355.0
16  2015          China  5678.0
17  2016          China   345.0

I want to calculate each country's share of the combined GDP of all 3 countries for each year. I would like to add one more column called perc to the dataframe.

I implemented the code below:

import pandas as pd
countrylist=['United States','UK','China']
for country in countrylist:
    for year in range (2011,2016):      
        df['perc']=(df['GDP'][(df['Country']==country) & (df['Date']==year)]).astype(float)/df['GDP'][df['Date']==year].sum()
        print (df['perc'])

My output looks like this:

    0     0.833333
    1          NaN
    2          NaN
    3          NaN
    4          NaN
    5          NaN
    6          NaN
    7          NaN
    8          NaN
    9          NaN
    10         NaN
    11         NaN
    12         NaN
    13         NaN
    14         NaN
    15         NaN
    16         NaN
    17         NaN
    0     NaN
    1     0.0
    2     NaN
    3     NaN
    4     NaN
    5     NaN
    6     NaN
    7     NaN
    8     NaN
    9     NaN
    10    NaN
    11    NaN
    12    NaN
    13    NaN
    14    NaN
    15    NaN
    16    NaN
    17    NaN
0          NaN
1          NaN
2     0.384357
3          NaN
4          NaN
5          NaN
6          NaN
7          NaN
8          NaN
9          NaN
10         NaN
11         NaN
12         NaN
13         NaN
14         NaN
15         NaN
16         NaN
17         NaN

....

I noticed that my previous results get wiped out each time a new loop iteration starts, so in the end I only have the last perc value. I think I should provide some position information when assigning to df['perc'], such as:

df['perc'][([(df['Country']==country) & (df['Date']==year)])]=(df['GDP'][(df['Country']==country) & (df['Date']==year)]).astype(float)/df['GDP'][df['Date']==year].sum()

But it doesn't work. How can I dynamically insert value?

Ideally, I should have:

    Date        Country     GDP    perc
0   2011  United States   345.0    0.81
1   2012  United States     0.0    0.0
2   2013  United States   457.0    0.23
3   2014  United States   577.0    xx
4   2015  United States     0.0    xx
5   2016  United States   657.0    xx
6   2011             UK    35.0    xx
7   2012             UK    64.0    xx
8   2013             UK    54.0    xx
9   2014             UK    67.0    xx
10  2015             UK   687.0    xx
11  2016             UK     0.0    xx
12  2011          China    34.0    xx
13  2012          China    54.0    xx
14  2013          China   678.0    xx
15  2014          China   355.0    xx
16  2015          China  5678.0    xx
17  2016          China   345.0    xx
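A note on the loop approach itself: `df['perc'] = ...` replaces the whole column on every iteration, and because the right-hand side only holds one row, pandas aligns on the index and fills every other row with NaN. The indexed version, `df['perc'][...] = ...`, is chained assignment, which pandas does not guarantee will write back to the original frame. If you want to keep the loop, a minimal sketch (assuming the frame is rebuilt from the table in the question, and with `mask` and `year_rows` as illustrative names) is to create the column once and then write through `.loc` with a boolean mask:

```python
import pandas as pd

# Rebuild the frame from the table in the question.
df = pd.DataFrame({
    "Date": [2011, 2012, 2013, 2014, 2015, 2016] * 3,
    "Country": ["United States"] * 6 + ["UK"] * 6 + ["China"] * 6,
    "GDP": [345.0, 0.0, 457.0, 577.0, 0.0, 657.0,
            35.0, 64.0, 54.0, 67.0, 687.0, 0.0,
            34.0, 54.0, 678.0, 355.0, 5678.0, 345.0],
})

# Create the column once, outside the loop, so it is never overwritten.
df["perc"] = float("nan")

for country in df["Country"].unique():
    for year in range(2011, 2017):  # upper bound 2017 so that 2016 is included
        year_rows = df["Date"] == year
        mask = (df["Country"] == country) & year_rows
        # .loc with a boolean mask writes only the matching rows in place.
        df.loc[mask, "perc"] = df.loc[mask, "GDP"] / df.loc[year_rows, "GDP"].sum()

print(df)
```

This works, but it scans the frame once per country and year, so it is mainly useful for understanding what went wrong; a grouped computation is shorter and faster.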

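You can use groupby with transform('sum') here. It returns the yearly total aligned to every row, so a plain division gives each country's share: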

df.GDP/df.groupby('Date').GDP.transform('sum')
Out[161]: 
0     0.833333
1     0.000000
2     0.384357
3     0.577578
4     0.000000
5     0.655689
6     0.084541
7     0.542373
8     0.045416
9     0.067067
10    0.107934
11    0.000000
12    0.082126
13    0.457627
14    0.570227
15    0.355355
16    0.892066
17    0.344311
Name: GDP, dtype: float64
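To get the column the question asks for, the result can be assigned straight back to the frame. The following is a small self-contained sketch, assuming the data from the question; `year_total` is just an illustrative name:

```python
import pandas as pd

# Rebuild the frame from the table in the question.
df = pd.DataFrame({
    "Date": [2011, 2012, 2013, 2014, 2015, 2016] * 3,
    "Country": ["United States"] * 6 + ["UK"] * 6 + ["China"] * 6,
    "GDP": [345.0, 0.0, 457.0, 577.0, 0.0, 657.0,
            35.0, 64.0, 54.0, 67.0, 687.0, 0.0,
            34.0, 54.0, 678.0, 355.0, 5678.0, 345.0],
})

# transform("sum") returns a Series with the same index as df,
# holding the total GDP of each row's year.
year_total = df.groupby("Date")["GDP"].transform("sum")

# Element-wise division gives each country's share of that year's total.
df["perc"] = df["GDP"] / year_total

print(df)
```

Because `transform` keeps the original index, no loop or merge is needed, and the shares within each year add up to 1.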
