Loop only takes the last value

Question

I have a DataFrame with country-specific population for each year and a pandas Series with the world population for each year. This is the Series I am using:

pop_tot = df3.groupby('Year')['population'].sum()
Year     
1990    4.575442e+09
1991    4.659075e+09
1992    4.699921e+09
1993    4.795129e+09
1994    4.862547e+09
1995    4.949902e+09
...     ...
2017    6.837429e+09

And this is the DataFrame I am using:

        Country      Year   HDI     population
0       Afghanistan 1990    NaN     1.22491e+07
1       Albania     1990    0.645   3.28654e+06
2       Algeria     1990    0.577   2.59124e+07
3       Andorra     1990    NaN     54509
4       Angola      1990    NaN     1.21714e+07
...     ...         ...     ...     ...
4096    Uzbekistan  2017    0.71    3.23872e+07 
4097    Vanuatu     2017    0.603   276244  
4098    Zambia      2017    0.588   1.70941e+07 
4099    Zimbabwe    2017    0.535   1.65299e+07

For each year, I want to calculate the proportion of the world's population that each country's population represents, so I loop over the Series and the DataFrame as follows:

j = 0
for i in range(len(df3)):
    if df3.iloc[i,1]==pop_tot.index[j]:
        df3['pop_tot']=pop_tot[j] #Sanity check
        df3['weighted']=df3['population']/pop_tot[j]*df3.iloc[i,2]
    else:
        j=j+1

However, the DataFrame that I get in return is not the expected one. I end up dividing all the values by the total population of 2017, which gives proportions that are not correct for the year in question (i.e. for the first rows, pop_tot should be 4.575442e+09, since that corresponds to 1990 according to the Series above, and not 6.837429e+09, which corresponds to 2017).

     Country   Year HDI   population  pop_tot      weighted
  0  Albania   1990 0.645 3.28654e+06 6.837429e+09 0.000257158
  1  Algeria   1990 0.577 2.59124e+07 6.837429e+09 0.00202753
  2  Argentina 1990 0.704 3.27297e+07 6.837429e+09 0.00256096

However, I can't see what the mistake in the loop is. Thanks in advance.

Answer 1

You don't need a loop: you can use groupby.transform to create the column pop_tot in df3 directly. Then, for the column weighted, just do a column operation, such as:

df3['pop_tot'] = df3.groupby('Year')['population'].transform('sum')
df3['weighted'] = df3['population']/df3['pop_tot']
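Since the question already builds the per-year Series pop_tot, another option is to look each row's year up in that Series with Series.map. The sketch below uses a tiny made-up frame (countries "A" and "B" and their populations are illustrative assumptions, not the real data):

```python
import pandas as pd

# Tiny made-up frame with two years; the values are illustrative only.
df3 = pd.DataFrame({
    "Country": ["A", "B", "A", "B"],
    "Year": [1990, 1990, 1991, 1991],
    "population": [10.0, 30.0, 20.0, 60.0],
})

# Per-year totals, built the same way as in the question.
pop_tot = df3.groupby("Year")["population"].sum()

# Map each row's Year onto the Series index to fetch that year's total.
df3["pop_tot"] = df3["Year"].map(pop_tot)
df3["weighted"] = df3["population"] / df3["pop_tot"]

print(df3)
```

Both approaches should give the same columns; map is mainly useful when the totals come from a separate source rather than from df3 itself.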

As @roganjosh pointed out, the problem with your method is that you replace the whole columns pop_tot and weighted every time your if condition is met. So at the last iteration where the condition is met, the year probably being 2017, you set the whole pop_tot column to the 2017 value and calculate weighted with that value as well.
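To make the overwriting visible, here is a minimal sketch on made-up data (the column names overwritten and per_year are hypothetical, chosen only for this illustration). Assigning a scalar to a column fills every row, whereas assigning through .loc with a boolean mask touches only the matching rows:

```python
import pandas as pd

# Made-up data: two rows for 1990 and one for 2017.
df3 = pd.DataFrame({
    "Year": [1990, 1990, 2017],
    "population": [10.0, 30.0, 50.0],
})
pop_tot = df3.groupby("Year")["population"].sum()

# Scalar assignment fills the whole column, so each pass overwrites
# the previous one and only the last year's total survives.
for year in pop_tot.index:
    df3["overwritten"] = pop_tot.loc[year]

# Restricting the assignment to rows of the matching year keeps
# each year's own total.
for year in pop_tot.index:
    mask = df3["Year"] == year
    df3.loc[mask, "per_year"] = pop_tot.loc[year]

print(df3)
```

The masked loop is shown only to explain the behaviour; the vectorized transform version is still the simpler choice.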

Answer 2

You don't have to loop; it's slower and can make things complex quite fast. Use the vectorized solutions that pandas and NumPy provide, for example:

df['pop_tot'] = df.population.sum()
df['weighted'] =  df.population / df.population.sum()

print(df)
       Country  Year    HDI  population     pop_tot  weighted
0  Afghanistan  1990    NaN  12249100.0  53673949.0  0.228213
1      Albania  1990  0.645   3286540.0  53673949.0  0.061232
2      Algeria  1990  0.577  25912400.0  53673949.0  0.482774
3      Andorra  1990    NaN     54509.0  53673949.0  0.001016
4       Angola  1990    NaN  12171400.0  53673949.0  0.226766

Edit after OP's comment

df['pop_tot'] = df.groupby('Year').population.transform('sum')

df['weighted'] =  df.population / df['pop_tot']

print(df)
       Country  Year    HDI  population     pop_tot  weighted
0  Afghanistan  1990    NaN  12249100.0  53673949.0  0.228213
1      Albania  1990  0.645   3286540.0  53673949.0  0.061232
2      Algeria  1990  0.577  25912400.0  53673949.0  0.482774
3      Andorra  1990    NaN     54509.0  53673949.0  0.001016
4       Angola  1990    NaN  12171400.0  53673949.0  0.226766

Note
I used the small dataset you gave as an example:

    Country     Year    HDI     population
0   Afghanistan 1990    NaN     12249100.0
1   Albania     1990    0.645   3286540.0
2   Algeria     1990    0.577   25912400.0
3   Andorra     1990    NaN     54509.0
4   Angola      1990    NaN     12171400.0
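Because that sample contains only 1990, the global sum and the per-year sum produce identical output. A small sketch with two years (made-up countries and populations, and a hypothetical weighted_global column for comparison) shows where they differ, and checks that the per-year weights add up to 1 within each year:

```python
import pandas as pd

# Made-up data covering two years; values are illustrative only.
df = pd.DataFrame({
    "Country": ["A", "B", "A", "B"],
    "Year": [1990, 1990, 2017, 2017],
    "population": [10.0, 30.0, 20.0, 60.0],
})

# One total over all rows versus one total per year, aligned to each row.
global_total = df["population"].sum()
per_year_total = df.groupby("Year")["population"].transform("sum")

df["weighted_global"] = df["population"] / global_total
df["weighted"] = df["population"] / per_year_total

# Sanity check: per-year weights should sum to 1 for every year.
check = df.groupby("Year")["weighted"].sum()

print(df)
print(check)
```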
