Loop only takes last value
I have a DataFrame with country-specific population for each year and a pandas Series with the world population for each year. This is the Series I am using:
pop_tot = df3.groupby('Year')['population'].sum()
Year
1990 4.575442e+09
1991 4.659075e+09
1992 4.699921e+09
1993 4.795129e+09
1994 4.862547e+09
1995 4.949902e+09
... ...
2017 6.837429e+09
And this is the DataFrame I am using:
Country Year HDI population
0 Afghanistan 1990 NaN 1.22491e+07
1 Albania 1990 0.645 3.28654e+06
2 Algeria 1990 0.577 2.59124e+07
3 Andorra 1990 NaN 54509
4 Angola 1990 NaN 1.21714e+07
... ... ... ... ...
4096 Uzbekistan 2017 0.71 3.23872e+07
4097 Vanuatu 2017 0.603 276244
4098 Zambia 2017 0.588 1.70941e+07
4099 Zimbabwe 2017 0.535 1.65299e+07
For each year, I want to calculate the proportion of the world's population that each country's population represents, so I loop over the Series and the DataFrame as follows:
j = 0
for i in range(len(df3)):
    if df3.iloc[i,1]==pop_tot.index[j]:
        df3['pop_tot']=pop_tot[j] #Sanity check
        df3['weighted']=df3['population']/pop_tot[j]*df3.iloc[i,2]
    else:
        j=j+1
However, the DataFrame that I get in return is not the expected one. I end up dividing all the values by the total population of 2017, which gives proportions that are not correct for each row's year (i.e., for these first rows, pop_tot should be 4.575442e+09, which corresponds to 1990 according to the Series above, and not 6.837429e+09, which corresponds to 2017).
Country Year HDI population pop_tot weighted
0 Albania 1990 0.645 3.28654e+06 6.837429e+09 0.000257158
1 Algeria 1990 0.577 2.59124e+07 6.837429e+09 0.00202753
2 Argentina 1990 0.704 3.27297e+07 6.837429e+09 0.00256096
However, I can't see what the mistake in the loop is. Thanks in advance.
You don't need a loop; you can use groupby.transform to create the column pop_tot in df3 directly. Then, for the column weighted, just do a column operation, such as:
df3['pop_tot'] = df3.groupby('Year')['population'].transform('sum')
df3['weighted'] = df3['population']/df3['pop_tot']
As @roganjosh pointed out, the problem with your method is that you replace the whole columns pop_tot and weighted every time your if condition is met. So at the last iteration where the condition is met, the year probably being 2017, the column pop_tot is set to the 2017 value, and weighted is calculated with that value as well.
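To illustrate the overwrite described above, here is a minimal sketch on made-up data (the countries A and B and their populations are hypothetical, not taken from the question). It also shows one possible alternative to the loop, assuming the pop_tot Series is indexed by Year as in the question: Series.map looks up each row's year in that Series.

```python
import pandas as pd

# Hypothetical toy data: two countries over two years (values are made up).
df3 = pd.DataFrame({
    "Country": ["A", "B", "A", "B"],
    "Year": [1990, 1990, 2017, 2017],
    "population": [10.0, 30.0, 20.0, 60.0],
})
pop_tot = df3.groupby("Year")["population"].sum()

# Assigning a scalar to a column broadcasts it to every row,
# so each assignment discards whatever the previous one wrote.
for year in pop_tot.index:
    df3["pop_tot"] = pop_tot.loc[year]
overwritten = df3["pop_tot"].tolist()  # every row now holds the 2017 total

# Looking up each row's own year in the Series avoids the overwrite.
df3["pop_tot"] = df3["Year"].map(pop_tot)
df3["weighted"] = df3["population"] / df3["pop_tot"]
print(df3)
```

With map, the existing pop_tot Series is reused as a lookup table, so the yearly totals do not have to be recomputed.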
You don't have to loop; it's slower and can make things really complex quite fast. Use the vectorized solutions from pandas and numpy, like this for example:
df['pop_tot'] = df.population.sum()
df['weighted'] = df.population / df.population.sum()
print(df)
Country Year HDI population pop_tot weighted
0 Afghanistan 1990 NaN 12249100.0 53673949.0 0.228213
1 Albania 1990 0.645 3286540.0 53673949.0 0.061232
2 Algeria 1990 0.577 25912400.0 53673949.0 0.482774
3 Andorra 1990 NaN 54509.0 53673949.0 0.001016
4 Angola 1990 NaN 12171400.0 53673949.0 0.226766
Edit after OP's comment
df['pop_tot'] = df.groupby('Year').population.transform('sum')
df['weighted'] = df.population / df['pop_tot']
print(df)
Country Year HDI population pop_tot weighted
0 Afghanistan 1990 NaN 12249100.0 53673949.0 0.228213
1 Albania 1990 0.645 3286540.0 53673949.0 0.061232
2 Algeria 1990 0.577 25912400.0 53673949.0 0.482774
3 Andorra 1990 NaN 54509.0 53673949.0 0.001016
4 Angola 1990 NaN 12171400.0 53673949.0 0.226766
Note
I used the small dataset you gave as an example:
Country Year HDI population
0 Afghanistan 1990 NaN 12249100.0
1 Albania 1990 0.645 3286540.0
2 Algeria 1990 0.577 25912400.0
3 Andorra 1990 NaN 54509.0
4 Angola 1990 NaN 12171400.0
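Because the sample above only contains rows for 1990, the global sum and the per-year sum print the same numbers. A minimal sketch on made-up data covering two years (the countries A and B and their values are hypothetical) may make the difference visible: with the per-year total, the shares add up to 1 within each year, whereas with the global total they do not.

```python
import pandas as pd

# Hypothetical data (made-up numbers) covering two years.
df = pd.DataFrame({
    "Country": ["A", "B", "A", "B"],
    "Year": [1990, 1990, 2017, 2017],
    "population": [1.0, 3.0, 2.0, 6.0],
})

# Global total: one number for the whole frame, regardless of year.
global_share = df["population"] / df["population"].sum()

# Per-year total: transform returns a Series aligned with df's rows.
df["pop_tot"] = df.groupby("Year")["population"].transform("sum")
df["weighted"] = df["population"] / df["pop_tot"]

# Sum the shares within each year to compare the two approaches.
per_year_sums = df.groupby("Year")["weighted"].sum()
global_sums = global_share.groupby(df["Year"]).sum()
print(per_year_sums)
print(global_sums)
```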