Groupby function in pandas dataframe of Python does not seem to work
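I have a table with various information (for example, energy supply and the share of renewable energy) for 15 countries. I need to build a DataFrame from this table, indexed by continent, that reports the number of countries on each continent together with the sum, mean, and standard deviation of those countries' populations. My problem is that after mapping the 15 countries to their continents, I cannot seem to aggregate the data at the continent level. I have to use the predefined dictionary to solve this task. Could you please help me? Please find my code below: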
def answer_eleven():
    import numpy as np
    import pandas as pd
    Top15 = answer_one()
    Top15['Country Name'] = Top15.index
    ContinentDict = {'China':'Asia',
                     'United States':'North America',
                     'Japan':'Asia',
                     'United Kingdom':'Europe',
                     'Russian Federation':'Europe',
                     'Canada':'North America',
                     'Germany':'Europe',
                     'India':'Asia',
                     'France':'Europe',
                     'South Korea':'Asia',
                     'Italy':'Europe',
                     'Spain':'Europe',
                     'Iran':'Asia',
                     'Australia':'Australia',
                     'Brazil':'South America'}
    Top15['Continent'] = pd.Series(ContinentDict)
    #Top15['size'] = Top15['Country'].count()
    Top15['Population'] = (Top15['Energy Supply'] / Top15['Energy Supply per Capita'])
    #columns_to_keep = ['Continent', 'Population']
    #Top15 = Top15[columns_to_keep]
    #Top15 = Top15.set_index('Continent').groupby(level=0)['Population'].agg({'sum': np.sum})
    Top15.set_index(['Continent'], inplace = True)
    Top15['size'] = Top15.groupby(['Continent'])['Country Name'].count()
    Top15['sum'] = Top15.groupby(['Continent'])['Population'].sum()
    Top15['mean'] = Top15.groupby(['Continent'])['Population'].mean()
    Top15['std'] = Top15.groupby(['Continent'])['Population'].std()
    columns_to_keep = ['size', 'sum', 'mean', 'std']
    Top15 = Top15[columns_to_keep]
    #Top15['Continent Name'] = Top15.index
    #Top15.groupby(['Continent'], level = 0, sort = True)['size'].count()
    return Top15.iloc[:5]

answer_eleven()
I believe you need agg, grouping by the dictionary directly:
def answer_eleven():
    Top15 = answer_one()
    ContinentDict = {'China':'Asia',
                     'United States':'North America',
                     'Japan':'Asia',
                     'United Kingdom':'Europe',
                     'Russian Federation':'Europe',
                     'Canada':'North America',
                     'Germany':'Europe',
                     'India':'Asia',
                     'France':'Europe',
                     'South Korea':'Asia',
                     'Italy':'Europe',
                     'Spain':'Europe',
                     'Iran':'Asia',
                     'Australia':'Australia',
                     'Brazil':'South America'}
    Top15['Population'] = (Top15['Energy Supply'] / Top15['Energy Supply per Capita'])
    Top15 = Top15.groupby(ContinentDict)['Population'].agg(['size','sum','mean','std'])
    return Top15

df = answer_eleven()
print(df)
                        sum          mean           std  size
Country Name
Asia           2.771785e+09  9.239284e+08  6.913019e+08     3
Australia      2.331602e+07  2.331602e+07           NaN     1
Europe         4.579297e+08  7.632161e+07  3.464767e+07     6
North America  3.528552e+08  1.764276e+08  1.996696e+08     2
South America  2.059153e+08  2.059153e+08           NaN     1
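
Since answer_one() is not shown in the question, here is a minimal, self-contained sketch of the same idea on a toy frame. The DataFrame `toy`, its population numbers, and the shortened dictionary `continent_of` are made up purely for illustration; only the groupby(dict) plus agg pattern comes from the answer above.

```python
import pandas as pd

# Hypothetical toy data: the numbers are invented for illustration only.
# As in the question, the country names form the index.
toy = pd.DataFrame(
    {"Population": [100.0, 300.0, 50.0, 70.0, 20.0]},
    index=["China", "Japan", "France", "Italy", "Brazil"],
)

# Shortened stand-in for ContinentDict (assumption: same country -> continent idea).
continent_of = {
    "China": "Asia",
    "Japan": "Asia",
    "France": "Europe",
    "Italy": "Europe",
    "Brazil": "South America",
}

# Passing a dict to groupby maps each index label to its group key,
# so no helper 'Continent' column or set_index call is needed.
stats = toy.groupby(continent_of)["Population"].agg(["size", "sum", "mean", "std"])
print(stats)
```

The result has one row per continent rather than one row per country. A continent with a single country gets NaN for std, because the sample standard deviation needs at least two observations, which is consistent with the NaN entries for Australia and South America in the output above. By contrast, assigning each groupby result back as a column of a frame that still has one row per country, as the question's code does, should repeat the continent statistics on every country row, so iloc[:5] would return the first five countries rather than five continents.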