Groupby function in a pandas DataFrame does not seem to work
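I have a table containing various information for 15 countries (e.g., energy supply, share of renewable energy supply). I have to create a DataFrame that shows, at continent level, the number of countries per continent and the sum, mean, and standard deviation of the population of the respective countries. The DataFrame is built from the data in the table mentioned above. My problem is that, after mapping the 15 countries to their respective continents, I cannot seem to aggregate the data at continent level. I have to use the predefined dictionary to solve this task. Could you help me? Please find my code below: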
def answer_eleven():
    import numpy as np
    import pandas as pd
    Top15 = answer_one()
    Top15['Country Name'] = Top15.index
    ContinentDict = {'China':'Asia',
                     'United States':'North America',
                     'Japan':'Asia',
                     'United Kingdom':'Europe',
                     'Russian Federation':'Europe',
                     'Canada':'North America',
                     'Germany':'Europe',
                     'India':'Asia',
                     'France':'Europe',
                     'South Korea':'Asia',
                     'Italy':'Europe',
                     'Spain':'Europe',
                     'Iran':'Asia',
                     'Australia':'Australia',
                     'Brazil':'South America'}
    Top15['Continent'] = pd.Series(ContinentDict)
    #Top15['size'] = Top15['Country'].count()
    Top15['Population'] = (Top15['Energy Supply'] / Top15['Energy Supply per Capita'])
    #columns_to_keep = ['Continent', 'Population']
    #Top15 = Top15[columns_to_keep]
    #Top15 = Top15.set_index('Continent').groupby(level=0)['Population'].agg({'sum': np.sum})
    Top15.set_index(['Continent'], inplace = True)
    Top15['size'] = Top15.groupby(['Continent'])['Country Name'].count()
    Top15['sum'] = Top15.groupby(['Continent'])['Population'].sum()
    Top15['mean'] = Top15.groupby(['Continent'])['Population'].mean()
    Top15['std'] = Top15.groupby(['Continent'])['Population'].std()
    columns_to_keep = ['size', 'sum', 'mean', 'std']
    Top15 = Top15[columns_to_keep]
    #Top15['Continent Name'] = Top15.index
    #Top15.groupby(['Continent'], level = 0, sort = True)['size'].count()
    return Top15.iloc[:5]

answer_eleven()
I believe you need agg to aggregate, with the dictionary passed to groupby as the grouping key:
def answer_eleven():
    Top15 = answer_one()
    ContinentDict = {'China':'Asia',
                     'United States':'North America',
                     'Japan':'Asia',
                     'United Kingdom':'Europe',
                     'Russian Federation':'Europe',
                     'Canada':'North America',
                     'Germany':'Europe',
                     'India':'Asia',
                     'France':'Europe',
                     'South Korea':'Asia',
                     'Italy':'Europe',
                     'Spain':'Europe',
                     'Iran':'Asia',
                     'Australia':'Australia',
                     'Brazil':'South America'}
    Top15['Population'] = (Top15['Energy Supply'] / Top15['Energy Supply per Capita'])
    Top15 = Top15.groupby(ContinentDict)['Population'].agg(['size','sum','mean','std'])
    return Top15

df = answer_eleven()
print(df)
                        sum          mean           std  size
Country Name
Asia           2.771785e+09  9.239284e+08  6.913019e+08     3
Australia      2.331602e+07  2.331602e+07           NaN     1
Europe         4.579297e+08  7.632161e+07  3.464767e+07     6
North America  3.528552e+08  1.764276e+08  1.996696e+08     2
South America  2.059153e+08  2.059153e+08           NaN     1
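
Since answer_one() is not shown here, the following is only a minimal sketch of the same technique on made-up data. The names toy and continent_of and all population figures are hypothetical. It shows that passing a dictionary to groupby groups the rows by the value each index label maps to, and that agg with a list of function names returns one row per group.

```python
import pandas as pd

# Hypothetical toy data: the index holds country names, the values are made up.
toy = pd.DataFrame(
    {"Population": [100.0, 300.0, 50.0, 70.0, 20.0]},
    index=["China", "Japan", "Germany", "France", "Brazil"],
)

# Maps index labels (countries) to group keys (continents).
continent_of = {
    "China": "Asia",
    "Japan": "Asia",
    "Germany": "Europe",
    "France": "Europe",
    "Brazil": "South America",
}

# A dict passed to groupby is applied to the index labels to form the groups;
# agg with a list of names yields one column per statistic, one row per group.
summary = toy.groupby(continent_of)["Population"].agg(["size", "sum", "mean", "std"])
print(summary)

# Equivalent route: materialise the continent as a column first, then group by it.
by_column = (
    toy.assign(Continent=toy.index.map(continent_of))
       .groupby("Continent")["Population"]
       .agg(["size", "sum", "mean", "std"])
)
print(by_column)
```

Two points follow from this. Assigning a grouped result back into the per-country frame, as the code in the question does, aligns the continent-level values onto every country row instead of producing one row per continent, which is most likely why the original attempt did not look aggregated. Also, index labels that are missing from the dictionary are left out of every group, so the country names in the index have to match the dictionary keys exactly. A group with a single member has no sample standard deviation, hence the NaN in the std column.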