
Groupby function in pandas dataframe of Python does not seem to work

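I have a table containing various information (e.g. energy supply, share of renewable energy supply) for 15 countries. I have to create a dataframe that contains, at the continent level, the number of countries and the mean, standard deviation, and sum of the population of the countries on each continent. The dataframe is built from the data in the table mentioned above. My problem is that after mapping the 15 countries to their respective continents, I cannot seem to aggregate the data at the continent level. I have to use the predefined dictionary to solve this task. Could you please help me? Please find my code below: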

def answer_eleven():

    import numpy as np
    import pandas as pd

    Top15 = answer_one()
    Top15['Country Name'] = Top15.index

    ContinentDict  = {'China':'Asia',
                      'United States':'North America',
                      'Japan':'Asia',
                      'United Kingdom':'Europe',
                      'Russian Federation':'Europe',
                      'Canada':'North America',
                      'Germany':'Europe',
                      'India':'Asia',
                      'France':'Europe',
                      'South Korea':'Asia',
                      'Italy':'Europe',
                      'Spain':'Europe',
                      'Iran':'Asia',
                      'Australia':'Australia',
                      'Brazil':'South America'}

    Top15['Continent'] = pd.Series(ContinentDict)
    #Top15['size'] = Top15['Country'].count()
    Top15['Population'] = (Top15['Energy Supply'] / Top15['Energy Supply per Capita'])
    #columns_to_keep = ['Continent', 'Population']
    #Top15 = Top15[columns_to_keep]
    #Top15 = Top15.set_index('Continent').groupby(level=0)['Population'].agg({'sum': np.sum})
    Top15.set_index(['Continent'], inplace = True)
    Top15['size'] = Top15.groupby(['Continent'])['Country Name'].count()
    Top15['sum'] = Top15.groupby(['Continent'])['Population'].sum()
    Top15['mean'] = Top15.groupby(['Continent'])['Population'].mean()
    Top15['std'] = Top15.groupby(['Continent'])['Population'].std()
    columns_to_keep = ['size', 'sum', 'mean', 'std']
    Top15 = Top15[columns_to_keep]
    #Top15['Continent Name'] = Top15.index
    #Top15.groupby(['Continent'], level = 0, sort = True)['size'].count()

    return Top15.iloc[:5]

answer_eleven()

I believe you need to pass the dictionary to groupby and aggregate with agg:

def answer_eleven():

    Top15 = answer_one()
    ContinentDict  = {'China':'Asia',
                      'United States':'North America',
                      'Japan':'Asia',
                      'United Kingdom':'Europe',
                      'Russian Federation':'Europe',
                      'Canada':'North America',
                      'Germany':'Europe',
                      'India':'Asia',
                      'France':'Europe',
                      'South Korea':'Asia',
                      'Italy':'Europe',
                      'Spain':'Europe',
                      'Iran':'Asia',
                      'Australia':'Australia',
                      'Brazil':'South America'}

    Top15['Population'] = (Top15['Energy Supply'] / Top15['Energy Supply per Capita'])
    Top15 = Top15.groupby(ContinentDict)['Population'].agg(['size','sum','mean','std'])
    return Top15

df = answer_eleven()
print(df)

                        sum          mean           std  size
Country Name                                                 
Asia           2.771785e+09  9.239284e+08  6.913019e+08     3
Australia      2.331602e+07  2.331602e+07           NaN     1
Europe         4.579297e+08  7.632161e+07  3.464767e+07     6
North America  3.528552e+08  1.764276e+08  1.996696e+08     2
South America  2.059153e+08  2.059153e+08           NaN     1
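
Since answer_one() is not shown in the thread, the same idea can be tried on a small stand-in frame. The sketch below is only an illustration: the frame top15, the dictionary continent_of, and the population figures are made up for the example and are not taken from the original data. It assumes, as in the answer above, that the frame is indexed by country name, so that groupby can look each index label up in the dictionary.

```python
import pandas as pd

# Hypothetical stand-in for answer_one(): made-up populations, indexed by country.
top15 = pd.DataFrame(
    {"Population": [100.0, 300.0, 50.0, 70.0, 20.0]},
    index=["China", "Japan", "Germany", "France", "Brazil"],
)

# Hypothetical, shortened version of the predefined country-to-continent mapping.
continent_of = {
    "China": "Asia",
    "Japan": "Asia",
    "Germany": "Europe",
    "France": "Europe",
    "Brazil": "South America",
}

# Passing a dict to groupby maps each index label to its group key,
# so no extra 'Continent' column or set_index call is needed.
summary = top15.groupby(continent_of)["Population"].agg(["size", "sum", "mean", "std"])
print(summary)
```

The result has one row per continent rather than one row per country. A continent with a single country gets NaN in the std column, because the sample standard deviation (ddof=1) is undefined for one observation; this matches the NaN entries for Australia and South America in the output above.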
