
Groupby function in pandas dataframe of Python does not seem to work

I have a table with various information (e.g. energy supply, proportion of renewable energy supply) for 15 countries. I have to create a dataframe with continent-level information: the number of countries on each continent, and the mean, standard deviation and sum of the population of the countries on each continent. The dataframe is built from the data in the table mentioned above. My problem is that I can't seem to aggregate the data at continent level after mapping the 15 countries to their respective continents. I have to use a predefined dictionary to solve this task. Could you please help me with this? Please find my code below:

def answer_eleven():

    import numpy as np
    import pandas as pd

    Top15 = answer_one()
    Top15['Country Name'] = Top15.index

    ContinentDict  = {'China':'Asia',
                      'United States':'North America',
                      'Japan':'Asia',
                      'United Kingdom':'Europe',
                      'Russian Federation':'Europe',
                      'Canada':'North America',
                      'Germany':'Europe',
                      'India':'Asia',
                      'France':'Europe',
                      'South Korea':'Asia',
                      'Italy':'Europe',
                      'Spain':'Europe',
                      'Iran':'Asia',
                      'Australia':'Australia',
                      'Brazil':'South America'}

    Top15['Continent'] = pd.Series(ContinentDict)
    #Top15['size'] = Top15['Country'].count()
    Top15['Population'] = (Top15['Energy Supply'] / Top15['Energy Supply per Capita'])
    #columns_to_keep = ['Continent', 'Population']
    #Top15 = Top15[columns_to_keep]
    #Top15 = Top15.set_index('Continent').groupby(level=0)['Population'].agg({'sum': np.sum})
    Top15.set_index(['Continent'], inplace = True)
    # Each groupby(...) result below has one row per continent, but assigning it
    # back as a column aligns it on the duplicated 'Continent' index, so all 15
    # country rows are kept and each one just receives its continent's value.
    Top15['size'] = Top15.groupby(['Continent'])['Country Name'].count()
    Top15['sum'] = Top15.groupby(['Continent'])['Population'].sum()
    Top15['mean'] = Top15.groupby(['Continent'])['Population'].mean()
    Top15['std'] = Top15.groupby(['Continent'])['Population'].std()
    columns_to_keep = ['size', 'sum', 'mean', 'std']
    Top15 = Top15[columns_to_keep]
    #Top15['Continent Name'] = Top15.index
    #Top15.groupby(['Continent'], level = 0, sort = True)['size'].count()

    # Still 15 rows here: iloc[:5] takes the first five countries, not one row per continent.
    return Top15.iloc[:5]
answer_eleven()
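Why this does not collapse to one row per continent can be seen on a smaller scale. The sketch below uses a hypothetical four-row table with made-up numbers (not the real Top15 data) and groups with level=0, which here is equivalent to grouping by the 'Continent' index name as in the code above.

```python
import pandas as pd

# Hypothetical toy data (made-up numbers): four countries on two continents.
df = pd.DataFrame(
    {"Continent": ["Asia", "Asia", "Europe", "Europe"],
     "Population": [10.0, 30.0, 5.0, 7.0]},
    index=["China", "Japan", "France", "Germany"],
)
df = df.set_index("Continent")

# The groupby result itself has one row per continent.
per_continent = df.groupby(level=0)["Population"].sum()

# Assigning it back as a column aligns it on the duplicated 'Continent'
# index, so all four country rows survive and each one just receives
# its continent's total.
df["sum"] = per_continent

print(df)             # 4 rows, totals repeated per country
print(per_continent)  # 2 rows: the shape the question is after
```

In other words, the aggregated table is the groupby result itself; writing it back into the per-country frame broadcasts it instead of reducing the rows.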

I believe you need to group by the dictionary and then use agg to aggregate:

def answer_eleven():

    Top15 = answer_one()
    ContinentDict  = {'China':'Asia',
                      'United States':'North America',
                      'Japan':'Asia',
                      'United Kingdom':'Europe',
                      'Russian Federation':'Europe',
                      'Canada':'North America',
                      'Germany':'Europe',
                      'India':'Asia',
                      'France':'Europe',
                      'South Korea':'Asia',
                      'Italy':'Europe',
                      'Spain':'Europe',
                      'Iran':'Asia',
                      'Australia':'Australia',
                      'Brazil':'South America'}

    Top15['Population'] = (Top15['Energy Supply'] / Top15['Energy Supply per Capita'])
    # A dict passed to groupby maps each index label (country name) to its group key (continent)
    Top15 = Top15.groupby(ContinentDict)['Population'].agg(['size','sum','mean','std'])
    return Top15

df = answer_eleven()
print(df)

                        sum          mean           std  size
Country Name                                                 
Asia           2.771785e+09  9.239284e+08  6.913019e+08     3
Australia      2.331602e+07  2.331602e+07           NaN     1
Europe         4.579297e+08  7.632161e+07  3.464767e+07     6
North America  3.528552e+08  1.764276e+08  1.996696e+08     2
South America  2.059153e+08  2.059153e+08           NaN     1
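For reference, here is a self-contained sketch of the same approach that runs without answer_one(). The DataFrame is a hypothetical stand-in with made-up values, not the real Top15 data. It keeps the dictionary grouper from the answer but uses named aggregation (available in pandas 0.25 and later) so the columns come out as size, sum, mean, std in that order. It also shows why Australia and South America have NaN in the std column: with the default ddof=1, the sample standard deviation of a single value is undefined.

```python
import pandas as pd

# Hypothetical stand-in for answer_one(): made-up values, country names as index.
top15 = pd.DataFrame(
    {"Energy Supply": [100.0, 300.0, 50.0, 70.0, 20.0],
     "Energy Supply per Capita": [10.0, 10.0, 10.0, 10.0, 10.0]},
    index=pd.Index(["China", "Japan", "France", "Germany", "Brazil"],
                   name="Country Name"),
)
continent_dict = {"China": "Asia", "Japan": "Asia", "France": "Europe",
                  "Germany": "Europe", "Brazil": "South America"}

# Population estimate used in the answer: total supply / supply per capita.
top15["Population"] = top15["Energy Supply"] / top15["Energy Supply per Capita"]

# Same dict grouper as the answer: each index label (country) is looked up in
# continent_dict and the value (continent) becomes its group key.
grouped = top15.groupby(continent_dict)["Population"]

# Named aggregation (pandas >= 0.25) pins the output column names and order.
result = grouped.agg(size="size", sum="sum", mean="mean", std="std")
print(result)

# South America has one country, so its sample std (default ddof=1) is NaN;
# ddof=0 returns the population std, which is 0.0 for a single value.
population_std = grouped.std(ddof=0)
print(population_std)
```

Whether NaN or 0.0 is the right value for a one-country continent depends on what the task expects; the answer above keeps the pandas default.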
