
Groupby function in pandas dataframe of Python does not seem to work

I have a table with various information (e.g. energy supply, proportion of renewable energy supply) for 15 countries. From it I have to create a DataFrame at continent level that holds the number of countries on each continent and the mean, standard deviation and sum of the population of those countries. My problem is that I can't seem to aggregate the data at continent level after mapping the 15 countries to their respective continents. I have to use a predefined dictionary to solve this task. Could you please help me with this? Please find my code below:

def answer_eleven():

    import numpy as np
    import pandas as pd

    Top15 = answer_one()
    Top15['Country Name'] = Top15.index

    ContinentDict  = {'China':'Asia',
                      'United States':'North America',
                      'Japan':'Asia',
                      'United Kingdom':'Europe',
                      'Russian Federation':'Europe',
                      'Canada':'North America',
                      'Germany':'Europe',
                      'India':'Asia',
                      'France':'Europe',
                      'South Korea':'Asia',
                      'Italy':'Europe',
                      'Spain':'Europe',
                      'Iran':'Asia',
                      'Australia':'Australia',
                      'Brazil':'South America'}

    Top15['Continent'] = pd.Series(ContinentDict)
    #Top15['size'] = Top15['Country'].count()
    Top15['Population'] = (Top15['Energy Supply'] / Top15['Energy Supply per Capita'])
    #columns_to_keep = ['Continent', 'Population']
    #Top15 = Top15[columns_to_keep]
    #Top15 = Top15.set_index('Continent').groupby(level=0)['Population'].agg({'sum': np.sum})
    Top15.set_index(['Continent'], inplace = True)
    Top15['size'] = Top15.groupby(['Continent'])['Country Name'].count()
    Top15['sum'] = Top15.groupby(['Continent'])['Population'].sum()
    Top15['mean'] = Top15.groupby(['Continent'])['Population'].mean()
    Top15['std'] = Top15.groupby(['Continent'])['Population'].std()
    columns_to_keep = ['size', 'sum', 'mean', 'std']
    Top15 = Top15[columns_to_keep]
    #Top15['Continent Name'] = Top15.index
    #Top15.groupby(['Continent'], level = 0, sort = True)['size'].count()

    return Top15.iloc[:5]

answer_eleven()
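
A likely reason the attempt above does not end up with one row per continent: assigning a groupby result back to a column of the original frame aligns it on the index, so each country row just receives its continent's value and the frame keeps one row per country. The sketch below reproduces this on a small made-up frame; the name `toy` and all figures are hypothetical and are not the Top15 data.

```python
import pandas as pd

# Hypothetical toy data; the population figures are made up.
toy = pd.DataFrame({
    "Country Name": ["China", "Japan", "France", "Italy", "Brazil"],
    "Population": [10.0, 20.0, 30.0, 50.0, 7.0],
    "Continent": ["Asia", "Asia", "Europe", "Europe", "South America"],
}).set_index("Continent")

# The groupby result itself has one row per continent.
per_continent = toy.groupby("Continent")["Population"].sum()

# Assigning it back aligns on the (duplicated) index of toy, so every
# country row gets its continent's total and toy still has five rows.
toy["sum"] = per_continent

print(per_continent)
print(toy)
```

Taking `iloc[:5]` of such a frame then returns the first five country rows rather than five continents.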

I believe you need to pass the dictionary to groupby and then use agg to compute all the statistics at once:

def answer_eleven():

    Top15 = answer_one()
    ContinentDict  = {'China':'Asia',
                      'United States':'North America',
                      'Japan':'Asia',
                      'United Kingdom':'Europe',
                      'Russian Federation':'Europe',
                      'Canada':'North America',
                      'Germany':'Europe',
                      'India':'Asia',
                      'France':'Europe',
                      'South Korea':'Asia',
                      'Italy':'Europe',
                      'Spain':'Europe',
                      'Iran':'Asia',
                      'Australia':'Australia',
                      'Brazil':'South America'}

    Top15['Population'] = (Top15['Energy Supply'] / Top15['Energy Supply per Capita'])
    Top15 = Top15.groupby(ContinentDict)['Population'].agg(['size','sum','mean','std'])
    return Top15

df = answer_eleven()
print(df)

                        sum          mean           std  size
Country Name                                                 
Asia           2.771785e+09  9.239284e+08  6.913019e+08     3
Australia      2.331602e+07  2.331602e+07           NaN     1
Europe         4.579297e+08  7.632161e+07  3.464767e+07     6
North America  3.528552e+08  1.764276e+08  1.996696e+08     2
South America  2.059153e+08  2.059153e+08           NaN     1
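
To try the same idea without `answer_one()`, here is a minimal self-contained sketch. It assumes a frame indexed by country name, as in the question; `toy`, `continent_of` and the numbers are hypothetical stand-ins for illustration only.

```python
import pandas as pd

# Hypothetical toy data indexed by country name; the figures are made up.
toy = pd.DataFrame(
    {"Population": [10.0, 20.0, 30.0, 50.0, 7.0]},
    index=["China", "Japan", "France", "Italy", "Brazil"],
)

# Hypothetical mapping from index label (country) to group key (continent).
continent_of = {
    "China": "Asia",
    "Japan": "Asia",
    "France": "Europe",
    "Italy": "Europe",
    "Brazil": "South America",
}

# A dict passed to groupby is applied to the index labels, so each country
# is placed in the group named by its continent.
stats = toy.groupby(continent_of)["Population"].agg(["size", "sum", "mean", "std"])
print(stats)
```

The `std` of a group with a single country is NaN because pandas computes the sample standard deviation (ddof=1) by default, which is undefined for one observation.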

