
python pandas multi-indexed dataframe selection

Although I found multiple questions on the topic, I could not find a solution for this one in particular.

I am playing around with this CSV file, which contains a subselection of TBC data from the WHO: http://dign.eu/temp/tbc.csv

import pandas as pd
df = pd.read_csv('tbc.csv', index_col=['country', 'year'])

This gives a nicely formatted DataFrame, sorted on country and year, showing one of the parameters.

Now, for this case I would like the mean value of "param" for each country over all available years. Using df.mean() gives me an overall value, and df.mean(axis=1) removes all indices, which makes the results useless.

Obviously I can do this using a loop, but I guess there is a smarter way. But how?

If I understand you correctly, you want to pass the level to the mean function:

In [182]:

df.mean(level='country')
Out[182]:
                                                             param
country                                                           
Afghanistan                                           8391.312500 
Albania                                               183.888889  
Algeria                                               8024.588235 
American Samoa                                        1.500000    
....
West Bank and Gaza Strip                              12.538462   
Yemen                                                 4029.166667 
Zambia                                                13759.266667
Zimbabwe                                              12889.666667

[219 rows x 1 columns]
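
Note for newer pandas versions: as far as I know, the level argument of mean was deprecated in pandas 1.3 and removed in 2.0, so the call above may raise a TypeError there. Grouping on the index level should give the same per-country means. Below is a minimal sketch; the small DataFrame is a made-up stand-in for tbc.csv (the values are not real WHO data), and only the index layout matches the question.

```python
import pandas as pd

# Hypothetical stand-in for tbc.csv: invented values, same (country, year) index.
df = pd.DataFrame(
    {
        "country": ["Albania", "Albania", "Zambia", "Zambia"],
        "year": [2000, 2001, 2000, 2001],
        "param": [100.0, 200.0, 10.0, 30.0],
    }
).set_index(["country", "year"])

# Group the rows by the 'country' index level and average over all years.
country_means = df.groupby(level="country").mean()
print(country_means)
```

The result is indexed by country only, which is the same shape as the output shown above.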
