
Python (NumPy): df apply error - IndexError: tuple index out of range

from pandas import DataFrame,Series
import numpy

def avg_bronze_medal():
    countries=['Russian Fed','Norway','Canada']
    gold=[13,11,10]
    silver=[11,5,10]
    bronze=[9,10,5]
    medal_counts={'country_name':Series(countries),'gold':Series(gold),'silver':Series(silver),'bronze':Series(bronze)}
    df=DataFrame(medal_counts)
    print(df)
    print(df['gold'].apply(numpy.mean, axis=1))  # this line raises the IndexError

The last line raises "IndexError: tuple index out of range". I need to use the apply function on the data frame, and it should compute the average of the gold, silver and bronze columns. In the example above I used only the gold column. Please help me fix the error.
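The failure can be reproduced in isolation. A minimal sketch, assuming a recent pandas and NumPy: Series.apply calls the function once per value, so numpy.mean only ever sees a single number, and asking for axis=1 of a 0-dimensional value fails. Newer NumPy reports this as AxisError, which is a subclass of IndexError; the "tuple index out of range" message in the question most likely comes from an older NumPy version. The variable names received_ndims and caught are illustrative only.

```python
import numpy
from pandas import Series

gold = Series([13, 11, 10])

# Series.apply calls the function once per value, so every call
# receives a single 0-dimensional number rather than a row or column
received_ndims = gold.apply(numpy.ndim)
print(received_ndims.tolist())  # [0, 0, 0]

# This is effectively what each call in the question does:
# numpy.mean on one number with axis=1, which has no axis 1 to reduce
caught = None
try:
    numpy.mean(13, axis=1)
except IndexError as exc:  # newer NumPy raises AxisError, a subclass of IndexError
    caught = exc
print(type(caught).__name__)
```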

To get the mean of all three medal columns for each country (row) at once:

df[['gold', 'bronze', 'silver']].mean(axis=1)
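If the assignment really requires apply, as the question says, the same per-row average can be sketched by passing axis=1 to DataFrame.apply, so that numpy.mean receives each row of the three medal columns as a Series. This rebuilds the question's data; the names medals, by_apply and by_method are hypothetical and used only for illustration.

```python
import numpy
from pandas import DataFrame

df = DataFrame({
    'country_name': ['Russian Fed', 'Norway', 'Canada'],
    'gold': [13, 11, 10],
    'silver': [11, 5, 10],
    'bronze': [9, 10, 5],
})

medals = df[['gold', 'silver', 'bronze']]

# axis=1: numpy.mean is called once per row, on a Series of three counts
by_apply = medals.apply(numpy.mean, axis=1)

# Vectorized equivalent, as in the one-liner above
by_method = medals.mean(axis=1)

print(by_apply.round(2).tolist())  # [11.0, 8.67, 8.33]
```

Both give the same values; the .mean(axis=1) form is usually preferable because it runs as one vectorized operation instead of calling a Python function once per row.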

That said, I'm not sure why you would need the average number of medals awarded in the tournament, but I guess you have your reasons!


Some additional notes the OP should be aware of:

DataFrame.apply works on columns by default (axis=0) or on rows (axis=1). If you call df.apply(func), func is applied to each column, one column at a time; df.apply(func, axis=1) applies func to each row, one row at a time. A pd.Series has no second axis, so Series.apply calls the function on each individual value. That is why the question's last line fails: numpy.mean receives a single number together with axis=1, and a single number has no axis 1. .apply is useful when you have a complex custom function to run over rows or columns. Common statistics such as the sum, mean and standard deviation already have vectorized methods of their own (.sum(), .mean(), .std()), so you can call them directly, as in the answer above.
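The three cases described above can be observed directly. A small sketch using the question's medal counts; the names column_sums, row_sums and doubled_gold are illustrative, and numpy.sum and the lambda are just convenient stand-ins for a custom function.

```python
import numpy
from pandas import DataFrame

df = DataFrame({'gold': [13, 11, 10],
                'silver': [11, 5, 10],
                'bronze': [9, 10, 5]})

# Default (axis=0): the function receives one whole column at a time
column_sums = df.apply(numpy.sum)
# axis=1: the function receives one row at a time
row_sums = df.apply(numpy.sum, axis=1)
# Series.apply: the function receives each individual value
doubled_gold = df['gold'].apply(lambda value: value * 2)

print(column_sums.tolist())   # [34, 26, 24]
print(row_sums.tolist())      # [33, 26, 25]
print(doubled_gold.tolist())  # [26, 22, 20]

# The built-in vectorized methods give the same totals without .apply
print(df.sum().tolist())        # [34, 26, 24]
print(df.sum(axis=1).tolist())  # [33, 26, 25]
```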

Please read the pandas documentation for DataFrame.apply and Series.apply for further information.
