
Increase limit of value when infinity is reached in Pandas

Data Structure:

HEIGHT Category
   51        1
   45        1
   89        2

Objective: Calculate Geometric Mean

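import pandas as pd
import numpy as np

df = pd.read_csv('BaseFish', delimiter=',')
# dropna returns a new DataFrame, so the result has to be assigned back
df = df.dropna(axis=0)
# Drop zero heights, which would force the product to zero
df = df[df.HEIGHT != 0]
# Per-category product and count of HEIGHT
table = pd.pivot_table(df, values='HEIGHT', index='Category', aggfunc=(np.prod, np.count_nonzero))
# Geometric mean = product ** (1 / n); assigning creates the GMEAN column
table['GMEAN'] = table['prod'] ** (1 / table['count_nonzero'])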

Problem: For categories with a large number of data points, np.prod overflows to infinity, so the final GMEAN is also infinity.

My Python knowledge is very basic, and the only reason I am using it is that the number of data points exceeds Excel's limit.

There is no need to use a pivot table here. You can group by category and then compute the geometric mean per category.

from scipy.stats import gmean
# Column names are case-sensitive: the data uses 'Category' and 'HEIGHT'
df.groupby('Category').HEIGHT.apply(gmean)

Or without importing scipy.stats:

# Note: group.prod() can still overflow to infinity for very large groups
gmean = lambda group: group.prod()**(1/len(group))
df.groupby('Category').HEIGHT.apply(gmean)
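
Because the lambda above still multiplies all the values together, it can hit the same infinity as np.prod on large categories. One way around that, sketched here under the assumption that every HEIGHT is strictly positive, is to use the identity GM = exp(mean(log(x))), which never forms the huge product. The helper name log_gmean and the demo data are made up for illustration and are not part of the original question or answer.

```python
import numpy as np
import pandas as pd

def log_gmean(group: pd.Series) -> float:
    """Geometric mean as exp(mean(log(x))); assumes all values are > 0."""
    return float(np.exp(np.log(group).mean()))

# Hypothetical data: Category 1 holds the two sample heights from the question,
# Category 2 repeats 89 a thousand times so the plain product exceeds float64 range.
demo = pd.DataFrame({
    'HEIGHT': [51.0, 45.0] + [89.0] * 1000,
    'Category': [1, 1] + [2] * 1000,
})

# Product-based version: Category 2 overflows to inf
naive = demo.groupby('Category').HEIGHT.apply(lambda g: g.prod() ** (1 / len(g)))

# Log-based version: stays finite for both categories
stable = demo.groupby('Category').HEIGHT.apply(log_gmean)
```

On the real data the same call would be df.groupby('Category').HEIGHT.apply(log_gmean), after the zero heights have been filtered out as in the question, since log(0) is not finite.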
