
Getting the mean, max and min from a pandas DataFrame

I have the following DataFrame, which is the result of a standard pandas correlation:

df.corr()

           abc     xyz     jkl
abc        1       0.2    -0.01
xyz       -0.34    1       0.23
jkl        0.5     0.4     1

I need to do a few things with these correlations, but the calculations must exclude every cell whose value is 1. Those cells are where an item is perfectly correlated with itself, so I am not interested in them:

  • Determine the maximum correlation pair. The result is 'jkl' and 'abc', which have a correlation of 0.5.

  • Determine the minimum correlation pair. The result is 'abc' and 'xyz', which have a correlation of -0.34.

  • Determine the average/mean for the whole DataFrame (again excluding all the values that are 1). The result would be (0.2 + -0.01 + -0.34 + 0.23 + 0.5 + 0.4) / 6 = 0.98 / 6 ≈ 0.1633.
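The three computations listed above can also be sketched so that each value keeps its (row, column) labels, which makes the "pair" part of the first two bullets fall out directly. This is an alternative to the NumPy answer below, not code from the question or the answer; the names `corr`, `off_diag`, `labels` and `pairs` are illustrative. It relies on `pd.MultiIndex.from_product` listing the cells in the same row-major order that NumPy uses when flattening the matrix.

```python
import numpy as np
import pandas as pd

# The correlation matrix from the question (rows = index, columns = columns)
corr = pd.DataFrame(
    [[1, 0.2, -0.01],
     [-0.34, 1, 0.23],
     [0.5, 0.4, 1]],
    index=['abc', 'xyz', 'jkl'],
    columns=['abc', 'xyz', 'jkl'],
)

# True everywhere except the diagonal (the self-correlations)
off_diag = ~np.eye(len(corr), dtype=bool)

# One labelled value per off-diagonal cell, indexed by (row, column);
# from_product enumerates cells in the same row-major order as ravel()
labels = pd.MultiIndex.from_product([corr.index, corr.columns])
pairs = pd.Series(corr.to_numpy()[off_diag], index=labels[off_diag.ravel()])

print(pairs.idxmax(), pairs.max())  # ('jkl', 'abc') 0.5
print(pairs.idxmin(), pairs.min())  # ('xyz', 'abc') -0.34
print(pairs.mean())                 # mean of the six off-diagonal values, about 0.1633
```

If you literally want to drop every cell equal to 1, as the question phrases it, rather than only the diagonal, `corr.to_numpy() != 1` would serve as the mask instead; the two only differ when two different columns are perfectly correlated.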

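Try this: copy the values into a NumPy array, set the diagonal to NaN, and then use NumPy's NaN-aware functions (nanargmax, nanargmin, nanmean), which ignore NaN entries: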

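import numpy as np
import pandas as pd

# Rebuild the correlation matrix from the question
a = pd.DataFrame(
    [[1, 0.2, -0.01],
     [-0.34, 1, 0.23],
     [0.5, 0.4, 1]],
    index=['abc', 'xyz', 'jkl'],
    columns=['abc', 'xyz', 'jkl'],
)

# Work on a float copy so the DataFrame itself keeps its diagonal
b = a.to_numpy(dtype=float, copy=True)

# Blank out the self-correlations on the diagonal
np.fill_diagonal(b, np.nan)

# (row, column) positions of the largest / smallest remaining values
imax = np.unravel_index(np.nanargmax(b), b.shape)
imin = np.unravel_index(np.nanargmin(b), b.shape)
print(a.index[imax[0]], a.columns[imax[1]])  # jkl abc
print(a.index[imin[0]], a.columns[imin[1]])  # xyz abc
print(np.nanmean(b))  # (0.2 - 0.01 - 0.34 + 0.23 + 0.5 + 0.4) / 6, about 0.1633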

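Don't skip the copy: np.fill_diagonal modifies its argument in place, and the array you get from a DataFrame (e.g. via .values) can be a view of the DataFrame's own data, so without a copy the diagonal of the original DataFrame could be overwritten.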
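One caveat worth knowing: a matrix returned by `df.corr()` on real data is symmetric, so every off-diagonal pair appears twice (for example both ('jkl', 'abc') and ('abc', 'jkl')). The mean is unaffected, but you may prefer to visit each pair only once. The sketch below, assuming a symmetric matrix, keeps only the strict upper triangle via `np.triu_indices(n, k=1)`. The frame `data` and its random values are hypothetical, made up only to produce a correlation matrix. This does not suit the non-symmetric example matrix in the question, where the upper triangle alone would miss -0.34 and 0.5.

```python
import numpy as np
import pandas as pd

# Hypothetical data, used only to produce a (symmetric) correlation matrix
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(50, 3)), columns=['abc', 'xyz', 'jkl'])
corr = data.corr()

# Row/column positions of the strict upper triangle (k=1 skips the diagonal),
# so each unordered pair of columns is visited exactly once
rows, cols = np.triu_indices(len(corr), k=1)
values = corr.to_numpy()[rows, cols]

best = values.argmax()
worst = values.argmin()
print(corr.index[rows[best]], corr.columns[cols[best]], values[best])
print(corr.index[rows[worst]], corr.columns[cols[worst]], values[worst])
# For a symmetric matrix this matches the mean over all off-diagonal cells
# (up to floating-point rounding)
print(values.mean())
```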
