
Apply Function to every group: TypeError: unhashable type: 'numpy.ndarray'

I am trying to do a curve fit for every group and get the results for c, a, b for every group.

I tried it this way:

x=df.T.iloc[1]
y=df.T.iloc[2]

def logifunc(x,c,a,b):
    return c / (1 + (a) * np.exp(-b*(x)))

df.groupby('Seriennummer').apply(curve_fit(logifunc, x, y, p0=[110,400,-2]))

But I get this error:

TypeError: unhashable type: 'numpy.ndarray'

This is a part of my df with one million rows:

    Seriennummer    mrwSmpVWi   mrwSmpP
1915    701091.0    1.8   4.0
1916    701085.0    2.0   2.0
1917    701089.0    1.7   0.0
1918    701087.0    1.8   3.0
1919    701090.0    1.8   0.0
1920    701088.0    2.4   0.0
1921    701086.0    2.7   5.0
1922    701092.0    1.1   0.0
1923    701085.0    2.0   2.0
1924    701089.0    2.0   10.0
1925    701091.0    0.8   0.0
1926    701087.0    2.3   10.0
1927    701090.0    1.6   1.0
1928    701092.0    2.2   6.0
1929    701086.0    1.5   0.0
1930    701088.0    2.1   3.0

One odd point in your code is that:

  • although you group by Seriennummer,
  • for each group you then attempt to fit the curve to data from your full DataFrame (x and y).

To get a proper result, you should fit the curve to the current group only. Something like:

import scipy.optimize as opt

result = df.groupby('Seriennummer').apply(lambda grp:
    opt.curve_fit(logifunc, grp.mrwSmpVWi, grp.mrwSmpP, p0=[110, 400, -2]))

My lambda function plays the role of the wrapper mentioned in the other answer; the remaining parameters (the column names and p0) are hard-coded in it.

As your data sample includes only 2 rows for each group, I prepared my own DataFrame:

      Seriennummer  mrwSmpVWi  mrwSmpP
1915      701091.0        1.8      4.0
1916      701091.0        1.6      3.4
1917      701091.0        1.4      3.0
1918      701091.0        1.0      1.5
1919      701091.0        0.8      0.0
1920      701085.0        2.0      2.0
1921      701085.0        2.5      3.0
1922      701085.0        3.0      3.5
1923      701085.0        3.6      4.2

and ran the above code, with no error.

To print results in an easy to assess way, I ran:

# Series.iteritems() was removed in pandas 2.0; items() is the equivalent
for k, v in result.items():
    print(f'Group {k}:\n{v[0]}\n{v[1]}')

getting:

Group 701085.0:
[ 4.66854588 24.45419288  1.47315989]
[[ 3.43664761e-01 -1.05587500e+01 -2.65359878e-01]
 [-1.05587500e+01  4.60108288e+02  1.03214386e+01]
 [-2.65359878e-01  1.03214386e+01  2.40785819e-01]]
Group 701091.0:
[  3.89988734 617.72482118   5.54935645]
[[ 3.42006760e-01 -6.02519226e+02 -1.11651569e+00]
 [-6.02519226e+02  2.43770095e+06  3.83083902e+03]
 [-1.11651569e+00  3.83083902e+03  6.28930797e+00]]
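The second array printed for each group is the covariance matrix returned by curve_fit. As a rough sketch of how it can be used, the snippet below fits logifunc to synthetic data and takes the square root of the diagonal to get one-standard-deviation errors for c, a and b, as the SciPy documentation suggests. The true parameters, the noise level and the starting guess are assumptions made for this example, not values from the question.

```python
import numpy as np
from scipy.optimize import curve_fit

def logifunc(x, c, a, b):
    return c / (1 + a * np.exp(-b * x))

# Synthetic data (assumed for illustration): a logistic curve plus small noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 5, 20)
y = logifunc(x, 5.0, 20.0, 2.0) + rng.normal(scale=0.05, size=x.size)

# popt holds the fitted c, a, b; pcov is their estimated covariance matrix.
popt, pcov = curve_fit(logifunc, x, y, p0=[4, 10, 1])

# One-standard-deviation errors for c, a, b.
perr = np.sqrt(np.diag(pcov))
print(popt)
print(perr)
```

A large error relative to the fitted value (as for the second parameter of group 701091.0 above) suggests that the data do not constrain that parameter well.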

First repeat the above procedure on my data, then on your own.

Edit following the comment as of 11:03Z

Read the documentation of scipy.optimize.curve_fit. The description of the result (of each call) contains:

  • popt - Optimal values for the parameters (of the curve fitted),
  • pcov - The estimated covariance of popt.

If you want only popt for each group and don't care about pcov, then the lambda function should return only the first element from its (2-element) result:

result = df.groupby('Seriennummer').apply(lambda grp: opt.curve_fit(
    logifunc, grp.mrwSmpVWi, grp.mrwSmpP, p0=[110, 400, -2])[0])

(note [0] added at the end).
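If a table with one column per parameter is more convenient than a Series of arrays, one possible variation is to return popt wrapped in a pandas Series, so that apply assembles a DataFrame. This is only a sketch: the helper name fit_params, the synthetic noise-free data and the starting guess p0 are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

def logifunc(x, c, a, b):
    return c / (1 + a * np.exp(-b * x))

# Synthetic, noise-free data for two serial numbers (assumed values).
x = np.linspace(0, 6, 25)
df = pd.DataFrame({
    'Seriennummer': np.repeat([701085.0, 701091.0], x.size),
    'mrwSmpVWi': np.tile(x, 2),
    'mrwSmpP': np.concatenate([logifunc(x, 5.0, 20.0, 2.0),
                               logifunc(x, 3.0, 10.0, 1.5)]),
})

def fit_params(grp):
    # Hypothetical helper: fit one group and label the parameters.
    popt, _ = curve_fit(logifunc, grp['mrwSmpVWi'].to_numpy(),
                        grp['mrwSmpP'].to_numpy(), p0=[4, 10, 1])
    return pd.Series(popt, index=['c', 'a', 'b'])

# Selecting the two data columns keeps the grouping column out of each group.
params = df.groupby('Seriennummer')[['mrwSmpVWi', 'mrwSmpP']].apply(fit_params)
print(params)
```

The result is indexed by Seriennummer with columns c, a and b, so a single group's parameters can be read with params.loc[701085.0].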

A few notes:

  1. Notice that the argument you are passing to GroupBy.apply is not a function but the result of calling curve_fit, which is evaluated immediately and returns a tuple of ndarrays (popt, pcov). The first argument of GroupBy.apply must be a callable, ideally one that returns a pandas object (DataFrame, Series) or a scalar; that is the reason you are getting that error.

  2. I am not sure exactly what you are trying to do, but I assume you want a curve fit for every group based on the function you have written.

If that is the case, I suggest wrapping that functionality in another function and passing it to the apply method.

def wrapper(group_df, *args):
    # somehow work with the given group's DataFrame to achieve what you are looking for
    # you can also print whatever you like and export images
    # the important thing is that you return a DataFrame (or Series) back
    return group_df  # placeholder: return your own result here

# usage (your_args stands for whatever extra arguments you need):
df.groupby('Seriennummer').apply(wrapper, your_args)
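One way such a wrapper might be filled in is sketched below. The name fit_group, the p0 keyword and the choice to return NaN for groups that cannot be fitted are assumptions made for illustration. The guard for short groups matters because curve_fit with its default method cannot fit three parameters to fewer than three points, and the sample in the question has only two rows per serial number.

```python
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

def logifunc(x, c, a, b):
    return c / (1 + a * np.exp(-b * x))

def fit_group(grp, p0):
    """Hypothetical wrapper: fit one group, return c, a, b as a Series."""
    names = ['c', 'a', 'b']
    if len(grp) < len(names):
        # Too few points to determine three parameters.
        return pd.Series(np.nan, index=names)
    try:
        popt, _ = curve_fit(logifunc, grp['mrwSmpVWi'].to_numpy(),
                            grp['mrwSmpP'].to_numpy(), p0=p0)
    except RuntimeError:
        # curve_fit raises RuntimeError when the optimisation fails.
        return pd.Series(np.nan, index=names)
    return pd.Series(popt, index=names)

# One synthetic group (assumed values) and one two-row group from the question.
x = np.linspace(0, 6, 25)
good = pd.DataFrame({'Seriennummer': 701085.0, 'mrwSmpVWi': x,
                     'mrwSmpP': logifunc(x, 5.0, 20.0, 2.0)})
short = pd.DataFrame({'Seriennummer': 701092.0,
                      'mrwSmpVWi': [1.1, 2.2], 'mrwSmpP': [0.0, 6.0]})
df = pd.concat([good, short], ignore_index=True)

# Extra keyword arguments given to apply are forwarded to the wrapper.
fits = df.groupby('Seriennummer')[['mrwSmpVWi', 'mrwSmpP']].apply(
    fit_group, p0=[4, 10, 1])
print(fits)
```

Rows of NaN in the output then mark the serial numbers that need more data or a better starting guess.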

