I am trying to do a curve fit for every group and get the results for c, a, b for every group.
I tried it this way:
x=df.T.iloc[1]
y=df.T.iloc[2]
def logifunc(x,c,a,b):
    return c / (1 + a * np.exp(-b * x))
df.groupby('Seriennummer').apply(curve_fit(logifunc, x, y, p0=[110,400,-2]))
But I get the Error:
TypeError: unhashable type: 'numpy.ndarray'
This is a part of my df with one million rows:
Seriennummer mrwSmpVWi mrwSmpP
1915 701091.0 1.8 4.0
1916 701085.0 2.0 2.0
1917 701089.0 1.7 0.0
1918 701087.0 1.8 3.0
1919 701090.0 1.8 0.0
1920 701088.0 2.4 0.0
1921 701086.0 2.7 5.0
1922 701092.0 1.1 0.0
1923 701085.0 2.0 2.0
1924 701089.0 2.0 10.0
1925 701091.0 0.8 0.0
1926 701087.0 2.3 10.0
1927 701090.0 1.6 1.0
1928 701092.0 2.2 6.0
1929 701086.0 1.5 0.0
1930 701088.0 2.1 3.0
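A weird point in your code is that x and y are taken from the whole DataFrame, and curve_fit is called once on them before groupby ever sees a group.
To get a proper result, you should perform curve fitting on the current group only. Something like: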
import scipy.optimize as opt
result = df.groupby('Seriennummer').apply(lambda grp:
    opt.curve_fit(logifunc, grp.mrwSmpVWi, grp.mrwSmpP, p0=[110, 400, -2]))
My lambda function plays the role of the wrapper mentioned in the other answer; the remaining arguments (the model function and p0) are hard-coded in it.
As your data sample includes only 2 rows for each group, I prepared my own DataFrame:
Seriennummer mrwSmpVWi mrwSmpP
1915 701091.0 1.8 4.0
1916 701091.0 1.6 3.4
1917 701091.0 1.4 3.0
1918 701091.0 1.0 1.5
1919 701091.0 0.8 0.0
1920 701085.0 2.0 2.0
1921 701085.0 2.5 3.0
1922 701085.0 3.0 3.5
1923 701085.0 3.6 4.2
and ran the above code, with no error.
To print results in an easy to assess way, I ran:
for k, v in result.items():  # Series.iteritems() was removed in pandas 2.0
    print(f'Group {k}:\n{v[0]}\n{v[1]}')
getting:
Group 701085.0:
[ 4.66854588 24.45419288 1.47315989]
[[ 3.43664761e-01 -1.05587500e+01 -2.65359878e-01]
[-1.05587500e+01 4.60108288e+02 1.03214386e+01]
[-2.65359878e-01 1.03214386e+01 2.40785819e-01]]
Group 701091.0:
[ 3.89988734 617.72482118 5.54935645]
[[ 3.42006760e-01 -6.02519226e+02 -1.11651569e+00]
[-6.02519226e+02 2.43770095e+06 3.83083902e+03]
[-1.11651569e+00 3.83083902e+03 6.28930797e+00]]
First repeat the above procedure on my data, then on your own.
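To make that first step easier, the sample DataFrame shown above can be rebuilt as follows. This is only a sketch: the values and index labels are copied from the listing, and the construction itself is my assumption about how such a frame might be created.

```python
import pandas as pd

# Sample data copied from the listing above: 5 rows for 701091.0, 4 for 701085.0
df = pd.DataFrame(
    {
        'Seriennummer': [701091.0] * 5 + [701085.0] * 4,
        'mrwSmpVWi': [1.8, 1.6, 1.4, 1.0, 0.8, 2.0, 2.5, 3.0, 3.6],
        'mrwSmpP': [4.0, 3.4, 3.0, 1.5, 0.0, 2.0, 3.0, 3.5, 4.2],
    },
    index=range(1915, 1924),
)

# Quick check of how many rows each group contributes to its fit
sizes = df.groupby('Seriennummer').size()
print(sizes)
```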
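Read the documentation of scipy.optimize.curve_fit. The description of the result (of each call) contains two elements: popt, the optimal values of the parameters, and pcov, the estimated covariance of popt.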
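As a side note on pcov: the same documentation suggests taking the square root of its diagonal to get one-standard-deviation errors on the parameters. A minimal sketch, reusing the covariance matrix printed above for group 701091.0 (the variable names are only illustrative):

```python
import numpy as np

# Covariance matrix (pcov) printed above for group 701091.0
pcov = np.array([
    [ 3.42006760e-01, -6.02519226e+02, -1.11651569e+00],
    [-6.02519226e+02,  2.43770095e+06,  3.83083902e+03],
    [-1.11651569e+00,  3.83083902e+03,  6.28930797e+00],
])

# One-standard-deviation errors for c, a, b (in that order)
perr = np.sqrt(np.diag(pcov))
for name, err in zip("cab", perr):
    print(f"{name}: +/- {err:.3g}")
```

For this group the error on a comes out larger than the fitted value itself (617.7), which suggests that the five sample points barely constrain that parameter.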
If you want only popt for each group and don't care about pcov, then the lambda function should return only the first element of its (2-element) result:
result = df.groupby('Seriennummer').apply(lambda grp: opt.curve_fit(
    logifunc, grp.mrwSmpVWi, grp.mrwSmpP, p0=[110, 400, -2])[0])
(note [0] added at the end).
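With the [0] variant, result is a Series holding one parameter array per group. To get c, a and b as separate columns, one option is to expand it into a DataFrame. The sketch below assumes result has that shape; it is rebuilt by hand from the popt values printed earlier so that the snippet runs on its own, and the name params is only illustrative:

```python
import numpy as np
import pandas as pd

# Stand-in for the groupby/apply result: one popt array per Seriennummer
# (values copied from the output printed earlier)
result = pd.Series(
    [np.array([4.66854588, 24.45419288, 1.47315989]),
     np.array([3.89988734, 617.72482118, 5.54935645])],
    index=pd.Index([701085.0, 701091.0], name='Seriennummer'),
)

# One row per group, one column per fitted parameter
params = pd.DataFrame(result.tolist(), index=result.index, columns=['c', 'a', 'b'])
print(params)
```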
A few notes:
Notice that the argument you are passing to the pandas GroupBy.apply method is actually the result of invoking the curve_fit function, which returns a tuple of ndarrays rather than a function. The first argument of GroupBy.apply must be a callable that returns a pandas object (DataFrame or Series) or a scalar; that is the reason you are getting that error.
I am not sure exactly what you are trying to do, but I assume you want a curve fit for every group, based on the function you have written.
If that is the case, I suggest you wrap that functionality in another function and pass it to the apply method.
def wrapper(group, *args):
    # work with the group's DataFrame here to achieve what you are looking for
    # you can also print whatever you like and export images
    # the important thing is that you return a pandas object (e.g. a DataFrame)
    return group  # placeholder: return your own result instead
# usage:
df.groupby('Seriennummer').apply(wrapper, *YOUR_ARGS)
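As one possible concrete version of such a wrapper, the sketch below fits the logistic function to each group and returns the parameters as a Series, so that apply assembles a DataFrame with one row per group. The data here is synthetic (generated from known parameters, not taken from the question), and fit_group, the column selection, the p0 values and the NaN fallback for groups that are too small or fail to converge are all assumptions made for illustration:

```python
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

def logifunc(x, c, a, b):
    return c / (1 + a * np.exp(-b * x))

def fit_group(group, p0):
    """Fit logifunc to one group; return c, a, b as a Series."""
    x = group['mrwSmpVWi'].to_numpy()
    y = group['mrwSmpP'].to_numpy()
    if len(x) < len(p0):
        # Fewer points than parameters: nothing sensible to fit
        return pd.Series(np.nan, index=['c', 'a', 'b'])
    try:
        popt, _ = curve_fit(logifunc, x, y, p0=p0)
    except RuntimeError:
        # curve_fit raises RuntimeError when the minimization fails
        return pd.Series(np.nan, index=['c', 'a', 'b'])
    return pd.Series(popt, index=['c', 'a', 'b'])

# Synthetic, noise-free data for two hypothetical serial numbers
true_params = {1.0: (5.0, 20.0, 2.0), 2.0: (4.0, 30.0, 2.5)}
x = np.linspace(0.0, 4.0, 12)
df = pd.concat(
    [pd.DataFrame({'Seriennummer': sn, 'mrwSmpVWi': x, 'mrwSmpP': logifunc(x, *p)})
     for sn, p in true_params.items()],
    ignore_index=True,
)

# Selecting the data columns keeps the grouping column out of each group
params = (df.groupby('Seriennummer')[['mrwSmpVWi', 'mrwSmpP']]
            .apply(fit_group, p0=[4.5, 25.0, 2.2]))
print(params)
```

Returning a Series with a fixed index from every group is what lets apply line the results up as columns; on real data the starting values in p0 would need to suit the actual curves.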