Linear Regression based on Groupby
I have a df like this:
Allotment Year NDVI A_Annex Bachelor
A_Annex 1984 1.0 0.40 0.60
A_Annex 1984 1.5 0.56 0.89
A_Annex 1984 2.0 0.78 0.76
A_Annex 1985 3.4 0.89 0.54
A_Annex 1985 1.6 0.98 0.66
A_Annex 1986 2.5 1.10 0.44
A_Annex 1986 1.7 0.87 0.65
Bachelor 1984 8.9 0.40 0.60
Bachelor 1984 6.5 0.56 0.89
Bachelor 1984 4.2 0.78 0.76
Bachelor 1985 2.4 0.89 0.54
Bachelor 1985 1.7 0.98 0.66
Bachelor 1986 8.9 1.10 0.44
Bachelor 1986 9.6 0.87 0.65
and I want to run a regression based on a groupby. For each unique Allotment, I want to regress its NDVI values against the column that has the same name. So for the Allotment A_Annex, I want to regress NDVI on the A_Annex column, and then do the same thing for Bachelor with the Bachelor column. Essentially, I want to match each column with its associated Allotment and then regress the values in that column against the corresponding NDVI values.
I could do this for one Allotment like this:
stat=merge.groupby(['Allotment']).apply(lambda x: sp.stats.linregress(x['A_Annex'], x['NDVI']))
but I would need to keep changing the x column in sp.stats.linregress(x['A_Annex'], x['NDVI']) for each Allotment, and I would like to avoid that.
Are you after something like this?
import pandas as pd  # pd.ols was removed in pandas 0.20, so this needs an older pandas
r = {annex: pd.ols(x=group['A_Annex'], y=group['NDVI'])
     for annex, group in df.groupby('Allotment')}
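pd.ols is not available in current pandas. A minimal sketch of the same dict-of-results idea for current pandas, swapping in scipy.stats.linregress (the function the question already uses) and indexing each group with its own key, group[name], so that every Allotment is regressed on the column that shares its name. The sample frame is rebuilt by hand from the table in the question; the names df and results are just illustrative.

```python
import pandas as pd
from scipy import stats

# Sample data copied from the question's table.
df = pd.DataFrame({
    "Allotment": ["A_Annex"] * 7 + ["Bachelor"] * 7,
    "Year": [1984, 1984, 1984, 1985, 1985, 1986, 1986] * 2,
    "NDVI": [1.0, 1.5, 2.0, 3.4, 1.6, 2.5, 1.7,
             8.9, 6.5, 4.2, 2.4, 1.7, 8.9, 9.6],
    "A_Annex": [0.40, 0.56, 0.78, 0.89, 0.98, 1.10, 0.87] * 2,
    "Bachelor": [0.60, 0.89, 0.76, 0.54, 0.66, 0.44, 0.65] * 2,
})

# Each Allotment value is also a column name, so group[name] selects the
# matching predictor column and nothing has to be edited per Allotment.
results = {
    name: stats.linregress(group[name], group["NDVI"])
    for name, group in df.groupby("Allotment")
}

for name, res in results.items():
    print(f"{name}: slope={res.slope:.4f}, "
          f"intercept={res.intercept:.4f}, r^2={res.rvalue ** 2:.4f}")
```

Each value in results is a LinregressResult, so res.pvalue and res.stderr are available as well.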
>>> r
{'A_Annex':
-------------------------Summary of Regression Analysis-------------------------
Formula: Y ~ <x> + <intercept>
Number of Observations: 7
Number of Degrees of Freedom: 2
R-squared: 0.3774
Adj R-squared: 0.2529
Rmse: 0.6785
F-stat (1, 5): 3.0307, p-value: 0.1422
Degrees of Freedom: model 1, resid 5
-----------------------Summary of Estimated Coefficients------------------------
Variable Coef Std Err t-stat p-value CI 2.5% CI 97.5%
--------------------------------------------------------------------------------
x 1.9871 1.1415 1.74 0.1422 -0.2501 4.2244
intercept 0.3731 0.9454 0.39 0.7094 -1.4798 2.2260
---------------------------------End of Summary---------------------------------,
'Bachelor':
-------------------------Summary of Regression Analysis-------------------------
Formula: Y ~ <x> + <intercept>
Number of Observations: 7
Number of Degrees of Freedom: 2
R-squared: 0.0650
Adj R-squared: -0.1220
Rmse: 3.4787
F-stat (1, 5): 0.3478, p-value: 0.5810
Degrees of Freedom: model 1, resid 5
-----------------------Summary of Estimated Coefficients------------------------
Variable Coef Std Err t-stat p-value CI 2.5% CI 97.5%
--------------------------------------------------------------------------------
x -3.4511 5.8522 -0.59 0.5810 -14.9213 8.0191
intercept 8.7796 4.8467 1.81 0.1298 -0.7200 18.2792
---------------------------------End of Summary---------------------------------}
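One thing to be aware of: the comprehension passes group['A_Annex'] as x for every Allotment, so the Bachelor summary above regresses Bachelor's NDVI on the A_Annex column rather than on the Bachelor column. A quick check, assuming current pandas and NumPy, with numpy.polyfit (degree 1, which returns [slope, intercept]) swapped in for pd.ols; the names as_written and matched are illustrative:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "Allotment": ["A_Annex"] * 7 + ["Bachelor"] * 7,
    "NDVI": [1.0, 1.5, 2.0, 3.4, 1.6, 2.5, 1.7,
             8.9, 6.5, 4.2, 2.4, 1.7, 8.9, 9.6],
    "A_Annex": [0.40, 0.56, 0.78, 0.89, 0.98, 1.10, 0.87] * 2,
    "Bachelor": [0.60, 0.89, 0.76, 0.54, 0.66, 0.44, 0.65] * 2,
})

# As written in the answer: the predictor is always the A_Annex column.
as_written = {
    annex: np.polyfit(group["A_Annex"], group["NDVI"], 1)
    for annex, group in df.groupby("Allotment")
}

# Matched: the predictor is the column named after the group key.
matched = {
    annex: np.polyfit(group[annex], group["NDVI"], 1)
    for annex, group in df.groupby("Allotment")
}

for annex in matched:
    print(annex, "as written:", as_written[annex].round(4),
          "matched:", matched[annex].round(4))
```

as_written["Bachelor"] reproduces the -3.4511 and 8.7796 coefficients in the Bachelor summary; switching to group[annex] gives the column pairing the question asks for.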
You can then extract the model parameters as follows:
>>> {k: r[k].sm_ols.params for k in r}
{'A_Annex': array([ 1.9871432 , 0.37310585]),
'Bachelor': array([-3.45111992, 8.77960702])}
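If a table is more convenient than a dict, the parameter arrays printed above (x coefficient first, then intercept) can be loaded into a DataFrame with one row per Allotment. A small sketch using those printed numbers; the column labels slope and intercept are my own choice:

```python
import numpy as np
import pandas as pd

# Parameter arrays as printed above: [x coefficient, intercept] per Allotment.
params = {
    "A_Annex": np.array([1.9871432, 0.37310585]),
    "Bachelor": np.array([-3.45111992, 8.77960702]),
}

# orient="index" makes each dict key a row label and each array that row's values.
table = pd.DataFrame.from_dict(params, orient="index",
                               columns=["slope", "intercept"])
print(table)
```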