Migrating a logistic regression from R to rpy2
I'm trying to use rpy2 to do a logistic regression. I managed to execute it, but I don't know how to extract the coefficients and p-values from the result. I don't want to print the values on the screen, but to create a function that uses them independently.
import rpy2.robjects as ro
mydata = ro.r['data.frame']
read = ro.r['read.csv']
head = ro.r['head']
summary = ro.r['summary']
mydata = read("http://www.ats.ucla.edu/stat/data/binary.csv")
#cabecalho = head(mydata)
formula = 'admit ~ gre + gpa + rank'
mylogit = ro.r.glm(formula=ro.r(formula), data=mydata,family=ro.r('binomial(link="logit")'))
#What NEXT?
I don't know how you can get the p-values directly from the fitted object, but for the other values it should be something like this:
In [24]:
#what is stored in mylogit?
mylogit.names
Out[24]:
<StrVector - Python:0x10a01a0e0 / R:0x10353ab20>
['coef..., 'resi..., 'fitt..., ..., 'meth..., 'cont..., 'xlev...]
In [25]:
#looks like the first item is the coefficients
mylogit.names[0]
Out[25]:
'coefficients'
In [26]:
#OK, let's get the coefficients.
mylogit[0]
Out[26]:
<FloatVector - Python:0x10a01a5f0 / R:0x1028bcc80>
[-3.449548, 0.002294, 0.777014, -0.560031]
In [27]:
#be careful: the index shown by print is the R index, which starts at 1. I don't see p-values here
print(mylogit.names)
[1] "coefficients" "residuals" "fitted.values"
[4] "effects" "R" "rank"
[7] "qr" "family" "linear.predictors"
[10] "deviance" "aic" "null.deviance"
[13] "iter" "weights" "prior.weights"
[16] "df.residual" "df.null" "y"
[19] "converged" "boundary" "model"
[22] "call" "formula" "terms"
[25] "data" "offset" "control"
[28] "method" "contrasts" "xlevels"
The p-values for each term:
In [55]:
#p values:
list(summary(mylogit)[-6])[-4:]
Out[55]:
[0.0023265825120094407,
0.03564051883525258,
0.017659683902155117,
1.0581094283250368e-05]
And:
In [56]:
#coefficients
list(summary(mylogit)[-6])[:4]
Out[56]:
[-3.449548397668471,
0.0022939595044433334,
0.7770135737198545,
-0.5600313868499897]
In [57]:
#S.E.
list(summary(mylogit)[-6])[4:8]
Out[57]:
[1.1328460085495897,
0.001091839095422917,
0.327483878497867,
0.12713698917130048]
In [58]:
#Z value
list(summary(mylogit)[-6])[8:12]
Out[58]:
[-3.0450285137032984,
2.1010050968680347,
2.3726773277632214,
-4.4049445444662885]
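The four blocks above are not independent: each z value is the coefficient divided by its standard error, and each p-value is the two-sided tail area of a standard normal at that z. Here is a minimal sketch that reproduces the intercept row from the numbers printed above, assuming the usual Wald test that R reports for a binomial GLM. The helper name wald_z_and_p is made up for illustration.

```python
import math

def wald_z_and_p(estimate, std_error):
    """Return the Wald z statistic and its two-sided normal p-value."""
    z = estimate / std_error
    # Two-sided normal tail: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Intercept row, using the coefficient and S.E. printed above
z, p = wald_z_and_p(-3.449548397668471, 1.1328460085495897)
print(z, p)
```

This is only a consistency check on the printed output; in practice you would read the p-values from the summary rather than recompute them.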
Or more generally:
In [60]:
import numpy as np
In [62]:
COEF=np.array(summary(mylogit)[-6]) #it has a shape of (number_of_terms, 4)
In [63]:
COEF[:, -1] #p-value
Out[63]:
array([ 2.32658251e-03, 3.56405188e-02, 1.76596839e-02,
1.05810943e-05])
In [66]:
COEF[:, 0] #coefficients
Out[66]:
array([ -3.44954840e+00, 2.29395950e-03, 7.77013574e-01,
-5.60031387e-01])
In [68]:
COEF[:, 1] #S.E.
Out[68]:
array([ 1.13284601e+00, 1.09183910e-03, 3.27483878e-01,
1.27136989e-01])
In [69]:
COEF[:, 2] #Z
Out[69]:
array([-3.04502851, 2.1010051 , 2.37267733, -4.40494454])
You can also use summary(mylogit).rx2('coefficients') (or rx), if you know that coefficients is in the summary vector. Looking the element up by name is less fragile than a positional index such as [-6].
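Since the question asks for a function rather than printed output, here is a rough sketch built on the name-based lookup. The function name glm_coef_table is hypothetical. It assumes the summary element is called 'coefficients' and that the returned rpy2 matrix exposes nrow, rownames and colnames, so check those against your rpy2 version.

```python
def glm_coef_table(fit, r_summary):
    """Return {term: {column: value}} for an R glm fit wrapped by rpy2.

    fit is the object returned by ro.r.glm(...); r_summary is ro.r['summary'].
    """
    coef = r_summary(fit).rx2('coefficients')  # look up by name, not position
    values = list(coef)                        # R stores matrices column by column
    nrow = coef.nrow
    terms = list(coef.rownames)
    columns = list(coef.colnames)
    return {
        term: {col: values[j * nrow + i] for j, col in enumerate(columns)}
        for i, term in enumerate(terms)
    }

# Usage with the objects from the question (needs R and rpy2, so left commented):
# table = glm_coef_table(mylogit, ro.r['summary'])
# table['gre']['Estimate'], table['gre']['Pr(>|z|)']
```

The column labels are whatever R puts on the matrix; for a binomial GLM they should be 'Estimate', 'Std. Error', 'z value' and 'Pr(>|z|)'.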
This isn't quite an answer to what you asked, but if your question is more generally "how to move a logistic regression to Python", why not use statsmodels?
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
df = pd.read_csv("http://www.ats.ucla.edu/stat/data/binary.csv")
model = smf.glm('admit ~ gre + gpa + rank', df, family=sm.families.Binomial()).fit()
print(model.summary())
This prints: 这打印:
Generalized Linear Model Regression Results
==============================================================================
Dep. Variable: admit No. Observations: 400
Model: GLM Df Residuals: 396
Model Family: Binomial Df Model: 3
Link Function: logit Scale: 1.0
Method: IRLS Log-Likelihood: -229.72
Date: Sat, 29 Mar 2014 Deviance: 459.44
Time: 11:56:19 Pearson chi2: 399.
No. Iterations: 5
==============================================================================
coef std err t P>|t| [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept -3.4495 1.133 -3.045 0.002 -5.670 -1.229
gre 0.0023 0.001 2.101 0.036 0.000 0.004
gpa 0.7770 0.327 2.373 0.018 0.135 1.419
rank -0.5600 0.127 -4.405 0.000 -0.809 -0.311
==============================================================================
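If you want the numbers themselves rather than the printed table, the fitted results object should expose them as attributes (params, bse, tvalues, pvalues). Here is a small sketch of a collector function; the name glm_stats is made up for illustration, and it only assumes those four attributes behave like mappings from term name to value.

```python
def glm_stats(result):
    """Collect per-term statistics from a fitted statsmodels results object."""
    return {
        "coef": dict(result.params),
        "std_err": dict(result.bse),
        "z": dict(result.tvalues),   # statsmodels calls this attribute tvalues
        "p": dict(result.pvalues),
    }

# Usage with the fitted model above (needs statsmodels and the data, so left commented):
# stats = glm_stats(model)
# stats["p"]["gre"]
```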
While there are still some statistical procedures that only have a good implementation in R, for straightforward things like linear models it's probably a lot easier to use statsmodels than to fight with rpy2, since all of the introspection, built-in documentation, tab completion (in IPython), etc. will work directly on statsmodels objects.