Moving a logistic regression from R to rpy2

Question

I'm trying to use rpy2 to do a logistic regression. I managed to run it, but I don't know how to extract the coefficients and p-values from the result. I don't want to print the values on the screen, but to write a function that lets me use them independently.

import rpy2.robjects as ro

# Look up the R functions used below
read = ro.r['read.csv']
head = ro.r['head']
summary = ro.r['summary']

mydata = read("http://www.ats.ucla.edu/stat/data/binary.csv")
# header = head(mydata)
formula = 'admit ~ gre + gpa + rank'
mylogit = ro.r.glm(formula=ro.r(formula), data=mydata, family=ro.r('binomial(link="logit")'))
# What next?

Answer 1

I don't know how you can get the p-values, but for the other values it should be something like this:

In [24]:
#what is stored in mylogit?
mylogit.names
Out[24]:
<StrVector - Python:0x10a01a0e0 / R:0x10353ab20>

['coef..., 'resi..., 'fitt..., ..., 'meth..., 'cont..., 'xlev...]
In [25]:
#looks like the first item is the coefficients
mylogit.names[0]
Out[25]:
'coefficients'
In [26]:
#OK, let's get the coefficients.
mylogit[0]
Out[26]:
<FloatVector - Python:0x10a01a5f0 / R:0x1028bcc80>
[-3.449548, 0.002294, 0.777014, -0.560031]
In [27]:
#be careful: the indices shown by print are R indices, which start at 1. I don't see p-values here
print(mylogit.names)
 [1] "coefficients"      "residuals"         "fitted.values"    
 [4] "effects"           "R"                 "rank"             
 [7] "qr"                "family"            "linear.predictors"
[10] "deviance"          "aic"               "null.deviance"    
[13] "iter"              "weights"           "prior.weights"    
[16] "df.residual"       "df.null"           "y"                
[19] "converged"         "boundary"          "model"            
[22] "call"              "formula"           "terms"            
[25] "data"              "offset"            "control"          
[28] "method"            "contrasts"         "xlevels"

Edit

The p-values for each term:

In [55]:
#p values:
list(summary(mylogit)[-6])[-4:]
Out[55]:
[0.0023265825120094407,
 0.03564051883525258,
 0.017659683902155117,
 1.0581094283250368e-05]

And:

In [56]:
#coefficients 
list(summary(mylogit)[-6])[:4]
Out[56]:
[-3.449548397668471,
 0.0022939595044433334,
 0.7770135737198545,
 -0.5600313868499897]
In [57]:
#S.E.
list(summary(mylogit)[-6])[4:8]
Out[57]:
[1.1328460085495897,
 0.001091839095422917,
 0.327483878497867,
 0.12713698917130048]
In [58]:
#Z value
list(summary(mylogit)[-6])[8:12]
Out[58]:
[-3.0450285137032984,
 2.1010050968680347,
 2.3726773277632214,
 -4.4049445444662885]

Or more generally:

In [60]:

import numpy as np
In [62]:

COEF=np.array(summary(mylogit)[-6]) #it has a shape of (number_of_terms, 4)
In [63]:

COEF[:, -1] #p-value
Out[63]:
array([  2.32658251e-03,   3.56405188e-02,   1.76596839e-02,
         1.05810943e-05])
In [66]:

COEF[:, 0] #coefficients
Out[66]:
array([ -3.44954840e+00,   2.29395950e-03,   7.77013574e-01,
        -5.60031387e-01])
In [68]:

COEF[:, 1] #S.E.
Out[68]:
array([  1.13284601e+00,   1.09183910e-03,   3.27483878e-01,
         1.27136989e-01])
In [69]:

COEF[:, 2] #Z
Out[69]:
array([-3.04502851,  2.1010051 ,  2.37267733, -4.40494454])
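As a cross-check on the columns above, the z value and p-value can be recomputed from the estimate and its standard error. This is a minimal sketch, assuming the usual Wald test for a binomial GLM (z = estimate / S.E., two-sided p-value from the standard normal); the helper name wald_z_and_p is made up for illustration and is not part of rpy2 or R.

```python
import math

def wald_z_and_p(estimate, std_error):
    """Return (z, two-sided p-value) for a Wald test.

    Hypothetical helper: assumes the test statistic is standard normal
    under the null, which is the usual choice for a binomial GLM.
    """
    z = estimate / std_error
    # For a standard normal Z: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Intercept row, copied from the outputs above
z, p = wald_z_and_p(-3.449548397668471, 1.1328460085495897)
print(z, p)
```

The printed pair should agree with the first entries of the Z and p-value columns to within floating-point rounding.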

If you know that the summary object has a coefficients element, you can also fetch it by name with summary(mylogit).rx2('coefficients') (or rx) instead of relying on its position.
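To use the values independently rather than print them, the flat list can be turned into a lookup keyed by term name. The sketch below assumes the layout seen in the slices above: list(summary(mylogit)[-6]) returns the coefficient matrix column by column (all estimates, then all standard errors, then z values, then p-values). The function name, the column labels and the term names are illustrative assumptions; the numbers are copied from the outputs above so the example runs without R.

```python
def coef_table_to_dict(flat_values, term_names,
                       columns=("estimate", "std_error", "z_value", "p_value")):
    """Reshape a column-major flat list into {term: {column: value}}.

    Hypothetical helper; flat_values is assumed to hold one full column
    after another, as in the slices shown earlier.
    """
    n_terms = len(term_names)
    if len(flat_values) != n_terms * len(columns):
        raise ValueError("length of flat_values does not match terms x columns")
    table = {}
    for row, term in enumerate(term_names):
        table[term] = {
            col: flat_values[col_idx * n_terms + row]
            for col_idx, col in enumerate(columns)
        }
    return table

# Values copied from the outputs above, in column-major order
flat = [
    -3.449548397668471, 0.0022939595044433334, 0.7770135737198545, -0.5600313868499897,
    1.1328460085495897, 0.001091839095422917, 0.327483878497867, 0.12713698917130048,
    -3.0450285137032984, 2.1010050968680347, 2.3726773277632214, -4.4049445444662885,
    0.0023265825120094407, 0.03564051883525258, 0.017659683902155117, 1.0581094283250368e-05,
]
# Term names typed in by hand here (an assumption about the row labels)
terms = ["(Intercept)", "gre", "gpa", "rank"]

result = coef_table_to_dict(flat, terms)
print(result["gpa"]["p_value"])  # 0.017659683902155117
```

With a live model, flat would come from list(summary(mylogit)[-6]) as in the cells above.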

Answer 2

This isn't quite an answer to what you asked, but if your question is more generally "how do I move a logistic regression to Python", why not use statsmodels?

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("http://www.ats.ucla.edu/stat/data/binary.csv")
model = smf.glm('admit ~ gre + gpa + rank', df, family=sm.families.Binomial()).fit()
print(model.summary())

This prints:

                 Generalized Linear Model Regression Results                  
==============================================================================
Dep. Variable:                  admit   No. Observations:                  400
Model:                            GLM   Df Residuals:                      396
Model Family:                Binomial   Df Model:                            3
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:                -229.72
Date:                Sat, 29 Mar 2014   Deviance:                       459.44
Time:                        11:56:19   Pearson chi2:                     399.
No. Iterations:                     5                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept     -3.4495      1.133     -3.045      0.002        -5.670    -1.229
gre            0.0023      0.001      2.101      0.036         0.000     0.004
gpa            0.7770      0.327      2.373      0.018         0.135     1.419
rank          -0.5600      0.127     -4.405      0.000        -0.809    -0.311
==============================================================================

While there are still some statistical procedures that only have a good implementation in R, for straightforward things like linear models it is probably a lot easier to use statsmodels than to fight with rpy2, since introspection, built-in documentation, tab completion (in IPython), etc. all work directly on statsmodels objects.
