I'm trying to use rpy2 to do a logistic regression. I managed to run it, but I don't know how to extract the coefficients and p-values from the result. I don't want to print the values on the screen, but rather to create a function that uses them independently.
import rpy2.robjects as ro
read = ro.r['read.csv']
head = ro.r['head']
summary = ro.r['summary']
mydata = read("http://www.ats.ucla.edu/stat/data/binary.csv")
#header = head(mydata)
formula = 'admit ~ gre + gpa + rank'
mylogit = ro.r.glm(formula=ro.r(formula), data=mydata, family=ro.r('binomial(link="logit")'))
#What NEXT?
I don't know how to get the p-values, but for the other values it should be something like this:
In [24]:
#what is stored in mylogit?
mylogit.names
Out[24]:
<StrVector - Python:0x10a01a0e0 / R:0x10353ab20>
['coef..., 'resi..., 'fitt..., ..., 'meth..., 'cont..., 'xlev...]
In [25]:
#looks like the first item is the coefficients
mylogit.names[0]
Out[25]:
'coefficients'
In [26]:
#OK, let's get the coefficients.
mylogit[0]
Out[26]:
<FloatVector - Python:0x10a01a5f0 / R:0x1028bcc80>
[-3.449548, 0.002294, 0.777014, -0.560031]
In [27]:
#be careful: the indices shown by print are R indices, starting at 1. I don't see p-values here
print(mylogit.names)
[1] "coefficients" "residuals" "fitted.values"
[4] "effects" "R" "rank"
[7] "qr" "family" "linear.predictors"
[10] "deviance" "aic" "null.deviance"
[13] "iter" "weights" "prior.weights"
[16] "df.residual" "df.null" "y"
[19] "converged" "boundary" "model"
[22] "call" "formula" "terms"
[25] "data" "offset" "control"
[28] "method" "contrasts" "xlevels"
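Because R numbers list elements from 1 while Python indexes from 0, a position read off the printout above has to be shifted by one. A minimal sketch that sidesteps the off-by-one by looking the element up by name; the list literal is simply copied from the printout (in a live session you would pass `list(mylogit.names)` instead), and `r_to_python_index` is a hypothetical helper name.

```python
# Element names of the glm object, copied from the R printout above.
GLM_NAMES = [
    "coefficients", "residuals", "fitted.values", "effects", "R", "rank",
    "qr", "family", "linear.predictors", "deviance", "aic", "null.deviance",
    "iter", "weights", "prior.weights", "df.residual", "df.null", "y",
    "converged", "boundary", "model", "call", "formula", "terms",
    "data", "offset", "control", "method", "contrasts", "xlevels",
]


def r_to_python_index(names, name):
    """Return the 0-based Python position of a named R list element.

    Hypothetical helper: R's printout labels positions starting at 1,
    Python starts at 0, so look the name up instead of counting.
    """
    return list(names).index(name)


# R prints "aic" as element [11]; in Python it sits at position 10,
# so mylogit[r_to_python_index(mylogit.names, "aic")] would fetch it.
coef_pos = r_to_python_index(GLM_NAMES, "coefficients")
aic_pos = r_to_python_index(GLM_NAMES, "aic")
```

Looking elements up by name keeps the code working even if a different R version adds or reorders components of the glm object.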
The p-values for each term:
In [55]:
#p values:
list(summary(mylogit)[-6])[-4:]
Out[55]:
[0.0023265825120094407,
0.03564051883525258,
0.017659683902155117,
1.0581094283250368e-05]
And:
In [56]:
#coefficients
list(summary(mylogit)[-6])[:4]
Out[56]:
[-3.449548397668471,
0.0022939595044433334,
0.7770135737198545,
-0.5600313868499897]
In [57]:
#S.E.
list(summary(mylogit)[-6])[4:8]
Out[57]:
[1.1328460085495897,
0.001091839095422917,
0.327483878497867,
0.12713698917130048]
In [58]:
#Z value
list(summary(mylogit)[-6])[8:12]
Out[58]:
[-3.0450285137032984,
2.1010050968680347,
2.3726773277632214,
-4.4049445444662885]
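The slices above work because R stores the summary's coefficient matrix (one row per term; columns estimate, std. error, z value, p-value) column by column, so `list(...)` flattens it one column at a time. A minimal numpy sketch of that layout, using the numbers printed above: reshaping with `order="F"` (Fortran, i.e. column-major order) restores the rows and columns.

```python
import numpy as np

# Flattened coefficient matrix as list(summary(mylogit)[-6]) returns it:
# all estimates, then all std. errors, then all z values, then all p-values.
flat = [
    -3.449548397668471, 0.0022939595044433334, 0.7770135737198545, -0.5600313868499897,
    1.1328460085495897, 0.001091839095422917, 0.327483878497867, 0.12713698917130048,
    -3.0450285137032984, 2.1010050968680347, 2.3726773277632214, -4.4049445444662885,
    0.0023265825120094407, 0.03564051883525258, 0.017659683902155117, 1.0581094283250368e-05,
]

n_terms = len(flat) // 4                          # 4 statistics per term
coef = np.reshape(flat, (n_terms, 4), order="F")  # rows = terms, columns = statistics

estimates = coef[:, 0]  # same values as list(...)[:4]
p_values = coef[:, 3]   # same values as list(...)[-4:]
```

Reshaping once and then indexing columns is less fragile than hard-coding slice boundaries, since `n_terms` adapts when the formula gains or loses terms.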
Or more generally:
In [60]:
import numpy as np
In [62]:
COEF=np.array(summary(mylogit)[-6]) #it has a shape of (number_of_terms, 4)
In [63]:
COEF[:, -1] #p-value
Out[63]:
array([ 2.32658251e-03, 3.56405188e-02, 1.76596839e-02,
1.05810943e-05])
In [66]:
COEF[:, 0] #coefficients
Out[66]:
array([ -3.44954840e+00, 2.29395950e-03, 7.77013574e-01,
-5.60031387e-01])
In [68]:
COEF[:, 1] #S.E.
Out[68]:
array([ 1.13284601e+00, 1.09183910e-03, 3.27483878e-01,
1.27136989e-01])
In [69]:
COEF[:, 2] #Z
Out[69]:
array([-3.04502851, 2.1010051 , 2.37267733, -4.40494454])
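Since the original goal was to use the values independently rather than print them, one option is to wrap the `COEF` array in a small function that keys each row by term name. A sketch, assuming the rows follow the formula's term order (`(Intercept)`, `gre`, `gpa`, `rank`); `coef_table` and its field names are hypothetical, and in a live session the term names could be taken from R with `ro.r['rownames'](summary(mylogit)[-6])` instead of being typed in.

```python
import numpy as np


def coef_table(coef, terms):
    """Map each model term to its estimate, std. error, z value and p-value.

    Hypothetical helper: `coef` is the (n_terms, 4) array built from
    summary(mylogit)[-6]; `terms` lists the row names in the same order.
    """
    coef = np.asarray(coef, dtype=float)
    if coef.shape != (len(terms), 4):
        raise ValueError("expected one row of 4 statistics per term")
    fields = ("estimate", "std_err", "z", "p_value")
    return {
        term: dict(zip(fields, (float(v) for v in row)))
        for term, row in zip(terms, coef)
    }


# Rows copied from the COEF outputs above (estimate, S.E., z, p-value).
COEF = np.array([
    [-3.44954840e+00, 1.13284601e+00, -3.04502851, 2.32658251e-03],
    [2.29395950e-03, 1.09183910e-03, 2.1010051, 3.56405188e-02],
    [7.77013574e-01, 3.27483878e-01, 2.37267733, 1.76596839e-02],
    [-5.60031387e-01, 1.27136989e-01, -4.40494454, 1.05810943e-05],
])
table = coef_table(COEF, ["(Intercept)", "gre", "gpa", "rank"])
significant = [term for term, stats in table.items() if stats["p_value"] < 0.05]
```

The result is plain Python data (floats in dicts), so it can be passed around or serialized without keeping any rpy2 objects alive.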
You can also use summary(mylogit).rx2('coefficients') (or .rx('coefficients')), if you know that the element is named coefficients in the summary object.
This isn't quite an answer to what you asked, but if your question is more generally "how to move a logistic regression to Python", why not use statsmodels?
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
df = pd.read_csv("http://www.ats.ucla.edu/stat/data/binary.csv")
model = smf.glm('admit ~ gre + gpa + rank', df, family=sm.families.Binomial()).fit()
print(model.summary())
This prints:
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:                 admit   No. Observations:                   400
Model:                           GLM   Df Residuals:                       396
Model Family:               Binomial   Df Model:                             3
Link Function:                 logit   Scale:                              1.0
Method:                         IRLS   Log-Likelihood:                 -229.72
Date:               Sat, 29 Mar 2014   Deviance:                        459.44
Time:                       11:56:19   Pearson chi2:                      399.
No. Iterations:                    5
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept     -3.4495      1.133     -3.045      0.002        -5.670    -1.229
gre            0.0023      0.001      2.101      0.036         0.000     0.004
gpa            0.7770      0.327      2.373      0.018         0.135     1.419
rank          -0.5600      0.127     -4.405      0.000        -0.809    -0.311
==============================================================================
While there are still some statistical procedures that only have a good implementation in R, for straightforward things like linear models, it's probably a lot easier to use statsmodels than to fight with RPy2, since all of the introspection, built-in documentation, tab completion (in IPython), etc. will work directly on statsmodels objects.