I am trying to use rpy2 so that I can call some R functionality from Python. Here is a simple regression I want to run: I create a pandas data frame, convert it to an R data frame, and then try to use R's lm. But the R data frame cannot be found (see below). Where should I look to troubleshoot?
FYI, I am using Python 2.7.3, rpy2-2.3.2, pandas 0.10.1, and R 2.15.3.
>>> import rpy2
>>> import pandas as pd
>>> import pandas.rpy.common as com
>>> datframe = pd.DataFrame({'a' : [1, 2, 3], 'b' : [3, 4, 5]})
>>> r_df = com.convert_to_r_dataframe(datframe)
>>> r_df
(DataFrame - Python:0x32547e8 / R:0x345d640)
[IntVector, IntVector]
a: (class 'rpy2.robjects.vectors.IntVector')
(IntVector - Python:0x3254e18 / R:0x345d608)
[ 1, 2, 3]
b: (class 'rpy2.robjects.vectors.IntVector')
(IntVector - Python:0x3254e60 / R:0x345d5d0)
[ 3, 4, 5]
>>> print type(r_df)
(class 'rpy2.robjects.vectors.DataFrame')
>>> from rpy2.robjects import r
>>> r('lmout <- lm(r_df$a ~ r_df$b)')
Error in eval(expr, envir, enclos) : object 'r_df' not found
Traceback (most recent call last):
File "<pyshell#8>", line 1, in <module>
r('lmout <- lm(r_df$a ~ r_df$b)')
File "/usr/local/lib/python2.7/dist-packages/rpy2/robjects/__init__.py", line 236, in __call__
res = self.eval(p)
File "/usr/local/lib/python2.7/dist-packages/rpy2/robjects/functions.py", line 86, in __call__
return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/rpy2/robjects/functions.py", line 35, in __call__
res = super(Function, self).__call__(*new_args, **new_kwargs)
RRuntimeError: Error in eval(expr, envir, enclos) : object 'r_df' not found
When calling
r('lmout <- lm(r_df$a ~ r_df$b)')
the embedded R will look for a variable named r_df, and no such variable is made visible to R in your code example.
When doing
r_df = com.convert_to_r_dataframe(datframe)
you are creating the variable r_df on the Python side. Although the actual data is now in R, no symbol (name) known to R is associated with it, so the data structure remains anonymous. (By the way, you may want to use the automatic conversion of pandas data frames that ships with rpy2-2.3.3.)
To create a variable name in R's "global environment", add this:
from rpy2.robjects import globalenv
globalenv['r_df'] = r_df
Now your lm() call should work.
Try this (I am not sure which import does the magic, though):
import rpy2.robjects as robjects
from rpy2.robjects import DataFrame, Formula
import rpy2.robjects.numpy2ri as npr
import numpy as np
from rpy2.robjects.packages import importr
def my_linear_fit_using_r(X, Y, verbose=True):
    # Fit Y ~ X with R's lm() through rpy2 and return the coefficients,
    # R^2, and the bounds reported by confint().
    r_correlation = robjects.r('function(x,y) cor.test(x,y)')
    # r_quadfit = robjects.r('function(x,y) lm(y~I(x)+I(x^2))')
    r_linfit = robjects.r('function(x,y) lm(y~x)')
    r_get_r2 = robjects.r('function(x) summary(x)$r.squared')
    lin = r_linfit(robjects.FloatVector(X), robjects.FloatVector(Y))
    coef_lin = robjects.r.coef(lin)
    a = coef_lin[0]  # intercept
    b = coef_lin[1]  # slope
    r2 = r_get_r2(lin)
    # confint() returns a 2x2 matrix; indexing it as a flat vector walks it column by column
    ci = robjects.r.confint(lin)
    lwr_a = ci[0]
    lwr_b = ci[1]
    upr_a = ci[2]
    upr_b = ci[3]
    if verbose:
        print(robjects.r.summary(lin))
        # print(robjects.r.summary(quad))
    return (a, b, r2[0], lwr_a, upr_a, lwr_b, upr_b)
Just a remark: for simple regressions you can do it entirely in Python, using ols from statsmodels:
from statsmodels.formula.api import ols
lmout = ols('a ~ b', datframe).fit()
lmout.summary()
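For a one-predictor fit like this, the coefficients can also be cross-checked by hand with the closed-form least-squares formulas. The following is only a minimal sketch using the standard library; the helper name simple_ols is hypothetical and is not part of statsmodels or rpy2.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / float(n)
    mean_y = sum(y) / float(n)
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values are all identical; the slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Same data as in the question: regress a on b, i.e. a ~ b.
a = [1, 2, 3]
b = [3, 4, 5]
intercept, slope = simple_ols(b, a)
print(intercept, slope)
```

Since a equals b - 2 exactly for this data, the intercept should come out as -2 and the slope as 1, which is what lm or ols should report as well.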