
Error calling an R function from Python using rpy2 with the survival library

When I call a function from R's survival package from Python through the rpy2 interface, I get the following error:

RRuntimeError: Error in formula[[2]] : subscript out of bounds

Any pointers on how to solve this issue, please?

Thanks

Code:

import pandas as pd
import rpy2.robjects.packages as rpackages
from rpy2.robjects.vectors import StrVector
from rpy2.robjects.packages import importr
import rpy2.robjects as ro
R = ro.r
from rpy2.robjects import pandas2ri

pandas2ri.activate()


## install the survival package
utils = rpackages.importr('utils')
utils.chooseCRANmirror(ind=1) # select the first mirror in the list
utils.install_packages(StrVector(['survival']))  # pass a list: a bare string would be split into single characters


#Load the library and example data set
survival=importr('survival')
infert = R('infert')

## Linear model works fine
reslm=R.lm('case~spontaneous+induced',data=infert)

#Run the example clogit function, which fails
rescl=R.clogit('case~spontaneous+induced+strata(stratum)',data=infert)

After some experimenting, I found that it makes a difference whether you hand rpy2's R instance the complete R code as one string, or call the function from Python with separate arguments.

So you can get your function to run by writing as much of the call as possible as R code:

#Run the example clogit function, which fails
rescl=R.clogit('case~spontaneous+induced+strata(stratum)',data=infert)

#But give the R code to be executed as one complete string - this works:
rescl=R('clogit(case ~ spontaneous + induced + strata(stratum), data = infert)')
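If the call has to be assembled from parts, the string can be built in plain Python first. This is only a sketch: r_call_source is a hypothetical helper, not part of rpy2. Note that data_name must be the name of an object R itself can see (here the built-in infert data set), not a Python variable.

```python
def r_call_source(func, formula, data_name):
    """Return R source text for `func(formula, data = data_name)`.

    `data_name` names an object on the R side; it is pasted into the
    code string as-is and resolved by R, not by Python.
    """
    return f"{func}({formula}, data = {data_name})"


code = r_call_source(
    "clogit",
    "case ~ spontaneous + induced + strata(stratum)",
    "infert",
)
print(code)
# prints: clogit(case ~ spontaneous + induced + strata(stratum), data = infert)

# With rpy2 and the survival package loaded, the string would be run as:
#     rescl = R(code)
```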

If you assign the return value to a variable inside R, you can inspect it and extract the key information about the model with the usual R functions.

For example:

R('rescl.in.R <- clogit(case ~ spontaneous + induced + strata(stratum), data = infert)')

R('str(rescl.in.R)')

# or:
R('coef(rescl.in.R)')
## array([1.98587552, 1.40901163])

R('names(rescl.in.R)') 
## array(['coefficients', 'var', 'loglik', 'score', 'iter',
##        'linear.predictors', 'residuals', 'means', 'method', 'n', 'nevent',
##        'terms', 'assign', 'wald.test', 'y', 'formula', 'xlevels', 'call',
##        'userCall'], dtype='<U17')

It helps a lot, at least in this first phase of using rpy2 (for me, too), to keep an R session open and try the same code there in parallel. The output in R is far more readable, so you can see what you are doing and which parts of the result you can access. In Python, the output loses important information (such as the names), and it is not pretty-printed.
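One way to get the names back on the Python side is to fetch them separately and pair them with the values. A minimal sketch in plain Python: label_values is a hypothetical helper, and the two lists stand in for what R('names(coef(rescl.in.R))') and R('coef(rescl.in.R)') would return. The values are the ones shown above; the names are assumed to follow the order of the terms in the formula.

```python
def label_values(names, values):
    """Pair R names with their values in a plain dict, preserving order."""
    names = [str(n) for n in names]
    values = [float(v) for v in values]
    if len(names) != len(values):
        raise ValueError("names and values must have the same length")
    return dict(zip(names, values))


# Stand-ins for the results of the two R calls (assumed, see above).
coef_names = ["spontaneous", "induced"]
coef_values = [1.98587552, 1.40901163]

coefs = label_values(coef_names, coef_values)
for name, value in coefs.items():
    # Left-align the name and round the coefficient for display.
    print(f"{name:<12} {value:.4f}")
```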

The call fails when the formula includes strata() because the formula is not evaluated in the right environment. In R, formulas are special language constructs, so rpy2 needs to treat them separately.

So, for your example, this would look like:

rescl = R.clogit(ro.Formula('case ~ spontaneous + induced + strata(stratum)'),
                 data = infert)

See the documentation for rpy2.robjects.Formula for more details. That documentation also discusses the pros and cons of this approach versus the one provided by @Gwang-jin-kim.
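When the covariates vary, the formula text can be assembled before it is wrapped in a Formula. The following is a sketch under assumptions: clogit_formula and as_r_formula are hypothetical helper names, and rpy2 is imported lazily so the text helper works without it.

```python
def clogit_formula(response, covariates, strata):
    """Build formula text such as 'y ~ a + b + strata(s)'."""
    rhs = " + ".join(list(covariates) + [f"strata({strata})"])
    return f"{response} ~ {rhs}"


def as_r_formula(text):
    """Wrap formula text in rpy2's Formula class.

    The import happens here so that clogit_formula stays usable
    even where rpy2 is not installed.
    """
    from rpy2.robjects import Formula
    return Formula(text)


formula_text = clogit_formula("case", ["spontaneous", "induced"], "stratum")
print(formula_text)

# With rpy2 and the survival package loaded, this could then be used as:
#     rescl = R.clogit(as_r_formula(formula_text), data=infert)
```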
