
Calling R functions in rpy2 error - “argument is missing”

I'm having trouble using the rpy2 package in Python. I am trying to call the function upliftRF (from the R library "uplift") with some arguments. As stated on page 27 of https://cran.r-project.org/web/packages/uplift/uplift.pdf , the function accepts either x or a formula describing the model to fit, based on a data frame (the "data" argument). The example code on page 29 runs in R without any problem, but I run into an issue with rpy2. Here is my code:

import numpy as np
import pandas.rpy.common as com
from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri
uplift = importr('uplift')
kwargs = {'n': 1000, 'p' : 20, 'rho' : 0, 'sigma' : np.sqrt(2), 'beta.den': 4}
dd = uplift.sim_pte(**kwargs)
ddPD = pandas2ri.ri2py(dd)
ddPD['treat'] = [1 if x==1 else 0 for x in ddPD['treat']]
dd = com.convert_to_r_dataframe(ddPD) 
kwargs2 = {'formula':'y ~ X1 + X2 + X3 + X4 + X5 + X6 + trt(treat)',
         'mtry':3,'ntree':200,'split_method':'KL','minsplit':200,'data':dd}

fit1 = uplift.upliftRF(**kwargs2)

Then I get this error:

RRuntimeError: Error in is.data.frame(x) : argument "x" is missing, with no default

However, "x" is not a mandatory parameter of the function.

I guess the same error would occur with any other R function that has an optional argument.

Thank you for your help!

import numpy as np
import pandas.rpy.common as com
from rpy2 import robjects
from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri
uplift = importr('uplift')

Next, you should be able to call the R function the way Python functions are most commonly called, because importr translates the named parameters in the R function's definition into syntactically valid Python names (here beta.den becomes beta_den).

dd = uplift.sim_pte(n = 1000, p = 20, rho = 0,
                    sigma = np.sqrt(2), beta_den = 4)
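
The translation mentioned above can be pictured with a short pure-Python sketch. This is not rpy2's actual implementation: translate_names is a hypothetical helper that only mimics the idea of mapping R parameter names containing dots to valid Python identifiers, and of refusing when two R names would collide.

```python
def translate_names(r_names):
    """Map Python-safe names back to R parameter names.

    Hypothetical helper for illustration only; it is not part of rpy2.
    Dots, which are legal in R names, are replaced with underscores.
    """
    mapping = {}
    for r_name in r_names:
        py_name = r_name.replace('.', '_')
        if py_name in mapping:
            # Two distinct R names would become the same Python name.
            raise ValueError(
                f"{r_name!r} and {mapping[py_name]!r} both map to {py_name!r}"
            )
        mapping[py_name] = r_name
    return mapping


# Parameter names used in the call to sim_pte above.
sim_pte_params = ['n', 'p', 'rho', 'sigma', 'beta.den']
py_to_r = translate_names(sim_pte_params)
print(py_to_r['beta_den'])
```

Names without dots pass through unchanged, which is why only beta.den needs a different spelling on the Python side.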

At this point you appear to have an R data.frame. Going to pandas to add a column, then back to R, is definitely possible:

ddPD = pandas2ri.ri2py(dd) 
ddPD['treat'] = [1 if x==1 else 0 for x in ddPD['treat']]
dd = com.convert_to_r_dataframe(ddPD)

However, unless there is a good reason, I'd recommend sticking to one conversion scheme when shuttling between pandas and rpy2: either the one defined in pandas or the one defined in rpy2, since mixing the two is presumably less tested. The error RRuntimeError: Error: $ operator is invalid for atomic vectors might come from this.

The alternative to going through pandas is to use the eminently expressive R package dplyr. rpy2 has provided a tailored interface to it since version 2.7.0:

from rpy2.robjects.lib import dplyr
dd = (dplyr.DataFrame(dd)
      .mutate(treat = 'ifelse(treat==1, 1, 0)'))

It was already pointed out in your answer that the formula should be declared as such (formulas are language objects in R, but there is no equivalent at the language level in Python). Written as an ordinary Python call:

fit1 = uplift.upliftRF(formula = robjects.Formula('y ~ X1 + X2 + X3 + X4 + X5 + X6 + trt(treat)'),
                       mtry = 3,
                       ntree = 200,
                       split_method = 'KL',
                       minsplit = 200,
                       data = dd)
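
One plausible reading of the original error, stated here as an assumption and not verified against the uplift source, is that upliftRF is a generic function that chooses a method from the class of the argument it receives. A plain Python string reaches R as a character vector, not a formula, so the call would fall through to a default method that expects x. The toy model below imitates that behaviour in pure Python; every name in it (Formula, uplift_rf, uplift_rf_formula, uplift_rf_default) is hypothetical and none of it is rpy2 or uplift code.

```python
class Formula:
    """Toy stand-in for an R formula object (not rpy2's Formula class)."""

    def __init__(self, text):
        self.text = text


_MISSING = object()  # sentinel marking an argument that was not supplied


def uplift_rf_formula(formula, data, **kwargs):
    """Toy 'formula' method: needs a formula and a data frame."""
    return f"formula method: {formula.text}"


def uplift_rf_default(x=_MISSING, **kwargs):
    """Toy 'default' method: needs x, as the error message suggests."""
    if x is _MISSING:
        raise TypeError('argument "x" is missing, with no default')
    return "default method"


def uplift_rf(**kwargs):
    """Toy generic: dispatch on the class of the first supplied argument."""
    first = next(iter(kwargs.values()), None)
    if isinstance(first, Formula):
        return uplift_rf_formula(**kwargs)
    return uplift_rf_default(**kwargs)


# A string is not a Formula, so the default method is chosen and fails.
try:
    uplift_rf(formula='y ~ X1 + trt(treat)', data=[])
except TypeError as err:
    message = str(err)

# Wrapping the same text in a Formula object selects the formula method.
result = uplift_rf(formula=Formula('y ~ X1 + trt(treat)'), data=[])
print(message)
print(result)
```

Under that assumption, wrapping the string in robjects.Formula is what lets the call reach the method that accepts formula and data.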
