Creating R's formula using Python

I am writing a program that interacts with R from Python. Basically, I have some R libraries that I want to use from my Python code. After installing rpy2, I defined the R functions I want to use in a separate .R script.

The R function needs a formula passed to it in order to apply a resampling technique. Below is the R function that I wrote:

WFRandUnder <- function(target_variable, other, train, rel, thr.rel, C.perc, repl){
    a <- target_variable
    b <- '~'
    form_begin <- paste(a, b, sep=' ')
    fmla <- as.formula(paste(form_begin, paste(other, collapse= "+")))
    undersampled = RandUnderRegress(fmla, train, rel, thr.rel, C.perc, repl)
    return(undersampled)
}

From Python, I pass the target variable name together with a list containing the names of all the other columns, because I want the formula to look like this: my_target_variable ~ all other columns

However, in these lines:

    a <- target_variable
    b <- '~'
    form_begin <- paste(a, b, sep=' ')
    fmla <- as.formula(paste(form_begin, paste(other, collapse= "+")))

The formula is not always built correctly when my data has many columns. What should I do to make it always work? I am concatenating all the column names with the + operator.
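One possible cause, offered here only as an assumption since the question does not show which column names fail, is that some names are not syntactic R names (they contain spaces, start with a digit, and so on), which makes as.formula fail when the pasted string is parsed. A minimal sketch of building the formula string on the Python side with such names wrapped in backticks follows; quote_name and build_formula are hypothetical helpers, not part of rpy2, and R reserved words are not handled.

```python
import re

# Rough check for a syntactic R name: starts with a letter, or with a dot
# that is not followed by a digit, then letters, digits, dots, underscores.
# R reserved words (if, for, TRUE, ...) are NOT handled in this sketch.
_SYNTACTIC = re.compile(r"^(?:[A-Za-z]|\.(?![0-9]))[A-Za-z0-9._]*$")


def quote_name(name):
    """Wrap a column name in backticks unless it already looks syntactic."""
    if "`" in name:
        raise ValueError("column names containing backticks are not supported")
    return name if _SYNTACTIC.match(name) else "`" + name + "`"


def build_formula(target_variable, other):
    """Return a formula string such as 'y ~ x1 + `my col`'.

    If `other` is empty, fall back to 'y ~ .' (all remaining columns).
    """
    rhs = " + ".join(quote_name(col) for col in other) if other else "."
    return quote_name(target_variable) + " ~ " + rhs


print(build_formula("y", ["x1", "my col", "2nd"]))
```

The resulting string can then be handed to R in place of the pasted one, for example as the argument of as.formula.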

Thanks to @nicola, I was able to solve this problem by doing the following:

create_formula <- function(target_variable, other){
    # y <- target_variable
    # tilda <- '~'
    # form_begin <- paste(y, tilda, sep=' ')
    # fmla <- as.formula(paste(form_begin, paste(other, collapse= "+")))
    # return(fmla)
    y <- target_variable
    fmla = as.formula(paste(y, '~ .'))
    return(fmla)
}

I call this function from my Python program using rpy2. This causes no problem, because whenever the formula is used the data is passed along with it, so R can expand the . on the right-hand side to all the remaining columns. Sample code to demonstrate what I mean:

        if self.smogn:
            smogned = runit.WFDIBS(

                # here is the formula call (get_formula is a Python function that calls create_formula, defined above in R)
                fmla=get_formula(self.target_variable, self.other),

                # here is the data 
                dat=df_combined,

                method=self.phi_params['method'][0],
                npts=self.phi_params['npts'][0],
                controlpts=self.phi_params['control.pts'],
                thrrel=self.thr_rel,
                Cperc=self.Cperc,
                k=self.k,
                repl=self.repl,
                dist=self.dist,
                p=self.p,
                pert=self.pert)
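The get_formula wrapper itself is not shown above. As a rough sketch of what such a wrapper could look like, the version below builds the same 'target ~ .' formula directly with rpy2's Formula class instead of calling the R create_formula function; formula_text is a hypothetical helper, and the unused other argument is kept only to match the call in the sample.

```python
def formula_text(target_variable):
    """Return the formula string 'target ~ .' (target against all other columns)."""
    return f"{target_variable} ~ ."


def get_formula(target_variable, other=None):
    """Build an R formula object for `target ~ .`.

    `other` is accepted only for compatibility with the call in the sample;
    it is ignored because '.' already stands for all remaining columns.
    """
    # Imported lazily so the string helper stays usable without rpy2 installed.
    from rpy2.robjects import Formula

    return Formula(formula_text(target_variable))
```

With rpy2 installed, get_formula("my_target_variable") returns an object that can be passed as the fmla argument in the call above.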
