
Creating a DataFrame in Python and passing it as a parameter to an R function with rpy2, but the DataFrame columns aren't accessible

The input values in the Python function are given below. input_X is converted to a dict whose keys are "0" and "1", one per iteration, so that the columns can be accessed by name in R.

Y = [1,1,1,1,1,1,0,0,0,0,0,0]
input_X = [[3,4,3,4,3,1,5,4,6,7,5,3], [4,5,6,5,4,5,6,7,8,7,8,7]]
X = {}
for i in range(len(input_X)):
    X[str(i)] = input_X[i]

X is then converted to an R data frame:

RX = robjects.DataFrame(X)

The code then calls the R function:

    r('''
           source('r_test.r')
    ''')
    r_getname = robjects.globalenv['logistic_regression']
    x=r_getname(RY,RX)
    return str(x)

On the R side:

logistic_regression = function(y,x){
    print(x["1"])
}

This gives an error saying that column "1" doesn't exist. What is the right way to approach this?

There are two issues in your code. First, you need to convert the Python lists to R integer vectors before building the data frame. For example:

import rpy2.robjects as robjects

Y = [1,1,1,1,1,1,0,0,0,0,0,0]
input_X = [[3,4,3,4,3,1,5,4,6,7,5,3], [4,5,6,5,4,5,6,7,8,7,8,7]]
X = {}
for i in range(len(input_X)):
    # Wrap each Python list in an R integer vector so it becomes one column
    X[str(i)] = robjects.IntVector(input_X[i])

robjects.r('''
logistic_regression = function(x){
    print(colnames(x))
}
''')

xr = robjects.DataFrame(X)
robjects.r.logistic_regression(xr)

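Second, note that this prints X0 and X1, not 0 and 1. R's data.frame() rewrites column names that start with a digit unless check.names=FALSE is passed to the constructor, and the rpy2 DataFrame function does not expose this parameter. Inside your R function the column is therefore named "X1", not "1", so x["1"] fails while x["X1"] works.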
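One way to avoid the surprise is to pick dict keys that are already valid R names, so the names you write in Python are the names R ends up with. The following is only a sketch in plain Python: r_safe_name is a hypothetical helper, not part of rpy2, and it only approximates what R's make.names() does (it prepends "X" when a name does not start with a letter, replaces invalid characters with ".", and ignores R reserved words and non-ASCII letters).

```python
import re


def r_safe_name(name: str) -> str:
    """Approximate R's make.names() for a single name (hypothetical helper)."""
    # A valid R name starts with a letter, or with a dot not followed by a digit.
    starts_ok = re.match(r"[A-Za-z]|\.(?![0-9])", name) is not None
    if not starts_ok:
        name = "X" + name
    # Every character other than letters, digits, "." and "_" becomes ".".
    return re.sub(r"[^A-Za-z0-9._]", ".", name)


input_X = [[3, 4, 3, 4, 3, 1, 5, 4, 6, 7, 5, 3],
           [4, 5, 6, 5, 4, 5, 6, 7, 8, 7, 8, 7]]

# Build the dict with keys that R will not need to rewrite.
X = {r_safe_name(str(i)): column for i, column in enumerate(input_X)}

print(list(X))  # ['X0', 'X1']
```

With keys like these, the same strings can be used on both sides: wrap each list in robjects.IntVector as above, build the DataFrame, and index with x["X1"] in R.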
