Creating a DataFrame in Python and passing it as a parameter to an R function, but the DataFrame columns aren't accessible (using rpy2)

Question

The input values in the Python function are given below. input_X is converted to a dict, with the keys stored as "0" and "1", one per iteration (to be accessed in R).

Y = [1,1,1,1,1,1,0,0,0,0,0,0]
input_X = [[3,4,3,4,3,1,5,4,6,7,5,3], [4,5,6,5,4,5,6,7,8,7,8,7]]
X = {}
for i in range(len(input_X)):
    X[str(i)]=input_X[i]

X is then converted to an R data frame:

RX = robjects.DataFrame(X)

It then calls the R function:

    r('''
           source('r_test.r')
    ''')
    r_getname = robjects.globalenv['logistic_regression']
    x=r_getname(RY,RX)
    return str(x)

Now coming to R:

logistic_regression = function(y,x){
    print(x["1"])
}

This gives an error saying that column "1" doesn't exist. What's the right way to approach this?

Answer 1

You have two issues in your code. First, you need to convert the Python lists to R integer vectors, e.g.:

import rpy2.robjects as robjects

Y = [1,1,1,1,1,1,0,0,0,0,0,0]
input_X = [[3,4,3,4,3,1,5,4,6,7,5,3], [4,5,6,5,4,5,6,7,8,7,8,7]]
X = {}
for i in range(len(input_X)):
    X[str(i)]=robjects.IntVector(input_X[i])

robjects.r('''
logistic_regression = function(x){
    print(colnames(x))
}
''')

xr = robjects.DataFrame(X)
robjects.r.logistic_regression(xr)

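Second, note that this prints X0 and X1, not 0 and 1, because R rewrites column names that start with a digit unless check.names=F is passed to the data frame constructor (and this parameter doesn't exist in the rpy2 DataFrame function). So in R the second column has to be accessed as x["X1"], not x["1"].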
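
One way to sidestep the renaming is to give the columns names that are already valid in R before building the data frame. The following is only a minimal sketch of that idea: the helper name r_safe_columns and the "x" prefix are made up for illustration, and the rpy2 calls are left as comments so the snippet runs without R installed.

```python
def r_safe_columns(rows, prefix="x"):
    """Key each inner list by a name such as 'x0', 'x1'.

    Names that start with a letter are already valid R names, so R has no
    reason to rewrite them. The helper and its prefix are hypothetical.
    """
    return {f"{prefix}{i}": list(values) for i, values in enumerate(rows)}


input_X = [[3, 4, 3, 4, 3, 1, 5, 4, 6, 7, 5, 3],
           [4, 5, 6, 5, 4, 5, 6, 7, 8, 7, 8, 7]]

columns = r_safe_columns(input_X)
print(list(columns))  # ['x0', 'x1']

# With rpy2 available, the dict can be wrapped the same way as in the answer:
# import rpy2.robjects as robjects
# xr = robjects.DataFrame({k: robjects.IntVector(v) for k, v in columns.items()})
```

With names chosen this way, the R function would index the second column as x["x1"], and the name used on the Python side matches the name seen in R.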
