
Dynamically calling an R library from Python using rpy2

Based on https://stackoverflow.com/a/44827220/1639834:

I have an R routine that I need to call dynamically from my Python code. I intend to use rpy2 for this.

First, the R code I would like to use from Python (I am a first-time R user):

Setting up dummy data to showcase the R routine's usage:

set.seed(101)
data_sample <- c(5 + 3*rt(1000, df=5),
                 10 + 1*rt(10000, df=20))

num_components <- 2

The routine itself:

library(teigen)
tt <- teigen(data_sample,
             Gs=num_components,
             scale=FALSE, dfupdate="numeric",
             models=c("univUU"))

df = c(tt$parameters$df)
mean = c(tt$parameters$mean)
scale = c(tt$parameters$sigma)

The arguments data_sample and num_components are computed dynamically by my Python code, where num_components is just an integer and data_sample is a numpy array.

As an end goal, I would like to have df, mean and scale back in the "Python world" as lists or numpy arrays, so that I can process them further and use them later in my program logic.

My first experiment at tackling this with rpy2 so far:

import rpy2
from rpy2.robjects.packages import importr
from rpy2 import robjects as ro

# get_student_t_data is my own helper (not shown); it returns a numpy array
numpy_t_mix_samples = get_student_t_data(n_samples=10000)

r_t_mix_samples = ro.FloatVector(numpy_t_mix_samples)

teigen = importr('teigen')
# a plain Python string is enough here; R's c() is not defined on the Python side
rres = teigen.teigen(r_t_mix_samples, Gs=2, scale=False, dfupdate="numeric", models="univUU")

Here the value for Gs is still hardcoded, but as laid out above it should later be dynamic.
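The helper get_student_t_data is not shown in the question. As a minimal sketch, assuming it should mimic the two-component mixture from the R example, the data could be generated with the standard library alone. The function names student_t and get_student_t_mixture are hypothetical, and the values will not match R's rt() even with the same seed, because the random number generators differ.

```python
import math
import random


def student_t(rng, df):
    """Draw one Student-t variate as Z / sqrt(V / df), with V chi-squared(df)."""
    z = rng.gauss(0.0, 1.0)
    # A chi-squared variable with df degrees of freedom is Gamma(shape=df/2, scale=2).
    v = rng.gammavariate(df / 2.0, 2.0)
    return z / math.sqrt(v / df)


def get_student_t_mixture(seed=101):
    """Hypothetical stand-in for the asker's helper: the same mixture as the R snippet."""
    rng = random.Random(seed)
    first = [5 + 3 * student_t(rng, 5) for _ in range(1000)]
    second = [10 + 1 * student_t(rng, 20) for _ in range(10000)]
    return first + second


samples = get_student_t_mixture()
print(len(samples))
```

The resulting list of floats can be handed to ro.FloatVector in the same way as the numpy array above.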

Printing rres then gives mostly incomprehensible output (I guess because it has not yet been properly converted with rpy2):

R object with classes: ('teigen',) mapped to:
<ListVector - Python:0x11e3fdc48 / R:0x7ff7d229dcb0>
[Float..., Matrix, ListV..., ..., Float..., ListV..., ListV...]
  iter: <class 'rpy2.robjects.vectors.FloatVector'>
  R object with classes: ('numeric',) mapped to:
<FloatVector - Python:0x11e3fdd08 / R:0x7ff7cced0a28>
[156.000000]
  fuzzy: <class 'rpy2.robjects.vectors.Matrix'>
  R object with classes: ('matrix',) mapped to:
<Matrix - Python:0x11e3fd8c8 / R:0x118e78000>
[0.000000, 0.917546, 0.004050, ..., 0.077300, 0.076273, 0.091252]
R object with classes: ('teigen',) mapped to:
<ListVector - Python:0x11e3fdc48 / R:0x7ff7d229dcb0>
[Float..., Matrix, ListV..., ..., Float..., ListV..., ListV...]
  ...
  iter: <class 'rpy2.robjects.vectors.FloatVector'>
  R object with classes: ('numeric',) mapped to:
<FloatVector - Python:0x11d632508 / R:0x7ff7cfa81658>
[-25365.912426]
R object with classes: ('teigen',) mapped to:
<ListVector - Python:0x11e3fdc48 / R:0x7ff7d229dcb0>
[Float..., Matrix, ListV..., ..., Float..., ListV..., ListV...]
R object with classes: ('teigen',) mapped to:
<ListVector - Python:0x11e3fdc48 / R:0x7ff7d229dcb0>
[Float..., Matrix, ListV..., ..., Float..., ListV..., ListV...]

All in all, I am looking for the same results as in the original R example in the first code box, except that the df, mean and scale variables should be Python lists/numpy arrays. Not knowing R at all makes using rpy2 quite difficult, and maybe there is a more elegant way to call this routine dynamically and get the results back into the Python world.

Consider using x.names.index('myname') to reference nested named elements in R objects; see the rpy2 docs. As a reminder, and as demonstrated below, you can still reference nested objects in both R and Python by number index (0-based in Python, 1-based in R).

To reproduce your R object with exactly the same random data, we need to run set.seed on the R side, as there is no easy way to find an equivalent random number generator across languages (see the related post). Finally, base R's as.vector() is used to cast array objects to vectors. All values returned in Python are R FloatVectors: <class 'rpy2.robjects.vectors.FloatVector'>.

Python

from rpy2.robjects.packages import importr

base = importr('base')
stats = importr('stats')
teigen = importr('teigen')

base.set_seed(101)
data_sample = base.as_numeric([(5+3*i) for i in stats.rt(1000,df=5)] + \
                              [(10+1*i) for i in stats.rt(10000,df=20)])

num_components = 2

rres = teigen.teigen(data_sample, Gs=num_components, scale=False, 
                     dfupdate="numeric", models="univUU")

# BY NUMBER INDEX
df = rres[2][0]
mean = base.as_vector(rres[2][1])
scale = base.as_vector(rres[2][3])

print(df)
# [1]  3.578491 47.059841
print(mean)
# [1]  4.939179 10.002038
print(scale)
# [1] 8.763076 1.041588


# BY NAME INDEX 
# (i.e., find corresponding number to name in R object)
params = rres[rres.names.index('parameters')]

df = params[params.names.index('df')]
mean = base.as_vector(params[params.names.index('mean')])
scale = base.as_vector(params[params.names.index('sigma')])

print(df)
# [1]  3.578491 47.059841
print(mean)
# [1]  4.939179 10.002038
print(scale)
# [1] 8.763076 1.041588
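The name-based lookup above can be wrapped in a small helper that also returns plain Python lists, which is what the question asks for. The following is only a sketch: NamedList is a hypothetical stand-in for an rpy2 ListVector (integer indexing plus a .names sequence) so that the example runs without R, and get_by_name and extract_parameters are illustrative names, not part of rpy2. The numbers are copied from the printed output above, and the stand-in does not reproduce the full structure of a teigen result.

```python
class NamedList:
    """Hypothetical stand-in for an rpy2 ListVector: integer indexing plus .names."""

    def __init__(self, **items):
        self.names = list(items)
        self._values = list(items.values())

    def __getitem__(self, position):
        return self._values[position]


def get_by_name(r_list, *names):
    """Walk nested named elements using the x.names.index('myname') pattern."""
    node = r_list
    for name in names:
        node = node[node.names.index(name)]
    return node


def extract_parameters(result, keys=("df", "mean", "sigma")):
    """Return the requested entries of result's 'parameters' element as lists of floats."""
    params = get_by_name(result, "parameters")
    return {key: [float(v) for v in get_by_name(params, key)] for key in keys}


# Stand-in data; with rpy2 this would be the rres object returned by teigen.teigen.
fake_result = NamedList(
    iter=[156.0],
    parameters=NamedList(
        df=[3.578491, 47.059841],
        mean=[4.939179, 10.002038],
        sigma=[8.763076, 1.041588],
    ),
)

fitted = extract_parameters(fake_result)
print(fitted["df"])
```

With a real rres, this assumes that iterating over the R vectors yields their elements as numbers; if an element is still a matrix or array, base.as_vector() can be applied first, as in the answer's code. From the resulting lists, numpy.array(fitted["df"]) gives a numpy array if one is preferred.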

R (equivalent script)

library(teigen)

set.seed(101)
data_sample <- c(5+ 3*rt(1000,df=5),
                 10+1*rt(10000,df=20))
num_components <- 2

tt <- teigen(data_sample, Gs=num_components, scale=FALSE, 
             dfupdate="numeric", models="univUU")    

# BY NUMBER INDEX
df = tt[[3]][[1]]
mean = as.vector(tt[[3]][[2]])
scale = as.vector(tt[[3]][[4]])

print(df)
# [1]  3.578491 47.059841     
print(mean)
# [1]  4.939179 10.002038     
print(scale)
# [1] 8.763076 1.041588

# BY NAME INDEX
df = c(tt$parameters$df)
mean = c(tt$parameters$mean)
scale = c(tt$parameters$sigma)

print(df)
# [1]  3.578491 47.059841    
print(mean)
# [1]  4.939179 10.002038    
print(scale)
# [1] 8.763076 1.041588
