简体   繁体   中英

Improve Runtime for calling R function in python function using rpy2

I primarily program in python (using jupyter notebooks) but on occasion need to use an R function. I currently do this by using rpy2 and R magic, which works fine. Now I would like to write a function which will summarize part of my analysis procedure into one wrapper function (so I don't always need to run all of the code cells but can simply execute the function once). As part of this procedure I need to call an R function. I adapted my code to import the R function to python using the rpy2.robjects interface with importr. This works but is extremely slow (more than triple the run time for an already lengthy procedure) which makes this simply not feasible analysiswise. I am assuming this has to do with me accessing R through the high-level interface of rpy2 instead of the low-level interface. I am unsure of how to use the low-level interface within a function call though and would need some help adapting my code.

I've tried looking into the rpy2 documentation but am struggling to understand it.

This is my code for executing the R function call from within python using R magic.

Activating rpy2 R magic

%load_ext rpy2.ipython

Load my required libaries

%%R

library(scran)

Actually call the R function

%%R -i data_mat -i input_groups -o size_factors

size_factors = computeSumFactors(data_mat, clusters=input_groups, min.mean=0.1)

This is my alternative code to import the R function using rpy2 importr.

from rpy2.robjects.packages import importr
scran = importr('scran')
computeSumFactors = scran.computeSumFactors
size_factors = computeSumFactors(data_mat, clusters=input_groups, min_mean=0.1)

For some reason this second approach is orders of magnitude slower. Any help would be much apreciated.

The only difference between the two that I would see have an influence on the observe execution speed is conversion.

When running in an "R magic" code cell (prefixed with %%R ), in your example the result of calling computeSumFactors() is an R object bound to the symbol size_factors in R. In the other case, the result of calling the function computeSumFactors() will go through the conversion system (and there what exactly happens depends on what are the active converters you have) before the result is bound to the Python symbol size_factors .

Conversion can be costly: you should consider trying to deactivate numpy / pandas conversion (the localconverter context manager can be a convenient way to temporarily use minimal conversion for a code block).

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM