At the moment I'm getting my feet wet with the rpy2 package (it's rather cool).
But I'm running into an issue similar to the one discussed in this question: the type conversion between an R data.frame and a Python pandas DataFrame is rather messy.
```python
import rpy2.robjects as ro
from rpy2.robjects.conversion import localconverter
from rpy2.robjects import pandas2ri

# float -> NaN
# int -> -2147483648
# Strings -> None
# Bool -> int

ro.r(
    """
    f <- function() {
        return(data.frame(int = c(4L, NA),
                          float = c(1.2, NA),
                          chr = c("A", NA),
                          bool = c(TRUE, FALSE)))
    }
    f()
    """
)
r_f = ro.globalenv["f"]
res = r_f()

with localconverter(ro.default_converter + pandas2ri.converter):
    pd_from_r_df = ro.conversion.rpy2py(res)

print(pd_from_r_df)
```
This results in:
```
python -u "/workspace/SLW/Python_examples/MWE_2.py"
          int  float   chr  bool
1           4    1.2     A     1
2 -2147483648    NaN  None     0
```
As you can see, the integer NA gets blown up into -2147483648 and the boolean is turned into an integer. Since this feels like such an ordinary case, I'm sure a lot of people have already run into it and maybe written their own custom converter, as advised here. Does anyone have the code for such a converter? That would give me something to start from, since at the moment I have no clue how to go about writing my own custom converter. Maybe in the long run such a converter could become part of the rpy2 package.
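The value -2147483648 in that output is INT_MIN, the smallest 32-bit integer, which R reserves internally as NA for both integer and logical vectors; pandas2ri apparently passes it through unchanged. Until a proper converter exists, one workaround is to post-process the converted frame. This is a minimal sketch, not part of rpy2: the helper `restore_r_na` and its `bool_cols` parameter are hypothetical, and it assumes you know which integer columns were R logicals, since that information is already gone after pandas2ri.

```python
import numpy as np
import pandas as pd

# R stores NA for integer and logical vectors as INT_MIN (-2147483648)
R_NA_INT = np.iinfo(np.int32).min


def restore_r_na(df, bool_cols=()):
    """Hypothetical helper: undo the NA sentinels left by pandas2ri.

    Integer columns listed in bool_cols are assumed to have been R logicals.
    """
    out = df.copy()
    for col in out.columns:
        s = out[col]
        if pd.api.types.is_integer_dtype(s):
            # find the sentinel, cast to nullable Int64, and mask it as <NA>
            is_na = (s == R_NA_INT).to_numpy()
            s = s.astype("Int64").mask(is_na)
            out[col] = s.astype("boolean") if col in bool_cols else s
        elif s.dtype == object or pd.api.types.is_string_dtype(s):
            # None -> <NA> in a nullable string column
            out[col] = s.astype("string")
    return out
```

Because a genuine R integer can never equal INT_MIN, replacing the sentinel is safe for integer columns; the weak spot is that logical columns have to be named by hand.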
Here is a simple, completely custom converter from an R data.frame to a pandas DataFrame:
```python
from rpy2.robjects.conversion import localconverter, get_conversion
from rpy2 import rinterface as ri
import rpy2.robjects as ro
from rpy2.rinterface_lib import na_values
import pandas as pd

# create your own rules for df columns
# (this is an alias, so the registrations below modify ro.default_converter itself)
df_rules = ro.default_converter

@df_rules.rpy2py.register(ri.IntSexpVector)
def to_int(obj):
    return [int(v) if v != na_values.NA_Integer else pd.NA for v in obj]

@df_rules.rpy2py.register(ri.FloatSexpVector)
def to_float(obj):
    # NA_Real is itself a NaN and NaN never compares equal, which is probably
    # why R's NA comes through here as nan
    return [float(v) if v != na_values.NA_Real else pd.NA for v in obj]

@df_rules.rpy2py.register(ri.StrSexpVector)
def to_str(obj):
    return [str(v) if v != na_values.NA_Character else pd.NA for v in obj]

@df_rules.rpy2py.register(ri.BoolSexpVector)
def to_bool(obj):
    return [bool(v) if v != na_values.NA_Logical else pd.NA for v in obj]

# define the top-level converter
def toDataFrame(obj):
    cv = get_conversion()  # get the converter from the current context
    return pd.DataFrame(
        {str(k): cv.rpy2py(obj[i]) for i, k in enumerate(obj.names)}
    )

# associate the converter with the R data.frame class
df_rules.rpy2py_nc_map[ri.ListSexpVector].update({"data.frame": toDataFrame})

# code in OP
ro.r(
    """
    f <- function() {
        return(data.frame(int = c(4L, NA),
                          float = c(1.2, NA),
                          chr = c("A", NA),
                          bool = c(TRUE, FALSE)))
    }
    f()
    """
)
r_f = ro.globalenv["f"]

with localconverter(df_rules):  # use the defined rules here
    # call the R function inside the block so its result is converted with df_rules
    pd_from_r_df = r_f()

print(pd_from_r_df)
```
This prints:
```
    int  float   chr   bool
0     4    1.2     A   True
1  <NA>    NaN  <NA>  False
```
So it's not perfect: R's float and bool NA values are not captured by the custom converter (at least not the way I've implemented it here), but int and str work as intended.
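One more detail: because these rules return plain Python lists, a column such as `int` that contains `pd.NA` ends up with `object` dtype in the resulting DataFrame. A possible refinement, sketched here without R and untested inside the rules themselves, is to wrap each list in `pd.array` with a pandas nullable extension dtype. The names `typed_column` and `NULLABLE_DTYPES` are hypothetical. Note that `Float64` turns the `nan` that `to_float` lets through into `<NA>`, so R's NA and NaN become indistinguishable.

```python
import pandas as pd

# Hypothetical mapping from R vector type to a pandas nullable extension dtype
NULLABLE_DTYPES = {
    "integer": "Int64",
    "double": "Float64",
    "character": "string",
    "logical": "boolean",
}


def typed_column(values, r_type):
    """Wrap a converted list (e.g. [4, pd.NA]) in a nullable pandas array."""
    return pd.array(values, dtype=NULLABLE_DTYPES[r_type])


# lists shaped like what to_int, to_float, to_str and to_bool return for f()
columns = {
    "int": ([4, pd.NA], "integer"),
    "float": ([1.2, float("nan")], "double"),  # to_float lets NA through as nan
    "chr": (["A", pd.NA], "character"),
    "bool": ([True, False], "logical"),
}
df = pd.DataFrame({name: typed_column(v, t) for name, (v, t) in columns.items()})
print(df.dtypes)
```

Inside the converter, the equivalent change would be for each rule to return, say, `pd.array([...], dtype="Int64")` instead of the bare list.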
Also, it may not be the fastest solution, since I believe the pandas2ri module uses NumPy's buffer copy mechanism, but at the moment I don't know how to deal with the NAs in such a conversion.
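For the vectorized route, NA handling can be done with boolean masks instead of a Python loop. This sketch assumes you already have an R column as a NumPy array (I have not checked how pandas2ri obtains it) and builds pandas masked arrays directly. The integer and logical cases rely on INT_MIN being R's NA. The double case relies on R marking NA_real_ as a NaN whose low 32-bit word is 1954, which is, as far as I know, how R's own `R_IsNA` tells NA apart from NaN; some arithmetic can drop that payload, so treat it as best effort. All function names here are hypothetical.

```python
import numpy as np
import pandas as pd

R_NA_INT = np.iinfo(np.int32).min  # R's NA for integer and logical vectors
R_NA_REAL_LOW_WORD = 1954  # low 32 bits of the NaN payload R uses for NA_real_


def int_column(values):
    """int32 array from an R integer vector -> pandas Int32 array with <NA>."""
    values = np.asarray(values, dtype=np.int32)
    return pd.arrays.IntegerArray(values, values == R_NA_INT)


def logical_column(values):
    """int32 array from an R logical vector -> pandas boolean array with <NA>."""
    values = np.asarray(values, dtype=np.int32)
    return pd.arrays.BooleanArray(values != 0, values == R_NA_INT)


def real_column(values):
    """float64 array from an R double vector -> pandas Float64 array.

    Only NaNs carrying R's NA payload are masked; a plain NaN stays NaN.
    """
    values = np.asarray(values, dtype=np.float64)
    low_word = values.view(np.uint64) & np.uint64(0xFFFFFFFF)
    mask = np.isnan(values) & (low_word == R_NA_REAL_LOW_WORD)
    return pd.arrays.FloatingArray(values, mask)
```

Each function only computes one comparison over the whole array, so the cost stays close to the buffer copy itself.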