Reading a pickle file (PANDAS Python Data Frame) in R

Question

Is there an easy way to read a pickle file (.pkl) containing a pandas DataFrame into R?

One possibility is to export to CSV and have R read the CSV, but that seems cumbersome because my dataframes are rather large. Is there an easier way?

Thanks!

Answer 1

The reticulate package, suggested by russellpierce in the comments, was easy to set up and worked smoothly.

install.packages('reticulate')

I then created a Python script like this, based on the examples in the reticulate documentation.

Python file:

import pandas as pd

def read_pickle_file(file):
    pickle_data = pd.read_pickle(file)
    return pickle_data

And then my R file looked like:

library(reticulate)

source_python("pickle_reader.py")
pickle_data <- read_pickle_file("C:/tsa/dataset.pickle")

This gave me all the data I had previously stored in pickle format, now available in R.

You can also do all of this inline in R without leaving your R editor (provided your system Python can find pandas), e.g.:
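Note that pd.read_pickle returns whatever object was pickled, which is not necessarily a DataFrame. A slightly more defensive variant of the helper can fail early on the Python side. This is only a minimal sketch and not part of the original script; the name read_pickle_frame is hypothetical.

```python
import pandas as pd


def read_pickle_frame(path):
    """Load a pickle and insist that it contains a pandas DataFrame."""
    obj = pd.read_pickle(path)
    # read_pickle returns whatever was pickled, so check the type
    # before handing the object over to R.
    if not isinstance(obj, pd.DataFrame):
        raise TypeError(
            f"expected a DataFrame in {path!r}, got {type(obj).__name__}"
        )
    return obj
```

If this lives in the file passed to source_python, it should be callable from R in the same way as read_pickle_file.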

library(reticulate)
pd <- import("pandas")
pickle_data <- pd$read_pickle("dataset.pickle")

Answer 2

Edit: If you can install and use the {reticulate} package, then this answer is probably outdated. See the other answers for an easier path.

You could load the pickle in Python and then export it to R via the Python package rpy2 (or similar). Once you've done so, your data will exist in an R session linked to Python. I suspect the next step would be to use that session to call saveRDS in R and write to a file or RAM disk. You can then read that file back in from RStudio. Look at the R packages rJython and rPython for ways to trigger the Python commands from R.

Alternatively, you could write a simple Python script to load your data in Python (probably using one of the R packages noted above) and write a formatted data stream to stdout. Then that entire system call to the script (including the argument that specifies your pickle) can be used as an argument to fread in the R package data.table. Alternatively, if you wanted to keep to standard functions, you could use a combination of system(..., intern=TRUE) and read.table.
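The stdout route described above could look roughly like the following on the Python side. This is a sketch under assumptions: the script name pickle_to_csv.py and the function name are hypothetical, CSV is just one possible "formatted data stream", and the pickle is assumed to hold a DataFrame.

```python
import sys

import pandas as pd


def pickle_to_csv(path, out=sys.stdout):
    """Write the DataFrame pickled at `path` to `out` as CSV text."""
    df = pd.read_pickle(path)
    # index=False drops the pandas index; remove it if the index holds data.
    df.to_csv(out, index=False)


# Run only when a pickle path is given on the command line, e.g.
#   python pickle_to_csv.py dataset.pkl
if __name__ == "__main__" and len(sys.argv) > 1:
    pickle_to_csv(sys.argv[1])
```

On the R side, something like fread(cmd = "python pickle_to_csv.py dataset.pkl") should then read the stream, assuming a data.table version that supports the cmd argument.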

As usual, there are /many/ ways to skin this particular cat. The basic steps are:

1. Load the data in Python.
2. Express the data to R (e.g., exporting the object via rpy2 or writing formatted text to stdout with R ready to receive it on the other end).
3. Serialize the expressed data in R to an internal data representation (e.g., exporting the object via rpy2 or fread).
4. (optional) Make the data in that session of R accessible to another R session (i.e., the step to close the loop with rpy2; if you've been using fread then you're already done).

Answer 3

To add to the reticulate answer: you might need to point reticulate at a different conda env to get to pandas:

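library(reticulate)

# Select the conda env that has pandas installed, before importing anything
use_condaenv("name_of_conda_env", conda = "<<result_of `which conda`>>")
pd <- import('pandas')

# outdir is assumed to be defined already and to end with a path separator
df <- pd$read_pickle(paste0(outdir, "df.pkl"))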
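To work out which environment actually has pandas, one option is to run a small check with each candidate interpreter (for example after activating the conda env in a shell). This is a minimal sketch using only the standard library; the function name pandas_location is hypothetical.

```python
import importlib.util
import sys


def pandas_location():
    """Return the running interpreter's path and where pandas lives, if anywhere."""
    spec = importlib.util.find_spec("pandas")
    return {
        "python": sys.executable,
        # Path of the pandas package, or None if this environment lacks it.
        "pandas": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for key, value in pandas_location().items():
        print(f"{key}: {value}")
```

If pandas shows up as None for the interpreter reticulate is using, that suggests a different env needs to be selected with use_condaenv.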

3 answers

solution1
31 2018-07-17 08:13:45

solution2
12 ACCEPTED 2016-02-01 00:20:03

solution3
3 2020-02-17 13:59:02