
Reading a pickle file (PANDAS Python Data Frame) in R

Is there an easy way to read a pickle file (.pkl) containing a Pandas DataFrame into R?

One possibility is to export to CSV and have R read the CSV, but that seems really cumbersome to me because my dataframes are rather large. Is there an easier way to do so?

Thanks!

Reticulate was quite easy and super smooth, as suggested by russellpierce in the comments.

install.packages('reticulate')

After that, I created a Python script like this from the examples given in the reticulate documentation.

Python file (pickle_reader.py):

import pandas as pd

def read_pickle_file(file):
    pickle_data = pd.read_pickle(file)
    return pickle_data

And then my R file looked like:

library(reticulate)

source_python("pickle_reader.py")
pickle_data <- read_pickle_file("C:/tsa/dataset.pickle")

This gave me all the data I had earlier stored in pickle format, now available in R.
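Because `pd.read_pickle` returns whatever object was pickled, it can help to confirm from Python that the file really holds a DataFrame before loading it through reticulate. The following is only a minimal sketch: the helper name `describe_pickle` is hypothetical and not part of pandas or reticulate.

```python
import pandas as pd


def describe_pickle(path):
    """Return a small summary of the object stored in a pickle file."""
    obj = pd.read_pickle(path)
    summary = {"type": type(obj).__name__}
    if isinstance(obj, pd.DataFrame):
        summary["shape"] = obj.shape
        # Map each column name to its dtype name for a quick overview
        summary["dtypes"] = {str(col): str(dtype) for col, dtype in obj.dtypes.items()}
    return summary
```

The resulting summary can be compared with what `str(pickle_data)` reports on the R side after the conversion.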

You can also do all of this in-line in R without leaving your R editor (provided your system Python can reach pandas), e.g.:

library(reticulate)
pd <- import("pandas")
pickle_data <- pd$read_pickle("dataset.pickle")

Edit: If you can install and use the {reticulate} package, then this answer is probably outdated. See the other answers for an easier path.

You could load the pickle in Python and then export it to R via the Python package rpy2 (or similar). Once you've done so, your data will exist in an R session linked to Python. I suspect that what you'd want to do next would be to use that session to call R's saveRDS and write the data to a file or RAM disk. Then in RStudio you can read that file back in. Look at the R packages rJython and rPython for ways in which you could trigger the Python commands from R.

Alternatively, you could write a simple Python script to load your data in Python (probably using one of the R packages noted above) and write a formatted data stream to stdout. Then that entire system call to the script (including the argument that specifies your pickle) can be used as an argument to fread in the R package data.table. Alternatively, if you wanted to keep to standard functions, you could use a combination of system(..., intern=TRUE) and read.table.
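The stdout route described above could look roughly like the following sketch. The script name `dump_pickle.py` and the function `pickle_to_csv` are hypothetical, and the sketch assumes that the pickle holds a pandas DataFrame and that CSV is an acceptable text format for the stream.

```python
import sys

import pandas as pd


def pickle_to_csv(pickle_path, out=None):
    """Load a pickled DataFrame and write it as CSV text to `out` (stdout by default)."""
    if out is None:
        out = sys.stdout
    df = pd.read_pickle(pickle_path)
    if not isinstance(df, pd.DataFrame):
        raise TypeError(f"expected a DataFrame, got {type(df).__name__}")
    # index=False leaves the pandas row index out of the stream;
    # remove it if the index carries data you need in R
    df.to_csv(out, index=False)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Intended call: python dump_pickle.py path/to/file.pkl
    pickle_to_csv(sys.argv[1])
```

On the R side, something like `data.table::fread(cmd = "python dump_pickle.py dataset.pickle")` should then read the stream directly, and `read.csv(text = system("python dump_pickle.py dataset.pickle", intern = TRUE))` is a base-R equivalent. Note that this still passes the data through text, so for very large frames it saves the intermediate file but not the parsing cost.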

As usual, there are /many/ ways to skin this particular cat. The basic steps are:

  1. Load the data in Python.
  2. Express the data to R (e.g., exporting the object via rpy2, or writing formatted text to stdout with R ready to receive it on the other end).
  3. Convert the expressed data in R to an internal data representation (e.g., receiving the object exported via rpy2, or parsing the text with fread).
  4. (optional) Make the data in that session of R accessible to another R session (i.e., the step to close the loop with rpy2; if you've been using fread, then you're already done).

To add to the answer above: you might need to point to a different conda env to get to pandas:

use_condaenv("name_of_conda_env", conda = "<<result_of `which conda`>>")
pd <- import('pandas')

# outdir is the directory that holds the pickle (defined elsewhere)
df <- pd$read_pickle(file.path(outdir, "df.pkl"))

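Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.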
