
Numbers change after reading Pickle data (python) into R

I have a large dataset with Unix epoch dates embedded in lists/dicts, currently stored as a pickle file. I tried to import the pickle file into R using the py_load_object() function from the reticulate package. Everything seems fine except the Unix epoch dates (in milliseconds).

I get very strange integer conversions. For example, an epoch date of 694137600000 is read as -1647101952 in R. Is there an explanation and a workaround?

Thanks!

It is very hard to help without a minimal reproducible example, but here are some ideas:

  • You can unpickle the file and convert it to a pandas data frame inside your Python script. The source_python function from reticulate will then import it as an R data frame. Please refer to the documentation for additional information on type conversions: rstudio/reticulate
  • It is always possible to unpickle the file in Python, export it to a common format such as csv, and then import that into R. This way you can bypass reticulate, which is not always an efficient option.
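The csv route from the second bullet could look roughly like the sketch below. It assumes the pickle holds a list of dicts; the helper name pickle_to_csv, the field names, and the sample records are made up for illustration, and only the standard library is used.

```python
import csv
import pickle
import tempfile
from pathlib import Path


def pickle_to_csv(pickle_path, csv_path, fieldnames):
    """Unpickle a list of dicts and write the chosen fields to a csv file."""
    with open(pickle_path, "rb") as fh:
        records = pickle.load(fh)  # only unpickle files you trust

    with open(csv_path, "w", newline="") as fh:
        # extrasaction="ignore" skips keys that are not listed in fieldnames
        writer = csv.DictWriter(fh, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(records)
    return len(records)


# Demo with made-up records standing in for the real data.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "data.pkl"
    dst = Path(tmp) / "data.csv"
    sample = [
        {"id": 1, "date_ms": 694137600000},
        {"id": 2, "date_ms": 725760000000},
    ]
    src.write_bytes(pickle.dumps(sample))
    n_written = pickle_to_csv(src, dst, ["id", "date_ms"])
    with open(dst, newline="") as fh:
        rows = list(csv.DictReader(fh))

print(rows[0]["date_ms"])  # the epoch value is stored as text, digit for digit
```

Because csv stores the values as text, nothing is truncated on the Python side; how the column is typed afterwards depends on how it is read in R.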

Please also note that you may need some help when handling 13-digit numbers in R. Base R integers are 32-bit, so values this large need either doubles or a 64-bit integer type. The bit64 package would be of interest to you.
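Another option on the Python side, sketched below, is to turn oversized integers into floats before the data reaches R. This assumes that reticulate passes Python floats to R as ordinary numeric (double) values, which hold whole numbers exactly up to 2^53; the helper name widen_large_ints and the sample record are hypothetical.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def widen_large_ints(obj):
    """Return a copy of obj in which ints outside the 32-bit range become floats."""
    if isinstance(obj, bool):  # bool is a subclass of int; leave it alone
        return obj
    if isinstance(obj, int):
        return obj if INT32_MIN <= obj <= INT32_MAX else float(obj)
    if isinstance(obj, dict):
        return {key: widen_large_ints(val) for key, val in obj.items()}
    if isinstance(obj, list):
        return [widen_large_ints(item) for item in obj]
    if isinstance(obj, tuple):
        return tuple(widen_large_ints(item) for item in obj)
    return obj  # strings, floats, None, etc. pass through unchanged


# Made-up record mixing a large epoch value with small integers.
record = {"id": 7, "dates": [694137600000, 86400000], "ok": True}
converted = widen_large_ints(record)
print(converted)
```

Millisecond epoch values are far below 2^53, so nothing is lost in the conversion; integers beyond that limit would be rounded and need a different approach.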

The problem is that reticulate is treating the values as 32-bit integers. You can see the effect with the Python snippet below:

In [1]: v = 694137600000

In [2]: v.bit_length()
Out[2]: 40

In [3]: import ctypes

In [4]: ctypes.c_int(v)
Out[4]: c_long(-1647101952)

In [5]: _.value
Out[5]: -1647101952

In [6]: ctypes.c_int64(v)
Out[6]: c_longlong(694137600000)

In [7]: ctypes.c_int32(v)
Out[7]: c_long(-1647101952)
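The same result can be reproduced with plain integer arithmetic, which makes the truncation explicit. This is only an illustration of 32-bit wraparound, not reticulate's actual code; wrap_int32 is a made-up helper name.

```python
def wrap_int32(value):
    """Keep the low 32 bits of value and reinterpret them as a signed integer."""
    low = value & 0xFFFFFFFF  # discard everything above bit 31
    return low - 2**32 if low >= 2**31 else low


v = 694137600000
wrapped = wrap_int32(v)
print(wrapped)  # -1647101952, the value seen in R
```

Since the value needs 40 bits, the top 8 bits are thrown away, so the original number cannot be rebuilt from the wrapped one alone.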

One of the easiest workarounds is to unpickle your file in Python and save it as a .csv file. Alternatively, you should find that if you convert the pickled data to a pandas data frame and then access it from R, it will be converted to an R data frame, unless the date/time is the first column (see here for why).
