Can you convert an R raw vector representing an RDS file back into an R object without a round trip to disk?

I have an RDS file that is uploaded and then downloaded via curl::curl_fetch_memory() (via httr), which gives me a raw vector in R.

Is there a way to read that raw vector representing the RDS file to return the original R object? Or does it always have to be written to disk first?

I have a setup similar to the one below:

saveRDS(mtcars, file = "obj.rds")
# upload the obj.rds file 
...
# download it again via httr::write_memory()
...

obj
#   [1] 1f 8b 08 00 00 00 00 00 00 03 ad 56 4f 4c 1c 55 18 1f ca 02 bb ec b2 5d 
# ...
is.raw(obj)
#[1] TRUE

It seems readRDS() should be used to uncompress it, but it takes a connection object and I don't know how to make a connection object from an R raw vector. rawConnection() looked promising but gave:

rawConnection(obj)
#A connection with                           
#description "obj"          
#class       "rawConnection"
#mode        "r"            
#text        "binary"       
#opened      "opened"       
#can read    "yes"          
#can write   "no"     
readRDS(rawConnection(obj))
#Error in readRDS(rawConnection(obj)) : unknown input format

Looking through readRDS(), it looks like it uses gzfile() underneath, but I couldn't get that to work with the raw vector object.

If it's downloaded via httr::write_disk() -> curl::curl_fetch_disk() -> readRDS() then it's all good, but this is a round trip to disk and I wondered if it could be optimised for big files.

By default, RDS file streams are gzipped. To read from a raw connection you need to manually wrap it in a gzcon:

con = rawConnection(obj)
result = readRDS(gzcon(con))

This works even when the stream isn't gzipped. But unfortunately it fails if a different supported compression method (e.g. 'bzip2') was used to create the RDS file. Unfortunately R doesn't seem to have a gzcon equivalent for bzip2 or xz. For those formats, the only recourse seems to be to write the data to disk.
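
That said, base R's memDecompress() followed by unserialize() may offer an in-memory route for bzip2 and xz streams. The following is only a minimal sketch, not part of the answer above: the helper name read_rds_raw is hypothetical, the compression type is guessed from the leading magic bytes, and the temporary file merely creates sample bytes standing in for the download.

```r
# Hypothetical helper (not part of base R): read an RDS payload held in a raw vector.
read_rds_raw <- function(obj) {
  stopifnot(is.raw(obj))
  # TRUE if the payload begins with the given magic bytes
  starts_with <- function(magic) {
    length(obj) >= length(magic) &&
      identical(obj[seq_along(magic)], as.raw(magic))
  }
  if (starts_with(c(0x1f, 0x8b))) {
    # gzip: wrap the raw connection in gzcon(), as in the answer above
    con <- gzcon(rawConnection(obj))
    on.exit(close(con))
    readRDS(con)
  } else if (starts_with(c(0x42, 0x5a, 0x68))) {
    # bzip2 (magic "BZh"): decompress in memory, then unserialize
    unserialize(memDecompress(obj, type = "bzip2"))
  } else if (starts_with(c(0xfd, 0x37, 0x7a, 0x58, 0x5a, 0x00))) {
    # xz: same idea, assuming memDecompress() accepts the stream
    unserialize(memDecompress(obj, type = "xz"))
  } else {
    # assume an uncompressed serialization stream
    unserialize(obj)
  }
}

# Sample bytes: a bzip2-compressed RDS file read back into a raw vector.
tmp <- tempfile(fileext = ".rds")
saveRDS(mtcars, tmp, compress = "bzip2")
obj <- readBin(tmp, what = "raw", n = file.size(tmp))
unlink(tmp)

result <- read_rds_raw(obj)
```

Note that memDecompress() keeps a full decompressed copy in memory next to the compressed one, so for very large files this trades disk I/O for extra RAM.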

I had exactly the same problem, and for me the above answer with gzcon did not work. However, I could load the raw object directly into R's memory using rawConnection:

load(rawConnection(obj))
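
Note that load() reads the format written by save() (typically .RData or .rda files), not the RDS format written by saveRDS(), and it restores objects under their saved names instead of returning a value. A minimal sketch of that case follows. The variable names and the temporary file are illustrative only, and the gzcon() wrapper is an addition here, made because save() compresses with gzip by default.

```r
# Sample bytes in save() format; the temporary file only stands in for the download.
x <- mtcars
tmp <- tempfile(fileext = ".RData")
save(x, file = tmp)
bytes <- readBin(tmp, what = "raw", n = file.size(tmp))
unlink(tmp)

# Load into a dedicated environment so nothing in the workspace is overwritten.
env <- new.env()
con <- gzcon(rawConnection(bytes))
loaded_names <- load(con, envir = env)  # character vector of restored object names
close(con)

restored <- env$x
```

Loading into a separate environment is a design choice for safety: load() otherwise writes into the calling environment and can silently replace existing objects of the same name.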
