Get specific object from Rdata file

I have an Rdata file containing various objects:

 New.Rdata
  |_ Object 1  (e.g. data.frame)
  |_ Object 2  (e.g. matrix)
  |_...
  |_ Object n

Of course I can load everything with load('New.Rdata'). However, is there a smart way to load only one specific object out of this file and discard the others?

.RData files don't have an index (the contents are serialized as one big pairlist). You could hack a way to go through the pairlist and assign only the entries you like, but it's not easy since you can't do it at the R level.

However, you can simply convert the .RData file into a lazy-load database, which serializes each entry separately and creates an index. The nice thing is that the loading will be on-demand:

# convert .RData -> .rdb/.rdx
e = local({load("New.RData"); environment()})
tools:::makeLazyLoadDB(e, "New")

Loading the DB then loads only the index, not the contents. The contents are loaded as they are used:

lazyLoad("New")
ls()
x # if you had x in the New.RData it will be fetched now from New.rdb

Just like with load(), you can specify an environment to load into, so you don't need to pollute the global workspace etc.
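The conversion and the on-demand load can be put together as a minimal sketch. The file name, the objects `df` and `m`, and the temporary directory are assumptions for illustration only; `tools:::makeLazyLoadDB` is an unexported function, so its behaviour may differ between R versions. The `filter` argument of `lazyLoad()` restricts which names get a promise in the target environment.

```r
# Illustrative setup: write a small .RData file into a temporary directory.
dir <- tempdir()
rdata <- file.path(dir, "New.RData")
local({
  df <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))
  m <- matrix(1:6, nrow = 2)
  save(df, m, file = rdata)
})

# Convert .RData -> .rdb/.rdx (this step still reads the whole file once).
e <- local({load(rdata); environment()})
tools:::makeLazyLoadDB(e, file.path(dir, "New"))

# Load lazily into a separate environment, keeping only the object "m".
db <- new.env()
lazyLoad(file.path(dir, "New"), envir = db,
         filter = function(name) name == "m")

ls(db)  # only the names that passed the filter
db$m    # the contents are fetched from New.rdb at this point
```

The one-time conversion costs a full load, so this pays off mainly when the same file is read repeatedly.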

You can use attach rather than load, which attaches the data file to the search path; then you can copy the one object you are interested in and detach the .Rdata object.

This still loads everything, but is simpler to work with than loading everything into the global workspace (possibly overwriting things you don't want overwritten) and then getting rid of everything you don't want.
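A minimal sketch of the attach approach, assuming a throwaway file and the illustrative object names `df` and `m`. When `attach()` is given a file path, the search-path entry is named "file:" followed by that path, which is the name used below to look up and detach it.

```r
# Illustrative setup: a small .RData file in a temporary location.
path <- tempfile(fileext = ".RData")
local({
  df <- data.frame(id = 1:3)
  m <- matrix(1:4, nrow = 2)
  save(df, m, file = path)
})

# Attach the file to the search path instead of loading into the workspace.
attach(path)
db_name <- paste0("file:", path)

# Copy the one object of interest, then detach everything else.
wanted <- get("m", envir = as.environment(db_name))
detach(db_name, character.only = TRUE)
```

After the `detach()` call only `wanted` remains in the workspace; `df` was never copied there.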

Simon Urbanek's answer is very, very nice. A drawback is that it doesn't seem to work if an object to be saved is too large:

tools:::makeLazyLoadDB(
  local({
    x <- 1:1e+09
    cat("size:", object.size(x), "\n")
    environment()
  }), "lazytest")
size: 4e+09 
Error: serialization is too large to store in a raw vector

I'm guessing that this is due to a limitation of the current implementation of R (I have 2.15.2) rather than running out of physical memory and swap. The saves package might be an alternative for some uses, however.

A helper function is useful for extracting a single object without putting everything from the RData file into your workspace (the file is still read in full, into a temporary environment).

extractorRData <- function(file, object) {
  # Extract one object from a .RData file created by R's save() command.
  # Inputs: path to the RData file, name of the object (character string).
  E <- new.env()
  load(file = file, envir = E)
  get(object, envir = E, inherits = FALSE)
}
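A usage sketch for this kind of helper, written so it runs on its own. The name `extract_one`, the existence check, and the sample objects are assumptions added for illustration and are not part of the original answer.

```r
# Hypothetical variant of the helper with an explicit "not found" error.
extract_one <- function(file, object) {
  e <- new.env()
  load(file = file, envir = e)  # the whole file is read into e
  if (!exists(object, envir = e, inherits = FALSE)) {
    stop("object '", object, "' not found in ", file)
  }
  get(object, envir = e, inherits = FALSE)
}

# Illustrative data: two objects saved into one temporary .RData file.
path <- tempfile(fileext = ".RData")
local({
  df <- data.frame(id = 1:3)
  m <- matrix(1:4, nrow = 2)
  save(df, m, file = path)
})

m_only <- extract_one(path, "m")
```

The temporary environment is dropped when the function returns, so only the requested object stays referenced.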

See full answer here: https://stackoverflow.com/a/65964065/4882696

This blog post describes a neat practice that prevents this sort of issue in the first place. The gist of it is to use the saveRDS() and readRDS() functions instead of the regular save() and load() functions.
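A minimal sketch of that practice, with an illustrative data frame and a temporary file. `saveRDS()` stores exactly one object per file without its name, and `readRDS()` returns it, so the caller chooses the variable name and nothing else enters the workspace.

```r
# Illustrative object and file location.
path <- tempfile(fileext = ".rds")
df <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))

# One object per file; the object's name is not stored.
saveRDS(df, path)

# Read it back under any name you like.
restored <- readRDS(path)
identical(df, restored)
```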
