
Warning message on running mongo.cursor.to.data.frame function in rmongodb

When I run the query

mongo.cursor.to.data.frame(cursor)

to fetch the documents in a collection into a data frame in R using rmongodb, I get the following warning message:

In mongo.cursor.to.data.frame(cursor) : This fails for most NoSQL data structures. I am working on a new solution

I checked some articles about rmongodb and found this message mentioned there too. Does this warning mean that there might be issues in the resulting data frame?

The source code shows where the issues could arise:

mongo.cursor.to.data.frame <- function(cursor, nullToNA=TRUE, ...){

  warning("This fails for most NoSQL data structures. I am working on a new solution")

  res <- data.frame()
  while ( mongo.cursor.next(cursor) ){
    val <- mongo.bson.to.list(mongo.cursor.value(cursor))

    if( nullToNA == TRUE )
      val[sapply(val, is.null)] <- NA

    # remove mongo.oid -> data.frame can not deal with that!
    val <- val[sapply(val, class) != 'mongo.oid']

    res <- rbind.fill(res, as.data.frame(val, ... ))

  }
  return( as.data.frame(res) )
}
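To see what this loop does without a MongoDB server, here is a minimal base-R sketch. It assumes the documents have already been converted to R lists (standing in for what mongo.bson.to.list() returns), uses a hypothetical helper bind_fill() as a rough stand-in for plyr::rbind.fill(), and omits the mongo.oid filter because no object IDs are involved. The example documents are made up.

```r
# Sketch of the mongo.cursor.to.data.frame() loop, run on documents that are
# already R lists (standing in for mongo.bson.to.list() output), so neither
# rmongodb nor a MongoDB server is needed. The mongo.oid filter is omitted.

# Hypothetical helper: row-bind two data frames, adding any column that one
# of them lacks as NA first (a rough stand-in for plyr::rbind.fill()).
bind_fill <- function(a, b) {
  if (ncol(a) == 0) return(b)
  for (col in setdiff(names(b), names(a))) a[[col]] <- rep(NA, nrow(a))
  for (col in setdiff(names(a), names(b))) b[[col]] <- rep(NA, nrow(b))
  rbind(a, b[names(a)])
}

# Hypothetical helper mirroring the loop body: NULL -> NA, then one
# as.data.frame() call per document, appended to the running result.
docs_to_df <- function(docs, nullToNA = TRUE) {
  res <- data.frame()
  for (val in docs) {
    if (nullToNA) val[vapply(val, is.null, logical(1))] <- NA
    res <- bind_fill(res, as.data.frame(val, stringsAsFactors = FALSE))
  }
  res
}

docs <- list(
  list(name = "a", score = 1),
  list(name = "b", tag = NULL),          # 'tag' is null, 'score' is absent
  list(name = "c", score = 3, tag = "x")
)
df <- docs_to_df(docs)
df
```

Documents with different fields end up in one table, with NA wherever a field is absent or null. This works here only because every field is a scalar.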

We can see that it uses plyr::rbind.fill to rbind the data.frames, so everything comes down to what is passed into rbind.fill, namely val.

And val is the result of val <- mongo.bson.to.list(mongo.cursor.value(cursor)), i.e. the current BSON document converted to an R list.

So as long as as.data.frame(val, ...) can handle the list structure you pass into it, you're OK.
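What "can handle" means in practice can be seen with plain lists (no database needed). The two documents below are made-up examples: a flat document becomes exactly one row, but a document containing an array field is silently turned into several rows, without any error.

```r
# A flat document (one scalar per field) becomes exactly one row.
flat <- list(name = "a", score = 1)
nrow(as.data.frame(flat))   # 1

# An array field is spread over several rows and the scalar field is
# recycled to match, so this single document becomes three rows.
arr <- list(name = "a", scores = c(1, 2, 3))
nrow(as.data.frame(arr))    # 3
as.data.frame(arr)
```

So even when no error is raised, the resulting data frame may not have one row per document.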

However, it's quite easy to conceive of a NoSQL data structure that will make this fail:

## consider the JSON structure
## [{"a":[1,2,3],"b":["a","b","c"]},{"d":[1.1,2.2,3.3],"e":[["nested","list"]]}] 

## Which in R is the same as
lst = list(list(a = c(1L,2L,3L),
                b = c("a","b","c")),
           list(d = c(1.1, 2.2, 3.3),
                e = list(c("nested", "list"))))

## this errors when coerced to a data.frame
as.data.frame(lst)
Error in data.frame(d = c(1.1, 2.2, 3.3), e = list(c("nested", "list")),  : 
  arguments imply differing number of rows: 3, 2
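One way around this, sketched below as an assumption rather than anything rmongodb itself does, is to keep one row per document and store every field in a list-column, so arrays and nested lists stay intact instead of being spread across rows. docs_to_listcol_df() is a hypothetical base-R helper, and lst is the same structure as above.

```r
# Hypothetical alternative (not part of rmongodb): one row per document,
# every field stored as a list-column so nested values are kept as-is.
docs_to_listcol_df <- function(docs) {
  fields <- unique(unlist(lapply(docs, names)))
  df <- data.frame(row.names = seq_along(docs))   # 0 columns, one row per doc
  for (f in fields) {
    # Missing or null fields become NA; everything else is stored unchanged.
    df[[f]] <- I(lapply(docs, function(d) if (is.null(d[[f]])) NA else d[[f]]))
  }
  df
}

lst <- list(list(a = c(1L, 2L, 3L), b = c("a", "b", "c")),
            list(d = c(1.1, 2.2, 3.3), e = list(c("nested", "list"))))
df <- docs_to_listcol_df(lst)
nrow(df)     # 2: one row per document
df$e[[2]]    # the nested list, unchanged
```

The trade-off is that list-columns are awkward to filter or aggregate directly, so you still have to decide how to unpack the fields you need afterwards.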

At this point I should mention the mongolite package, which is generally faster but, again, returns a data.frame.

There's also my extension to mongolite, mongolitedt (not yet on CRAN), which is quicker still at retrieving data, but it is again limited by the fact that the result has to be coerced into a data.table.
