
Warning message on running mongo.cursor.to.data.frame function in rmongodb

When I run the query

mongo.cursor.to.data.frame(cursor)

to fetch the documents in a collection into a data frame in R using rmongodb, I get the warning message:

In mongo.cursor.to.data.frame(cursor) : This fails for most NoSQL data structures. I am working on a new solution

I checked some articles about rmongodb and found this message mentioned there too. Does this warning mean that there might be some issues in the resulting data frame?

The source code shows where the issues could arise:

mongo.cursor.to.data.frame <- function(cursor, nullToNA=TRUE, ...){

  warning("This fails for most NoSQL data structures. I am working on a new solution")

  res <- data.frame()
  while ( mongo.cursor.next(cursor) ){
    val <- mongo.bson.to.list(mongo.cursor.value(cursor))

    if( nullToNA == TRUE )
      val[sapply(val, is.null)] <- NA

    # remove mongo.oid -> data.frame can not deal with that!
    val <- val[sapply(val, class) != 'mongo.oid']

    res <- rbind.fill(res, as.data.frame(val, ... ))

  }
  return( as.data.frame(res) )
}

We can see it's using plyr::rbind.fill to rbind data.frames. So this all comes down to what is passed into rbind.fill, namely val.
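To illustrate how documents with differing fields end up in one data.frame, here is a simplified base R approximation of that fill behaviour. It is only a sketch and not plyr's actual implementation; rbind_fill_sketch, doc1 and doc2 are made-up names for illustration.

```r
# Simplified stand-in for plyr::rbind.fill (an approximation, not plyr's code):
# columns missing from one data.frame are added and filled with NA before rbind.
rbind_fill_sketch <- function(x, y) {
  all_cols <- union(names(x), names(y))
  for (col in setdiff(all_cols, names(x))) x[[col]] <- rep(NA, nrow(x))
  for (col in setdiff(all_cols, names(y))) y[[col]] <- rep(NA, nrow(y))
  rbind(x[all_cols], y[all_cols])
}

# Two hypothetical documents, already converted to one-row data.frames
doc1 <- data.frame(a = 1L, b = "x", stringsAsFactors = FALSE)
doc2 <- data.frame(a = 2L, c = 3.5)

res <- rbind_fill_sketch(doc1, doc2)
print(res)
```

Fields that a document lacks simply become NA in its row, so differing field names alone are not what breaks the conversion.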

And val is the result of val <- mongo.bson.to.list(mongo.cursor.value(cursor)).

So as long as as.data.frame(val, ...) can handle the list structure you pass into it, you're OK.
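As a minimal sketch of the case that works, assume a flat document in which every field holds a single value. The list below is a hypothetical stand-in for what mongo.bson.to.list() might return; no MongoDB connection is involved.

```r
# Hypothetical stand-in for mongo.bson.to.list(mongo.cursor.value(cursor)):
# a flat document where each field holds one value (or NULL).
val <- list(name = "alice", age = 30L, city = NULL)

# Same NULL handling as in the function above
val[sapply(val, is.null)] <- NA

# A flat list maps cleanly onto a single data.frame row
df <- as.data.frame(val, stringsAsFactors = FALSE)
print(df)
```

Each field becomes one column and the document becomes one row, which is the shape the function's loop expects.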

However, it's quite easy to conceive of a NoSQL data structure for which this fails:

## consider the JSON structure
## [{"a":[1,2,3],"b":["a","b","c"]},{"d":[1.1,2.2,3.3],"e":[["nested","list"]]}] 

## which in R is the same as
lst = list(list(a = c(1L,2L,3L),
                b = c("a","b","c")),
           list(d = c(1.1, 2.2, 3.3),
                e = list(c("nested", "list"))))

## this errors when coerced to a data.frame
as.data.frame(lst)
Error in data.frame(d = c(1.1, 2.2, 3.3), e = list(c("nested", "list")),  : 
  arguments imply differing number of rows: 3, 2
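Since the function converts one document at a time, it can help to check which documents would convert before building the data.frame. The following is a rough sketch using only base R; the tryCatch wrapper and the converts vector are illustrative and not part of rmongodb.

```r
# Same structure as above: two documents, the second with a nested list
lst <- list(list(a = c(1L, 2L, 3L),
                 b = c("a", "b", "c")),
            list(d = c(1.1, 2.2, 3.3),
                 e = list(c("nested", "list"))))

# Try to coerce each document on its own and record whether it succeeds
converts <- vapply(lst, function(doc) {
  tryCatch({
    as.data.frame(doc)
    TRUE
  }, error = function(e) FALSE)
}, logical(1))

print(converts)
```

Documents flagged FALSE could then be kept as plain lists or flattened by hand instead of being passed to as.data.frame.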

At this point I should mention the mongolite package, which is generally faster but again returns a data.frame.

There's also my extension to mongolite, mongolitedt (not yet on CRAN), which is quicker still at retrieving data, but it is again limited because the result has to be coerced into a data.table.

