
How to convert a list of lists to a dataframe - non-identical lists

I have a list where each element is a named list, but the elements do not all have the same names. I have read existing answers on converting a list of lists to a data frame, but none of them work when the lists are not identical.

Example (note that I have mixed types as well; it is fine if the solution coerces everything to character):

lisnotOK <- list(list(a=1, b=2, c="hi"), list(b=2, c="hello", d="nope"))

The result should simply have NA where a column cannot be filled by a list, just like rbind.fill from plyr, or rbind_all from dplyr.

For comparison, an example where the lists are identical and the usual approach works:

lisOK <- list(list(a=1, b=2, c="hi"), list(a=3, b=5, c="bye"))

# One of many solutions
do.call(rbind.data.frame, lisOK)

# gives
   a b   c
2  1 2  hi
21 3 5 bye

Any solution that uses rbind, or that tries to turn lisnotOK into a matrix, fails. The examples in the posts linked above do not work either, even when I try rbind_all or rbind.fill.

One workaround is an ugly for loop that converts each successive list to a data frame and binds it to the accumulated result with rbind_all.

Does anyone know an efficient solution?

Any function that calls data.frame(.) on each element of the list before binding would be terribly inefficient (not to mention unnecessary). Here's another way, using rbindlist from data.table (available from v1.9.3):

require(data.table) ## 1.9.3
rbindlist(lisnotOK, fill=TRUE)
#     a b     c    d
# 1:  1 2    hi   NA
# 2: NA 2 hello nope

It works on list-of-lists (as in this question), data.frames and data.tables.

If not this, then I'd go with Ananda's list2mat function (if your types are all identical).


Benchmarks on Ananda's L2 data:

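library(plyr)            # ldply
library(dplyr)           # rbind_all
library(microbenchmark)
# list2mat() and L2 are defined in Ananda's answer
fun1 <- function(inList) ldply(inList, as.data.frame)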
fun2 <- function(inList) list2mat(inList)
fun3 <- function(inList) rbindlist(inList, fill=TRUE)
fun4 <- function(inList) rbind_all(lapply(inList, as.data.frame))

microbenchmark(fun1(L2), fun2(L2), fun3(L2), fun4(L2), times = 10)
# Unit: milliseconds
#      expr         min          lq      median          uq         max neval
#  fun1(L2) 1927.857847 2161.432665 2221.999940 2276.241366 2366.649614    10
#  fun2(L2)   12.039652   12.167613   12.361629   12.483751   16.040885    10
#  fun3(L2)    1.225929    1.374395    1.473621    1.510876    1.858597    10
#  fun4(L2) 1435.153576 1457.053482 1492.334965 1548.547706 1630.443430    10

Note: I've used as.data.frame(.) instead of data.frame(.) (the former is slightly faster).

Since you are OK with the result being a matrix of a single type (say, character), you can write your own function, like this:

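list2mat <- function(inList) {
  # Flatten to one named vector (mixed types become character)
  UL <- unlist(inList)
  # Column names, in order of first appearance
  Nam <- unique(names(UL))
  # Pre-allocate a character matrix filled with NA
  M <- matrix(NA_character_, 
              nrow = length(inList), ncol = length(Nam), 
              dimnames = list(NULL, Nam))
  # Row and column index of every flattened value
  Row <- rep(seq_along(inList), sapply(inList, length))
  Col <- match(names(UL), Nam)
  # Fill all cells in one step via matrix indexing
  M[cbind(Row, Col)] <- UL
  M
}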

Usage would be:

list2mat(lisnotOK)
#      a   b   c       d     
# [1,] "1" "2" "hi"    NA    
# [2,] NA  "2" "hello" "nope"

This should be pretty fast since everything is pre-allocated and you are making use of matrix indexing.
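
To illustrate the matrix-indexing step in isolation, here is a minimal sketch. The matrix M and the index pairs below are made up for illustration and are not part of the question's data:

```r
# A 2 x 3 character matrix pre-filled with NA
M <- matrix(NA_character_, nrow = 2, ncol = 3,
            dimnames = list(NULL, c("a", "b", "c")))

# Each row of idx is one (row, column) pair
idx <- cbind(c(1, 2, 2), c(1, 2, 3))

# A single vectorized assignment fills all three cells;
# cells not listed in idx stay NA
M[idx] <- c("x", "y", "z")
print(M)
```

Indexing a matrix with a two-column matrix addresses individual cells, so no loop over rows or columns is needed.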


Update: Benchmarks (since you said efficiency was a concern)

library(plyr)  # ldply
fun1 <- function(inList) ldply(inList, data.frame)
fun2 <- function(inList) list2mat(inList)

library(microbenchmark)
microbenchmark(fun1(lisnotOK), fun2(lisnotOK))
# Unit: microseconds
#            expr      min        lq    median       uq      max neval
#  fun1(lisnotOK) 4193.808 4340.0585 4523.3000 4912.233 7600.341   100
#  fun2(lisnotOK)  163.784  182.3865  211.2515  236.910  363.489   100

L2 <- unlist(replicate(1000, lisnotOK, simplify=FALSE), recursive=FALSE)
microbenchmark(fun1(L2), fun2(L2), times = 10)
# Unit: milliseconds
#      expr        min         lq     median         uq        max neval
#  fun1(L2) 3032.71572 3106.79006 3196.17178 3306.11756 3609.67445    10
#  fun2(L2)   24.16817   24.86991   25.65569   27.44128   29.41908    10
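
If typed columns are wanted after all, one possible follow-up is to convert the character matrix to a data frame and let base R's type.convert guess each column's type. This is only a sketch: the matrix M is typed in by hand to mirror the list2mat(lisnotOK) output, and DF is an illustrative name, not part of the answer.

```r
# Character matrix with the same contents as the list2mat(lisnotOK) output
# (matrix() fills column by column)
M <- matrix(c("1", NA, "2", "2", "hi", "hello", NA, "nope"), nrow = 2,
            dimnames = list(NULL, c("a", "b", "c", "d")))

# Convert to a data frame, keeping strings as character
DF <- as.data.frame(M, stringsAsFactors = FALSE)

# Re-detect the type of each column; as.is = TRUE avoids factors
DF[] <- lapply(DF, type.convert, as.is = TRUE)
str(DF)
```

Here a and b come back as integer columns, while c and d stay character.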

Use lapply to convert each list element to a data.frame, then bind the results with rbind_all:

rbind_all(lapply(lisnotOK,data.frame))
   a b     c    d
1  1 2    hi <NA>
2 NA 2 hello nope
Warning message:
In rbind_all(lapply(lisnotOK, data.frame)) :
  Unequal factor levels: coercing to character

Or, from plyr, use ldply with data.frame:

ldply(lisnotOK,data.frame)
   a b     c    d
1  1 2    hi <NA>
2 NA 2 hello nope
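
Note for readers on a recent dplyr: rbind_all was deprecated and later removed, and bind_rows is its replacement. A minimal sketch of the same idea with bind_rows, assuming dplyr is installed (the guard below simply skips the call otherwise, and res is an illustrative name):

```r
lisnotOK <- list(list(a=1, b=2, c="hi"), list(b=2, c="hello", d="nope"))

if (requireNamespace("dplyr", quietly = TRUE)) {
  # Convert each element to a one-row data frame, then bind;
  # columns missing from an element are filled with NA
  res <- dplyr::bind_rows(lapply(lisnotOK, as.data.frame,
                                 stringsAsFactors = FALSE))
  print(res)
}
```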
