List of named lists to data.frame

I have a list of named lists of the following form, parsed from a JSON object:

my_list = list(list(a = 10, b = "blah"), 
               list(a = 15, b = "stuff"))

Each element of the outer list is a named list, and I want to convert it to a data.frame of the following form with the column names intact:

a   b 
10  "blah" 
15  "stuff"

On the surface, I can achieve this with to_df = data.frame(do.call(rbind, my_list)).

However, if I try to extract an individual column using to_df$a or to_df[,1], I get a list instead of the vector normally expected from a data.frame:

> to_df[,1]
[[1]]
[1] 10

[[2]]
[1] 15

Instead of:

> to_df[,1]
[1] 10 15
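
The list columns can be traced back to what rbind does with lists. The following is a minimal diagnostic sketch, using only the my_list from the question; it is meant to illustrate the cause, not to serve as a fix:

```r
my_list <- list(list(a = 10, b = "blah"),
                list(a = 15, b = "stuff"))

# rbind() on lists returns a 2 x 2 matrix whose cells are list elements
m <- do.call(rbind, my_list)
is.list(m)          # TRUE: the matrix is stored as a list
dim(m)              # 2 2

# data.frame() keeps each column of that matrix as a list column
to_df <- data.frame(m)
sapply(to_df, class)

# unlist() recovers a plain vector from a single list column
a_vec <- unlist(to_df$a)
a_vec
```

Unlisting each column after the fact works for this small case, but building the data.frame from properly typed columns in the first place avoids the problem.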

An old post on the R mailing list suggested the following solution: to_df = as.data.frame(t(sapply(my_list, rbind))). But not only does this fail to carry over the column names, it also has the same issue of returning a list instead of a vector when looking at individual columns using to_df[,1].

What's the best way to achieve this? Is there a dplyr way?

EDIT: Thanks for all the solutions. It appears the trick is to lapply over the list, transform each element to a data.frame, and then bind them together using dplyr or do.call. Alternatively, data.table does most of the work with a single call to rbindlist.

I prefer rbindlist from the data.table package. It's simple, fast, and returns a data frame/table.

data.table::rbindlist(my_list)
#     a     b
# 1: 10  blah
# 2: 15 stuff

Another advantage of rbindlist() is that, when called with fill = TRUE, it fills in missing columns with NA.
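
A small sketch of the NA filling, assuming a hypothetical input (ragged, not part of the original question) in which one record lacks the b field. Note that fill = TRUE has to be passed explicitly:

```r
library(data.table)

# Hypothetical records: the second one has no "b" field
ragged <- list(list(a = 10, b = "blah"),
               list(a = 15))

# fill = TRUE pads the missing column with NA and matches columns by name
dt <- rbindlist(ragged, fill = TRUE)
dt

is.na(dt$b[2])   # the missing field became NA
```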

To remove the data.table class, you can just wrap the call in as.data.frame():

as.data.frame(data.table::rbindlist(my_list))

It looks like you can do this with bind_rows from the development version of dplyr, dplyr_0.4.2.9002, as of two days ago.

library(dplyr)
bind_rows(my_list)

Source: local data frame [2 x 2]

   a     b
1 10  blah
2 15 stuff
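
As a quick check that bind_rows yields ordinary vector columns, here is a sketch assuming a reasonably recent dplyr release and a hypothetical input (ragged, not from the original post) with one missing field:

```r
library(dplyr)

# Hypothetical records: the second one has no "b" field
ragged <- list(list(a = 10, b = "blah"),
               list(a = 15))

out <- bind_rows(ragged)

is.numeric(out$a)   # the column is an atomic vector, not a list
is.na(out$b[2])     # the missing field is filled with NA
```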

In base R you could do:

df <- do.call(rbind, lapply(my_list, data.frame))
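
To confirm that this approach gives plain vector columns, a short sketch follows. Passing stringsAsFactors = FALSE is an addition to the original one-liner: before R 4.0.0 the default was TRUE, which would turn column b into a factor.

```r
my_list <- list(list(a = 10, b = "blah"),
                list(a = 15, b = "stuff"))

# Convert each record to a one-row data.frame, then stack the rows
df <- do.call(rbind, lapply(my_list, data.frame, stringsAsFactors = FALSE))

df$a               # a plain numeric vector
sapply(df, class)  # column types are preserved per column
```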

A fast pure base R way to do it if the columns are of different types and you want to preserve the types:

# sample data
set.seed(46823239)
list_of_lists <- 
  replicate(
    100, list(a = rnorm(100), b = sample.int(100, 100, replace = TRUE), 
              c = factor(sample(letters, 100, replace = TRUE))), 
    simplify = FALSE)
str( # show first two lists
  list_of_lists[1:2])
#R> List of 2
#R>  $ :List of 3
#R>   ..$ a: num [1:100] -0.0439 -0.4487 -0.5682 -0.8062 1.5074 ...
#R>   ..$ b: int [1:100] 59 91 63 87 61 72 92 77 62 41 ...
#R>   ..$ c: Factor w/ 26 levels "a","b","c","d",..: 4 16 5 14 25 17 25 4 4 20 ...
#R>  $ :List of 3
#R>   ..$ a: num [1:100] 0.356 1.239 -0.926 -0.673 -1.168 ...
#R>   ..$ b: int [1:100] 62 21 90 20 41 99 57 6 83 22 ...
#R>   ..$ c: Factor w/ 26 levels "a","b","c","d",..: 15 16 17 6 3 13 21 16 3 11 ...

# define functions to stack
f1 <- function(x){
  . <- function(...){
    args <- list(...)
    if(is.factor(args[[1]]))
      # see https://stackoverflow.com/a/3449403/5861244
      return(factor(do.call(c, lapply(args, as.character))))

    do.call(c, args)
  }

  out <- NULL
  for(i in seq_along(x[[1]]))
    out <- c(out, list(do.call(., lapply(x, "[[", i))))

  out <- data.frame(out)
  names(out) <- names(x[[1]])
  out
}

f2 <- function(x)
  # simple alternative from http://r.789695.n4.nabble.com/Convert-list-of-lists-lt-gt-data-frame-td860048.html
  do.call(rbind, lapply(x, data.frame))

# show output
all.equal( # yields the same
  f1(list_of_lists), f2(list_of_lists))
#R> [1] TRUE
all.equal(
  f1(list_of_lists), data.table::rbindlist(list_of_lists), 
  check.attributes = FALSE)
#R> [1] TRUE
out <- f1(list_of_lists)
head(out, 5)
#R>             a  b c
#R> 1 -0.04391595 59 d
#R> 2 -0.44866652 91 p
#R> 3 -0.56815817 63 e
#R> 4 -0.80622044 87 n
#R> 5  1.50736514 61 y
sapply(out, class)
#R>         a         b         c 
#R> "numeric" "integer"  "factor"

# benchmark
microbenchmark::microbenchmark(
  f1(list_of_lists), f2(list_of_lists), data.table::rbindlist(list_of_lists))
#R> Unit: microseconds
#R>                                  expr       min         lq      mean     median        uq        max neval
#R>                     f1(list_of_lists)  1259.850  1426.3685  1633.127  1531.0590  1643.257   7086.211   100
#R>                     f2(list_of_lists) 31348.099 34293.8720 61224.476 37003.7930 92775.162 153318.869   100
#R>  data.table::rbindlist(list_of_lists)   652.246   786.7645  1040.994   872.6905  1022.221   4063.994   100
