
Neatest way to build a data frame from a list of lists in R

I have a list of sub-lists that I wish to convert to a data frame (specifically a tibble). For example:

myList <- list(
        list(var1=1,var2=2,var3=3,var4=4,var5=5,var6=6),
        list(var1=4,var2=5,var3=6,var4=7,var5=8,var6=9),
        list(var1=7,var2=8,var3=9,var4=1,var5=2,var6=3)
)

Using the following code, I can extract chosen variables into a tibble:

myDF <- tbl_df(cbind(
  var1 = lapply(myList, '[[', "var1"),
  var2 = lapply(myList, '[[', "var2"),
  var5 = lapply(myList, '[[', "var5"),
  var6 = lapply(myList, '[[', "var6")
))  

But it is quite verbose. Is there a more succinct way (perhaps using a purrr map function) that can pull chosen sub-elements out of each list and populate them into a row?

Further, if the sub-lists themselves contain lists, how can I best extract elements of those inner lists? For example:

myList <- list(
        list(var1=1,var2=2,var3=3,list4=list(varA="a",varB="b")),
        list(var1=4,var2=5,var3=6,list4=list(varA="c",varB="d")),
        list(var1=7,var2=8,var3=9,list4=list(varA="e",varB="f"))
)

How could I get something like the following to work:

myDF <- tbl_df(cbind(
  var1 = lapply(myList, '[[', "var1"),
  var2 = lapply(myList, '[[', "var2"),
  var4 = lapply(myList, '[[', "list4$varA")
)) 

Here I want to extract a specific element from list4, but using $ notation inside the string to drill down to the next level does not work. How can I do this?

Since data frames are just lists, this is straightforward as long as your list isn't nested more than one level deep:

library(tidyverse)
myList %>%
  map(as.data.frame) %>%
  bind_rows() %>%
  select(var1, var2, var5, var6)

#    var1 var2 var5 var6
# 1    1    2    5    6
# 2    4    5    8    9
# 3    7    8    2    3
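For comparison, here is a base-R sketch of the same three steps, assuming no packages are loaded: lapply() stands in for map(), do.call(rbind, ...) for bind_rows(), and ordinary column subsetting for select(). One difference worth knowing: base rbind() requires every one-row data frame to have the same columns, which holds for this example.

```r
myList <- list(
  list(var1 = 1, var2 = 2, var3 = 3, var4 = 4, var5 = 5, var6 = 6),
  list(var1 = 4, var2 = 5, var3 = 6, var4 = 7, var5 = 8, var6 = 9),
  list(var1 = 7, var2 = 8, var3 = 9, var4 = 1, var5 = 2, var6 = 3)
)

rows <- lapply(myList, as.data.frame)      # list of one-row data frames
myDF <- do.call(rbind, rows)               # stack them (columns must match)
myDF <- myDF[c("var1", "var2", "var5", "var6")]  # keep the chosen columns
myDF
```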

Or, more simply, skip the conversion step, since bind_rows() works directly on a list of lists:

myList %>%
  bind_rows() %>%
  select(var1, var2, var5, var6)

#    var1  var2  var5  var6
#    <dbl> <dbl> <dbl> <dbl>
# 1  1.00  2.00  5.00  6.00
# 2  4.00  5.00  8.00  9.00
# 3  7.00  8.00  2.00  3.00

However, sometimes the sub-lists have only some elements in common, and you want to select just those shared ones:

myList %>%
  map(as.data.frame) %>%
  map(~ select(.x, var1, var2, var5, var6)) %>%
  bind_rows()

#    var1 var2 var5 var6
# 1    1    2    5    6
# 2    4    5    8    9
# 3    7    8    2    3
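Subsetting each element before binding matters most in base R. dplyr's bind_rows() matches columns by name and fills missing ones with NA, but base rbind() refuses data frames whose columns differ. The sketch below uses a hypothetical input, mixed, and a hypothetical vector of shared names, keep, to show the difference.

```r
# Hypothetical input: sub-lists share var1 and var2 but differ otherwise.
mixed <- list(
  list(var1 = 1, var2 = 2, extraA = "x"),
  list(var1 = 4, var2 = 5, extraB = TRUE, extraC = 0),
  list(var1 = 7, var2 = 8)
)
keep <- c("var1", "var2")

# Binding the full rows fails: base rbind() needs matching columns.
full <- try(do.call(rbind, lapply(mixed, as.data.frame)), silent = TRUE)
inherits(full, "try-error")

# Subsetting each sub-list to the shared fields first gives every one-row
# data frame the same columns, so the rows stack cleanly.
common <- do.call(rbind, lapply(mixed, function(x) as.data.frame(x[keep])))
common
```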

For lists nested more than one level deep, look into flatten() from purrr:

myList2 <- list(
  list(var1=1,var2=2,var3=3,list4=list(varA="a",varB="b")),
  list(var1=4,var2=5,var3=6,list4=list(varA="c",varB="d")),
  list(var1=7,var2=8,var3=9,list4=list(varA="e",varB="f"))
)  

myList2 %>%
  map(flatten) %>%
  bind_rows()

#   var1  var2  var3 varA  varB 
#   <dbl> <dbl> <dbl> <chr> <chr>
# 1  1.00  2.00  3.00 a     b    
# 2  4.00  5.00  6.00 c     d    
# 3  7.00  8.00  9.00 e     f  

Then apply select() as desired; the column names will be the names of the respective inner elements. Be very careful with duplicate names across different sub-lists, as only one of them will be kept.
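If name clashes are a concern, a base-R alternative to flatten() is unlist(x, recursive = FALSE). It also removes one level of nesting, but it prefixes inner names with the parent's name (for example list4.varA), so a top-level varA and list4$varA stay distinct. flatten() would instead give two elements both named varA. This is a sketch of a swapped-in technique, not what the answer above uses; the name clash is a hypothetical example.

```r
myList2 <- list(
  list(var1 = 1, var2 = 2, var3 = 3, list4 = list(varA = "a", varB = "b")),
  list(var1 = 4, var2 = 5, var3 = 6, list4 = list(varA = "c", varB = "d")),
  list(var1 = 7, var2 = 8, var3 = 9, list4 = list(varA = "e", varB = "f"))
)

# Remove one level of nesting; inner names become "parent.child".
flat  <- lapply(myList2, unlist, recursive = FALSE)
myDF2 <- do.call(rbind, lapply(flat, as.data.frame, stringsAsFactors = FALSE))
names(myDF2)   # var1, var2, var3, list4.varA, list4.varB

# Hypothetical clash: the same inner name at two levels stays distinct.
clash <- list(varA = 0, list4 = list(varA = "a"))
names(unlist(clash, recursive = FALSE))   # varA, list4.varA
```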

There may be situations where the enframe() function from tibble is also useful.

For the first case, a possible base-R solution:

> data.frame(do.call(rbind, myList))[c("var1", "var2", "var5", "var6")]
  var1 var2 var5 var6
1    1    2    5    6
2    4    5    8    9
3    7    8    2    3
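One caveat with the base-R version: rbind() on lists returns a list-matrix, so the resulting data frame typically has list columns rather than plain numeric vectors. The sketch below builds atomic columns with sapply() and [[ instead, and it also covers the nested case from the question. When [[ is given a character vector such as c("list4", "varA"), it indexes recursively, i.e. x[["list4"]][["varA"]], which is what the "list4$varA" string was trying to do. purrr's map functions accept the same kind of path, e.g. map_chr(myList2, c("list4", "varA")). The helper name pluck_cols is hypothetical and does not come from any package.

```r
# Hypothetical helper: `paths` is a named list (or named character vector);
# each entry is a name or a character path into nested lists, and becomes
# one column named after that entry.
pluck_cols <- function(lst, paths) {
  # sapply() simplifies each column to an atomic vector when every
  # extracted value has length 1.
  cols <- lapply(paths, function(p) sapply(lst, `[[`, p))
  as.data.frame(cols, stringsAsFactors = FALSE)
}

myList <- list(
  list(var1 = 1, var2 = 2, var3 = 3, var4 = 4, var5 = 5, var6 = 6),
  list(var1 = 4, var2 = 5, var3 = 6, var4 = 7, var5 = 8, var6 = 9),
  list(var1 = 7, var2 = 8, var3 = 9, var4 = 1, var5 = 2, var6 = 3)
)
myList2 <- list(
  list(var1 = 1, var2 = 2, var3 = 3, list4 = list(varA = "a", varB = "b")),
  list(var1 = 4, var2 = 5, var3 = 6, list4 = list(varA = "c", varB = "d")),
  list(var1 = 7, var2 = 8, var3 = 9, list4 = list(varA = "e", varB = "f"))
)

# First case: setNames(nm = v) names each element after itself.
flatDF <- pluck_cols(myList, setNames(nm = c("var1", "var2", "var5", "var6")))

# Nested case: c("list4", "varA") drills into list4, no "$" string needed.
nestedDF <- pluck_cols(myList2,
                       list(var1 = "var1", var2 = "var2",
                            varA = c("list4", "varA")))
nestedDF
```

This keeps the shape of the question's lapply(myList, '[[', ...) approach but lists each column once, as a path, instead of repeating the whole extraction call.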
