简体   繁体   中英

Stack nested lists of dataframes with purrr

I have a list of lists of dataframes, something like this:

I just edited to change the data so that the length of the list and nested list are not equal.

test <- list(list(cars1 = head(mtcars), iris1 = head(iris)),
             list(cars2 = tail(mtcars), iris2 = tail(iris)),
             list(cars3 = tail(mtcars), iris3 = tail(iris)))

This gives me what I want with a mix of lapply() and purrr .

lapply(1:2, function(x) purrr::map_dfr(test, ~ .[[x]]))

Is there a way to do this more efficiently in a single line just in purrr ? This seems like a fairly common task.

Here is an option with purrr

library(dplyr)
library(stringr)
library(purrr)
test %>%
   flatten %>%
   split(str_remove(names(.), '\\d+')) %>%
   map(bind_rows)
#$cars
#    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#5  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
#6  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#7  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
#8  30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
#9  15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
#10 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
#11 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
#12 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
#13 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
#14 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
#15 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
#16 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
#17 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
#18 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

#$iris
#   Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#1           5.1         3.5          1.4         0.2    setosa
#2           4.9         3.0          1.4         0.2    setosa
#3           4.7         3.2          1.3         0.2    setosa
#4           4.6         3.1          1.5         0.2    setosa
#5           5.0         3.6          1.4         0.2    setosa
#6           5.4         3.9          1.7         0.4    setosa
#7           6.7         3.3          5.7         2.5 virginica
#8           6.7         3.0          5.2         2.3 virginica
#9           6.3         2.5          5.0         1.9 virginica
#10          6.5         3.0          5.2         2.0 virginica
#11          6.2         3.4          5.4         2.3 virginica
#12          5.9         3.0          5.1         1.8 virginica
#13          6.7         3.3          5.7         2.5 virginica
#14          6.7         3.0          5.2         2.3 virginica
#15          6.3         2.5          5.0         1.9 virginica
#16          6.5         3.0          5.2         2.0 virginica
#17          6.2         3.4          5.4         2.3 virginica
#18          5.9         3.0          5.1         1.8 virginica

Or another option is

map_dfr(test, enframe) %>%
     group_split(name = str_remove(name, "\\d+")) %>%
     map( ~ unnest(.x, value))

You could try a base R solution.

sapply(seq(el(lengths(test))), function(x) do.call(rbind, lapply(test, `[[`, x)))
# [[1]]
# mpg cyl  disp  hp drat    wt  qsec vs am gear carb
# Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
# Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
# Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
# Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
# Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
# Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
# Porsche 914-2     26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
# Lotus Europa      30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
# Ford Pantera L    15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
# Ferrari Dino      19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
# Maserati Bora     15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
# Volvo 142E        21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
# Porsche 914-21    26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
# Lotus Europa1     30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
# Ford Pantera L1   15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
# Ferrari Dino1     19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
# Maserati Bora1    15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
# Volvo 142E1       21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
# 
# [[2]]
# Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
# 1             5.1         3.5          1.4         0.2    setosa
# 2             4.9         3.0          1.4         0.2    setosa
# 3             4.7         3.2          1.3         0.2    setosa
# 4             4.6         3.1          1.5         0.2    setosa
# 5             5.0         3.6          1.4         0.2    setosa
# 6             5.4         3.9          1.7         0.4    setosa
# 145           6.7         3.3          5.7         2.5 virginica
# 146           6.7         3.0          5.2         2.3 virginica
# 147           6.3         2.5          5.0         1.9 virginica
# 148           6.5         3.0          5.2         2.0 virginica
# 149           6.2         3.4          5.4         2.3 virginica
# 150           5.9         3.0          5.1         1.8 virginica
# 1451          6.7         3.3          5.7         2.5 virginica
# 1461          6.7         3.0          5.2         2.3 virginica
# 1471          6.3         2.5          5.0         1.9 virginica
# 1481          6.5         3.0          5.2         2.0 virginica
# 1491          6.2         3.4          5.4         2.3 virginica
# 1501          5.9         3.0          5.1         1.8 virginica

It's slow though. Regarding performance, it's worth looking at data.table .

sapply(seq(el(lengths(test))), function(x) data.table::rbindlist(lapply(test, `[[`, x)))

Or---a little awkward, but fast:

Map(function(x) 
  data.table::rbindlist(unlist(test, recursive=F)[x]), list(c(1, 3, 5), c(1, 3, 5) + 1))

And here comes the microbenchmark:

library(dplyr)
library(stringr)
library(purrr)
microbenchmark::microbenchmark(
  OP=lapply(seq(el(lengths(test))), function(x) purrr::map_dfr(test, ~ .[[x]])),
  sapply=sapply(seq(el(lengths(test))), function(x) 
    do.call(rbind, lapply(test, `[[`, x))),
  stringr=test %>%
    flatten %>%
    split(str_remove(names(.), '\\d+')) %>%
    map(bind_rows),
  unlistDT=Map(function(x) do.call(rbind, unlist(test, recursive=F)[x]), list(c(1, 3, 5), c(1, 3, 5) + 1)),
  sapplyDT=sapply(seq(el(lengths(test))), function(x) 
    data.table::rbindlist(lapply(test, `[[`, x))),
  MapUnlistDT=Map(function(x) data.table::rbindlist(unlist(test, recursive=F)[x]), list(c(1, 3, 5), c(1, 3, 5) + 1))
)
# Unit: microseconds
#                expr      min        lq      mean    median        uq      max neval  cld
# OP          504.664  522.6505  557.3472  530.6880  542.0415 2328.392   100  b  
# sapply     1003.970 1022.8495 1083.9883 1038.2850 1061.5030 3638.017   100    d
# stringr     740.156  788.6325  812.7278  805.7265  824.3520 1164.452   100   c 
# unlistDT    997.591 1015.1950 1069.0347 1031.2690 1042.7505 3659.193   100    d
# sapplyDT    319.178  334.4860  455.9246  348.7740  361.4040 8678.784   100 ab  
# MapUnlistDT 285.244  305.5285  347.5572  321.0920  331.8080 2772.333   100 a   

This solution assumes at least two things. But both assumptions are fine given my original use.

  1. sub list items are all named the same
  2. list only has two levels

You can flip the list inside-out with transpose() and then map() to bind rows.

library(purrr)

test <- list(list(cars = head(mtcars), iris = head(iris)),
             list(cars = tail(mtcars), iris = tail(iris)),
             list(cars = tail(mtcars), iris = tail(iris)))

map(transpose(test), bind_rows)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM