Modify, extract, and concatenate list sub-elements into a data.frame in R with tidyverse
I'm trying to find an elegant way to work with list structures in R. In particular, I'd like to extract sub-elements from a list, modify them based on their associated data in that list, and concatenate them into a data frame. Perhaps easier with an example:
mystruct <- structure(list(dataset1 = structure(list(data1 = structure(list(
a = c(1, 2, 3), b = c(4, 5, 6)), .Names = c("a", "b"), row.names = c(NA,
-3L), class = "data.frame"), data2 = c("a", "b", "c", "d", "e"
)), .Names = c("data1", "data2")), dataset2 = structure(list(
data1 = structure(list(a = c(7, 8, 9), b = c(10, 11, 12)), .Names = c("a",
"b"), row.names = c(NA, -3L), class = "data.frame"), data2 = c("f",
"g", "h", "i", "j")), .Names = c("data1", "data2"))), .Names = c("dataset1",
"dataset2"))
I can concatenate the data1 elements like this:
> mystruct %>% map_dfr(~.x$data1)
a b
1 1 4
2 2 5
3 3 6
4 7 10
5 8 11
6 9 12
But I would like to add a "dataset" column, populated with the name of the list element the data was taken from:
dataset a b
1 dataset1 1 4
2 dataset1 2 5
3 dataset1 3 6
4 dataset2 7 10
5 dataset2 8 11
6 dataset2 9 12
Is there a way to do this nicely with the tidyverse? I'd also be open to data.table solutions.
Thanks, Allie
Provide an .id parameter to map_df, which will create a column giving the name of each list element:
map_df(mystruct, 'data1', .id='dataset')
# dataset a b
#1 dataset1 1 4
#2 dataset1 2 5
#3 dataset1 3 6
#4 dataset2 7 10
#5 dataset2 8 11
#6 dataset2 9 12
Or map_dfr should work as well:
map_dfr(mystruct, 'data1', .id='dataset')
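In purrr 1.0.0 and later, map_df and map_dfr are marked as superseded, and the documented replacement is map() followed by list_rbind(). A minimal sketch of the same idea in that style, assuming purrr >= 1.0.0 is installed and rebuilding mystruct in a shorter form (the result name is only illustrative):

```r
library(purrr)

# Same structure as mystruct in the question, written out compactly
mystruct <- list(
  dataset1 = list(data1 = data.frame(a = c(1, 2, 3), b = c(4, 5, 6)),
                  data2 = c("a", "b", "c", "d", "e")),
  dataset2 = list(data1 = data.frame(a = c(7, 8, 9), b = c(10, 11, 12)),
                  data2 = c("f", "g", "h", "i", "j"))
)

# map() pulls out each data1 by name and keeps the outer list names;
# list_rbind() row-binds them and stores those names in a "dataset" column
result <- list_rbind(map(mystruct, "data1"), names_to = "dataset")
print(result)
```

The names_to argument plays the role that .id plays in map_dfr.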
map_dfr has an .id argument:
mystruct %>% map_dfr(~ .x$data1, .id = "id")
giving:
id a b
1 dataset1 1 4
2 dataset1 2 5
3 dataset1 3 6
4 dataset2 7 10
5 dataset2 8 11
6 dataset2 9 12
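For comparison, the same result can be built without any packages, which makes explicit what the .id argument does: each piece gets a column holding its list name before the pieces are row-bound. This is only a sketch in base R; the helper names (pieces, result) are illustrative and not part of the original answers:

```r
# Same structure as mystruct in the question, written out compactly
mystruct <- list(
  dataset1 = list(data1 = data.frame(a = c(1, 2, 3), b = c(4, 5, 6)),
                  data2 = c("a", "b", "c", "d", "e")),
  dataset2 = list(data1 = data.frame(a = c(7, 8, 9), b = c(10, 11, 12)),
                  data2 = c("f", "g", "h", "i", "j"))
)

# Pair every list element with its name and prepend the name as a column;
# the single name is recycled to the number of rows in data1
pieces <- Map(
  function(x, nm) data.frame(dataset = nm, x$data1, stringsAsFactors = FALSE),
  mystruct, names(mystruct)
)

# Row-bind the pieces and reset the row names
result <- do.call(rbind, unname(pieces))
rownames(result) <- NULL
print(result)
```

Because the name is available inside the function, this is also a natural place to modify each data1 based on other data in the same list element before binding.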
Restructure as a "tidy" table with list columns...
library(data.table)
tabstruct = rbindlist(lapply(mystruct, lapply, list), idcol = TRUE)
# .id data1 data2
# 1: dataset1 <data.frame> a,b,c,d,e
# 2: dataset2 <data.frame> f,g,h,i,j
Then "unnest" data1:
tabstruct[, rbindlist(setNames(data1, .id), idcol = TRUE)]
# .id a b
# 1: dataset1 1 4
# 2: dataset1 2 5
# 3: dataset1 3 6
# 4: dataset2 7 10
# 5: dataset2 8 11
# 6: dataset2 9 12
Or unnest data2:
tabstruct[, .(val = unlist(data2)), by=.id]
# .id val
# 1: dataset1 a
# 2: dataset1 b
# 3: dataset1 c
# 4: dataset1 d
# 5: dataset1 e
# 6: dataset2 f
# 7: dataset2 g
# 8: dataset2 h
# 9: dataset2 i
# 10: dataset2 j
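If only data1 is needed, the intermediate list-column table can be skipped. A minimal sketch, assuming data.table is installed; it uses the idcol argument of rbindlist with a column name instead of TRUE, so the id column is called "dataset" rather than ".id":

```r
library(data.table)

# Same structure as mystruct in the question, written out compactly
mystruct <- list(
  dataset1 = list(data1 = data.frame(a = c(1, 2, 3), b = c(4, 5, 6)),
                  data2 = c("a", "b", "c", "d", "e")),
  dataset2 = list(data1 = data.frame(a = c(7, 8, 9), b = c(10, 11, 12)),
                  data2 = c("f", "g", "h", "i", "j"))
)

# lapply() extracts each data1 and keeps the outer list names;
# rbindlist() stacks them and writes those names into the idcol column
result <- rbindlist(lapply(mystruct, `[[`, "data1"), idcol = "dataset")
print(result)
```

The list-column approach above is still the more general one when both data1 and data2 need to be kept side by side.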
Here is an option to do this for multiple datasets in the list:
map(c('data1', 'data2'), ~
map2_df(mystruct, .x, ~ .x[[.y]], .id = 'id'))
#[[1]]
# id a b
#1 dataset1 1 4
#2 dataset1 2 5
#3 dataset1 3 6
#4 dataset2 7 10
#5 dataset2 8 11
#6 dataset2 9 12
#[[2]]
# A tibble: 5 x 3
# id dataset1 dataset2
# <chr> <chr> <chr>
#1 1 a f
#2 1 b g
#3 1 c h
#4 1 d i
#5 1 e j
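In the second result above, the data2 vectors end up side by side as columns rather than stacked with their dataset name. If a long format is wanted for data2 as well, one possible approach is to wrap each vector in a one-column data frame before binding. This is a sketch only, assuming purrr >= 1.0.0; the column name val and the name long_data2 are chosen here for illustration:

```r
library(purrr)

# Same structure as mystruct in the question, written out compactly
mystruct <- list(
  dataset1 = list(data1 = data.frame(a = c(1, 2, 3), b = c(4, 5, 6)),
                  data2 = c("a", "b", "c", "d", "e")),
  dataset2 = list(data1 = data.frame(a = c(7, 8, 9), b = c(10, 11, 12)),
                  data2 = c("f", "g", "h", "i", "j"))
)

# Turn each data2 vector into a one-column data frame so that
# row-binding stacks the values instead of spreading them into columns
long_data2 <- list_rbind(
  map(mystruct, function(x) data.frame(val = x$data2, stringsAsFactors = FALSE)),
  names_to = "id"
)
print(long_data2)
```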