Split a list column into multiple columns

Question

I have a data table where the last column is a column of lists. Below is how it looks:

Col1 | Col2 | ListCol
--------------------------
 na  |  na  | [obj1, obj2]
 na  |  na  | [obj1, obj2]
 na  |  na  | [obj1, obj2]

What I want is:

Col1 | Col2 | Col3  | Col4
--------------------------
 na  |  na  | obj1  | obj2
 na  |  na  | obj1  | obj2
 na  |  na  | obj1  | obj2

I know that all the lists have the same number of elements.

Edit:

Every element in ListCol is a list with two elements.

Answer 1

Here is one approach, using unnest and tidyr::spread:

library(dplyr)
library(tidyr)

#example df
df <- tibble(a=c(1, 2, 3), b=list(c(2, 3), c(4, 5), c(6, 7)))

df %>% unnest(b) %>% 
       group_by(a) %>% 
       mutate(col=seq_along(a)) %>% #add a column indicator
       spread(key=col, value=b)

      a   `1`   `2`
  <dbl> <dbl> <dbl>
1    1.    2.    3.
2    2.    4.    5.
3    3.    6.    7.

Answer 2

Currently, the tidyverse answer would be:

library(dplyr)
library(tidyr)
data %>% unnest_wider(ListCol)
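
One caveat, offered as a sketch rather than a definitive statement: as far as I know, newer tidyr releases expect `names_sep` when the list elements are unnamed, while older releases generated column names automatically and printed a message. The `data` tibble below is made up to match the layout in the question, and the `"_"` separator is just an illustrative choice.

```r
library(tidyr)

# Hypothetical data shaped like the question: two plain columns plus a list column
data <- tibble(
  Col1 = c(NA, NA, NA),
  Col2 = c(NA, NA, NA),
  ListCol = list(c("obj1", "obj2"), c("obj1", "obj2"), c("obj1", "obj2"))
)

# names_sep builds the new column names from the list column's name plus the
# element position, which covers the case where the elements have no names
wide <- unnest_wider(data, ListCol, names_sep = "_")
print(wide)
```

The new columns can then be renamed to Col3 and Col4 if the exact names from the question are needed.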

Answer 3

Comparison of two great answers

There are two great one-liner suggestions in this thread:

(1) `cbind(df[1], t(data.frame(df$b)))`

This is from @Onyambu, using base R. To get to this answer you need to know that a data frame is a list, plus a bit of creativity.

(2) `df %>% unnest_wider(b)`

This is from @iago, using tidyverse. You need extra packages and to know all the nest verbs, but arguably it is more readable.
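
To make the base R one-liner less magical, here is a minimal sketch that unpacks it step by step on toy data. The intermediate names (`cols`, `m`, `wide`) and the `do.call(rbind, ...)` variant at the end are my own additions for illustration, not something from the thread.

```r
# Toy data shaped like the df used in this thread: `b` is a list column
# of length-2 numeric vectors
df <- data.frame(a = 1:3)
df$b <- list(c(2, 3), c(4, 5), c(6, 7))

# Step 1: data.frame() treats each list element as a column,
# so three elements of length 2 give a 2 x 3 data frame
cols <- data.frame(df$b)

# Step 2: t() flips it into a 3 x 2 matrix, one row per original row
m <- t(cols)

# Step 3: bind the matrix next to the remaining column
wide <- cbind(df[1], m)

# Variant (an assumption, not from the thread): stack the elements as rows
# directly with rbind and choose explicit column names
m2 <- do.call(rbind, df$b)
colnames(m2) <- c("b1", "b2")
wide2 <- cbind(df[1], m2)
print(wide2)
```

Both routes rely on every list element having the same length, which the question guarantees.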

Now let's compare performance:

library(dplyr)
library(tidyr)
library(purrr)
library(microbenchmark)

N <- 100
df <- tibble(a = 1:N, b = map2(1:N, 1:N, c))

tidy_foo <- function() suppressMessages(df %>% unnest_wider(b))
base_foo <- function() cbind(df[1],t(data.frame(df$b))) %>% as_tibble # To be fair
  
microbenchmark(tidy_foo(), base_foo())

Unit: milliseconds
       expr      min        lq      mean    median       uq      max neval
 tidy_foo() 102.4388 108.27655 111.99571 109.39410 113.1377 194.2122   100
 base_foo()   4.5048   4.71365   5.41841   4.92275   5.2519  13.1042   100

Ouch!

The base R solution is about 20 times faster.

Answer 4

Here's an option with data.table and base::unlist.

library(data.table)

DT <- data.table(a = list(1, 2, 3),
                 b = list(list(1, 2),
                          list(2, 1),
                          list(1, 1)))

for (i in 1:nrow(DT)) {
  set(
    DT,
    i = i,
    j = c('b1', 'b2'),
    value = unlist(DT[i][['b']], recursive = FALSE)
  )
}
DT

This requires a for loop over every row... Not ideal and very anti-data.table. I wonder if there's some way to avoid creating the list column in the first place...
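
For what it's worth, a loop-free variant seems possible. This is only a sketch under the assumption that every element of `b` is a list of the same length holding scalars: `rbindlist()` stacks those inner lists into a two-column table, which `:=` can then assign by reference. The column names `b1` and `b2` are carried over from the loop version.

```r
library(data.table)

DT <- data.table(a = list(1, 2, 3),
                 b = list(list(1, 2),
                          list(2, 1),
                          list(1, 1)))

# rbindlist() turns the list of length-2 lists into a table with one row
# per original row; its two columns are assigned as b1 and b2 in one step
DT[, c("b1", "b2") := rbindlist(b)]
DT
```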

Answer 5

@Alec data.table offers the tstrsplit function to split a column into multiple columns.

DT = data.table(x=c("A/B", "A", "B"), y=1:3)
DT[]

#     x y
#1: A/B 1
#2:   A 2
#3:   B 3

DT[, c("c1") := tstrsplit(x, "/", fixed=TRUE, keep=1L)][] # keep only first

#     x y c1
#1: A/B 1  A
#2:   A 2  A
#3:   B 3  B

DT[, c("c1", "c2") := tstrsplit(x, "/", fixed=TRUE)][]

#     x y c1   c2
#1: A/B 1  A    B
#2:   A 2  A <NA>
#3:   B 3  B <NA>

Answer summary

Answer 1: 9 (accepted), 2018-06-15 19:36:24
Answer 2: 9, 2020-11-06 09:15:42
Answer 3: 3, 2020-11-06 09:58:29
Answer 4: 1, 2018-06-15 20:36:18
Answer 5: 1, 2021-03-18 16:22:57
Answer 6: 0, 2021-03-11 22:47:16