
Unnest list and concatenate in R

I want to unnest (flatten?) a list column in a tibble and concatenate each row's strings into one comma-separated string. Example data:

library(tidyverse)

tibble(person = c("Alice", "Bob", "Mary"), 
          score = list(c("Red", "Green", "Blue"), c("Orange", "Green", "Yellow"), "Blue"))

# A tibble: 3 x 2
  person score    
  <chr>  <list>   
1 Alice  <chr [3]>
2 Bob    <chr [3]>
3 Mary   <chr [1]>

Expected output:

tibble(person = c("Alice", "Bob", "Mary"),
       score = c("Red, Green, Blue", "Orange, Green, Yellow", "Blue" ))

# A tibble: 3 x 2
  person score                
  <chr>  <chr>                
1 Alice  Red, Green, Blue     
2 Bob    Orange, Green, Yellow
3 Mary   Blue   

I suspect there's a very neat tidyverse solution to this, but I've been unable to find an answer after extensive searching; I'm probably using the wrong search terms (unnest/concatenate). A tidyverse solution would be preferred. Thank you.

You can collapse each list element into one string by applying toString() with purrr::map_chr():

library(dplyr)
library(purrr)

df %>%
  mutate(score = map_chr(score, toString))

# A tibble: 3 x 2
  person score                
  <chr>  <chr>                
1 Alice  Red, Green, Blue     
2 Bob    Orange, Green, Yellow
3 Mary   Blue                

If you have multiple list columns you can do:

df <- tibble(person = c("Alice", "Bob", "Mary"), 
       score1 = list(c("Red", "Green", "Blue"), c("Orange", "Green", "Yellow"), "Blue"),
       score2 = rev(list(c("Red", "Green", "Blue"), c("Orange", "Green", "Yellow"), "Blue")))

df %>%
  mutate_if(is.list, ~ map_chr(.x, toString))

# A tibble: 3 x 3
  person score1                score2               
  <chr>  <chr>                 <chr>                
1 Alice  Red, Green, Blue      Blue                 
2 Bob    Orange, Green, Yellow Orange, Green, Yellow
3 Mary   Blue                  Red, Green, Blue     

A simple way is to unnest the data into long format and then collapse it by group.

library(dplyr)

df %>%
  tidyr::unnest(score) %>%
  group_by(person) %>%
  summarise(score = toString(score))

# person score                
#  <chr>  <chr>                
#1 Alice  Red, Green, Blue     
#2 Bob    Orange, Green, Yellow
#3 Mary   Blue        

Another option is rowwise(), which evaluates toString() once per row:

df %>% rowwise() %>% mutate(score = toString(score))
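
One thing to keep in mind with this approach: the result stays rowwise-grouped, so later verbs keep operating one row at a time. Below is a minimal sketch, assuming a recent dplyr (1.0.0 or later) and the same df as in the Data block; the name res is only for illustration. Adding ungroup() at the end returns a regular tibble.

```r
library(dplyr)
library(tibble)

df <- tibble(person = c("Alice", "Bob", "Mary"),
             score = list(c("Red", "Green", "Blue"), c("Orange", "Green", "Yellow"), "Blue"))

res <- df %>%
  rowwise() %>%                        # treat each row as its own group
  mutate(score = toString(score)) %>%  # score is that row's character vector here
  ungroup()                            # drop the rowwise grouping again

res$score
```

If you leave out ungroup(), the output still prints the same values but carries a "Rowwise:" header, which can surprise later summarise() or mutate() calls.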

Base R solution 1:

df$score <- sapply(df$score, toString)

Base R solution 2:

df$score <- unlist(lapply(df$score, paste, collapse = ", "))
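
A slightly stricter variation, not taken from the answers above, is vapply(). It declares up front that every element must collapse to exactly one string, and it still returns a character vector when the list is empty, whereas sapply() returns an empty list in that case. A small sketch using a plain list in place of df$score (the variable names are only for illustration):

```r
score <- list(c("Red", "Green", "Blue"), c("Orange", "Green", "Yellow"), "Blue")

# vapply() checks that each result is a single character string
collapsed <- vapply(score, toString, character(1))
collapsed

# Edge case: with an empty list, sapply() gives list(), vapply() gives character(0)
empty_sapply <- sapply(list(), toString)
empty_vapply <- vapply(list(), toString, character(1))
```

For a tibble you would assign the result back in the same way, e.g. df$score <- vapply(df$score, toString, character(1)).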

Data:

df <- tibble(person = c("Alice", "Bob", "Mary"), 
       score = list(c("Red", "Green", "Blue"), c("Orange", "Green", "Yellow"), "Blue"))

Here is a general approach for (recent versions of) the tidyverse that does not interfere with grouping:

df %>% mutate(across(where(is.list), ~ sapply(., toString)))
