Combine multiple columns into vector by row with dplyr

I am trying to combine multiple columns into a single cell for each row and then remove missing values.

Sample data:

df <- data.frame(a=c("a", "b", "c", "d"),
                 b=c(NA, "a", "b", "c"),
                 c=c("a", "b", "e", "g"))

Attempt:

df %>% rowwise() %>%
mutate(collapse=as.character(paste(a,b,c, collapse=",")),
       collapse_nona=na.omit(collapse))

Output:

# A tibble: 4 x 5
  a     b     c     collapse                collapse_nona         
* <fct> <fct> <fct> <chr>                   <chr>                 
1 a     NA    a     a NA a,b a b,c b e,d c… a NA a,b a b,c b e,d …
2 b     a     b     a NA a,b a b,c b e,d c… a NA a,b a b,c b e,d …
3 c     b     e     a NA a,b a b,c b e,d c… a NA a,b a b,c b e,d …
4 d     c     g     a NA a,b a b,c b e,d c… a NA a,b a b,c b e,d …

1) The collapse column does not hold one value per row; the whole column appears in every cell.

2) Cells in the collapse column do not behave like a vector, so na.omit has nothing to remove.

Desired output:

  a     b     c     collapse                collapse_nona         
* <fct> <fct> <fct> <chr>                   <chr>                 
1 a     NA    a     a NA a                  a a
2 b     a     b     b a b                   b a b
3 c     b     e     c b e                   c b e
4 d     c     g     d c g                   d c g

Thank you

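tidyr's unite has an na.rm argument, which is FALSE by default. Setting it to TRUE drops missing values before the columns are joined. In the code below the columns are first converted to character with mutate_all(as.character).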

library(tidyr)
library(dplyr)
df %>%
   mutate_all(as.character) %>%
   unite(collapse, a, b, c, remove = FALSE, sep = " ") %>%
   unite(collapse_nona, a, b, c, remove = FALSE, sep = " ", na.rm = TRUE) %>%
   select(names(df), everything())
#   a    b c collapse collapse_nona
#1 a <NA> a   a NA a           a a
#2 b    a b    b a b         b a b
#3 c    b e    c b e         c b e
#4 d    c g    d c g         d c g

Or use paste with str_remove_all (from stringr). Note that paste and str_c are vectorized, so there is no need to loop over each row with rowwise.

library(stringr)
df %>%
     mutate(collapse = paste(a, b, c),
            # strip the literal text "NA" together with one adjacent space
            collapse_nona = str_remove_all(collapse, "\\sNA|NA\\s"))
#  a    b c collapse collapse_nona
#1 a <NA> a   a NA a           a a
#2 b    a b    b a b         b a b
#3 c    b e    c b e         c b e
#4 d    c g    d c g         d c g
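
One caveat with the pattern above: it matches the letters "NA" next to a space wherever they occur, so a genuine value that ends in "NA" would be clipped as well. The following is a minimal base R sketch of that effect, together with one possible workaround that drops missing values before pasting. The helper name paste_non_na and the "DNA"/"RNA" values are made up for illustration and are not part of the original answer.

```r
# A value that legitimately ends in "NA" is damaged by the pattern.
x <- paste("DNA", NA, "RNA")            # "DNA NA RNA"
clipped <- gsub("\\sNA|NA\\s", "", x)   # same pattern, base R gsub

# Hypothetical helper: drop missing values first, then paste the rest.
paste_non_na <- function(...) {
  vals <- c(...)
  paste(vals[!is.na(vals)], collapse = " ")
}
safe <- paste_non_na("DNA", NA, "RNA")

# Applied row by row to the sample data (character columns assumed).
df <- data.frame(a = c("a", "b", "c", "d"),
                 b = c(NA, "a", "b", "c"),
                 c = c("a", "b", "e", "g"),
                 stringsAsFactors = FALSE)
collapse_nona <- mapply(paste_non_na, df$a, df$b, df$c, USE.NAMES = FALSE)

print(clipped)        # "DRNA"
print(safe)           # "DNA RNA"
print(collapse_nona)  # "a a" "b a b" "c b e" "d c g"
```

For the sample data in the question the regular expression is fine, since no value contains "NA"; the helper only matters when real values might.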

Another option is pmap (from purrr) to loop over each row, remove the NA elements with na.omit, and then join the rest with paste or str_c (from stringr).

library(dplyr)
library(stringr)
library(purrr)
df %>%
     mutate_all(as.character) %>% 
     mutate(collapse_nona = pmap_chr(., ~ c(...) %>%
                na.omit %>%
                str_c(collapse=" "))) 
#  a    b c collapse_nona
#1 a <NA> a           a a
#2 b    a b         b a b
#3 c    b e         c b e
#4 d    c g         d c g

I think the core issue is that you don't want collapse, you want sep. With sep, the rowwise calculation is unnecessary. Also, paste converts NA to the string "NA", so na.omit cannot remove it afterwards.

df %>% 
   mutate(collapse = paste(a,b,c, sep = " "), collapse_nona = gsub("NA", "", collapse))

  a    b c collapse collapse_nona
1 a <NA> a   a NA a          a  a
2 b    a b    b a b         b a b
3 c    b e    c b e         c b e
4 d    c g    d c g         d c g
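
To make the sep versus collapse distinction concrete, here is a small base R sketch on toy vectors that stand in for the first two rows of the sample data. The variable names are chosen for illustration only.

```r
# Toy vectors standing in for the first two rows of the columns in df.
a <- c("a", "b")
b <- c(NA, "a")
c_col <- c("a", "b")

# sep joins the arguments element by element: one string per row.
by_row <- paste(a, b, c_col, sep = " ")

# collapse additionally joins all of those strings into a single string.
collapsed <- paste(a, b, c_col, collapse = ",")

# paste() has already turned the missing value into the text "NA",
# so is.na() finds nothing and na.omit() would drop nothing.
any_missing <- any(is.na(by_row))

print(by_row)       # "a NA a" "b a b"
print(collapsed)    # "a NA a,b a b"
print(any_missing)  # FALSE
```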

I think this does it. You could play around with the sep argument in str_c.

library(dplyr)
library(stringr)
df %>% 
  mutate(collapse = str_c(str_replace_na(a), str_replace_na(b), str_replace_na(c), sep = " "),
         collapse_nona = str_c(str_replace_na(a, ""), str_replace_na(b, ""), str_replace_na(c,""), sep = " "))

  a    b c collapse collapse_nona
1 a <NA> a   a NA a          a  a
2 b    a b    b a b         b a b
3 c    b e    c b e         c b e
4 d    c g    d c g         d c g
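
As the first row of the output shows, replacing NA with an empty string leaves a double space ("a  a"). If that matters, the whitespace can be tidied afterwards. A minimal base R sketch follows, assuming single spaces are wanted between values; stringr's str_squish should give the same result for strings like these.

```r
# Strings as they might come out of the str_c() call above.
collapse_nona <- c("a  a", "b a b", " c e", "d c ")

# Collapse runs of whitespace to one space, then trim both ends.
tidy <- trimws(gsub("\\s+", " ", collapse_nona))

print(tidy)  # "a a" "b a b" "c e" "d c"
```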
