Summarise multiple variables to strings in dplyr

I want to summarise two variables into strings. Say this is my first dataset:

 #visit

 id   source1    source2
 1    a          t
 2    c          l
 3    c          z
 1    b          x

second dataset:

 #transaction
 id    transactions 

 1       1
 3       2
 1       2

I'd like to join these datasets and collapse the values into strings at the same time.

I can do this for one variable (say, source1):

library(dplyr)

# Join on id, then collapse the unique values per id into one string
left_join(visit, transaction, by = "id") %>%
  group_by(id) %>%
  summarise(Source = toString(unique(source1)),
            transactions = toString(unique(transactions)))

This gives me the following output:

id     source       transactions
 1       a,b         1,2 
 2        c           NA
 3        c           2

But I want to summarise both source variables, so my desired output would be something like this:

 id     source       transactions
 1       a,t > b,x   1,2 
 2       c,l         NA
 3       c,z         2

You can paste the two variables together, using both sep and collapse to combine:

visit %>% left_join(transaction) %>% 
    group_by(id) %>% 
    summarise(source = paste(unique(source1), unique(source2), sep = ', ', collapse = ' > '), 
              transaction = na_if(toString(unique(na.omit(transactions))), ''))

## # A tibble: 3 × 3
##      id      source transaction
##   <int>       <chr>       <chr>
## 1     1 a, t > b, x        1, 2
## 2     2        c, l        <NA>
## 3     3        c, z           2
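
One caveat worth noting: calling unique() on source1 and source2 separately can break the pairing when one column repeats a value within an id and the other does not. A minimal sketch of a variant that pastes each row's pair first and only then deduplicates, assuming the visit and transaction data from the question (rebuilt here so the snippet runs on its own; the name `result` is only illustrative):

```r
library(dplyr)

# Rebuild the question's data so the example is self-contained
visit <- data.frame(
  id      = c(1, 2, 3, 1),
  source1 = c("a", "c", "c", "b"),
  source2 = c("t", "l", "z", "x"),
  stringsAsFactors = FALSE
)
transaction <- data.frame(id = c(1, 3, 1), transactions = c(1, 2, 2))

result <- visit %>%
  left_join(transaction, by = "id") %>%
  group_by(id) %>%
  summarise(
    # Paste row-wise first, deduplicate the pairs, then collapse them
    source = paste(unique(paste(source1, source2, sep = ", ")), collapse = " > "),
    # Drop NAs before toString(); an empty result is turned into a real NA
    transaction = na_if(toString(unique(na.omit(transactions))), "")
  )
```

For this data the result matches the output shown above. Recent dplyr versions may warn about a many-to-many join, since id 1 appears twice in both tables; that is expected here.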

Beware, though: paste and toString stupidly coerce NAs to the literal string "NA". You may want to wrap the input in na.omit first, or use na_if to turn an empty result back into a real NA.
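
A small sketch of that NA behaviour, using a made-up vector `x` (the variable names are illustrative and not part of the question's data):

```r
x <- c(1, NA, 2)

with_na <- toString(x)            # "1, NA, 2": NA became text
pasted  <- paste(x, collapse = "-")  # "1-NA-2"

cleaned <- toString(na.omit(x))   # "1, 2": NA dropped before collapsing

# If every value is NA, nothing is left and the result is an empty string
empty <- toString(na.omit(c(NA, NA)))   # ""

# dplyr::na_if() converts that empty string back into a real NA
restored <- dplyr::na_if(empty, "")
```

This is why the answer combines both: na.omit keeps "NA" out of the string, and na_if keeps an all-missing group from showing up as "".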

