
Create a list of all values of a variable grouped by another variable in R

I have a data frame that contains two variables, like this:

df <- data.frame(group=c(1,1,1,2,2,3,3,4),
                  type=c("a","b","a", "b", "c", "c","b","a"))

> df
   group type
1      1    a
2      1    b
3      1    a
4      2    b
5      2    c
6      3    c
7      3    b
8      4    a

I want to produce a table showing, for each group, the combination of types it has in the data frame as a single variable, e.g.:

  group alltypes
1     1     a, b
2     2     b, c
3     3     b, c
4     4        a

The output should always list the types in the same order (e.g. groups 2 and 3 get the same result), and there should be no repetition (e.g. group 1 is "a, b", not "a, b, a").

I tried doing this using dplyr and summarise, but I can't work out how to make it meet these two conditions. The code I tried was:

> df %>%
+   group_by(group) %>%
+   summarise(
+     alltypes = paste(type, collapse=", ")
+   )
# A tibble: 4 × 2
  group alltypes
  <dbl>    <chr>
1     1  a, b, a
2     2     b, c
3     3     c, b
4     4        a

I also tried turning type into a set of individual counts, but I'm not sure whether that's actually useful:

> df %>%
+   group_by(group, type) %>%
+   tally %>%
+   spread(type, n, fill=0)
Source: local data frame [4 x 4]
Groups: group [4]

  group     a     b     c
* <dbl> <dbl> <dbl> <dbl>
1     1     2     1     0
2     2     0     1     1
3     3     0     1     1
4     4     1     0     0

Any suggestions would be greatly appreciated.

I think you were very close. You can wrap type in the unique and sort functions so that the result meets both conditions: unique drops repeated values within each group, and sort puts the remaining values in a consistent order:

df %>%
  group_by(group) %>%
  summarize(type = paste(sort(unique(type)), collapse = ", "))

returns:

# A tibble: 4 x 2
  group  type
  <int> <chr>
1     1  a, b
2     2  b, c
3     3  b, c
4     4     a
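
For anyone who would rather not load dplyr, the same unique-then-sort idea can be sketched in base R with aggregate. This is only an illustrative alternative, not part of the answer above; the name base_result is made up for the example, and the result column keeps the name type rather than alltypes.

```r
# Same data as in the question
df <- data.frame(group = c(1, 1, 1, 2, 2, 3, 3, 4),
                 type  = c("a", "b", "a", "b", "c", "c", "b", "a"))

# For each group: drop repeated types, sort them, and paste them into one string
base_result <- aggregate(
  type ~ group,
  data = df,
  FUN  = function(x) paste(sort(unique(x)), collapse = ", ")
)

base_result
```

The anonymous function does the same work as the expression inside summarize, so both approaches should agree on the question's data.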

To expand on Florian's answer, the same approach can be extended to generate an ordered list based on the values in your data set. One example is determining the order of the dates within each row:

library(lubridate)
library(tidyverse)

# Generate random dates
set.seed(123)
Date = ymd("2018-01-01") + sort(sample(1:200, 10))
A = ymd("2018-01-01") + sort(sample(1:200, 10))
B = ymd("2018-01-01") + sort(sample(1:200, 10))
C = ymd("2018-01-01") + sort(sample(1:200, 10))

# Combine into one data set
data = data.frame(Date, A, B, C)

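# Get order of dates for each row
data %>%
  mutate(D = Date) %>%                       # copy Date so it takes part in the ordering as "D"
  gather(key = Var, value = D, -Date) %>%    # long format: one row per (Date, variable) pair
  arrange(Date, D) %>%                       # within each Date, sort by the gathered date value
  group_by(Date) %>%
  summarize(Ord = paste(Var, collapse=">"))  # e.g. "A>D>B>C"; actual strings depend on the sampled dates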

Somewhat tangential to the original question, but hopefully helpful to someone.
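
Because the dates above come from sample, the resulting strings depend on the seed and the R version. As a minimal sketch, here is the same idea on a small fixed data set, using pivot_longer, which supersedes gather in current versions of tidyr. The data frame small and its values are made up for illustration. Unlike the code above, this version does not copy Date into the ordering, and it assumes each Date value identifies exactly one row.

```r
library(dplyr)
library(tidyr)

# Hypothetical three-row data set with fixed dates (illustration only)
small <- data.frame(
  Date = as.Date(c("2018-01-10", "2018-02-01", "2018-03-15")),
  A    = as.Date(c("2018-01-05", "2018-02-20", "2018-03-01")),
  B    = as.Date(c("2018-01-20", "2018-02-10", "2018-03-10"))
)

ord <- small %>%
  pivot_longer(cols = c(A, B), names_to = "Var", values_to = "D") %>%  # one row per (Date, variable)
  arrange(Date, D) %>%                                                 # earliest date first within each Date
  group_by(Date) %>%
  summarise(Ord = paste(Var, collapse = ">"))                          # variable names in date order

ord
```

In the second row B falls before A, so its Ord value is "B>A"; in the other two rows it is "A>B".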

