Create a list of all values of a variable grouped by another variable in R

I have a data frame that contains two variables, like this:

df <- data.frame(group=c(1,1,1,2,2,3,3,4),
                  type=c("a","b","a", "b", "c", "c","b","a"))

> df
   group type
1      1    a
2      1    b
3      1    a
4      2    b
5      2    c
6      3    c
7      3    b
8      4    a

I want to produce a table that shows, for each group, the combination of types it has in the data frame as a single variable, e.g.

  group alltypes
1     1     a, b
2     2     b, c
3     3     b, c
4     4        a

The output should always list the types in the same order (e.g. groups 2 and 3 get the same result), and there should be no repetition (e.g. group 1 is not "a, b, a").

I tried doing this using dplyr and summarise, but I can't work out how to get it to meet these two conditions. The code I tried was:

> df %>%
+   group_by(group) %>%
+   summarise(
+     alltypes = paste(type, collapse=", ")
+   )
# A tibble: 4 × 2
  group alltypes
  <dbl>    <chr>
1     1  a, b, a
2     2     b, c
3     3     c, b
4     4        a

I also tried turning type into a set of individual counts, but I'm not sure if that's actually useful:

> df %>%
+   group_by(group, type) %>%
+   tally %>%
+   spread(type, n, fill=0)
Source: local data frame [4 x 4]
Groups: group [4]

  group     a     b     c
* <dbl> <dbl> <dbl> <dbl>
1     1     2     1     0
2     2     0     1     1
3     3     0     1     1
4     4     1     0     0

Any suggestions would be greatly appreciated.

I think you were very close. You could call the sort and unique functions inside paste to make sure your result meets both conditions: unique drops the repeated types and sort puts them in a consistent order.

df %>% group_by(group) %>% 
summarize(type = paste(sort(unique(type)),collapse=", "))

This returns:

# A tibble: 4 x 2
  group  type
  <int> <chr>
1     1  a, b
2     2  b, c
3     3  b, c
4     4     a
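
If dplyr is not available, the same idea can be sketched in base R with aggregate(). This is only an illustration of the sort/unique approach above, not part of the original answer. The name combos is made up for the example, and stringsAsFactors = FALSE is set explicitly as an assumption so that type stays a character column regardless of R version.

```r
# Same data as in the question; stringsAsFactors = FALSE keeps type as character
df <- data.frame(group = c(1, 1, 1, 2, 2, 3, 3, 4),
                 type = c("a", "b", "a", "b", "c", "c", "b", "a"),
                 stringsAsFactors = FALSE)

# For each group: drop repeated types, sort them, then join into one string
combos <- aggregate(type ~ group, data = df,
                    FUN = function(x) paste(sort(unique(x)), collapse = ", "))

print(combos)
```

Groups 2 and 3 end up with the same string because sort runs before paste, and group 1 has no repeated "a" because unique runs first.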

To expand on Florian's answer, this approach can be extended to generate an ordered list based on values in your data set. One example is determining the order of dates within each row:

library(lubridate)
library(tidyverse)

# Generate random dates
set.seed(123)
Date = ymd("2018-01-01") + sort(sample(1:200, 10))
A = ymd("2018-01-01") + sort(sample(1:200, 10))
B = ymd("2018-01-01") + sort(sample(1:200, 10))
C = ymd("2018-01-01") + sort(sample(1:200, 10))

# Combine to data set
data = bind_cols(as.data.frame(Date), as.data.frame(A), as.data.frame(B), as.data.frame(C))

# Get order of dates for each row
data %>%
        mutate(D = Date) %>%
        gather(key = Var, value = D, -Date) %>%
        arrange(Date, D) %>%
        group_by(Date) %>%
        summarize(Ord = paste(Var, collapse=">"))

This is somewhat tangential to the original question, but hopefully helpful to someone.
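
For readers who want to see the row-wise ordering on a small, fixed input, here is a base R sketch of the same idea. It is an assumption-laden illustration rather than the code above: the three-row data frame dates, the day offsets, and the helper matrix vals are all invented for the example, and D is simply a copy of Date so that it takes part in the ordering as it does in the tidyverse version.

```r
# Hypothetical toy data: three rows with fixed (not random) dates
start <- as.Date("2018-01-01")
dates <- data.frame(
  Date = start + c(10, 20, 30),
  A    = start + c(5, 40, 31),
  B    = start + c(12, 15, 29),
  C    = start + c(11, 25, 45)
)

# Convert the date columns to numbers so they can be compared row by row;
# D is a copy of Date, included in the ordering like the other columns
vals <- sapply(dates[c("A", "B", "C")], as.numeric)
vals <- cbind(vals, D = as.numeric(dates$Date))

# For each row, list the column names from earliest to latest date
dates$Ord <- apply(vals, 1, function(r) paste(names(r)[order(r)], collapse = ">"))

print(dates[c("Date", "Ord")])
```

With these made-up offsets the first row reads "A>D>C>B", meaning A is the earliest date in that row and B the latest.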
