Unique values for the 1st group, then the 1st and 2nd, and so on
I have a data frame with 5 different groups:
id group
1 L1 1
2 L2 1
3 L1 2
4 L3 2
5 L4 2
6 L3 3
7 L5 3
8 L6 3
9 L1 4
10 L4 4
11 L2 5
I would like to know whether it's possible to get the unique id values from the 1st group, then from the 1st and 2nd, then from the 1st, 2nd and 3rd, and so on, without a for loop. I'm looking for a way to do this with the dplyr or data.table package.
Expected result:
group id
1 1 c("L1", "L2")
2 1,2 c("L1", "L2", "L3", "L4")
3 1,2,3 c("L1", "L2", "L3", "L4", "L5", "L6")
4 1,2,3,4 c("L1", "L2", "L3", "L4", "L5", "L6")
5 1,2,3,4,5 c("L1", "L2", "L3", "L4", "L5", "L6")
Data:
df <- structure(list(id = c("L1", "L2", "L1", "L3", "L4", "L3", "L5",
"L6", "L1", "L4", "L2"), group = structure(c(1L, 1L, 2L, 2L,
2L, 3L, 3L, 3L, 4L, 4L, 5L), .Label = c("1", "2", "3", "4", "5"
), class = "factor")), .Names = c("id", "group"), row.names = c(NA,
-11L), class = "data.frame")
With base R, you can do:
# create the "growing" sets of groups
combi_groups <- lapply(seq_along(unique(df$group)), function(i) unique(df$group)[1:i])
# get the unique ID for each set of groups
uniq_ID <- setNames(lapply(combi_groups, function(x) unique(df$id[df$group %in% x])),
sapply(combi_groups, paste, collapse=","))
# $`1`
# [1] "L1" "L2"
# $`1,2`
# [1] "L1" "L2" "L3" "L4"
# $`1,2,3`
# [1] "L1" "L2" "L3" "L4" "L5" "L6"
# $`1,2,3,4`
# [1] "L1" "L2" "L3" "L4" "L5" "L6"
# $`1,2,3,4,5`
# [1] "L1" "L2" "L3" "L4" "L5" "L6"
If you want to format the result as in your expected output:
data.frame(group=sapply(combi_groups, paste, collapse=", "), id=sapply(uniq_ID, function(x) paste0("c(", paste0("\"", x, "\"", collapse=", "), ")")))
# group id
#1 1 c("L1", "L2")
#2 1, 2 c("L1", "L2", "L3", "L4")
#3 1, 2, 3 c("L1", "L2", "L3", "L4", "L5", "L6")
#4 1, 2, 3, 4 c("L1", "L2", "L3", "L4", "L5", "L6")
#5 1, 2, 3, 4, 5 c("L1", "L2", "L3", "L4", "L5", "L6")
Another formatting option, with one row per combination of group set and id:
data.frame(group=rep(names(uniq_ID), sapply(uniq_ID, length)), id=unlist(uniq_ID))
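To get a feel for this long format without running the whole answer, here is a small sketch. The list `uniq_ID_small` is a hypothetical stand-in, typed in by hand from the first two entries of the output above. The `use.names = FALSE` argument is an optional addition, not part of the original answer; it keeps `unlist` from generating row names such as `1,21`.

```r
# Hypothetical stand-in for the first two entries of uniq_ID
uniq_ID_small <- list("1"   = c("L1", "L2"),
                      "1,2" = c("L1", "L2", "L3", "L4"))

# One row per (group set, id) pair; lengths() gives the size of each set
long <- data.frame(group = rep(names(uniq_ID_small), lengths(uniq_ID_small)),
                   id    = unlist(uniq_ID_small, use.names = FALSE))
print(long)
```

This shape is probably the easier one to work with if the result is going to be joined or aggregated later.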
Or, if you want to have uniq_ID in a column:
library(data.table)
data.table(group=sapply(combi_groups, paste, collapse=", "), id=uniq_ID)
# group id
#1: 1 L1,L2
#2: 1, 2 L1,L2,L3,L4
#3: 1, 2, 3 L1,L2,L3,L4,L5,L6
#4: 1, 2, 3, 4 L1,L2,L3,L4,L5,L6
#5: 1, 2, 3, 4, 5 L1,L2,L3,L4,L5,L6
# id is a list column, so each cell holds the full character vector
data.table(group=sapply(combi_groups, paste, collapse=", "), id=uniq_ID)[2, id]
# [[1]]
# [1] "L1" "L2" "L3" "L4"
In a similar vein to the answer by @Cath, but using Reduce(..., accumulate = TRUE) to create the expanding window of groups. Then loop over the sets of groups with lapply to get the unique ids for each window:
grp <- Reduce(c, unique(df$group), accumulate = TRUE)
lapply(grp, function(x) unique(df$id[df$group %in% x]))
# [[1]]
# [1] "L1" "L2"
#
# [[2]]
# [1] "L1" "L2" "L3" "L4"
#
# [[3]]
# [1] "L1" "L2" "L3" "L4" "L5" "L6"
#
# [[4]]
# [1] "L1" "L2" "L3" "L4" "L5" "L6"
#
# [[5]]
# [1] "L1" "L2" "L3" "L4" "L5" "L6"
For naming and prettification, please refer to the nice answer by @Cath.
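To see what the expanding window looks like on its own, here is a minimal sketch on a plain character vector. The vector `groups` is a made-up stand-in for `unique(df$group)`; it is character rather than factor to keep the example independent of how `c()` treats factors in a given R version.

```r
# Hypothetical stand-in for the distinct group labels, in order
groups <- c("1", "2", "3")

# Each step appends the next label to everything accumulated so far
windows <- Reduce(c, groups, accumulate = TRUE)
str(windows)
```

The result is a list with one element per step: the first holds only the first label, and the last holds all of them.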
Another method is to use split and Reduce to feed the groups to union with accumulate=TRUE:
Reduce(union, split(df$id, df$group), accumulate=TRUE)
# [[1]]
# [1] "L1" "L2"
#
# [[2]]
# [1] "L1" "L2" "L3" "L4"
#
# [[3]]
# [1] "L1" "L2" "L3" "L4" "L5" "L6"
#
# [[4]]
# [1] "L1" "L2" "L3" "L4" "L5" "L6"
#
# [[5]]
# [1] "L1" "L2" "L3" "L4" "L5" "L6"
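The list returned this way is unnamed. One possible way to label each element with its cumulative set of groups, sketched here in base R, is to run a second Reduce over the group names. The variable names `ids_by_group` and `cum_ids` are illustrative, and the data frame is rebuilt from the question so that the snippet runs on its own.

```r
# Data from the question
df <- data.frame(
  id = c("L1", "L2", "L1", "L3", "L4", "L3", "L5", "L6", "L1", "L4", "L2"),
  group = factor(c(1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5)),
  stringsAsFactors = FALSE
)

# ids per group, in factor-level order
ids_by_group <- split(df$id, df$group)

# Running union of ids across groups
cum_ids <- Reduce(union, ids_by_group, accumulate = TRUE)

# Running labels "1", "1,2", "1,2,3", ... built from the group names
names(cum_ids) <- Reduce(function(a, b) paste(a, b, sep = ","),
                         names(ids_by_group), accumulate = TRUE)

cum_ids[["1,2"]]
```

Note that Reduce returns the first element unchanged, so if the first group could contain repeated ids, it may be worth applying unique to each group (for example with lapply) before accumulating.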