Unique values for the 1st group, then the 1st and 2nd, and so on

I have a data frame with 5 different groups:

   id group
1  L1     1
2  L2     1
3  L1     2
4  L3     2
5  L4     2
6  L3     3
7  L5     3
8  L6     3
9  L1     4
10 L4     4
11 L2     5

I would like to know whether it's possible, without a for loop, to get the unique id values from the 1st group, then the 1st and 2nd, then the 1st, 2nd and 3rd, and so on. I'm looking for a way to do this with the dplyr or data.table package.

Expected results:

    group      id
1   1          c("L1", "L2")
2   1,2        c("L1", "L2", "L3", "L4")
3   1,2,3      c("L1", "L2", "L3", "L4", "L5", "L6")
4   1,2,3,4    c("L1", "L2", "L3", "L4", "L5", "L6")
5   1,2,3,4,5  c("L1", "L2", "L3", "L4", "L5", "L6")

Data:

df <- structure(list(id = c("L1", "L2", "L1", "L3", "L4", "L3", "L5", 
"L6", "L1", "L4", "L2"), group = structure(c(1L, 1L, 2L, 2L, 
2L, 3L, 3L, 3L, 4L, 4L, 5L), .Label = c("1", "2", "3", "4", "5"
), class = "factor")), .Names = c("id", "group"), row.names = c(NA, 
-11L), class = "data.frame")

With base R, you can do:

# create the "growing" sets of groups
combi_groups <- lapply(seq_along(unique(df$group)), function(i) unique(df$group)[1:i])

# get the unique ID for each set of groups
uniq_ID <- setNames(lapply(combi_groups, function(x) unique(df$id[df$group %in% x])), 
                    sapply(combi_groups, paste, collapse=","))

# $`1`
# [1] "L1" "L2"

# $`1,2`
# [1] "L1" "L2" "L3" "L4"

# $`1,2,3`
# [1] "L1" "L2" "L3" "L4" "L5" "L6"

# $`1,2,3,4`
# [1] "L1" "L2" "L3" "L4" "L5" "L6"

# $`1,2,3,4,5`
# [1] "L1" "L2" "L3" "L4" "L5" "L6" 
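The snippet above assumes the question's data is stored in a data frame named df. As a minimal, self-contained sketch of the same two steps, they can be wrapped in a small helper. The helper name cumulative_unique_ids is made up here for illustration and is not part of the original answer:

```r
# Rebuild the question's data; the answer assumes it is called `df`.
df <- data.frame(
  id = c("L1", "L2", "L1", "L3", "L4", "L3", "L5", "L6", "L1", "L4", "L2"),
  group = factor(c(1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: unique ids for group 1, groups 1-2, groups 1-3, ...
cumulative_unique_ids <- function(ids, groups) {
  grp <- unique(as.character(groups))  # groups in order of appearance
  # "growing" sets of groups: first 1, then 1 and 2, and so on
  windows <- lapply(seq_along(grp), function(i) grp[seq_len(i)])
  # unique ids found in each set of groups
  out <- lapply(windows, function(w) unique(ids[groups %in% w]))
  setNames(out, vapply(windows, paste, character(1), collapse = ","))
}

res <- cumulative_unique_ids(df$id, df$group)
lengths(res)  # number of distinct ids seen so far, per window
```

The lapply calls still iterate internally, so this avoids writing an explicit for loop rather than avoiding iteration altogether.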

If you want to format as in your expected output:

data.frame(group=sapply(combi_groups, paste, collapse=", "), id=sapply(uniq_ID, function(x) paste0("c(", paste0("\"", x, "\"", collapse=", "), ")")))
#          group                                    id
#1             1                         c("L1", "L2")
#2          1, 2             c("L1", "L2", "L3", "L4")
#3       1, 2, 3 c("L1", "L2", "L3", "L4", "L5", "L6")
#4    1, 2, 3, 4 c("L1", "L2", "L3", "L4", "L5", "L6")
#5 1, 2, 3, 4, 5 c("L1", "L2", "L3", "L4", "L5", "L6")

Another formatting option, with one row per group set and id:

data.frame(group=rep(names(uniq_ID), sapply(uniq_ID, length)), id=unlist(uniq_ID))
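No output is shown for this long format, so here is a small sketch of what it produces. To keep the block runnable on its own, uniq_ID is rebuilt by hand from the result listed earlier. It uses lengths() in place of sapply(uniq_ID, length), and use.names = FALSE to drop the names that unlist() would otherwise attach to each element. Both are choices made for this sketch, not part of the original answer:

```r
# Cumulative sets copied from the result shown earlier, so the block is standalone.
uniq_ID <- list(
  "1"         = c("L1", "L2"),
  "1,2"       = c("L1", "L2", "L3", "L4"),
  "1,2,3"     = c("L1", "L2", "L3", "L4", "L5", "L6"),
  "1,2,3,4"   = c("L1", "L2", "L3", "L4", "L5", "L6"),
  "1,2,3,4,5" = c("L1", "L2", "L3", "L4", "L5", "L6")
)

# One row per (group set, id) pair
long <- data.frame(
  group = rep(names(uniq_ID), lengths(uniq_ID)),  # repeat each label once per id
  id    = unlist(uniq_ID, use.names = FALSE),     # flatten the ids, without names
  stringsAsFactors = FALSE
)

head(long, 6)
```

This shape is convenient when the result needs to be joined or filtered like an ordinary table.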

Or, if you want to have uniq_ID in a list column:

library(data.table)
data.table(group=sapply(combi_groups, paste, collapse=", "), id=uniq_ID)
#           group                id
#1:             1             L1,L2
#2:          1, 2       L1,L2,L3,L4
#3:       1, 2, 3 L1,L2,L3,L4,L5,L6
#4:    1, 2, 3, 4 L1,L2,L3,L4,L5,L6
#5: 1, 2, 3, 4, 5 L1,L2,L3,L4,L5,L6

data.table(group=sapply(combi_groups, paste, collapse=", "), id=uniq_ID)[2, id]
# [[1]]
# [1] "L1" "L2" "L3" "L4"

In a similar vein to @Cath's answer, but using Reduce(..., accumulate = TRUE) to create the expanding window of groups (the data frame is called d here). Then loop over the sets of groups with lapply to get the unique ids for each window:

grp <- Reduce(c, unique(d$group), accumulate = TRUE)

lapply(grp, function(x) unique(d$id[d$group %in% x]))
# [[1]]
# [1] "L1" "L2"
# 
# [[2]]
# [1] "L1" "L2" "L3" "L4"
# 
# [[3]]
# [1] "L1" "L2" "L3" "L4" "L5" "L6"
# 
# [[4]]
# [1] "L1" "L2" "L3" "L4" "L5" "L6"
# 
# [[5]]
# [1] "L1" "L2" "L3" "L4" "L5" "L6"

For naming and prettification, please refer to the nice answer by @Cath.
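As a hedged sketch, the Reduce approach can also be named directly. Here the groups are converted to character before being combined. This is a precaution added in this sketch because, as far as I know, c() applied to factors returns a factor since R 4.1.0 but integer codes in earlier versions. The data frame d is rebuilt from the question's data:

```r
# The question's data, named `d` as in the answer above.
d <- data.frame(
  id = c("L1", "L2", "L1", "L3", "L4", "L3", "L5", "L6", "L1", "L4", "L2"),
  group = factor(c(1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5)),
  stringsAsFactors = FALSE
)

# Expanding windows of group labels: "1", then "1" "2", then "1" "2" "3", ...
grp <- Reduce(c, as.character(unique(d$group)), accumulate = TRUE)

# Unique ids within each window
res <- lapply(grp, function(x) unique(d$id[d$group %in% x]))

# Label each element with the groups it covers
names(res) <- vapply(grp, paste, character(1), collapse = ",")

res[["1,2,3"]]
```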

Another method is to use split and Reduce, feeding the groups to union with accumulate=TRUE:

Reduce(union, split(df$id, df$group), accumulate=TRUE)
# [[1]]
# [1] "L1" "L2"
# 
# [[2]]
# [1] "L1" "L2" "L3" "L4"
# 
# [[3]]
# [1] "L1" "L2" "L3" "L4" "L5" "L6"
# 
# [[4]]
# [1] "L1" "L2" "L3" "L4" "L5" "L6"
# 
# [[5]]
# [1] "L1" "L2" "L3" "L4" "L5" "L6"
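Two details of the split/Reduce approach are worth a sketch. With accumulate = TRUE, Reduce returns the first element unchanged, so duplicates inside the first group would survive unless each group is deduplicated first. Also, split orders the pieces by factor level rather than by order of appearance. The variable names per_group and res below are illustrative only:

```r
df <- data.frame(
  id = c("L1", "L2", "L1", "L3", "L4", "L3", "L5", "L6", "L1", "L4", "L2"),
  group = factor(c(1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5)),
  stringsAsFactors = FALSE
)

# One vector of ids per group level, deduplicated so that the first
# element of the accumulated result is unique as well
per_group <- lapply(split(df$id, df$group), unique)

# Running union of ids across groups
res <- Reduce(union, per_group, accumulate = TRUE)

# Running labels: "1", "1,2", "1,2,3", ...
names(res) <- Reduce(function(a, b) paste(a, b, sep = ","),
                     names(per_group), accumulate = TRUE)

lengths(res)  # number of distinct ids seen so far, per window
```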
