Consolidate lists that share common elements in R with their corresponding list IDs?
I have the following table of group IDs (GroupNum), each with a list of values associated with it (NPI_list):
df1 <- data.frame(GroupNum = c(41, 224, 1032, 2754, 3907, 4107),
                  NPI_list = c('1740411552,1932387479', '1710112156,1841438280',
                               '1629405113,1942433891', '1629405113,1992083588',
                               '1710112156,1841438280', '1740411552,1932387479'),
                  stringsAsFactors = FALSE)
In some cases the lists share common elements. I need to consolidate the groups whose lists have values in common, combining their GroupNum IDs, so that I end up with something like the following:
df2 <- data.frame(GroupNum = c('41,4107', '224,3907', '1032,2754'),
                  NPI_list = c('1740411552,1932387479', '1710112156,1841438280',
                               '1629405113,1992083588,1942433891'),
                  stringsAsFactors = FALSE)
I have been told that there is a way to determine whether lists share common elements in Python, but I only have experience with R. I have tried a dplyr solution similar to Duck's below, but it still groups on NPI_list as a whole. I need to compare the individual elements of each list against those of every other list, and combine the lists if there is even a single match.
Any advice would be helpful. I suspect that I will need to use some sort of for loop.
I think this can help you. It assumes the data you showed are in a data frame. Next time, please include your data, or a portion of it, by running dput(yourdata) or dput(head(yourdata,20)) on your data frame and pasting the result into the question; that makes it easier to help you. Here is a possible solution with similar data, using dplyr:
library(dplyr)
# Data (similar to the question's, but not identical)
df1 <- data.frame(GroupNum = c(41, 224, 1032, 2754, 3907, 4107),
                  NPI_list = c('1740411552,1932387479', '1710112156,1841438280',
                               '1639127913,1942433891', '1629405113,1992083588',
                               '1710112156,1841438280', '1740411552,1932387479'),
                  stringsAsFactors = FALSE)
# Aggregate: rows with an identical NPI_list string share an id;
# every row whose NPI_list occurs only once gets id 0, so all such rows are pooled
df2 <- df1 %>%
  group_by(NPI_list) %>%
  mutate(N = n(), id = cur_group_id(), id = ifelse(N == 1, 0, id)) %>%
  ungroup() %>%
  group_by(id) %>%
  summarise(GroupNum = paste0(GroupNum, collapse = ','),
            NPI_list = paste0(unique(NPI_list), collapse = ',')) %>%
  ungroup() %>%
  select(-id)
The output will be:
# A tibble: 3 x 2
  GroupNum  NPI_list
  <chr>     <chr>
1 1032,2754 1639127913,1942433891,1629405113,1992083588
2 224,3907  1710112156,1841438280
3 41,4107   1740411552,1932387479
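The question asks for lists to be merged whenever they share even a single element, which is a connected-components problem rather than a grouping on the whole NPI_list string. The following is a minimal base R sketch of that idea, run on the question's original data. The function name consolidate_groups and the variable comp are hypothetical names chosen for illustration, and the pairwise loop is an assumption about how one might do it, not the method from the answer above.

```r
# Data from the question
df1 <- data.frame(GroupNum = c(41, 224, 1032, 2754, 3907, 4107),
                  NPI_list = c('1740411552,1932387479', '1710112156,1841438280',
                               '1629405113,1942433891', '1629405113,1992083588',
                               '1710112156,1841438280', '1740411552,1932387479'),
                  stringsAsFactors = FALSE)

# Hypothetical helper: merge rows whose NPI lists share at least one element
consolidate_groups <- function(df) {
  # Split each comma-separated string into a character vector of NPIs
  npis <- strsplit(df$NPI_list, ",", fixed = TRUE)
  n <- length(npis)
  comp <- seq_len(n)  # component label per row; initially each row is alone

  # For every pair of rows that overlap, relabel one whole component as the
  # other. Processing each overlapping pair once links rows transitively.
  for (i in seq_len(n)) {
    for (j in seq_len(i - 1)) {
      if (comp[i] != comp[j] &&
          length(intersect(npis[[i]], npis[[j]])) > 0) {
        comp[comp == comp[i]] <- comp[j]
      }
    }
  }

  # Collect the row indices belonging to each component
  groups <- unname(split(seq_len(n), comp))
  data.frame(
    GroupNum = vapply(groups, function(idx)
      paste(df$GroupNum[idx], collapse = ","), character(1)),
    NPI_list = vapply(groups, function(idx)
      paste(unique(unlist(npis[idx])), collapse = ","), character(1)),
    stringsAsFactors = FALSE
  )
}

df2 <- consolidate_groups(df1)
print(df2)
```

With this data, groups 1032 and 2754 end up together because both contain 1629405113. The order of the NPIs within a merged list follows row order, so it may differ from the ordering in the desired output shown in the question. The double loop compares every pair of rows, which should be fine for small tables; for large ones a graph library's connected-components routine would likely be a better fit.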