
Consolidate lists that share common elements in R, along with their corresponding list IDs?

I have the following table of group IDs (GroupNum) with a list of values that they are associated with (NPI_list):

df1 <- data.frame(GroupNum=c(41,224,1032,2754,3907,4107),
              NPI_list=c('1740411552,1932387479','1710112156,1841438280',
                         '1629405113,1942433891','1629405113,1992083588',
                         '1710112156,1841438280','1740411552,1932387479'),
              stringsAsFactors = FALSE)

There are instances where lists share common elements. I need to consolidate the groups that have values in common, combining both their GroupNum IDs and their values, so that I get an end product similar to the following:

df2 <- data.frame(GroupNum=c('41,4107','224,3907','1032,2754'),
              NPI_list=c('1740411552,1932387479','1710112156,1841438280','1629405113,1992083588,1942433891'),
              stringsAsFactors = FALSE)

I have been told that there is a way to determine whether lists share common elements in Python, but I only have experience with R. I have tried a dplyr solution similar to Duck's below, but it still groups by the whole NPI_list string. I need to compare the individual elements within each list against those of every other list and combine the lists if there is even a single match.

Any advice would be helpful. I suspect that I will need to use some sort of for loop.
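As a starting point, the element-wise comparison described above can be sketched in base R: split each string into a vector with strsplit() and test each pair of rows with intersect(). This is only a minimal sketch on the question's data; the names npi_sets and shares are illustrative and do not come from the original post.

```r
# Question data: group IDs and comma-separated NPI strings
df1 <- data.frame(GroupNum = c(41, 224, 1032, 2754, 3907, 4107),
                  NPI_list = c('1740411552,1932387479', '1710112156,1841438280',
                               '1629405113,1942433891', '1629405113,1992083588',
                               '1710112156,1841438280', '1740411552,1932387479'),
                  stringsAsFactors = FALSE)

# Split each comma-separated string into a character vector of NPIs
npi_sets <- strsplit(df1$NPI_list, ",", fixed = TRUE)

# shares[i, j] is TRUE when rows i and j have at least one NPI in common
n <- length(npi_sets)
ids <- as.character(df1$GroupNum)
shares <- matrix(FALSE, n, n, dimnames = list(ids, ids))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    shares[i, j] <- length(intersect(npi_sets[[i]], npi_sets[[j]])) > 0
  }
}
```

Here shares["1032", "2754"] is TRUE because both rows contain 1629405113, even though their full NPI_list strings differ.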

I think this can help. Your data needs to be in a data frame, as you showed. Next time, please include your data (or a portion of it) by running dput(yourdata) or dput(head(yourdata, 20)) on your data frame and pasting the result into the question; that makes it easier to help you. Here is a possible solution with similar data, using dplyr:

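library(dplyr)  # cur_group_id() requires dplyr >= 1.0.0

# Data (row 3 starts with 1639127913 here, not 1629405113 as in the question)
df1 <- data.frame(GroupNum=c(41,224,1032,2754,3907,4107),
                  NPI_list=c('1740411552,1932387479','1710112156,1841438280',
                             '1639127913,1942433891','1629405113,1992083588',
                             '1710112156,1841438280','1740411552,1932387479'),
                  stringsAsFactors = FALSE)

# Aggregate: rows with an identical NPI_list string share an id;
# every NPI_list that occurs only once is given id 0
df2 <- df1 %>%
  group_by(NPI_list) %>%
  mutate(N = n(), id = cur_group_id(), id = ifelse(N == 1, 0, id)) %>%
  ungroup() %>%
  group_by(id) %>%
  # Collapse the group IDs and the distinct NPI_list strings of each id
  summarise(GroupNum = paste0(GroupNum, collapse = ','),
            NPI_list = paste0(unique(NPI_list), collapse = ',')) %>%
  ungroup() %>%
  select(-id)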

The output will be:

# A tibble: 3 x 2
  GroupNum  NPI_list                                   
  <chr>     <chr>                                      
1 1032,2754 1639127913,1942433891,1629405113,1992083588
2 224,3907  1710112156,1841438280                      
3 41,4107   1740411552,1932387479  
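The dplyr code above matches rows only when their NPI_list strings are identical, and it places every unmatched row in a single group. For merging on any shared element, one possible approach is to treat the rows as nodes that are connected whenever they share an NPI and then collapse each connected set. The following is a minimal base R sketch of that idea on the question's data, not a definitive implementation; comp, npi_sets, and merged are illustrative names.

```r
# Question data: group IDs and comma-separated NPI strings
df1 <- data.frame(GroupNum = c(41, 224, 1032, 2754, 3907, 4107),
                  NPI_list = c('1740411552,1932387479', '1710112156,1841438280',
                               '1629405113,1942433891', '1629405113,1992083588',
                               '1710112156,1841438280', '1740411552,1932387479'),
                  stringsAsFactors = FALSE)

npi_sets <- strsplit(df1$NPI_list, ",", fixed = TRUE)
n <- length(npi_sets)

# comp[i] is the label of the merged set that row i belongs to;
# every row starts in its own set
comp <- seq_len(n)

# Visit each pair of rows once; when two rows in different sets share an NPI,
# relabel the whole set with the larger label to the smaller label
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    if (comp[i] != comp[j] &&
        length(intersect(npi_sets[[i]], npi_sets[[j]])) > 0) {
      keep <- min(comp[i], comp[j])
      drop <- max(comp[i], comp[j])
      comp[comp == drop] <- keep
    }
  }
}

# Collapse each set into one row: all GroupNum values and the distinct NPIs
merged <- do.call(rbind, lapply(split(seq_len(n), comp), function(rows) {
  data.frame(GroupNum = paste(df1$GroupNum[rows], collapse = ","),
             NPI_list = paste(unique(unlist(npi_sets[rows])), collapse = ","),
             stringsAsFactors = FALSE)
}))
rownames(merged) <- NULL
```

On this data, merged has three rows with GroupNum values "41,4107", "224,3907", and "1032,2754". The NPIs in each row appear in order of first occurrence, so the last row reads "1629405113,1942433891,1992083588". Because whole sets are relabeled at each merge, chains of overlap (A shares with B, B shares with C) end up in one row. The pairwise loop is quadratic in the number of rows; for large tables a graph package such as igraph, with its components() function, may be a better fit.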
